Patent Number: 6,167,368

Title: Method and system for identifying significant topics of a document

Abstract: A "domain-general" method for representing the "sense" of a document includes the steps of extracting a list of simplex noun phrases representing candidate significant topics in the document, clustering the simplex noun phrases by head, and ranking the simplex noun phrases according to a significance measure to indicate the relative importance of the simplex noun phrases as significant topics of the document. Furthermore, the output can be filtered in a variety of ways, both for automatic processing and for presentation to users.
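
The three steps named in the abstract (extract, cluster by head, rank) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. It assumes the simplex noun phrases have already been extracted and are passed in as strings, that the head of a simplex English noun phrase is its last word, and that the significance measure is simply the number of phrase occurrences sharing a head; the abstract does not define the measure. All function names and the min_count parameter are hypothetical.

```python
from collections import defaultdict


def head_of(phrase):
    # Assumption: the head of a simplex English noun phrase is its last word.
    return phrase.lower().split()[-1]


def cluster_by_head(phrases):
    # Group phrase occurrences under their shared head noun.
    clusters = defaultdict(list)
    for phrase in phrases:
        if phrase.strip():  # skip empty or whitespace-only strings
            clusters[head_of(phrase)].append(phrase.lower().strip())
    return clusters


def rank_topics(phrases):
    # Assumed significance measure: how many phrase occurrences share a head.
    # Ties are broken alphabetically by head so the output is deterministic.
    clusters = cluster_by_head(phrases)
    ranked = sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(head, len(members), sorted(set(members))) for head, members in ranked]


def filter_topics(ranked, min_count=2):
    # One possible output filter: keep heads seen at least min_count times.
    return [entry for entry in ranked if entry[1] >= min_count]


if __name__ == "__main__":
    sample = ["document filter", "spam filter", "Filter",
              "news article", "long article", "user"]
    for head, count, members in filter_topics(rank_topics(sample)):
        print(head, count, members)
```

In a real pipeline the input list would come from a part-of-speech tagger and noun phrase chunker, and the filter could be swapped for any other criterion suited to automatic processing or to presentation to users.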

Inventors: Wacholder; Faye (Nina) P. (Roslyn Heights, NY)

Assignee: The Trustees of Columbia University in the City of New York

International Classification: G06F 17/30 (20060101); G06F 17/27 (20060101); G06F 15/00

Expiration Date: 12/26/2017