Patent Number: 8,880,526

Title: Phrase clustering

Abstract: Systems and associated methods for enhanced concept understanding in large document collections through phrase clustering are described. Embodiments take as input an initial set of phrases and estimate centroids using a clustering process. Embodiments then generate new phrases around each of the current centroids using the current phrases. These new phrases are added to the current set, and the clustering process is iterated. Upon convergence, embodiments finalize clusters based on phrases of any given length.
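
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not specify the feature representation, the clustering algorithm, or how new phrases are generated, so everything below is an assumption made for illustration. Phrases are word tuples, each phrase is represented by the documents it occurs in, clustering is a k-means-style assignment by cosine similarity, new phrases are formed by extending each clustered phrase one word to the right, and convergence is approximated by the phrase set no longer growing. All function names and the max_len and max_iter parameters are hypothetical.

```python
from collections import Counter
import math

def occurrences(phrase, docs):
    """Return (doc_index, start) positions where the word tuple `phrase` occurs."""
    n = len(phrase)
    return [(d, i) for d, doc in enumerate(docs)
            for i in range(len(doc) - n + 1) if tuple(doc[i:i + n]) == phrase]

def vector(phrase, docs):
    """Assumed feature choice: count of occurrences per document."""
    return Counter(d for d, _ in occurrences(phrase, docs))

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign(phrases, centroids, docs):
    """Assign each phrase to its most similar centroid."""
    clusters = [[] for _ in centroids]
    for p in phrases:
        v = vector(p, docs)
        best = max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
        clusters[best].append(p)
    return clusters

def centroid(cluster, docs):
    """Sum of member vectors; cosine ignores scale, so no division is needed."""
    total = Counter()
    for p in cluster:
        total.update(vector(p, docs))
    return total

def expand(cluster, docs):
    """Generate candidate phrases by extending each member one word to the right."""
    new = set()
    for p in cluster:
        for d, i in occurrences(p, docs):
            j = i + len(p)
            if j < len(docs[d]):
                new.add(p + (docs[d][j],))
    return new

def cluster_phrases(seeds, docs, k, max_len=3, max_iter=10):
    phrases = sorted(set(seeds))
    # Assumption: the first k seed phrases serve as the initial centroids.
    centroids = [vector(p, docs) for p in phrases[:k]]
    for _ in range(max_iter):
        clusters = assign(phrases, centroids, docs)
        centroids = [centroid(c, docs) if c else centroids[i]
                     for i, c in enumerate(clusters)]
        candidates = set()
        for c in clusters:
            candidates |= {p for p in expand(c, docs) if len(p) <= max_len}
        grown = sorted(set(phrases) | candidates)
        if grown == phrases:  # no new phrases were generated: treat as converged
            break
        phrases = grown
    # Final clusters may mix phrases of different lengths.
    return assign(phrases, centroids, docs)
```

Calling cluster_phrases with single-word seeds and a tokenized corpus returns one list of phrases per centroid. Capping phrase length with max_len is a design choice of this sketch to keep the candidate set bounded; the abstract itself only states that final clusters may contain phrases of any given length.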

Inventors: Bhattacharya; Indrajit (Bangalore, IN), Godbole; Shantanu Ravindra (New Delhi, IN), Sharma; Akshit (New Delhi, IN)

Assignee: International Business Machines Corporation

International Classification: G06F 17/30 (20060101)

Expiration Date: 2019-11-04