Patent Number: 6,167,398

Title: Information retrieval system and method that generates weighted comparison results to analyze the degree of dissimilarity between a reference corpus and a candidate document

Abstract: An internet information agent accepts a reference document and analyzes it in accordance with metrics defined by its analysis algorithm, obtaining respective lists (word, character-level n-gram, word-level n-gram). It derives weights corresponding to the metrics, applies the metrics to a candidate document to obtain respective returned values, applies the weights to the returned values, and sums the results to obtain a Document Dissimilarity (DD) value. The DD is compared with a Dissimilarity Threshold (DT), and the candidate document is stored if the DD is less than the DT. A user can assign relevance values to the search results, and the agent modifies the weights accordingly. The agent can be used to improve a language model for applications such as speech recognition.
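
The weighted comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how each metric is computed or how the weights are derived, so the sketch assumes that each metric returns the fraction of candidate items absent from the corresponding reference list, and that the weights are supplied by the caller. All function names, the n-gram sizes, and the example threshold are hypothetical.

```python
def words(text):
    """Lower-cased whitespace tokens (assumed tokenization)."""
    return text.lower().split()

def char_ngrams(text, n=3):
    """Character-level n-grams over the normalized text (n=3 is an assumption)."""
    s = " ".join(words(text))
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def word_ngrams(text, n=2):
    """Word-level n-grams (n=2 is an assumption)."""
    w = words(text)
    return [tuple(w[i:i + n]) for i in range(len(w) - n + 1)]

# One extractor per list named in the abstract.
METRICS = {"word": words, "char_ngram": char_ngrams, "word_ngram": word_ngrams}

def build_reference_lists(reference):
    """Analyze the reference document and keep one item set per metric."""
    return {name: set(extract(reference)) for name, extract in METRICS.items()}

def metric_values(candidate, ref_lists):
    """Assumed metric: share of candidate items missing from the reference list."""
    values = {}
    for name, extract in METRICS.items():
        items = extract(candidate)
        if not items:
            values[name] = 1.0  # nothing to compare: treat as fully dissimilar
            continue
        missing = sum(1 for item in items if item not in ref_lists[name])
        values[name] = missing / len(items)
    return values

def document_dissimilarity(candidate, ref_lists, weights):
    """DD: apply the weights to the returned metric values and sum the results."""
    values = metric_values(candidate, ref_lists)
    return sum(weights[name] * values[name] for name in METRICS)

def should_store(candidate, ref_lists, weights, threshold):
    """Store the candidate only if DD is less than the threshold DT."""
    return document_dissimilarity(candidate, ref_lists, weights) < threshold

if __name__ == "__main__":
    ref_lists = build_reference_lists("the cat sat on the mat")
    weights = {"word": 1 / 3, "char_ngram": 1 / 3, "word_ngram": 1 / 3}
    for doc in ("the cat sat on the mat", "quantum flux zzz"):
        dd = document_dissimilarity(doc, ref_lists, weights)
        print(doc, round(dd, 3), should_store(doc, ref_lists, weights, 0.5))
```

Equal weights are used here only for illustration. In the abstract the weights are derived from the metrics and later modified according to the relevance values a user assigns to search results; that feedback step is omitted from this sketch.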

Inventors: Wyard; Peter J (Woodbridge, GB), Rose; Tony G (Guildford, GB)

Assignee: British Telecommunications public limited company

International Classification: G06F 17/30 (20060101)

Expiration Date: 12/26/2017