Patent Number: 7,133,856

Title: Binary tree for complex supervised learning

Abstract: The present invention provides a powerful and robust classification and prediction tool, methodology, and architecture for supervised learning. It is particularly applicable to complex datasets in which multiple factors determine an outcome while many other factors are irrelevant to prediction. The features that are relevant to the outcome may have complicated and influential interactions even though their individual contributions are insignificant. For example, polygenic diseases may be associated with genetic and environmental risk factors. This new approach allows us to consider all risk factors simultaneously, including their interactions and combined effects. Our approach combines the strengths of binary classification trees and regression. A simple rooted binary tree model is created in which each split is defined by a linear combination of selected variables. The linear combination is obtained by regression with optimal scoring. The variables are selected using backward shaving. Cross-validation is used to find the level of shrinkage that minimizes prediction error. Using a selected subset of variables to define each split not only increases interpretability but also enhances the model's predictive power and robustness. The final model handles cumulative effects and interactions simultaneously.
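The construction of a single split described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: it handles only two classes coded 0/1, for which optimal scoring reduces to least-squares regression of a standardized class score on the predictors; it places the cutoff on the fitted linear combination by minimizing training misclassification; and it implements backward shaving as removing, one at a time, the variable with the smallest absolute standardized coefficient, with k-fold cross-validation choosing how far to shave. The function names `fit_split`, `predict_split`, `cv_error`, and `backward_shave` are hypothetical.

```python
import numpy as np


def fit_split(X, y):
    """Fit one split: a linear combination of the columns of X plus a cutoff.

    Assumption: two classes coded 0/1. Optimal scoring then reduces to
    ordinary least squares on a standardized class score.
    """
    score = np.where(y == 1, 1.0, -1.0)
    score = (score - score.mean()) / score.std()
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, score, rcond=None)
    z = design @ beta  # the linear combination that defines the split
    # Choose the cutoff minimizing training misclassification (assumption);
    # observations with z > cut go to the class-1 child.
    best_cut, best_err = 0.0, np.inf
    for cut in np.unique(z):
        err = np.mean((z > cut) != (y == 1))
        if err < best_err:
            best_cut, best_err = cut, err
    return beta, best_cut


def predict_split(beta, cut, X):
    """Send each row to the class-1 child (1) or the class-0 child (0)."""
    design = np.column_stack([np.ones(len(X)), X])
    return (design @ beta > cut).astype(int)


def cv_error(X, y, k=5, seed=0):
    """k-fold cross-validated misclassification rate of a single split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta, cut = fit_split(X[train], y[train])
        errors.append(np.mean(predict_split(beta, cut, X[test]) != y[test]))
    return float(np.mean(errors))


def backward_shave(X, y, k=5):
    """Hypothetical backward shaving: drop the weakest variable at each step
    and keep the subset with the lowest cross-validated error."""
    active = list(range(X.shape[1]))
    best_vars, best_err = list(active), cv_error(X[:, active], y, k)
    while len(active) > 1:
        Xs = X[:, active]
        Xs_std = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
        beta, _ = fit_split(Xs_std, y)
        active.pop(int(np.argmin(np.abs(beta[1:]))))  # skip the intercept
        err = cv_error(X[:, active], y, k)
        if err <= best_err:  # on ties, prefer the smaller subset
            best_vars, best_err = list(active), err
    return best_vars, best_err


# Synthetic example: only columns 0 and 1 drive the outcome; 2-5 are noise.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 6))
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)
vars_kept, err = backward_shave(X, y)
print("selected variables:", vars_kept, "CV error:", round(err, 3))
```

On this synthetic data, where the outcome depends on a combination of columns 0 and 1 rather than on either alone, the selected subset should contain both of them. A full tree would apply the same procedure recursively to each child node; the patent's actual shaving schedule, scoring for more than two classes, and stopping rules may differ from these assumptions.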

Inventors: Huang; Jing (Sunnyvale, CA), Olshen; Richard A. (Stanford, CA)

Assignee: The Board of Trustees of the Leland Stanford Junior University

International Classification: G06E 1/00 (20060101)

Expiration Date: 2019-11-07