Patent Number: 8,818,919

Title: Multiple imputation of missing data in multi-dimensional retail sales data sets via tensor factorization

Abstract: A system, method, and computer program product provide multiple imputation of missing data elements in retail data sets used for modeling and decision-support applications. The imputation exploits the multi-dimensional tensor structure of the data sets and is implemented as a fast, scalable scheme suitable for large data sets. The method generates multiple imputations: a set of complete data sets, each containing one of a plurality of imputed realizations of the values missing from the original data set, so that the variability in the magnitudes of those missing values can be captured for subsequent statistical analysis. The method applies tensor factorization to the multi-dimensional structure of the retail data sets and, in a preferred embodiment, can be implemented using fast, scalable imputation methods suitable for large data sets, yielding multiple complete data sets in which the original missing values are replaced by various imputed values.
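
Illustrative sketch (not taken from the patent text): the idea in the abstract, several completed copies of a sales tensor that differ only in their imputed cells, can be shown with a small NumPy example. Everything here is an assumption made for illustration: the store x product x week layout, the rank-2 CP (PARAFAC) model, the EM-style alternating least squares fit, and the Gaussian noise added to each draw. The function names (`fit_cp_em`, `multiple_impute`, and the helpers) are hypothetical, and the patent's own factorization and sampling scheme may differ.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product, shape (U.shape[0] * V.shape[0], R)."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def unfold(T, mode):
    """Matricize a 3-way tensor along `mode` (remaining axes in C order)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_reconstruct(factors):
    """Rebuild the tensor as a sum of rank-one outer products."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def fit_cp_em(X, mask, rank, rng, iters=50, reg=1e-2):
    """EM-style ALS: fill missing cells with the current fit, then refit."""
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    filled = np.where(mask, X, X[mask].mean())  # start from the observed mean
    for _ in range(iters):
        for mode in range(3):
            U, V = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(U, V)
            gram = kr.T @ kr + reg * np.eye(rank)  # ridge term keeps it invertible
            factors[mode] = np.linalg.solve(gram, (unfold(filled, mode) @ kr).T).T
        filled = np.where(mask, X, cp_reconstruct(factors))
    return factors

def multiple_impute(X, rank=2, n_imputations=5, seed=0):
    """Return several completed copies of X (NaN marks a missing cell)."""
    mask = ~np.isnan(X)
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(n_imputations):
        # Each pass uses a fresh random start and a fresh noise draw.
        recon = cp_reconstruct(fit_cp_em(X, mask, rank, rng))
        sigma = np.std(X[mask] - recon[mask])  # residual spread on observed cells
        draw = recon + sigma * rng.standard_normal(X.shape)
        completed.append(np.where(mask, X, draw))  # observed cells stay untouched
    return completed

# Synthetic 6 stores x 5 products x 8 weeks tensor with about 20% of cells missing.
demo_rng = np.random.default_rng(42)
truth = cp_reconstruct([demo_rng.random((n, 2)) for n in (6, 5, 8)])
X = truth + 0.05 * demo_rng.standard_normal(truth.shape)
X[demo_rng.random(X.shape) < 0.2] = np.nan

imputations = multiple_impute(X)
# Per-cell spread across the completed data sets; zero wherever a cell was observed.
spread = np.std(np.stack(imputations), axis=0)
print("mean spread over missing cells:", spread[np.isnan(X)].mean())
```

The per-cell spread across the completed data sets is what a downstream analysis would use to account for uncertainty in the missing values. A single imputation discards that information.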

Inventors: Natarajan; Ramesh (Pleasantville, NY), Banerjee; Arindam (Roseville, MN), Shan; Hanhuai (St. Paul, MN)

Assignee: International Business Machines Corporation

International Classification: G06F 15/18 (20060101)

Expiration Date: 8/26/2018