Patent Number: 8,719,271

Title: Accelerating data profiling process

Abstract: A data profile request is handled by utilizing data in a distributed file system. Tabular data is extracted from a data source and stored in a distributed file system. Each table in the tabular data is split by columns, which are each stored in separate files in a set of physical nodes of the distributed file system. In response to a data profiling request, a master node determines, based on the profiling request, which groups of files need to be on the same physical node in order to perform the profiling analysis. The master node creates jobs using physical nodes that contain the requisite files needed for each job.
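
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`split_by_columns`, `plan_jobs`), the in-memory `placement` map, the node names, and the rule of picking the alphabetically first eligible node are all assumptions made for illustration. The sketch splits a table into one file per column and then, for each group of column files a profiling request needs together, looks for a node that already holds every file in the group.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical key for one column file: (table name, column name).
ColumnFile = Tuple[str, str]


def split_by_columns(table: str, header: List[str],
                     rows: List[list]) -> Dict[ColumnFile, list]:
    """Split one table into per-column value lists, one per column file."""
    return {(table, col): [row[i] for row in rows]
            for i, col in enumerate(header)}


def plan_jobs(groups: List[List[ColumnFile]],
              placement: Dict[ColumnFile, Set[str]]
              ) -> List[Tuple[List[ColumnFile], Optional[str]]]:
    """For each group of column files, pick a node that holds all of them.

    A node of None means no single node currently holds the whole group,
    so the files would first have to be brought together (not modeled here).
    """
    plan = []
    for group in groups:
        candidates: Optional[Set[str]] = None
        for column_file in group:
            holders = placement.get(column_file, set())
            # Intersect the node sets so only nodes with every file remain.
            candidates = set(holders) if candidates is None else candidates & holders
        # Assumed tie-break: the alphabetically first eligible node.
        node = min(candidates) if candidates else None
        plan.append((group, node))
    return plan


# Illustrative data only: which nodes hold which column files.
files = split_by_columns("orders", ["id", "customer_id"], [[1, 10], [2, 11]])
placement = {
    ("orders", "id"): {"node1", "node2"},
    ("orders", "customer_id"): {"node2"},
    ("customers", "id"): {"node3"},
}
groups = [
    [("orders", "id")],                               # single-column analysis
    [("orders", "id"), ("orders", "customer_id")],    # two columns of one table
    [("orders", "customer_id"), ("customers", "id")], # cross-table comparison
]
plan = plan_jobs(groups, placement)
```

In this example the first two groups can each be scheduled on a node that already holds the needed files, while the cross-table group has no common node and is reported with `None`.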

Inventors: Nelke; Sebastian (Schoenaich, DE), Oberhofer; Martin (Bondorf, DE), Saillet; Yannick (Stuttgart, DE), Seifert; Jens (Gaertringen, DE)

Assignee: International Business Machines Corporation

International Classification: G06F 17/30 (20060101)

Expiration Date: 5/06/12018