Patent Number: 8,719,240

Title: Apparatus and method to sequentially deduplicate groups of files comprising the same file name but different file version numbers

Abstract: A method to sequentially deduplicate data, wherein the method receives a plurality of computer files, wherein each of the plurality of computer files comprises a label comprising a file name, a file type, a version number, and a file size, and stores that plurality of computer files in a deduplication queue. The method then identifies a subset of the plurality of computer files, wherein each file of the subset comprises the same file name but a different version number, and wherein the subset comprises a maximum count of version numbers, and wherein the subset comprises a portion of the plurality of computer files. The method deduplicates the subset using a hash algorithm, and removes the subset from the deduplication queue. During the deduplicating, the method receives new computer files comprising the same file name, stores those new computer files in the deduplication queue, but does not add those new computer files to the subset.
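
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several points are assumptions made only for illustration: the names QueuedFile, select_subset, deduplicate, and run_pass are hypothetical; "maximum count of version numbers" is read here as "the file name with the most distinct versions in the queue"; the hash algorithm is taken to be SHA-256 over fixed-size chunks; and files arriving during deduplication are simulated with an arrivals argument.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedFile:
    """Hypothetical label plus content for one queued file."""
    name: str
    file_type: str
    version: int
    data: bytes

    @property
    def size(self) -> int:
        return len(self.data)


def select_subset(queue):
    """Assumed reading: pick the file name with the most distinct versions."""
    by_name = defaultdict(dict)
    for f in queue:
        # Keep one file per (name, version) pair.
        by_name[f.name].setdefault(f.version, f)
    if not by_name:
        return []
    name = max(by_name, key=lambda n: len(by_name[n]))
    return list(by_name[name].values())


def deduplicate(subset, store, chunk_size=4):
    """Store each unique chunk once, keyed by its SHA-256 digest.

    Returns, per (name, version), the ordered digests needed to rebuild it.
    The tiny chunk_size is an illustrative assumption.
    """
    recipes = {}
    for f in subset:
        digests = []
        for i in range(0, len(f.data), chunk_size):
            chunk = f.data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            digests.append(digest)
        recipes[(f.name, f.version)] = digests
    return recipes


def run_pass(queue, store, arrivals=()):
    """One sequential pass over the deduplication queue."""
    subset = select_subset(queue)  # fixed for the duration of this pass
    # Simulated mid-pass arrivals: queued, but never added to the subset.
    queue.extend(arrivals)
    recipes = deduplicate(subset, store)
    for f in subset:
        queue.remove(f)  # the processed subset leaves the queue
    return recipes
```

Under these assumptions, a version that arrives while its siblings are being deduplicated stays in the queue and is picked up by a later call to run_pass, which matches the sequential behavior the abstract describes.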

Inventors: Bates; Allen Keith (Tucson, AZ), Haustein; Nils (Mainz, DE), Hepworth; Gail (Tucson, AZ), Klein; Craig Anthony (Tucson, AZ), Troppens; Ulf (Mainz, DE), Winarski; Daniel James (Tucson, AZ)

Assignee: International Business Machines Corporation

International Classification: G06F 7/00 (20060101); G06F 17/00 (20060101)

Expiration Date: 5/06/2018