Patent Number: 8,219,848

Title: Mechanism for recovery from site failure in a stream processing system

Abstract: A failure recovery framework to be used in cooperative data stream processing is provided that can be used in a large-scale stream data analysis environment. Failure recovery supports a plurality of independent distributed sites, each having its own local administration and goals. The distributed sites cooperate in an inter-site back-up mechanism to provide for system recovery from a variety of failures within the system. Failure recovery is both automatic and timely through cooperation among sites. Back-up sites associated with a given primary site are identified. These sites are used to identify failures within the primary site including failures of applications running on the nodes of the primary site. The failed applications are reinstated on one or more nodes within the back-up sites using job management instances local to the back-up sites in combination with previously stored state information and data values for the failed applications. In addition to inter-site mechanisms, each one of the plurality of sites employs an intra-site back-up mechanism to handle failure recoveries within the site.
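The abstract describes the inter-site recovery path only at a high level. The Python sketch below is one possible illustration under stated assumptions, not the patented method. It assumes that a back-up site detects primary failure through a missed heartbeat with a fixed timeout, that failed applications are spread round-robin across the primary's back-up sites, and that each back-up site's local job manager places an application on its least-loaded node and restores its checkpointed state. All class and function names (`Site`, `AppCheckpoint`, `JobManager`, `detect_and_recover`) and the `timeout` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppCheckpoint:
    """Hypothetical record of previously stored state for one application."""
    app_id: str
    state: dict


@dataclass
class Site:
    """A hypothetical distributed site with its own nodes and local administration."""
    name: str
    nodes: list
    backups: list = field(default_factory=list)   # back-up sites for this primary
    running: dict = field(default_factory=dict)   # app_id -> node hosting it
    state: dict = field(default_factory=dict)     # app_id -> restored state
    last_heartbeat: float = 0.0


class JobManager:
    """Hypothetical job management instance local to a single site."""

    def __init__(self, site):
        self.site = site

    def reinstate(self, ckpt):
        # Place the application on the site's least-loaded node.
        def load(node):
            return sum(1 for n in self.site.running.values() if n == node)

        node = min(self.site.nodes, key=load)
        self.site.running[ckpt.app_id] = node
        # Restore the stored state so processing resumes where it left off.
        self.site.state[ckpt.app_id] = dict(ckpt.state)
        return node


def detect_and_recover(primary, checkpoints, now, timeout=5.0):
    """Reinstate a primary site's applications on its back-up sites if the
    primary has missed its heartbeat deadline. Returns app_id -> (site, node)."""
    if now - primary.last_heartbeat <= timeout:
        return {}  # primary still considered alive
    if not primary.backups:
        raise RuntimeError(f"site {primary.name} has no back-up sites")
    placements = {}
    for i, app_id in enumerate(list(primary.running)):
        # Spread failed applications round-robin across the back-up sites,
        # using each back-up site's own local job manager.
        backup = primary.backups[i % len(primary.backups)]
        node = JobManager(backup).reinstate(checkpoints[app_id])
        placements[app_id] = (backup.name, node)
    primary.running.clear()
    primary.state.clear()
    return placements
```

For instance, with a primary running three applications and two back-up sites, the first and third applications land on the first back-up site and the second on the other. Round-robin spreading and least-loaded placement are simplifying choices made for this sketch; the abstract does not specify a placement policy, and the intra-site back-up mechanism it mentions is not modeled here.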

Inventors: Branson; Michael John (Rochester, MN), Douglis; Frederick (Basking Ridge, NJ), Fawcett; Bradley William (Byron, MN), Liu; Zhen (Tarrytown, NY), Rong; Bin (Melbourne, AU), Ye; Fan (Yorktown Heights, NY)

Assignee: International Business Machines Corporation

International Classification: G06F 11/00 (20060101)

Expiration Date: 2020-07-10