Mastering Disaster Recovery—Part 1: Seven Levels


When discussing business continuity plans, it’s important to understand the concepts of high availability (HA) and disaster recovery. High availability is a system’s ability to remain resilient against single points of failure, ensuring consistent performance and uptime. However, HA alone is not sufficient. Organizations must also have a robust disaster recovery strategy to quickly restore infrastructure and data with minimal data loss in the event of a disruption.

In this post, I provide an overview of disaster recovery and introduce its seven levels, setting the stage for a deeper exploration in future posts.

Disaster Recovery

Disaster recovery is the practice of maintaining or re-establishing vital infrastructure and systems after a natural or human-induced disaster, such as a storm or cyberattack. It keeps the critical parts of a business functioning despite significant disruptive events. Effective disaster recovery requires well-thought-out policies, procedures, and tools to ensure business continuity.

Measuring Data Loss and Recovery Time

In the event of a disaster, an organization’s primary goal is to restore all systems rapidly while minimizing data loss. These objectives are quantified as the Recovery Time Objective (RTO) and Recovery Point Objective (RPO):

  • Recovery Time Objective (RTO): The maximum acceptable time to restore infrastructure and data so that business operations can resume.
  • Recovery Point Objective (RPO): The maximum acceptable amount of data loss, measured as the span of time before the disaster for which data may be lost.

Figure: Schematic of RPO and RTO, with an example in which the actual data loss and recovery time are longer than the objectives, so neither the RPO nor the RTO is met.
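
To make the two measurements concrete, here is a minimal Python sketch that compares the actual data loss and downtime of an outage against the objectives. The `Incident` class, its field names, and the timestamps are illustrative assumptions and do not come from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical record of one outage; all field names are illustrative."""
    last_recoverable_point: datetime  # newest data that survived (e.g. last good backup)
    disaster_time: datetime           # moment the primary site failed
    service_restored: datetime        # moment business operations resumed

    def actual_data_loss(self) -> timedelta:
        # Data written between the last recoverable point and the disaster is lost.
        return self.disaster_time - self.last_recoverable_point

    def actual_downtime(self) -> timedelta:
        # Time during which the business could not operate.
        return self.service_restored - self.disaster_time


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> dict:
    """Compare the actual figures with the objectives."""
    return {
        "rpo_met": incident.actual_data_loss() <= rpo,
        "rto_met": incident.actual_downtime() <= rto,
    }


# Made-up example: last backup at 02:00, failure at 09:30, service back at 15:30.
incident = Incident(
    last_recoverable_point=datetime(2024, 1, 10, 2, 0),
    disaster_time=datetime(2024, 1, 10, 9, 30),
    service_restored=datetime(2024, 1, 10, 15, 30),
)
result = meets_objectives(incident, rpo=timedelta(hours=4), rto=timedelta(hours=2))
print(result)  # {'rpo_met': False, 'rto_met': False}
```

In this example, 7.5 hours of data are lost against a 4-hour RPO, and the outage lasts 6 hours against a 2-hour RTO, so both objectives are missed.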

The Need for a Secondary Site

A secondary location equipped with comparable infrastructure—like computing resources, storage, and networking—is necessary, particularly when the primary site is not immediately recoverable. The data restored at this secondary site is crucial for continuing business operations.

States of Infrastructure and Data Layers

Each layer of the secondary site can be either active or passive. For instance, the compute, network, and storage layers might be active while the data layer is passive: the site lacks the data (or state) it needs to act as the primary site, so that data must be restored first. A passive data layer therefore lengthens the actual recovery time and makes a given RTO harder to meet.
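
The effect of passive layers on recovery time can be sketched roughly in Python. This is a simplified model under stated assumptions: the layer names follow the text above, the activation times are made-up figures, and the layers are assumed to be brought up one after another.

```python
from datetime import timedelta
from typing import Mapping

# Hypothetical time needed to bring each layer up when it is passive.
# These figures are illustrative assumptions, not measurements.
ACTIVATION_TIME = {
    "compute": timedelta(minutes=30),
    "network": timedelta(minutes=15),
    "storage": timedelta(minutes=45),
    "data": timedelta(hours=6),  # data restoration is assumed to dominate here
}


def estimated_recovery_time(layer_is_active: Mapping[str, bool]) -> timedelta:
    """Sum the activation time of every passive layer (sequential bring-up assumed)."""
    total = timedelta()
    for layer, active in layer_is_active.items():
        if not active:
            total += ACTIVATION_TIME[layer]
    return total


# Infrastructure layers are active, but the data layer is passive.
site = {"compute": True, "network": True, "storage": True, "data": False}
print(estimated_recovery_time(site))  # 6:00:00
```

Under these assumptions, a site whose infrastructure is fully active still needs the full data-restoration time before it can take over.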

Considerations for Your Disaster Recovery Plan (DRP)

To effectively establish a DRP, businesses must discuss their domain-specific needs to determine appropriate RPO and RTO requirements. For example, banks typically require very low RPO and RTO, aiming for minimal downtime, whereas a university or research organization might tolerate some data loss and a longer recovery period.

From Backups to Continuous Data Replication: The 7 Tiers of Disaster Recovery

Achieving the desired RPO and RTO involves understanding the seven levels (or tiers) of disaster recovery, numbered 0 through 6. Each successive level offers stronger data protection and faster recovery, at increasing cost and complexity.

  • Level 0: No Off-Site Data. Data is stored exclusively on-site, with no off-site backups. This is the least expensive option, but it carries the highest risk of total data loss if a disaster hits the site. It suits only small, non-critical setups.
  • Level 1: Backup Tapes Off-Site. Data is backed up to magnetic tapes that are stored off-site. This is more secure than Level 0, but recovery can be slow. It suits institutions where recovery speed is not critical.
  • Level 2: Disk Backup Off-Site. Data is backed up to disk-based systems off-site, which allows faster recovery and more frequent backups than tape, at a higher cost. It suits medium-sized businesses that prioritize recovery speed.
  • Level 3: Electronic Vaulting. Data is sent in batches to an off-site location at regular intervals. This balances backup frequency against cost and suits organizations with moderate data-change rates.
  • Level 4: Point-in-Time Copies. Frequent snapshots of the data provide multiple recovery points. This level is storage-intensive and suits businesses with high transaction rates or critical systems.
  • Level 5: Transaction Integrity. All transactions are captured up to the point of failure, which gives high data integrity. This level is technically complex and suits setups where transactional consistency is crucial, such as financial institutions.
  • Level 6: Zero or Near-Zero RPO. Continuous data protection gives almost instantaneous recovery with minimal data loss. This is the most sophisticated and costly option and suits large enterprises or critical government systems.
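
One way to reason about the levels is to pick the lowest, and therefore cheapest, level whose worst-case data loss still fits within the required RPO. The Python sketch below illustrates that idea. The level names follow the list above, but the worst-case data-loss figures are placeholder assumptions chosen only for illustration; real values depend on backup schedules, tooling, and infrastructure.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Level(NamedTuple):
    number: int
    name: str
    worst_case_data_loss: Optional[timedelta]  # None means all data may be lost


# Placeholder figures (assumptions), ordered from cheapest to most expensive.
LEVELS = [
    Level(0, "No Off-Site Data", None),
    Level(1, "Backup Tapes Off-Site", timedelta(days=7)),
    Level(2, "Disk Backup Off-Site", timedelta(days=1)),
    Level(3, "Electronic Vaulting", timedelta(hours=4)),
    Level(4, "Point-in-Time Copies", timedelta(hours=1)),
    Level(5, "Transaction Integrity", timedelta(minutes=1)),
    Level(6, "Zero or Near-Zero RPO", timedelta(seconds=0)),
]


def cheapest_level_for(rpo: timedelta) -> Level:
    """Return the lowest level whose assumed worst-case data loss is within the RPO."""
    for level in LEVELS:
        loss = level.worst_case_data_loss
        if loss is not None and loss <= rpo:
            return level
    return LEVELS[-1]  # fallback for nonsensical (negative) RPO values


print(cheapest_level_for(timedelta(hours=24)).name)   # Disk Backup Off-Site
print(cheapest_level_for(timedelta(minutes=5)).name)  # Transaction Integrity
```

A fuller version would also check each level's recovery time against the RTO and weigh the result against budget, which is the trade-off a disaster recovery plan has to settle.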

Conclusion

In disaster recovery planning, accurately defining the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) is crucial for business resilience. These objectives set the targets for how much data a company can afford to lose and how quickly it must be back in operation after a disruption. Meeting them through an appropriate disaster recovery level requires a careful balance of cost and capability. A successful DR plan matches the organization's risk tolerance and budget, so that the level of investment is proportional to the potential risks and impacts. In essence, a well-crafted DR plan protects critical business functions while fitting the organization's financial strategy, supporting long-term stability and growth.

