Understanding Recovery Point Objective (RPO) In DR Planning

Recovery Point Objective (RPO) — Definition

Recovery Point Objective (RPO) is the maximum amount of data, measured in time, that your company can afford to lose when a data loss event occurs. It is measured backward from the moment of the event to the last point to which you can recover the data. That is to say, if you have a disaster event (power outage, ransomware, etc.), you can lose up to your set RPO’s worth of data. Any business, big or small, should calculate the loss it can tolerate between backups. In particular, you should determine how much of the data written since the last scheduled backup can be lost while you transition from disaster back to operational status.

But how do you choose recovery strategies and technology once you anticipate such catastrophes? Here, you need a Business Continuity Plan (BCP) that sets the maximum threshold for data loss during a disruption. Depending on the RPO committed, you may lose data up to that limit. For example, with an RPO of 4 hours, you can lose up to 4 hours of data, which means the most recent backup must never be more than 4 hours old.

Importance of Recovery Point Objective (RPO)

Recovery Point Objective, or RPO, is a time-based metric. It states the maximum interval that may pass between the last recoverable copy of the data and a disaster or failure. That is to say, the RPO expresses the amount of data, created before a disaster strikes, that your business can afford to lose.

RPOs also determine the specific actions that IT should take in the event of a major disruption. For your business, RPO is a crucial metric for determining the company’s loss tolerance. Imagine that backups of your critical systems run only on Friday evenings, and a system fails on the following Friday afternoon. In this case, almost the entire week’s data is lost. Even performing frequent backups doesn’t guarantee complete data recovery, because a sudden system failure, disaster, or disruptive event can strike between two backups.

RPO determines:

  • The minimum backup schedule frequency;
  • How much data can be lost after a disaster before the loss causes significant harm to the organization; and
  • How far back the IT team must go when employing recovery techniques, without delaying the recovery beyond the expected RTO (Recovery Time Objective).
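
The link between RPO and minimum backup frequency can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes evenly spaced backups that complete instantly, so the worst-case loss equals the gap between two backups.

```python
import math
from datetime import timedelta


def backups_per_day_needed(rpo: timedelta) -> int:
    """Smallest number of evenly spaced daily backups whose gap stays within the RPO.

    Simplification: for an RPO of a day or longer this returns 1,
    even though a less frequent schedule could also satisfy it.
    """
    if rpo <= timedelta(0):
        raise ValueError("RPO must be positive; a zero RPO needs continuous replication")
    return max(1, math.ceil(timedelta(days=1) / rpo))


# Example: a 4-hour RPO needs a backup at least every 4 hours.
needed = backups_per_day_needed(timedelta(hours=4))
```

Under these assumptions, a 4-hour RPO works out to six backups per day, and a 5-hour RPO rounds up to five.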

How Does Recovery Point Objective in Azure Work?

Many organizations opt for Azure Backup and Azure Site Recovery (ASR) as effective disaster recovery solutions. But if you do not establish priorities for mission-critical workloads, you might still face a loss of vital data.

Once you have identified the critical applications to back up, you must rank them, so that you know which applications have the highest priority to save and which are of the least importance. Azure Backup provides granular backups and cross-region restore. With Azure Backup, the RPO is 15 minutes for database backups and 1 day for VM backups.

RPO in Azure backup

While Azure Backup RPOs are considerably higher, Azure Site Recovery (ASR) provides much lower RPOs. ASR offers continuous replication with low replication intervals (as low as 30 seconds) for virtual machines, and it supports multiple scenarios such as disaster recovery, migration, and test environments. In fact, ASR can perform disaster recovery in minutes, because replicas stay in sync with the source and recovery tasks are automated. As a result, you can reduce the manual steps needed once the machine is brought back online.

If you schedule backups twice a day, at 8 AM and 8 PM, and disaster hits at 5 PM, you lose the 9 hours of data written since the 8 AM backup, and the worst-case loss with this schedule is 12 hours. High-priority applications require tighter RPOs and therefore more frequent backups. For example:

  • An RPO of 4 hours requires the IT team to schedule backup snapshots and to set up replication strategies (also known as near-continuous data protection, or near-CDP).
  • Near-zero RPOs will require continuous replication and failover services for mission-critical applications that need high availability of data.
  • Depending on the replication frequency, you can run planned failover for expected outages. You can also run unplanned failovers for unexpected disasters with minimal data loss.

Examples of Recovery Point Objective

Indeed, the RPO acts as an indicator of how often you should back up your data, or rather save your work. Suppose you are defining the time between data backups, and your business can survive 3 to 4 days of data loss. Then the RPO would be 3 days, and backups should happen at least once every 72 hours. Based on application priority, your IT team must agree on such acceptable RPOs to bring an application back to its pre-disaster state.

  • For businesses that use auxiliary hard drives, a 2-hour RPO is appropriate.
  • For businesses that use magnetic tape or recordable compact disks, a 5-day RPO is appropriate.
  • For CRM systems, a 1-hour RPO is ideal.
  • For email, a 2-hour RPO is appropriate.
  • For file servers, a 4-hour RPO can be agreed upon.
  • For development servers, a 24-hour RPO is appropriate.
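
Per-workload targets like these can be written down as data and checked against the backup schedule actually in place. The sketch below is illustrative only: the workload keys, the `rpo_violations` helper, and the sample schedule are hypothetical names, and it assumes the worst-case loss for a workload equals its backup interval.

```python
from datetime import timedelta
from typing import Dict, List

# RPO targets taken from the workload examples above.
RPO_TARGETS: Dict[str, timedelta] = {
    "crm": timedelta(hours=1),
    "email": timedelta(hours=2),
    "file_server": timedelta(hours=4),
    "dev_server": timedelta(hours=24),
}


def rpo_violations(backup_intervals: Dict[str, timedelta]) -> List[str]:
    """Return workloads with no backup schedule or an interval longer than their RPO."""
    violations = []
    for workload, rpo in RPO_TARGETS.items():
        interval = backup_intervals.get(workload)
        if interval is None or interval > rpo:
            violations.append(workload)
    return violations


# Hypothetical schedule: email is backed up too rarely, dev servers not at all.
schedule = {
    "crm": timedelta(minutes=30),
    "email": timedelta(hours=4),
    "file_server": timedelta(hours=4),
}
problems = rpo_violations(schedule)
```

With this sample schedule, the check flags `email` and `dev_server`.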

Delivering the above RPOs might make you think that you are covered because you have met the SLA. But imagine that the business has undergone a huge data loss or data change. In that case, you might not be able to recover the data to a point acceptable to the business. Let’s take a closer look at how to apply these concepts in different data loss scenarios:

Scenario-1:

Assume a complete failure of the primary site of a business that uses backup tape storage equipment. Backups run twice a day, at 8 AM and 8 PM. With a 12-hour gap between backups, the RPO this schedule can guarantee is 12 hours. If the primary server fails at 11 PM, the IT team restores data from the 8 PM backup, so the RPA (Recovery Point Actual) is 3 hours. The RTA (Recovery Time Actual) is the overall time it takes to restore, based on the actual performance of the Disaster Recovery (DR) solution. This includes the time taken to move production from the source to the target machine.
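
The RPA in this scenario is simply the time between the failure and the most recent backup before it. A minimal Python sketch, assuming every scheduled backup completes instantly and successfully (the function name and the example date are hypothetical):

```python
from datetime import datetime, time, timedelta
from typing import List


def recovery_point_actual(failure: datetime, backup_times: List[time]) -> timedelta:
    """Time elapsed between the failure and the latest scheduled backup before it."""
    if not backup_times:
        raise ValueError("at least one daily backup time is required")
    candidates = []
    # Look at the day of the failure and the day before it.
    for day_offset in (0, 1):
        day = failure.date() - timedelta(days=day_offset)
        for t in backup_times:
            stamp = datetime.combine(day, t)
            if stamp <= failure:
                candidates.append(stamp)
    return failure - max(candidates)


# Backups at 8 AM and 8 PM, primary server failure at 11 PM.
schedule = [time(8, 0), time(20, 0)]
rpa = recovery_point_actual(datetime(2024, 1, 10, 23, 0), schedule)
```

For the 11 PM failure, `rpa` comes out at 3 hours. A failure at 5 AM would fall back to the previous evening's 8 PM backup, giving 9 hours.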

Scenario-2:

For relational databases, many businesses use Amazon (AWS) infrastructure, which can provide close to zero RPO at the storage level and so lets you meet a near-zero RPO requirement for disaster recovery. Financial institutions and e-commerce companies, for example, replicate data across multiple databases so that it can be restored with near-zero RPO. In ordinary failures the data remains available in its original state, so RPO is not the critical concern. However, during disasters like ransomware attacks, or if the relational database becomes unavailable at any point, the RPO for recovery might be longer. That’s because you may need to wait for the Availability Zone to recover.

Scenario-3:

Achieving RPO with Continuous Data Replication:

Continuous Data Protection (CDP) is an alternative to a regular backup system that has to run backups frequently: instead of relying on scheduled jobs, it captures changes continuously.

In the event of a disaster, you can rely on the copies that CDP has been making as your backup. CDP lets you go back to a specific point in time, such as the last update before the disaster occurred. This method of data replication takes CDP snapshots in rapid succession. CDP offers an RPO of zero or near zero, so the data stays up to date and secure. The RPA value varies with the actual replication lag of the server at the moment of failure.
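
Going back to a specific point in time amounts to picking the latest recovery point at or before the moment you want to restore to. The Python sketch below shows that lookup over a sorted list of snapshot timestamps. It is not how any particular CDP product works internally; the function name and the 5-minute snapshot spacing are assumptions for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional


def latest_recovery_point(points: List[datetime], target: datetime) -> Optional[datetime]:
    """Return the newest recovery point at or before `target`, or None if there is none.

    `points` must be sorted in ascending order.
    """
    idx = bisect_right(points, target)
    return points[idx - 1] if idx else None


# Hypothetical journal: one recovery point every 5 minutes starting at 09:00.
start = datetime(2024, 1, 10, 9, 0)
journal = [start + timedelta(minutes=5 * i) for i in range(12)]

# Ransomware detected at 09:23, so restore to the last clean point before it.
restore_to = latest_recovery_point(journal, datetime(2024, 1, 10, 9, 23))
```

Here `restore_to` is the 09:20 recovery point, so the data lost in this example is 3 minutes.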

For enterprise backup solutions that offer granular restoration points, mission-critical data or deleted files can be recovered within an RTA (Recovery Time Actual) of several minutes as the IT team backs up the data continuously.

How to Calculate Recovery Point Objective?

An RPO is a business decision that dictates the BC/DR strategy. You must define how much data you are willing to lose if the production system becomes unavailable. Apart from downtime and money, you also need to consider the impact on the company’s reputation. You must also calculate how long your business can afford to stay offline before it risks customer trust and partner relationships. The IT team needs to match the RPO based on:

  • The frequency of backups, i.e., how up to date the data in the most recent backup is; and
  • The criticality of the various types of workloads, measured against their respective SLAs, after which the IT team can rate the RTO and RPO requirements.

Factors to Be Considered While Calculating RPO

You can take into account the following factors while calculating RPO:

  • Mitigation cost
  • Maximum tolerable amount of data loss the business can handle
  • Cost of lost data and unavailable operations
  • Software implementation cost for recovery solutions
  • Criticality of systems and data for users
  • Regulatory compliance schemes for disaster recovery, data availability, and data loss
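
One rough way to turn the cost factors above into a number is to divide the largest loss the business says it can absorb by the estimated cost of each hour of lost data. The sketch below assumes a simple linear cost model, which real losses rarely follow exactly. The function name and the dollar figures are hypothetical.

```python
from datetime import timedelta


def rpo_from_loss_budget(max_tolerable_loss: float, loss_per_hour: float) -> timedelta:
    """Longest data-loss window whose estimated cost stays within the budget.

    Assumes the cost of lost data grows linearly with the hours lost.
    """
    if loss_per_hour <= 0:
        raise ValueError("loss_per_hour must be positive")
    if max_tolerable_loss < 0:
        raise ValueError("max_tolerable_loss cannot be negative")
    return timedelta(hours=max_tolerable_loss / loss_per_hour)


# Hypothetical figures: the business can absorb 20,000 in losses,
# and each hour of lost data is estimated to cost 5,000.
rpo = rpo_from_loss_budget(20_000, 5_000)
```

With these figures the RPO comes out at 4 hours. The result is a starting point for discussion; the remaining factors, such as compliance and mitigation cost, may push it tighter or looser.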

RPO and RTO are the keys to designing your DR strategy and achieving business continuity. The tighter the RPOs for high-priority applications, the higher the storage costs and the more rigorous the backups required. Several variables are involved in defining the actual RPO. For example, the RPO of replication varies with the type of replication: synchronous or asynchronous. Whatever the variables, it’s critical to establish the RPO and to align your backups and DR plan to the committed RPO.

RPO Sample Intervals

The following sample tiers are ones you can use while setting RPO for your business unit:

  • Up to 1 hour: This tier includes mission-critical business operations that cannot afford to lose more than an hour of data. Such data is difficult to recreate because of its dynamic nature, the number of variables involved, and its high volume. E.g., banking transactions, CRM systems, patient records, etc.
  • Up to 4 hours: This tier represents semi-critical business units that can afford data loss of up to 4 hours. E.g., file servers, customer chat logs.
  • Up to 12 hours: This tier includes businesses that cannot afford to lose more than 12 hours’ worth of data. E.g., sales and marketing data.
  • Up to 24 hours: Businesses that deal with semi-critical information and require an RPO of no more than 24 hours fall under this tier. E.g., human resources (HR) data and purchasing departments.
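
The sample tiers can also be expressed as a small lookup that maps a tolerable data-loss window to a tier. In the sketch below, the tier numbers 1 to 4 and the function name are hypothetical labels. Only the 1, 4, 12, and 24 hour limits come from the tiers described above.

```python
from bisect import bisect_left
from datetime import timedelta

# Upper limits of the four sample tiers, in ascending order.
TIER_LIMITS = [timedelta(hours=h) for h in (1, 4, 12, 24)]


def rpo_tier(tolerable_loss: timedelta) -> int:
    """Return the tier (1 = most critical) whose limit covers the tolerable loss."""
    idx = bisect_left(TIER_LIMITS, tolerable_loss)
    if idx == len(TIER_LIMITS):
        raise ValueError("tolerable loss exceeds the 24-hour limit of the last tier")
    return idx + 1


# A workload that tolerates 3 hours of loss lands in the 4-hour tier.
tier = rpo_tier(timedelta(hours=3))
```

A workload that tolerates exactly 4 hours still lands in the second tier, because each limit is inclusive.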

Achieving RPO: The Zmanda Way

The most critical variables in the event of a disaster are your RPO, RTO, and RTA (Recovery Time Actual), because these values let you prepare for any shortcoming by performing backups at the right intervals. However, if there’s a significant gap between your goals and reality, specifically between the calculated RTO and the RTA values, you will need to re-align your Business Continuity/Disaster Recovery (BC/DR) strategy. A solid DR plan includes the steps to fulfill its two main objectives, the RPO and the RTO, while keeping recovery time low. With seamless hybrid cloud setups, Zmanda’s disaster recovery solution strives to answer this problem.
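
Comparing objectives with the values measured in a DR test is a straightforward check. The Python sketch below is a generic illustration and not part of any Zmanda product. The class and field names are hypothetical, and the sample durations are made up.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict


@dataclass
class DrTestResult:
    rpo: timedelta  # Recovery Point Objective (target)
    rpa: timedelta  # Recovery Point Actual (measured)
    rto: timedelta  # Recovery Time Objective (target)
    rta: timedelta  # Recovery Time Actual (measured)

    def gaps(self) -> Dict[str, timedelta]:
        """Return by how much each measured value overshoots its objective."""
        overshoot = {}
        if self.rpa > self.rpo:
            overshoot["rpo"] = self.rpa - self.rpo
        if self.rta > self.rto:
            overshoot["rto"] = self.rta - self.rto
        return overshoot


# Hypothetical DR test: data loss was within target, but the restore took too long.
result = DrTestResult(
    rpo=timedelta(hours=4), rpa=timedelta(hours=3),
    rto=timedelta(hours=2), rta=timedelta(hours=3),
)
gaps = result.gaps()
```

In this example only the RTO is missed, by one hour, which is the kind of gap that signals the DR strategy needs re-aligning.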

Through high-availability native code architectures, the Zmanda backup engine accelerates recovery time by eliminating the delay for an outage notification, so enterprises can achieve aggressive RPOs of near zero for VMware instances. Also, with layered security, should an unrecoverable outage occur, you can retain copies of the database across a host of mediums (remote locations or the cloud, long-term tapes, on-prem NAS/SAN for backup and archival) at any point in time. This gives Zmanda customers enhanced flexibility to run backups frequently for reliable long-term storage and backup.

To learn how we can help you determine practical recovery objectives and achieve business continuity, contact us for more information or simply visit our DR solution page.