Recovery Point Objective (RPO) - Definition
Recovery Point Objective (RPO) is the maximum amount of data, measured in time, that an organization can afford to lose when a disaster, failure, or other unplanned data loss event occurs. In other words, the RPO specifies the maximum acceptable age of the data that can be recovered: the time between the last point to which data can be successfully restored and the moment of the disaster.
For a successful RPO strategy, any business, large or small, should calculate how much data loss it can tolerate between backups. It is equally important to determine how long the transition from disaster back to operational status may take, given the scheduled data backups. But how do you make sure the right recovery strategies and technology are in place before a catastrophe strikes? This is where a sound Business Continuity Plan (BCP) comes in: it sets the maximum allowable thresholds for the RPO (how much data may be lost) and for the RTO, or Recovery Time Objective (how long recovery may take during the disruption).
Whatever RPO you commit to, you can lose data up to that limit. E.g., with an RPO of 4 hours, you can lose up to 4 hours of data, because the most recent backup may have been taken as long as 4 hours before the failure.
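The relationship between a backup schedule and a committed RPO can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: it assumes evenly spaced backups that complete instantly, and the function names `worst_case_data_loss` and `schedule_meets_rpo` are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Worst-case loss for evenly spaced backups.

    Assumption: backups complete instantly, so a failure just before
    the next backup loses (almost) one full interval of data.
    """
    return backup_interval


def schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case loss of the schedule stays within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


rpo = timedelta(hours=4)
print(schedule_meets_rpo(timedelta(hours=4), rpo))   # backups every 4 hours
print(schedule_meets_rpo(timedelta(hours=12), rpo))  # backups every 12 hours
```

Under these assumptions, a 4-hour RPO is met by backups every 4 hours or less, while a 12-hour interval would violate it.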
What is Recovery Point Objective (RPO)?
Recovery Point Objective, or RPO, is a time-based metric: the maximum interval, counted back from a disaster or failure, over which a business can accept losing data. Also known as the backup recovery point objective, the RPO determines the point in time to which users must be able to regain access to their data after a major disruption. It represents the maximum amount of data that can be lost at the point of failure before the enterprise suffers significant damage. The RPO expresses the company’s loss tolerance: as long as backups stay within the RPO set in the Business Continuity Plan, losses remain acceptable; if they do not, the losses can be catastrophic.
The disaster recovery point objective matters because the achievable RPO puts a limit on the data loss to expect during a disaster. Even constant backups don’t guarantee full data recovery, because a sudden system failure, disaster, or other disruptive event can strike between backups. The RPO helps answer the following questions:
What is the minimum backup schedule frequency;
How much data can be lost after a disaster before it causes significant harm to the organization; and
How far back the IT admin team must be able to go, using suitable recovery technologies, without delaying recovery beyond the expected RTO.
How Does Recovery Point Objective in Azure Work?
After you have identified which critical applications to back up, rank the data and applications by priority, from those most important to save to those of least importance. Azure Backup offers granular backups and cross-region restore; its RPO can be as low as 15 minutes for database backups and 1 day for VM backups. While Azure Backup RPOs are comparatively high, Azure Site Recovery (ASR) replication provides much lower RPOs.
Azure Site Recovery replicates virtual machines continuously at short intervals (as low as 30 seconds) and supports multiple scenarios such as disaster recovery, migration, and test environments. Because replicas stay in sync with the source and recovery tasks are automated, ASR can complete disaster recovery in minutes, with fewer manual steps once the machine is brought back online. By comparison, if you back up twice a day at 8 AM and 8 PM and disaster hits at 5 PM, you lose 9 hours of data, that is, everything written since the 8 AM backup. Frequent replication shrinks that window drastically.
High-priority applications require tighter RPOs and therefore more frequent backups. For example, an RPO of 4 hours requires the IT department to schedule backup snapshots and set up replication strategies (an approach also known as near-CDP, or near-continuous data protection). Near-zero RPOs require continuous replication and failover services for mission-critical applications that need high availability of data. Depending on the replication frequency, you can run a planned failover for expected outages or an unplanned failover for unexpected disasters, with minimal data loss.
Recovery Point Objective Examples
The RPO works as an indicator of how often you must back up your data or, put differently, save your work. If you have determined that your business can survive 3 to 4 days of data loss, set the RPO to the stricter bound of 3 days. With an RPO of 3 days, backups must happen at least once every 72 hours. Based on application priority, your IT team should establish different RPOs for bringing each application back to its pre-disaster state.
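To check an existing backup history against such an RPO, one option is to look for gaps between consecutive backups that exceed it. The following is a minimal sketch; the function name `find_rpo_gaps` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def find_rpo_gaps(backup_times, rpo):
    """Return (earlier, later) pairs of consecutive backups that are
    spaced further apart than the RPO allows."""
    ordered = sorted(backup_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > rpo]


rpo = timedelta(days=3)  # backups at least once every 72 hours
backups = [  # hypothetical backup completion times
    datetime(2024, 1, 1),
    datetime(2024, 1, 3),
    datetime(2024, 1, 8),
    datetime(2024, 1, 10),
]

for earlier, later in find_rpo_gaps(backups, rpo):
    print(f"RPO exceeded between {earlier:%Y-%m-%d} and {later:%Y-%m-%d}")
```

With this sample data, only the five-day gap between 3 January and 8 January is reported, since every other pair of backups is at most two days apart.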
Some of the pertinent examples of RPO applications are listed below:
For businesses that use auxiliary hard drives, a 2-hour RPO is appropriate.
For businesses that use magnetic tape or recordable compact disks, a 5-day RPO is appropriate.
Let’s take a closer look at some of the examples of RPO and how to apply these concepts.
Consider the complete failure of the primary site of a business that uses tape backup storage and backs up twice a day, at 8 AM and 8 PM. With this schedule, up to 12 hours of data can be lost in the worst case. A primary server failure at 11 PM would require the IT team to restore from the 8 PM backup, the most recent one, which gives an RPA (Recovery Point Actual) of 3 hours. The RTA (Recovery Time Actual) is the overall time the restore takes, based on the actual performance of the Disaster Recovery (DR) solution in moving production from the source to the target machine.
With the Continuous Data Protection (CDP) technique, data is copied continuously, which lets you roll back to the specific point in time of the last update before the disaster or corruption occurred. This form of data replication, or CDP snapshots, offers an RPO close to zero, which keeps the recoverable data as up to date and secure as possible. The RPA varies with the actual replication lag of the server in real time.
For enterprise backup solutions that offer granular restoration points, mission-critical data or deleted files can be recovered with an RTA of several minutes, because the IT team can back up data continuously.
For relational databases, many businesses use Amazon infrastructure that provides close to zero RPO at the storage level. Using AWS for disaster recovery can therefore help meet a recovery point objective (RPO) of zero. For example, financial institutions or e-commerce companies that restore data from multiple databases get a near-zero RPO at the storage level. Because the data remains available in its original state, the RTO is less critical in this case. However, in scenarios such as ransomware attacks, or if the relational database is down at any point, the achievable RPO might be longer, as you need to wait for the Availability Zone to recover.
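The RPA for a failure at a given time can be worked out from the daily backup schedule. The sketch below assumes that every scheduled backup succeeds and completes instantly, and that the schedule is non-empty; the helper names `last_backup_before` and `recovery_point_actual` and the example date are illustrative only.

```python
from datetime import datetime, time, timedelta


def last_backup_before(failure: datetime, daily_backup_times: list[time]) -> datetime:
    """Most recent scheduled backup at or before the failure.

    Assumes a non-empty daily schedule; checks today and yesterday so
    that failures before the first backup of the day are handled too.
    """
    candidates = []
    for day_offset in (0, 1):
        day = (failure - timedelta(days=day_offset)).date()
        for t in daily_backup_times:
            candidates.append(datetime.combine(day, t))
    return max(c for c in candidates if c <= failure)


def recovery_point_actual(failure: datetime, daily_backup_times: list[time]) -> timedelta:
    """Data lost (as a time span) between the last backup and the failure."""
    return failure - last_backup_before(failure, daily_backup_times)


schedule = [time(8, 0), time(20, 0)]     # backups at 8 AM and 8 PM
failure = datetime(2024, 1, 15, 23, 0)   # hypothetical failure at 11 PM
print(recovery_point_actual(failure, schedule))  # 3:00:00
```

For the 11 PM failure, the last backup is the 8 PM one, so three hours of data are lost. A failure at 7 AM would instead fall back to the previous day's 8 PM backup and lose eleven hours.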
How to Calculate Recovery Point Objective?
An RPO is a business decision that drives the BC/DR strategy: how much data are you willing to lose if the production system becomes unavailable? Apart from downtime and money, you also need to consider the company’s reputation, that is, how long your business can afford to be offline before it puts customer trust and partner relationships at risk. The IT team needs to match the RPO to:
The frequency of backups, i.e., how recent the data in the latest backup is;
The criticality of the various types of workloads, measured against their respective SLAs. With that in place, the IT team can rate the RTO and RPO requirements.
Factors to Be Considered While Calculating RPO
Maximum tolerable amount of data loss the business can handle
Cost of lost data and unavailable operations
Software implementation cost for recovery solutions
Criticality of systems and data for users
Regulatory compliance schemes for disaster recovery, data availability, and data loss
RPO and RTO are key to designing your DR strategy and achieving business continuity. The tighter the RPOs for high-priority applications, the higher the storage costs and the more rigorous the backups required. Several variables are involved in defining the actual RPO.
E.g., the RPO of a replication solution varies with the type of replication, synchronous or asynchronous, each of which has specific distance limitations. It's critical to understand these differences and plan for them. To make the RPO work in practice, align your backups and DR plan with the committed RPO.
RPO Sample Intervals
The following table suggests sample tiers that you can use when setting RPOs for your business units:
| Tier | Maximum data loss | Description and examples |
|---|---|---|
| 1 | Up to 1 hour | Mission-critical business operations that cannot afford to lose more than an hour of data. Such data is difficult to recreate because of its dynamic nature, the number of variables involved, and its high volume. E.g., banking transactions, CRM systems, patient records. |
| 2 | Up to 4 hours | Semi-critical business units that can afford data loss of up to 4 hours. E.g., file servers, customer chat logs. |
| 3 | Up to 12 hours | Businesses that cannot afford to lose more than 12 hours' worth of data. E.g., sales and marketing data. |
| 4 | Up to 24 hours | Businesses that deal with semi-critical information and require an RPO of no more than 24 hours. E.g., human resources (HR) data and purchasing departments. |
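As a rough illustration, the four sample tiers above (up to 1, 4, 12, and 24 hours of tolerable data loss) can be encoded as a small lookup. The tier numbers 1 to 4 and the function name `rpo_tier` are assumptions made for this sketch.

```python
from datetime import timedelta
from typing import Optional

# (tier number, maximum tolerable data loss); the numbering is illustrative.
TIERS = [
    (1, timedelta(hours=1)),   # mission-critical, e.g. banking transactions
    (2, timedelta(hours=4)),   # semi-critical, e.g. file servers
    (3, timedelta(hours=12)),  # e.g. sales and marketing data
    (4, timedelta(hours=24)),  # e.g. HR and purchasing data
]


def rpo_tier(tolerable_loss: timedelta) -> Optional[int]:
    """Return the strictest tier whose limit covers the tolerable loss,
    or None if the tolerance exceeds every sample tier."""
    for tier, limit in TIERS:
        if tolerable_loss <= limit:
            return tier
    return None


print(rpo_tier(timedelta(minutes=30)))  # 1
print(rpo_tier(timedelta(hours=6)))     # 3
print(rpo_tier(timedelta(days=2)))      # None
```

A workload that tolerates more than 24 hours of loss falls outside the sample tiers, so the function returns `None` rather than guessing a tier.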
Achieving RPO: The Zmanda Way
The most critical variables in the event of a disaster are your RPO, RTO, and RTA (Recovery Time Actual), which help you prepare for any shortcoming by performing backups at the right intervals. If there is a significant gap between your goals and reality, specifically between the calculated RTO and the RTA, you will need to realign your Business Continuity/Disaster Recovery (BC/DR) strategy. A solid DR plan details, step by step, what you are going to do to fulfill two objectives, the RPO and the RTO, while keeping your recovery time to a minimum. With seamless hybrid cloud setups, Zmanda’s reliable all-in-one business continuity and cloud-native disaster recovery solution strives to answer this problem.
Through its high-availability native code architecture, the Zmanda backup engine accelerates recovery by eliminating the delay of waiting for an outage notification, so enterprises can achieve aggressive, near-zero RPOs for VMware instances. In addition, with layered security, should an unrecoverable outage occur, you can retain copies of the database across a host of media (remote locations or the cloud, long-term tapes, and on-prem NAS/SAN for backup and archival) at any point in time. This gives Zmanda customers the flexibility to run backups frequently for reliable long-term storage.