RPO vs. RTO: Key differences explained with examples, tips

The recovery point objective and recovery time objective enable an organization to know how much data it can lose and how long it can be down, key elements of a backup and DR plan.

Achieving the best results when it comes to data backup and recovery involves the use of two important metrics: recovery time objective and recovery point objective. Both metrics are essential when developing data backup and recovery plans, as well as traditional business continuity and technology disaster recovery plans.

It's important to examine each of these metrics, their role in the areas identified above, how to compute them and their cost implications and how to build them into a variety of resilience plans.

What is RTO?

Recovery time objectives (RTOs) specify the amount of time from the occurrence of a disruptive event to when the affected resource(s) must be fully operational and ready to support the organization's objectives. Figure 1 depicts the RTO metric.

When a resource is disrupted, several actions might be needed, e.g., replacing damaged components, reprogramming and testing, before the resource can be placed back in service and business as usual (BAU) can return. An inverse relationship exists between the time for recovery and the cost needed to support recovery. Specifically, the shorter the RTO, the higher the cost of recovery, and vice versa. Therefore, it's very important to have business unit leaders involved when determining RTO values. They might want a 30-minute recovery, for example, as the target time, but the cost to achieve that goal might be prohibitive.

Figure 1. RTO timeline. The RTO can be measured in seconds, minutes, hours or days.
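
As a minimal sketch of the measurement the RTO describes, the following Python computes the time elapsed between a disruption and the return to service and compares it with an RTO target. The function name, the timestamps and the 30-minute and one-hour targets are hypothetical values chosen for illustration, not part of any standard tool.

```python
from datetime import datetime, timedelta


def rto_met(disrupted_at: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """Return True if the resource was back in service within the RTO."""
    if restored_at < disrupted_at:
        raise ValueError("restored_at must not precede disrupted_at")
    downtime = restored_at - disrupted_at
    return downtime <= rto


# Hypothetical incident: outage at 09:00, service restored at 09:45.
disrupted = datetime(2024, 1, 15, 9, 0)
restored = datetime(2024, 1, 15, 9, 45)

print(rto_met(disrupted, restored, timedelta(minutes=30)))  # False: 45 minutes of downtime
print(rto_met(disrupted, restored, timedelta(hours=1)))     # True
```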

What is RPO?

Recovery point objective (RPO) is the maximum acceptable age of the data that can be recovered after a failure; in other words, how much recent data the organization can afford to lose, expressed as time. It is especially important when it comes to data backup and recovery activities. Organizations that conduct many transactions over the course of a day, such as banks or credit card firms, will probably need backups to occur more frequently, almost in real time, so that the most current critical data is available for future transactions. The data must not age much between backups. For example, RPOs with very low values, such as less than one minute, might need continuous replication of critical files, databases and systems. Figure 2 depicts the RPO and its relationship to the RTO.

Again, we see an inverse relationship between the RPO value and the cost to achieve it. A very short RPO, for example, 10 to 30 seconds, means that data must be backed up very frequently, necessitating the use of high-speed backup technologies such as data mirroring or continuous replication, especially if backups are stored off site in a cloud or other arrangement. Add to that the network bandwidth needed to transmit large quantities of data, and the cost can be significant to achieve the required data availability.

Figure 2. RPO timeline. The RPO is expressed backward in time from the instant at which the failure occurs.
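
Because the RPO is measured backward from the failure, it can be checked with simple time arithmetic. The sketch below is illustrative only: the function names and timestamps are hypothetical, and the schedule check makes the simplifying assumption that a backup completes instantly, so the worst case for a periodic backup is a failure just before the next run.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup_at: datetime, failed_at: datetime) -> timedelta:
    """Age of the newest recoverable data at the instant of failure."""
    if last_backup_at > failed_at:
        raise ValueError("last backup cannot be later than the failure")
    return failed_at - last_backup_at


def schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Periodic backups can lose up to one full interval of data in the
    worst case (assumption: backup duration is ignored)."""
    return backup_interval <= rpo


# Hypothetical: last good backup at 08:00, failure at 08:50.
loss = data_loss_window(datetime(2024, 1, 15, 8, 0), datetime(2024, 1, 15, 8, 50))
print(loss)                           # 0:50:00
print(loss <= timedelta(minutes=30))  # False: a 30-minute RPO was missed

# Hourly backups cannot satisfy a 15-minute RPO.
print(schedule_meets_rpo(timedelta(hours=1), timedelta(minutes=15)))  # False
```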

RTO vs RPO: Similarities and differences

Both metrics are important elements used in data backup and data recovery plans. Ideally, both should be key backup and recovery features to ensure that critical data and systems are available when needed, especially in the aftermath of a disruptive event. Table 1 provides additional details on the two terms in the context of a post-disaster scenario:

Situation	Planned RPO	Actual RPO	Planned RTO	Actual RTO	Analysis
Mission-critical application	0.5 hr	1.5 hrs	0.5 hr	2.0 hrs	Application backup resources were insufficient; technology couldn't be recovered quickly enough
Critical database	0.25 hr	2.0 hrs	0.25 hr	2.0 hrs	Application backup resources were insufficient; technology couldn't be recovered quickly enough
Critical network switch	NA	NA	0.5 hr	2.0 hrs	Technology couldn't be recovered quickly enough
HVAC system and associated application	0.25 hr	2.0 hrs	0.25 hr	2.5 hrs	HVAC system backup resources were insufficient; HVAC system couldn't be recovered quickly enough

Table 1

In this example, both business-critical applications and databases were disrupted by the event. RPOs and RTOs were fairly aggressive for each asset; the outcomes showed that the assets weren't as well protected as anticipated. The duration of time needed for recovery indicates the need for:

reconfiguration of storage resources and backup platforms for application priorities;
spare parts that can be used as part of the recovery process; and
greater focus on critical infrastructure and environmental systems and efforts to maintain business operations.
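
The gaps in Table 1 can be made explicit by subtracting each planned value from the actual one. The following Python is a minimal sketch of that arithmetic; the `Outcome` class and its method names are hypothetical, while the figures are copied from Table 1 (in hours, with `None` where the table shows NA).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    asset: str
    planned_rpo: Optional[float]  # hours; None where Table 1 shows NA
    actual_rpo: Optional[float]
    planned_rto: float
    actual_rto: float

    def rpo_overrun(self) -> Optional[float]:
        """Hours by which the actual RPO exceeded the plan, if applicable."""
        if self.planned_rpo is None or self.actual_rpo is None:
            return None
        return self.actual_rpo - self.planned_rpo

    def rto_overrun(self) -> float:
        """Hours by which the actual RTO exceeded the plan."""
        return self.actual_rto - self.planned_rto


# Values copied from Table 1, in hours.
outcomes = [
    Outcome("Mission-critical application", 0.5, 1.5, 0.5, 2.0),
    Outcome("Critical database", 0.25, 2.0, 0.25, 2.0),
    Outcome("Critical network switch", None, None, 0.5, 2.0),
    Outcome("HVAC system and associated application", 0.25, 2.0, 0.25, 2.5),
]

for o in outcomes:
    rpo = o.rpo_overrun()
    rpo_text = "n/a" if rpo is None else f"{rpo:+.2f} h"
    print(f"{o.asset}: RPO {rpo_text}, RTO {o.rto_overrun():+.2f} h")
```

Every asset missed its RTO by at least 1.5 hours, which is the shortfall the list above responds to.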

Aside from their shared use in business continuity plans and technology disaster recovery plans, the two metrics are quite different in practice. The RTO looks forward from a disruptive event: it sets how long recovery may take. The RPO looks backward from the event: it sets how old the most recent recoverable data may be, so it must be satisfied by backups made before the event occurs. However, when the two are linked, a short RTO usually requires an equally short RPO (see Table 1), particularly when data protection is the requirement. If the disaster recovery strategy addresses the backup and recovery of systems only, as with the critical network switch in Table 1, an RTO value might be sufficient to determine how recovery will take place. However, if the system to be recovered also processes critical data, as with the critical database in Table 1, then both metrics should be synchronized.

Computing RPO and RTO

A business impact analysis (BIA) is designed to identify relevant RTO and RPO values. Risk analyses can also provide valuable input to assigning values to these metrics. BIAs identify mission-critical business processes and identify the technologies, people and facilities needed to ensure BAU. They might also identify the financial implications -- such as loss of revenue or imposition of fines -- caused by the disruption.

Based on input from business unit leaders and senior management, numeric values are defined that represent the best-case scenarios for recovering from disruptions from a business perspective. No mathematical formulae exist to compute RTO/RPO values; they are simply time values agreed on with the business. For example, an RTO for a fairly critical server might be one hour, whereas the RPO for less-than-critical data transaction files might be 24 hours, which might also allow the use of backup tape storage equipment.

As mentioned earlier, as RTO/RPO numeric values decrease, costs to achieve those metrics are likely to increase. The only way to determine the true cost is to first identify the desired RTO/RPO values, then conduct research to determine what is needed to achieve the metric if a disruption occurs. It might then be necessary to advise business unit leaders and senior management of the added investment.

This is where potential conflicts might occur, because if management doesn't want to spend additional funds to achieve the desired metrics they specified, they must understand that such resistance might incur additional risk if a disruptive event occurs. Ideally, management must be made aware of the potential financial issues and other implications from an event, such as damage to reputation, before they decide.

Tips for achieving RPOs and RTOs

Based on the results of risk analysis and BIA, IT administrators should have a good idea of the kinds of events that could threaten the IT infrastructure. The analyses might provide ratings for the frequency of occurrence, likelihood of occurrence and effects on the organization (e.g., operational and financial). They might also identify vulnerabilities (e.g., low frequency of backup for certain applications) and potential threats (e.g., power outages caused by nearby construction activity).

Once these risk-based issues have been identified and quantified, IT administrators can map them to the infrastructure assets they affect and, from that assessment, identify measures that can help reduce the threats or mitigate their severity if they occur. These analyses can then be translated into RPO and RTO values that should be reviewed and approved by business unit management as well as senior management. Assuming the risks have been accepted, IT can then identify actions to take (e.g., more data storage, more network bandwidth, more frequent reviews of system performance) in the course of establishing realistic RPO and RTO values.

Building RTO/RPO into data backup and recovery plans

The inclusion of RTO/RPO metrics in data backup, data recovery and other resilience -- e.g., BCDR -- plans is essential, and helps ensure that the procedures, personnel and technology resources used to achieve the metrics are appropriate. RTO/RPO values can be included in plans for reference and as an indication of where the recovery bar has been set.

For data backup and recovery, these metrics are essential for planning, as they help determine the optimum data backup and technology configuration to achieve the goals. They are also important from compliance and audit perspectives, for example, as auditors might look for evidence of these values as key data backup/recovery controls.