The Service Disruption Continuum
A disruptive event doesn't have to be a major disaster to wipe out your business. It can be anything from a relatively minor failure, such as a malfunctioning network card, to a devastating event, such as a sudden regional disaster that not only destroys your data center but also shuts down surrounding roads, bridges, and other infrastructure.
When businesses take adequate protective measures, they can survive even major disasters. But without protective measures, a business can be wiped out by something as trivial as a coffee spill.
A disruptive event might cause a business interruption depending on the importance of disrupted services and resilience of underlying systems. A disruptive event might also result in data loss/corruption depending on the pervasiveness of the event and the data protection measures in force.
Many companies, when planning for disruptive events, tend to classify those situations into either a high-availability problem domain or a disaster-recovery problem domain. But many disruptive events don't clearly fall into one category or the other.
Instead, a more sensible way to deal with disruptive events is to ignore these distinctions and simply deal with a continuum of disruptive events as they apply to your business. This model, the service disruption continuum, addresses the issues related to preserving or restoring service both during and after a disruptive event.
The continuum places disruptive events at various points in the same matrix. This change of perspective allows an organization to analyze risks, investigate approaches and implement solutions without being bound by misleading domain boundaries. Here we take a closer look at the service disruption continuum, and how it can help prepare organizations to handle whatever disruptive event may come.
The events on either end of the continuum have the greatest impact on a business. For example, on the right, an earthquake or hurricane could shut down a data center. But on the left, you will see that an operational mishap might corrupt a database, with even worse consequences than a natural disaster if protective measures are not in place.
High-Availability vs. Disaster Recovery
High-availability objectives are commonly specified as percent uptime. An objective of 99.99 percent uptime allows roughly 53 minutes of downtime per year. You could meet it with five 10-minute outages per year, one 53-minute outage every year, or one 4-day outage every 108 years. Similarly, you might recover from 100 outages in five minutes each, or recover from 99 outages in one minute each and one outage in 6.7 hours, and arrive at the same average recovery time of about five minutes. Averages, however, mean very little when severe consequences are at stake. In the second case, the only outage of any consequence is the one that requires 6.7 hours for recovery.
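The arithmetic behind these figures can be checked with a short Python sketch. The function and variable names below are illustrative assumptions, not part of any standard tool, and the calculation assumes a 365-day year.

```python
# Illustrative sketch: all names here are assumptions made for this example.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(uptime_percent: float, years: float = 1.0) -> float:
    """Minutes of downtime allowed by an uptime objective over a period."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR * years


def summarize(recovery_minutes: list) -> dict:
    """Return the mean and the worst-case recovery time for a set of outages."""
    return {
        "mean": sum(recovery_minutes) / len(recovery_minutes),
        "worst": max(recovery_minutes),
    }


# 99.99 percent uptime leaves a budget of about 52.56 minutes per year.
budget_per_year = downtime_budget_minutes(99.99)

# Two recovery profiles with nearly the same average recovery time.
uniform = [5.0] * 100              # 100 outages, five minutes each
skewed = [1.0] * 99 + [6.7 * 60]   # 99 one-minute outages, one 6.7-hour outage

uniform_summary = summarize(uniform)
skewed_summary = summarize(skewed)

print(f"Annual downtime budget: {budget_per_year:.2f} minutes")
print(f"Uniform profile: {uniform_summary}")
print(f"Skewed profile:  {skewed_summary}")
```

Both profiles average about five minutes per outage, but only the worst-case figure reveals the 6.7-hour outage. That is why an average alone is a poor objective.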
By contrast, there are two metrics for disaster recovery: recovery time objective (RTO) and recovery point objective (RPO).
RTO is the elapsed time from service interruption until service is restored. It answers the question: "How long can you be without service?" RTO represents a time limit that you cannot exceed or you will face severe consequences. A unified high-availability and disaster-recovery approach would establish both an uptime objective and an RTO for each service.
RPO, on the other hand, is the point in time represented by the data upon service resumption. It answers the question: "How old can the data be?" We interpret RPO differently for real-time and transactional processes. With real-time processes, the world does not stand still; data has no value after a very short time. Conversely, transactional processes usually deal with information that has been committed at some known point in time, where the value of data remains relatively stable long after it is committed. Planning for high availability assumes no loss of committed database transactions, although loss of data recently written to application files is commonly acceptable. In other words, the high-availability RPO for databases is assumed to be zero. By contrast, in disaster recovery planning, RPOs for databases and application files are explicit. A unified objective would ignore both high-availability and disaster recovery boundaries and establish an RPO for all services.
A service with a high RTO might have a zero RPO. For example, the loss of financial data which must be reported to the government is unacceptable, although resuming access to that data might take days. Similarly, the RPO for a service could be greater than the RTO of the service. The RTO for a given service might be a few minutes, but the RPO might allow restoration to a two-day-old image.
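To make the two metrics concrete, the following Python sketch checks a recovery against an RTO and an RPO. It is a minimal illustration under stated assumptions: the class and function names are hypothetical, the example objectives and timestamps are invented, and data age is measured from the moment of interruption back to the point in time the restored data represents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """Hypothetical container for the objectives of one service."""
    rto: timedelta  # longest tolerable time without service
    rpo: timedelta  # oldest tolerable data upon resumption


@dataclass(frozen=True)
class RecoveryResult:
    recovery_time: timedelta
    data_age: timedelta
    met_rto: bool
    met_rpo: bool


def evaluate_recovery(objective, interrupted_at, restored_at, data_as_of):
    """Compare an actual recovery with the service's RTO and RPO."""
    recovery_time = restored_at - interrupted_at
    # Assumption: data age is counted back from the time of interruption.
    data_age = interrupted_at - data_as_of
    return RecoveryResult(
        recovery_time=recovery_time,
        data_age=data_age,
        met_rto=recovery_time <= objective.rto,
        met_rpo=data_age <= objective.rpo,
    )


interrupted = datetime(2024, 1, 10, 9, 0)  # invented timestamp

# High RTO, zero RPO: access may take days, but no data may be lost.
financial_reporting = RecoveryObjective(rto=timedelta(days=3), rpo=timedelta(0))
financial_result = evaluate_recovery(
    financial_reporting,
    interrupted_at=interrupted,
    restored_at=interrupted + timedelta(days=2),
    data_as_of=interrupted,
)

# RPO greater than RTO: back within minutes, on a day-old image.
quick_restart = RecoveryObjective(rto=timedelta(minutes=10), rpo=timedelta(days=2))
quick_result = evaluate_recovery(
    quick_restart,
    interrupted_at=interrupted,
    restored_at=interrupted + timedelta(minutes=8),
    data_as_of=interrupted - timedelta(days=1),
)

print(financial_result)
print(quick_result)
```

Both examples meet their objectives even though their RTO and RPO values point in opposite directions, which is why the two dimensions have to be set independently.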
Recovery-time criticality is not the same as importance, however. Applications with high RTO and RPO values are not necessarily unimportant; many strategically important applications have high RTO/RPO values.
A high-availability objective has a single dimension for expressing average recovery time (RPO is assumed to be zero for databases and undefined for application files). A disaster recovery objective has two dimensions, expressing recovery time and recovery point. But in a service disruption continuum, recovery objectives for all services have both RTO and RPO dimensions. Figure 3 illustrates a 4x4 recovery-class matrix for classifying the applications that make up a service and mapping those applications to solution approaches.
Analysis of your business processes will determine the coordinates of your matrix and how you classify applications. For any given coordinate, there are a small number of solutions that will satisfy the corresponding RTO/RPO values.
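One way to picture the classification step is a lookup that maps an application's RTO and RPO to a cell in a 4x4 matrix. The Python sketch below is illustrative only: the band boundaries are invented placeholders, not values from the article's matrix, and your own business analysis would supply the real ones.

```python
from bisect import bisect_left
from datetime import timedelta

# Hypothetical upper bounds for the first three bands on each axis.
# Anything above the last bound falls into the fourth band.
RTO_BOUNDS = [timedelta(minutes=5), timedelta(hours=1), timedelta(hours=24)]
RPO_BOUNDS = [timedelta(0), timedelta(minutes=15), timedelta(hours=24)]


def band_index(value: timedelta, upper_bounds: list) -> int:
    """Return 0-3: the first band whose upper bound is at least `value`."""
    return bisect_left(upper_bounds, value)


def recovery_class(rto: timedelta, rpo: timedelta) -> tuple:
    """Map an application's objectives to (RTO band, RPO band)."""
    return band_index(rto, RTO_BOUNDS), band_index(rpo, RPO_BOUNDS)


# A service that may be down for days but must lose no data.
reporting_class = recovery_class(timedelta(days=3), timedelta(0))

# A service that must return within minutes but tolerates two-day-old data.
restart_class = recovery_class(timedelta(minutes=5), timedelta(days=2))

print(reporting_class, restart_class)
```

Band 0 is the most demanding on each axis, so the two examples land in opposite corners of the matrix.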
In theory, the solution that best satisfies the objectives of a given coordinate will cost less than the best solution for a coordinate having a lower RTO/RPO value. The goal is to implement the lowest cost solution that satisfies the RTO and RPO for a given application.
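The selection rule can be sketched as a filter followed by a minimum. In the Python example below, the solution names, achievable RTO/RPO values, and relative costs are all hypothetical placeholders chosen for illustration; they are not measurements or recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Solution:
    """Hypothetical description of one candidate solution approach."""
    name: str
    achievable_rto: timedelta
    achievable_rpo: timedelta
    relative_cost: float  # placeholder units, for comparison only


# Invented catalog: tighter objectives cost more, as the text assumes.
CATALOG = [
    Solution("nightly backup", timedelta(days=3), timedelta(hours=24), 1.0),
    Solution("asynchronous replication", timedelta(hours=4), timedelta(minutes=15), 5.0),
    Solution("synchronous mirroring", timedelta(minutes=5), timedelta(0), 20.0),
]


def cheapest_adequate(solutions, rto: timedelta, rpo: timedelta) -> Optional[Solution]:
    """Return the lowest-cost solution meeting both objectives, or None."""
    adequate = [
        s for s in solutions
        if s.achievable_rto <= rto and s.achievable_rpo <= rpo
    ]
    return min(adequate, key=lambda s: s.relative_cost, default=None)


relaxed = cheapest_adequate(CATALOG, rto=timedelta(days=7), rpo=timedelta(days=2))
strict = cheapest_adequate(CATALOG, rto=timedelta(hours=1), rpo=timedelta(0))
impossible = cheapest_adequate(CATALOG, rto=timedelta(minutes=1), rpo=timedelta(0))

print(relaxed, strict, impossible, sep="\n")
```

A result of None signals that no candidate in the catalog meets the objectives, so either the objectives or the catalog needs revisiting.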
However, there are usually several ways to implement a given solution, depending on the following considerations:
Your last question might be: "What problems does the service disruption continuum address?" Very simply, you avoid two undesirable outcomes:
About the Author