|
Disaster Avoidance and Disaster Recovery

Okay, here comes one more in the plethora of Titanic analogies the recent film has spawned. That historic tragedy offers a lesson to those of us with an interest in fault tolerance and disaster recovery. It is a classic example of our tendency to place too much confidence in our ability to prevent a disaster from affecting operations, and to prepare too little for recovery should a disaster break through those preventive measures. Capabilities for effective recovery must be in place because there is always a chance that, despite all the preventive steps, there will still be a failure.

In the case of the Titanic, why weren’t there enough lifeboats on board? Because the ship was unsinkable! Throughout the vessel were what today are known as failovers: systems and drills established to prevent the sinking of the ship. So confident were its designers and crew in the unsinkability of the vessel that they were willing to wager the lives of hundreds of people that lifeboats were unnecessary, except as tasteful decoration.

This analogy applies directly to today’s transaction-based businesses. The laws of probability and mean time between failure may pronounce your company’s systems unsinkable. You may have many types of failover systems, as simple as a UPS or as complicated as a redundant, mirrored system, to ensure that you don’t go down. But without a plan to respond and recover when you collide with that inevitable iceberg, you will have problems. In fact, according to a Department of Labor study, some 40 percent of companies that experience a devastating loss to their data systems never reopen their doors.

Are You Ready For The Worst Kind Of Disaster?

There are three basic types of computing disasters: natural, accidental, and deliberate. Natural disasters include earthquakes, floods, and hurricanes.
Accidents can range from a devastating fire or an airplane crashing into your data center to an employee inadvertently wiping out a crucial block of data. But the most sinister, and frequently the most catastrophic, form of disaster is deliberate: a disgruntled employee or ex-employee seeking revenge by trashing or stealing key data or introducing a debilitating virus. Also in this category are corporate espionage and damage from hackers. According to statistics provided by the Computer Security Institute (ComputerWorld, Nov. 9, 1998), corporate data losses for the year exceeded $135 million. Of this amount, more than $50 million was the result of unauthorized insider access.

Murphy’s Law Will Prove Itself

It is up to you to estimate the probability of such an event striking your company and to respond accordingly. However, you cannot place all of your bets on the risk-analysis business model, which simply guesses at the likelihood of a disastrous event occurring. The prudent solution is instead to examine the worst-case scenario and to accept that, somewhere in the many steps taken to prevent disaster, the ship could still sink. In other words, establish complementary processes: first take preventive measures to keep an incident from becoming a disaster, then establish a plan that will enable you to recover when Professor Murphy decides to show up anyway.

Disaster Avoidance: The First Step

Disaster avoidance encompasses the many steps a company can take that enable it to respond in stride to any of the aforementioned events. High availability, redundancy, fault tolerance, and failover are all mantras of this process. By all means, you need that superb failover system. However, you must assume that it will not provide foolproof protection from certain types of damage, particularly that caused by human error or malice.

Enter The Disaster Recovery Plan

Once you’ve invoked those mantras, it is time to add a new one: the concept of "failsafe."
That is, the company’s ability to survive the disaster it has so valiantly tried to avoid. A disaster recovery plan is essential to the survival of a company. Miami-based Burger King offers an excellent example of the virtues of a good disaster recovery plan, which enabled it to survive Hurricane Andrew in 1992. Because the company went to great lengths to develop a plan for its corporate headquarters, it was up and running while other Florida-based companies foundered or sank. While the information that follows is highly simplified, it will give you a sense of what is needed for an effective disaster recovery plan.

Take your data off site. It is standard practice to store your backup media at an off-site location. This step reduces the likelihood that the same event will affect both your on-site data and your backup. However, some companies go further by placing a large geographical distance between the data center and the vaulting site. For example, several Seattle-based institutions, including one of the nation’s largest banks, accept the very real threat of a catastrophic earthquake hitting the Pacific Northwest. Should that occur, several events could combine with paralyzing results. In the worst-case scenario, both the data center and the vaulting facility could be damaged. A couple of other, much more likely events could also occur. First, the infrastructure (roadways, bridges, airports, etc.) would be devastated, making shipment of off-site data to a hot site extremely difficult or impossible. And even if enough of the infrastructure survives, the human factor comes into play. The effects of the event, emotional and otherwise, on local employees could bring recovery to a standstill.

With such possibilities in mind, these firms have taken the additional step of shipping their backups 280 miles away, to Spokane, WA. By doing so, they have accomplished two necessities.
First, they are placing their backups in an area that is not affected by Northwest fault zones, eliminating the danger of an unrecoverable loss. Second, they are entrusting those backups to personnel who are much less likely to be affected directly by a Seattle earthquake. As a result, they are better positioned to respond and recover.

Take your data off line. Many companies, particularly those that use paperless transactions extensively, maintain a mirror image of their production data, sometimes in an off-site facility. While this is an effective disaster-avoidance or continuity measure, it can create a serious problem from a disaster-recovery perspective. Take, for example, the introduction of a devastating virus. If you run a mirrored image of your data to assure an up-to-date backup, you end up with an up-to-date, virus-infected backup as well. An effective disaster-recovery plan doesn’t discount the value of the mirror image; however, it must also take into account the possibility of an event such as a virus. It does so by placing a break in the stream, that is, by taking the backup physically off line.

Put your data out of reach. Only by keeping multiple generations of data on tape and shipping them to a remote location can you be fully assured that you are protected from the viruses, sabotage, human error, and online attacks that a mirrored system does not protect against. If your data loss is due to an internal act of sabotage (an embezzler attempting to cover the trail, for example), there is virtually no chance that the culprit will be able to access the vaulted data, at least not without leaving a lot of evidence. To do so would require the embezzler, or an accomplice, to breach the vault and its security measures. The bottom line is that while it will take a little longer to recover using an older generation of data, your company will be able to recover.

Test, Test, Test!

This is an all-important and often-overlooked aspect of the data recovery process.
Schools conduct fire drills regularly. And the drills aren’t simply to keep the kids in practice. They are also used to check the amount of time it takes to clear the building and to find any weak links in the safety process before it’s too late. A good recovery plan requires the same attention. Do the drills. Find the weaknesses in your plan. And don’t point fingers. Just fix the problems.

No reputable disaster recovery firm discounts the value of failover steps, particularly for firms participating in e-commerce. Even a few minutes of downtime can levy enormous costs against firms that rely entirely on electronic transactions. However, it is imperative that such firms do not place all their eggs in that failover basket. You must expect a disaster and be prepared to respond to it in order to survive.

About the Author |