Hunting the Black Swans in Your Continuity Program
This is the fifth installment in DRG's ongoing series on hunting and mastering the black swans in your continuity program. Look for it on the first Wednesday of each month.
“Black Swans” in your Continuity Program are those events that remain outside the range of your normal expectations, and may well produce a significant negative impact when they occur. For reasons of budget, culture, or simple lack of awareness, we just do not see or deal with these potentially devastating exposures in our enterprise continuity capability. This series discusses some of the most common of these “black swans” in business continuity programs, those that are really staring us in the face and screaming for attention.
Quarry 1: Employee Availability for Response Activities
Quarry 2: The Level of Individual Employee Commitment to BCM
Quarry 3: Exercising Your Plans
Quarry 4: Exercising Your Plans: Objectives and Annual Programs
Quarry 5: Exercising Your Plans: Business Unit Continuity Plans
So here we are in September and the summer is nearly over as I write this. Fall is the time for coming back from vacation, going back to business, the time to remember that Business Continuity is all about the business – our overall objective must be to recover the most critical business functions within the RTO and with the RPO that have been agreed to. As we have discussed, there are several basic exercise techniques: Notification, Tabletop, and Displacement.
Notification exercises should be performed regularly for every business unit continuity plan to verify that contact information is correct AND that team members are familiar with the process. These can be individual standalone exercises or a part of a more complex exercise involving multiple business unit continuity plans and/or technology recovery plans and/or logistics and support plans.
Tabletop exercises come in a number of “flavors” with increasing complexity (reality):
- Walk through a single business unit continuity plan only. Use a specific event scenario.
- Walk through several related business unit continuity plans. Be sure that each has previously been tested individually (in a unit exercise). Use a specific event scenario.
- Add IT System Recovery Plans to this walkthrough format. Be sure that each IT system has been through a unit test (either a walkthrough or an actual system re-creation at an alternate site) before bringing it into a complex walkthrough involving a defined event scenario.
- Add Logistics and Support Plans such as Employee Communications, Employee Support Plans, Insurance, Restore/Relocation Plans, etc. Again, each of these should have been unit tested prior to being brought in.
- Perform tabletop exercises with multiple plans of the various types against event scenarios that evolve over the exercise timeframe.
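The rule running through the list above, that each plan should pass its own unit exercise before it joins a more complex tabletop, can be tracked with a simple readiness check. The following is only a minimal sketch in Python: the Plan record, its field names, and the sample plan names are hypothetical illustrations, not part of any particular BCM tool.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical record for one plan in the exercise program."""
    name: str
    kind: str          # e.g. "business_unit", "it_recovery", "logistics_support"
    unit_tested: bool  # has this plan been through its own unit exercise?


def not_ready(plans):
    """Return the names of plans that have not yet been unit tested."""
    return [p.name for p in plans if not p.unit_tested]


# Illustrative data only; these plan names are assumptions.
plans = [
    Plan("Accounts Payable", "business_unit", True),
    Plan("Payroll System Recovery", "it_recovery", False),
    Plan("Employee Communications", "logistics_support", True),
]

blocked = not_ready(plans)
if blocked:
    print("Unit test these plans first:", ", ".join(blocked))
else:
    print("All plans are ready for an integrated tabletop exercise.")
```

A spreadsheet would serve equally well. The point is simply that no plan enters a multi-plan exercise until it has been exercised on its own.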
Displacement exercises require that employees relocate to their defined alternate work environment. Except for individuals working periodically from home (and where home is the defined alternate site), most displacement exercises require a number of unit exercises of increasing complexity in order to evolve from individual unit exercises to multiple unit exercises. Examples include:
- Individual members of a business unit work from their defined alternate site on a regular basis, such as once a month.
- All members of one or more business units work from their defined alternate site(s) on a designated day.
- The above are combined with an integrated IT Recovery Exercise where business people access the re-created version of their critical IT applications. Normally this type of exercise is performed only after extensive unit exercising, so as to ensure that the exercise will not be derailed by the failure of a single component.
- Complex displacement exercises with multiple business unit plans, IT recovery plans, and/or logistics/support plans of various types against a complex and evolving event scenario.
All of the above may also be “surprise” exercises, where the participants learn of the exercise only shortly before the precipitating “event”. This injects yet another level of realism, or verisimilitude, into the exercise. What you eventually want to be able to do is to assess the capability of all of these plans to support the recovery objectives (RTO, RPO) of each business unit, with the business units defined as critical obviously being exercised first. So let’s review for a minute what we have said before about the objectives of exercising:
There are three main objectives around which you should be building your BCM longer-term exercise program:
- Plan documentation correctness and completeness
- Staff and/or vendor training
- Increasing verisimilitude
Once you have established that your recovery strategy does respond to your requirements, and that the recovery documentation is largely correct and complete, and that your recovery people are minimally competent to execute their recovery duties, THEN you can begin to perform much more useful exercises. You will probably need a few years or more of diligent exercising just to get to this point.
Even for a small to medium size organization, it will take 3-4 years or more of diligent exercising to arrive at the point where you can feel confident that your continuity plans will give your business what it needs to “stay in business.” And this is if no significant changes occur within that timeframe – which of course rarely happens in a successful organization!
It is in performing exercises that your business units and the Board Room will begin to understand the impacts of the strategies and requirements that they have requested... and to understand that these may be wholly inadequate to protect them from complex event scenarios, or even simple event scenarios. A good example is discovering unresolvable data resynchronization issues among interconnected IT systems, which appear only when some of these systems fail while others remain operational. Only then will the well-ensconced black swans be revealed. It is much easier to exercise with a total “worst-case” scenario than with various partial failures (which are clearly of higher probability as well).
And so our exercises need to do not what is easy, but what is effective. Remember that exercising is about revealing these black swans that can derail your recovery and take your company down. If you cannot see them, you cannot get rid of them. The more complex your event scenarios become, the greater the probability that you will find these inadequacies, and do what you need to do to slay these black swans. Taleb says that we are adept at rationalizing our vulnerabilities after the fact... and equally adept at not seeing them before they are revealed by an improbable event. Integrated exercising with a focus on the impacts to the business processes helps us to see these exposures... without going through an actual disruptive event.
Integrated and complex plan exercising can be expensive and sometimes mildly disruptive. And you always have a choice: have an annual planned exercise strategy to exercise against specific event scenarios and then to correct the exposures you learn about. This maximizes your chances to survive a disruptive event. Or you can pay lip service to doing exercises and take your chances with all of the exposures you don’t know about by doing the same unrealistic exercises year after year after year.
It’s your choice.
About the Author:
Kathleen Lucey, FBCI, is President of Montague Risk Management, a business continuity consulting firm founded in 1996. She is a member of the Board of Directors of the BCI, and the founding President of the BCI USA Chapter. IBM chose her as the first winner of its Business Continuity Practitioner of the Year Award in 1998. She speaks and publishes widely in both North America and Europe. Kathleen may be reached via email at firstname.lastname@example.org.