Hunting the Black Swans in Your Continuity Program

This is the eighth in the DRG ongoing series regarding hunting and mastery of the black swans in your continuity program. Look for it on the first Wednesday of each month.

“Black Swans” in your Continuity Program are those events that remain outside the range of your normal expectations, and may well produce a significant negative impact when they occur. For reasons of budget, culture, or simple lack of awareness, we just do not see or deal with these potentially devastating exposures in our enterprise continuity capability. This series discusses some of the most common of these “black swans” in business continuity programs, those that are really staring us in the face and screaming for attention.

Already published:
Quarry 1: Employee Availability for Response Activities
Quarry 2: The Level of Individual Employee Commitment to BCM
Quarry 3: Exercising Your Plans
Quarry 4: Exercising Your Plans: Objectives and Annual Programs
Quarry 5: Exercising Your Plans: Business Unit Continuity Plans
Quarry 6: Exercising Your Plans: Technology Recovery Plans
Quarry 7: Exercising Your Plans: Logistics, Communications, and Support Plans

Quarry 8: Lessons Learned

As the early TV show used to say: "There are a million stories in the naked city. This has been one of them." There were multiple millions of stories in New York, New Jersey, and Connecticut following the wrath of Super Storm Sandy, as it came to be called, not to mention the many other states that were affected by Sandy's wind, rain, and snow. Those of us living in the affected areas who woke up on October 30 with no water in our basements or first floors, no trees on our houses, our heads, or our cars, and the power (and heat) still on counted ourselves among the very luckiest. Lives were lost to rapidly rising waters and falling trees, and horrific fires, propelled by the furious winds, took out entire communities.

We have much to learn from this storm, and herein is the context of this month's brief: the necessity of careful compilation and application of lessons learned from continuity exercises and interruptions whenever they occur.

As a backdrop to today's analysis, let's first take a look at last year's big event in New York and the surrounding areas in New Jersey and Connecticut and beyond: Hurricane Irene. Preparations in New York City included shutting down the transit systems and urging evacuations from those areas most at risk. Many did evacuate. Although Irene caused significant and widespread damage and lengthy power outages in other areas such as Connecticut, it was a relatively minor event in New York City.

Many New York and New Jersey residents learned a lesson from Irene: they learned that evacuation orders and weather forecasts can be inaccurate. It made them feel safer that the forecasters had been "wrong" with Irene, and more confident in their own ability to judge, based on their individual experiences, what constituted a true evacuation requirement. Many of those in the areas most threatened by Sandy's record-breaking storm surge stayed in their homes because of their direct experience with Irene.

And so they did learn from Irene, but they learned the WRONG lesson! They applied to Sandy what they felt that they had learned from their experiences with Irene, many to their great regret. This is an area where there is nothing but varying shades of gray. There is no bright line separating right from wrong. Decisions such as evacuation orders and transit system closures must be made based on the best information available at the time. Such information is necessarily incomplete. But safety must ALWAYS be the paramount factor driving those decisions.

So it is with our continuity exercises and our history with actual interruption events. It is critical that we learn from our experiences, and that we interpret and apply that knowledge correctly and effectively. To do this we need to:

  • Collect all of the relevant information.
  • Analyze what happened thoroughly and fearlessly.
  • Know what worked and what worked less well.
  • Make changes to reflect what we have learned.

And so let's start with collecting all of the relevant information. Just as conducting exercises is a progressively more realistic and complex process, so also is the collection of all of the information pertaining to each event, organized in a way that will facilitate its analysis. Key to this process is the immediate recording of information about exercises or real events as they occur. Therefore, people who are sufficiently knowledgeable to understand what is happening in each area of your exercise or interruption event must be present to record the information so that it can be examined in more detail later. These "recorders" are critical to the process, as their work will help the team members to remember the details of what has happened. Human beings have a disturbing tendency to see reality as it affects them personally, so look for keenly observant people who can get beyond their personal biases to serve as your recorders. Be sure to bring in a new recorder every 8 hours, allowing sufficient time for turnover to the next person taking on the task.

Your team will gain expertise faster if you use the same tools each time that you conduct an exercise or progress through an event. You will keep honing and tuning this process and these tools each time you use them. Your team members will also gain expertise each time that they use the process, thus improving their skills at collecting all of the information.

Fearless analysis of the collected information is next. The ideal is to make this as ego-less a process as possible, but your ability to get to this point will depend both on your leadership skills and on the trust that comes from having created a continuity culture that senior management sees as a critical contributor to the organization's success.

Your initial analysis will be easier if you first compare the information from this exercise to the information from previous ones, and from prior interruption events, if any. This means that you must also keep effective records of prior exercises and events. The more complete that set of data becomes, the more effective your analysis can be.
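For teams that keep these records electronically, the comparison with prior exercises can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Finding` record, its fields, and the `recurring_problems` helper are hypothetical names invented here, not part of any standard or tool, and your own records will almost certainly carry more detail.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One recorded observation (hypothetical structure, for illustration)."""
    exercise: str  # label of the exercise or interruption event
    area: str      # the area observed, e.g. "call tree"
    worked: bool   # True if the area performed as planned


def recurring_problems(history, current):
    """Return, sorted, the areas that worked less well in the current
    exercise and also in at least one prior exercise or event."""
    prior_failures = Counter(f.area for f in history if not f.worked)
    return sorted(
        {f.area for f in current if not f.worked and prior_failures[f.area] > 0}
    )
```

An area that shows up in this list has been recorded as a weakness more than once, which suggests that the earlier lesson was noted but not yet applied.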

You are looking for areas that are less than optimal, and will be identifying specific ways that these may be improved in future exercises. Always be aware that success in an area that is patently unrealistic, such as calling in backup tapes ahead of time, is of very limited value, as this is NOT the way that the process would occur in a real event. The giveaway that your exercise is insufficiently aggressive is that NOTHING is different from what you have seen in prior tests. Remember that the annual exercise objectives should always push closer to realistic conditions, and should train multiple staff members to execute every function so that single points of failure are eliminated.

Be careful to guard the confidentiality of this preliminary analysis so that all event participants can feel "safe" enough to share their thoughts. This is the leader's most difficult responsibility (and would form the subject of a completely different analysis). Your actual debriefing process should continue throughout a lengthy exercise or event, and again when your teams step down. Within 24 hours of the action phase of the exercise or event, you will go into a more formal debrief, from which a final report should ensue. Unresolved problems or issues should be assigned for analysis and resolution; areas requiring compensating actions to avoid noted weaknesses should be identified. Appropriate measures can then be designed, scheduled, and eventually implemented.

As exercises progress to greater levels of complexity and more realistic conditions, you will begin to see the kind of cascading risks that occur in real events, the type of cascades that we saw with Sandy, and that will occur in all serious interruption events. It is in these cascades that we see the emergence of the most serious black swans: those flaws that are very difficult to foresee and difficult to accept when they are seen. These are the flaws that will make it impossible to continue the organization's activities in an effective manner. Some of these may be far beyond the direct control of the organization, such as an area-wide power failure, transportation system failures, an inability to refuel a generator due to power outages at pumps and/or lack of fuel deliveries, or even the failure of customer operations. When you reach this level of complexity in your exercises, you can begin to devise the sophisticated strategies that will advance your continuity program into new territory.

Implementing these new strategies will improve your continuity capability. Some of these will be easy and inexpensive. Some will take years to design and implement, and may be relatively costly. Your capability to successfully implement these tools is a function of your technical and leadership skills as well as the positioning of business continuity within your organization. Yes, it may be extremely difficult to do this. But it is the surest way to identify and then slay these most dangerous of black swans.

About the Author

Kathleen Lucey, FBCI, is President of Montague Risk Management, a business continuity consulting firm founded in 1996. She is a member of the Board of Directors of the BCI, and the founding President of the BCI USA Chapter. IBM chose her as the first winner of its Business Continuity Practitioner of the Year Award in 1998. She speaks and publishes widely in both North America and Europe. Kathleen may be reached via email at kathleenalucey@gmail.com.