Written by Kathleen Lucey, FBCI   

Hunting the Black Swans in Your Continuity Program


This is Vol. II, No. 12 in DRG's ongoing series on hunting and mastering the black swans in your continuity program.

"Black Swans" in your Continuity Program are those events that remain outside the range of normal expectations, and may well produce a significant negative impact when they occur. For reasons of budget, culture, or simple lack of awareness, we just do not see or deal with these potentially devastating exposures in our enterprise continuity capability. This series discusses some of the most common of these "black swans" in business continuity programs, those that are really staring us in the face and screaming for attention.

Already published:

Volume I

Quarry 1: Employee Availability for Response Activities.
Quarry 2: The Level of Individual Employee Commitment to BCM
Quarry 3: Exercising Your Plans
Quarry 4: Exercising Your Plans: Objectives and Annual Programs
Quarry 5: Exercising Your Plans: Business Unit Continuity Plans
Quarry 6: Exercising Your Plans: Technology Recovery Plans
Quarry 7: Exercising Your Plans: Logistics, Communications, and Support Plans
Quarry 8: Lessons Learned
Quarry 9: New Year's Resolutions
Quarry 10: 10 Steps to Building a Black Swan-free Business Continuity Management Program
Quarry 11: New Year's Resolutions
Quarry 12: Developing "Black Swan Sighting" Skills: Warm-up Exercises

Volume II

Quarry 1: The Centrality of Power: Seeing the Connections
Quarry 2: Power Outages: Isolation Effects
Quarry 3: Power Outages: How Employers Can Get Involved
Quarry 4: Cascading Effects on the Support Fabric
Quarry 5: Deeper Dives to Narrower Terrains: Dive 1
Quarry 6: Deeper Dives in Wider Terrains: Dive 1
Quarry 7: Cascading Black Swan Events
Quarry 8: Cascading Black Swan Events 2: Avian Flu Outbreak
Quarry 9: Black Swans in Our Midst: Debugging Your Response Preparations
Quarry 10: Black Swans in Our Midst: Effective BSE (Black Swan Event) Management in Your Recovery Plans: Part I
Quarry 11: Black Swans in Our Midst: Effective BSE (Black Swan Event) Management in Your Recovery Plans: Part II


Volume II: Quarry 12:
Black Swans in Our Midst: Effective BSE (Black Swan Event) Management in Your Recovery Plans: Part III

Today we continue with Part III of our discussion of Incident Management Teams and the many ways in which they need to work with each other. Last month we discussed the importance of a centralized communications function. Today we will deal more fully with the need for smooth interaction between the Incident Management Teams and the Emergency Logistics Teams, at the top left and top right of the diagram, respectively. We will also touch on Insurance and on the Repair/Relocate Teams in the lower right corner of the diagram.

Much has been written about Business Continuity Plans, both as a single unified plan and as individual plans for Business Units and for Information Technology applications and infrastructure. There are many different products available to help you design and write these plans (the ones on the bottom left of the diagram). But what about the other plans and activities that are so essential to the management of an event? What about everything else that appears in this diagram? ALL of these activities are necessary to manage an interruption event well. And NONE of these are normal business activities.

Today we will discuss more of the many specific roles of the Interruption Response Management and Emergency Logistics functions, as well as the Insurance and Site Repair/Relocate functions.

[Diagram: Incident Management layout]

"Interruption Response Management" has many different names. Some call it Crisis Management, some call it Emergency Response Management, and some call it just Incident Management. Whatever its name, it defines the roles of Executive Management and Operational Management, as well as those of Media Relations, in the management of the incident. It is both internally and externally focused.

In their internal focus, these teams receive information from the Emergency Logistics teams regarding the following:

  • State of the Organization: damage assessment, HAZMAT (hazardous materials) and other security issues, availability of staff, etc.
  • Requests for Support: funding, emergency transportation, physical security, employee support, etc.

This information is then passed on to the Interruption Management Team, which analyzes and consolidates it and makes recommendations to the Executive Oversight Team.

Once the Executive Oversight Team has made a decision, the Interruption Management Team implements it and communicates it back out to the Logistics Teams, as well as to the Business Continuity Coordination Team and the IT Recovery Coordination Team. These communications go through the central Business Continuity Communications function, which is responsible for chronicling all of the response events.

Interruption Response Management is also responsible for deciding what communications are appropriate for each external and internal audience, based on the information received about the incident and on Senior Management decisions about basic information distribution strategies. The actual dissemination of these communications is often performed by the Business Continuity Communications function, frequently using an automated notification system. However, for some very large incidents, a representative of the company may be asked to make a statement or answer questions on CNN or other television and web media. We have all seen the meltdown that can occur when this representative has not had sufficient media training, so make sure that your representatives know how to conduct themselves in this arena, which can be very challenging for untrained amateurs.

Be aware also of the need to monitor and manage social media; this is usually the role of Media Relations, but it may involve Business Continuity Communications as well.

Make sure that you have the right people on the Emergency Logistics teams. For larger organizations, this may not be a problem. Multiple sites and depth of personnel make it easier to assign multiple people qualified to handle specific responsibilities. However, in smaller organizations, there may be only one person qualified to staff one or more of these teams. What will you do if that person is not available during the management of an interruption event? Here are a few techniques that can minimize that impact:

  • Cross-training. Ensure that an Alternate Team Leader for each team is identified and trained. This person should NOT live in the same area as the designated Team Leader. And so yes, this means that you need to construct a map of residence locations for ALL of the Team Leaders, Alternate Team Leaders, and Team Members so that you KNOW how many of them risk being impacted by the same natural or man-made disaster. Remember that for organizations that work one shift, Monday through Friday, there are 168 hours in a week, and only 40-50 of these fall within your "normal" business hours. More than two-thirds of the time in that week is "off-hours". So you absolutely need to prepare for off-hours operation of these teams. Knowing where each team member lives is just the start of what you will need to do to ensure that appropriate personnel will be available to respond when an interruption occurs. Here are a few others:
  • Make sure that contact information is current. Personnel names and contact details for these teams must be updated as soon as they change.
    • What if you do not know how to reach the new head of personnel at your outsourced physical security staff provider because that person has only been there a week? Black Swan.
    • What if this person does not yet know that s/he has these responsibilities for your organization? Black Swan.
    • What if the alternate to this person is on vacation very far away? Major Black Swan.
    • Now you have a serious situation: Who will you contact and how will you contact that person to get the additional security staff you must have to protect your damaged premises?
  • Conduct regular detailed exercises based on realistic scenarios to identify situations where one person on an Emergency Logistics Team is being asked to be in two or more places at the same time. (I know this sounds crazy, but I have seen it happen repeatedly when testing is superficial.)
  • Incorporate forms (electronic and paper) for the collection of logistics information. When electronic systems and devices are available, electronic forms will certainly be preferable. But as we cannot guarantee the continuous availability of either cellular or land-based electronic communications, we need to rehearse using both the paper forms that are incorporated into Logistics Plans as well as the usual electronic forms.
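
The residence map described in the cross-training item above lends itself to a simple automated check. What follows is only a minimal Python sketch under stated assumptions: the `Member` record, the sample names and coordinates, and the 25 km radius are all hypothetical, and a real program would choose its own radius per hazard (flood zone, power grid, commuter line) rather than a single distance.

```python
from dataclasses import dataclass
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Member:
    """One person on a response team (hypothetical record layout)."""
    name: str
    team: str
    role: str   # "leader", "alternate", or "member"
    lat: float  # residence latitude, decimal degrees
    lon: float  # residence longitude, decimal degrees


def distance_km(a: Member, b: Member) -> float:
    """Great-circle (haversine) distance between two residences."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def shared_risk_pairs(members, radius_km=25.0):
    """Flag each team whose leader and alternate live within radius_km of each other.

    Returns a list of (team, name, name, distance_km) tuples.
    """
    flagged = []
    for a, b in combinations(members, 2):
        if a.team != b.team or {a.role, b.role} != {"leader", "alternate"}:
            continue
        d = distance_km(a, b)
        if d <= radius_km:
            flagged.append((a.team, a.name, b.name, round(d, 1)))
    return flagged


# Illustrative roster only; names and coordinates are invented for the example.
roster = [
    Member("Leader A", "Physical Security", "leader", 40.7128, -74.0060),
    Member("Alternate A", "Physical Security", "alternate", 40.7357, -74.1724),
    Member("Leader B", "Damage Assessment", "leader", 40.7128, -74.0060),
    Member("Alternate B", "Damage Assessment", "alternate", 39.9526, -75.1652),
]

for team, first, second, km in shared_risk_pairs(roster):
    print(f"{team}: {first} and {second} live {km} km apart")
```

A check like this only tells you who shares a neighborhood. It says nothing about who is on vacation or unreachable, so it complements the contact-information and exercise items in the list rather than replacing them.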

We still have two areas remaining to discuss: Insurance and Site Repair/Relocate Plans. Rarely is either of these essential functions planned for in any formal way. It is essential that they get underway as soon as possible after the event, and so they also need plans. Note that different plans may be required for different kinds of facilities: office environment, manufacturing environment, IT environment, etc. In order not to run up against an MTPD (Maximum Tolerable Period of Disruption), these functions must begin as soon as the organization is in a stable recovery state, that is, once emergency RTOs (Recovery Time Objectives) have been met.

We may see the same staffing problems here as with the Emergency Logistics Plans. However, some functions have such lengthy execution times that it becomes necessary to address them almost immediately after the interruption event occurs. Of course this is easier in large organizations, which have multiple people qualified to ensure that insurance filings are complete and correct (and can be based on detailed pre-event photographs) and to supervise the location and build-out of replacement office space and specialized facilities.

Once again, we need to have a map of the residences of the qualified people assigned to these teams. AND we need to conduct regular, challenging tabletop exercises involving detailed and realistic scenarios in order to train team personnel in the challenges they will face when they attempt to rebuild lost facilities. These include the following:

  • In an event with widespread damage, reservation of equipment and supplies for priority functions, such as government, healthcare, and essential services.
  • Unanticipated long lead times to build replacement equipment due either to unusual volumes or lack of prior information about the "normal" lead time.
  • Construction and other permit delays.
  • Longer than usual delays for communications equipment installations due to very high demand.
  • Longer than usual delays for computer and network equipment due to extremely high demand after a wide-area incident.
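
One way to bring the lead-time items above into a tabletop exercise is to compare stretched lead times against the MTPD. The following is a rough Python sketch, not a prescribed method: the `ReplacementItem` record, the `surge_factor` multiplier, and every figure in the example are hypothetical assumptions, and realistic values would have to come from your own vendors and exercises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementItem:
    """Equipment or service that must be replaced (hypothetical record layout)."""
    name: str
    normal_lead_days: int  # vendor-quoted lead time under normal demand
    surge_factor: float    # assumed multiplier for demand after a wide-area event


def items_at_risk(items, mtpd_days, days_elapsed=0):
    """Return (name, surge_lead_days, overrun_days) for each item whose
    surge-adjusted lead time exceeds the time remaining before the MTPD."""
    remaining = mtpd_days - days_elapsed
    at_risk = []
    for item in items:
        surge_lead = item.normal_lead_days * item.surge_factor
        if surge_lead > remaining:
            at_risk.append((item.name, surge_lead, surge_lead - remaining))
    return at_risk


# Illustrative figures only.
inventory = [
    ReplacementItem("Core network switch", normal_lead_days=14, surge_factor=3.0),
    ReplacementItem("Office furniture", normal_lead_days=10, surge_factor=1.5),
]

for name, lead, overrun in items_at_risk(inventory, mtpd_days=30, days_elapsed=2):
    print(f"{name}: about {lead:.0f} days to replace, {overrun:.0f} days past the MTPD")
```

Any item this flags is a candidate for a compensating measure identified before the event, such as a pre-negotiated supply agreement or a spare held off site.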

If you fail to conduct detailed tabletop exercises, you will not identify the Black Swans that are just waiting patiently here for the opportunity to bite you. Identify them and you will be able to compensate via other measures.

And so if you did not believe before that detailed plans for these information collection, communications, and management functions are absolutely required, and if you did not believe before that these functions require extremely rigorous maintenance and exercise protocols, I hope you have a different understanding now. Neglecting these functions is one of the greatest liabilities in many otherwise well-implemented and accepted business continuity programs. You will not find a requirement anywhere that says that you must address these areas in your planning and in your exercises. But I trust that you understand now just how important it will be to have cleared these areas of the black swans that hide within them and grow ever stronger when they are not subjected to the rigorous exercises that will expose them.

And what if you have none, or only very sketchy versions, of the functions that appear on the diagram? Expect a very difficult recovery from a serious incident. ALL business functions ultimately run on logistics, coordination, and communications. The surprises you find if you have not developed ALL of the capabilities referenced in this diagram may well take down your organization. So don't listen to some of the planning tool vendors: your organization needs A LOT MORE than just business unit and IT application recovery plans. And those plans need to be exercised to learn where and how they are inadequate.

Accept the challenge to develop ALL of the elements that you need to stay in operation and not just some of them... JUST DO IT.

About the Author:
Kathleen Lucey, FBCI, is President of Montague Risk Management, a business continuity consulting firm founded in 1996. She is a member of the BCI Global Membership Council, past member of the Board of the BCI, and the founding President of the BCI USA Chapter. IBM chose her as the first winner of its Business Continuity Practitioner of the Year Award in 1998. She speaks and publishes widely in both North America and Europe. Kathleen may be reached via email at kathleenalucey@gmail.com