Introduction
Business Continuity Management (BCM), sometimes called Business Continuity Planning (BCP), is a structured approach to ensuring that business functions in general, and IT systems in particular, continue to operate despite disruptions of varying magnitude. It can be seen as a component of the broader topic of Risk Management.
Risk Identification & Assessment
The approach begins with the identification and assessment of the various risks confronting the organisation. Typically these will comprise: staff and manpower problems; interruptions to utility and consumable supplies; systems and production failures; product and financial difficulties; crime and terrorism; and biological and natural hazards. The methodologies that can then be employed to assess each of these risks range from the informal "What-if" approach (WIA) to the more formal Fault-Tree Analysis (FTA) and Failure Mode and Effects Analysis (FMEA). WIA depends on the availability of knowledgeable individuals who can 'brainstorm' the hazards that could bring negative consequences to the organisation. FTA is a deductive, top-down approach in which undesirable events are defined and the potential faults that could initiate them are identified. FMEA is a bottom-up methodology which aims to analyse all potential failure modes of a system and then assess how to correct or mitigate these failures and their effects.
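The top-down character of FTA can be illustrated with a small sketch. The code below is not taken from any particular FTA tool: the class name, event names and probabilities are all hypothetical, and the probability calculation additionally assumes that the basic faults are statistically independent, which real systems often violate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A node in a fault tree (hypothetical structure for illustration)."""
    name: str
    gate: str = "BASIC"        # "AND", "OR", or "BASIC" for a leaf fault
    probability: float = 0.0   # only meaningful for BASIC events
    children: List["Event"] = field(default_factory=list)


def basic_faults(event: Event) -> List[str]:
    """Walk down from an undesirable event and list the faults that could initiate it."""
    if event.gate == "BASIC":
        return [event.name]
    names: List[str] = []
    for child in event.children:
        names.extend(basic_faults(child))
    return names


def top_event_probability(event: Event) -> float:
    """Combine leaf probabilities upwards, assuming independent basic faults."""
    if event.gate == "BASIC":
        return event.probability
    child_probs = [top_event_probability(c) for c in event.children]
    if event.gate == "AND":
        # All children must occur: multiply the probabilities.
        result = 1.0
        for p in child_probs:
            result *= p
        return result
    if event.gate == "OR":
        # At least one child occurs: 1 minus the chance that none occurs.
        none_occur = 1.0
        for p in child_probs:
            none_occur *= 1.0 - p
        return 1.0 - none_occur
    raise ValueError(f"Unknown gate type: {event.gate}")


# Illustrative tree with made-up annual probabilities.
power_loss = Event(
    "Loss of power to server room",
    gate="AND",
    children=[
        Event("Mains power failure", probability=0.02),
        Event("Standby generator failure", probability=0.1),
    ],
)
top = Event(
    "Order system unavailable",
    gate="OR",
    children=[power_loss, Event("Database server failure", probability=0.01)],
)

print(basic_faults(top))
print(top_event_probability(top))
```

In this sketch the AND gate models the redundancy provided by the standby generator, so the power-related branch contributes far less to the top event than the single unprotected database server.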
Impact Analysis & Prevention
This covers the analysis of the impacts of the various hazards on: health and safety; continuity of operations; property, facilities and infrastructure; delivery of services; the environment; economic and financial condition; regulatory and contractual obligations; and reputation of, or confidence in, the organisation. Once the risks have been identified and assessed, an appropriate programme of preventative measures can be developed. These could include: correction of faults; protective and surveillance measures (e.g. firewalls and anti-virus software for computer systems); deterrence and security systems (e.g. shutters and alarms); and health precautions (e.g. immunisations and quarantine).
Incident Management
Procedures and policies need to be established for managing incidents of different types. These need to cover alerting the emergency services and other interested parties, including the media and stakeholders. If the incident is judged to have reached crisis proportions, a pre-defined Incident Management Team may be required to assume management control from a pre-prepared crisis management room. If the incident renders the organisation's main location inaccessible, there should be a pre-prepared secondary location available some distance (say at least one kilometre) away. Procedures also need to be set down for any required succession of management.
Mitigation Strategy
This is a portfolio of measures to limit the consequences of damaging events should they occur. Commonly employed measures include:
- The purchase of appropriate insurance policies.
- In the case of key individuals, succession planning, non-disclosure agreements and gardening leave provisions.
- In the case of key organisational data, a backup strategy involving offsite storage and hardcopies.
- In the case of critical systems (both IT and production), some systems duplication or redundancy.
Disaster Recovery
Disaster Recovery is often regarded as a subordinate element of BCP, with the aim of ensuring that the organisation's IT systems remain continuously operational under any foreseeable circumstances. This generally involves the creation and operation of a duplicated IT site with up-to-date systems and data. The most expensive option is a so-called "hot site", where the duplicated backup site completely mirrors the main environment with real-time synchronisation and can take over its responsibilities with negligible interruption. A cheaper solution involves mutual assistance agreements with other organisations for the provision of resources, facilities, services and other required support during an incident. As well as differing in setup and running costs, the various disaster recovery strategies differ in the recovery point objective (RPO) and recovery time objective (RTO) they can achieve for each business process. The RPO is the maximum period of data loss that can be tolerated, and the RTO is the maximum time within which the process must be restored.
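As a rough illustration of how RPO and RTO can be used to compare strategies, the sketch below checks whether a strategy's worst-case data loss and restore time fit within the objectives set for a business process. All names and figures are hypothetical, and the model is deliberately simplified: it treats the interval between data copies as the worst-case data loss and ignores cost.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProcessObjectives:
    """Recovery objectives for one business process (illustrative values only)."""
    name: str
    rpo: timedelta  # maximum tolerable period of data loss
    rto: timedelta  # maximum tolerable time to restore the process


@dataclass
class RecoveryStrategy:
    """Simplified description of a disaster recovery strategy."""
    name: str
    copy_interval: timedelta  # time between data copies; taken as worst-case data loss
    restore_time: timedelta   # estimated time to bring the process back


def meets_objectives(process: ProcessObjectives, strategy: RecoveryStrategy) -> bool:
    """A strategy is adequate if it satisfies both the RPO and the RTO."""
    return (
        strategy.copy_interval <= process.rpo
        and strategy.restore_time <= process.rto
    )


# Hypothetical strategies and processes.
hot_site = RecoveryStrategy("Hot site", timedelta(0), timedelta(minutes=5))
nightly_backup = RecoveryStrategy(
    "Nightly offsite backup", timedelta(hours=24), timedelta(hours=48)
)

order_processing = ProcessObjectives(
    "Order processing", rpo=timedelta(minutes=15), rto=timedelta(hours=1)
)
payroll = ProcessObjectives(
    "Payroll", rpo=timedelta(hours=24), rto=timedelta(hours=72)
)

for process in (order_processing, payroll):
    for strategy in (hot_site, nightly_backup):
        verdict = "meets" if meets_objectives(process, strategy) else "does not meet"
        print(f"{strategy.name} {verdict} the objectives for {process.name}")
```

A check of this kind makes it easier to see that not every process needs the most expensive option: in the made-up figures above, a process with relaxed objectives is adequately served by the cheaper strategy.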