10 Best Practices To Improve Disaster Recovery
Articles published January 23, 2017
How you deal with a network outage, how you recover from it, and what steps you take to keep it from recurring are crucial to your organization. Infrascale reports that one hour of downtime can cost small businesses $8,000 and large companies $700,000 on average. These are large numbers, and it can be alarming to think that an outage could hit your organization at any time.
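To make those figures concrete, here is a rough back-of-the-envelope sketch in Python. It assumes cost scales linearly with outage length, which the Infrascale averages do not claim. The function name estimated_downtime_cost and the four-hour outage are illustrative only.

```python
# Average hourly downtime costs quoted above (Infrascale figures, in USD).
HOURLY_COST_USD = {
    "small_business": 8_000,
    "large_company": 700_000,
}


def estimated_downtime_cost(org_size: str, outage_hours: float) -> float:
    """Estimate outage cost, ASSUMING cost grows linearly with duration."""
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    return HOURLY_COST_USD[org_size] * outage_hours


if __name__ == "__main__":
    # Hypothetical four-hour outage for each organization size.
    for size in HOURLY_COST_USD:
        print(f"{size}: ${estimated_downtime_cost(size, 4):,.0f}")
```

Real costs rarely grow in a straight line, so treat the result as an order-of-magnitude estimate rather than a forecast.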
The following content from the Microsoft Business Blog provides some crucial tips and advice for improving business continuity and disaster recovery planning processes.
If you woke up tomorrow and ran a marathon, how would you fare? It’s highly doubtful that you would successfully run the 26.2 miles without months of training, drills, and exercises.
The same is true for disaster recovery (DR): The chance that you could successfully recover IT operations without having exercised your DR plans on a regular basis is slim at best. The chance that you could successfully recover and meet your recovery objectives is zero. Yet Forrester finds that exercising DR plans is one area in which many businesses continue to fall short.
Although most businesses claim they conduct a full exercise of their DR plans at least once per year, anecdotal evidence suggests that the majority of these exercises are not comprehensive and thorough; businesses — small and large — often just exercise a portion of the plan or a subset of applications. Indeed, many of the organizations Forrester has spoken with know that they need to improve their DR exercise program, but face barriers such as a lack of executive support, limited employee resources and time, and a fear of interrupting vital business processes.
Disaster Recovery Exercise Best Practices
If these barriers sound all too familiar, consider following these 10 best practices for updating and improving your current DR exercise program.
1. Define Specific Exercise Objectives Upfront
Exercising for the sake of exercising is a waste of time. Make sure that there are clear and concrete objectives and goals set up front that will help determine the ultimate success of an exercise. One objective may be as simple as, “Verify our stated recovery time and recovery point objectives.” You could orient other objectives around training, such as, “Familiarize the database administrators with the plans for recovering Oracle.”
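An objective such as verifying recovery time and recovery point objectives is easiest to judge when the targets and the measured values are written down side by side. The following is a minimal Python sketch of that comparison. The class names, the service name "orders-db", and all of the durations are hypothetical and are not taken from the Forrester research.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjective:
    """Stated targets for one service (values used below are hypothetical)."""
    service: str
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of data loss


@dataclass
class ExerciseResult:
    """What was actually measured during the exercise."""
    service: str
    recovery_time: timedelta
    data_loss_window: timedelta


def evaluate(objective: RecoveryObjective, result: ExerciseResult) -> dict:
    """Compare measured values with targets and report the gap for each."""
    return {
        "service": objective.service,
        "rto_met": result.recovery_time <= objective.rto,
        "rpo_met": result.data_loss_window <= objective.rpo,
        # A positive gap means the target was missed by that amount.
        "rto_gap": result.recovery_time - objective.rto,
        "rpo_gap": result.data_loss_window - objective.rpo,
    }


if __name__ == "__main__":
    target = RecoveryObjective(
        "orders-db", rto=timedelta(hours=4), rpo=timedelta(minutes=15)
    )
    measured = ExerciseResult(
        "orders-db",
        recovery_time=timedelta(hours=5, minutes=30),
        data_loss_window=timedelta(minutes=10),
    )
    print(evaluate(target, measured))
```

Recording the gap as well as the met/missed flag gives the team something specific to work on after the exercise.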
2. Include Business Stakeholders
Business owners play a vital role in your DR exercises, and they need to be involved from the start of the exercise until you have recovered all services. All business stakeholders should verify the successful recovery of services.
This has the dual benefit of ensuring that you have properly recovered business processes with all of their critical components as well as ensuring that business stakeholders know what to expect in terms of recovery capabilities and performance at the recovery site during an actual declaration.
3. Rotate Staff Responsibilities
It’s important that the person who wrote the DR plan is not the same person who executes the test, as that individual is unlikely to be available in a real disaster. Some companies Forrester interviewed went so far as to have employees with little specific knowledge of a system execute those tests, such as a system administrator running the database DR test.
An important secondary benefit of a DR exercise is training; by assigning staff to take on new roles during exercises, you are essentially cross-training staff in different areas.
4. Develop Specific Risk Scenarios For Your Exercises
Many companies conduct their DR exercises without specific scenarios; they tell the response team to assume the data center is “a smoking hole.”
It is important, however, to define specific risk scenarios even for DR testing for two main reasons: 1) It provides a more realistic situation for the response team to react to, and 2) different scenarios require different actions from the IT staff.
For example, the DR plan for a short outage at the primary data center that only requires resuming operations would be different from a long-term outage that requires failover (and eventually failback), which in turn would be different from scenarios where only portions of the IT infrastructure were down.
5. Run Joint Exercises With Business Continuity (BC) Teams
In its research, Forrester found that many BC and DR teams run all of their exercises separately and often fail even to communicate when they run exercises. However, you should aim to exercise the full BC and DR plans concurrently at least once per year. This is especially important if the data center is in the same location as the head office.
6. Vary Exercise Types From Technical Tests to Walk-Throughs
A common misconception in IT is that walk-throughs and tabletop exercises are not necessary for DR exercises. While it’s true that these types of exercises won’t test the technical capabilities of a failover, they are still critical for training, awareness, and preparedness. Interviewees told us that the majority of the time, exercises that didn’t go as planned actually struggled most with communication and employees’ understanding of their roles during the exercise. Non-technical exercises such as walk-throughs and tabletops will help make these processes go more smoothly.
7. Test All IT Infrastructure Concurrently at Least Once Per Year
Waiting longer than a year risks too much change in IT environments and personnel — you need to bring new staff members throughout the organization up to speed on DR plans. The most advanced firms run full DR tests as often as four times per year. In between full tests, most firms conduct component tests that vary in frequency depending on the criticality of the systems and rate of change in the environment.
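One way to keep track of this cadence is a small script that flags systems whose last test is too old. The Python sketch below treats one year as the hard upper limit, in line with the guidance above. The system names, dates, and the 90-day interval for a critical system are hypothetical examples, not recommendations from the research.

```python
from datetime import date, timedelta

# Upper bound from the guidance above: never go more than a year between tests.
MAX_INTERVAL = timedelta(days=365)


def overdue_systems(last_tested: dict, intervals: dict, today: date) -> list:
    """Return systems whose last test is older than their allowed interval.

    Systems without an explicit interval fall back to MAX_INTERVAL, and any
    explicit interval longer than a year is capped at MAX_INTERVAL.
    """
    overdue = []
    for system, tested_on in last_tested.items():
        allowed = min(intervals.get(system, MAX_INTERVAL), MAX_INTERVAL)
        if today - tested_on > allowed:
            overdue.append(system)
    return sorted(overdue)


if __name__ == "__main__":
    # Hypothetical inventory: when each system was last exercised.
    last_tested = {
        "payments": date(2016, 10, 1),
        "intranet": date(2015, 12, 1),
        "email": date(2016, 6, 15),
    }
    # Hypothetical tighter interval for a more critical system.
    intervals = {"payments": timedelta(days=90)}
    print(overdue_systems(last_tested, intervals, today=date(2017, 1, 23)))
```

The per-system intervals are where criticality and rate of change come in: the more critical or fast-changing the system, the shorter the interval you would assign.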
8. Identify Members for the Core DR Response Team
The stress of working under time and resource constraints for long hours, often during nights and weekends, is something people cope with in different ways. If you are putting together a core response team to lead IT recovery, it’s important to pick people who can work under extreme amounts of pressure (and sleep deprivation). During an exercise or test, identify those individuals who can remain calm and collected.
9. Learn From Your Mistakes
The point of running DR exercises is to find potential barriers to recovery while in a controlled environment. If you aren’t encountering problems during your exercises and tests, it’s more than likely you aren’t looking hard enough, aren’t testing thoroughly enough, or you have designed scenarios for recovery that are too simple. When you complete exercises and tests and you have identified problem areas, use what you have learned to update plans and create best practice documents.
10. Report Results to Stakeholders
If your business has recently made significant investments in improving preparedness, most likely executives, business owners, and other stakeholders want to know what the return is on their investment — how prepared are you? Reporting exercise and test results regularly and in a timely fashion gives executives and business leaders visibility into your DR program. Remember that the results are not pass/fail but should detail aspects of recovery that went well and areas for improvement.
When was the last time you tested your DR plans?
Enhance Your Information Security
Disaster recovery is just one aspect of a comprehensive business continuity plan. You should also take steps to enhance your information security with an approach that combines common-sense password policies, sound network configuration, good data security practices, and an eye for the ongoing social engineering attempts that threaten your systems.