What is a DR test?

Why test?

Without testing your disaster recovery (DR) plan, you have no idea whether it will actually work when you need it. In theory your processes might seem watertight, but you don't want to discover that they aren't at your most vulnerable and critical moment. Testing might highlight weaknesses, but at least it gives you time to address them before they can cause any real damage to your business.

Why don't organisations test?

There are many reasons an organisation might avoid testing its disaster recovery plans. The most common one we hear is simply a lack of time. Other frequent excuses include cost, a lack of staff with the skills to actually perform the test, and fear of prolonged downtime should something go wrong.

What should it cover?

A good DR test will test your DR process: your recovery "settings", if you like. Do you perform daily backups or replicate your data in real time? Will you lose a day's worth of data, or just an hour? Either may be acceptable, and that decision has already been made. The purpose of testing is to find out whether the reality matches the expectation.
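To make "does the reality match the expectation?" concrete, here is a minimal Python sketch that compares the data you would actually lose in a test against the target you agreed on (often called the recovery point objective, or RPO). The function names and timestamps are hypothetical; in a real test you would take the last recovery point from your own backup or replication tooling.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, failure_time: datetime) -> timedelta:
    """Time between the last point you can restore to and the (simulated) failure."""
    if last_recovery_point > failure_time:
        raise ValueError("recovery point cannot be later than the failure")
    return failure_time - last_recovery_point


def meets_expectation(last_recovery_point: datetime, failure_time: datetime,
                      target: timedelta) -> bool:
    """True if the data actually lost is within the agreed target."""
    return data_loss_window(last_recovery_point, failure_time) <= target


# Hypothetical test: nightly backup finished at 02:00, failure simulated at 16:30.
last_backup = datetime(2024, 3, 1, 2, 0)
failure = datetime(2024, 3, 1, 16, 30)

print(data_loss_window(last_backup, failure))                        # 14:30:00
print(meets_expectation(last_backup, failure, timedelta(days=1)))    # True: daily backups are fine
print(meets_expectation(last_backup, failure, timedelta(hours=1)))   # False: an hourly target is missed
```

The same nightly backup passes a "lose up to a day" expectation and fails a "lose up to an hour" one, which is exactly the gap a test is meant to expose.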

Beyond the systems and the recovery of data, the test should also exercise the human side of the recovery process. Is your key contact information correct? Is your supplier contact information? Is your call tree up to date, so that your employees can see exactly who they need to call and what they're accountable for? Are different people responsible for different types of recovery? All of this needs to be crystal clear for a DR plan to be truly ready. At the end of the test, you will be able to measure how you performed against your plan. There are always lessons to learn from testing, so record any changes that need to be made and update the plan accordingly.

The test should also cover the different kinds of disaster you could face, and the appropriate response to each. For example, a small-scale IT failure that affects only one system would require a very different response from a power outage that knocked out the entire organisation and its communications.

Be aware that a lot of software products offer automated recovery testing that checks on the general status of data or servers and tells you whether they can be recovered or not. Whilst these can be really helpful as an initial indicator, they are not the same as a proper DR test and they should certainly not replace one.

Who should be involved?

Who is involved in writing the disaster recovery plan depends on the size of your organisation, but it is usually a collaborative process between the key decision makers across the business, including the IT Manager, Managing Director, individual department heads and the CIO/IT Director (if you have one).

Ultimate responsibility also varies with size. In nearly half of small businesses it falls to the CEO, whereas in large businesses it's more likely to fall to the CIO or a dedicated BCP (business continuity planning) manager. I think the most important thing to remember here is that your DR plan is not just an IT issue: it's essential to agree on the criticality of systems across the entire organisation.

When and how often?

A lot of people assume you should perform testing last thing on a Friday, or over the weekend, because it gives you time to fix any issues before Monday morning comes around and you have a disgruntled team to deal with. But it could also mean that your support staff have clocked off for the weekend too, leaving you to fend for yourself should something go wrong. It may also mean your testing environment doesn't truly reflect the usual working environment – testing is meant to give you a real-life view of how you'd cope in a disaster.

The best time to test DR is whatever fits your business; all businesses are different and have different requirements. It can feel as though there is never a good time to test alongside managing day-to-day operations, but DR plans need to be tested.

A DR test should also be as realistic as possible and sometimes this can mean you perform an unscheduled test, giving no prior warning to your staff. This can be really helpful in showing you who you weren't able to reach, and how recovery would work without those key people.

Your DR plan is also not something you write once and then forget about; it quickly becomes outdated. I'd recommend reviewing and updating it at least annually, and more often if you make any significant changes to your infrastructure. Make sure all roles, responsibilities and contact details are up to date so that anyone who picked up your plan in a disaster would know exactly who to call.
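If you keep your contact list in a structured form, the annual review can start with a simple staleness check. The Python sketch below is one way to do that, assuming each contact has a hypothetical "last verified" date and that anything not checked in the last 365 days should be re-confirmed. The names, roles and numbers are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Contact:
    name: str
    role: str
    phone: str
    last_verified: date  # hypothetical field: when someone last confirmed these details


def stale_contacts(contacts: List[Contact], today: date,
                   max_age: timedelta = timedelta(days=365)) -> List[Contact]:
    """Return contacts whose details have not been verified within max_age."""
    return [c for c in contacts if today - c.last_verified > max_age]


# Hypothetical call-tree entries.
contacts = [
    Contact("Alex Smith", "IT Manager", "01234 567890", date(2024, 1, 10)),
    Contact("Sam Jones", "Facilities", "01234 567891", date(2022, 11, 3)),
]
today = date(2024, 6, 1)

for c in stale_contacts(contacts, today):
    print(f"{c.name} ({c.role}) last verified {c.last_verified}")
# prints: Sam Jones (Facilities) last verified 2022-11-03
```

A check like this only tells you what to re-confirm; someone still has to call each person and update the plan.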

What are the risks?

One of the biggest risks when testing is not that something will go wrong, but that you'll miss something. In most cases testing is limited, and you don't do absolutely everything you would in a real disaster. For instance, you may not fail over the MX records for your email service, or you may not send all of your staff to your DR site or have them work from home. You should test as closely to a real invocation as possible to reduce the risk of missing something that will matter in an actual disaster.

Top tips

It's good to remember that you don't have to test your entire environment at once. In fact, we recommend that you don't. Ultimately, the point of testing is to identify the weak spots in your system so that, in the event of an actual disaster, you can feel assured that your data is safe. Testing individual, more manageable components lets you be thorough without increasing risk.

It's also a really good idea to use a DR event log when you test so that you can keep track of exactly what you did and what issues there were. It can be as simple or detailed as you need it to be, but it needs to be recorded. You can refer back to it later in your post-test evaluation and use it to make improvements to your plan, ensuring that a real invocation goes as smoothly as possible.
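A DR event log can be a spreadsheet or a notebook; the key is that every step, its outcome and any problems are written down with a time. As one possible shape, here is a minimal Python sketch of such a log that can pull out the issues for the post-test review and export everything as CSV. The class, field names and outcome values ("ok", "issue") are hypothetical choices, not a standard format.

```python
import csv
import io
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class LogEntry:
    timestamp: datetime
    step: str      # what was attempted
    outcome: str   # hypothetical convention: "ok" or "issue"
    notes: str = ""


@dataclass
class DREventLog:
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, step: str, outcome: str, notes: str = "",
               when: Optional[datetime] = None) -> None:
        """Append an entry, timestamped now unless a time is given."""
        self.entries.append(LogEntry(when or datetime.now(), step, outcome, notes))

    def issues(self) -> List[LogEntry]:
        """Entries to discuss in the post-test evaluation."""
        return [e for e in self.entries if e.outcome == "issue"]

    def to_csv(self) -> str:
        """Export the whole log as CSV text for filing with the test report."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["timestamp", "step", "outcome", "notes"])
        writer.writeheader()
        for e in self.entries:
            row = asdict(e)
            row["timestamp"] = e.timestamp.isoformat(timespec="minutes")
            writer.writerow(row)
        return buf.getvalue()


# Hypothetical test run.
log = DREventLog()
log.record("Restore file server from backup", "ok", when=datetime(2024, 3, 1, 9, 0))
log.record("Call tree: reach facilities contact", "issue", "Number out of date",
           when=datetime(2024, 3, 1, 9, 20))

for e in log.issues():
    print(e.step, "-", e.notes)  # Call tree: reach facilities contact - Number out of date
print(log.to_csv())
```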

If you want more information on disaster recovery planning, get in touch.