Expert Opinion: Oscar Arean on Disaster Recovery Testing

What are the biggest worries your customers have regarding disaster recovery?

The most common concern we hear is around testing. The biggest reason a lot of IT managers seem to avoid DR testing altogether is lack of time. In fact, well over a third of the 400 IT professionals we questioned in our 2014 Data Health Check listed lack of time as the main reason they hadn't tested their DR system in the last 12 months.

Other big contributors tend to be the costs involved and lack of staff with the relevant skills for testing. At a roundtable event we held earlier in the year, most of the organisations in the room had a DR function in place but, incredibly, the majority said they didn't have any confidence in it because they hadn't tested it.

Is there a good time for testing?

There is arguably never a "good" time for downtime, even if it is scheduled. It's pretty uncommon to be able to get all your key decision makers in the organisation to agree on a time for testing because ultimately it comes with the risk of prolonged downtime should anything go wrong. There is no perfect time to test, but there are ways to limit the risks.

A lot of people assume you should perform testing last thing on a Friday, or over the weekend, because it gives you time to fix any issues before Monday morning comes around and you have a disgruntled team to deal with. While this is somewhat true, it could mean that your support staff have clocked off for the weekend too, leaving you to fend for yourself should something go wrong. It may also mean your testing environment doesn't truly reflect the usual working environment – testing is meant to give you a real-life view of how you'd cope in a disaster.

You just need to pick a time and stick to it. The best time to test DR is whatever fits in with your business: all businesses are different and have different requirements. Sometimes it feels as though there is never a good time between managing day-to-day operations and testing your DR plans, but DR plans need to be tested. There's no point in having a fire alarm and not checking that it's working. The impact of a disaster is always far more destructive and costly to a business when its DR planning has been taken for granted or left unchecked.

Who should be involved in your DR plan?

First, it's important to distinguish between a Business Continuity Plan (BCP) and a Disaster Recovery (DR) plan. A BCP is a company-wide document detailing how the business as a whole can continue to work in a disaster. A DR plan is an IT-centric subset of the wider Business Continuity Plan.

Who's involved in the writing of the disaster recovery plan can depend on the size of your organisation, but usually it will be a collaborative process between the key decision makers across the business including the IT Manager, Managing Director, individual department heads and the CEO/CIO/IT Director (if you have one).

Ultimate responsibility also differs depending on size. It falls to the CEO in nearly half of small businesses, whereas in large businesses it is more likely to fall to CIOs or dedicated BCP managers. I think the most important thing to remember here is that your DR plan is not just an IT issue – it's essential to agree on the criticality of systems across the entire organisation.

Are there risks to DR testing?

As with any kind of testing, there can be risks of complications or unexpected downtime. But this is why cloud-based DR is such a compelling proposition – because it's so easy to fail over into a temporary environment, configured as though it were a real situation. It doesn't matter if you run into complications because your live environment remains unaffected.

Actually, one of the biggest risks when testing is not that something will go wrong, but that you'll miss something. In most cases, testing is limited and you don't do absolutely everything that you would in a real disaster. For instance, you may not fail over MX records for your email service, or you may not send all of your staff to your DR site or to work from home. You should test as closely to a real invocation as possible to limit the risk of missing something that will be important in an actual disaster.

What are your top tips for successful testing?

Like I said before, it's about picking a time and going for it. If you need to convince other managers it's a good idea, ask them if they'd prefer you performed the test in a controlled environment that limits disruption to the live environment, or waited for an unscheduled, real disaster with no plan for how to deal with it and no guarantee you'll have the staff on hand to fix things. It should be a fairly simple decision.

It's also good to remember that you don't have to test your entire environment at once. In fact, we recommend that you don't do that.

Ultimately, the point of testing is to identify the weak spots in your system so that in the event of an actual disaster you can feel assured that your data is safe. Rather than testing the entire system at once, test individual, more manageable components to ensure thoroughness without increasing risk.

It's a really good idea to use a DR event log when you test so that you can keep track of exactly what you did and what issues there were. You can refer back to this later and use it to make improvements to your plan, ensuring that a real invocation goes as smoothly as possible.
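As a rough illustration of what such a log might capture, here is a minimal Python sketch. It is not a Databarracks tool or a standard format: the class names, fields and sample entries are all hypothetical, chosen only to show the idea of recording each step alongside any issue it raised.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LogEntry:
    """One step taken during a DR test (hypothetical structure)."""
    timestamp: datetime
    component: str
    action: str
    issue: Optional[str] = None  # None means the step raised no issue


@dataclass
class DREventLog:
    """A simple in-memory log for a single DR test."""
    test_name: str
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, component: str, action: str,
               issue: Optional[str] = None) -> LogEntry:
        # Timestamp each step in UTC so the sequence can be reviewed later.
        entry = LogEntry(datetime.now(timezone.utc), component, action, issue)
        self.entries.append(entry)
        return entry

    def issues(self) -> List[LogEntry]:
        # Only the steps that raised an issue, for feeding back into the plan.
        return [e for e in self.entries if e.issue is not None]


# Illustrative entries only; the components and issues are made up.
log = DREventLog("email failover test")
log.record("email", "Failed over MX records to DR site")
log.record("file server", "Restored from backup",
           issue="Restore took longer than expected")

for e in log.issues():
    print(f"{e.timestamp.isoformat()} [{e.component}] {e.action}: {e.issue}")
```

In practice a shared spreadsheet or ticketing system would do the same job; the point is simply that every action and every issue is written down with a time against it.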

Finally, your DR plan is not something you write once and then forget about – it quickly becomes outdated. I'd recommend reviewing and updating it at least annually, and more often if you make any significant changes to your infrastructure. Make sure all roles, responsibilities and contact details are up to date so that anyone who happened to pick up your plan in a disaster would know exactly who to call.

 



Who is Oscar Arean?

Oscar is Technical Operations Manager at Databarracks. He is a specialist in providing private, public and hybrid cloud services including backup, archiving, DR and email hosting. Oscar is also Databarracks' Information Systems Manager and has been instrumental in helping Databarracks achieve ISO certifications 9001 and 27001.


 
