The Five Biggest Problems with Disaster Recovery Plans

SEPTEMBER 30TH, 2013

Karl Palachuk is the senior systems engineer at America’s Tech Support, author of ten books, and contributor to the Recovery Zone.

From the time we are children, we do “fire drills” so we know how to evacuate a building during a fire. Everyone knows what to do during a fire. But most businesses have no idea what to do after the fire. Similarly, we’ve gotten pretty good at creating reliable backups, but many small businesses have no idea what to do in a disaster recovery scenario. Backups are vital, but backups are not a disaster recovery plan. After developing disaster recovery plans for the last twenty years, I’ve found five key weaknesses many businesses have with their plans:

  1. There is no plan
  2. The plan is wrong or inadequate
  3. The plan relies on the wrong technology
  4. The plan is not properly tested
  5. The plan has insufficient information management

Let’s look at these one at a time. I won’t repeat all the scary statistics about data loss, but let me set the stage for this discussion.

A “disaster” can be large or small. It can be isolated to one drive array or affect businesses in several states. When planning for a disaster, you need to plan for the worst-case scenario. That should serve you well in all the lesser disasters you experience. For this discussion, the worst-case scenario assumes the client’s office is “down” and the client has no access to it. This might be due to a flood, a fire, a hazardous material spill, a hurricane, a tornado, an earthquake, or something you can’t even imagine.

Remember the businesses in Seaside Park, NJ. First they were devastated by Superstorm Sandy. Then, after ten months of rebuilding, a fire destroyed more than fifty businesses again. You can’t plan for that specifically, but you can plan for disasters generally.

Problem #1. There is no plan

Most businesses have some kind of backup plan. But a backup plan is not a disaster recovery plan. In some cases, the client has the right technology, such as StorageCraft ShadowProtect, but no plan that defines how they’ll use it. They simply start creating images.

In a true disaster, a calm head is extremely valuable. The last thing you want to do is show up, throw in some replacement hard drives, and then ask yourself, “Okay. What’s next?” One of my mantras for success is “slow down, get more done.” You want to show up with a very good idea of what you’re going to do, and then calmly execute the plan.

Never forget the human element. The client will be stressed. He or she might be experiencing anger, fear, and frustration. Your calm control of the situation will be greatly appreciated.

The worst example of not having a plan is when one person (usually you) has a rough plan in his head, but there’s nothing written down.
As you’ll see, a disaster recovery plan is a lot more than knowing how to restore an image from ShadowProtect.

Problem #2. The plan is wrong or inadequate

A disaster recovery plan can be “wrong” if it’s too simple or too complicated. It should be as brief and clear as possible, but still include the most important information. For example, a 300-page plan is too long for a small business. No one will read it except the person who wrote it. And when disaster strikes, no one will grab that book and try to figure out what they should do. It might look good, but it’s useless.

A plan can also be wrong if it does not account for alternate configurations. Twenty years ago it was very common to list the specific equipment required for recovery. This is rarely appropriate today because of the nearly universal use of PCs in the small business environment. In the days when small businesses ran minicomputers such as HP 3000s and AS/400s, technology was less interchangeable and evolved more slowly. Today we are likely to need a “good server” rather than a specific model and configuration. It’s best to define machines in terms of what they need in order to accept the image from the backup device. In general, all the equipment you buy after a disaster will be newer and faster than the equipment it replaces.

Finally, a disaster recovery plan can be inadequate if it does not address the most common scenarios. We build a plan for “the big one,” but it also has to cover little disasters such as a simple hard drive failure or a failed power supply. Sometimes we are tempted to skip these little disasters because they are simple enough to fix. But it’s much better to create a checklist and execute that mini plan when a hard drive fails.

Problem #3. The plan relies on the wrong technology

There are lots of ways you can find yourself relying on the wrong technology. The most common are probably the use of outdated media, equipment, or techniques.
If you keep carrying the same backup solution over to each new server every three to four years simply because it still works, you could be in real trouble during a disaster. Yes, your Zip drive “works,” but you’ll have a tough time finding one in a hurry when the old one burns up in a fire.

Another example of using the wrong technology happens when you try to save too much money. If every component of your disaster recovery solution is the cheapest thing available, you are simply compounding the chances of a failed recovery. Think about a cheap SAN with the cheapest drives you can find, cheap memory sticks, and cheap cables, all replicated to a homemade BDR device. When things go wrong, they can go very wrong. Saving money in the short term could leave you with a very unreliable disaster recovery system.

Sometimes you end up with the wrong technology because the client is too involved. In some cases, that means the owner is playing some critical role in the backup or recovery process. Other times it means the client insists on understanding the entire process. If they’re not very technical, you might end up relying on a completely inappropriate technology.

Consultants can also be the cause of relying on the wrong technology. If you do an incomplete inventory of equipment, you might not back up everything you should. This can happen if you rely on an automated scan of the network that is not verified. If you don’t have a complete understanding of what needs to be recovered, you can end up selling the wrong solution.

Problem #4. The plan is not properly tested

Just as with backups, if you don’t test your disaster recovery plan, you don’t have a disaster recovery plan. A key piece of designing the right disaster recovery plan is understanding how it will be executed. Who is authorized to initiate a disaster recovery? How do you initiate spinning up a virtual server? If you don’t practice, you’ll be learning during a disaster!
During a practice disaster you will learn things that help you refine the plan. For example, you might find data stored where it’s not supposed to be (generally that means it is on the desktop instead of the server).

A test will help all participants understand the list of recovery priorities for services and data. For example, a retail business will need to make sure that the ability to process sales is restored as quickly as possible, both online and offline. The client needs to continue taking orders, delivering products and services, paying bills, making payroll, and so on. In other words, they need to do everything they did yesterday.

And they’re all stressed out, short-tempered, frustrated, and generally not fun to deal with. They might not have a place to work or the tools to get the work done. They turn to you because you’re the highly skilled technical professional. If the first words out of your mouth are “I don’t know what to do,” they’ll soon be referring to you as their previous consultant.

Without proper testing and refining of the disaster recovery plan, you won’t have a solid understanding of the client’s business. This is what they call an iterative process: you keep refining and testing, again and again. As the business and the technology change over time, testing is the best way to verify that you will actually be able to execute a successful recovery when needed.

Problem #5. The plan has insufficient information management

Information and documentation are the most important elements of a successful disaster recovery plan. This includes communications during a disaster and during the recovery. In other words, information needs to be managed before, during, and after the disaster.

In addition to some kind of checklist or how-to documentation for recovering data, you need very clear documentation about who does what:

  • Who is in charge of the recovery process?
  • Who will handle communication with banks, insurance companies, the media, employees, and clients?
  • Who orders replacement equipment and software needed to get back in business?
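
One way to keep these assignments from living only in someone’s head is to record them as data and check that no responsibility is left without an owner. The Python sketch below is illustrative only: the role names mirror the three questions above, while the `Assignment` type, the alternate-contact field, and the people named are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Responsibilities taken from the three questions above.
REQUIRED_ROLES = (
    "recovery lead",
    "external communications",
    "equipment and software purchasing",
)


@dataclass
class Assignment:
    primary: str
    alternate: Optional[str] = None  # assumption: an optional backup person per role


def unassigned_roles(assignments: dict[str, Assignment]) -> list[str]:
    """Return the required roles that have no named primary owner."""
    return [
        role
        for role in REQUIRED_ROLES
        if role not in assignments or not assignments[role].primary.strip()
    ]


# Hypothetical example data: purchasing has not been assigned yet.
plan_roles = {
    "recovery lead": Assignment("Pat", alternate="Sam"),
    "external communications": Assignment("Lee"),
}

print(unassigned_roles(plan_roles))
```

Running a check like this whenever the plan is revised gives a quick signal that a role has been left open, for example after a staff change.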

You also need a “binder” of information that will be useful during and after a disaster. This will include:

  • Account numbers for various services, including banks, insurance companies, and important vendors
  • Contact information for employees, clients, vendors, insurance, banks, etc.
  • Description of communication protocols during the disaster. This covers both communications among your team and monitoring of the news, which might include radio, smartphone apps, government websites, and news outlets.
  • Communication protocols also include a description of who calls the people on your team and coordinates all other communications.
  • What is your realistic time frame to get the client operational and making money again?
  • What's the time frame for completing a total recovery with all systems replaced and up online as they should be?
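
Because the binder is only useful if it stays current, it can help to track when each section was last reviewed. The following is a minimal Python sketch, assuming a hypothetical one-year review interval and made-up review dates; the section names paraphrase the list above.

```python
from datetime import date, timedelta

# Assumption: every binder section should be reviewed at least once a year.
REVIEW_INTERVAL = timedelta(days=365)


def stale_sections(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return binder sections whose last review is older than the interval."""
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )


# Hypothetical review dates for the sections of one client's binder.
binder = {
    "account numbers": date(2013, 6, 1),
    "contact information": date(2012, 1, 15),
    "communication protocols": date(2013, 8, 20),
    "recovery time frames": date(2011, 11, 3),
}

print(stale_sections(binder, today=date(2013, 9, 30)))
```

The same idea works just as well as a column in a spreadsheet or a field in a documentation system; the point is that each section carries a review date that someone checks.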

This binder (or whatever you use) should exist in paper form along with your other physical backup and disaster recovery materials, such as offsite backups. It should also exist on an encrypted cloud-based drive for the client and within your professional services automation (PSA) system.

As you can see, there’s a great deal of information that needs to be managed and kept up to date within your disaster recovery plan. You need a plan, it needs to be the right plan for the client, it needs to be based on current technology, it needs to be thoroughly tested, and it needs to be well documented.

The bottom line is that preparation will make a disaster recovery go as smoothly as possible. Having technical knowledge and a vague idea of what needs to be done is simply not enough. A successful recovery requires a good plan that addresses the five biggest problems of disaster recovery plans. (For more information about making a plan, check out our Recover-Ability guide “Making Disaster Recovery Easy.”)
