Everyone has a backup plan. Whether you have one that you follow carefully or whether you’ve never even thought about backups, you have a plan in place. Whatever you are doing or not doing constitutes your backup plan.
I would propose that the three most common backup plans that people follow are:
- Remain completely ignorant of the need to back up files
- Vaguely know that you should back up your PC, but not really understand what this means
- Fully realize the dangers of going without backups and do occasional manual backups, but procrastinate coming up with a plan to do it regularly
Plan #1 is most commonly practiced by less technical folk, e.g. your parents, your brother-in-law, or your local pizza place. These people can hardly be faulted. The computer has always remembered everything that they’ve told it, so how could it actually lose something? (Your pizza guy was unpleasantly reminded of this when his browser informed his wife that the “Tomato Sauce Babes” site was one of his favorite sites.) When these people lose something, they become angry and will likely never trust computers again.
Plan #2 is followed by people who used to follow plan #1, but graduated to plan #2 after accidentally deleting an important file and then blindly trying various things they didn’t understand, including emptying their Recycle Bin. They now understand that bad things can happen. (You can also qualify for advancement from plan #1 to #2 if you’ve ever spent hours editing a document, closed it without first saving, and then clicked No when asked “Do you want to save changes to your document?”) Although this group understands the dangers of losing stuff, they don’t really know what they can do to protect their data.
Plan #3 is what most of us techies have used for many years. We do occasional full backups of our system and we may even configure a backup tool to do regular automated backups to a network drive. But we quickly become complacent and forget to check whether the backups are still getting done. Or we forget to add newly created directories to our backup configuration. How many of us are confident that we have regular backups occurring until the day that we need to restore a file and discover nothing but a one-line .log file in our backup directory that simply says “directory not found”?
Shame on us. If we’ve been working in software development or IT for any length of time, bad things definitely have happened to us. So we should know better.
Here’s a little test. When you’re working in Microsoft Word, how often do you press Ctrl-S? Only after you’ve been slaving away for two hours, writing the killer memo? Or do you save after every paragraph (or sentence)? Most of us have suffered one of those “holy f**k” moments at some point in our career. And now we do know better.
How to Lose Your Data
There are lots of different ways to lose data. Most of us know to “save early and often” when working on a document because we know that we can’t back up what’s not even on the disk. But when it comes to actual disk crashes (or worse), we become complacent. This is certainly true for me. I had a hard disk crash in 1997 and lost some things that were important to me. For the next few months, I did regular backups like some sort of data protection zealot. But I haven’t had a true crash since then—and my backup habits have gradually deteriorated, as I slowly regained my confidence in the reliability of my hard drives.
After all, I’ve read that typical hard drives have an MTBF (Mean Time Between Failures) of 1,000,000 hours. That works out to 114 years, so I should be okay, right?
No. MTBF numbers for drives don’t mean that your hard drive is guaranteed (or even expected) to run for that many years before failing. MTBF is a statistical rate measured across many drives during their service life, not a prediction for any single drive. Your MTBF number might be 30 years, but if the service life of your drive is only five years, then you can expect failures to become more frequent after five years. The 30-year MTBF means that, statistically, if you ran six drives for that five-year period, you would expect about one failure among them over the five years. In other words, you would see roughly one failure per 30 drive-years, spread across all six drives. If you ran 30 drives at the same time, you would expect about one failure among them in the first year.
In point of fact, your drive might fail the first year. Or the first day.
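The arithmetic behind these MTBF figures can be checked with a few lines of Python. This is only a rough sketch: it assumes a constant failure rate (the simple model an MTBF figure usually implies), which ignores the wear-out that the service life describes, and the function names are mine, made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def mtbf_years(mtbf_hours):
    """Convert an MTBF quoted in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures(drives, years, mtbf_in_years):
    """Expected failures across a group of drives: drive-years divided by MTBF."""
    return drives * years / mtbf_in_years


def failure_probability(years, mtbf_in_years):
    """Chance that one drive fails within `years`, assuming a constant failure rate."""
    return 1 - math.exp(-years / mtbf_in_years)


print(round(mtbf_years(1_000_000)))          # 114
print(expected_failures(6, 5, 30))           # 1.0
print(expected_failures(30, 1, 30))          # 1.0
print(round(failure_probability(5, 30), 3))  # 0.154
```

Under that assumption, even a drive with a 30-year MTBF has roughly a 15% chance of failing at some point during a five-year service life.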
And hard drive crashes aren’t the only, or even the most common, type of data loss. A recent PC World story refers to a study saying that over 300,000 laptops are lost each year from major U.S. airports and not reclaimed. What about power outages? Applications that crash and corrupt the file that they were working with? (Excel did this to me once). Flood/fire/earthquake? Or just plain stupidity? (Delete is right next to Rename in the Windows Explorer context menu).
A Good Backup Plan
So we’re back to where we started. You definitely need a backup plan. And you need something better than the default plans listed above.
You need a backup plan that:
- Runs automatically, without your having to remember to do something
- Runs often enough to protect data that changes frequently
- Copies things not just off-disk, or off-computer, but off-site
- Allows restoring lost data in a reasonably straightforward manner
- Secures your data, as well as backing it up (when appropriate)
- Allows access to old data even after you’ve intentionally deleted it from your PC
- Refreshes backed-up data regularly, or stores the data on media that will last a long time
The most important attribute of a good backup plan, by far, is that it is automated. When I was in college, I used to do weekly backups of my entire PC to a stack of floppies, and then haul the floppies to my parents’ house when I’d visit on Sunday. But when the last few weeks of the semester rolled around, I was typically so busy with papers and cramming that I didn’t have time to babysit a stack of floppies while doing backups. So I’d skip doing them for a few weeks—at the same time that I was creating a lot of important new school-related data.
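Automation only helps if you notice when it quietly stops, like the backup directory that turns out to hold nothing but a “directory not found” log. Here is a minimal sketch of one way to check that a backup directory is still receiving files. The function names, the 24-hour limit, and the choice to ignore .log files are my own assumptions, not features of any particular backup tool.

```python
import os
import time


def newest_file_age_hours(backup_dir, ignore_suffixes=(".log",), now=None):
    """Return the age in hours of the most recently modified file under
    backup_dir, skipping files whose names end with ignore_suffixes.
    Returns None if no qualifying file exists."""
    now = time.time() if now is None else now
    newest = None
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            if name.endswith(ignore_suffixes):
                continue  # a lone log file is not a backup
            mtime = os.path.getmtime(os.path.join(root, name))
            if newest is None or mtime > newest:
                newest = mtime
    if newest is None:
        return None
    return (now - newest) / 3600


def backup_is_fresh(backup_dir, max_age_hours=24):
    """True only if some non-log file was written within max_age_hours."""
    age = newest_file_age_hours(backup_dir)
    return age is not None and age <= max_age_hours
```

Run from a scheduled task that nags you when `backup_is_fresh` returns False, a check like this turns a silent failure into a visible one. It says nothing about whether the backed-up files can actually be restored.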
How often should your data get backed up? Often enough that you could stand to redo everything you’ve done since the last backup. Reentering a day’s worth of data into Quicken isn’t too painful. But reentering a full month’s worth probably is, so nightly backups make sense if you use Quicken every day. On the other hand, when I’m working on some important document that I’ve spent hours editing, I typically back the file up several times an hour. Losing 10-15 minutes’ worth of work is my pain point.
Off-site backups are important, but often overlooked. The more destructive the type of data loss, the farther away from the original the backup should be, to keep it safe. For an accidental fat-finger deletion, a copy in a different directory is sufficient. Hard drive crash? The file should be on a different drive. PC hit by a voltage spike? The file should be on a different machine. Fire or flood? You’d better have a copy at another location if you want to be able to restore it. The exercise is this—imagine all the bad things that might happen to your data and then decide where to put the data to keep it safe. If you live in San Francisco and you’re planning for the Big One of ’09, then don’t just store your backups at a buddy’s house down the street. Send the data to a family member in Chicago.
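The exercise can be written down as a small lookup table. The sketch below just encodes the four examples from the paragraph above; the names are made up for illustration, and the list of threats is nowhere near complete.

```python
# Backup locations, ordered from nearest to farthest from the original.
LOCATIONS = [
    "different directory",
    "different drive",
    "different machine",
    "different site",
]

# Each threat paired with the nearest location where a copy survives it.
THREATS = {
    "accidental deletion": "different directory",
    "hard drive crash": "different drive",
    "voltage spike": "different machine",
    "fire or flood": "different site",
}


def farthest_location_needed(threats):
    """Return the farthest backup location required by any listed threat."""
    return max((THREATS[t] for t in threats), key=LOCATIONS.index)


print(farthest_location_needed(["accidental deletion", "hard drive crash"]))
# different drive
```

Whatever threats you list, the worst one decides how far away at least one copy has to live.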
If you do lose data, you ought to be able to quickly: a) find the data that you lost and b) get that data back again. If you do full backups once a year to some arcane tape format and then do daily incremental backups, also to tape, how long will it take you to find and restore a clean copy of a single corrupted file? How long will it take you to completely restore an entire drive that went bad? Pay attention to the format of your backups and the processes and tools needed to get at your archives. It should be very easy to find and restore something when you need it.
How concerned are you with the idea of someone else gaining access to your data? When it comes to privacy, not all data is created equal. You likely wouldn’t care much if someone got hold of your Mario Kart high scores. (In fact, some of you are apparently geeky enough to have already published them.) On the other hand, you wouldn’t be too happy if someone got a copy of that text file where you store your credit card numbers and bank passwords. No matter how much you trust the tool vendor or service that you’re using for backups, you ought to encrypt any data that you wouldn’t want handed out at a local biker bar. Actually, this data should already be encrypted on your PC anyway, no matter how physically secure you think your PC is.
We might be tempted to think that the ideal backup plan would be to somehow have all of your data continuously replicated on a system located somewhere else. Whenever you create or change a file, the changes would be instantly replicated on the other system. Now you have a perfect replica of all your work, at another location, all of the time. The problem with this approach is that if you delete a file or directory and then later decide that you wanted it back, it’s too late. The file will have already been deleted from your backup server. So, while mirroring data is a good strategy in some cases, you should also have a way to take snapshots of your data and then to leave the snapshots untouched. (Take a look at the Wayback Machine at the Internet Archive for an example of data archival).
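To make the difference between a mirror and a snapshot concrete, here is a minimal sketch of the snapshot idea. It copies a directory into a new timestamped folder each time and never touches earlier copies. The function name and the folder naming scheme are my own, and this is not how any particular backup product works.

```python
import shutil
from datetime import datetime
from pathlib import Path


def take_snapshot(source, snapshot_root, now=None):
    """Copy `source` into a new timestamped directory under `snapshot_root`.

    Existing snapshots are never modified, so a file deleted from `source`
    is still present in every snapshot taken before the deletion.
    """
    now = now or datetime.now()
    target = Path(snapshot_root) / now.strftime("%Y%m%d-%H%M%S")
    # copytree refuses to overwrite: it raises FileExistsError if target exists.
    shutil.copytree(source, target)
    return target
```

A full copy every time eats disk space quickly, so treat this as an illustration of the principle. Real snapshot tools generally avoid storing unchanged files over and over.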
On the other hand, you don’t want to just archive data off to some medium and then never touch it again, expecting the media to last forever. If you moved precious family photos off of your hard disk and burned them to CDs, do you expect the data on the CDs to be there forever? Are you figuring that you’ll pass the stack of CDs on to your kids? A lot has been written about media longevity, but I’ve read that cheaply burned CDs and DVDs may last no longer than 12-24 months. You need a plan that re-archives your data periodically, to new media or even new types of media. And ideally, you are archiving multiple copies of everything to protect against problems with the media itself.
How Important Is This?
The critical question to ask yourself is: how precious is my data to me? Your answer will guide you in coming up with a backup plan that is as failsafe as you need it to be. Your most important data deserves to be obsessed over. You probably have thousands of family photos that exist only digitally. They should be backed up often, in multiple formats, to multiple locations. One of the best ways to protect data from loss is to disseminate it as widely as possible. So maybe in addition to multiple backups, your best bet is to print physical copies of these photos and send boxes of photos to family members in several different states.
The bottom line is that you need a backup plan that you’ve come up with deliberately and one that you are following all of the time. Your data is too important to trust to chance, or to a plan that depends on your remembering to do backups from time to time. A deliberate plan, coupled with a healthy amount of paranoia, is the best way to keep your data safe.
Next Time
In my next post, I’ll put together a list of various products and services that can help you with backups. And I’ll share my own backup plan (imperfect as it is).
> When you’re working in Microsoft Word, how often do you press Ctrl-S?
Oh man.. I press this all the time, compulsively, in so many apps. It’s just burned into my finger memory at this point.