A Five-Part Backup Strategy

In the past few posts, I surveyed some common backup tools.  Next, I thought I’d describe my backup strategy in some detail, talking about what I do to keep my data safe.

My own backup strategy is far from perfect.  There may be a few missing pieces, but I think that I’ve protected myself against most of the data loss scenarios that I can think of.  Most of us base our backup strategy on whatever “holy ****” data loss adventure we’ve suffered in the past.  That’s certainly true for me, so I tend to be pretty pessimistic when it comes to protecting my data.

I have four PCs at home: three desktops and a laptop.  Together, they have a total of about 3TB of disk space, with about 1.2TB currently in use, a respectable amount of data.

As I said in my original post on why you need a backup plan, the critical question to ask is—how precious is my data to me?  Of this 1.2TB, some of it is very precious to me, like family photos and videos.  And some of it is not at all important, like the 625MB footprint of an Office 2007 installation.  Thinking about how important my data is will drive decisions about how I structure my backups.

The main goal is to figure out how to best protect all of this data.  Questions to ask include:

  • What data needs to be backed up?
  • Where to back the data up?  Different drive?  PC?  Offsite?
  • How often?
  • Should the backup always be a mirror of the original, or an archive that captures a moment in time?
  • How long should the backup sets be kept?

Before we even think about backups, it’s worth doing some preventative maintenance on your hard drives.  I’ve had good luck using SpinRite 6, from grc.com, to do surface defect detection.  It’s obviously far better to avoid defects in the first place than to have to deal with a bad drive.

Below are the five pieces of my current backup strategy.  Each serves a slightly different purpose and protects my data in a different way.

  • LiveMesh to Mirror Data Between PCs at Home
  • JungleDisk / Amazon S3 to do Continuous Backups to the “Cloud”
  • Quarterly Archival Backups to External Drive
  • Encrypt / Mirror Sensitive Data on USB Thumb Drives
  • Keep Extra Copies of All Installation Media

LiveMesh to Mirror Data Between PCs at Home

I talked a little bit last time about using LiveMesh to synchronize data between multiple PCs at the same site.  Although LiveMesh provides limited storage space (5GB) in “the cloud”, you can ignore that part and use it to synchronize an unlimited amount of data in a peer-to-peer manner.

This is my first line of defense in protecting my data.  On each of my PCs, I identify the main top-level directories that contain important data and then add those folders to my “mesh”.  Once a folder is visible to LiveMesh, you can synchronize it with any of your other PCs that are also running LiveMesh.  In my case, my important data will be replicated on two of my three main desktop PCs at home.

The two main purposes of using LiveMesh are to provide local copies on multiple PCs and to protect data against hard drive crash or system failure.

Because your folders are synchronized across multiple machines, and because LiveMesh supports two-way synchronization, you can edit/change files locally at whichever machine you happen to be sitting at.  LiveMesh will immediately synch the changes back to the other device.  This is basically just a different way of sharing files on the network, rather than creating a network share.  It’s a little easier for your applications to access a local copy of the file than to have to reach across the network to get it.

LiveMesh also protects against hard drive failure, in that you have a second copy of your data on another machine.  If a hard drive dies, you can swap in a new drive and just let LiveMesh repopulate the missing files from the other mirror.

There is one important thing that LiveMesh does not protect against: unintended deletion or modification of a file.  Because LiveMesh pushes updates to your other devices continuously (or very quickly), a deletion gets replicated just as fast, and the file will quickly disappear from all of your devices.

The only real way to protect against this would be for LiveMesh to have full support for versioning.  So far, versioning is not part of the tool.
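To make the gap concrete, here is a minimal sketch of what a versioning-aware mirror pass could look like.  This is not how LiveMesh works internally; the directory layout and function name are my own illustration.  Instead of destroying a file that was changed or deleted at the source, the pass moves the old copy into a timestamped versions folder:

```python
import shutil
import time
from pathlib import Path

def mirror_with_versions(src, dst, versions):
    """One pass of a naive one-way mirror that never destroys data:
    before a file in dst is overwritten or removed, it is moved into
    a versions folder with a timestamp suffix."""
    src, dst, versions = Path(src), Path(dst), Path(versions)
    dst.mkdir(parents=True, exist_ok=True)
    versions.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")

    src_names = {p.name for p in src.iterdir() if p.is_file()}
    for old in dst.iterdir():
        # File deleted at the source: move it aside instead of deleting it.
        if old.is_file() and old.name not in src_names:
            old.rename(versions / f"{old.name}.{stamp}")
    for f in src.iterdir():
        if f.is_file():
            target = dst / f.name
            # File changed at the source: version the old copy first.
            if target.exists() and target.read_bytes() != f.read_bytes():
                target.rename(versions / f"{target.name}.{stamp}")
            shutil.copy2(f, target)
```

With something like this in the sync engine, an accidental deletion would still propagate to the mirror, but the deleted file would land in the versions folder rather than vanish.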

JungleDisk / Amazon S3 to do Continuous Backups to the “Cloud”

My next line of defense is to use JungleDisk and Amazon’s S3 (Simple Storage Service) to regularly back up my important data to the Internet (the “cloud”).

JungleDisk, or a similar tool, is required because S3 is just a service that you subscribe to for storing your files—it provides no user interface for accessing the files and no client-side tool for doing the backups.

Amazon S3 charges you only for the data that you use, as follows:

  • $0.15/GB/month for data storage
  • $0.20/GB for data transfer

Because you pay separately for storage vs. transfer, you’ll end up paying a bit more in the first month or two, as you back everything up.  After that, your costs will be mainly for storage, since you’ll be uploading only data that has changed.
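As a rough sketch of the arithmetic (the rates are the ones quoted above and have certainly changed since; the function name and sample data sizes are mine):

```python
def s3_monthly_cost(stored_gb, transferred_gb,
                    storage_rate=0.15, transfer_rate=0.20):
    """Estimate one month's S3 bill: storage is billed per GB held,
    transfer is billed per GB moved during the month."""
    return stored_gb * storage_rate + transferred_gb * transfer_rate

# First month: upload 50 GB and store it all.
first_month = s3_monthly_cost(stored_gb=50, transferred_gb=50)   # 17.50
# Steady state: same 50 GB stored, only ~2 GB of changed files uploaded.
steady_state = s3_monthly_cost(stored_gb=50, transferred_gb=2)   # 7.90
```

The steady-state number is why incremental uploads matter so much: transfer costs nearly vanish once the initial backup is done.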

The JungleDisk/S3 combination provides a couple of key benefits beyond what LiveMesh gives me:

  • JungleDisk keeps old versions of modified/deleted files on S3  (for 60 days)
  • S3 provides an off-site backup location for your data

Because JungleDisk is configured to keep old versions of your files for 60 days, you’re protected against inadvertent deletion or modification of a file.
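JungleDisk enforces that retention window for you, but the policy itself is simple.  A sketch of age-based pruning (the directory layout and function name are hypothetical, not JungleDisk’s actual mechanism):

```python
import time
from pathlib import Path

def prune_old_versions(version_dir, max_age_days=60):
    """Delete saved file versions whose modification time is older
    than max_age_days; anything younger is kept."""
    cutoff = time.time() - max_age_days * 86400
    for f in Path(version_dir).iterdir():
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()
```

The practical upshot: you have a 60-day window to notice an accidental deletion or corruption and pull back the old copy.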

Most importantly, because you’re backing your data up to “the cloud”, you’re protected against any catastrophe that might occur at home.  (But make sure to store your Amazon S3 access key and password in a different location!)

It’s worth mentioning that if you intend to archive your old e-mail, it’s a good idea to break your e-mail files into several pieces, based on age.  You might have one smaller file containing just data from the past year and then older files, one per year.  That way, you’re backing up less data on a regular basis, because JungleDisk only backs up the data files that change.

Quarterly Archival Backups to External Drive

Both LiveMesh and S3 are focused on creating mirrored copies of my data.  But there is still the danger that I inadvertently delete some data, or the data becomes corrupt, and then that deletion or corruption is propagated to my mirror.

To protect against this, it’s also important to do periodic archival backups of important data and then to store those archives at an offsite location.  This gives you a copy of your data at a particular moment in time that you then keep indefinitely.

In my case, I do archival backups as follows:

  • Quarterly archival backups
  • I use Genie Backup Manager Pro 8.0
  • I back my data up to an external USB Western Digital My Book drive (750GB)
  • I store my WD drive at work (offsite) after I’ve backed up my home data
  • I always have two copies of everything that I back up to the WD drive (two most recent archives)
  • Archives are a superset of what I back up with LiveMesh and S3

This isn’t quite ideal.  I’m not really living up to my goal of keeping my archived data permanently.  I have a rotation scheme where I keep the two most recent quarterly archives, which means that I could lose data if I delete something and then decide six months later that I really needed it.  But I use this rotation scheme because it would be too expensive to keep every archive.
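Genie Backup Manager handles the rotation for me, but the scheme amounts to something like the following (the backup-*.zip naming is my own illustration, not Genie’s format):

```python
from pathlib import Path

def rotate_archives(archive_dir, keep=2):
    """Keep only the `keep` most recent archives, deleting the rest.
    Assumes archive names sort chronologically, e.g. backup-2008-Q3.zip."""
    archives = sorted(Path(archive_dir).glob("backup-*.zip"))
    for old in archives[:-keep]:
        old.unlink()
    return [p.name for p in archives[-keep:]]
```

Count-based rotation like this caps the storage cost, but it is exactly what creates the six-month exposure described above: the Q1 archive is gone the moment the Q3 archive lands.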

My archives include a superset of the data that I back up with LiveMesh and S3.  In addition to archiving what I back up with those tools, I archive data that rarely changes, like ripped CDs (.mp3 files) and home videos.  This data is important enough to archive, but not worth backing up regularly, given that it never changes.

Encrypt / Mirror Sensitive Data on USB Thumb Drives

I also have data that should be encrypted, as well as backed up.  We all probably have a file or two where we keep track of all the really important stuff—online passwords, bank account numbers, personal financial data, etc.  Ideally, we’d write none of it down.  But there is so much important data that we need to keep track of, it’s no longer possible to keep it all in our heads.

My approach for securing this data and for keeping it safe is:

  • Two USB thumb drives, one kept at work, one at home
  • Both USB drives fully encrypted using TrueCrypt
  • USB drives normally unmounted, unless I need to read data from them
  • Unmount as soon as I’m done using the drive
  • To change data on a stick, I make the change on one drive, then bring the drive to the other location and synch up
  • As part of quarterly archive, also archive entire encrypted image

This is handy because I can carry one of the thumb drives with me wherever I go, with no fear of what would happen if I lost it.  The data is completely encrypted, so safe from prying eyes.  Having two drives ensures that the data is being backed up.  Because I don’t entirely trust the flash media, I also keep a copy in my quarterly archives.

Keep Extra Copies of All Installation Media

With all of the strategies described above, I’m backing up only data—never actual programs.  I figure that if I have a major crash, I’ll just reinstall the software that I need.

But I do need to make sure that I don’t lose my original media.  If I stored everything in one spot at home and had a fire, my data would be okay, but I’d lose all of the software.

In my case, I just duplicate the original media and then store a copy at work.  I don’t bother duplicating Microsoft software, because I have an MSDN Universal subscription, so I’d be able to re-download anything that I lost.

Summary

That’s my basic backup strategy—a combination of techniques and tools that gives me a fair degree of confidence that I could get data back without too much trouble if I lost it.

Windows Backup Products, part 2 – Imaging, Synchronization, Online

Last time I posted a list of the most popular file/folder backup tools.  This time, I’ll look at Windows backup tools that fall into the categories: drive imaging, file/folder synchronization, and online storage.

NOTE: This post is just a survey of available tools, rather than a review.  I’ve used some, but not all, of the tools listed.

Backing up your files and folders should be just a part of your overall backup strategy, but not the entire strategy.  A complete approach would likely include some use of full system backups (imaging), as well as synchronization and online backups.

The tools that I mentioned last time are good for:

  • Automating your backups
  • Getting your files backed up to another PC or network device
  • Backing files up efficiently, by doing a combination of full/incremental backups
  • Creating “snapshots” of files at a specific point in time

What these traditional tools are not necessarily as good at doing is:

  • Getting your files backed up to an off-site location
  • Sharing files/folders with other devices
  • Allowing you to browse files in original directory structure
  • Backing up your Windows system files
  • Backing up and restoring an entire PC

The tools in these other categories (imaging, synchronization, and online backup) address some of the shortcomings of traditional file/folder backup tools.

Drive Imaging Tools

In addition to periodically backing up your data files, you should consider doing a full disk backup, or image backup.  Traditional file/folder backup tools typically don’t support backing up an entire disk or partition.

For drive imaging software, I took a brief look at the following products:

These products are all very similar, but there are a few differences.  My list of available features is based on the documentation on each product’s web site.

(Image: Drive Imaging Tools comparison chart)

Synchronization Tools

The goal of synchronization tools isn’t to create a backup of a directory, but to create a copy of that directory on other devices.  Typically, one PC shares one or more directories, making them visible to the tool or service.  Other devices subscribe to the shared folder and then replicate the contents locally.

What makes synchronization tools so powerful is their ability to do continuous/live updates.  When someone changes a file in a shared folder, that change is replicated across all of the subscribing PCs immediately.

This gives us the benefits of both shared network drives and remote backups—users on other machines have access to the data at all times and can edit it from their machine.  And the data is also backed up, in that it’s stored in multiple locations.
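Stripped of vendor specifics, the core of such a tool is a change-detection pass.  Here is a toy single-direction version (real tools use filesystem change notifications rather than polling, and have to handle conflicts; the function name is mine):

```python
import os
import shutil

def sync_pass(src, dst):
    """One polling pass of a naive one-way sync: copy any file
    that is new at src, or newer at src than at dst."""
    changed = []
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isfile(s) and (
                not os.path.exists(d)
                or os.path.getmtime(s) > os.path.getmtime(d)):
            shutil.copy2(s, d)  # copy2 preserves the mtime
            changed.append(name)
    return changed

# A continuous-update tool effectively repeats this in both directions
# whenever the filesystem reports a change:
#     while True: sync_pass(a, b); sync_pass(b, a)
```

Because copy2 preserves modification times, a second pass over unchanged directories copies nothing, which is what makes continuous syncing cheap.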

Desirable features to look for in file synchronization tools include things like:

  • Continuous Updates:  no need to synch manually
  • Multiple Subscribers:  synchronize across multiple devices
  • 2-Way Synchronization:  users can change files in any location
  • Share Across HTTP:  PCs don’t need to be on LAN, but can share via Internet
  • Encryption:  data transferred via HTTP in secure manner
  • Backup to Cloud:  store copy of synched files online

The chart below includes the following synchronization tools and a list of features:

(Image: Synchronization Tools comparison chart)

Traditional synchronization tools worked only with devices that were directly networked on a LAN.  But modern synchronization tools are more commonly delivered as web-based services that synchronize machines via HTTP.  A PC shares a folder to the service, causing the files to get replicated in “the cloud”.  And then other devices can in turn sync to the same folder, allowing the files to get downloaded to the subscribing device.

This “cloud” approach allows doing online backups in addition to synchronizing files across devices.  This is a nice blending of traditional synchronization tools with online backup tools.

Microsoft’s new LiveMesh platform offers maybe the best combination of features spanning both synchronization and online backup.  For each folder added to the mesh, the user can choose exactly which devices to synch the contents to—including both physical devices in the mesh, as well as the online storage area.  This allows doing peer-to-peer synchronization for some data, and online backup for other data.

There are many more network-only synchronization tools available than I list in this chart.  Given the power of the newer tools that also provide online backup, these older tools are becoming less popular.

Online Backup Tools / Services

There are also services that offer pure online backup of data, rather than both synchronization and backup.  The chart below lists some of the more common ones, including:

(Image: Online Backup Services comparison chart)

With high-speed Internet access now widely available, it’s clear that online backup, rather than network-only backup, is the preferred choice for most people.  And with storage prices continuing to drop, these services are becoming affordable, even for storing huge amounts of data, like photos & videos.

The future for these products is likely something like the LiveMesh model.  This approach (once LiveMesh provides larger amounts of online storage) is:

  • Continuous online backups
  • Automatic synchronizing of data to multiple devices
  • Ability to do both synchronizing (exact mirrors) and archival (backup at a point in time)

Next Time

At the moment, I’m personally using a combination of LiveMesh and JungleDisk for my backups.  Next time, I’ll describe how I use these tools.

Windows Backup Products, part 1 – File/Folder Backup Tools

Here is a quick summary of the most popular backup tools for Windows.  In general, there are several different flavors/families of backup tools:

  • Traditional file/folder backup tools
  • File/directory synchronization tools
  • Drive imaging tools
  • Online backup tools/services

In this post, I’m focusing on just the first group—traditional tools that let you select a group of files or folders to back up, set up an automated schedule, and then regularly back your files up to a local or network drive.

This list is by no means complete.  I’m focusing here only on tools for Windows and looking only at the more popular commercial tools.  There are, obviously, lots of open source and freeware tools out there, and some of them have feature sets that approach those of the commercial tools.

I looked only at tools targeted at home users, rather than the higher-end server-based backup tools, or tools targeted at the enterprise.

My goal here is to just give people a quick list of some of the tools and do a high-level feature-for-feature comparison.

Here are the tools that I include in the chart:

Several of these products are available in multiple editions, with different pricing and feature sets.  Where this is the case, I’m listing only the “professional” edition, or the one with the most features (and highest price).

Here is the feature list for these backup tools.  My understanding of which features are provided comes from the product documentation or web site.  (Apologies that this is just an image, rather than formatted as a table in HTML):

Backing up individual files or folders is obviously just one prong of a complete backup strategy.  An important part of the strategy is also in determining where to back your files to—a second or external drive, network drive, or FTP server.  Backing up file sets, though not sufficient for a complete backup strategy, is a good place to start.

If you have a favorite full-featured commercial backup tool that I’ve missed, please feel free to share it in the comments section.

Next Time

Next time, I’ll finish the backup tool survey by talking about directory synchronization tools, drive/PC imaging tools and online backup services.

Why You Need a Backup Plan

Everyone has a backup plan.  Whether you have one that you follow carefully or whether you’ve never even thought about backups, you have a plan in place.  Whatever you are doing or not doing constitutes your backup plan.

I would propose that the three most common backup plans that people follow are:

  1. Remain completely ignorant of the need to back up files
  2. Vaguely know that you should back up your PC, but not really understand what this means
  3. Fully realize the dangers of going without backups and do occasional manual backups, but procrastinate coming up with a plan to do it regularly

Plan #1 is most commonly practiced by less technical folk—i.e. your parents, your brother-in-law, or your local pizza place.  These people can hardly be faulted.  The computer has always remembered everything that they’ve told it, so how could it actually lose something?  (Your pizza guy was unpleasantly reminded of this when his browser informed his wife that the “Tomato Sauce Babes” site was one of his favorite sites).  When these people lose something, they become angry and will likely never trust computers again.

Plan #2 is followed by people who used to follow plan #1, but graduated to plan #2 after accidentally deleting an important file and then blindly trying various things they didn’t understand—including emptying their Recycle Bin.  They now understand that bad things can happen.  (You can also qualify for advancement from plan #1 to #2 if you’ve ever done the following—spent hours editing a document, closed it without first saving, and then clicked No when asked “Do you want to save changes to your document?”)  Although this group understands the dangers of losing stuff, they don’t really know what they can do to protect their data.

Plan #3 is what most of us techies have used for many years.  We do occasional full backups of our system and we may even configure a backup tool to do regular automated backups to a network drive.  But we quickly become complacent and forget to check to see if the backups are still getting done.  Or we forget to add newly created directories to our backup configuration.  How many of us are confident that we have regular backups occurring until the day that we need to restore a file and discover nothing but a one line .log file in our backup directory that simply says “directory not found”?

Shame on us.  If we’ve been working in software development or IT for any length of time, bad things definitely have happened to us.  So we should know better.

Here’s a little test.  When you’re working in Microsoft Word, how often do you press Ctrl-S?  Only after you’ve been slaving away for two hours, writing the killer memo?  Or do you save after every paragraph (or sentence)?  Most of us have suffered one of those “holy f**k” moments at some point in our career.  And now we do know better.

How to Lose Your Data

There are lots of different ways to lose data.  Most of us know to “save early and often” when working on a document because we know that we can’t back up what’s not even on the disk.  But when it comes to actual disk crashes (or worse), we become complacent.  This is certainly true for me.  I had a hard disk crash in 1997 and lost some things that were important to me.  For the next few months, I did regular backups like some sort of data protection zealot.  But I haven’t had a true crash since then—and my backup habits have gradually deteriorated, as I slowly regained my confidence in the reliability of my hard drives.

After all, I’ve read that typical hard drives have an MTBF (Mean Time Between Failures) of 1,000,000 hours.  That works out to 114 years, so I should be okay, right?

No.  MTBF numbers for drives don’t mean that your hard drive is guaranteed (or even expected) to run for many years before encountering an error.  Your MTBF number might be 30 years, but if the service life of your drive is only five years, then you can expect failures on your drive to become more frequent after five years.  A 30-year MTBF means that, statistically, if you were running six drives for that five-year period, one of the drives would see a failure by the end of the five years.  In other words, you saw one failure over 30 drive-years, spread across all six drives.  If we were running 30 drives at the same time, we’d expect our first failure on one of those drives within the first year.
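The arithmetic behind those numbers is just total drive-hours divided by MTBF:

```python
HOURS_PER_YEAR = 24 * 365.25   # ~8766 hours in a year

def expected_failures(n_drives, years, mtbf_hours):
    """Expected number of failures across a fleet of drives:
    total drive-hours of operation divided by the MTBF."""
    return n_drives * years * HOURS_PER_YEAR / mtbf_hours

mtbf_30y = 30 * HOURS_PER_YEAR
expected_failures(6, 5, mtbf_30y)     # 1.0 — six drives, five years
expected_failures(30, 1, mtbf_30y)    # 1.0 — thirty drives, one year
round(1_000_000 / HOURS_PER_YEAR)     # 114 — the "114 years" quoted above
```

The key point is that MTBF describes a population of drives, not a promise about yours.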

In point of fact, your drive might fail the first year.  Or the first day.

And hard drive crashes aren’t the only, or even the most common, type of data loss.  A recent PC World story refers to a study saying that over 300,000 laptops are lost each year from major U.S. airports and not reclaimed.  What about power outages?  Applications that crash and corrupt the file that they were working with?  (Excel did this to me once).  Flood/fire/earthquake?  Or just plain stupidity?  (Delete is right next to Rename in the Windows Explorer context menu).

A Good Backup Plan

So we’re back to where we started.  You definitely need a backup plan.  And you need something better than the default plans listed above.

You need a backup plan that:

  • Runs automatically, without your having to remember to do something
  • Runs often enough to protect data that changes frequently
  • Copies things not just off-disk, or off-computer, but off-site
  • Allows restoring lost data in a reasonably straightforward manner
  • Secures your data, as well as backing it up (when appropriate)
  • Allows access to old data even after you’ve intentionally deleted it from your PC
  • Refreshes backed-up data regularly, or stores the data on media that will last a long time

The most important attribute of a good backup plan, by far, is that it is automated.  When I was in college, I used to do weekly backups of my entire PC to a stack of floppies, and then haul the floppies to my parents’ house when I’d visit on Sunday.  But when the last few weeks of the semester rolled around, I was typically so busy with papers and cramming that I didn’t have time to babysit a stack of floppies while doing backups.  So I’d skip doing them for a few weeks—at the same time that I was creating a lot of important new school-related data.

How often should your data get backed up?  The answer: more frequently than the amount of work that you’d be willing to lose and reproduce.  Reentering a day’s worth of data into Quicken isn’t too painful.  But reentering a full month’s worth probably is—so nightly backups make sense if you use Quicken every day.  On the other hand, when I’m working on some important document that I’ve spent hours editing, I typically back the file up several times an hour.  Losing 10-15 minutes’ worth of work is my pain point.

Off-site backups are important, but often overlooked.  The more destructive the type of data loss, the farther away from the original the backup should be, to keep it safe.  For an accidental fat-finger deletion, a copy in a different directory is sufficient.  Hard drive crash?  The file should be on a different drive.  PC hit by a voltage spike?  The file should be on a different machine.  Fire or flood?  You’d better have a copy at another location if you want to be able to restore it.  The exercise is this—imagine all the bad things that might happen to your data and then decide where to put the data to keep it safe.  If you live in San Francisco and you’re planning for the Big One of ’09, then don’t just store your backups at a buddy’s house down the street.  Send the data to a family member in Chicago.

If you do lose data, you ought to be able to quickly: a) find the data that you lost and b) get that data back again.  If you do full backups once a year to some arcane tape format and then do daily incremental backups, also to tape, how long will it take you to find and restore a clean copy of a single corrupted file?  How long will it take you to completely restore an entire drive that went bad?  Pay attention to the format of your backups and the processes and tools needed to get at your archives.  It should be very easy to find and restore something when you need it.

How concerned are you with the idea of someone else gaining access to your data?  When it comes to privacy, all data is not created equal.  You likely wouldn’t care much if someone got a hold of your Mario Kart high scores.  (In fact, some of you are apparently geeky enough to have already published them).  On the other hand, you wouldn’t be too happy if someone got a copy of that text file where you store your credit card numbers and bank passwords.  No matter how much you trust the tool vendor or service that you’re using for backups, you ought to encrypt any data that you wouldn’t want handed out at a local biker bar.  Actually, this data should already be encrypted on your PC anyway—no matter how physically secure you think your PC is.

We might be tempted to think that the ideal backup plan would be to somehow have all of your data continuously replicated on a system located somewhere else.  Whenever you create or change a file, the changes would be instantly replicated on the other system.  Now you have a perfect replica of all your work, at another location, all of the time.  The problem with this approach is that if you delete a file or directory and then later decide that you wanted it back, it’s too late.  The file will have already been deleted from your backup server.  So, while mirroring data is a good strategy in some cases, you should also have a way to take snapshots of your data and then to leave the snapshots untouched.  (Take a look at the Wayback Machine at the Internet Archive for an example of data archival).
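A snapshot, in contrast to a mirror, is just an archive written to a name that is never reused.  A minimal sketch (the naming scheme and function name are mine):

```python
import shutil
import time
from pathlib import Path

def take_snapshot(src_dir, archive_dir):
    """Zip src_dir into a timestamped archive that later deletions in
    src_dir can never touch; returns the path of the new archive."""
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = Path(archive_dir) / f"snapshot-{stamp}"
    return shutil.make_archive(str(base), "zip", src_dir)
```

Because each run creates a new file rather than overwriting the last one, deleting something from the source directory has no effect on any snapshot already taken.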

On the other hand, you don’t want to just archive data off to some medium and then never touch it again, expecting the media to last forever.  If you moved precious family photos off of your hard disk and burned them to CDs, do you expect the data on the CDs to be there forever?  Are you figuring that you’ll pass the stack of CDs on to your kids?  A lot has been written about media longevity, but I’ve read that cheaply burned CDs and DVDs may last no longer than 12-24 months.  You need a plan that re-archives your data periodically, to new media or even new types of media.  And ideally, you are archiving multiple copies of everything to protect against problems with the media itself.

How Important Is This?

The critical question to ask yourself is: how precious is my data to me?  Your answer will guide you in coming up with a backup plan that is as failsafe as you need it to be.  Your most important data deserves to be obsessed over.  You probably have thousands of family photos that exist only digitally.  They should be backed up often, in multiple formats, to multiple locations.  One of the best ways to protect data from loss is to disseminate it as widely as possible.  So maybe in addition to multiple backups, your best bet is to print physical copies of these photos and send boxes of photos to family members in several different states.

The bottom line is that you need a backup plan that you’ve come up with deliberately and one that you are following all of the time.  Your data is too important to trust to chance, or to a plan that depends on your remembering to do backups from time to time.  A deliberate plan, coupled with a healthy amount of paranoia, is the best way to keep your data safe.

Next Time

In my next post, I’ll put together a list of various products and services that can help you with backups.  And I’ll share my own backup plan (imperfect as it is).