A Five-Part Backup Strategy

In the past few posts, I surveyed some common backup tools.  Next, I thought I’d describe my backup strategy in some detail, talking about what I do to keep my data safe.

My own backup strategy is far from perfect.  There may be a few missing pieces, but I think that I’ve protected myself against most of the data loss scenarios that I can think of.  Most of us have a backup strategy based on whatever “holy ****” data loss adventure we’ve suffered in the past.  That’s certainly true for me, so I tend to be pretty pessimistic when it comes to protecting my data.

I have a total of four PCs at home, including three desktops and a laptop.  And I currently have a total of about 3TB of disk space, with about 1.2TB currently in use—a respectable amount of data.

As I said in my original post on why you need a backup plan, the critical question to ask is—how precious is my data to me?  Of this 1.2TB, some of it is very precious to me, like family photos and videos.  And some of it is not at all important, like the 625MB footprint of an Office 2007 installation.  Thinking about how important my data is will drive decisions about how I structure my backups.

The main goal is to figure out how to best protect all of this data.  Questions to ask include:

  • What data needs to be backed up?
  • Where to back the data up?  Different drive?  PC?  Offsite?
  • How often?
  • Should the backup always be a mirror of the original?  Or an archive that captures a moment in time?
  • How long should the backup sets be kept?

Before we even think about backups, it’s worth doing some preventative maintenance on your hard drives.  I’ve had good luck using SpinRite 6, from grc.com, to do surface defect detection.  It’s obviously far better to avoid defects in the first place than to have to deal with a bad drive.

Below are the five pieces of my current backup strategy.  Each serves a slightly different purpose and protects my data in a different way.

  • LiveMesh to Mirror Data Between PCs at Home
  • JungleDisk / Amazon S3 to do Continuous Backups to the “Cloud”
  • Quarterly Archival Backups to External Drive
  • Encrypt / Mirror Sensitive Data on USB Thumb Drives
  • Keep Extra Copies of All Installation Media

LiveMesh to Mirror Data Between PCs at Home

I talked a little bit last time about using LiveMesh to synchronize data between multiple PCs at the same site.  Although LiveMesh provides limited storage space (5GB) in “the cloud”, you can ignore that part and use it to synchronize an unlimited amount of data in a peer-to-peer manner.

This is my first line of defense in protecting my data.  On each of my PCs, I identify the main top-level directories that contain important data and then add those folders to my “mesh”.  Once a folder is visible to LiveMesh, you can synchronize it with any of your other PCs that are also running LiveMesh.  In my case, my important data will be replicated on two of my three main desktop PCs at home.

The two main purposes of using LiveMesh are to provide local copies on multiple PCs and to protect data against hard drive crash or system failure.

Because your folders are synchronized across multiple machines, and because LiveMesh supports two-way synchronization, you can edit/change files locally at whichever machine you happen to be sitting at.  LiveMesh will immediately synch the changes back to the other device.  This is basically just a different way of sharing files on the network, rather than creating a network share.  It’s a little easier for your applications to access a local copy of the file than to have to reach across the network to get it.

LiveMesh also protects against hard drive failure, in that you have a second copy of your data on another machine.  If a hard drive dies, you can swap in a new drive and just let LiveMesh repopulate the missing files from the other mirror.

There is one important thing that LiveMesh does not protect against: unintended deletion or modification of a file.  Because LiveMesh is doing continuous (or very quick) updates to your other devices, a deletion on one PC gets replicated and the file is quickly deleted from all of your devices.

The only real way to protect against this would be for LiveMesh to have full support for versioning.  So far, versioning is not part of the tool.
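To see why a mirror alone can’t save you here, consider this toy Python sketch.  It is not how LiveMesh works internally; the dictionaries and both function names are made up purely to contrast a plain mirror with a store that keeps old versions.

```python
def mirror_sync(source, mirror):
    """Make mirror an exact copy of source, including deletions."""
    mirror.clear()
    mirror.update(source)

def versioned_sync(source, store):
    """Record each file's current content (None if deleted) as a new version."""
    for name in set(source) | set(store):
        current = source.get(name)           # None means the file is gone
        history = store.setdefault(name, [])
        if not history or history[-1] != current:
            history.append(current)

pc = {"photo.jpg": "original"}   # hypothetical folder contents
mirror, store = {}, {}
mirror_sync(pc, mirror)
versioned_sync(pc, store)

del pc["photo.jpg"]              # the accidental deletion
mirror_sync(pc, mirror)
versioned_sync(pc, store)

print(mirror)                    # {}  (the mirror lost the file too)
print(store["photo.jpg"])        # ['original', None]  (the old version survives)
```

The mirror faithfully copies the mistake, while the versioned store still holds the original content.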

JungleDisk / Amazon S3 to do Continuous Backups to the “Cloud”

My next line of defense is to use JungleDisk and Amazon’s S3 (Simple Storage Service) to regularly back up my important data to the Internet (the “cloud”).

JungleDisk, or a similar tool, is required because S3 is just a service that you subscribe to for storing your files—it provides no user interface for accessing the files and no client-side tool for doing the backups.

Amazon S3 charges you only for the data that you use, as follows:

  • $0.15/GB/month for data storage
  • $0.20/GB for data transfer

Because you pay separately for storage vs. transfer, you’ll end up paying a bit more in the first month or two, as you back everything up.  After that, your costs will be mainly for storage, since you’ll be uploading only data that has changed.
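To make the arithmetic concrete, here is a rough Python sketch using the two rates quoted above.  The function name and the 100GB and 2GB figures are invented for illustration, so treat the totals as ballpark numbers and check Amazon’s current price list before relying on them.

```python
STORAGE_RATE = 0.15    # dollars per GB stored per month (rate quoted above)
TRANSFER_RATE = 0.20   # dollars per GB transferred (rate quoted above)

def monthly_cost(stored_gb, uploaded_gb):
    """Estimate one month's bill: storage for everything held, plus transfer for uploads."""
    return stored_gb * STORAGE_RATE + uploaded_gb * TRANSFER_RATE

# Hypothetical example: 100GB uploaded in the first month,
# then about 2GB of changed data per month after that.
first_month = monthly_cost(stored_gb=100, uploaded_gb=100)
later_month = monthly_cost(stored_gb=102, uploaded_gb=2)

print(f"First month: ${first_month:.2f}")   # First month: $35.00
print(f"Later month: ${later_month:.2f}")   # Later month: $15.70
```

In this made-up case the first month costs more than double a typical later month, almost entirely because of the one-time upload.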

The JungleDisk/S3 combination provides a couple of key benefits beyond what LiveMesh gives me:

  • JungleDisk keeps old versions of modified/deleted files on S3  (for 60 days)
  • S3 provides an off-site backup location for your data

Because JungleDisk is configured to keep old versions of your files for 60 days, you’re protected against inadvertent deletion or modification of a file.

Most importantly, because you’re backing your data up to “the cloud”, you’re protected against any catastrophe that might occur at home.  (But make sure to store your Amazon S3 access key and password in a different location!)

It’s worth mentioning that if you intend to archive your old e-mail, it’s a good idea to break your e-mail files into several pieces, based on age.  You might have one smaller file containing just data from the past year and then older files, one per year.  That way, you’re backing up less data on a regular basis, because JungleDisk only backs up the data files that change.
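As a small illustration of the idea, the Python sketch below groups messages into one archive per year.  The message list and the "mail-YEAR" naming are hypothetical; a real mail client has its own archive format and its own way of splitting files.

```python
from collections import defaultdict
from datetime import date

def split_by_year(messages):
    """Group (sent_date, subject) pairs into one list per calendar year."""
    archives = defaultdict(list)
    for sent, subject in messages:
        archives[sent.year].append(subject)
    return dict(archives)

# Hypothetical messages standing in for a real mailbox.
messages = [
    (date(2006, 3, 14), "Old receipt"),
    (date(2007, 11, 2), "Trip photos"),
    (date(2008, 6, 30), "This week's note"),
]
archives = split_by_year(messages)
for year in sorted(archives):
    print(f"mail-{year}: {archives[year]}")
```

New mail only ever lands in the current year’s archive, so the older files stay unchanged and don’t need to be uploaded again.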

Quarterly Archival Backups to External Drive

Both LiveMesh and S3 are focused on creating mirrored copies of my data.  But there is still the danger that I inadvertently delete some data, or the data becomes corrupt, and then that deletion or corruption is propagated to my mirror.

To protect against this, it’s also important to do periodic archival backups of important data and then to store those archives at an offsite location.  This gives you a copy of your data at a particular moment in time that you then keep indefinitely.

In my case, I do archival backups as follows:

  • Quarterly archival backups
  • I use Genie Backup Manager Pro 8.0
  • I back my data up to an external USB Western Digital My Book drive (750GB)
  • I store my WD drive at work (offsite) after I’ve backed up my home data
  • I always have two copies of everything that I back up to the WD drive (two most recent archives)
  • Archives are a superset of what I back up with LiveMesh and S3

This isn’t quite ideal.  I’m not really living up to my goal of keeping my archived data permanently.  I have a rotation scheme where I keep the two most recent quarterly archives, which means that I could lose data if I delete something and then decide six months later that I really needed it.  But I use this rotation scheme because it would be too expensive to keep every archive.
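The rotation itself is simple enough to sketch in a few lines of Python.  This is only an illustration of the keep-the-newest-two rule; the archive labels are invented and this is not something Genie Backup Manager does for me.

```python
def rotate(archives, new_archive, keep=2):
    """Add the new archive and return (kept, discarded), keeping only the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(archives + [new_archive])   # labels like "2008-Q1" sort by date
    return ordered[-keep:], ordered[:-keep]

# Hypothetical labels: two archives on the drive, then a third quarter arrives.
kept, discarded = rotate(["2008-Q1", "2008-Q2"], "2008-Q3")
print(kept)        # ['2008-Q2', '2008-Q3']
print(discarded)   # ['2008-Q1']
```

Once the first-quarter archive is discarded, anything deleted before the second-quarter archive was taken is gone for good, which is exactly the six-month window described above.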

My archives include a superset of the data that I back up with LiveMesh and S3.  In addition to archiving what I back up with those tools, I archive data that rarely changes, like ripped CDs (.mp3 files) and home videos.  This is data that’s important enough to archive, but not worth backing up regularly, given how rarely it changes.

Encrypt / Mirror Sensitive Data on USB Thumb Drives

I also have data that should be encrypted, as well as backed up.  We all probably have a file or two where we keep track of all the really important stuff—online passwords, bank account numbers, personal financial data, etc.  Ideally, we’d write none of it down.  But there is so much important data that we need to keep track of, it’s no longer possible to keep it all in our heads.

My approach for securing this data and for keeping it safe is:

  • Two USB thumb drives, one kept at work, one at home
  • Both USB drives fully encrypted using TrueCrypt
  • USB drives normally unmounted, unless I need to read data from them
  • Unmount as soon as I’m done using the drive
  • To change data on a stick, I make the change on one drive, then bring the drive to the other location and synch up
  • As part of quarterly archive, also archive entire encrypted image

This is handy because I can carry one of the thumb drives with me wherever I go, with no fear of what would happen if I lost it.  The data is completely encrypted, so safe from prying eyes.  Having two drives ensures that the data is being backed up.  Because I don’t entirely trust the flash media, I also keep a copy in my quarterly archives.
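Since I don’t entirely trust flash media, it can be worth confirming that a file copied from one stick to the other arrived intact.  Here is a minimal Python sketch that compares SHA-256 digests of two files.  The file names are placeholders created in a temporary folder, and this is a generic check rather than anything built into TrueCrypt.

```python
import hashlib
import os
import tempfile

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copies_match(path_a, path_b):
    """True if both files have identical contents."""
    return file_digest(path_a) == file_digest(path_b)

# Demo with throwaway files standing in for the copies on the two sticks.
with tempfile.TemporaryDirectory() as folder:
    home = os.path.join(folder, "home.bin")
    work = os.path.join(folder, "work.bin")
    for path in (home, work):
        with open(path, "wb") as f:
            f.write(b"account numbers")
    same = copies_match(home, work)
    with open(work, "ab") as f:
        f.write(b" plus one change")      # one copy drifts out of date
    different = not copies_match(home, work)

print(same, different)   # True True
```

A mismatch means one of the two copies has changed or been damaged since the last synch.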

Keep Extra Copies of All Installation Media

With all of the strategies described above, I’m backing up only data—never actual programs.  I figure that if I have a major crash, I’ll just reinstall the software that I need.

But I do need to make sure that I don’t lose my original media.  If I stored everything in one spot at home and had a fire, my data would be okay, but I’d lose all of the software.

In my case, I just duplicate the original media and then store a copy at work.  I don’t bother duplicating Microsoft software, because I have an MSDN Universal subscription, so I’d be able to re-download anything that I lost.

Summary

That’s my basic backup strategy—a combination of techniques and tools that gives me a fair degree of confidence that I could get data back without too much trouble if I lost it.

Windows Backup Products, part 2 – Imaging, Synchronization, Online

Last time I posted a list of the most popular file/folder backup tools.  This time, I’ll look at Windows backup tools that fall into the categories: drive imaging, file/folder synchronization, and online storage.

NOTE: This post is just a survey of available tools, rather than a review.  I’ve used some, but not all, of the tools listed.

Backing up your files and folders should be just a part of your overall backup strategy, but not the entire strategy.  A complete approach would likely include some use of full system backups (imaging), as well as synchronization and online backups.

The tools that I mentioned last time are good for:

  • Automating your backups
  • Getting your files backed up to another PC, via network device
  • Backing files up efficiently, by doing a combination of full/incremental backups
  • Creating “snapshots” of files at a specific point in time
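The full/incremental idea in the list above boils down to copying only what has changed since the last run.  The Python sketch below shows one simple way to find those files, by comparing modification times.  It is an assumption about the general technique, not a description of how any particular product does it, and the folder and timestamps are made up for the demo.

```python
import os
import tempfile

def changed_since(root, last_backup_time):
    """Yield paths under root modified after last_backup_time (seconds since the epoch)."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.getmtime(path) > last_backup_time:
                yield path

# Demo with a throwaway folder: one old file, one modified after the last backup.
with tempfile.TemporaryDirectory() as root:
    old_file = os.path.join(root, "old.txt")
    new_file = os.path.join(root, "new.txt")
    for path in (old_file, new_file):
        with open(path, "w") as f:
            f.write("data")
    last_backup_time = 1_000_000_000     # pretend the last backup ran at this time
    os.utime(old_file, (last_backup_time - 60, last_backup_time - 60))
    os.utime(new_file, (last_backup_time + 60, last_backup_time + 60))
    changed = [os.path.basename(p) for p in changed_since(root, last_backup_time)]

print(changed)   # ['new.txt']
```

A full backup would copy both files; an incremental run copies only the one that changed.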

What these traditional tools are not necessarily as good at doing is:

  • Getting your files backed up to an off-site location
  • Sharing files/folders with other devices
  • Allowing you to browse files in original directory structure
  • Backing up your Windows system files
  • Backing up and restoring an entire PC

The tools in these other categories (imaging, synchronization, and online backup) address some of the shortcomings of traditional file/folder backup tools.

Drive Imaging Tools

In addition to periodically backing up your data files, you should consider doing a full disk backup, or image backup.  Traditional file/folder backup tools typically don’t support backing up an entire disk or partition.

For drive imaging software, I took a brief look at the following products:

These products are all very similar, but there are a few differences.  My list of available features is based on the documentation on each product’s web site.

[Chart: Drive Imaging Tools]

Synchronization Tools

The goal of synchronization tools isn’t to create a backup of a directory, but to create a copy of that directory on other devices.  Typically, one PC shares one or more directories, making them visible to the tool or service.  Other devices subscribe to the shared folder and then replicate the contents locally.

What makes synchronization tools so powerful is their ability to do continuous/live updates.  When someone changes a file in a shared folder, that change is replicated across all of the subscribing PCs immediately.

This gives us the benefits of both shared network drives and remote backups—users on other machines have access to the data at all times and can edit it from their machine.  And the data is also backed up, in that it’s stored in multiple locations.

Desirable features to look for in file synchronization tools include things like:

  • Continuous Updates:  no need to synch manually
  • Multiple Subscribers:  synchronize across multiple devices
  • 2-Way Synchronization:  users can change files in any location
  • Share Across HTTP:  PCs don’t need to be on LAN, but can share via Internet
  • Encryption:  data transferred via HTTP in secure manner
  • Backup to Cloud:  store copy of synched files online

The chart below includes the following synchronization tools and a list of features:

[Chart: Synchronization Tools]

Traditional synchronization tools worked only with devices that were directly networked on a LAN.  But modern synchronization tools are more commonly delivered as web-based services that synchronize machines via HTTP.  A PC shares a folder to the service, causing the files to get replicated in “the cloud”.  Other devices can then sync to the same folder, which downloads the files to each subscribing device.

This “cloud” approach allows doing online backups in addition to synchronizing files across devices.  This is a nice blending of traditional synchronization tools with online backup tools.

Microsoft’s new LiveMesh platform offers maybe the best combination of features spanning both synchronization and online backup.  For each folder added to the mesh, the user can choose exactly which devices to synch the contents to—including both physical devices in the mesh, as well as the online storage area.  This allows doing peer-to-peer synchronization for some data, and online backup for other data.

There are many more network-only synchronization tools available than I list in this chart.  Given the power of the newer tools that also provide online backup, these older tools are becoming less popular.

Online Backup Tools / Services

There are also services that offer pure online backup of data, rather than both synchronization and backup.  The chart below lists some of the more common ones, including:

[Chart: Online Backup Services]

With easy access to high-speed Internet these days, it’s clear that online backup, rather than network-only backup, is the preferred choice for most people.  And with storage prices continuing to drop, these services are becoming affordable, even for storing huge amounts of data, like photos & videos.

The future for these products is likely something like the LiveMesh model.  This approach (once LiveMesh provides larger amounts of online storage) would offer:

  • Continuous online backups
  • Automatic synchronizing of data to multiple devices
  • Ability to do both synchronizing (exact mirrors) and archival (backup at a point in time)

Next Time

At the moment, I’m personally using a combination of LiveMesh and JungleDisk for my backups.  Next time, I’ll describe how I use these tools.