In the past few posts, I surveyed some common backup tools. Next, I thought I’d describe my backup strategy in some detail, talking about what I do to keep my data safe.
My own backup strategy is far from perfect. There may be a few missing pieces, but I think that I’ve protected myself against most of the data loss scenarios that I can think of. Most of us have a backup strategy based on whatever “holy ****” data loss adventure we’ve suffered in the past. That’s certainly true for me, so I tend to be pretty pessimistic when it comes to protecting my data.
I have four PCs at home: three desktops and a laptop. Together they have about 3TB of disk space, with about 1.2TB currently in use, a respectable amount of data.
As I said in my original post on why you need a backup plan, the critical question to ask is: how precious is my data to me? Of this 1.2TB, some of it is very precious to me, like family photos and videos. And some of it is not at all important, like the 625MB footprint of an Office 2007 installation. Thinking about how important each kind of data is will drive decisions about how I structure my backups.
The main goal is to figure out how to best protect all of this data. Questions to ask include:
- What data needs to be backed up?
- Where to back the data up? Different drive? PC? Offsite?
- How often?
- Should the backup always be a mirror of the original? Or should it be an archive that captures a moment in time?
- How long should the backup sets be kept?
Before we even think about backups, it’s worth doing some preventative maintenance on your hard drives. I’ve had good luck using SpinRite 6, from grc.com, to do surface defect detection. It’s obviously far better to catch defects early than to have to deal with a bad drive.
Below are the five pieces of my current backup strategy. Each serves a slightly different purpose and protects my data in a different way.
- LiveMesh to Mirror Data Between PCs at Home
- JungleDisk / Amazon S3 to do Continuous Backups to the “Cloud”
- Quarterly Archival Backups to External Drive
- Encrypt / Mirror Sensitive Data on USB Thumb Drives
- Keep Extra Copies of All Installation Media
LiveMesh to Mirror Data Between PCs at Home
I talked a little bit last time about using LiveMesh to synchronize data between multiple PCs at the same site. Although LiveMesh provides limited storage space (5GB) in “the cloud”, you can ignore that part and use it to synchronize an unlimited amount of data in a peer-to-peer manner.
This is my first line of defense in protecting my data. On each of my PCs, I identify the main top-level directories that contain important data and then add those folders to my “mesh”. Once a folder is visible to LiveMesh, you can synchronize it with any of your other PCs that are also running LiveMesh. In my case, my important data will be replicated on two of my three main desktop PCs at home.
The two main purposes of using LiveMesh are to provide local copies on multiple PCs and to protect data against hard drive crash or system failure.
Because your folders are synchronized across multiple machines, and because LiveMesh supports two-way synchronization, you can edit/change files locally at whichever machine you happen to be sitting at. LiveMesh will immediately synch the changes back to the other device. This is basically just a different way of sharing files on the network, rather than creating a network share. It’s a little easier for your applications to access a local copy of the file than to have to reach across the network to get it.
LiveMesh also protects against hard drive failure, in that you have a second copy of your data on another machine. If a hard drive dies, you can swap in a new drive and just let LiveMesh repopulate the missing files from the other mirror.
There is one important thing that LiveMesh does not protect against: unintended deletion or modification of a file. Because LiveMesh pushes updates to your other devices continuously (or very quickly), a deletion is replicated too, and the file quickly disappears from all of your devices.
The only real way to protect against this would be for LiveMesh to have full support for versioning. So far, versioning is not part of the tool.
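To make the difference concrete, here is a toy Python sketch of a mirror versus a versioned store. It is only an illustration of the idea, not how LiveMesh or any other tool works internally; the class names, device names, and file path are all made up for the example.

```python
class MirroredFolder:
    """Toy two-way mirror: every change, including a delete, reaches all devices."""

    def __init__(self, device_names):
        # One dict of path -> content per (hypothetical) device.
        self.devices = {name: {} for name in device_names}

    def write(self, path, content):
        for files in self.devices.values():
            files[path] = content

    def delete(self, path):
        # The delete is replicated just like any other change.
        for files in self.devices.values():
            files.pop(path, None)


class VersionedStore:
    """Toy store that keeps earlier versions, so a delete can be undone."""

    def __init__(self):
        self.history = {}  # path -> list of versions; None marks a deletion

    def write(self, path, content):
        self.history.setdefault(path, []).append(content)

    def delete(self, path):
        self.history.setdefault(path, []).append(None)

    def latest_surviving(self, path):
        # Walk backwards to the most recent version that is not a deletion.
        for content in reversed(self.history.get(path, [])):
            if content is not None:
                return content
        return None


mesh = MirroredFolder(["desktop1", "desktop2"])
store = VersionedStore()

mesh.write("photos/2008.jpg", b"jpeg bytes")
store.write("photos/2008.jpg", b"jpeg bytes")

# Simulate an accidental delete on both systems.
mesh.delete("photos/2008.jpg")
store.delete("photos/2008.jpg")

copies_left = [name for name, files in mesh.devices.items()
               if "photos/2008.jpg" in files]
recovered = store.latest_surviving("photos/2008.jpg")

print(copies_left)  # no mirrored device still holds the file
print(recovered)    # the versioned store can still hand it back
```

The mirror faithfully reproduces the mistake on every device, while the versioned store still has the earlier copy to fall back on.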
JungleDisk / Amazon S3 to do Continuous Backups to the “Cloud”
My next line of defense is to use JungleDisk and Amazon’s S3 (Simple Storage Service) to regularly back up my important data to the Internet (the “cloud”).
JungleDisk, or a similar tool, is required because S3 is just a service that you subscribe to for storing your files—it provides no user interface for accessing the files and no client-side tool for doing the backups.
Amazon S3 charges you only for the data that you use, as follows:
- $0.15/GB/month for data storage
- $0.20/GB for data transfer (charged per GB transferred, not per month)
Because you pay separately for storage vs. transfer, you’ll end up paying a bit more in the first month or two, as you back everything up. After that, your costs will be mainly for storage, since you’ll be uploading only data that has changed.
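As a rough illustration of how that plays out, here is a small Python sketch using the two rates quoted above, with the transfer charge applied to each GB moved during the month. The 100GB backup size and the 2GB of monthly changes are made-up numbers for the example, not my actual usage, and the function name is just for illustration.

```python
STORAGE_RATE = 0.15   # dollars per GB stored per month (rate quoted above)
TRANSFER_RATE = 0.20  # dollars per GB transferred (rate quoted above)


def monthly_cost(stored_gb, transferred_gb):
    """Estimate one month's bill: storage held plus data moved that month."""
    total = stored_gb * STORAGE_RATE + transferred_gb * TRANSFER_RATE
    return round(total, 2)


# Hypothetical example: a 100GB initial upload, then 2GB of changes per month.
first_month = monthly_cost(stored_gb=100, transferred_gb=100)
later_month = monthly_cost(stored_gb=100, transferred_gb=2)

print(f"First month: ${first_month:.2f}")  # First month: $35.00
print(f"Later month: ${later_month:.2f}")  # Later month: $15.40
```

Under these assumptions, the initial upload more than doubles the first bill, and after that the cost is almost entirely storage.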
The JungleDisk/S3 combination provides a couple of key benefits beyond what LiveMesh gives me:
- JungleDisk keeps old versions of modified/deleted files on S3 (for 60 days)
- S3 provides an off-site backup location for your data
Because JungleDisk is configured to keep old versions of your files for 60 days, you’re protected against inadvertent deletion or modification of a file.
Most importantly, because you’re backing your data up to “the cloud”, you’re protected against any catastrophe that might occur at home. (But make sure to store your Amazon S3 access key and password in a different location!)
It’s worth mentioning that if you intend to archive your old e-mail, it’s a good idea to break your e-mail files into several pieces, based on age. You might have one smaller file containing just data from the past year and then older files, one per year. That way, you’re backing up less data on a regular basis, because JungleDisk only backs up the data files that change.
Quarterly Archival Backups to External Drive
Both LiveMesh and S3 are focused on creating mirrored copies of my data. But there is still the danger that I inadvertently delete some data, or the data becomes corrupt, and that the deletion or corruption is propagated to my mirrors before I notice. JungleDisk’s 60-day version history helps only if I catch the problem within that window.
To protect against this, it’s also important to do periodic archival backups of important data and then to store those archives at an offsite location. This gives you a copy of your data at a particular moment in time that you then keep indefinitely.
In my case, I do archival backups as follows:
- Quarterly archival backups
- I use Genie Backup Manager Pro 8.0
- I back my data up to an external USB Western Digital My Book drive (750GB)
- I store my WD drive at work (offsite) after I’ve backed up my home data
- I always have two copies of everything that I back up to the WD drive (two most recent archives)
- Archives are a superset of what I back up with LiveMesh and S3
This isn’t quite ideal. I’m not really living up to my goal of keeping my archived data permanently. I have a rotation scheme where I keep the two most recent quarterly archives, which means that I could lose data if I delete something and then decide six months later that I really needed it. But I use this rotation scheme because it would be too expensive to keep every archive.
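The rotation itself is simple enough to sketch in a few lines of Python. This is only an illustration of the keep-the-two-most-recent rule described above; the function name and the archive dates are hypothetical, and it has nothing to do with how Genie Backup Manager manages its own backup sets.

```python
from datetime import date


def archives_to_prune(archive_dates, keep=2):
    """Return the archive dates that fall outside the `keep` most recent ones."""
    newest_first = sorted(archive_dates, reverse=True)
    # Everything after the first `keep` entries is old enough to be overwritten.
    return sorted(newest_first[keep:])


# Hypothetical quarterly archive dates.
quarterly = [
    date(2008, 1, 5),
    date(2008, 4, 5),
    date(2008, 7, 5),
    date(2008, 10, 5),
]

to_delete = archives_to_prune(quarterly)
oldest_kept = min(set(quarterly) - set(to_delete))

print(to_delete)    # the two oldest archives
print(oldest_kept)  # anything deleted before this date is gone for good
```

The `oldest_kept` value makes the weakness visible: a file I deleted before that archive was taken no longer exists in any archive.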
My archives include a superset of the data that I back up with LiveMesh and S3. In addition to archiving what I back up with those tools, I archive data that rarely changes, like ripped CDs (.mp3 files) and home videos. This is data that’s important enough to archive, but not worth backing up regularly, given that it almost never changes.
Encrypt / Mirror Sensitive Data on USB Thumb Drives
I also have data that should be encrypted, as well as backed up. We all probably have a file or two where we keep track of all the really important stuff—online passwords, bank account numbers, personal financial data, etc. Ideally, we’d write none of it down. But there is so much important data that we need to keep track of, it’s no longer possible to keep it all in our heads.
My approach for securing this data and for keeping it safe is:
- Two USB thumb drives, one kept at work, one at home
- Both USB drives fully encrypted using TrueCrypt
- USB drives normally unmounted, unless I need to read data from them
- Unmount as soon as I’m done using the drive
- To change data on a stick, I make the change on one drive, then bring the drive to the other location and synch up
- As part of quarterly archive, also archive entire encrypted image
This is handy because I can carry one of the thumb drives with me wherever I go, with no fear of what would happen if I lost it. The data is completely encrypted, so it’s safe from prying eyes. Having two drives means that the data always exists in two places. Because I don’t entirely trust the flash media, I also keep a copy in my quarterly archives.
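One way to confirm that the two drives really are in sync after a manual update is to compare checksums of the same file as it appears on each mounted drive. The Python sketch below is a minimal example of that idea using only the standard library. It is not part of TrueCrypt, the file names are stand-ins, and it should be run against the files inside the mounted volumes rather than the raw encrypted drives, since two separately encrypted drives will not match byte for byte even when their contents do.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(path_a, path_b):
    """True when both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


# Demo with stand-in files; in practice these would be the same file
# on the home drive and on the work drive.
with tempfile.TemporaryDirectory() as workdir:
    home_copy = os.path.join(workdir, "home_passwords.txt")
    work_copy = os.path.join(workdir, "work_passwords.txt")
    for path in (home_copy, work_copy):
        with open(path, "wb") as handle:
            handle.write(b"example contents")

    in_sync = copies_match(home_copy, work_copy)

    # Change one copy to simulate an edit that has not been synched yet.
    with open(work_copy, "ab") as handle:
        handle.write(b" plus one new line")

    out_of_sync = not copies_match(home_copy, work_copy)

print(in_sync)      # True
print(out_of_sync)  # True
```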
Keep Extra Copies of All Installation Media
With all of the strategies described above, I’m backing up only data—never actual programs. I figure that if I have a major crash, I’ll just reinstall the software that I need.
But I do need to make sure that I don’t lose my original media. If I stored everything in one spot at home and had a fire, my data would be okay, but I’d lose all of the software.
In my case, I just duplicate the original media and then store a copy at work. I don’t bother duplicating Microsoft software, because I have an MSDN Universal subscription, so I’d be able to re-download anything that I lost.
Summary
That’s my basic backup strategy—a combination of techniques and tools that gives me a fair degree of confidence that I could get data back without too much trouble if I lost it.