Jeff Atwood proclaimed Dec. 14 as “International Backup Awareness Day” after losing his blog. He later managed to recover it, partially through a miraculous backup on the computer of a random student from Bologna.
Most of us don’t pay attention to desktop backup. We burn our data onto a bunch of DVDs or USB flash drives and hope they will last. Sometimes we don’t even do that. This is probably because we have neither a cheap remote-storage solution nor an easy-to-use backup utility.
Consumer backup offerings were poor up until several years ago, but in the age of cloud storage, they have become very much an affordable commodity. I’d like to review several existing backup solutions, and some future trends to watch for.
Network Folder Synchronization

Network folder solutions take a folder on your computer and synchronize it with a remote storage facility. Thus, anything that goes into the folder can be considered backed up as soon as you go online.
The upside is that you know exactly what’s backed up and what’s not since the setup is so simple. Also, many of these services offer a file hosting service as a way to share some files with others.
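Under the hood, a service like this boils down to comparing a manifest of your local folder against what the remote already has, and uploading the difference. Here is a minimal sketch of that idea; the function names and the choice of SHA-256 hashing are mine, not any vendor’s:

```python
import hashlib
from pathlib import Path

def manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to a SHA-256 digest of its contents."""
    return {
        str(p.relative_to(folder)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }

def changed_files(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """Files that are new or modified locally and still need uploading."""
    return [path for path, digest in local.items() if remote.get(path) != digest]
```

As soon as you go online, the client only needs to push the paths returned by `changed_files`, which is why a freshly dropped file counts as backed up almost immediately.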
The downside is that you might have to change the way you work: if you currently want to back up only part of a folder (e.g. the folder has some big subdirectories that aren’t important), you have to start reorganizing.
For me, on my Linux machine, I would like to back up most of my “home” directory, but not all of it, so this is not the best solution.
A good vendor to start with is Dropbox, which offers the first 2 gigabytes for free, and then 50GB for $10 monthly. (Sorry, $9.99.)
Background Backup

A background backup solution is a piece of software (a daemon) that runs in the background and constantly sends new local files to remote storage.
The upside is that you don’t have to change anything on your computer. Also, these services usually do not limit you by storage capacity.
The downside is that you have to go through a setup process where you define what to back up and what not to back up. This can lead to situations where you forget that a folder is not backed up. In addition, these services usually limit you to backing up a single machine, so you would need several subscriptions to back up several machines.
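That “forgotten folder” risk comes from the exclude rules you set up once and never look at again. A small sketch of how such rules typically behave (the patterns and the shell-style matching via `fnmatch` are my illustration, not any particular product’s configuration format):

```python
import fnmatch

# Hypothetical exclude patterns a user might configure once and forget about.
EXCLUDES = ["*/node_modules/*", "*.iso", "Downloads/*"]

def is_backed_up(relpath: str) -> bool:
    """Return False if the path matches any exclude pattern."""
    return not any(fnmatch.fnmatch(relpath, pat) for pat in EXCLUDES)

for path in ["Documents/thesis.tex",
             "Downloads/movie.iso",
             "Projects/app/node_modules/x.js"]:
    print(path, "->", "backed up" if is_backed_up(path) else "SKIPPED")
```

Months later, the `Downloads/*` rule is still silently skipping anything you saved there, which is exactly the situation the setup process makes possible.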
A good vendor to start with is Mozy, which charges $5/month for unlimited backup of a single machine.
Hybrid

A hybrid vendor offers a network drive alongside a background backup daemon, so you can get the best of both worlds. An example is JungleDisk, which I am currently using and discuss in the conclusion to this article.
Dispersed / Decentralized Peer-to-peer Storage
So far I have discussed different interfaces for backup. There is another question: where exactly is your data stored? The traditional services discussed above all rely on some sort of centralized cloud storage. There is an alternative, though.
One of the most exciting applications to watch for is dispersed, or decentralized, storage. The concept, successfully implemented by BitTorrent downloads, is that no single machine “owns” the data; rather, it is dispersed among many machines.
In such a solution, all of your files would be encrypted, cut into small pieces and sent out into the wild, where they would be stored on many different machines such that no single machine could reconstruct the original data. Whoever designed this solution must have watched The Godfather for inspiration.
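Real dispersed-storage systems use erasure coding, so that only some subset of the slices is needed to rebuild a file. The simpler property described above, that no single machine can learn anything on its own, can be sketched with plain XOR secret splitting; this toy is my illustration, not any real service’s scheme, and unlike erasure coding it needs every share back:

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split(data: bytes, n: int = 3) -> list[bytes]:
    """Split data into n shares. All n shares are required to reconstruct;
    any n-1 of them are indistinguishable from random noise."""
    shares = [os.urandom(len(data)) for _ in range(n - 1)]
    last = data
    for s in shares:
        last = xor(last, s)  # fold the random pads into the final share
    return shares + [last]

def combine(shares: list[bytes]) -> bytes:
    """XOR all shares together; the random pads cancel out, leaving the data."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = xor(out, s)
    return out
```

Each machine in the wild would hold one share, and a share on its own is just uniform random bytes, which is the Godfather-style “no one knows the whole plan” property.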
And yet, dispersed storage is different from dispersed sharing (or downloading). When you share a torrent file, many others are sharing it with you. A 100MB file might be shared by 10 users, meaning there is actually 1GB of storage available for this file on the web, and this is what allows you to download it quickly and reliably.
In the dispersed storage scenario, no one has a copy of your files on their computer. They have slices, which means they have to make special room for them. If you’re storing 50GB, then 50GB has to be available out there for you. Are you willing to donate 50GB of your disk space in order to store 50GB remotely? Even if you are, that would mean your data has only one copy on the internet, which is far from a good backup. You need several copies of your data online so that you can correct errors and avoid a single point of failure.
A “good” storage scheme would probably need a redundancy of at least 1:3, if not 1:10. This means that for every 1GB you store in the cloud, you would have to donate 3-10GB of your own storage. Purchasing that extra storage is probably not worth it, since hosted backup solutions are cheaper, and their prices keep falling as storage costs drop.
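The arithmetic behind that claim is straightforward. The $0.15/GB-month figure below is an assumption on my part (it matches the cloud-storage price quoted later in this post), not a quote from any dispersed-storage vendor:

```python
def donated_gb(stored_gb: float, redundancy: float) -> float:
    """Local disk space you must contribute when every byte you store
    remotely is replicated `redundancy` times across the network."""
    return stored_gb * redundancy

def hosted_cost(stored_gb: float, per_gb_month: float = 0.15) -> float:
    """Monthly price of simply renting the same space from a cloud
    provider, at an assumed $0.15 per GB-month."""
    return stored_gb * per_gb_month

print(donated_gb(50, 3))    # 150.0 GB of local disk for 50 GB backed up
print(donated_gb(50, 10))   # 500.0 GB at 1:10 redundancy
print(hosted_cost(50))      # 7.5 dollars/month to just rent the space instead
```

Tying up 150-500GB of your own disk (plus your bandwidth and uptime) to back up 50GB compares poorly with a few dollars a month of hosted storage.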
Still, there are already some services offering dispersed backup. The most prominent, targeted specifically at consumers, is Wuala, which allows you to either purchase storage, or trade your own storage for remote storage.
Do It Yourself (With Friends)
An interesting option for communities, friend networks and dispersed businesses is to use a local distributed storage system. These solutions are usually free software, the most prominent being CleverSafe and Tahoe-LAFS. The concept is basically the same as that of dispersed/decentralized storage, except that you can all share the costs of setting up dedicated machines for the backup. That way, you reduce the dependency on everyone’s local storage. Also, since this is your company, or a bunch of family and friends, you don’t need as much data redundancy, since you know these people will be there tomorrow. (That doesn’t mean you should place blind trust in their hard drives, though.)
Edit: On readers’ advice, I’d also like to mention CrashPlan. It is a versatile utility that allows local, p2p and centralized backup, and it only charges for the centralized storage.
Conclusion

As mentioned earlier, I’m currently using JungleDisk. It’s not a storage service per se, but rather a client that backs up your data to a third-party storage cloud (Amazon’s S3 or Rackspace). The service itself costs $3/month, plus an extra charge proportional to the amount of cloud storage you use (around $0.15 per GB). They offer a multi-machine backup daemon and a network drive.
I used to be especially happy with JungleDisk: they used a respectable storage facility (Amazon S3), had really responsive support and a Linux client, and their CEO was the designer of Rocket Arena, a famous Quake mod. Recently, however, they have made some changes I don’t like at all. In addition, they would never go open source, since it would knock them out of business. I am now considering a move to SpiderOak, which is not open source either, but is leaning towards it. SpiderOak’s pricing is reasonable, and I’m approaching the point at which switching would become an economical decision as well.
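A quick back-of-the-envelope check of that economical point, using only the prices quoted in this post (JungleDisk at $3/month plus ~$0.15/GB, Mozy at a flat $5/month); the function names are mine:

```python
def jungledisk_monthly(gb: float) -> float:
    """JungleDisk cost at the prices quoted above:
    $3/month service fee plus roughly $0.15 per GB of cloud storage."""
    return 3.00 + 0.15 * gb

MOZY_FLAT = 5.00  # Mozy's flat unlimited-backup price, per this post

# Break-even point: below this capacity, metered pricing is the cheaper deal.
breakeven_gb = (MOZY_FLAT - 3.00) / 0.15
print(round(breakeven_gb, 1))   # 13.3 GB
print(jungledisk_monthly(10))   # 4.5 dollars/month for a 10GB backup
```

In other words, metered pricing wins for small backups, and a flat-rate service wins once your data grows past roughly 13GB, which is why growing storage needs can turn switching into an economical decision.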
Edit: the following points about Tarsnap come from its author:

* Tarsnap was designed to be secure against even the most skilled attackers — and was written by someone (myself) with non-trivial expertise in cryptography and computer security.
* Because Tarsnap is built around tar(1), it is heavily scriptable; for experienced users this makes it far more flexible than any other tool.
* Tarsnap is AFAIK the only backup system which works as a metered service — pricing per byte of bandwidth and per byte-month of storage used, starting at a (very small) fraction of a cent per month. Where other services have fixed monthly fees, Tarsnap just looks at your usage and charges you accordingly.
* I’m not sure if Tarsnap’s snapshotting model is unique, but it’s certainly unusual; and once Tarsnap users follow my advice of “forget everything you know about incremental backups”, they all tell me that it’s far more intuitive than other approaches.
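The snapshotting idea behind that last point can be illustrated with a toy content-addressed store. To be clear, this is my sketch of the general deduplication technique, not Tarsnap’s actual on-disk format: blocks are keyed by their hash, so a snapshot is just a list of keys, and data that hasn’t changed between snapshots is stored only once.

```python
import hashlib

class SnapshotStore:
    """Toy content-addressed store: data blocks are keyed by their SHA-256
    digest, so a block shared by several snapshots is stored exactly once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}              # digest -> content
        self.snapshots: dict[str, dict[str, str]] = {}  # name -> {file: digest}

    def snapshot(self, name: str, files: dict[str, bytes]) -> None:
        index = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            self.blocks.setdefault(digest, content)  # dedup: store once
            index[path] = digest
        self.snapshots[name] = index

store = SnapshotStore()
store.snapshot("monday",  {"a.txt": b"hello", "b.txt": b"world"})
store.snapshot("tuesday", {"a.txt": b"hello", "b.txt": b"world!"})
print(len(store.blocks))  # 3, not 4: the unchanged "hello" block is shared
```

Every snapshot looks like a full backup to the user, yet the incremental cost is only the blocks that actually changed, which is what makes the model feel more intuitive than classic full-plus-incremental chains.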
- Comparison of all online backup solutions (wikipedia.org)
- International Backup Awareness Day (codinghorror.com)
- When the Clouds break; Risks in the Public Cloud (brilliantthinking.net)