Archive for November, 2011

Drop the box and start backing up!

Tuesday, November 22nd, 2011

Okay first let me say this: I love Dropbox and like many of you depend on it each day to seamlessly access my important files from office/home/shared computers and from my cell phone. Also ever since Dropbox released the developer APIs, an increasing number of innovative applications (see here and here for a few examples) are coming to the fore that extend Dropbox beyond its “native” features of syncing, sharing and collaboration.

This is great but creates a potential problem. With all this excitement it is easy to get carried away and think of using Dropbox to solve a problem which it was never designed to solve – a robust cloud backup. Even at a conceptual level, classic data backup technology based tools such as Zmanda Cloud Backup (ZCB) and sync-sharing tools such as Dropbox solve very different needs of businesses. To most of backup administrators it would seem outlandish to even suggest that one can be used in place of the other (a silicon valley based system administrator, I tossed this idea to, frowned upon it and found the comparison so illogical that he spent a few seconds thinking about where to begin his explanation from!).

But yet over the last few months, since the same time since Dropbox started gaining mass acceptance, we’ve been seeing this confusion pop up in the heads of some of our prospective users. Thanks to the (well-deserved) widespread attention which Dropbox has gathered in recent times, such users would begin comparing ZCB with Dropbox for solving their data backup problems. And so far, to clear up the matter, we largely just tried to remind them about the fundamentals of disaster recovery and how Dropbox is an excellent tool to share and synchronize data but a very primitive tool to perform data backups.  I can’t tell how far we’ve succeeded in conveying this, but I know some of them indeed saw our point (they became our customers!).

But this post became unavoidable, since the plot seems to have thickened with the recent introduction of Dropbox for teams. With this latest offering, Dropbox now consciously targets businesses by offering them huge shared storage (1 TB) along with some administrative tools to manage the service. Not a bad idea really. The problem however is that to sell SMBs this much storage, Dropbox now seems to be telling them to use this storage for data backups, something it never claimed to do well so far.

So let’s scratch the surface a bit here to see what Dropbox is and what it can or can’t backup.

At the outset, let’s try to see what problems Dropbox has been designed to solve and how data backup was not one of those problems. This is how the Wikipedia defines Dropbox:

Dropbox is a Web-based file hosting service operated by Dropbox, Inc. that uses cloud computing to enable users to store and share files and folders with others across the Internet using file synchronization.

This is what it really is. You give Dropbox some files which you want to share and it laps them up, stores them on its cloud storage and shares them among multiple Dropbox clients:

Dropbox at work


And when any of your files change from any shared machine, the changes are instantaneously replicated across all the shared devices. So what’s the secret sauce? Well the steadfast decision process to keep things simple for syncing and sharing the user files. See such an instance of decision making on this page.

On the other hand, a true backup solution, such as ZCB, exists to ensure that all your data gets backed up regularly and you can go back to any of the backed-up states of your machine when the sky comes crumbling down. This may sound similar, so let’s see why this goal is not achievable with Dropbox:

  1. Completeness: At a higher level, the data in your computer can be classified in following categories:
    1. User files: These are independent files like documents, presentations, spreadsheets which are created by users for their official or personal work.
    2. File system/Interlinked files: These can be your entire directory structure such as D:\, a particular special directory such as “My Documents” or a set of some files which are inter-linked – for e.g. a bunch of website files or a spreadsheet with embedded images or macros.
    3. Application data: The data created and used by your business applications such as SQL Server or Outlook. These can be databases, configuration files, temporary files, etc. and are generally created in the installation directory of these applications. Also these files are “open” when the application is running.
    4. Applications: Binaries and configuration files of applications which are installed such as Microsoft office and Adobe PDF suite.
    5. Operating system and system configuration: The installed operating system, its configuration (“System State” in Windows) and other system information such as partition table, etc.

    Now looking at the above, it is obvious that Dropbox can only be considered for data in the first and second categories. And even in second category, some special folders (e.g. C:\Program Files) can’t even be put inside your Dropbox folder. And those which can be, you are likely to have problems during restores. With many interlinked files, how are you going to find a logically consistent set of interlinked files as it existed at a particular historic point in time?

    A true backup solution such as ZCB, on the other hand, backs up almost all the above categories of data (ZCB backups Windows system state though not the operating system and boot loader/partitioning information), and the backup archives represent logical and consistent states at particular points in time.

  2. Modification/Deletion of original copy of data: A true backup solution never modifies the original copy of data, let alone deleting it. In fact even changing a file’s meta-data (archival bit, modification time etc) has been considered unacceptable by many backup administrators, since that may interfere with some other installed applications.

    But since the primary goal of Dropbox is to “synchronize” data across multiple machines, it will do all which is necessary to accomplish this goal. So if a file gets accidently deleted or corrupted on one system, Dropbox will gleefully and promptly propagate that accident to all the shared machines.This is obviously a serious problem and hence in its paid versions, Dropbox offers an “unlimited undo history” feature to allow you to undelete files. Though this surely helps, but from Disaster Recovery standpoint it still is a risky situation, since this would mean that you have lost all your local copies and now have only one remaining copy of your original data. What’s worse - it is only available on the cloud, so if you need it when you have no or poor internet connectivity, you are out of luck.

    On the other hand, a true cloud backup solution such as ZCB supports smart redundancy options where you can keep backup data on local as well as cloud storage. Since you will have 3 copies of your data (original + 2 separate copies), even if you accidently delete your original copy of file you still have two redundant copies to restore from.

  3. Security: The tricky thing about security is that it’s like insurance – you may not care for it in steady state but it can be catastrophic when something goes wrong. And security has been the number one reason why Dropbox is still unwelcome in many enterprises today. Some issues:
    1. True data privacy: Dropbox encrypts your data on the Amazon S3 cloud using an encryption key which is unique to your Dropbox account. Also note that this encryption key is known to Dropbox. This means two things. First, your data is not truly private as Dropbox personnel can potentially see your data (Of course, we believe that this is unlikely). Second, it means you can’t have any data privacy between two of your users sharing the same Dropbox account.

      The only way out here would be to use a separate file/volume level encryption tool on top of Dropbox (such as TrueCrypt). But in addition to burdening your users with new workflows related to encryption/decryption, this would most probably also make the Dropbox synchronizations inefficient, thus defeating the whole purpose of using Dropbox in first place. I recommend checking out the experiences of the commenters on this blog for the gory details of such problems if you are indeed thinking of going down this path.

      In comparison, a true backup solution like ZCB offers asymmetric encryption with the user generated certificates, making it virtually impossible for anyone else to see your encrypted data.

    2. The disadvantage of being a public “data sharing” service: Dropbox was designed to support data exchanges among multiple devices and multiple users over the internet. You can imagine that such a service needs to have somewhat relaxed rules when it comes to authentication, access rules, open ports, etc. Dropbox has already had its share of such issues – see this page and this page for examples.

    Again, in contrast, a true backup application such as ZCB has much more tighter security mechanisms. It can securely encrypt your data with user-generated keys as soon it is backed up, can send the data over a SSL tunnel to the cloud which is protected by multiple layers of authentication for gaining access. This ensures that your backup data is safe and secure; irrespective of its location - on local disk or cloud.

  4. Flexibility in choosing data retention policy: Choosing retention policy is a very important decision variable for your Disaster Recovery plan as it decides the oldest historic time you can restore to and has direct implication on your storage costs.But since Dropbox has the “unlimited undo history” feature, why should one even worry about this? My doubts about the long term sustainability of a truly “unlimited” deleted file history notwithstanding, there are at least two reasons why data retention policy still is an issue with Dropbox:
    1. There is no automatic management of your storage quota – so you need to manually delete the older files manually to free up space for newer data. With multiple users working on your shared data, won’t it be challenging to identify what data is too old and delete it manually? Until of course you buy a storage quota which is multiple times of your actual storage requirement, so you never have to delete anything!
    2. In addition, many organizations need to abide with the data storage laws which stipulate which geographical location to store data and even the maximum time customer data can be retained by a business. You don’t have any such control with Dropbox.
  5. Scheduling uploads for making them efficient and unobtrusive: One key issue for many businesses while considering cloud backup is the lack of adequate internet bandwidth. During normal business hours there is only so much bandwidth which you can devote for data backups. This is why many administrators like to schedule the backup uploads to run during the idle times such as weekends.

    Telling Dropbox when to sync is not possible, and even if it is made possible, it surely defeats the whole purpose of using such a sync tool. Yet another problem (feature!) with Dropbox is that it immediately syncs every change of your data. So if you make frequent changes to your files during the day, each of them will be synced across all your devices thus wasting your bandwidth, even though you may have just wanted to make a copy of your file at the end of the day. Again, for syncing and sharing this “churn” is the necessity and one of the core benefits of Dropbox but for backups, it is nothing but “noise” which is wasteful and disruptive for your normal business network traffic.

As you can see, the above list is by no means an exhaustive one. As you go deeper into this, more such differences pop up. But the question is – is that surprising? Given that Dropbox was conceived, designed and implemented to solve the need of syncing and sharing and not robust cloud backup, isn’t trying to do the latter is more of a “hack” than a true solution?

And did I mention that we have a webinar coming up on Dec 7th, 2011 in which we will be discussing how to get the maximum out of your ZCB installation and will also be taking some of the above issues for discussion? Please register for this webinar here. Hope to see you then!