Optimizing the cost of your cloud backup

January 5th, 2012

A well-known challenge of new technologies such as cloud backup is that there are no set standards. Take pricing: what would you expect to pay for storing 10 GB of your data in the cloud today? Given that the answer can be anything from zero to a few hundred dollars, how do you know that you are not paying more than you should for your requirements? The question worth asking is this: since businesses are different and have different backup needs, why shouldn’t they be allowed to control how much they pay for cloud backup?

We broached this question in our recent Zmanda Cloud Backup (ZCB) webinar titled “How to get the maximum out of ZCB” (recording available here) and looked at ways to optimize ZCB costs for one’s requirements. While exploring different options, we realized something interesting – ZCB’s flexibility not only makes it very versatile, but when combined with its pay-as-you-go pricing model, it also allows great leeway in optimizing backup costs. In this post, I will try to explore the options available in ZCB to do just that.

Before we begin, allow me to clarify – while the bulk of this post focuses on cost optimization options with ZCB, the intent is to provide a systematic way of thinking about cloud backup costs. If you are a ZCB user, you can use these options directly. If you are not, you can map some of these options to your backup solution and see how much better (or worse) it fares (and for the benefit of all of us, please do remember to post your results in a comment below!).

First, a look at the ZCB pricing model

ZCB’s pricing model has two components:

  • Fixed monthly license fee: $4.95 per month
  • Usage based fee:
    • Storage: $0.15 per GB-month
    • Upload to cloud: $0.15 per GB
    • Data download from cloud: Free

Admittedly, this does look more complicated than a fixed monthly cost, but its complexity really emerges from its flexibility, which does leave a lot of room for optimizing costs. Let’s see how.
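
To make the arithmetic concrete, here is a minimal Python sketch of the pricing model listed above. The function and constant names are my own, and the sketch deliberately ignores any free tier or promotional waiver, so treat it as a rough estimator rather than a reproduction of an actual bill.

```python
# Rates quoted in the pricing list above (USD).
LICENSE_FEE_PER_MONTH = 4.95
STORAGE_RATE_PER_GB_MONTH = 0.15
UPLOAD_RATE_PER_GB = 0.15
DOWNLOAD_RATE_PER_GB = 0.0  # downloads from the cloud are free


def monthly_cost(stored_gb: float, uploaded_gb: float, downloaded_gb: float = 0.0) -> float:
    """Estimate one month's charge for a single license (free tier not modeled)."""
    usage_fee = (
        stored_gb * STORAGE_RATE_PER_GB_MONTH
        + uploaded_gb * UPLOAD_RATE_PER_GB
        + downloaded_gb * DOWNLOAD_RATE_PER_GB
    )
    return round(LICENSE_FEE_PER_MONTH + usage_fee, 2)


if __name__ == "__main__":
    # 10 GB kept in the cloud and 10 GB uploaded during the month.
    print(monthly_cost(stored_gb=10, uploaded_gb=10))
```

With these rates, 10 GB stored plus 10 GB uploaded comes to $7.95 for the month: $4.95 for the license and $1.50 each for storage and upload.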

Step 1: “Divide and conquer” the monthly license fee!

Got multiple machines to back up? Congratulations! Unlike most other backup services, which charge a fee per machine, ZCB allows a single ZCB license to be used to protect an unlimited number of systems. So if you have, say, 5 or 10 machines to be backed up, the fixed monthly cost per machine becomes insignificant. (However, be aware that machines that share a ZCB license can potentially access each other’s backup data, although the use of encryption can alleviate this potential privacy issue.)

Step 2: Optimize the usage based fee!

The usage based fee with ZCB simply means you pay for data storage and data uploads. Thus, optimizing this fee can involve two steps:

Step 2.1: Optimize your total backup size

Let’s first see how much data you really need to back up and how to shrink the space needed to store it. ZCB offers the following options here:

  • Carefully choose what data needs to be backed up: While backing up applications such as Exchange with ZCB, you can select specific datastores instead of all datastores. For file system backups, you entirely control what gets backed up (ZCB does NO automatic selection of *.mp3, *.jpg files etc.) and you can also specify an “exclude list” to skip backing up large user files by mentioning patterns such as *.mp3 or *.mov. This point may look obvious, but doing this is not easily allowed by many cloud backup applications which attempt to maximize your backup data size, for obvious reasons ;).

    Figure 1: Exclude list

  • Use backup levels: Incremental and differential backups contain only the data which changed since an earlier backup and hence reduce backup size. Use incremental and differential backups judiciously to reduce data size while still adhering to your backup strategy.

    Figure 2: Differential backups – backup changed data since last full backup

    Figure 3: Incremental backups – backup changed data since any last backup

  • Choose backup frequency: How often you back up directly impacts your total backup data size. So you need to choose a backup frequency which fits your backup requirements but also keeps your total data size manageable. With ZCB you can choose to do only manual backups (when you want) or choose from powerful scheduling options, ranging from a backup every 15 minutes to a backup on a certain date every year.

    Figure 4: Choose backup frequency

  • Enable compression: Depending on your data type (documents and text files are more compression friendly), enabling compression may help you shrink your storage requirement by about 10-50%.

Here is a figure which summarizes all of these ZCB options:

    Figure 5: Summary of all options to optimize total backup size
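
To get a feel for how much backup levels and compression matter, here is a rough Python sketch. It assumes, purely for illustration, one backup per day, a fixed amount of newly changed data each day, and a flat compression saving; the function names and numbers are hypothetical and real data will behave less neatly.

```python
def backup_sizes(full_gb: float, daily_change_gb: float, days: int, strategy: str) -> list:
    """Return the size in GB of each daily backup run; day 0 is always a full backup."""
    sizes = [full_gb]
    for day in range(1, days):
        if strategy == "full":
            sizes.append(full_gb)
        elif strategy == "differential":
            # Everything changed since the last full backup (taken on day 0).
            sizes.append(min(full_gb, day * daily_change_gb))
        elif strategy == "incremental":
            # Only what changed since the previous backup of any level.
            sizes.append(min(full_gb, daily_change_gb))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return sizes


def total_size(sizes: list, compression_saving: float = 0.0) -> float:
    """Total GB to store after applying a flat compression saving (0.3 means 30% smaller)."""
    return sum(sizes) * (1.0 - compression_saving)


if __name__ == "__main__":
    for strategy in ("full", "differential", "incremental"):
        sizes = backup_sizes(full_gb=100.0, daily_change_gb=2.0, days=7, strategy=strategy)
        print(strategy, total_size(sizes), total_size(sizes, compression_saving=0.3))
```

Under these assumptions, a week of daily backups of a 100 GB data set with 2 GB of daily change adds up to 700 GB with full backups only, 142 GB with differentials and 112 GB with incrementals, before any compression.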

Step 2.2: Optimize how much cloud storage is used to store backup data

Now that you have optimized the total backup data size, let’s see how you can reduce the storage required on the cloud for keeping this backup data. Here are your options with ZCB:

  • Blend cloud storage with local storage: ZCB allows you to store all or some of your backup data on your local or network storage. For example, you may choose to store only certain full backups on cloud storage while using your local/network disk storage for your primary/frequent backups. Below is an example:

    Figure 6: A sample backup strategy to minimize cloud storage (only monthly backups go to cloud, rest all backups go to local/network storage)

  • Judiciously choose the cloud data retention policy: ZCB allows complete control over the retention period for your backup data. So you can adopt as aggressive a retention policy as your backup policy allows, such as “retain full backups for 2 weeks and retain incremental backups for 2 days”.
  • Monitor, monitor and monitor: Monitor your cloud usage regularly and purge old backup runs which you don’t require. For monitoring, you can use your Amazon bills, the ZCB Global Dashboard and the jets3t tool. To purge old backup data, click File > Purge Backup Runs Before and select a date; ZCB will delete all backup runs made before that date. Do note that deleting data which subsequent backup runs depend on (such as deleting full backups while retaining incremental/differential backups) may make the dependent backups useless for any future restore.

    Figure 7: Purging old backup data which is no longer required

  • Exploit the ZCB free tier: ZCB offers 5 GB of free cloud storage and uploads in each of the 5 Amazon S3 regions, so you can use up to 25 GB of cloud storage at no charge by spreading your data across all 5 regions. (With two more AWS regions supported in ZCB 4.1, this will soon become a 35 GB free tier, making this option even more effective!)

    Figure 8: The ZCB free tier
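
The warning above about purging backups that later runs depend on can be checked mechanically before you pick a purge date. The following Python sketch is only an illustration of the idea and says nothing about how ZCB tracks dependencies internally. It assumes the definitions from Figures 2 and 3 (a differential depends on the last full backup, an incremental on the previous backup of any level), and the `BackupRun` type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupRun:
    day: date
    level: str  # "full", "differential" or "incremental"


def direct_dependencies(runs):
    """Map each run index to the index of the run it directly depends on (or None)."""
    deps = {}
    last_full = None
    for i, run in enumerate(runs):
        if run.level == "full":
            deps[i] = None
            last_full = i
        elif run.level == "differential":
            deps[i] = last_full
        else:  # incremental: depends on the previous backup of any level
            deps[i] = i - 1 if i > 0 else None
    return deps


def safe_to_purge(runs, cutoff):
    """Return runs older than `cutoff` that no retained run needs for a restore."""
    runs = sorted(runs, key=lambda r: r.day)
    deps = direct_dependencies(runs)
    required = set()
    for i, run in enumerate(runs):
        if run.day >= cutoff:
            # Walk the dependency chain of every run we intend to keep.
            j = deps[i]
            while j is not None and j not in required:
                required.add(j)
                j = deps[j]
    return [r for i, r in enumerate(runs) if r.day < cutoff and i not in required]


if __name__ == "__main__":
    history = [
        BackupRun(date(2012, 1, 1), "full"),
        BackupRun(date(2012, 1, 2), "incremental"),
        BackupRun(date(2012, 1, 3), "incremental"),
        BackupRun(date(2012, 1, 8), "full"),
        BackupRun(date(2012, 1, 9), "differential"),
        BackupRun(date(2012, 1, 10), "incremental"),
    ]
    for run in safe_to_purge(history, cutoff=date(2012, 1, 9)):
        print(run.day, run.level)
```

In this example, purging everything before January 9th would also remove the January 8th full backup that the two retained runs rely on, so the sketch reports only the first three runs as safe to delete.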

Quite a handful of ways to optimize costs, aren’t there? And perhaps the best part is that since ZCB and its pricing model are both super-flexible, the above is not even an exhaustive list!

Are you a ZCB user? If yes, do consider these steps and let us know if/how they worked for you. And if you are not a ZCB user, I’m very curious to know how you are optimizing your costs with your current solution.

Have a “cost effective” new year!

-Nik

Drop the box and start backing up!

November 22nd, 2011

Okay first let me say this: I love Dropbox and like many of you depend on it each day to seamlessly access my important files from office/home/shared computers and from my cell phone. Also ever since Dropbox released the developer APIs, an increasing number of innovative applications (see here and here for a few examples) are coming to the fore that extend Dropbox beyond its “native” features of syncing, sharing and collaboration.

This is great but creates a potential problem. With all this excitement it is easy to get carried away and think of using Dropbox to solve a problem which it was never designed to solve – robust cloud backup. Even at a conceptual level, classic data backup tools such as Zmanda Cloud Backup (ZCB) and sync-and-share tools such as Dropbox solve very different needs of businesses. To most backup administrators it would seem outlandish to even suggest that one can be used in place of the other (a Silicon Valley based system administrator to whom I tossed this idea frowned upon it and found the comparison so illogical that he spent a few seconds thinking about where to begin his explanation!).

Yet over the last few months, ever since Dropbox started gaining mass acceptance, we’ve been seeing this confusion pop up in the heads of some of our prospective users. Thanks to the (well-deserved) widespread attention which Dropbox has gathered in recent times, such users begin comparing ZCB with Dropbox for solving their data backup problems. So far, to clear up the matter, we have largely just reminded them about the fundamentals of disaster recovery and how Dropbox is an excellent tool to share and synchronize data but a very primitive tool to perform data backups. I can’t tell how far we’ve succeeded in conveying this, but I know some of them indeed saw our point (they became our customers!).

But this post became unavoidable, since the plot seems to have thickened with the recent introduction of Dropbox for teams. With this latest offering, Dropbox now consciously targets businesses by offering them huge shared storage (1 TB) along with some administrative tools to manage the service. Not a bad idea really. The problem however is that to sell SMBs this much storage, Dropbox now seems to be telling them to use this storage for data backups, something it never claimed to do well so far.

So let’s scratch the surface a bit here to see what Dropbox is and what it can or can’t backup.

At the outset, let’s try to see what problems Dropbox has been designed to solve and how data backup was not one of those problems. This is how Wikipedia defines Dropbox:


Dropbox is a Web-based file hosting service operated by Dropbox, Inc. that uses cloud computing to enable users to store and share files and folders with others across the Internet using file synchronization.

This is what it really is. You give Dropbox some files which you want to share and it laps them up, stores them on its cloud storage and shares them among multiple Dropbox clients:

Dropbox at work

Source: http://www.dropbox.com/static/images/install_graphic.gif

And when any of your files change on any shared machine, the changes are promptly replicated across all the shared devices. So what’s the secret sauce? Well, it is the steadfast commitment to keeping things simple for syncing and sharing user files. See one such instance of decision making on this page.

On the other hand, a true backup solution, such as ZCB, exists to ensure that all your data gets backed up regularly and you can go back to any of the backed-up states of your machine when the sky comes crumbling down. This may sound similar, so let’s see why this goal is not achievable with Dropbox:

  1. Completeness: At a high level, the data on your computer can be classified into the following categories:
    1. User files: These are independent files like documents, presentations, spreadsheets which are created by users for their official or personal work.
    2. File system/Interlinked files: These can be your entire directory structure such as D:\, a particular special directory such as “My Documents” or a set of files which are inter-linked, e.g. a bunch of website files or a spreadsheet with embedded images or macros.
    3. Application data: The data created and used by your business applications such as SQL Server or Outlook. These can be databases, configuration files, temporary files, etc. and are generally created in the installation directory of these applications. Also these files are “open” when the application is running.
    4. Applications: Binaries and configuration files of applications which are installed such as Microsoft Office and Adobe PDF suite.
    5. Operating system and system configuration: The installed operating system, its configuration (“System State” in Windows) and other system information such as partition table, etc.

    Now looking at the above, it is obvious that Dropbox can only be considered for data in the first and second categories. And even in the second category, some special folders (e.g. C:\Program Files) can’t even be put inside your Dropbox folder. For those which can be, you are likely to have problems during restores. With many interlinked files, how are you going to find a logically consistent set of interlinked files as it existed at a particular historic point in time?

    A true backup solution such as ZCB, on the other hand, backs up almost all the above categories of data (ZCB backs up Windows system state, though not the operating system and boot loader/partitioning information), and the backup archives represent logical and consistent states at particular points in time.

  2. Modification/Deletion of original copy of data: A true backup solution never modifies the original copy of data, let alone deletes it. In fact even changing a file’s metadata (archive bit, modification time, etc.) has been considered unacceptable by many backup administrators, since that may interfere with other installed applications.

    But since the primary goal of Dropbox is to “synchronize” data across multiple machines, it will do whatever is necessary to accomplish this goal. So if a file gets accidentally deleted or corrupted on one system, Dropbox will gleefully and promptly propagate that accident to all the shared machines. This is obviously a serious problem and hence in its paid versions, Dropbox offers an “unlimited undo history” feature to allow you to undelete files. Though this surely helps, from a disaster recovery standpoint it is still a risky situation, since it means that you have lost all your local copies and now have only one remaining copy of your original data. What’s worse, it is only available on the cloud, so if you need it when you have no or poor internet connectivity, you are out of luck.

    On the other hand, a true cloud backup solution such as ZCB supports smart redundancy options where you can keep backup data on local as well as cloud storage. Since you will have 3 copies of your data (original + 2 separate copies), even if you accidentally delete your original copy of a file you still have two redundant copies to restore from.

  3. Security: The tricky thing about security is that it’s like insurance – you may not care for it in steady state but it can be catastrophic when something goes wrong. And security has been the number one reason why Dropbox is still unwelcome in many enterprises today. Some issues:
    1. True data privacy: Dropbox encrypts your data on the Amazon S3 cloud using an encryption key which is unique to your Dropbox account. Also note that this encryption key is known to Dropbox. This means two things. First, your data is not truly private, as Dropbox personnel can potentially see your data (of course, we believe that this is unlikely). Second, it means you can’t have any data privacy between two of your users sharing the same Dropbox account.

      The only way out here would be to use a separate file/volume level encryption tool on top of Dropbox (such as TrueCrypt). But in addition to burdening your users with new workflows related to encryption/decryption, this would most probably also make the Dropbox synchronizations inefficient, thus defeating the whole purpose of using Dropbox in the first place. I recommend checking out the experiences of the commenters on this blog for the gory details of such problems if you are indeed thinking of going down this path.

      In comparison, a true backup solution like ZCB offers asymmetric encryption with user-generated certificates, making it virtually impossible for anyone else to see your encrypted data.

    2. The disadvantage of being a public “data sharing” service: Dropbox was designed to support data exchanges among multiple devices and multiple users over the internet. You can imagine that such a service needs to have somewhat relaxed rules when it comes to authentication, access rules, open ports, etc. Dropbox has already had its share of such issues – see this page and this page for examples.

    Again, in contrast, a true backup application such as ZCB has much tighter security mechanisms. It can securely encrypt your data with user-generated keys as soon as it is backed up, and can send the data over an SSL tunnel to the cloud, which is protected by multiple layers of authentication. This ensures that your backup data is safe and secure irrespective of its location, on local disk or in the cloud.

  4. Flexibility in choosing data retention policy: The retention policy is a very important decision variable for your disaster recovery plan, as it decides the oldest historic time you can restore to and has a direct implication on your storage costs. But since Dropbox has the “unlimited undo history” feature, why should one even worry about this? My doubts about the long term sustainability of a truly “unlimited” deleted file history notwithstanding, there are at least two reasons why data retention policy is still an issue with Dropbox:
    1. There is no automatic management of your storage quota, so you need to manually delete older files to free up space for newer data. With multiple users working on your shared data, won’t it be challenging to identify what data is too old and delete it manually? Unless, of course, you buy a storage quota which is several times your actual storage requirement, so you never have to delete anything!
    2. In addition, many organizations need to abide by data storage laws which stipulate the geographical location in which data must be stored and even the maximum time customer data can be retained by a business. You don’t have any such control with Dropbox.
  5. Scheduling uploads for making them efficient and unobtrusive: One key issue for many businesses while considering cloud backup is the lack of adequate internet bandwidth. During normal business hours there is only so much bandwidth which you can devote to data backups. This is why many administrators like to schedule the backup uploads to run during idle times such as weekends.

    Telling Dropbox when to sync is not possible, and even if it were made possible, it would surely defeat the whole purpose of using such a sync tool. Yet another problem (feature!) with Dropbox is that it immediately syncs every change to your data. So if you make frequent changes to your files during the day, each of them will be synced across all your devices, thus wasting your bandwidth, even though you may have just wanted to make a copy of your file at the end of the day. Again, for syncing and sharing this “churn” is a necessity and one of the core benefits of Dropbox, but for backups it is nothing but “noise” which is wasteful and disruptive for your normal business network traffic.

As you can see, the above list is by no means an exhaustive one. As you go deeper into this, more such differences pop up. But the question is – is that surprising? Given that Dropbox was conceived, designed and implemented to solve the need of syncing and sharing and not robust cloud backup, isn’t trying to do the latter more of a “hack” than a true solution?

And did I mention that we have a webinar coming up on Dec 7th, 2011 in which we will be discussing how to get the maximum out of your ZCB installation and will also be taking some of the above issues for discussion? Please register for this webinar here. Hope to see you then!

-Nikunj

Introducing ZCB 4 – Next Generation Cloud Backup!

August 29th, 2011

Today, we announced the immediate availability of Zmanda Cloud Backup (ZCB) 4, our comprehensive cloud backup offering for Windows servers, desktops and laptops. ZCB 4 had been under a limited beta program for the past few weeks and was extensively tested by many end users and resellers. We received some great feedback which led to many improvements and bug fixes. Thank you to all who participated!

ZCB 4 is a huge step forward for the idea of cloud backup. And after working with many ZCB users for a while, we say this with a lot of conviction. We took a hard look at what users wanted to achieve with cloud backup and compared that with the available solutions on the market. We identified various gaps, which are addressed in ZCB4:

  1. Flexibility of choosing where to store backup data: Users of cloud backup have different needs. Some are embracing it as an extra line of defense and hence want to back up both to on-premises disks and to the cloud. On the other hand, some users are looking at cloud storage as their primary and only backup location and hence are looking for a solution which backs up data only to the cloud.

    ZCB had this covered from day 1. The data was always first backed up to disk and then to cloud. The data on disk could then be deleted or retained, depending on which of the above two use cases you wanted to deploy. But we realized that we could improve it further by offering a “backup to cloud” operation which backs up your data to the cloud directly without using any temporary local storage. So, if you are short on disk space or don’t want a backup on disk, this operation will be handy.


    Cloud Backup new operation

  2. Improving transfer speeds: Users who have either a lot of data to back up or little Internet bandwidth to use are hit by a basic problem – how to transfer data to and from the cloud within required time limits? We also discovered cases where users had the bandwidth available and met their part of the bargain, but the backup solution was either not capable of using the bandwidth completely or the backup provider imposed limits on the upload/download speeds for the users.

    ZCB never imposed any transfer limits whatsoever on upload/download speeds and always tried hard to maximize the throughput. In ZCB 4, we made it even better by adding support for multithreaded uploads/downloads. This feature makes ZCB use multiple concurrent connections to the Amazon S3 cloud and hence unlocks the bandwidth you always had available. And true to our promise of providing flexibility to users, we have made this feature entirely configurable:

    Cloud Backup multithreading

    So by default, we use 3 concurrent connections for data transfer. If you wish you can tweak this value to experiment and find out what works best in your work environment. Higher thread count may be beneficial if you have spare bandwidth and CPU/memory resources to push or pull data.

  3. Manageability and usability: We believe usability is core to the idea of cloud backup, as it involves critical decisions about when/where/how you backup your systems. Users need to be given full freedom to make these decisions and yet it shouldn’t be hard to configure and monitor the solution. Though our users and experts always rated us high in this area, we realized that the user interface needed to be ready to handle our planned rapid growth, both in terms of product features as well as customer use cases. So in ZCB 4, we redesigned our user interface, made many workflow improvements and made it much more intuitive and easier to use. To see it in action, you can view the new ZCB screenshots.
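
For readers curious about what multithreaded transfer (item 2 above) looks like in principle, here is a minimal Python sketch of uploading chunks over a configurable number of concurrent workers. It is not ZCB’s implementation: the `upload_chunk` callable is a hypothetical stand-in for whatever actually sends one chunk to cloud storage, and the chunk size is an arbitrary illustrative choice. Only the default of 3 concurrent connections comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_THREADS = 3  # default number of concurrent connections mentioned above


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut the payload into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_upload(data: bytes, upload_chunk, chunk_size: int = 5 * 1024 * 1024,
                    threads: int = DEFAULT_THREADS) -> list:
    """Send every chunk through upload_chunk(index, chunk) using a pool of worker threads.

    Results are returned in chunk order, even though chunks are sent concurrently.
    """
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda item: upload_chunk(*item), enumerate(chunks)))


if __name__ == "__main__":
    received = {}

    def fake_upload(index: int, chunk: bytes) -> int:
        # Hypothetical uploader: stores the chunk in memory instead of sending it anywhere.
        received[index] = chunk
        return len(chunk)

    sent = parallel_upload(b"x" * 25, fake_upload, chunk_size=10, threads=3)
    print(sent, sum(sent))
```

Raising `threads` only helps while there is spare bandwidth, CPU and memory, which is why it is worth experimenting with the value rather than simply setting it as high as possible.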

In addition to the above, ZCB 4 also offers:

  • Backup/Restore of selective databases in SQL Server and Exchange server
  • Differential backup of SharePoint server
  • Parallel operations across multiple backup sets
  • Extensive Reporting across multiple backup sets
  • Hundreds of other improvements

ZCB 4 is also available in German and Japanese languages. For more details on ZCB 4, please refer to the release notes page.

ZCB 4 brings to the market a comprehensive, flexible and practical cloud backup solution. As we gain scale, we are also working on the pricing (in case you didn’t notice, we recently announced the 25 GB free tier, perhaps a first in the industry) to make it affordable to a bigger set of users.

We are already working very aggressively on our next releases and will soon be making some exciting announcements. If you have a suggestion for us, please do drop me a line at nikunj@zmanda.com.

Zmanda Cloud Backup adds Tokyo as its latest cloud storage location

March 16th, 2011

We are adding support for Asia Pacific (Tokyo) Region in Zmanda Cloud Backup (ZCB). This is the fifth worldwide location supported by ZCB.

This support provides faster uploads for ZCB users in Japan. Throughput will be significantly higher because of fewer hops along the way and the very high bandwidth connections typically available in Japan. Overall processing will be faster because of lower latency (expected to be single-digit milliseconds for most end users in Japan).

Cloud Backup to Three Continents Now Includes Japan

This support enables users to ensure that their data does not leave Japan, e.g. if required for compliance reasons.

In summary, users in Japan now have an effective and scalable solution to back up their Windows filesystems, Microsoft applications and databases (MySQL, SQL Server, Oracle) to a robust storage cloud.

As an introductory offer to our customers in Japan, we are waiving all transfer and storage charges to the Tokyo location until April 30th, 2011. You only pay for the initial setup fees ($4.95) and pro-rated monthly fees ($4.95 per month). After April 30th, our regular charges will apply at par with all other supported regions.

There is more on the horizon for our Japanese customers. We are soon going to offer a fully localized Japanese version of ZCB (the current shipping version has already been tested with Japanese file and folder names). Watch this space for an announcement on that within a few weeks.

Zmanda Cloud Backup with Japanese Files/Folders

MySQL Backup Webinar Series: Scalable backup of live databases

October 14th, 2010

mysql logo

Setting up a good backup and recovery strategy is crucial for any serious MySQL implementation. This strategy can vary from site to site based on various factors including the size of the database, rate of change, security needs, retention and other compliance policies, etc. In general, MySQL DBAs are also required to keep the impact of backups on the usability and performance of the database as low as possible, i.e. MySQL and its dependent applications should remain hot during backup.

Join MySQL backup experts from Zmanda for two webinars dedicated to hot backup of MySQL:

MySQL Backup Essentials: In this webinar, we will go over best practices for backing up live MySQL databases. We will also cover the Zmanda Recovery Manager (ZRM) for MySQL product in detail, including a walk through the configuration and management processes. We will discuss various features of ZRM including full backups using snapshots, point-in-time recovery, monitoring and reporting.

Register for MySQL Backup Essentials Webinar on November 23rd at 10:00AM PT

MySQL Backup to Cloud: In this webinar, we will focus on backing up MySQL databases running on Windows to the cloud. Cloud storage provides an excellent alternative to backing up to removable media and shipping it to a remote secure site. We will provide a live demonstration of the Zmanda Cloud Backup (ZCB) product backing up MySQL to Amazon S3 storage. ZCB is an easy-to-use cloud backup solution which supports all Windows platforms. We will also discuss recovering a MySQL database in the cloud, creating a radically low cost disaster recovery solution for MySQL.

Register for MySQL Backup to Cloud Webinar on November 30th at 10:00AM PT

Zmanda @ Oracle OpenWorld 2010

September 7th, 2010

Oracle Open World

If you are coming to this year’s Oracle OpenWorld 2010, please do visit us at Booth #3824.

We will have our backup solution experts at the show to discuss any of your database or infrastructure backup needs.

When it comes to backing up various products offered by Oracle, we have several solutions:

We hope to see you at the show!

Go Tapeless - Use Zmanda Cloud Backup for backup and disaster recovery

June 23rd, 2010

If you are in charge of ensuring backup and disaster recovery of critical servers for your business, you have undoubtedly grappled with unwieldy tapes. In this age of digital everything, writing to tapes and then shipping them to a remote location seems like a relic from another era. Advances in Cloud based services, e.g. those offered by Amazon Web Services, provide an excellent alternative to tapes for backup and disaster recovery.

We have been offering an Amazon S3 based cloud backup solution for about three years now. Today we are announcing the third generation of our Zmanda Cloud Backup product. Particularly exciting for me is the support for the Asia Pacific Region.

Cloud Backup to Three Continents

For many of the same reasons that Amazon picked Singapore as their first Asia Pacific Region, Singapore is a great destination to preserve your valuable assets. Performance and robustness provided by Singapore’s Internet connectivity is a major plus for backup and disaster recovery needs.

Backing up your data to the cloud requires several steps. You need to (1) plan what you want to back up and when; (2) extract data out of your live applications, e.g. SQL Server or Exchange; (3) stage this backup image for transfer to the cloud; (4) monitor the transfer for any Internet hiccups and take corrective actions; and (5) delete backup images which have expired per your retention policy. Zmanda Cloud Backup automates these steps through easy GUI-based backup configuration and management. ZCB integrates with S3’s REST API to coordinate transfer of on-premises data to the storage cloud.

In third-gen ZCB we also added support for international character sets. So, ZCB is friendly with files and folders named in, for example, Chinese (Simplified or Traditional), Japanese or Korean.

Backup What screenshot - Chinese filenames

Backup What screenshot - Japanese filenames

While a lot of Zmanda’s customers back up to local disks or tapes, Cloud Backup is the fastest-growing part of our business. In many environments, customers are backing up some backup sets to local media and other backup sets to the Cloud, with plans to move their entire backup to storage on the Cloud in a few years. We have seen this adoption across the board, including in the traditionally conservative financial industry. So, it appears more and more IT managers are daring to go tapeless when it comes to their backup operations!

Disaster Recovery in the Cloud

June 21st, 2010

Most small and medium sized business do not have a formal Disaster Recovery (DR) plan and implementation because of its cumbersome and costly nature. Various factors make DR complex, including: (1) Allocation and administration of remote compute and storage resources; (2) Data transport mechanism - e.g. tape shipment or data replication; and (3) Application environment synchronization. To makes matter worse, regular testing of a DR implementation tends to be complicated, and in many cases not practical.

Cloud Computing provides an excellent means to radically simplify the DR process. This is achieved by backing up your critical applications to a Storage Cloud (e.g. Amazon S3), and preparing to recover quickly in the nearby Compute Cloud (e.g. Amazon EC2).

We have two solutions for backup and DR in the cloud: Amanda Enterprise (with the Amazon S3 Option) and Zmanda Cloud Backup (ZCB). Amanda Enterprise is meant for environments with heterogeneous systems, whereas ZCB is targeted at small businesses with a handful of Windows servers and desktops.

Amanda Enterprise DR in the Cloud

Setup of Amanda Enterprise for Cloud Based DR

Zmanda Cloud Backup DR in the Cloud

Setup of Zmanda Cloud Backup for Cloud Based DR

The process of setting up DR in the cloud is as follows:

  1. Set up backup process to Amazon S3.
  2. Complete first backup of applications on primary site to S3.
  3. Configure standby VMs on EC2 to match the OS (and patch level) of the corresponding systems on your primary site. For all data storage, use Elastic Block Store (EBS), so you have persistent data across reboots.
  4. Install Zmanda backup software on these standby VMs.
  5. Install the same S3 certificate that is used in step #1 on the standby VMs.
  6. In the case of Amanda Enterprise, set up the AE-DR option to replicate the backup catalog and configuration to the standby VM running the AE server.
  7. Perform full recovery from S3 to standby VMs.
  8. Take a snapshot of the standby VMs.
  9. Shut down standby VMs.
  10. Optionally start standby VMs periodically to perform steps #6-#8. This reduces the time to recover after a disaster and also tests your DR process.

If you are considering the Cloud for your DR needs, come join us tomorrow (June 22nd) for a webinar: Leveraging the Cloud for Radically Simple and Cost-Effective Disaster Recovery. Noted storage analyst Lauren Whitehouse from Enterprise Strategy Group will be joining me.

Taking a Snapshot of a Thousand Dancing Dolphins

April 12th, 2010

An increasing number of large MySQL applications, e.g. social networking and SaaS back-ends, use a distributed MySQL architecture. MySQL data is distributed logically or heuristically on multiple, and in some cases thousands of, real or virtual servers. Backing up such large and dynamic environments presents its own complexities.

In this post, we will use cluster terminology - but we do not imply that the NDB Cluster storage engine is being used for MySQL. Most implementations use InnoDB for data and MyISAM for dictionary. The typical architecture for such applications uses database sharding - i.e. shared-nothing partitioning of data across similarly configured nodes.

In most sharded environments, high availability is built in - i.e. the cluster can continue to answer the queries and commit the transactions of all users in the face of a node failure. This is typically accomplished either by database-level replication or by designing the application so that each row is mirrored on two or more nodes. If MySQL Replication is being used, then slaves can be used for load balancing as well - as long as it is acceptable that some clients may not get the latest data from the master node. E.g. a profile update by a user may not be visible to all her friends right away.

But built-in high availability does not do away with the need for setting up a backup and recovery process. Just like RAID does not replace backup, Sharding with redundancy does not replace backup either. The inherent complexity of large scale distributed database environments makes errors (human, system, environmental) more probable. Also, the implied availability of these environments increases the stress during the recovery process.

Here are the backup and recovery needs for such environments. Some of these needs conflict with each other:

  • Application managers desire a point-in-time restore which is coordinated across multiple servers.
  • IT managers want the configuration to be as identical as possible across all nodes - so that the process of replacing nodes becomes simple.
  • Depending on the application, the retention policy could be several years.
  • The overall application should be able to recover from multiple node failures, human errors or sabotage, and geographic problems (disaster, connectivity etc.)

Zmanda Recovery Manager for MySQL is designed to meet these challenging needs. It uses various backup methods for backing up individual shards, and manages backup and recovery of the overall MySQL environment.

For point-in-time restore capability, ZRM uses MySQL binary logs. In very update-heavy environments, these binary logs can become very large. In such environments, if the organization’s Recovery Point Objective (RPO) requires being able to recover to any point within the past few weeks, it may not be possible to store these binary logs on the MySQL node itself. In any case, in order to recover from complete node failures, these logs need to be stored outside of the node. So, a storage environment which is physically or logically shared among the nodes is typically a requirement for storing the backup images. This shared secondary storage does not violate the shared-nothing principles of sharding, because it is not in the path of the actual application. It is out-of-band storage accessed and managed by the backup software. Also note that ZRM can automatically remove the binary logs from the MySQL node once they have been copied over to their archive location.

Taking a Snapshot of multiple MySQL databases

ZRM can use two techniques to allow for point-in-time recovery of distributed MySQL environments: Coordinated Backups or Coordinated Restores:

Coordinated backup provides a backup image of all nodes consistent to a specific event. E.g. all rows are backed up until a specific Global Sequence Number (GSN) - assuming a GSN exists in the application. Another option is to create a checkpoint event specifically for backup purposes. Of course, having a GSN or a checkpoint event may create periodic brief hiccups which may or may not be acceptable for the business needs. But this process creates the cleanest backup images for the whole application.

Coordinated restore allows each individual node to be backed up completely independently of the others. This eliminates the need for a backup checkpoint event. However, at the time of recovery more processing is required to make sure all nodes are recovered to a point which is logically acceptable to the higher-level application. ZRM can be scripted to identify this point in the backed-up binary logs for every shard. Also, the visual log analyzer feature of ZRM helps DBAs search for these points efficiently. Note that the shards may not all be recovered to their state as of exactly the same time; however, they should be recovered to a state which is acceptable for the overall application. Having the clocks of the nodes synchronized will also help DBAs identify points of recovery across nodes, by making it easy to correlate events.
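
As a rough illustration of the coordinated restore idea, the sketch below picks, for each shard, the last logged event at or before a chosen target time. This is not ZRM's API or scripting interface; the function name, shard names and integer timestamps are hypothetical stand-ins for positions found in each shard's binary logs, and it assumes the node clocks are synchronized.

```python
from bisect import bisect_right


def restore_points(shard_events, target):
    """For each shard, pick the last logged event at or before `target`.

    `shard_events` maps a shard name to a sorted list of event timestamps.
    A shard with no event at or before the target maps to None.
    """
    points = {}
    for shard, events in shard_events.items():
        i = bisect_right(events, target)  # number of events <= target
        points[shard] = events[i - 1] if i > 0 else None
    return points


# Hypothetical event timestamps per shard (illustration only).
shard_events = {
    "shard-a": [100, 140, 190, 230],
    "shard-b": [90, 150, 205],
    "shard-c": [210, 260],
}

print(restore_points(shard_events, target=200))
# prints {'shard-a': 190, 'shard-b': 150, 'shard-c': None}
```

Here the shards end up at different moments (190 and 150), which is the situation described above: whether that is acceptable is a decision for the application, not for the backup tool. A None result means the shard has nothing to replay up to that point and would be restored from its backup image alone.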

Being able to back up a smaller shard instead of the whole dataset provides some opportunities from both a technical and a logical perspective. Since the size of each shard may be relatively small, a particular backup method may be acceptable even though it would not have been acceptable if the whole dataset was in one monolithic database. If data was distributed among shards using some external criteria (e.g. users of each zip code go to a particular shard), then backup images of each shard may be individually usable by an application. ZRM creates portable backup images - a key need for backing up shards - so backups from one node can be restored on another.

If recovery from a site wide disaster is also an objective, then suitable backup images need to be securely transported to the remote site. This can be done via the new Disaster Recovery Option now available for ZRM. This option replicates backup images, backup catalog and configuration data to the remote site - enabling full disaster recovery on an as-needed basis. Individual nodes need not be replicated, saving huge hassle and cost.

If your show is backed by a pod of dancing dolphins, a well implemented and documented backup and disaster recovery process is a good investment.

What’s New in Amanda Community: Postgres Backups

March 25th, 2010

Second installment in a series of posts about recent work on Amanda.

The Application API allows Amanda to back up structured data — data that cannot be handled well by dump or tar. Most databases fall into this category, and with the 3.1 release, Amanda Community Edition ships with ampgsql, which supports backing up Postgres databases using the software’s point-in-time recovery mechanism.

The how-to for this application is on the Amanda wiki.

Operation

Postgres, like most “advanced” databases, uses a logging system to ensure consistency even in the face of (some) hardware failures. In essence, it writes every change that it makes to the database to the logfile before changing the database itself. This is similar to the operation of logging filesystems. The idea is that, in the face of a failure, you just replay the log to re-apply any potentially corrupted changes.

Postgres calls its log files WAL (write-ahead log) files. By default, they are 16MB. Postgres runs a shell command to “archive” each logfile when it is full.

So there are two things to back up: the data itself, which can be quite large, and the logfiles. A full backup works like this:

  • Execute PG_START_BACKUP(ident) with some unique identifier.
  • Dump the data directory, excluding the active WAL logs. Note that the database is still in operation at this point, so the dumped data, taken alone, will be inconsistent.
  • Execute PG_STOP_BACKUP(). This archives a text file with the suffix .backup that indicates which WAL files are needed to make the dumped data consistent again.
  • Dump the required WAL files.

An incremental backup, on the other hand, only requires backing up the already-archived WAL files.

A restore is still a manual operation — a DBA would usually want to perform a restore very carefully. The process is described on the wiki page linked above, but boils down to restoring the data directory and the necessary WAL files, then providing postgres with a shell command to “pull” the WAL files it wants. When postgres next starts up, it will automatically enter recovery mode and replay the WAL files as necessary.

Quiet Databases

On older Postgres versions, making a full backup of a quiet database is actually impossible. After PG_STOP_BACKUP() is invoked, the final WAL file required to reconstruct a consistent database is still “in progress” and thus not archived yet. Since the database is quiet, postgres does not get any closer to archiving that WAL file, and the database hangs (or, in the case of ampgsql, times out).

Newer versions of Postgres do the obvious thing: PG_STOP_BACKUP() “forces” an early archiving of the current WAL file.

The best solution for older versions is to make sure transactions are being committed to the database all the time. If the database is truly silent during the dump (perhaps it is only accessed during working hours), then this may mean writing garbage rows to a throwaway table:

CREATE TABLE push_wal AS SELECT * FROM GENERATE_SERIES(1, 500000);
DROP TABLE push_wal;

Note that using CREATE TEMPORARY TABLE will not work, as temporary tables are not written to the WAL file.

As a brief encounter in #postgres taught me, another option is to upgrade to a more modern version of Postgres!

Log Incremental Backups

DBAs and backup admins generally want to avoid making frequent full backups, since they’re so large. The usual pattern is to make a full backup and then dump the archived log files on a nightly basis for a week or two. As the log files are dumped, they can be deleted from the database server, saving considerable space.

In Amanda terms, each of these dumps is an incremental, and is based on the previous night’s backup. That means that the dump after the full is level 1, the next is level 2, and so on. Amanda currently supports 99 levels, but this limit is fairly arbitrary and can be increased as necessary.

The problem in ampgsql, as implemented, is that it allows Amanda to schedule incremental levels however it likes. Amanda considers a level n backup to be everything that has changed since the last level n-1 backup. This works great for GNU tar, but not so well for Postgres. Consider the following schedule:

Monday level 0
Tuesday level 1
Wednesday level 2
Thursday level 1

The problem is that the dump on Thursday, as a level 1, needs to capture all changes since the previous level 0, on Monday. That means that it must contain all WAL files archived since Monday, so those WAL files must remain on the database server until Thursday.

The fix to this is to only perform level 0 or level n+1 dumps, where n is the level of the last dump performed. In the example above, this means either a level 0 or level 3 dump on Thursday. A level 0 is a full backup and requires no history. A level 3 would only contain WAL files archived since the level 2 dump on Wednesday, so any WAL files before that could be deleted from the database server.
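
The rule can be sketched in a few lines of Python. This is only an illustration of the scheduling logic described above, not code from Amanda or ampgsql; the function names and the list-of-tuples schedule are assumptions.

```python
def base_of(schedule, i):
    """Index of the dump that dump `i` is based on, or None for a full.

    `schedule` is a list of (day, level) pairs. Following the text, a
    level n dump covers everything since the most recent level n-1 dump.
    """
    level = schedule[i][1]
    if level == 0:
        return None
    for j in range(i - 1, -1, -1):
        if schedule[j][1] == level - 1:
            return j
    raise ValueError("no level %d dump before %s" % (level - 1, schedule[i][0]))


def allowed_levels(last_level):
    """Levels permitted by the fix: a full, or one level past the last dump."""
    return [0, last_level + 1]


week = [("Monday", 0), ("Tuesday", 1), ("Wednesday", 2), ("Thursday", 1)]

# Thursday as a level 1 reaches all the way back to Monday's full.
print(week[base_of(week, 3)][0])   # prints Monday

# After Wednesday's level 2, the fix allows only level 0 or level 3.
print(allowed_levels(week[2][1]))  # prints [0, 3]
```

With Thursday changed to level 3, base_of returns Wednesday's index, so only the WAL files archived since Wednesday are needed and everything older can be removed from the database server.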

Summary

Together, a powerful open source database system and the open source ampgsql plugin give you well-protected storage for your mission-critical data. We will continue to develop additional Application API plugins, and encourage you and other members of the community to do the same!