Archive for the ‘Cloud Backup’ Category

Zmanda “googles” cloud backup!

Friday, May 11th, 2012

Today, we are thrilled to announce a new version of Zmanda Cloud Backup (ZCB) that backs up to Google Cloud Storage. It feels great to support what was perhaps the first mainstream cloud storage service we were introduced to (via the breakthrough Gmail and other Google services). Given the huge promise shown by Google's cloud services, we are sure that this version will be very useful to many of our customers.

However, a new cloud storage partner explains only part of the excitement. :) What makes this version more significant to us is its new packaging. As you may be aware, until now ZCB came only in a Pay-As-You-Go format. While this option has been great for customers who value the flexibility offered by this model, we realized that there are other customers (such as government agencies) who need a fixed amount to put down in their proposals and budget provisions. To put it differently - these customers would rather trade off some of the flexibility for certainty.

So with these customers in mind, we chose to offer this ZCB version in the following prepaid, usage-quota-based plans:

  • $75/year for 50 GB
  • $100/year for 100 GB
  • $1,000/year for 1000 GB
  • $10,000/year for 10000 GB

Note that the above GB values are the maximum size of data users can store on the cloud at any point in time. The prices above are inclusive of all costs of cloud storage and remain unaffected even if you wish to protect multiple (unlimited!) systems.
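
As a quick back-of-the-envelope check (a sketch in Python, using only the plan numbers listed above), the effective rate works out to $1.50 per GB-year on the 50 GB plan and $1.00 per GB-year on the larger plans:

    # Effective annual cost per stored GB for each prepaid plan above.
    plans = {50: 75, 100: 100, 1000: 1000, 10000: 10000}  # quota in GB -> $/year

    for quota_gb, price in sorted(plans.items()):
        print(f"{quota_gb} GB plan: ${price}/year = ${price / quota_gb:.2f} per GB-year")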

So what are the benefits of this new pricing option? Here are some:

  • Budget friendly: Whether you are an IT manager submitting your annual IT budget for approval or a service provider vying for a client’s business, the all-inclusive yearly plans are a great option, one you can confidently put down in writing.
  • Cost effective: If you know your requirements well, this option turns out to be dramatically cost effective. Here is a rough comparison of our pricing with some other well-known providers:

    Note:
    Zmanda Cloud Backup: The annual plan pricing for the Google Cloud Storage version was used.
    MozyPro: Based on http://mozy.com/pro/pricing/. The "Server Pass" option was chosen since ZCB protects server applications at no extra cost.
    JungleDisk: Based on https://www.jungledisk.com/business/server/pricing/. The Rackspace storage option was used since this was the only "all-inclusive" price option.

  • More payment options: In addition to credit cards, this version supports a variety of payment options (such as Bank transfer, checks, etc.). So whether you are a government agency or an international firm, mode of payment is never going to be an issue.
  • Simplified billing and account management: Since this aspect is handled entirely by Zmanda, managing your ZCB subscription is much easier and more user friendly. No more hassles of updating your credit card information, and no need to manage multiple accounts. When you need help, just write to a single email address (zcb@zmanda.com), or open a support case with us, and we will assist you with everything you need.
  • Partner friendly: The direct result of all the above benefits is that reselling this ZCB version will be much simpler and more rewarding. If you are interested in learning more, do visit our new reseller page for details.

So with all the great benefits above, do we still expect some customers to choose our current pay-as-you-go ZCB version for Amazon S3? Of course! As we said, if your needs are currently small or unpredictable, the flexibility of scaling up and down without committing to a long term plan is a sensible option. And the 70 GB free tier and volume discount tier offered on this ZCB version can keep your monthly costs very low.

Oh, and I almost forgot - along with this version, we have also announced the availability of ZCB Global Dashboard, the web interface to track usage and backup activity of multiple ZCB systems in a single place. If you have multiple ZCB systems in your environment or you are a reseller, it will be extra useful to you.

As we work on enhancing our ZCB solution further, please keep sending us your feedback at zcb@zmanda.com. Much more is cooking with Cloud Backup at Zmanda. We will be back with more exciting news soon!

-Nik

Great Combination for Cloud Storage: Ubuntu 12.04 + OpenStack Swift Essex Release

Monday, May 7th, 2012

We are very excited to see the release of Ubuntu 12.04 LTS and OpenStack Essex, especially the Essex version of OpenStack Swift and the brand-new Dashboard. We have not yet seen any performance review of OpenStack Swift Essex running on Ubuntu 12.04. The official Dashboard demo introduced the components of System Panel and Manage Compute, without any details on the Object Store. So, we did an apples-to-apples cloud backup performance comparison between OpenStack Swift Essex on Ubuntu 12.04 LTS and OpenStack Swift 1.4.6 on Ubuntu 11.10, and also demonstrated the functionality of the Object Store in the OpenStack Dashboard.

In the following, we first report our results on some select hardware configurations of proxy and storage nodes on EC2. Our previous blog (Next steps with the OpenStack Advisor) provides details about these hardware configurations, and we use the following four configurations as example implementations of a "small-scale" Swift cloud:

  • 1 Large Instance based proxy node: 5 Small Instance based storage nodes
  • 1 XL Instance based proxy node: 5 Small Instance based storage nodes
  • 1 CPU XL Instance based proxy node: 5 Small Instance based storage nodes
  • 1 Quad Instance based proxy node: 5 Medium Instance based storage nodes

The Large, XL, CPU XL and Quad instances cover a wide range of CPU and memory selections. For network I/O, Large, XL and CPU XL instances are provisioned with a Gigabit Ethernet (100~120MB/s), while the Quad instance offers 10 Gigabit Ethernet (~1.20GB/s) connectivity.

Again, we use Amanda Enterprise as our application to back up and recover a 10GB data file to/from the Swift cloud, to test its write and read throughput respectively. We ensure that one Amanda Enterprise server can fully load the Swift cloud in all cases.

The two systems involved in the comparison are: (1) Ubuntu 11.10 + OpenStack Swift 1.4.6; (2) Ubuntu 12.04 LTS + OpenStack Swift Essex (configuration parameters of the OS, OpenStack and Amanda Enterprise are identical). In the following, we use 11.10+1.4.6 and 12.04+Essex as labels for these two systems.

(1) Proxy node runs on the Large instance and 5 storage nodes run on the Small instances. (Note that the throughput values on y-axis are not plotted from zero)

(2) Proxy node runs on the XL instance and 5 storage nodes run on the Small instances.

(3) Proxy node runs on the CPU XL instance and 5 storage nodes run on the Small instances.

(4) Proxy node runs on the Quad instance and 5 storage nodes run on the Medium instances.

From the above comparisons, we found that 12.04+Essex performs better than 11.10+1.4.6 in terms of backup throughput; the performance gap ranges from 2% to 20%, with an average of 9.7%. For recovery throughput, the average speedup over 11.10+1.4.6 is not as significant as for backup.

We did not dig into whether Ubuntu 12.04 LTS or OpenStack Essex is the cause of this slight improvement in throughput, but we can see that the overall combination performs statistically better. Based on our initial testing - both the performance improvements and the feature improvements - we encourage anyone running OpenStack Swift on Ubuntu to upgrade to the latest released versions to take advantage of the new updates. Five years of support for 12.04 LTS is a great assurance to maximize ROI for your cloud storage implementation.

Next, we demonstrate the functionality of Object Store within the OpenStack Dashboard.

After we log into the Dashboard, click the "Project" tab on the left and then "Containers" under "Object Store", we see the screen below:

We can create a container by clicking the "Create Container" button, which brings up the following screen:

After creating a container, we can click the container name and browse the objects associated with that container. Initially, a newly-created container is empty.

We can upload an object to the container by clicking the "Upload Object" button:

We can also delete an object from the container by choosing "Delete Object" from its drop-down list in the "Actions" column.

Similarly, we can delete a container by choosing "Delete Container" from its drop-down list in the "Actions" column.

Here, we have demonstrated the core functionality of the Object Store in the OpenStack Dashboard. From the above screenshots, we can see that the Dashboard provides a neat and friendly user interface to manage containers and objects. This saves a lot of time otherwise spent looking up command-line syntax for basic functionality.
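
For scripted workflows, the same operations are available through the Swift API. Below is a rough sketch using the python-swiftclient library; the auth endpoint and credentials are placeholders for your own deployment:

    # Sketch: the Dashboard's container/object operations, driven via the Swift API.
    # The endpoint, user and key below are placeholders, not real credentials.
    from swiftclient import client

    conn = client.Connection(
        authurl="http://swift-proxy.example.com:8080/auth/v1.0",
        user="account:user",
        key="secret",
    )

    conn.put_container("demo")                            # "Create Container"
    conn.put_object("demo", "hello.txt", contents="hi")   # "Upload Object"
    headers, objects = conn.get_container("demo")         # browse the container
    print([o["name"] for o in objects])
    conn.delete_object("demo", "hello.txt")               # "Delete Object"
    conn.delete_container("demo")                         # "Delete Container"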

Congratulations, Ubuntu and OpenStack teams!  The Ubuntu 12.04 + OpenStack Swift Essex Release combination is a great contribution to Open Source and Cloud Storage communities!

Building an OpenStack Swift Cloud: Mapping EC2 to Physical hardware

Friday, May 4th, 2012

As we mentioned in an earlier blog, it may seem ironic that we are using a public compute cloud to come up with an optimized private storage cloud. But the ready availability of diverse types of EC2-based VMs makes AWS a great platform for running the Sampling and Profiling phases of the OpenStack Swift Advisor.

After an optimized Swift Cloud is profiled and designed on virtualized hardware (for example, EC2 instances in our lab), cloud builders will eventually want to build it on physical hardware. The question is: how do you preserve the cost-effectiveness and guaranteed throughput of the Swift Cloud on the new physical hardware, with new data center parameters?

A straightforward answer is to keep the same hardware and software resources in the new hosts. But there is a challenge: EC2 (and this challenge remains if other cloud compute platforms, e.g. OpenStack Compute, were used for profiling) provisions the CPU resource for each type of instance in terms of "EC2 Compute Units", e.g. the Large instance has 4 EC2 Compute Units and the Quad instance has 33.5 EC2 Compute Units. How do you translate 33.5 EC2 Compute Units into GHz when you purchase physical CPUs on the market for your servers? Another ambiguous resource definition in EC2 is network bandwidth. EC2 has 4 grades of network bandwidth - Low, Moderate, High and Very High - and, for example, EC2 allocates Low bandwidth to the Micro instance and Moderate bandwidth to the Small instance. But what does "Low bandwidth" mean in terms of MB/s? The EC2 specs provide no answers.

Here we propose a method to translate these ambiguous resource definitions (e.g. EC2 Compute Units) into standard specifications (e.g. GHz) that can be referred to when choosing physical hardware. We focus on 3 types of hardware resources: CPU, disk and network bandwidth.

CPU: We first choose a CPU benchmark (e.g. PassMark) and run it on a certain type of EC2 instance to get a benchmark score. Then, we look up the published scores for that benchmark to find which physical CPU gets a similar score. For safety, we can choose a physical CPU with a slightly higher score to ensure it performs no worse than the virtualized CPU in the EC2 instance.

Disk: We roughly assume the I/O patterns on storage nodes are close to sequential, so we can use the "dd" Linux command to benchmark the sequential read and write I/O bandwidth on a certain type of EC2 instance. Based on the I/O bandwidth results in MB/s, cloud builders can buy physical storage drives with matching I/O bandwidth.

Network: To test the maximum network bandwidth of a certain EC2 instance within the Swift Cloud, we set up another EC2 instance with very high network bandwidth, e.g. the EC2 Quad instance. First, we install Apache and create a test file (the size of the file depends on the memory size, as discussed later) on both EC2 instances. Then, in order to benchmark the maximum incoming network bandwidth of the EC2 instance, we issue the wget command on that instance to download the test file hosted on the Quad instance; wget reports the average network bandwidth after the download finishes, and we use that as the maximum incoming bandwidth. To test the maximum outgoing network bandwidth, we run the same test in the reverse direction: the Quad instance downloads the test file from the EC2 instance we want to benchmark. The reason we choose wget (instead of e.g. scp) is that wget involves less CPU overhead. Notice that, to remove interference from disk I/O, we ensure the test file fits into the memory of the EC2 instance so that no read I/O is needed. Also, we always execute wget with "-O /dev/null" to bypass write I/O. Once we have the maximum incoming and outgoing network bandwidths, we can choose the right Ethernet components to provision the storage and proxy nodes.
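
As a concrete sketch of the disk and network measurements above (Python shelling out to dd and wget; the paths, host name and sizes are illustrative placeholders):

    # Sketch of the dd and wget benchmarks described above. All paths, URLs and
    # sizes are illustrative placeholders.
    import subprocess, time

    def sequential_write_mbps(path="/mnt/ddtest", count_mb=1024):
        # Equivalent to: dd if=/dev/zero of=<path> bs=1M count=<count_mb> oflag=direct
        start = time.time()
        subprocess.check_call(["dd", "if=/dev/zero", "of=%s" % path,
                               "bs=1M", "count=%d" % count_mb, "oflag=direct"])
        return count_mb / (time.time() - start)

    def incoming_mbps(url, size_mb):
        # Download the in-memory test file from the high-bandwidth helper instance;
        # "-O /dev/null" bypasses write I/O so only the network is measured.
        start = time.time()
        subprocess.check_call(["wget", "-q", "-O", "/dev/null", url])
        return size_mb / (time.time() - start)

    print(sequential_write_mbps())
    print(incoming_mbps("http://quad-helper.example.com/testfile", 1024))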

Memory: As for the virtualized memory in an EC2 instance, if 10 GB of memory is associated with the instance, it is straightforward to provision 10GB of memory in the physical server. So, we feel no translation is needed for virtualized memory.

Other cloud management platforms may offer several types of instances (e.g. large, medium, small) based on their own terminologies. We can use methods similar to the above to benchmark each type of instance they offer and find the matching physical hardware.

To fully preserve the throughput of the Swift Cloud while mapping from EC2 instances, we advise cloud builders to provision physical hardware with at least 10% better specs than deduced from the above translation.

Here, we show an example of how to map an EC2 c1.xlarge instance to physical hardware:

CPU: We run the PassMark CPU benchmark on c1.xlarge. The CPU score from PassMark is 7295. Provisioning 10% more resources when translating from virtualized to physical hardware, some choices for the physical CPU include: Intel Xeon E3-1245 @ 3.30 GHz, Intel Xeon L5640 @ 2.27GHz, Intel Xeon E5649 @ 2.53 GHz, etc.

Memory: As the c1.xlarge instance is allocated 7GB of memory, we could choose 8GB of memory (4GB x 2 or 2GB x 4) for the physical machine.

Disk: Using the "dd" command, we found that the c1.xlarge instance delivers 100-120 MB/s for sequential reads and 70-80MB/s for sequential writes, which matches a typical 7,200 RPM drive. Therefore, most HDDs on the market are safe to use as data disks in the physical machine.

Network: The c1.xlarge instance has around 100 MB/s of network bandwidth for both incoming and outgoing traffic, which corresponds to a 1 Gigabit Ethernet interface. So, a typical 1 Gigabit Ethernet interface should be enough for the physical machine.

If you are thinking of putting together a storage cloud service, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

Next Steps with OpenStack Swift Advisor - Profiling and Optimization (with Load Balancer in the Mix)

Sunday, April 22nd, 2012

In our last blog on building Swift storage clouds, we proposed the framework for the Swift Advisor - a technique that takes two of  the three constraints (Capacity, Performance, Cost) as  inputs,  and provides hardware recommendations as output - specifically count and configuration of systems for each type of node (storage and proxy) of  the Swift storage cloud (Swift Cloud). Plus, we also provided a subset of our initial results for the Sampling phase.

In this blog, we will continue the discussion on Swift Advisor, first focusing on the impact of the load balancer on the aggregate throughput of the cloud (we will  refer to it as “throughput”) and then provide a subset of outcomes for the profiling and optimization phases in our lab.

Load Balancer

The load balancer distributes the incoming API requests evenly across the proxy servers. As shown below, the load balancer sits in front of the proxy servers to forward the API requests to them and can be connected with any number of proxy servers.

[Figure: the load balancer in front of the proxy servers]

If a load balancer is used, it is the only entry point of the Swift Cloud and all user data goes through it, so it is a very important component for the user-visible performance of your Swift Cloud. If it is not properly provisioned, it will become a severe bottleneck that inhibits the scalability of the Swift Cloud.

At a high-level, there are two types of load balancers:

Software Load Balancer: Runs load-balancing software (e.g. Pound, Nginx) or round-robin DNS on a server to evenly distribute the requests among proxy servers. The server running the software load balancer usually requires powerful multi-core CPUs and extremely high network bandwidth.

Hardware Load Balancer: Leverages a network switch/firewall or dedicated hardware with load-balancing capability to distribute the incoming data traffic across the proxy servers of the Swift Cloud.

Regardless of whether a software or hardware load balancer is used, the throughput of the Swift cloud cannot scale beyond the bandwidth of the load balancer. Therefore, we advise cloud builders to deploy a powerful load balancer (e.g. with 10 Gigabit Ethernet) whose "effective" bandwidth exceeds the expected throughput of the Swift cloud. We recommend picking your load balancer so that, with a fully loaded (i.e. 100% busy) Swift Cloud, the load balancer still has around 50% unused capacity for future planning or sudden needs of higher bandwidth. For example, if you expect the cloud to sustain 500 MB/s, pick a load balancer that can comfortably handle around 1 GB/s.

To get a sense of how to properly provision the load balancer and how it impacts the throughput of the Swift Cloud, we show some results of running a Swift Cloud of c proxy servers and cN storage servers (a c:cN Swift Cloud) behind a load balancer. (N is the "magic" value for the 1:N Swift Cloud found in the Sampling phase.) These results are the "performance curves" for the profiling phase and can be directly used for optimizing your goal.

The experiments

In our last article, we used some running examples to show how to get the output results of the Sampling phase. Here, we directly use the outputs (1:N Swift clouds) of the sampling phase as the inputs of the profiling phase, as seen below:

  • 1 Large Instance based proxy node: 5 Small Instance based storage nodes (N=5)
  • 1 XL Instance based proxy node: 5 Small Instance based storage nodes (N=5)
  • 1 CPU XL Instance based proxy node: 5 Small Instance based storage nodes (N=5)
  • 1 Quad Instance based proxy node: 5 Medium Instance based storage nodes (N=5)

Based on the above 1:5 Swift clouds, we profile the throughput curves of the c:c5 Swift cloud (c = 2, 4, 6, …) with the following load balancer setups:

  1. Using one "Cluster Compute Eight Extra Large Instance" (Eight) with Pound (a reverse proxy and load balancer) as the software load balancer ("1 Eight") that all proxy nodes are connected to. (The Eight instance is one level more powerful than the Quad instance. Like the Quad instance, it is equipped with 10 Gigabit Ethernet, but it has 2X the CPU resources - 2 x Intel Xeon E5-2670, eight-core "Sandy Bridge" architecture - and 2X the memory.)
  2. Using two identical Eight instances (each running Pound) as the load balancers ("2 Eight"). Half of the proxy nodes are connected to the first Eight instance and the other half to the second. The storage nodes are unaware of this split and accept data from all of the proxy nodes.

Again, we use Amanda Enterprise as our application to back up a 20GB data file to the c:c5 Swift Cloud. We concurrently run two Amanda Enterprise servers on two EC2 Quad instances to send data to the c:c5 Swift cloud, ensuring that the two Amanda Enterprise servers can fully load the c:c5 Swift cloud in all cases.

For this experiment, we focus on the backup operations, so the aggregate throughput of backup operations is simply regarded as “throughput” (MB/s) measured between the two Amanda Enterprise servers and the c:c5 Swift cloud.

Let’s first look at the throughput curves (throughput on Y-axis, values of c on X-axis) of c:c5 Swift cloud with the two types of load balancers for each of above mentioned configurations of proxy and storage nodes.

(1) Proxy nodes run on the Large instance and the storage nodes run on the Small instance. The two curves are for the two types of load balancers (LB):


(2) Proxy nodes run on the XL instance and the storage nodes run on the Small instance.


(3) Proxy nodes run on the CPU XL instance and the storage nodes run on the Small instance.


(4) Proxy nodes run on the Quad instance and the storage nodes run on the Medium instance.


From the above 4 figures, we can see that the throughput of the c:c5 Swift cloud using 1 Eight instance as the load balancer cannot scale beyond 140MB/s, while with 2 Eight instances as load balancers, the c:c5 Swift Cloud scales linearly (for the values of c we tested).

Next, we combine the above results for the "2 Eight" load balancer into one picture, and look at it from another point of view - throughput on the Y-axis, cost ($) on the X-axis. (As you may recall from our last blog, the cost is defined as the EC2 usage cost of running the c:c5 Swift cloud for 30 days.)

[Figure: throughput vs. 30-day cost for the "2 Eight" load balancer configurations]

The above graph tells us several things:

(1) The configuration of CPU XL instances for proxy nodes and Small instances for storage nodes is not a good choice: compared with the configuration of XL instances for proxy nodes and Small instances for storage nodes, it costs about the same but delivers lower throughput. The reason is our observation that XL instances provide better bandwidth than CPU XL instances. AWS marks the I/O performance (including network bandwidth) of both the XL instance and the CPU XL instance as "High", but in our pure network bandwidth testing, the XL instance shows a maximum of 120 MB/s for both incoming and outgoing bandwidth, while the CPU XL instance maxes out at 100 MB/s for both.

(2) The configuration of Large instances for proxy nodes and Small instances for storage nodes is the most cost-effective: within each throughput group (marked as a dotted circle in the figure) - low, medium and high - it achieves similar throughput at much lower cost. The reason this configuration is cost-effective is that the Large instance provides a maximum of 100 MB/s for both incoming and outgoing network bandwidth - similar to the XL and CPU XL instances - at roughly half their cost.

(3) While Large instances for proxy nodes and Small instances for storage nodes are very cost-effective, the configuration of Quad instances for proxy nodes and Medium instances for storage nodes is also an attractive option, especially if you consider manageability and failures. To achieve 175MB/s throughput, you can choose either 8 Large instance based proxy nodes and 40 Small instance based storage nodes (48 nodes in total), or 4 Quad instance based proxy nodes and 20 Medium instance based storage nodes (24 nodes in total). Hosting and managing more nodes in the data center may incur higher IT-related costs, e.g. power, number of server racks, failure rate and IT administration. Considering those costs, it may be more attractive to set up a Swift Cloud with a smaller number of more powerful nodes.

Based on the data in the above figure, and considering the IT-related costs, the optimization phase chooses the configuration that best optimizes your goal. For example, suppose you input the performance and capacity constraints and want to minimize cost, and suppose the two configurations - (1) Large instances for proxy nodes and Small instances for storage nodes, and (2) Quad instances for proxy nodes and Medium instances for storage nodes - can both satisfy your capacity constraint. The only thing left is to figure out which configuration costs less while fulfilling the throughput constraint. The final result depends on your IT management costs: if IT management is relatively expensive for you, you may want to choose the second configuration; otherwise, the first configuration will likely incur the lower cost.
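
As a rough sketch of that trade-off in Python (all dollar figures below are illustrative placeholders, not measured values), you can fold a per-node IT overhead into the comparison and watch the winner flip:

    # Comparing two configurations for the same ~175MB/s target, with a
    # per-node IT overhead (power, racks, administration) added to the
    # hardware cost. The dollar figures are illustrative placeholders.
    def total_cost(hw_cost, node_count, it_cost_per_node):
        return hw_cost + node_count * it_cost_per_node

    large_small = {"hw": 6000, "nodes": 48}  # 8 Large proxies + 40 Small storage
    quad_medium = {"hw": 8000, "nodes": 24}  # 4 Quad proxies + 20 Medium storage

    for it_cost in (20, 100):  # cheap vs. expensive IT management, $ per node
        a = total_cost(large_small["hw"], large_small["nodes"], it_cost)
        b = total_cost(quad_medium["hw"], quad_medium["nodes"], it_cost)
        winner = "Large/Small" if a < b else "Quad/Medium"
        print("IT $%d/node: Large/Small=$%d, Quad/Medium=$%d -> %s"
              % (it_cost, a, b, winner))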

In future articles, we will talk about how to map the EC2 instances to physical hardware so that cloud builders can build an optimized Swift cloud running on physical servers.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

OpenStack Swift Advisor: Building Cloud Storage with Optimized Capacity, Cost and Performance

Wednesday, April 18th, 2012

OpenStack Swift is an open source cloud storage platform, which can be used to build massively scalable and highly robust storage clouds. There are two key use cases of Swift:

  • A service provider offering cloud storage with a well defined RESTful HTTP API - i.e. a Public Storage Cloud. An ecosystem of applications integrated with that API is offered to the service provider's customers. The service provider may also choose to offer only a select service (e.g. Cloud Backup) and not offer access to the API directly.
  • A large enterprise building a cloud storage platform for internal applications - i.e. a Private Storage Cloud. The organization may do this because it is reluctant to send its data to a third-party public cloud provider, or because it wants a cloud storage platform that is closer to the users of its applications.

In both of the above cases, as you plan to build your cloud storage infrastructure, you will face one of these three problems:

  1. Optimize my cost: You know how much usable storage capacity you need from your cloud storage, and you know how much aggregate throughput you need for applications using the cloud storage, but you want to know the least budget that will achieve your capacity and throughput goals.
  2. Optimize my capacity: You know how much aggregate throughput you need for applications using the cloud storage, and you know your budget constraints, but you want to know the maximum capacity you can get for your throughput needs and budget constraints.
  3. Optimize my performance: You know how much usable storage capacity you need from your cloud storage, and you know your budget constraints, but you need to know the configuration to get best aggregate throughput for your capacity and budget constraints.

Solving any of the three problems above is very complex because of the myriad choices the cloud storage builder has to make, e.g. size and number of various types of servers, network connectivity, SLAs, etc. We have done extensive work in our labs and with several cloud providers to understand the above problems and to address them with rigorous analysis. In this series of blogs we will provide some of our findings as well as descriptions of tools and services that can help you build, deploy and maintain your storage cloud with confidence.

Definitions

Since the terms used can be interpreted differently depending on context, below are the specific definitions used in this series of blogs for the three key parameters:

Capacity: The usable storage capacity, i.e. the maximum amount of application data that can be stored on the cloud storage. Usually, for better availability and durability, the data is replicated across multiple systems in the cloud storage, so the raw capacity of the cloud storage should be planned with data redundancy in mind. For example, in OpenStack Swift, each object is replicated three times by default, so the total raw storage will be at least three times the usable storage capacity (e.g. 100 TB of usable capacity requires at least 300 TB of raw storage).

Performance: The maximum aggregate throughput (MB/s or GB/s) that applications can achieve from the cloud storage. In this blog, we will also use the term throughput to denote aggregate throughput.

Cost: For this discussion we only consider the initial purchase cost of the hardware for building the cloud storage. We expect the cloud storage to be in use for several years, but we are not amortizing the cost over a period of time. We will point out best practices to reduce ongoing maintenance and scaling costs. For this series of blogs we will use the terms "node" and "server" interchangeably, so "storage node" is the same as "storage server".

Introducing the framework for the Swift Advisor

The Swift Advisor is a technique that takes two of the three constraints (Capacity, Performance, Cost) as inputs, and provides hardware recommendations as output - specifically the count and configuration of systems for each type of node (storage and proxy) of the Swift storage cloud. This recommendation is optimized for the third constraint: e.g. minimize your budget, maximize your throughput, or maximize your usable storage capacity.

Before discussing the technical details of the Swift Advisor, let's first look at a practical way to use it. To build an optimized Swift cloud storage (Swift Cloud), an important feature of the Swift Advisor is that it considers a very large range of hardware configurations (e.g. a wide variety of CPU, memory, disk and network choices). However, it is unrealistic and very expensive to blindly purchase a large amount of physical hardware upfront and let the Swift Advisor evaluate its individual and combined performance. Therefore, we choose to leverage the virtualized and elastic environment offered by Amazon EC2 and initially build an optimized Swift Cloud on EC2 instances.

While it may seem ironic that we are using a public compute cloud to come up with an optimized private storage cloud, the reasons for choosing EC2 as the test-bed for the Swift Advisor are several: (1) EC2 provides many types of instances with different capacities of CPU, memory and I/O to meet various needs, so the Swift Advisor can try out many instance types on a pay-per-use basis instead of physically owning the wide variety of hardware needed. (2) EC2 has a well defined pricing structure. This gives cloud storage builders a good comparison point - they can look at the pricing information and justify the cost of owning their own cloud storage in the long run. (3) The specification of each type of EC2 instance, including CPU, memory, disk and network, is well defined. Once an optimized Swift Cloud is built on EC2 instances under the input constraints, the specifications of those EC2 instances can effectively guide the purchase of physical servers for a Swift Cloud running on physical hardware. In summary, you can use the elasticity of a compute cloud along with the Swift Advisor to get specifications for your physical hardware based storage cloud, while preserving your desired constraints.

The high-level workflow of the Swift Advisor is shown below:

[Figure: high-level workflow of the Swift Advisor]

There are four important phases, which we explain as follows:

Sampling Phase: Our eventual goal is to build an optimized Swift cloud consisting of quantity A of proxy servers and quantity B of storage servers - A and B are unknown initially, and we denote this as an A:B Swift Cloud. In this first phase we focus on the performance and cost characteristics of a 1:N Swift Cloud. We look for the "magic" value of N that gives a 1:N Swift Cloud the lowest cost per throughput ($ per MB/s). The reason we want to find the 1:N Swift cloud with the lowest $ per MB/s is to remove two potential pitfalls when building a Swift cloud: (1) Under-provisioning: the proxy server is under-utilized and could still be attached to more storage servers to improve throughput. (2) Over-provisioning: the proxy server is overwhelmed by too many storage servers.

Since the combinatorial space of storage and proxy node choices is potentially huge, we use several heuristics to prune the candidates during the various phases of the Swift Advisor. For example, we do not consider very low powered configurations (e.g. Micro instances) for proxy nodes.

After the sampling phase, for each combination of EC2 instance sizes on proxy and storage servers, we know the “magic” value of N that produces the lowest $ per MB/s of running a 1:N Swift cloud. You can run the sampling phase on any available virtual or physical hardware, but the larger the sample set the better.
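
In code, the selection at the heart of this phase is simple. Here is a sketch with made-up sample numbers; each tuple is N, the measured throughput in MB/s, and the 30-day EC2 cost in $ for a 1:N Swift Cloud:

    # Sampling-phase sketch: for one proxy/storage instance combination, pick
    # the N that minimizes dollars per MB/s. The sample values are made up.
    samples = [(5, 40, 600), (10, 70, 950), (15, 78, 1300), (20, 80, 1650)]

    magic_n, mbps, cost = min(samples, key=lambda s: s[2] / s[1])
    print("magic N = %d at $%.1f per MB/s" % (magic_n, cost / mbps))  # N = 10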

Profiling Phase: Given the "magic" values of N from the sampling phase, our goal in this phase is to profile the throughput curves (throughput versus the size of the Swift cloud) of several Swift clouds consisting of c proxy servers and cN storage servers (c:cN Swift Clouds) for various values of c.

Please note that each throughput curve corresponds to one combination of hardware configurations (EC2 instance sizes in our case) for the proxy and storage servers. In our experiments, for each combination of EC2 instance sizes for the proxy and storage servers, the profiling starts from a 2:2N Swift Cloud and we double the number of proxy and storage servers each time (e.g. 4:4N, 8:8N, …). All cN EC2 instances for storage nodes are identical.

The profiling stops when the throughput of c:cN Swift Cloud is larger than the throughput constraint. After that, we apply a non-linear or linear regression on the profiled throughputs to plot a throughput curve with the X-values of c and Y-values of the throughput. The output of the profiling phase is a set of throughput curves of c:cN Swift Cloud, where each curve corresponds to a combination of EC2 instance sizes of the proxy and storage servers.

Optimization Phase: Taking the throughput curves from the profiling phase and the two input constraints, the optimization phase is where we figure out a Swift Cloud optimized for the third parameter. We do this by plotting the constraints on each throughput curve and looking for the optimized value across all curves.

For example, let's say we are trying to optimize capacity with a maximum budget and a minimum throughput requirement: we input the minimum required throughput on each throughput curve and find the corresponding values of c, then reject the throughput curves where the implied hardware cost is more than the budget. Out of the remaining curves we select the one resulting in maximum capacity, based on cN × the storage capacity of the system used as the storage server.
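
A minimal sketch of this selection logic (the throughput curves, costs and capacities below are invented for illustration, not profiled data):

    # Optimization-phase sketch: maximize capacity under a throughput floor and
    # a budget ceiling. Curves, costs and capacities are invented placeholders.
    configs = {
        # name: (throughput curve f(c) in MB/s, $ per unit c, usable GB per unit c)
        "Large/Small": (lambda c: 22 * c, 145, 800),
        "Quad/Medium": (lambda c: 45 * c, 310, 2050),
    }

    min_throughput, budget = 175, 1500
    best = None
    for name, (curve, cost_per_c, cap_per_c) in configs.items():
        # smallest c whose profiled throughput meets the constraint
        c = next(c for c in range(1, 100) if curve(c) >= min_throughput)
        if c * cost_per_c <= budget:          # reject curves that bust the budget
            capacity = c * cap_per_c          # cN * per-server storage capacity
            if best is None or capacity > best[1]:
                best = (name, capacity, c)

    print(best)  # ('Quad/Medium', 8200, 4) with these made-up numbers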

Validation and Refinement Phase: The validation phase checks if the optimized Swift cloud really conforms to the throughput constraint through a test run of the workloads. If the test run fails a constraint, then the Swift Advisor goes to the refinement phase. The refinement phase gets the average throughput measured from the test run and sends it to the profiling phase.

The profiling phase adds that information to the profiled data to refine the throughput curves. After that, we use the refined throughput curves as the inputs to redo the optimization phase. The above four phases form the core of the Swift Advisor. However, there are some important remaining issues to be discussed:

(1) choice of the load balancer

(2) mapping between the EC2 instance and the physical hardware when the cloud operators finally want to move the optimized Swift Cloud to physical servers, while preserving the three constraints on the new hosting hardware.

(3) SLA constraints.

We will address these and other issues in building an optimized storage cloud for your needs in our future blogs.

Some Sampling Observations

In this blog, we present some of the results of running the Sampling phase on a selected configuration of systems. In future blogs, we will post the results of the Profiling and Optimization phases.

For our sampling phase, we assume the following potential servers are available to us for proxy node: EC2 Large (Large), EC2 Extra Large (XL), EC2 Extra Large CPU-high (CPU XL) and EC2 Quadruple Extra Large (Quad). While the candidates for storage node are: EC2 Micro (Micro), EC2 Small (Small) and EC2 Medium (Medium).

Therefore, the total number of combinations of proxy and storage nodes is 4 × 3 = 12, and we need to find the "magic" value of N that produces the lowest $ per MB/s of running a 1:N Swift cloud for each combination. We start the sampling for each combination from N=5 (a production Swift Cloud implementation requires at least 5 storage nodes), and increase it until the throughput of the 1:N Swift Cloud stops increasing - which happens when the proxy node is fully loaded and adding more storage nodes cannot improve throughput anymore.

We use Amanda Enterprise as our application to back up a 10GB data file to the 1:N Swift cloud. Amanda Enterprise runs on an EC2 Quad instance to ensure that one Amanda Enterprise server can fully load the 1:N Swift cloud in all cases. For this analysis we are assuming that the cloud builder is building cloud storage optimized for backup operations; users of the Swift Advisor should change the test workload based on the expected mix of application workloads when the cloud storage goes into production. We first look at the throughput for different values of N for each combination of EC2 instance sizes on proxy and storage nodes.

(1) Proxy node runs on EC2 Large instance and the three curves are for the three different sizes for the storage node:


Observations with EC2 Large Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(2) Proxy node runs on EC2 XL instance:

Observations with EC2 XL Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(3) Proxy node runs on EC2 CPU XL instance:

Observations with EC2 CPU XL Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(4) Proxy node runs on EC2 Quad instance:

Observations with EC2 Quad Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 60
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 20
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 10

Looking at the above graphs, we can already draw some conclusions. E.g., if the only storage nodes available to you were equivalent to the EC2 Micro instance and you wanted your storage cloud to be able to scale beyond 30 storage nodes (per proxy node), you should pick at least an EC2 Quad instance equivalent proxy node. Let's look at figures (1) - (4) from another view: fix the EC2 instance size of the storage node and vary the EC2 instance size of the proxy node.

(5) Storage node runs on EC2 Micro instance and the four curves are for the four different EC2 instance sizes on the proxy node:

Observations with EC2 Micro Instance based Storage Node:

  1. Large Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  2. XL Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  3. CPU XL Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  4. Quad Instance based Proxy nodes: Throughput stops increasing at # storage node = 60

From the above graphs, we can conclude that (a) when the proxy node runs on the Quad instance, it has the capability - especially the network bandwidth - to accommodate more storage nodes and achieve higher throughput (MB/s) than with other instance types for the proxy node; and (b) different EC2 instance sizes for the storage node load the same proxy node at different speeds: for example, when the proxy node runs on the Quad instance, we need 60 Micro instances as storage nodes to fully load the proxy node.

By contrast, if we use Small or Medium instances for the storage nodes, far fewer storage nodes (10-20) are needed to fully load the proxy node.

Based on the above throughput results, we now look at the $ per MB/s for different values of N for each combination of EC2 instance sizes on proxy and storage nodes. Here, $ is defined as the EC2 usage cost of running the 1:N Swift cloud for 30 days. In this blog we are only showing numbers with the proxy node set to the EC2 Quad instance; we will publish numbers for the other combinations in another, detailed report.

(6) Proxy node runs on EC2 Quad instance:

Observations with EC2 Quad Instance based Proxy Node:

  1. Micro Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 60
  2. Small Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 15
  3. Medium Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 5

Overall, the lowest $ per MB/s in the above figure is achieved by using Medium instance based storage nodes at # storage node = 5. This specific result provides input to the profiling phase of N=5, 15 and 60 for the proxy/storage node combinations EC2 Quad/Medium, EC2 Quad/Small and EC2 Quad/Micro respectively.

So, one can conclude that when using 1 Quad instance based proxy node, it may be better to use 5 Medium based storage nodes to achieve the lowest $ per MB/s, rather than more Micro instance based storage nodes. The above graphs are a small subset of the overall performance numbers gathered during the Sampling phase.

The overall objective here is to give you a summary of our recommended approach to building an optimized Swift Cloud. As mentioned above, we will be publishing detailed results in another report, as well as more conclusions and best practices in future blogs in this series.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

MySQL Backup Updated

Tuesday, April 10th, 2012

As MySQL continues to grow (as a technology and as an ecosystem) the need and importance of creating and deploying robust MySQL backup solutions grows as well. In many circles Zmanda is known as “The MySQL Backup Company”. While we provide backup of a wide variety of environments, we gladly take the label of backing up the most popular open source database in the world, especially as we kick off our presence at the 2012 MySQL Conference.

Here are some of the updates to our MySQL backup technologies that we are announcing at the conference:

Announcing Zmanda Recovery Manager 3.4

We have updated the popular Zmanda Recovery Manager (ZRM) for MySQL product for scalability, as our customers continue to deploy ZRM to back up ever larger MySQL environments. Some of the scalability features include: better support for hundreds of backup sets within one ZRM installation, support for more aggressive backup schedules, better support for site-wide templates, and deeper integration with NetApp's snapshot mechanisms. We have also added support for the latest versions of XtraBackup and MySQL Enterprise Backup, as well as experimental support for backing up Drizzle (via XtraBackup). If you are deploying Drizzle in your environment, we are looking for beta customers.

Many of our customers store their MySQL databases on NetApp storage. ZRM can be used in conjunction with NetApp Snapshot and SnapVault products to create database-consistent backups without moving the data out of NetApp storage. ZRM creates snapshots of MySQL database volumes, which it can then move to another NetApp filer using NetApp SnapVault, which moves data efficiently between filers. This gives customers a way to protect the backups without impacting their corporate LAN. ZRM uses SnapRestore functionality to quickly restore the databases in case of a failure.

Announcing MySQL Backup Agent for Symantec NetBackup

If you have Symantec NetBackup deployed in your environment and would like to consolidate your MySQL backups under the umbrella of your NetBackup based backup infrastructure, you now have a well integrated solution. We have released a MySQL Backup Agent that is deeply integrated with Symantec NetBackup. This agent allows you to perform live backups of your MySQL databases directly from your MySQL servers to your NetBackup server.

[Figure: MySQL Backup Agent for Symantec NetBackup]


Backup of your MySQL databases to the Cloud

Public or private cloud storage is a great choice for offsite storage of backup archives. You can also use compute clouds as an inexpensive DR site for your MySQL databases. For MySQL databases running on Windows, our Zmanda Cloud Backup product provides a very easy and inexpensive way to back up to Amazon S3 or Google Cloud Storage.

If you have MySQL databases running on Linux or in heterogeneous environments, you have two choices for backing up to the cloud. You can use our Amanda Enterprise product with the Amazon S3 or Google Cloud Storage option to move backup images created by ZRM to the cloud. The second option is to use the recently released AWS Storage Gateway in conjunction with ZRM.

ZRM Backing Up To AWS Gateway Storage

We have published an integration report (available on Zmanda Network under the MySQL Backup section - free registration required) to show how you can deploy AWS Gateway to asynchronously upload backup files created by ZRM to Amazon S3.

As you can see, we have been busy updating our MySQL backup solutions. All of the above improvements and feature additions have been made based on feedback from MySQL DBAs. If you are visiting the MySQL user conference this week, please do visit us at our booth - we would love to understand and discuss your MySQL backup challenges.

Cloud Backup Your Way! (Releasing ZCB 4.1)

Monday, February 27th, 2012

Today, we released ZCB 4.1, a major update to our prior version 4.0.2. In addition to significant polish and general fixes, ZCB 4.1 has several features requested by our customers.

Here is a walkthrough:

Better utilize your Internet bandwidth based on your work schedule

Traditionally, data backup is seen as an activity to be completed during nights or weekends - when users are not actively using their systems. But today, this practice is difficult to follow for two main reasons. First, the available time-window for backups has shrunk, as people now work from different offices at different times of the day. Second, with improvements in Internet speeds still lagging behind the growth in data volumes, hoping to upload everything during the weekend looks like an eventual impossibility.

So how does one cope with this changed reality? While we can't solve the problem of limited bandwidth, we can try dividing it better between production work and the backup procedure. And this is exactly what ZCB 4.1 offers through its new feature that lets you specify bandwidth throttling limits down to the granularity of 15-minute intervals of the day. This essentially means that you can control ZCB's bandwidth usage to exactly fit your environment's unique network utilization pattern.

To help you understand how this feature can be used, we have also included intelligent predefined templates such as "Throttle on weekdays" and "Gradual throttle on weekdays". The latter template, for example, limits ZCB's bandwidth usage during peak weekday hours and relaxes the limits as the workday winds down. This is shown in the figure below:

[Screenshot: bandwidth throttling time window]

(In the above screenshot, green bars represent full bandwidth usage, red bars represent significantly throttled bandwidth and other bars represent a value in between these two extremes).

You can also work with Zmanda’s support team to customize these templates to exactly fit your needs.

ZCB now supports seven backup locations in four continents

ZCB 4.1 supports the two newest regions of Amazon S3 - US West (Oregon) and South America (São Paulo). If you are near these regions and/or wish to use them, you can celebrate a bit more, since all usage charges for these two regions are waived until March 20th, 2012!

With this update, ZCB now supports seven convenient regions (spread across four continents!) to back up to - making backups more efficient, convenient and practical for our users across the globe. And since we are talking about a global user base, let me add that in addition to English, German, Chinese (Traditional and Simplified) and Japanese, the ZCB UI is now available in Korean too. ZCB will speak more languages soon - stay tuned!

[Screenshot: supported cloud locations]

Backup Oracle servers

ZCB 4.1 includes support for backing up Oracle 11g databases running on Windows servers. All backup levels - Full, Differential and Incremental - are supported.


Backup more, at the same time

ZCB 4.1 supports parallel backups across backup sets. This means you don’t have to wait for your ongoing backup to finish before future backups begin. This allows you to schedule backups independently and easily. This item was on our radar for quite some time and we have finally added this support in ZCB 4.1.

Faster and more efficient restores

When all you want is to restore a few specific files from your backups, why should the whole backup archive be downloaded from the cloud? Now ZCB downloads only the specific chunk of data within the backup archive that it needs to complete the requested restore, so you can recover your data faster - with minimal downtime.
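
ZCB's internals aren't spelled out here, but the underlying mechanism is a standard HTTP ranged read against cloud storage. A rough sketch (the URL is a placeholder, and real S3 requests are additionally signed):

    # Sketch of a ranged read: fetch only the slice of the backup archive that
    # the restore needs, instead of the whole file. The URL is a placeholder and
    # real S3 requests would also carry authentication headers.
    import urllib.request

    req = urllib.request.Request(
        "https://example-bucket.s3.amazonaws.com/backups/archive.zip",
        headers={"Range": "bytes=1048576-2097151"},  # just the second MiB
    )
    with urllib.request.urlopen(req) as resp:
        chunk = resp.read()      # server answers "206 Partial Content"
    print(len(chunk))            # 1048576 bytes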

Save costs with a finer grained control on retention policy

ZCB 4.1 allows users to have different retention policies for full and incremental/differential backups. This means you can choose just how long you want a particular kind of backup to remain on the cloud. This new feature will be very useful in ensuring judicious use of your cloud storage, which could, in turn, translate to significant reductions in your backup costs. An instance of such a backup scheme is:

[Screenshot: retention policy settings]

Here, full backups are scheduled every week and retained as per the default retention policy (two weeks). But the user doesn't want to retain incremental backups that long and wishes to delete them after 8 days, since the backup cycle is one week.
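
In effect, such a policy boils down to expiring each backup by its own clock. A small sketch of the semantics, using the periods from the example above:

    # Sketch of the split retention policy: incrementals expire after 8 days,
    # full backups after the default two weeks. Dates follow the example above.
    from datetime import date, timedelta

    RETENTION = {"full": timedelta(days=14), "incremental": timedelta(days=8)}
    backups = [("full", date(2012, 2, 6)), ("incremental", date(2012, 2, 13)),
               ("full", date(2012, 2, 20)), ("incremental", date(2012, 2, 24))]

    today = date(2012, 2, 27)
    expired = [b for b in backups if today - b[1] > RETENTION[b[0]]]
    print(expired)  # the Feb 6 full and the Feb 13 incremental are purged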

The above feature further illustrates the design of ZCB to let you backup your way, and also have control in deciding how much you want to pay for your backups. To see more such examples, you may want to read my earlier blog post.

Bulk-deploy ZCB on multiple machines with the new Configuration Cloning utility

ZCB 4.1 includes a new utility to help you deploy ZCB on multiple machines more efficiently. If you are looking to protect several systems with ZCB, you may be interested in exploring this new feature. For more details, please see our knowledgebase article here: http://network.zmanda.com/lore/article.php?id=576

So this was a brief walkthrough of ZCB 4.1. If you are an existing customer and find these features interesting, download it from your account on http://network.zmanda.com and upgrade now (the release notes can be found here). And if you are yet to purchase ZCB, well, let us know what’s been holding you back!

Our engineering team is working aggressively on many more cloud backup innovations. If you would like to request a feature or have some feedback, we would love to hear from you at zcb@zmanda.com.

Optimizing the cost of your cloud backup

Thursday, January 5th, 2012

A well-known challenge of new technologies such as cloud backup is that there are no set standards. Take pricing: what would you expect to pay for storing 10 GB of your data on the cloud today? Given that the answer can be anything from zero to a few hundred dollars, how do you know that you are not paying more than you really should for your requirements? The question worth asking, essentially, is: since businesses are different and have different backup needs, why shouldn't they be allowed to control how much they pay for cloud backup?

We broached this question in our recent Zmanda Cloud Backup (ZCB) webinar titled “How to get the maximum out of ZCB” (recording available here) and looked at ways to optimize ZCB costs for one’s requirements. While exploring different options, we realized something interesting – ZCB’s flexibility not only makes it very versatile, but when combined with its pay-as-you-go pricing model, it also allows great leeway in optimizing backup costs. In this post, I will try to explore the options available in ZCB to do just that.

Before we begin, allow me to clarify – while the bulk of this post focuses on cost optimization options with ZCB, the intent is to provide a systematic way of thinking about cloud backup costs. If you are a ZCB user, you can use these options directly. And if you are not a ZCB user, you can map some of these options to your backup solution (and for the benefit of all of us, please do remember to post your results in a comment below!) and see how better (or worse) it fares.

First, a look at the ZCB pricing model

ZCB’s pricing model has two components:

  • Fixed monthly license fee: $4.95 per month
  • Usage based fee:
    • Storage: $0.15 per GB-month
    • Upload to cloud: $0.15 per GB
    • Data download from cloud: Free
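
In code form, the whole model fits in a few lines; here is a rough sketch using only the rates listed above:

    # Rough monthly-cost sketch of the ZCB pay-as-you-go model above.
    LICENSE = 4.95        # $ per month
    STORAGE_RATE = 0.15   # $ per GB-month stored on the cloud
    UPLOAD_RATE = 0.15    # $ per GB uploaded; downloads are free

    def zcb_monthly_cost(stored_gb, uploaded_gb):
        return LICENSE + STORAGE_RATE * stored_gb + UPLOAD_RATE * uploaded_gb

    # e.g. 100 GB kept on the cloud and 20 GB of new/changed data uploaded
    print("$%.2f" % zcb_monthly_cost(100, 20))  # $22.95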

Admittedly, this does look more complicated than a fixed monthly cost, but its complexity really emerges from its flexibility, which does leave a lot of room for optimizing costs. Let’s see how.

Step 1: “Divide and conquer” the monthly license fee!

Got multiple machines to back up? Congratulations! Unlike most other backup services, which charge a fee per machine, ZCB allows a single license to be used to protect an unlimited number of systems. So if you have, say, 5 or 10 machines to be backed up, the fixed monthly cost per backed-up system becomes insignificant. (Just be aware that machines sharing a ZCB license can potentially access each other's backup data - although the use of encryption can alleviate this potential privacy issue.)

Step 2: Optimize the usage based fee!

The usage based fee with ZCB simply means you pay for data storage and data uploads. Thus, optimizing this fee can involve two steps:

Step 2.1: Optimize your total backup size

Let’s first try to see how much data you really need to backup and how to shrink the size of backup media to store the backed up data. ZCB offers following options here:

  • Carefully choose what data needs to be backed up: While backing up applications such as Exchange with ZCB, you can select specific datastores instead of all datastores. For file system backups, you entirely control what gets backed up (ZCB does NO automatic selection of *.mp3, *.jpg files etc.) and you can also specify an “exclude list” to skip backing up large user files by mentioning patterns such as *.mp3 or *.mov. This point may look obvious, but doing this is not easily allowed by many cloud backup applications which attempt to maximize your backup data size, for obvious reasons ;).

    Figure 1: Exclude list

  • Use backup levels: Incremental and differential backups contain only the data that changed since a previous backup and hence reduce backup size. Use incremental and differential backups judiciously to reduce data size while still adhering to your backup strategy.

    Figure 2: Differential backups – backup changed data since last full backup

    Figure 3: Incremental backups – backup changed data since any last backup

  • Choose backup frequency: How often you back up directly impacts your total backup data size, so choose a frequency that fits your backup requirements while keeping the total data size manageable. With ZCB you can run only manual backups (whenever you want) or use its scheduling options to back up anywhere from every 15 minutes to once a year on a specific date.

    Figure 4: Choose backup frequency

  • Enable compression: Depending on your data type (documents and text files are more compression friendly), enabling compression may shrink your storage requirement by about 10-50%. Here is a figure which summarizes all of these ZCB options (a small sketch tying them together follows the figure):

    Figure 5: Summary of all options to optimize total backup size
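
Here is the promised sketch of how exclusions, backup levels, backup frequency and compression combine to determine what you upload in a month. The file names, change rate and compression ratio are hypothetical numbers for illustration; plug in your own:

    import fnmatch

    EXCLUDE_PATTERNS = ["*.mp3", "*.mov"]   # as in Figure 1

    def is_excluded(path):
        """Mimic an exclude list: skip files matching any pattern."""
        return any(fnmatch.fnmatch(path, p) for p in EXCLUDE_PATTERNS)

    # Hypothetical file set, with sizes in GB.
    files = {"report.docx": 0.2, "song.mp3": 5.0, "db_dump.bak": 20.0}
    full_gb = sum(gb for name, gb in files.items() if not is_excluded(name))

    # Sample strategy: one monthly full plus 29 daily incrementals,
    # each ~2% of the full size, everything compressed by ~30%.
    daily_change = 0.02
    compression = 0.70   # fraction of bytes remaining after compression
    upload_gb = (full_gb + 29 * full_gb * daily_change) * compression

    print(f"Full: {full_gb:.1f} GB, uploaded per month: {upload_gb:.1f} GB")
    # Full: 20.2 GB, uploaded per month: 22.3 GB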

Step 2.2: Optimize how much cloud storage is used to store backup data

Now that you have optimized the total backup data size, let’s see how you can reduce the cloud storage needed to keep it. Here are your options with ZCB:

  • Blend cloud storage with local storage: ZCB allows you to store all or some of your backup data on local or network storage. For example, you may choose to send only certain full backups to cloud storage while using local/network disks for your primary, more frequent backups. Below is an example:

    Figure 6: A sample backup strategy to minimize cloud storage (only monthly backups go to the cloud; all other backups go to local/network storage)

  • Judiciously choose the cloud data retention policy: ZCB gives you complete control over the retention period for your backup data, so you can adopt as aggressive a retention policy as your backup strategy allows, such as “retain full backups for 2 weeks and incremental backups for 2 days”.
  • Monitor, monitor and monitor: Monitor your cloud usage regularly and purge old backup runs you no longer require. For monitoring, you can use your Amazon bills, the ZCB Global Dashboard and the jets3t tool. To purge old backup data, click File > Purge Backup Runs Before and select a date; ZCB will delete all backup runs before it. Do note that deleting data which later backup runs depend on (such as deleting full backups while retaining incremental/differential backups) may make those dependent backups useless for any future restoration; the sketch after Figure 8 illustrates a purge that respects this dependency.

    Figure 7: Purging old backup data which is no longer required

  • Exploit the ZCB free tier: ZCB offers 5 GB of free cloud storage and uploads in each of the 5 supported Amazon S3 regions, so you can use up to 25 GB of cloud storage across all 5 regions completely free! You can scatter your data across the regions to fully exploit this. (With two more AWS regions supported in ZCB 4.1, the free tier will soon grow to 35 GB, making this option even more effective!)

    Figure 8: The ZCB free tier
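
As promised above, here is a hedged sketch of a purge that respects backup dependencies. The catalog structure is hypothetical (ZCB manages this internally through its UI), and it is deliberately simplified:

    from datetime import date

    # Hypothetical catalog of backup runs: (run_date, level).
    runs = sorted([
        (date(2012, 11, 1), "full"),
        (date(2012, 11, 8), "incremental"),
        (date(2012, 12, 1), "full"),
        (date(2012, 12, 8), "incremental"),
    ])

    def safe_purge(runs, cutoff):
        """Keep runs on/after `cutoff`, plus the older full backup that the
        earliest surviving non-full run still depends on. (Simplified: a real
        incremental also depends on every incremental since its full.)"""
        keep = [r for r in runs if r[0] >= cutoff]
        if keep and keep[0][1] != "full":
            older_fulls = [r for r in runs if r[0] < cutoff and r[1] == "full"]
            if older_fulls:
                keep.insert(0, older_fulls[-1])   # restore the broken chain
        return keep

    # Purging before Dec 5 keeps the Dec 8 incremental AND its Dec 1 full.
    print(safe_purge(runs, date(2012, 12, 5)))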

Quite a handful of ways to optimize costs, isn’t it? And perhaps the best part is that since both ZCB and its pricing model are super-flexible, the above is not even an exhaustive list!

Are you a ZCB user? If yes, do consider these steps and let us know if/how they worked for you. And if you are not, I’m very curious to know how you are optimizing costs with your current solution.

Have a “cost effective” new year!

-Nik

Drop the box and start backing up!

Tuesday, November 22nd, 2011

Okay, first let me say this: I love Dropbox and, like many of you, depend on it each day to seamlessly access my important files from office/home/shared computers and from my cell phone. And ever since Dropbox released its developer APIs, an increasing number of innovative applications (see here and here for a few examples) have come to the fore that extend Dropbox beyond its “native” features of syncing, sharing and collaboration.

This is great but creates a potential problem. With all this excitement it is easy to get carried away and think of using Dropbox to solve a problem it was never designed to solve: robust cloud backup. Even at a conceptual level, classic backup tools such as Zmanda Cloud Backup (ZCB) and sync-and-share tools such as Dropbox solve very different business needs. To most backup administrators it would seem outlandish to even suggest that one can be used in place of the other (a Silicon Valley system administrator I tossed this idea to frowned upon it and found the comparison so illogical that he spent a few seconds just deciding where to begin his explanation!).

And yet over the last few months, roughly since Dropbox started gaining mass acceptance, we’ve been seeing this confusion pop up among some of our prospective users. Thanks to the (well-deserved) widespread attention Dropbox has gathered recently, such users would compare ZCB with Dropbox for solving their data backup problems. So far, to clear up the matter, we have largely just reminded them of the fundamentals of disaster recovery and explained how Dropbox is an excellent tool to share and synchronize data but a very primitive tool for data backups. I can’t say how far we’ve succeeded in conveying this, but I know some of them indeed saw our point (they became our customers!).

But this post became unavoidable, since the plot has thickened with the recent introduction of Dropbox for Teams. With this latest offering, Dropbox now consciously targets businesses by offering them huge shared storage (1 TB) along with some administrative tools to manage the service. Not a bad idea, really. The problem, however, is that to sell SMBs this much storage, Dropbox now seems to be telling them to use it for data backups, something it has never claimed to do well.

So let’s scratch the surface a bit to see what Dropbox is and what it can or can’t back up.

At the outset, let’s see what problems Dropbox has been designed to solve and how data backup was not one of them. This is how Wikipedia defines Dropbox:


Dropbox is a Web-based file hosting service operated by Dropbox, Inc. that uses cloud computing to enable users to store and share files and folders with others across the Internet using file synchronization.

This is what it really is. You give Dropbox some files which you want to share and it laps them up, stores them on its cloud storage and shares them among multiple Dropbox clients:

Dropbox at work

Source: http://www.dropbox.com/static/images/install_graphic.gif

And when any of your files changes on any shared machine, the change is instantaneously replicated across all the shared devices. So what’s the secret sauce? Well, a steadfast decision process that keeps things simple for syncing and sharing user files. See one such instance of decision making on this page.

On the other hand, a true backup solution, such as ZCB, exists to ensure that all your data gets backed up regularly and that you can go back to any backed-up state of your machine when the sky comes crumbling down. This may sound similar, so let’s see why this goal is not achievable with Dropbox:

  1. Completeness: At a high level, the data on your computer can be classified into the following categories:
    1. User files: Independent files such as documents, presentations and spreadsheets created by users for their official or personal work.
    2. File system/interlinked files: Your entire directory structure (such as D:\), a special directory such as “My Documents”, or a set of inter-linked files, for example a bunch of website files or a spreadsheet with embedded images or macros.
    3. Application data: Data created and used by business applications such as SQL Server or Outlook. These can be databases, configuration files, temporary files, etc., generally created in the application’s installation directory. These files are also “open” while the application is running.
    4. Applications: Binaries and configuration files of installed applications such as Microsoft Office and the Adobe PDF suite.
    5. Operating system and system configuration: The installed operating system, its configuration (“System State” in Windows) and other system information such as the partition table.

    Looking at the above, it is obvious that Dropbox can only be considered for data in the first two categories. Even in the second category, some special folders (e.g. C:\Program Files) can’t be put inside your Dropbox folder at all, and for those that can be, you are likely to run into problems during restores. With many interlinked files, how are you going to find a logically consistent set of files as it existed at a particular point in time?

    A true backup solution such as ZCB, on the other hand, backs up almost all the above categories of data (ZCB backs up the Windows system state, though not the operating system itself or the boot loader/partitioning information), and the backup archives represent logical, consistent states at particular points in time.

  2. Modification/deletion of the original copy of data: A true backup solution never modifies the original copy of data, let alone deletes it. In fact, even changing a file’s metadata (archive bit, modification time, etc.) has long been considered unacceptable by many backup administrators, since that may interfere with other installed applications.

    But since the primary goal of Dropbox is to “synchronize” data across multiple machines, it will do whatever is necessary to accomplish that goal. So if a file gets accidentally deleted or corrupted on one system, Dropbox will gleefully and promptly propagate that accident to all the shared machines. This is obviously a serious problem, and hence in its paid versions Dropbox offers an “unlimited undo history” feature to let you undelete files. Though this surely helps, from a disaster recovery standpoint it is still a risky situation, since it means you have lost all your local copies and now have only one remaining copy of your original data. What’s worse, that copy is only available on the cloud, so if you need it when you have no or poor internet connectivity, you are out of luck.

    On the other hand, a true cloud backup solution such as ZCB supports smart redundancy options where you can keep backup data on local as well as cloud storage. Since you then have 3 copies of your data (the original + 2 separate copies), even if you accidentally delete the original you still have two redundant copies to restore from.

  3. Security: The tricky thing about security is that it’s like insurance: you may not care about it in the steady state, but it can be catastrophic when something goes wrong. And security has been the number one reason why Dropbox is still unwelcome in many enterprises today. Some issues:
    1. True data privacy: Dropbox encrypts your data on the Amazon S3 cloud using an encryption key that is unique to your Dropbox account but known to Dropbox. This means two things. First, your data is not truly private, as Dropbox personnel can potentially see it (of course, we believe this is unlikely). Second, you can’t have any data privacy between two of your users sharing the same Dropbox account.

      The only way out here would be to use a separate file/volume-level encryption tool on top of Dropbox (such as TrueCrypt). But in addition to burdening your users with new encryption/decryption workflows, this would most probably also make Dropbox’s synchronization inefficient, defeating the whole purpose of using Dropbox in the first place. If you are indeed thinking of going down this path, I recommend reading the commenters’ experiences on this blog for the gory details of such problems.

      In comparison, a true backup solution like ZCB offers asymmetric encryption with user-generated certificates, making it virtually impossible for anyone else to see your encrypted data. (A minimal sketch of how such a scheme works appears after this list.)

    2. The disadvantage of being a public “data sharing” service: Dropbox was designed to support data exchange among multiple devices and multiple users over the internet. You can imagine that such a service needs somewhat relaxed rules when it comes to authentication, access rules, open ports, etc. Dropbox has already had its share of such issues; see this page and this page for examples.

    Again, in contrast, a true backup application such as ZCB has much tighter security mechanisms. It can securely encrypt your data with user-generated keys as soon as it is backed up and send it over an SSL tunnel to the cloud, where access is protected by multiple layers of authentication. This ensures that your backup data is safe and secure irrespective of its location, on local disk or in the cloud.

  4. Flexibility in choosing a data retention policy: The retention policy is a very important decision variable in your disaster recovery plan, as it determines the oldest point in time you can restore to and directly affects your storage costs. But since Dropbox has the “unlimited undo history” feature, why worry about this at all? My doubts about the long-term sustainability of a truly “unlimited” deleted-file history notwithstanding, there are at least two reasons why data retention is still an issue with Dropbox:
    1. There is no automatic management of your storage quota, so you need to delete older files manually to free up space for newer data. With multiple users working on your shared data, won’t it be challenging to identify which data is too old and delete it by hand? Unless, of course, you buy a storage quota several times your actual requirement, so you never have to delete anything!
    2. In addition, many organizations must abide by data storage laws which stipulate the geographical location where data may be stored and even the maximum time customer data can be retained by a business. You don’t have any such control with Dropbox.
  5. Scheduling uploads to make them efficient and unobtrusive: One key issue for many businesses considering cloud backup is the lack of adequate internet bandwidth. During normal business hours there is only so much bandwidth you can devote to data backups. This is why many administrators like to schedule backup uploads to run during idle times such as weekends.

    Telling Dropbox when to sync is not possible, and even if it were, doing so would defeat the whole purpose of such a sync tool. Yet another problem (feature!) of Dropbox is that it immediately syncs every change to your data. So if you make frequent changes to your files during the day, each of them will be synced across all your devices, consuming bandwidth even though you may have wanted just one copy of the file at the end of the day. Again, for syncing and sharing this “churn” is a necessity and one of the core benefits of Dropbox, but for backups it is nothing but “noise”, wasteful and disruptive to your normal business network traffic.
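
As promised, here is a minimal sketch of a hybrid asymmetric scheme, written with Python’s cryptography package. It is a generic illustration of the principle, not ZCB’s actual implementation; in practice the key pair would come from the user’s certificate rather than being generated inline:

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Key pair generated here only to keep the sketch self-contained.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    archive = b"contents of a backup archive"

    # Hybrid scheme: a fresh AES key encrypts the (large) archive,
    # and the RSA public key wraps that AES key.
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    aes_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(aes_key).encrypt(nonce, archive, None)
    wrapped_key = public_key.encrypt(aes_key, oaep)

    # Only the private-key holder can unwrap the AES key and decrypt,
    # so whoever stores the ciphertext learns nothing about the data.
    recovered = AESGCM(private_key.decrypt(wrapped_key, oaep)).decrypt(
        nonce, ciphertext, None)
    assert recovered == archive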

As you can see, the above list is by no means exhaustive. The deeper you go, the more such differences pop up. But is that surprising? Given that Dropbox was conceived, designed and implemented to solve the need for syncing and sharing, not robust cloud backup, isn’t trying to use it for the latter more of a “hack” than a true solution?

And did I mention that we have a webinar coming up on Dec 7th, 2011, in which we will discuss how to get the maximum out of your ZCB installation and will also take up some of the above issues? Please register for this webinar here. Hope to see you then!

-Nikunj

The next generation of cloud backup: ZCB 4 is here!

Wednesday, August 31st, 2011

Today we announced the immediate availability of Zmanda Cloud Backup (ZCB) 4, our comprehensive backup solution for Windows servers, desktops and laptops, which lets you back up your data and systems to the cloud. Over the last few weeks, ZCB 4 was extensively tested by many end customers and resellers in a limited beta program. We received great feedback that led to bug fixes and many improvements. Many thanks to everyone who participated!

With its approach to backing up to the cloud, ZCB 4 is a forward-looking step. Yes, we are very confident on this point, but that confidence rests on the feedback of thousands of Zmanda customers.

Because we have always kept an eye on the solutions available on the market and on what users want, we were able to identify gaps in several backup products and close them in ZCB 4:

  1. Flexibility in choosing where backup data is stored:

    Cloud backup users have different needs.

    Some want an extra layer of protection for their data and therefore store backup copies both on a local disk and in the cloud.

    For other users, the cloud is meant to be their primary and only storage location, so they need a solution that backs up their data and stores it directly in the cloud. That was in fact ZCB’s very first use case: storing a backup directly in the cloud. Previously, backup copies were also staged on local disks; one of the improvements you will find in ZCB 4 is the ability to create a backup directly in the cloud without consuming any local storage space.
    This enhancement lets you back up your data reliably even when your local disk capacity is low!

    So you can either store your backups locally and then upload them to the cloud, afterwards deleting or keeping the local copies, or create a backup directly in the cloud, depending on which of the two use cases above applies to you.


    Cloud Backup new operation

  2. Improved transfer speeds: Users who have either a large amount of data to back up or limited internet bandwidth face a fundamental problem: how do you move data to and from the cloud within the required time windows? We also came across cases where users had the necessary bandwidth available, but the backup software was either unable to use it fully, or the backup vendor imposed limits on upload and download speeds.

    ZCB sets no limits on transfer speed and always strives to make maximum use of all available resources. With ZCB 4 we have made it possible to use multiple parallel streams for uploads and downloads. This feature allows several simultaneous connections to the Amazon S3 cloud, letting you finally use the bandwidth you have always had available. And, true to our promise of extending flexibility to users, we have made this feature fully configurable.


    Cloud Backup multithreading

    By default, we use three simultaneous connections for data transfer. If you wish, you can change this value to experiment and find out what works best in your environment; a generic sketch of the approach appears after this list. A higher number of streams can be beneficial if you have spare bandwidth and CPU resources for sending and receiving data.

  3. Manageability and ease of use: We believe that ease of use is at the heart of backing up data to the cloud. Since important decisions are involved here, when, where and how your data and systems are backed up, users must have full freedom to make them. At the same time, configuring and monitoring backup operations should not be too difficult. So in ZCB 4 we have redesigned our user interface and made several improvements to make it more intuitive and easier to use. Please take a look at the new ZCB screenshots.
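
Here is the generic sketch mentioned above: chunked uploads over a small pool of parallel connections. It illustrates the principle only; upload_chunk is a hypothetical stand-in for an S3 multipart-upload request, not ZCB code:

    from concurrent.futures import ThreadPoolExecutor

    CONNECTIONS = 3   # ZCB's default number of simultaneous connections

    def upload_chunk(chunk_id, data):
        """Hypothetical stand-in for one S3 multipart-upload request."""
        # ... send `data` over one HTTPS connection to Amazon S3 ...
        return chunk_id

    # Split the backup archive into 1 MB chunks and upload them in parallel.
    archive = b"\x00" * (10 * 1024 * 1024)
    chunk = 1024 * 1024
    chunks = [archive[i:i + chunk] for i in range(0, len(archive), chunk)]

    # More workers help only while spare bandwidth and CPU remain.
    with ThreadPoolExecutor(max_workers=CONNECTIONS) as pool:
        done = list(pool.map(upload_chunk, range(len(chunks)), chunks))
    print(f"uploaded {len(done)} chunks over {CONNECTIONS} connections")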

In addition to the features above, ZCB 4 offers the following:

  • Full localization in German
  • Backup/restore of selected databases on SQL Server and Exchange Server
  • Differential backup of SharePoint servers
  • Parallel operations across multiple backup sets
  • Comprehensive reports across multiple backup sets
  • Hundreds of further improvements

ZCB 4 brings to market a comprehensive, flexible and practical solution for backing up data and systems, both to local media and to the cloud.

We are working hard on the next ZCB version and will be making some exciting announcements shortly. If you have any questions or a suggestion for us, please contact tatjana@zmanda.com!