Archive for the ‘Chander Kant’ Category

Amanda Enterprise 3.3.5 Release with Hyper-V and Vaulting Support

Friday, December 6th, 2013

Our objective with Amanda Enterprise is to continue to widen its umbrella to protect the most common IT infrastructure elements deployed in the modern data center. We also seek to support the various workflows for secondary and tertiary data desired by IT administrators. The new release of Amanda Enterprise (3.3.5) brings two key features toward these objectives.

Hyper-V Support

Hyper-V is a native hypervisor that is becoming increasingly popular, especially in Windows Server-centric IT shops. Amanda Enterprise now supports image-level backup of live VMs running on Hyper-V. We support Hyper-V installed as a role on either Windows Server 2012 or Windows Server 2008.

One license for the Zmanda Windows Client and Zmanda Client for Microsoft Hyper-V enables you to back up an unlimited number of VMs on one hypervisor. You run the Zmanda Windows Client on the hypervisor and configure backups of VMs via the Zmanda Management Console (ZMC, the GUI for Amanda Enterprise). ZMC discovers the VMs running on each hypervisor, and you can choose whether to back up all VMs or a selected set on each hypervisor.

A full consistent image of the VM is backed up to media of your choice - disk, cloud storage or tape.

Of course, you can continue to install and run separately licensed Zmanda Clients within your virtual machines (guests) to perform finer-grained backups. This enables you to recover from smaller disasters (like accidental deletion of a file) without recovering the whole VM. You can consider having different schedules for your file-level or application-level backups and your VM-level backups - e.g. file-level backups may be performed daily and VM-level backups each weekend.

Vaulting Support

While many of our customers have already implemented disk-to-disk-to-cloud (d2d2c) or disk-to-disk-to-tape (d2d2t) data flows using our products, we have now formally introduced Vaulting as a feature supported via the Zmanda Management Console. There are two key use cases supported by this feature:

  • Backup to disk and then vault to lower-cost disk or remote disk (via NFS)
  • Backup to disk and then vault to cloud or tape for offsite storage

Our implementation allows very flexible media duplication (e.g., you can vault from multiple disk volumes to a smaller number of physical tapes). You can also choose to vault only the latest full backups for off-site archiving purposes. All storage devices supported by Amanda Enterprise can be a source or destination for vaulting (e.g., one backup set can vault to a tape library while another backup set on the same backup server vaults to cloud storage). The backup catalog of Amanda Enterprise keeps track of the various backup copies and allows restoration from either the backup media volumes or the vaulted volumes.

Note that vaulting is most useful when you want to duplicate your backup images. If you want a data flow that periodically moves your backups, e.g. from disk to cloud, you can simply use the staging area in Amanda Enterprise and treat the storage on the staging area as your first backup location.

In addition to the above new features, Amanda Enterprise 3.3.5 includes several bug fixes and usability improvements. If you are interested in learning more, you can join one of our upcoming webinars (the next one is scheduled for December 18th): http://zmanda.com/webinars.html

If you are already on the Amanda Enterprise platform, you can discuss upgrading to 3.3.5 with your Zmanda support representative. If you are interested in evaluating Amanda Enterprise for your organization, please contact us at zsales@zmanda.com

Amanda Enterprise 3.3 brings advanced backup management features

Wednesday, March 20th, 2013

Built on extensive research and development, combined with active feedback from a thriving open source community, Amanda Enterprise (AE) 3.3 is here! AE 3.3 has significant architecture and feature updates and is a robust, scalable and feature-rich platform that meets the backup needs of heterogeneous environments, across Linux, Windows, OS X and Solaris-based systems.

As we worked to further develop Amanda Enterprise, it was important to us that the architecture and feature updates would provide better control and management for backup administration.  Our main goal was to deliver a scalable platform which enables you to perform and manage backups your way.

Key enhancements in Amanda Enterprise include:

Advanced Cloud Backup Management: AE 3.3 now supports use of many new and popular cloud storage platforms as backup repositories. We have also added cloud backup features to give users more control over their backups for speed and data priority.

Figure: Backup Storage Devices Supported by Amanda Enterprise 3.3

Platforms supported now include Amazon S3, Google Cloud Storage, HP Cloud Storage, Japan’s IIJ GIO Storage Service, and private and public storage clouds built on OpenStack Swift. Notably, AE 3.3 supports all current Amazon S3 locations, including various locations in the US (including GovCloud), EU, Asia, Brazil and Australia.

Figure: Cloud Storage Locations Supported by Amanda Enterprise

In addition to new platforms, you can now control how many parallel backup (upload) or restore (download) streams you want, based on your available bandwidth. You can even throttle upload or download speeds at the backup set level; for example, you can give higher priority to the backup of your more important data.

Optimized SQL Server and Exchange Backups: If you are running multiple SQL Server or Exchange databases on a Windows server, AE 3.3 allows selective backup or recovery of an individual database. This enables you to optimize the use of your backup resources by selecting only the databases you want to back up, or to improve recovery time by enabling recovery of a selected database. Of course, the ability to do an express backup and recovery of all databases on a server is still available.

To further streamline configuration, the Zmanda Management Console (the GUI for Amanda Enterprise) now automatically discovers the databases on a specific Windows server, allowing you to simply pick and choose the ones you want to back up.

Improved Virtual Tape and Physical Tape Management: Our developers have done extensive work in this area to enhance usability, including seamless management of available disk space. With extensive concurrency added to the Amanda architecture, you can eliminate the staging disk for backup-to-disk configurations: AE 3.3 writes parallel streams of backups directly to disk without going through the staging disk. You can also optionally configure a staging disk for backup to tapes or clouds to improve fault tolerance and data streaming.

Better Fault Tolerance: When backing up to tapes, AE 3.3 can automatically withstand the failure of a tape drive. Simply configure a backup set to use more than one tape drive in your tape library; if any of those drives is unavailable, AE will automatically start using one of the available drives.

NDMP Management Improvements: AE 3.3 allows selective restore of a file or a directory from a Network Data Management Protocol (NDMP) based backup. Now, you can also recover to an alternative path or an alternative filer directly from the GUI. Support for compression and encryption for NDMP-based backups has also been added to the GUI. Plus, in addition to devices from NetApp and Oracle, AE now also supports NDMP-enabled devices from EMC.

Scalability, Concurrency and Parallelism: Many more operations can now be executed in parallel. For example, you can run a restore operation while active backups are in progress. Parallelism has also been added to various operations, including backup to disk, cloud and tape.

Expanded Platform Support: Our goal is to provide a backup solution which supports all of the key platforms deployed in today’s data centers. We have updated AE 3.3 to support the latest versions of Windows Server, Red Hat Enterprise Linux, CentOS, Fedora, Ubuntu, Debian and OS X. With AE, you have the flexibility to choose the platforms best suited for each application in your environment, without having to worry about the backup infrastructure.

Want to Learn More?

There are many new enhancements to leverage! To help you dive in, we hosted a live demonstration of Amanda Enterprise 3.3. The session provides insights on best practices for setting up a backup configuration for a modern data center.

Zmanda Recovery Manager for MySQL - What’s New in Version 3.5

Wednesday, March 20th, 2013

As we continue to see MySQL being implemented in bigger and more challenging environments, we are working to ensure Zmanda Recovery Manager for MySQL (ZRM) matches this growth and provides a comprehensive, scalable backup management solution for MySQL that can easily integrate into any network backup infrastructure.

The latest release of ZRM for MySQL is a significant next step, bringing disk space and network usage optimization and enhanced backup reporting, along with simplified management to help configure backups quickly and intelligently. Additionally, ZRM for MySQL 3.5 now supports backup of MySQL Enterprise Edition, MySQL Community Edition, SkySQL, MariaDB, and MySQL databases running on the latest versions of Red Hat Enterprise Linux, CentOS, Debian, Ubuntu and Windows – giving you an open choice for your MySQL infrastructure, now and in the future, with confidence that your backup solution will continue to work.

Here is a look at the key updates in ZRM:

Optimization of Disk Space: We’ve implemented streaming for various backup methods so that you don’t need to provision additional disk space on the systems running MySQL servers. This allows you to take hot backups of your MySQL databases without having to allocate additional space on the system running MySQL; backup data is stored directly on the ZRM server.

Optimization of Network Usage: We have implemented client-side compression for various backup methods so you can choose to compress backup data even before it is sent to the ZRM server. Of course, you also have the choice to compress on the backup server; for example, if you don’t want to burden the MySQL server with backup compression operation.

Enhanced Backup Reporting: Backup is often where IT meets compliance. ZRM allows you to generate backup reports for all of the MySQL databases in your environment. With the latest version, now you can generate unified backup reports across backup sets too.

Simplified Management: One of the key features of ZRM is that it hides the nuances of the various MySQL backup methods behind an easy-to-use GUI, the Zmanda Management Console (ZMC). With the new release, ZMC brings new features for applicable backup methods, such as parallelism and throttling. You will also find several tool tips to help you configure your backups quickly and intelligently, without having to dig through documentation on specific backup methods.

Broad Platform Coverage: MySQL gets implemented in various shapes and forms on various operating systems. We continue to port and test all variants of MySQL on all major operating system platforms. ZRM 3.5 supports backup of MySQL Enterprise Edition, MySQL Community Edition, SkySQL and MariaDB. Backup of MySQL databases running on latest versions of Red Hat Enterprise Linux, CentOS, Debian, Ubuntu and Windows is supported.

Seamless Integration with Backup Infrastructure: ZRM is architected for the MySQL DBAs. In order for DBAs to integrate and comply with the overall backup methodology of their corporate environment, we have made sure that ZRM can integrate well into any of the network backup infrastructures being used. While ZRM is already known to work well with almost all network backup environments, we have completed specific integration and testing of ZRM 3.5 with Amanda Enterprise, Symantec NetBackup, and Tivoli Storage Manager.

If you are putting together a new MySQL based environment, or looking to add a well managed backup solution to your existing MySQL infrastructure, our MySQL backup solutions team is ready to help: zsales@zmanda.com

Zmanda - A Carbonite Company

Wednesday, October 31st, 2012

I am very excited to share that today Zmanda has combined forces with Carbonite - the best known brand in cloud backup. I want to take this opportunity to introduce you to Carbonite and tell you what this announcement means to the extended Zmanda family, including our employees, customers, resellers and partners.

First, we become “Zmanda - A Carbonite Company” instead of “Zmanda, Inc.” and I will continue to lead the Zmanda business operations. Carbonite will continue to focus on backing up desktops, laptops, file servers, and mobile devices. Zmanda will continue to focus on backup of servers and databases. Carbonite’s sales team will start selling Zmanda Cloud Backup directly and through its channels. Since Carbonite already has a much larger installed base of users and resellers, our growth should accelerate considerably next year which will allow us to innovate at an even higher level than before. Zmanda’s direct sales team and resellers will continue to offer the Zmanda products they respectively specialize in.

I’ve gotten to know Carbonite over the last few months and I am very impressed with their organization and am looking forward to joining the management team. One of the things that attracted me to Carbonite was its commitment to customer support. Carbonite has built a very impressive customer support center in Lewiston, Maine, about a two hour drive north of their Boston headquarters, where it now employs a little over 200 people. We’ll be training a technical team in Maine to help us support Zmanda Cloud Backup, and of course we’ll also be keeping our support teams in Silicon Valley and in Pune, India for escalations and support of Amanda Enterprise and Zmanda Recovery Manager for MySQL. Please note that at this point, all our current methods of connecting with customer support including Zmanda Network, will continue as is.

Another thing that makes Carbonite a good fit for us is its commitment to ease-of-use. Installing and operating Carbonite’s backup software is as easy as it gets. We share this goal, and we hope to learn a thing or two from the Carbonite team on this front as we continue to build on an aggressive roadmap across all our product lines.

We’ve worked hard to make Zmanda products as robust as possible. Our technologies, including our contributions to the open source Amanda backup project, have been deployed on over a million systems worldwide. Amanda has hundreds of man years of contributed engineering. We believe it is one of the most solid and mature backup systems in the world. Much of what we have done for the past five years has been to enhance the open source code and provide top notch commercial support. Carbonite, too, understands that being in the backup business requires the absolute trust of customers and I believe that every day the company works hard to earn that trust: it respects customer privacy, is fanatical about security, and has made a real commitment to high quality support.

I and the other Zmanda employees are very enthusiastic and proud to be joining forces with Carbonite. We look forward to lots of innovation in the Zmanda product lines next year and hope that you will continue to provide us with the feedback that has been so helpful in the evolution of our products.

How swift is your Swift? Benchmarking OpenStack Swift.

Monday, October 8th, 2012

The OpenStack Swift project has been developing at a tremendous pace. Version 1.6.0 was released in August, followed by 1.7.4 (Folsom) just two months later! These two releases implement many important features, for example optimizations for using SSDs, object versioning, StatsD logging and much more; many of these features have significant implications for performance planning for cloud builders and operators.

As an integral part of deploying a cloud storage platform based on OpenStack Swift, benchmarking a Swift cluster implementation is essential before the cluster is deployed for production use. Preferably the benchmark should simulate the eventual workload that the cluster will be subjected to.

In this blog, we discuss the following Swift benchmarking concepts:
(1) Benchmark dimensions for a Swift cluster: performance, scalability and degraded-mode performance (e.g. when hardware and software failures happen).
(2) Sample workloads for a Swift cluster

Benchmark Tools for Swift

There are currently two Swift benchmark tools available: swift-bench and COSBench.

swift-bench is a command-line benchmark tool that ships with the Swift distribution. Recently, we improved swift-bench to allow for random object sizes and better usability.

COSBench is a fairly new web-based benchmark tool, led by researchers at Intel. Fortunately, we obtained a trial version of COSBench. Based on our initial experience with COSBench, we believe it is a very helpful tool, and it may become the de facto Swift benchmarking tool in the future.

Benchmark Dimensions

Dimension 1 – Performance

The performance dimension measures how the Swift cluster performs under a given load. The performance metrics can be specified in many ways. In most cases, cloud operators will be interested in the following four performance metrics:

(1) The average throughput (number of operations per second)
(2) The average bandwidth (MB/s)
(3) The average response time of all requests
(4) The response time for a certain percentile of requests (e.g. the 95th percentile); see the sketch below
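As a concrete illustration of these four metrics (our own sketch, not part of swift-bench or COSBench), the following Python snippet computes them from a hypothetical per-request log:

```python
# Minimal sketch: computing the four metrics from a per-request log.
# Each record is a hypothetical (bytes_transferred, response_time_seconds) pair.

def summarize(requests, duration_s):
    n = len(requests)
    total_bytes = sum(b for b, _ in requests)
    times = sorted(t for _, t in requests)

    throughput = n / duration_s                      # (1) operations per second
    bandwidth_mb = total_bytes / duration_s / 2**20  # (2) MB/s
    avg_rt = sum(times) / n                          # (3) average response time
    p95_rt = times[int(0.95 * (n - 1))]              # (4) nearest-rank 95th percentile
    return throughput, bandwidth_mb, avg_rt, p95_rt

# Example: three requests observed over a 2-second window
print(summarize([(1048576, 0.08), (2097152, 0.15), (524288, 0.05)], 2.0))
```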

To measure the performance, we first need to populate a Swift cluster with some data (i.e. objects) to simulate an initial stage. The size of the initially loaded objects can be controlled by the inputs of the benchmark client. Subsequently, a pre-defined workload is executed against the Swift cluster while the performance is measured.

When measuring the performance, one key issue needs attention: we need to carefully adjust the number of threads, because it determines how much load the benchmark clients generate against the Swift cluster. Since we want to measure the performance of the Swift cluster when it is under load or saturated, we keep increasing the number of threads until the bandwidth/throughput becomes stable and the average response time starts to increase sharply.

As the number of threads increases, the benchmark client will get busier. We need to make sure that it has enough resources (CPU, memory, network bandwidth) and does not itself become the performance bottleneck.
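The thread ramp described above can be automated along the following lines. This is only a sketch: run_benchmark is a hypothetical hook that would drive swift-bench or COSBench with a given thread count and report (bandwidth, average response time), and the 10%/2x thresholds are arbitrary placeholders for "bandwidth flattens while latency spikes".

```python
# Hypothetical sketch: ramp up client threads until the cluster saturates.

def find_saturation(run_benchmark, start=8, factor=2, max_threads=1024):
    prev_bw, prev_rt = None, None
    threads = start
    while threads <= max_threads:
        bw, rt = run_benchmark(threads)   # returns (MB/s, avg response time in s)
        if prev_bw is not None:
            bw_gain = (bw - prev_bw) / prev_bw
            rt_growth = rt / prev_rt
            # Saturated: bandwidth nearly flat (<10% gain) while latency jumps (>2x)
            if bw_gain < 0.10 and rt_growth > 2.0:
                return threads, bw, rt
        prev_bw, prev_rt = bw, rt
        threads *= factor
    return threads, prev_bw, prev_rt      # never saturated within the sweep
```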

While the performance of the client software that connects with Swift (Cyberduck, cloud backup software, etc.) is an important factor in the overall usability of the storage cloud, the scope of this blog is the performance of the storage cloud platform itself.

Dimension 2 – Scalability

The scalability benchmark tests whether a Swift cluster can scale out gracefully by adding more servers and other resources. We conduct this benchmark as follows: we proportionally add more servers for each type of node in the Swift cluster; for example, we double the number of storage nodes and proxy nodes with the same hardware and software configurations. Then we run the same workloads and measure the performance. If a Swift cluster scales out nicely, its bandwidth/throughput will increase in proportion to the number of new servers added. Otherwise, the cloud operators should analyze which bottleneck prevents it from scaling well.

To simulate a real-world scenario, we need to test the scalability of a Swift cluster while it is running. As suggested by a blog from SwiftStack, cloud operators may consider adding new servers gradually in order to avoid the performance degradation caused by data movement between the existing and new servers. During the measurement, we want to observe: (1) whether the Swift cluster operates normally (i.e. no period of service disruption) and (2) the increase in performance as the new servers are added to the Swift cluster.

Dimension 3 – Degraded Mode Performance

Cloud operators will face hardware or software failures at some point. If their objective is to ensure that their clusters will perform at a certain level (e.g. abide by the performance SLA) even in the face of failures, they should benchmark their Swift cluster appropriately upfront.

The most straightforward way to measure the availability of a Swift cluster is to intentionally shut down some nodes and measure the number of errors (e.g. failed operations) and the performance degradation while Swift is running in degraded mode.

Several factors increase the complexity of benchmarking a degraded Swift cluster. Failures can happen at every possible system level: I/O devices, the OS, Swift processes or even the entire server, and the impact differs depending on where the failure occurs. So failure scenarios at all system levels need to be considered. For example, to simulate a disk failure, we may intentionally unmount the disk; to simulate a Swift process failure, we kill some or all Swift processes on a node; to simulate an OS or entire server failure, the server can be temporarily powered off; or a whole zone could be powered off (to simulate power failure of an entire rack of servers).

Putting the above considerations together, we notice that the total problem space for analyzing all failure scenarios can be very large for a large-scale Swift cluster. So, it is more practical to prioritize failure scenarios, for example by evaluating only the worst-case or most common scenarios first.

In our presentation at the coming OpenStack Summit, we will present our empirical results to show how a Swift cluster performs when the hardware failures occur.

Sample Workloads

The COSBench tool allows users to define a Swift workload based on two aspects: (1) the range of object sizes in the workload (e.g. from 1MB to 10MB), and (2) the ratio of PUT, GET and DELETE operations (e.g. 1:8:1).

The object sizes in a workload may follow certain distributions, for example uniform, Zipfian and others. At this point, based on our experience with COSBench, it assumes the object sizes are uniformly distributed within the pre-defined range. It also assumes all objects have an equal probability of being accessed by GET operations. It would be a good direction for COSBench to add more choices of distribution when users specify object sizes and access patterns.
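To make these two knobs concrete, here is a small illustrative generator (our own sketch, not COSBench code) that turns a size range and a PUT:GET:DELETE ratio into a stream of operations, using the same uniform-size assumption noted above:

```python
import random

def workload(size_range=(1 * 2**20, 10 * 2**20), ratio=(1, 8, 1), count=10):
    """Yield (operation, object_size) pairs for a PUT:GET:DELETE ratio of
    `ratio`, with object sizes drawn uniformly from size_range. Purely
    illustrative; not how COSBench itself is configured."""
    ops = ["PUT", "GET", "DELETE"]
    for _ in range(count):
        op = random.choices(ops, weights=ratio, k=1)[0]
        size = random.randint(*size_range)   # uniform within the range
        yield op, size

for op, size in workload():
    print(op, size)
```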

Here are some sample Swift workloads, combining object size and operation mix:

Small Objects (size range: 1KB-100KB)
  • Upload intensive: GET 5%, PUT 90%, DELETE 5%. Example: an online gaming hosting service, where game sessions are periodically saved as small files recording user profiles and game information in time-series order.
  • Download intensive: GET 90%, PUT 5%, DELETE 5%. Example: a website hosting service, where a newly published webpage receives a large number of read requests.

Large Objects (size range: 1MB-10MB)
  • Upload intensive: GET 5%, PUT 90%, DELETE 5%. Example: enterprise backup, where small files are compressed into large chunks of data and backed up to cloud storage; recovery and delete operations are occasionally needed.
  • Download intensive: GET 90%, PUT 5%, DELETE 5%. Example: an online video sharing service, where newly uploaded video clips generate heavy download traffic as people watch them.

Plus, the benchmark users are free to define their own favorite workloads based on the two inputs: range of object sizes and ratio between PUT, GET and DELETE operations.

We will discuss the above dimensions and benchmark workloads in detail in future blogs, as well as at our presentation at the OpenStack Summit in San Diego (presentation at 4:10PM on October 18th). We hope to see you there.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

Storing Pebbles or Boulders: Optimizing Swift Cloud for different workloads

Thursday, August 23rd, 2012

While many storage clouds are built as multi-purpose clouds (e.g. to store backup files, images, documents, etc.), a cloud builder may be tasked to build and optimize the cloud for a specific purpose. For example, a storage cloud made for cloud backup may be optimized for large object sizes and frequent writes, whereas a storage cloud built for storing images may be optimized for relatively smaller objects with frequent reads.

OpenStack Swift provides a versatile platform to build storage clouds for various needs. As we discussed in our last blog, a cloud builder can choose faster I/O devices for storing their Container database to enhance performance under some scenarios. However, a careful analysis is required to determine under what scenarios the investment in the faster I/O devices for the container DB makes sense. More broadly, we are interested in how to properly provision the Swift cloud for different workloads.

In the first part of this blog, we will focus on how to provision the I/O devices for the container DB. After that, we will generalize the discussion to how to provision the storage nodes under workloads that contain either small or large objects. We understand that in the real world, the object sizes in a workload may vary over a wide range. However, in order to study the broad question of provisioning the Swift cloud, it is instructive to consider two extreme workloads in which most objects are either pebble-sized or boulder-sized.

We will first present the experiments that show how to provision the I/O devices for the container DB under workloads that differ in object size.

Experimental Results

Workload Generator

As we did in our last blog, we use swift-bench as the workload generator to benchmark the Swift cloud in terms of PUT operations per second. We configured swift-bench for our experiments as follows:

object_size: we use 10KB or 1MB as the average object size to simulate two different workloads: (1) workloads in which the average object size is relatively small, and (2) workloads in which it is relatively large. Some real-world examples of small objects are PDF or MS Word documents and JPEG pictures, while backup or archive data is usually large in size. (Note that real production workloads may have an even larger average object size, but comparing Swift’s behavior for 10KB objects vs. 1MB objects provides useful insights to predict behavior as object sizes get larger. Also, an application like Amanda Enterprise will typically chunk archives into smaller objects before transferring them to the cloud.)

concurrency: we set this parameter to 500 in order to saturate the Swift cloud.

num_container: we use 10 containers in our experiments. This may e.g. imply that there are 10 users of this storage cloud.

num_put: when the object size is 10KB, we upload (PUT) 10 million such objects to the Swift cloud; when the object size is 1MB, we upload (PUT) 100K such objects. As discussed in [2], the performance of the container DB degrades (e.g. to 5-10 updates per second) when the number of objects in each container is on the order of millions. Since we have 10 containers, our target is to have 1 million objects in each container, so we set num_put to 10 million for 10KB objects. In order to have an equivalent total upload size, we set num_put to 100K when we upload 1MB objects.
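A quick back-of-the-envelope check (our own arithmetic, using round decimal units) shows why these two settings give an equivalent total upload size and hit the one-million-objects-per-container target:

```python
KB, MB, GB = 10**3, 10**6, 10**9            # decimal units, for a rough check
small_workload = 10_000_000 * 10 * KB       # 10 million PUTs of 10KB objects
large_workload = 100_000 * 1 * MB           # 100K PUTs of 1MB objects
print(small_workload / GB, large_workload / GB)   # 100.0 100.0 -> ~100 GB each
print(10_000_000 // 10, "objects per container in the 10KB case")  # 1000000
```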

Testing bench

Two types of EC2 instances are used to implement a small-scale Swift cloud with 1 proxy node and 4 storage nodes.

Proxy node: EC2 Cluster Compute Quadruple Extra large Instance (23 GB of memory, 33.5 EC2 Compute Units)

Storage node: High-CPU Extra large Instance (7GB of memory, 20 EC2 Compute Units)

Recently, AWS released new EBS volumes based on Provisioned IOPS, which let the AWS user specify the IOPS (from 100 to 1000) for each EBS volume attached to an EC2 instance. For example, an EBS volume with 1000 IOPS can achieve a maximum of 1000 IOPS (for a 16KB I/O request size) regardless of the I/O access pattern. So, a cloud builder can use an EBS volume with higher IOPS to simulate a faster I/O device.

As mentioned in our last blog, the current version of swift-bench only allows using 1 account, but an unlimited number of containers can be stored in that account. So, our benchmark is executed in the following sequence: log into 1 existing account, create 10 containers, and then upload 10 million or 100K objects (depending on the object size). In our experiments, we measure the upload (PUT) operations per second.

Two implementations of Swift cloud are compared: (1) Swift with 1000-IOPS based container DB (We call this 1000-IOPS Swift) and (2) Swift with 500-IOPS based container DB (We call this 500-IOPS Swift).

The 1000-IOPS Swift is implemented with 1 proxy node and 4 storage nodes. Each storage node attaches nine 1000-IOPS EBS volumes for storing objects, one 200-IOPS EBS volume for the account DB and one 1000-IOPS EBS volume for the container DB.

The 500-IOPS Swift is implemented with 1 proxy node and 4 storage nodes. Each storage node attaches nine 1000-IOPS EBS volumes for storing objects, one 200-IOPS EBS volume for the account DB and one 500-IOPS EBS volume for the container DB.

The proxy node has 10Gbps Ethernet, while the storage node has 1Gbps Ethernet.

Software Settings

We use OpenStack Swift version 1.6.1 and the authentication method on the proxy node is TempAuth. All proxy, container, account and object-related parameters are set to defaults, except: in proxy-server.conf, workers = 500; in account-server.conf, workers = 32; in container-server.conf, workers = 32 and db_preallocation = on; in object-server.conf, workers = 32.

The Swift-bench, proxy and authentication services run on the proxy node and we ensure that the proxy server is never the bottleneck of the Swift cloud. The account, container, object and rsync services run on the storage nodes.

The number of replicas in the Swift cloud is set to two and the main memory of each node is fully utilized for caching the files and data.

Benchmark results

Figure 1 shows the operation rate (operations per second on the Y-axis) of the PUT operation for the two Swift implementations over the benchmark window when the object size is 10KB. Overall, we notice that with the 10KB object size, the 1000-IOPS Swift achieves a higher operation rate than the 500-IOPS Swift, with a 68% higher operation rate observed by the time 10 million objects have been uploaded.

Figure 1: Comparing two Swift implementations when the object size is 10KB


To compare with Figure 1, we also plot the operation rate of the PUT operation when the object size is 1MB, as shown in Figure 2.

Figure 2: Comparing two Swift implementations when the object size is 1MB


In contrast with Figure 1, the two Swift implementations show the same performance when the object size is 1MB. Moreover, in Figure 1, while the 10KB objects are being uploaded, the performance of the two Swift implementations keeps decreasing from the first object upload onwards. However, when the object size is 1MB (see Figure 2), the performance of the two Swift implementations increases initially and then becomes stable.

We summarize the results in Figure 1 and Figure 2 as follows:

(1) For an upload workload that mostly contains small objects (e.g. 10KB in our test), it is good practice to use faster I/O devices for the container DB: because each small object is written to the object devices quickly, the container DB needs a faster I/O device to keep up with the rapid rate of small-object uploads.

(2) For an upload workload that mostly contains larger objects, using faster I/O devices for the container DB does not make much sense. The storage node spends more time storing the large objects to the I/O devices, so the container DB is updated relatively infrequently and there is no need to give it faster I/O devices.

Besides the discussion on how to provision the I/O device for the container DB, we also want to discuss how to provision other types of resources in the storage node for these two workloads. To this end, we also monitored the CPU usage, network bandwidth and the I/O devices (that are used for storing the objects) of the storage node during the runs of our benchmarks and summarize our observations below.

CPU: Compared to the case of uploading large objects, CPU usage is higher when small objects are being uploaded, for two reasons: (1) the object service is much busier handling the larger number of newly uploaded objects every second, and (2) the container service has to deal with more updates to the container DB. Thus, more CPU resource in the storage node is consumed when uploading small objects.

Network bandwidth: Uploading large objects consumes more network bandwidth. This can be verified from Figures 1 and 2: in Figure 1, when the 10 million objects are uploaded, the operation rate of the 1000-IOPS Swift is 361 and the object size is 10KB, so the total network bandwidth is about 3.5 MB/s. However, while uploading the large objects (see Figure 2), when the 100K objects are uploaded, the operation rate of the 1000-IOPS Swift is 120 and the object size is 1MB, so the total network bandwidth is around 120 MB/s.
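The arithmetic behind these bandwidth figures is simply the operation rate multiplied by the object size; a two-line check of the numbers quoted above:

```python
print(361 * 10 / 1024, "MB/s for 10KB objects at 361 ops/s")  # ~3.5 MB/s
print(120 * 1, "MB/s for 1MB objects at 120 ops/s")           # 120 MB/s
```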

I/O devices for storing the objects: The I/O pattern on these devices is more random when small objects are being uploaded. This can be verified by Figure 3, where we plot the distribution of the logical block number (LBN) distance between two successive I/Os. As seen from Figure 3, when uploading 1MB objects, only 9% of successive I/Os are separated by more than 2.5 million LBN. However, when uploading 10KB objects, about 38% of successive I/Os are more than 2.5 million LBN apart. This comparison shows that the I/O pattern generated by uploading 1MB objects is much less random. For reference, we also plot the pattern for a large sequential write on the same storage node. We observe that for uploading 1MB objects, 70% of successive I/Os are more than 80 and less than 160 LBN apart, which is also the range into which most of the successive I/Os for the sequential write fall.

Figure 3: The distribution of the logical block number (LBN) distance between two successive I/Os for the 1MB object size and the 10KB object size. (“M” denotes million on the x-axis)


To summarize, the important take-away points from the above discussion are:

For an upload workload that mostly contains small objects (pebbles), provisioning the storage node with a faster CPU, faster I/O devices for the container DB and only moderate network bandwidth will be rewarded with a higher operation rate. We should avoid I/O devices with very low random IOPS for storing the objects, because the I/O pattern on those devices is not sequential and they will become the bottleneck of the storage node. So, use of SSDs can be considered for this workload.

For an upload workload that mostly contains large objects, it is adequate to provision the storage node with a commodity CPU and moderate I/O speed for the container DB. In order to get better throughput (MB/s) from the Swift cloud, it is recommended to choose a large-bandwidth network and I/O devices with high sequential throughput (MB/s) for storing the objects. IOPS are not critical for this workload, so standard SATA drives may be sufficient.

Of course, these choices have to be aligned with higher level choices e.g. number of storage and proxy nodes. Overall, cloud builders can benefit from the optimization practices we mentioned in our Swift Advisor blog.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

Sumo nodes for OpenStack Swift: Mixing storage and proxy services for better performance and lower TCO

Monday, June 25th, 2012

The ultimate goal of cloud operators is to provide high-performance and robust computing resources for the end users while minimizing the build and operational costs. Specifically OpenStack Swift operators want to build storage clouds with higher throughput bandwidth but with lower initial hardware purchase cost and lower ongoing IT-related costs (e.g. IT admin, power, cooling etc.).

In this blog, we present a novel way to achieve the above goal by using nodes in a Swift cloud which mix storage and proxy services. This is contrary to the common practice in Swift deployments of running proxy and storage services exclusively on separate nodes, where the proxy and authentication services run on hardware with high-end CPU and networking resources, and objects are stored on storage nodes that do not need comparable compute and networking resources. Our proposal provides a new alternative for building cost-effective Swift clouds with higher bandwidth, without compromising reliability. This method can be easily adopted by cloud operators who have already built a Swift instance on existing hardware or who are considering buying new hardware for their Swift clouds.

Our idea is based on the following principle: when uploading an object to Swift with M replicas, (M/2)+1 out of M writes need to succeed before the uploader/client is acknowledged that the upload was successful. For example, in a default Swift environment which enforces 3 data replicas, when a client sends an object to Swift, once 2 writes complete successfully, Swift will return a success message to the client; the remaining write can be delayed and Swift will rely on the replication process to ensure that the third replica is successfully created.
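In code form, the write quorum described above works out as follows (a simple illustration of the (M/2)+1 rule using integer division, not Swift source code):

```python
def write_quorum(replicas):
    # (M/2)+1 successful writes are needed before the client sees success
    return replicas // 2 + 1

for m in (2, 3, 5):
    print(m, "replicas ->", write_quorum(m), "synchronous writes before success")
```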

Based on the above observation, our idea is to speed up the 2 successful writes so that Swift can declare a success as soon as possible in order to increase the overall bandwidth. The third write is allowed to be finished at a slower pace and its goal is to guarantee the data availability when 2 zones fail. To speed up the 2 successful writes, we propose to run storage, proxy and authentication services on high-end nodes.

We call these mixed nodes Sumo Nodes: Nodes in a Swift cloud which provide proxy and authentication services as well as provide storage services.

The reason we want to mix the storage and proxy services on these high-end nodes is based on the following observation: proxy nodes are usually provisioned with high-end hardware, including a 10Gbps network, powerful multi-core CPUs and a large amount of memory. In many scenarios these nodes are over-provisioned (as we discussed in our last “pitfall” blog). Thus, we want to take full advantage of their resources by consolidating both proxy and storage services on one sumo node, with the goal of completing the 2 successful writes faster.

Sumo nodes will typically be interconnected via 10Gbps network. Writes to sumo nodes will be done faster because they are either local writes (proxy server sends data to storage service within the same node) or the write is done over a 10Gbps network instead of transferring to a storage node connected via a 1Gbps network. So, if two out of three writes for an upload are routed to the sumo nodes, the Swift cloud will return success much faster than the case when an acknowledgement from a traditional storage node is needed.

In the following discussion, we consider a production Swift cloud with five zones, three object replicas and two nodes with proxy services. (Five zones provide redundancy and fault isolation to ensure that three-replica requirement for objects is maintained in the cloud even when an entire zone fails and the repair for that zone takes significant time.)

Key considerations when designing a Swift cloud based on sumo nodes

How should I setup my zones with sumo nodes and storage nodes?

Each sumo node needs to belong to a separate zone (i.e. one zone should not have two sumo nodes). In our example Swift cloud, the two sumo nodes represent two zones and the remaining three zones are distributed across the three storage nodes.

Can you guarantee that all “2 successful writes” will be done on the sumo nodes?

No. Upon each upload, every zone will have a chance to take one copy of the data. Based on the setup of two zones on the two sumo nodes and three zones on the storage nodes, it is even possible that all 3 writes will go to the three zones on the storage nodes.

However, we can increase the probability of having “2 successful writes” done on the sumo nodes. One straightforward but potentially costly way is to buy and use more sumo nodes. But, another way is to optimize the “weight” parameter associated with the storage devices on sumo nodes. (The weight parameter will decide how much storage space is used on a storage device, e.g. a device with 200 weight will take 2X amount of data than a device with 100 weight.) By giving higher weight value to the sumo nodes as compared to the storage nodes, more data will be written into the sumo nodes, thereby increasing the probability of “2 successful writes” being done on the sumo nodes.
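To get a feel for how the weight value shifts placement toward the sumo nodes, here is a rough Monte-Carlo sketch. It deliberately simplifies Swift's ring builder: each of the 3 replicas is assumed to land in a distinct zone chosen with probability proportional to the zone's weight, with 2 sumo zones and 3 storage zones fixed at weight 100. The zone names and printed probabilities are purely illustrative.

```python
import random

def pick_zone(weights):
    """Pick one zone with probability proportional to its weight."""
    r = random.uniform(0, sum(weights.values()))
    acc = 0.0
    for zone, w in weights.items():
        acc += w
        if r <= acc:
            return zone
    return zone  # floating-point edge case: fall back to the last zone

def prob_two_writes_on_sumo(sumo_weight, trials=100_000):
    hits = 0
    for _ in range(trials):
        # 2 sumo zones at the chosen weight, 3 storage zones at weight 100
        weights = {"sumo1": sumo_weight, "sumo2": sumo_weight,
                   "stor1": 100, "stor2": 100, "stor3": 100}
        placed = []
        for _ in range(3):              # 3 replicas, each in a distinct zone
            z = pick_zone(weights)
            placed.append(z)
            del weights[z]
        if sum(z.startswith("sumo") for z in placed) >= 2:
            hits += 1
    return hits / trials

for w in (100, 200, 300):
    print(f"sumo weight {w}: ~{prob_two_writes_on_sumo(w):.2f} chance that "
          "2 of the 3 writes land on the sumo nodes")
```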

What is the impact of using higher weight value on the sumo nodes?

As discussed above, when using a higher weight value on the sumo nodes, the probability of the “2 successful writes” happening on those nodes will be higher, so the bandwidth of the Swift cloud should be better. However, using a higher weight value will also speed up disk space usage on the sumo nodes, causing these nodes to reach their space limit quickly. Once the space limit on the sumo nodes is reached, all remaining uploads will go to the storage nodes, and the performance of the sumo-node based Swift cloud degrades to that of a typical Swift cloud (or even worse if the storage nodes being used are wimpy). The bottom line is that the storage nodes should have enough storage space to guarantee that the data can be completely replicated on them and to accommodate the failure of a sumo node. For example, assume each sumo node has 44 available HDD bays and each HDD bay is populated with a 2TB HDD. Once the available 88TB of space is used up on the sumo nodes, the users of the Swift cloud will stop enjoying the performance improvement from the sumo nodes (the “sumo effect”), even though there might be additional usable storage capacity in the cloud.

The disk usage of sumo nodes can be easily controlled by the weight value. Overall, the right value of the weight parameter will be determined by the individual cloud operators depending on (1) how much total data they want to save in the sumo-node based Swift cloud and (2) how much is the raw space limit on sumo nodes.

In a sumo-based Swift cloud, the function of storage nodes is to help guarantee the data availability. We expect the storage nodes to be wimpy nodes in this configuration. However, the storage nodes should have enough total disk space to handle the capacity requirements of the cloud. At a minimum, storage nodes should have the combined capacity of at least one sumo node.

Performance will increase when using sumo nodes, but what is the impact on the total cost of ownership of the Swift cloud?

Since the sumo nodes provide functionality and capacity of multiple storage nodes, cloud builders will save on the cost of purchasing and maintaining some storage nodes.

Using sumo nodes will not compromise data availability, because we still keep the same number of zones and data replicas at all times as compared to the traditional Swift setup. However, we should pay attention to the robustness and health status of the sumo nodes. Failure of a sumo node is much more impactful than failure of a traditional proxy node or storage node: if a sumo node fails, the cloud loses one of the proxy processes and also needs to replicate all of the objects stored on the failed sumo node to the remaining nodes in the cloud. Therefore, we recommend investing more in robustness features on the sumo nodes, e.g. dual fans, redundant power supplies, etc.

To choose the right hardware for sumo nodes and storage nodes, based on your cost, capacity and performance considerations, we recommend using the techniques and process of the Swift Advisor.

What use cases are best suited for sumo-based Swift configurations?

Sumo-based Swift configurations can be best applied in the following use cases: (1) high performance is a key requirement from the Swift cloud, but a moderate amount of usable capacity is required (e.g. less than 100TB) or (2) the total data stored in the Swift cloud will not change dramatically over time, so a cloud builder can plan ahead and pick the right sized sumo nodes for their cloud.

Sumo Nodes and Wimpy Storage Nodes in action

To validate the effectiveness of the sumo based Swift cloud, we ran extensive benchmarks in our labs. We looked at the impact of using Sumo nodes on the observed bandwidth and net usable storage space of the Swift cloud (we assume that cloud operators will not use the additional capacity, if any available, in their Swift cloud once the “sumo effect” wears off).

As before, we use Amanda Enterprise as our application to backup and recover a 15GB data file to/from the Swift cloud to test its write and read bandwidth respectively. We ensure that one Amanda Enterprise server can fully load the Swift cloud in all cases.

Our Swift cloud is built on EC2 compute instances. We use Cluster Compute Quadruple Extra Large Instance (Quad Instance) for the sumo nodes and use Small or Medium Instance for the storage nodes.

To build a minimal production-ready sumo-node based Swift cloud, we use 1 load balancer (a pound-based load balancer running on a Quad Instance), 2 sumo nodes and 3 storage nodes. We set 2 zones on the 2 sumo nodes and the remaining 3 zones on the 3 storage nodes. To do an apples-to-apples comparison, we also set up a Swift cloud with the traditional setup (storage services running exclusively on designated storage nodes) using the same EC2 instances for the load balancer and proxy nodes (Quad Instances), and storage services running on five storage nodes (Small or Medium Instances). All storage nodes in the traditional setup have the same, default value of the weight parameter.

We use Ubuntu 12.04 as base OS and install the Essex version of Keystone and Swift across all nodes.

We first look at the results of the upload workload with different weight values on the sumo nodes. In the sumo-based Swift, we always set the weight values on the storage nodes to 100. We use “sumo (w:X)” in Figures 1 and 2 to denote that the value of the weight parameter on the sumo nodes is X; e.g. sumo (w:300) represents a Swift cloud with sumo nodes whose weight value is set to 300.

Figure 1 shows the trade-offs between the observed upload bandwidth and the projected storage space of a sumo-node based Swift cloud before the space limit of the sumo nodes is reached, which is the point at which the “sumo effect” of performance improvement tapers off. The projected storage space is calculated assuming 44 HDD bays on each node, each populated with a 2TB HDD. Readers can do their own storage space calculations depending on their specific situation.

As we can see from Figure 1, with either Small or Medium Instances used for storage nodes, the sumo-based Swift cloud with weight 300 always provides the highest upload bandwidth and the lowest usable storage space. On the other hand, the traditional setup and “sumo (w:100)” both provide the largest usable storage space, but even in this case the sumo-based Swift cloud has higher upload bandwidth, as it puts some objects on the faster sumo nodes.

In Figure 1, we also observe that as the weight increases on the sumo nodes, the bandwidth gap between using Small Instances and Medium Instances for the storage nodes shrinks. For example, the bandwidth gap at “sumo (w:200)” is about 24 MB/s, but it is reduced to 13 MB/s at “sumo (w:300)”. This implies that in a sumo-based Swift cloud, you may consider using wimpy storage nodes, as that may be more cost-effective as long as your performance requirements are met. The reason is that the effective bandwidth of a sumo-based Swift cloud is mostly determined by the sumo nodes when their weight is set to a large value.

For completeness, we also show the results of download workload in Figure 2. Similar to the Figure 1, the sumo-based Swift cloud delivers the highest download bandwidth and provides the lowest storage space when using weight 300 for sumo nodes. On the other hand, when using weight 100, it has the same usable capacity as the traditional setup, but still has higher download bandwidth as compared to the observed download bandwidth with traditional setup.

From the above experiments, we observe interesting trade-offs between bandwidth achieved and usable storage space until the point the Swift cloud is enjoying the “sumo effect”. Overall, the sumo-based Swift cloud is able to (1) Significantly boost the overall bandwidth with relatively straightforward modifications on the traditional setup and (2) Reduce the TCO by using smaller number of overall nodes, and requiring wimpy storage nodes, while not sacrificing the data availability characteristics of the traditional setup.

The above analysis and numbers should give Swift builders a new option for deploying and optimizing their Swift clouds, and seed other ideas on how to allocate various Swift services across nodes and choose hardware for those nodes.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com

Building a Swift Storage Cloud? Avoid Wimpy Proxy Servers and Five other Pitfalls

Wednesday, May 23rd, 2012

We introduced the OpenStack Swift Advisor in a previous blog: a set of methods and tools to assist cloud storage builders in selecting appropriate hardware based on their goals for their Swift cloud. Here we describe six pitfalls to avoid when choosing components for your Swift cloud:

(1) Do not use wimpy servers for proxy nodes

The key functionality of a proxy node is to process a very large number of API requests, receive data from the user applications and send it out to the corresponding storage nodes. The proxy node makes sure that the minimum number of required replicas gets written to storage nodes. Reply traffic (e.g. restore traffic, in case the Swift cloud is used for cloud backup) also flows through the proxy nodes. Moreover, the authentication services (e.g. keystone, swauth) may also be integrated into the proxy nodes. Considering these performance-critical functions, we strongly advise cloud storage builders to use powerful servers as proxy nodes. For example, a typical proxy node can be provisioned with 2 or more multi-core Xeon-class CPUs, large memory and 10G Ethernet.

There is some debate on whether a small number of powerful servers or a large number of wimpy servers should be used as proxy nodes. It is possible that the initial cost outlay of a large number of wimpy proxy nodes may be lower than that of a smaller number of powerful nodes, while providing acceptable performance. But for data center operators, a large number of wimpy servers will inevitably incur higher IT-related costs (personnel, server maintenance, space rental, cooling, energy and so on). Additionally, more servers will need more network switches, eroding some of the cost benefits as well as increasing the failure rate. As your cloud storage service gets popular, scalability will be challenging with wimpy proxy nodes.

(2) Don’t let your load-balancer be overloaded

The load-balancer is the first component of a Swift cluster that directly faces the user applications. Its primary job is to take all API requests from the user application and evenly distribute them to the underlying proxy nodes. In some cases, it has to do SSL termination to authenticate users, which is a very CPU- and network-intensive job. An overloaded load-balancer defeats the purpose by becoming the bottleneck of your Swift cluster’s performance.

As we have discussed in a previous blog (Next Steps with OpenStack Swift Advisor), the linear scalability of a Swift cluster on performance can be seriously inhibited by a load-balancer which doesn’t keep up with the load. To reap the benefits of your investment in proxy and storage nodes, you should make sure that the load-balancer is not underpowered especially for peak load conditions on your storage cloud.

(3) Do not under-utilize your proxy nodes

The proxy node is usually one of the most expensive components in the Swift cluster. Therefore, it is desirable for cloud builders to fully utilize the resources in their proxy nodes. A good question asked by our customers is: how many storage nodes should I attach to a proxy node, or what is the best ratio between proxy and storage nodes? If your cloud is built with too few storage nodes per proxy node, you may be under-utilizing your proxy nodes, as shown in the following Figure 1(a). (While we have simplified the illustrations, the factors of performance change indicated in the following figures are based on actual observations in our labs.) In this example, initially, the Swift cluster consists of 3 nodes: 1 proxy node and 2 storage nodes (we use capital P and S in the picture to denote proxy and storage nodes respectively). The write throughput of that 3-node Swift cluster is X MB/s. However, if we add two more storage nodes to that Swift cluster, as shown in Figure 1(b), the throughput of the 5-node Swift cluster becomes 2X MB/s. So the throughput, along with the capacity, of the Swift cluster can be doubled (2X) by simply adding two storage nodes. In terms of cost per throughput and cost per GB, the 5-node Swift cluster in this example will likely be more efficient.

(4) Do not over-utilize the proxy nodes

On the other hand, you can’t keep attaching storage nodes without increasing your proxy nodes at some point. In Figure 2(a), 1 proxy node is well-utilized by the 4 storage nodes with 2X MB/s throughput. If more storage nodes are attached to the proxy node, as shown in Figure 2(b), the throughput will not increase because the proxy node is already busy with the 4 storage nodes. Therefore, attaching more storage nodes to a well-utilized (nearly 100% busy) proxy node will only make the Swift cluster less efficient in terms of cost per throughput. However, note that you may decide to over-subscribe proxy nodes if you are willing to forgo the potential performance gains of adding more proxy nodes and simply want to add more capacity for now. But to increase capacity, first look into making sure you are adding enough disks to each storage node, as described in the next pitfall.

(5) Avoid disk-bounded storage nodes

Another common question we get is: how many disks should I put into my storage node? This is a crucial question with implications for cost/performance and cost/capacity. In general, you want to avoid storage nodes whose performance is bottlenecked by too few disk spindles, as illustrated by the following picture.

Figure 3(a) shows a Swift cluster consisting of 1 proxy node and 2 storage nodes, with each storage node attached to 1 disk. Let’s assume the throughput of this Swift cluster is Y MB/s. If we add one more disk to each storage node, we will have two disks on each storage node, as shown in Figure 3(b). Based on our observations, the throughput of the new Swift cluster may increase to as much as 1.5Y MB/s. The reason the throughput improves by simply attaching more disks is that in Figure 3(a), the single disk in each storage node can easily be overwhelmed (i.e. 100% busy) when transferring data to/from the storage nodes, while other resources (e.g. CPU, memory) in the storage node are not fully utilized; hence the storage node is “disk-bounded”. Once more disks are added to each storage node and all disks can work in parallel during data transfers, the bottleneck of the storage node shifts from disks to other resources, and thus the throughput of the Swift cluster improves. In terms of cost per throughput, Figure 3(b) is more efficient than Figure 3(a), since the cost of adding more disks is significantly less than the cost of a whole server.

An immediate follow-up question is: can the throughput keep increasing as more disks are attached to each storage node? Of course, the answer is no. Figure 4 shows the relationship between the number of disks attached to each storage node and the throughput of the Swift cluster. As the number of disks increases from 1, the throughput does improve, but after some point (we call it the "turning point") the throughput stops increasing and stays almost flat.

Even though the throughput of the Swift cluster cannot keep improving as more disks are attached, some cloud storage builders may still want to put a large number of disks in each storage node, since doing so does not hurt performance. Another metric, cost per MB/s per GB of available capacity, tends to be minimized by adding more disks.
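As a rough illustration of that metric, the sketch below models a throughput curve that flattens after a turning point and computes cost per MB/s per GB as disks are added. The plateau position, prices and disk sizes are assumptions for the sake of the example, not our lab numbers.

    # Illustrative cost per MB/s per GB as disks are added to a storage node.
    # The throughput plateau and all prices/sizes below are assumptions.
    SERVER_COST = 3000.0   # $ per storage server without disks (assumption)
    DISK_COST = 150.0      # $ per disk (assumption)
    DISK_GB = 2000.0       # GB per disk (assumption)

    def node_throughput(disks, turning_point=4, per_disk_mbs=40.0):
        # Throughput grows with spindles, then flattens once another
        # resource (CPU, network) becomes the bottleneck.
        return per_disk_mbs * min(disks, turning_point)

    for disks in range(1, 13):
        cost = SERVER_COST + disks * DISK_COST
        metric = cost / node_throughput(disks) / (disks * DISK_GB)
        print(f"{disks:2d} disks: {metric:.6f} $ per MB/s per GB")
    # The metric keeps falling as disks are added, even after the
    # throughput curve has flattened.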

(6) Do not rely on two replicas of data

One more question our customers frequently ask is: can we use 2 replicas of data in the Swift cluster in order to save on the cost of storage space? Our recommendation is no. Here is why:

Performance: it may seem that a Swift cluster maintaining 2 replicas of data will write faster to the storage nodes than a cluster maintaining 3 replicas (which has one more write stream to the storage nodes). However, when the proxy node attempts to write N replicas, it only requires (N/2)+1 successful responses out of N to declare a successful write. That is to say, only (N/2)+1 of the N concurrent writes are synchronous; the remaining writes can complete asynchronously, and Swift relies on its replication process to ensure that the remaining copies are eventually created.

Based on the above, in our tests comparing a 3-replica Swift cluster with a 2-replica Swift cluster, both generate 2 concurrent synchronous writes to the storage nodes.
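The following minimal sketch (not Swift's actual source code) shows the quorum arithmetic described above, assuming the (N/2)+1 rule with integer division:

    # Quorum arithmetic described above: the proxy declares a write
    # successful once (N//2)+1 of the N replica writes have succeeded.
    def synchronous_writes(replicas: int) -> int:
        return replicas // 2 + 1

    for n in (2, 3):
        print(f"{n} replicas -> {synchronous_writes(n)} synchronous writes")
    # Both a 2-replica and a 3-replica cluster wait on 2 synchronous writes,
    # so dropping to 2 replicas does not reduce the synchronous write load.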

Risk of data loss: we recommend using commodity off-the-shelf storage for Swift storage nodes, without even using RAID, so the replicas maintained by Swift are your defense against data loss. Let's say a Swift cluster has 5 zones (the minimum recommended number of zones) and 3 replicas of data. With this setup, up to two zones can fail at the same time without any data loss. However, if we reduce the number of replicas from 3 to 2, the risk of data loss roughly doubles, because the data can now survive only one zone failure instead of two.

Avoiding the above pitfalls will help you implement a high-performance and robust Swift cloud that will scale to serve your cloud storage needs for years to come.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

Next Steps with OpenStack Swift Advisor - Profiling and Optimization (with Load Balancer in the Mix)

Sunday, April 22nd, 2012

In our last blog on building Swift storage clouds, we proposed the framework for the Swift Advisor - a technique that takes two of the three constraints (Capacity, Performance, Cost) as inputs, and provides hardware recommendations as output - specifically the count and configuration of systems for each type of node (storage and proxy) of the Swift storage cloud (Swift Cloud). We also provided a subset of our initial results for the Sampling phase.

In this blog, we continue the discussion of the Swift Advisor, first focusing on the impact of the load balancer on the aggregate throughput of the cloud (we will refer to it simply as "throughput"), and then provide a subset of outcomes from the profiling and optimization phases in our lab.

Load Balancer

The load balancer distributes incoming API requests evenly across the proxy servers. As shown below, the load balancer sits in front of the proxy servers, forwarding API requests to them, and can be connected to any number of proxy servers.

[Figure: the load balancer in front of the proxy servers]

If a load balancer is used, it is the only entry point of the Swift Cloud and all user data goes through it, so it is a very important component for the user-visible performance of your Swift Cloud. If it is not properly provisioned, it will become a severe bottleneck that inhibits the scalability of the Swift Cloud.

At a high-level, there are two types of load balancers:

Software Load Balancer: Runs load-balancing software (e.g. Pound, Nginx) or round-robin DNS on a server to evenly distribute requests among the proxy servers. The server running the software load balancer usually requires powerful multi-core CPUs and very high network bandwidth.

Hardware Load Balancer: Leverages a network switch/firewall or dedicated hardware with load-balancing capability to assign incoming data traffic to the proxy servers of the Swift Cloud.

Regardless of whether a software or hardware load balancer is used, the throughput of the Swift Cloud cannot scale beyond the bandwidth of the load balancer. Therefore, we advise cloud builders to deploy a powerful load balancer (e.g. with 10 Gigabit Ethernet) so that its "effective" bandwidth exceeds the expected throughput of the Swift Cloud. We recommend picking your load balancer so that, with a fully loaded (i.e. 100% busy) Swift Cloud, the load balancer still has around 50% unused capacity for future growth or sudden needs for higher bandwidth.
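As a concrete (and simplified) version of that sizing rule, the sketch below computes the load balancer bandwidth to provision for a given expected cluster throughput; the throughput target is an assumed example value, not a recommendation.

    # Load balancer sizing per the guideline above: keep about 50% of the
    # load balancer's effective bandwidth unused when the cluster is fully
    # busy. The expected cluster throughput is an assumed example value.
    def required_lb_bandwidth(expected_cluster_mbs, headroom=0.5):
        # With `headroom` kept free, only (1 - headroom) is usable.
        return expected_cluster_mbs / (1.0 - headroom)

    expected = 400.0  # MB/s expected from the Swift Cloud (assumption)
    print(f"Provision >= {required_lb_bandwidth(expected):.0f} MB/s of LB bandwidth")
    # 400 MB/s of cluster throughput -> roughly 800 MB/s of LB bandwidth,
    # comfortably within a 10 GbE (~1250 MB/s) load balancer.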

To give a sense of how to properly provision the load balancer and how it impacts the throughput of the Swift Cloud, we show some results of running Swift Clouds of c proxy nodes and cN storage nodes (c:cN Swift Cloud) behind a load balancer (N is the "magic" value for the 1:N Swift Cloud found in the Sampling phase). These results are the "performance curves" for the profiling phase and can be directly used for optimizing your goal.

The experiments

In our last article, we used running examples to show how to obtain the output of the Sampling phase. Here, we directly use those outputs (1:N Swift Clouds) of the Sampling phase as the inputs to the profiling phase, as listed below:

  • 1 Large Instance based proxy node: 5 Small Instance based storage nodes (N=5)
  • 1 XL Instance based proxy node: 5 Small Instance based storage nodes (N=5)
  • 1 CPU XL Instance based proxy node: 5 Small Instance based storage nodes (N=5)
  • 1 Quad Instance based proxy node: 5 Medium Instance based storage nodes (N=5)

Based on the above 1:5 Swift Clouds, we profile the throughput curves of the c:c5 Swift Cloud (c = 2, 4, 6, ...) with the following load balancer setups:

  1. Using one "Cluster Compute Eight Extra Large Instance" (Eight) running Pound (a reverse-proxy load balancer) as the software load balancer ("1 Eight"), to which all proxy nodes are connected. (The Eight instance is one level more powerful than the Quad instance. Like the Quad instance, it has 10 Gigabit Ethernet, but it has 2X the CPU resources - 2 x Intel Xeon E5-2670, eight-core "Sandy Bridge" - and 2X the memory.)
  2. Using two identical Eight instances (each running Pound) as the load balancers ("2 Eight"). 50% of the proxy nodes are connected to the first Eight instance and the other 50% to the second. The storage nodes are unaware of this split and accept data from all of the proxy nodes.

Again, we use Amanda Enterprise as our application to back up a 20GB data file to the c:c5 Swift Cloud. We concurrently run two Amanda Enterprise servers on two EC2 Quad instances to send data to the c:c5 Swift Cloud, ensuring that the two servers can fully load the c:c5 Swift Cloud in all cases.

For this experiment, we focus on backup operations, so the aggregate throughput of backup operations, measured between the two Amanda Enterprise servers and the c:c5 Swift Cloud, is simply referred to as "throughput" (MB/s).

Let's first look at the throughput curves (throughput on the Y-axis, values of c on the X-axis) of the c:c5 Swift Cloud with the two types of load balancers, for each of the above-mentioned configurations of proxy and storage nodes.

(1) Proxy nodes run on the Large instance and the storage nodes run on the Small instance. The two curves are for the two types of load balancers (LB):

[Figure: throughput curves with proxy nodes on Large instances and storage nodes on Small instances]

(2) Proxy nodes run on the XL instance and the storage nodes run on the Small instance.

[Figure: throughput curves with proxy nodes on XL instances and storage nodes on Small instances]

(3) Proxy nodes run on the CPU XL instance and the storage nodes run on the Small instance.

[Figure: throughput curves with proxy nodes on CPU XL instances and storage nodes on Small instances]

(4) Proxy nodes run on the Quad instance and the storage nodes run on the Medium instance.

[Figure: throughput curves with proxy nodes on Quad instances and storage nodes on Medium instances]

From the above four figures, we can see that the throughput of the c:c5 Swift Cloud using 1 Eight instance as the load balancer cannot scale beyond 140 MB/s, while with 2 Eight instances as the load balancers, the c:c5 Swift Cloud scales linearly (for the values of c we tested).

Next, we combine the above results for the "2 Eight" load balancer into one picture and look at it from another point of view: throughput on the Y-axis, cost ($) on the X-axis. (As you may recall from our last blog, the cost is defined as the EC2 usage cost of running the c:c5 Swift Cloud for 30 days.)

[Figure: throughput vs. cost with the "2 Eight" load balancer]

The above graph tells us several things:

(1) The configuration using CPU XL instances for proxy nodes and Small instances for storage nodes is not a good choice: compared with the configuration using XL instances for proxy nodes and Small instances for storage nodes, it costs about the same but delivers lower throughput. The reason is our observation that XL instances provide better network bandwidth than CPU XL instances. AWS marks the I/O performance (including network bandwidth) of both the XL and the CPU XL instance as "High", but in our raw network bandwidth testing, the XL instance shows a maximum of 120 MB/s for both incoming and outgoing bandwidth, while the CPU XL instance tops out at 100 MB/s in both directions.

(2) The configuration using Large instances for proxy nodes and Small instances for storage nodes is the most cost-effective: within each throughput group (marked as a dotted circle in the figure: low, medium and high), it achieves similar throughput at much lower cost. This configuration is cost-effective because the Large instance provides a maximum of about 100 MB/s for both incoming and outgoing network bandwidth, similar to the XL and CPU XL instances, but at roughly half their cost.

(3) While using Large instances for proxy nodes and Small instances for storage nodes is very cost-effective, the configuration using Quad instances for proxy nodes and Medium instances for storage nodes is also an attractive option, especially if you consider manageability and failure handling. To achieve 175 MB/s of throughput, you can choose either 8 Large instance based proxy nodes and 40 Small instance based storage nodes (48 nodes total), or 4 Quad instance based proxy nodes and 20 Medium instance based storage nodes (24 nodes total). Hosting and managing more nodes in the data center may incur higher IT-related costs, e.g. power, number of server racks, failure rate and IT administration. Considering those costs, it may be more attractive to set up a Swift Cloud with a smaller number of more powerful nodes.

Based on the data in the above figure and considering the IT-related costs, the goal of the optimization phase is to choose the configuration that best satisfies your goal. For example, suppose you input performance and capacity constraints and want to minimize cost, and both configurations, (1) Large instances for proxy nodes with Small instances for storage nodes, and (2) Quad instances for proxy nodes with Medium instances for storage nodes, satisfy your capacity constraint. The only remaining question is which configuration fulfills the throughput constraint at lower cost. The answer depends on your IT management costs: if managing each node is relatively expensive, you may want to choose the second configuration; otherwise, the first configuration will likely cost less.
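The sketch below illustrates that trade-off: it compares the two configurations from the 175 MB/s example once a per-node management cost is added. The node counts come from the example above; the monthly EC2 costs and the management cost per node are placeholder assumptions, not our measurements.

    # Compare two configurations meeting the same throughput target once a
    # per-node management cost is added. Node counts are from the 175 MB/s
    # example above; EC2 and management costs are placeholder assumptions.
    CONFIGS = {
        "Large proxies + Small storage": {"nodes": 8 + 40, "ec2_monthly": 4000.0},
        "Quad proxies + Medium storage": {"nodes": 4 + 20, "ec2_monthly": 5000.0},
    }

    def total_monthly_cost(cfg, mgmt_cost_per_node):
        return cfg["ec2_monthly"] + cfg["nodes"] * mgmt_cost_per_node

    for mgmt in (10.0, 100.0):  # $ per node per month (assumption)
        best = min(CONFIGS, key=lambda name: total_monthly_cost(CONFIGS[name], mgmt))
        print(f"management cost ${mgmt:.0f}/node/month -> cheaper: {best}")
    # With cheap per-node management the many-small-nodes option wins; as
    # management cost rises, the fewer-but-bigger-nodes option takes over.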

In future articles, we will talk about how to map the EC2 instances to physical hardware, so that cloud builders can build an optimized Swift Cloud running on physical servers.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

OpenStack Swift Advisor: Building Cloud Storage with Optimized Capacity, Cost and Performance

Wednesday, April 18th, 2012

OpenStack Swift is an open source cloud storage platform, which can be used to build massively scalable and highly robust storage clouds. There are two key use cases of Swift:

  • A service provider offering cloud storage with a well-defined RESTful HTTP API, i.e. a Public Storage Cloud. An ecosystem of applications integrated with that API is offered to the service provider's customers. The service provider may also choose to offer only a select service (e.g. Cloud Backup) and not expose the API directly.
  • A large enterprise building a cloud storage platform for internal applications, i.e. a Private Storage Cloud. The organization may do this because it is reluctant to send its data to a third-party public cloud provider, or because it wants a cloud storage platform that is closer to the users of its applications.

In both of the above cases, as you plan to build your cloud storage infrastructure, you will face one of these three problems:

  1. Optimize my cost: You know how much usable storage capacity you need from your cloud storage, and you know how much aggregate throughput you need for the applications using it, but you want to know the minimum budget needed to achieve your capacity and throughput goals.
  2. Optimize my capacity: You know how much aggregate throughput you need for applications using the cloud storage, and you know your budget constraints, but you want to know the maximum capacity you can get for your throughput needs and budget constraints.
  3. Optimize my performance: You know how much usable storage capacity you need from your cloud storage, and you know your budget constraints, but you need to know the configuration to get best aggregate throughput for your capacity and budget constraints.

Solving any of the three problems above is complex because of the myriad choices the cloud storage builder has to make, e.g. the size and number of various types of servers, network connectivity, SLAs, etc. We have done extensive work in our labs and with several cloud providers to understand these problems and address them with rigorous analysis. In this series of blogs, we will share some of our findings, as well as describe tools and services that can help you build, deploy and maintain your storage cloud with confidence.

Definitions

Since the terms used can be interpreted differently depending on context, below are the specific definitions used in this series of blogs for the three key parameters:

Capacity: The usable storage capacity, i.e. the size of the maximum application data that can be stored in the cloud storage. Usually, for better availability and durability, data is replicated across multiple systems in the cloud storage, so the raw capacity of the cloud storage should be planned with this data redundancy in mind. For example, in OpenStack Swift, each object is replicated three times by default, so the total raw storage will be at least three times larger than the usable storage capacity.
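A minimal sketch of this raw-versus-usable relationship, assuming Swift's default of 3 replicas (the usable-capacity target and the optional free-space headroom are example assumptions):

    # Raw capacity needed for a usable-capacity target, assuming Swift's
    # default of 3 replicas. The headroom factor is an optional assumption
    # to keep some free space on the disks.
    def raw_capacity_needed(usable_tb, replicas=3, fill_factor=1.0):
        return usable_tb * replicas / fill_factor

    print(raw_capacity_needed(100.0))           # 100 TB usable -> 300 TB raw
    print(raw_capacity_needed(100.0, 3, 0.8))   # with 20% headroom -> 375 TB raw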

Performance: It is the maximum aggregate throughput (MB/s or GB/s) that can be achieved by applications from the cloud storage. In this blog, we will also use the term throughput to denote aggregate throughput.

Cost: For this discussion, we only consider the initial purchase cost of the hardware for building the cloud storage. We expect the cloud storage to be in use for several years, but we are not amortizing the cost over time. We will point out best practices to reduce ongoing maintenance and scaling costs. In this series of blogs, we use the terms "node" and "server" interchangeably, so "storage node" is the same as "storage server".

Introducing the framework for the Swift Advisor

The Swift Advisor is a technique that takes two of the three constraints (Capacity, Performance, Cost) as inputs, and provides a hardware recommendation as output, specifically the count and configuration of systems for each type of node (storage and proxy) of the Swift storage cloud. This recommendation is optimized for the third constraint: e.g. minimize your budget, maximize your throughput, or maximize your usable storage capacity.

Before discussing the technical details of the Swift Advisor, let's first look at a practical way to use it. In order to build an optimized Swift cloud storage (Swift Cloud), an important feature of the Swift Advisor is that it considers a very large range of hardware configurations (e.g. a wide variety of CPU, memory, disk and network choices). However, it is unrealistic and very expensive to blindly purchase a large amount of physical hardware upfront and let the Swift Advisor evaluate its individual and combined performance. Therefore, we chose to leverage the virtualized and elastic environment offered by Amazon EC2 and build an optimized Swift Cloud on EC2 instances first.

While it may seem ironic that we are using a public compute cloud to come up with an optimized private storage cloud, the reasons for choosing EC2 as the test-bed for the Swift Advisor are several: (1) EC2 provides many types of instances with different amounts of CPU, memory and I/O. The Swift Advisor can try out many instance types on a pay-per-use basis, instead of physically owning the wide variety of hardware needed. (2) EC2 has a well-defined pricing structure.

This provides a good comparison point for cloud storage builders: they can look at the pricing information and justify the cost of owning their own cloud storage in the long run. (3) The specification of each EC2 instance type, including CPU, memory, disk and network, is well defined. Once an optimized Swift Cloud is built on EC2 instances for the given input constraints, the specifications of those instances can effectively guide the purchase of physical servers for a Swift Cloud running on physical hardware. In summary, you can use the elasticity of a compute cloud along with the Swift Advisor to derive specifications for your physical-hardware-based storage cloud, while preserving your desired constraints.

The high-level workflow of the Swift Advisor is shown below.

[Figure: high-level workflow of the Swift Advisor]

There are four important phases, explained as follows:

Sampling Phase: Our eventual goal is to build an optimized Swift Cloud consisting of A proxy servers and B storage servers; A and B are unknown initially, and we denote it as an A:B Swift Cloud. In this first phase we focus on the performance and cost characteristics of a 1:N Swift Cloud. We look for the "magic" value of N that gives a 1:N Swift Cloud the lowest cost per throughput ($ per MB/s). The reason we want the 1:N Swift Cloud with the lowest $ per MB/s is to avoid two potential pitfalls when building a Swift Cloud: (1) under-provisioning: the proxy server is under-utilized and could serve more storage servers to improve throughput; (2) over-provisioning: the proxy server is overwhelmed by too many storage servers.

Since the combinatorial space of storage and proxy node choices is potentially huge, we use several heuristics to prune candidates during the various phases of the Swift Advisor. For example, we do not consider very low-powered configurations (e.g. Micro instances) for proxy nodes.

After the Sampling phase, for each combination of EC2 instance sizes for proxy and storage servers, we know the "magic" value of N that produces the lowest $ per MB/s of running a 1:N Swift Cloud. You can run the Sampling phase on any available virtual or physical hardware, but the larger the sample set, the better.
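A small sketch of this selection logic for one proxy/storage combination is shown below; the sampled (N, throughput, cost) values are made-up placeholders rather than our measurements.

    # Sampling-phase selection for one proxy/storage instance combination:
    # pick the "magic" N of a 1:N Swift Cloud with the lowest $ per MB/s.
    # The sampled values below are placeholders, not measurements.
    samples = [
        # (N storage nodes, throughput MB/s, 30-day EC2 cost $)
        (5, 60.0, 900.0),
        (10, 110.0, 1500.0),
        (15, 140.0, 2100.0),
        (20, 145.0, 2700.0),  # throughput has flattened; proxy is saturated
    ]

    magic_n, tput, cost = min(samples, key=lambda s: s[2] / s[1])
    print(f"magic N = {magic_n} ({cost / tput:.2f} $ per MB/s)")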

Profiling Phase: Given the "magic" values of N from the Sampling phase, our goal in this phase is to profile the throughput curves (throughput versus the size of the Swift Cloud) of Swift Clouds consisting of c proxy servers and cN storage servers (c:cN Swift Cloud) for various values of c.

Please note that each throughput curve corresponds to one combination of hardware configurations (EC2 instance sizes in our case) for the proxy and storage servers. In our experiments, for each combination of EC2 instance sizes, the profiling starts from a 2:2N Swift Cloud and we double the number of proxy and storage servers each time (e.g. 4:4N, 8:8N, ...). All cN EC2 instances for storage nodes are identical.

The profiling stops when the throughput of the c:cN Swift Cloud exceeds the throughput constraint. After that, we apply a linear or non-linear regression to the profiled throughputs to plot a throughput curve with c on the X-axis and throughput on the Y-axis. The output of the profiling phase is a set of throughput curves of c:cN Swift Clouds, where each curve corresponds to one combination of EC2 instance sizes for the proxy and storage servers.
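For illustration, here is a simple linear fit of a throughput curve over the profiled points; the post allows either a linear or a non-linear regression, and the measured points below are placeholders rather than real profiling data.

    # Profiling-phase curve fitting: regress throughput against c, the number
    # of proxy nodes in a c:cN Swift Cloud. A linear fit is shown for
    # simplicity; the profiled points are placeholders, not measurements.
    import numpy as np

    c_values = np.array([2, 4, 6, 8])                    # profiled sizes
    throughput = np.array([70.0, 140.0, 200.0, 260.0])   # MB/s (placeholder)

    slope, intercept = np.polyfit(c_values, throughput, deg=1)

    def predicted_throughput(c):
        return slope * c + intercept

    print(f"fit: throughput ~= {slope:.1f} * c + {intercept:.1f}")
    print(f"predicted throughput at c=10: {predicted_throughput(10):.0f} MB/s")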

Optimization Phase: Taking the throughput curves from the profiling phase and the two input constraints, the optimization phase is where we figure out the Swift Cloud optimized for the third parameter. We do this by plotting the constraints on each throughput curve and looking for the optimal value across all curves.

For example, suppose we are trying to maximize capacity given a maximum budget and a minimum throughput requirement: we find, on each throughput curve, the value of c that meets the minimum required throughput, then reject the curves whose implied hardware cost exceeds the budget. Out of the remaining curves, we select the one that yields the maximum capacity, computed as cN times the storage capacity of the system used for the storage server.
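Here is a sketch of that selection step under the stated constraints; each candidate row summarizes one fitted curve at the smallest c meeting the throughput target, and all numbers are placeholders rather than measured results.

    # Optimization-phase example: maximize capacity subject to a throughput
    # requirement (already folded into the c values below) and a budget.
    # All candidate numbers are placeholders, not measurements.
    candidates = [
        # (name, c at required throughput, N, TB per storage node, monthly cost $)
        ("Large proxy / Small storage", 8, 5, 1.7, 4000.0),
        ("XL proxy / Small storage",    6, 5, 1.7, 4500.0),
        ("Quad proxy / Medium storage", 4, 5, 0.4, 5000.0),
    ]

    BUDGET = 4600.0  # $ per month (assumption)

    def capacity_tb(c, n, tb_per_node, replicas=3):
        # The post computes capacity as cN * per-node storage (raw); dividing
        # by the replica count gives usable capacity, per the Definitions.
        return c * n * tb_per_node / replicas

    feasible = [cand for cand in candidates if cand[4] <= BUDGET]
    best = max(feasible, key=lambda cand: capacity_tb(cand[1], cand[2], cand[3]))
    print(f"best: {best[0]} with {capacity_tb(best[1], best[2], best[3]):.1f} TB usable")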

Validation and Refinement Phase: The validation phase checks if the optimized Swift cloud really conforms to the throughput constraint through a test run of the workloads. If the test run fails a constraint, then the Swift Advisor goes to the refinement phase. The refinement phase gets the average throughput measured from the test run and sends it to the profiling phase.

The profiling phase adds that information to the profiled data to refine the throughput curves. We then use the refined throughput curves as inputs to redo the optimization phase. These four phases form the core of the Swift Advisor. However, there are some important remaining issues to be discussed:

(1) choice of the load balancer

(2) mapping between the EC2 instances and the physical hardware when cloud operators finally want to move the optimized Swift Cloud to physical servers, while preserving the three constraints on the new hosting hardware.

(3) SLA constraints. We will address these and other issues in building an optimized storage cloud for your needs in our future blogs.

Some Sampling Observations

In this blog, we present some of the results from running the Sampling phase on a selected set of system configurations. In future blogs, we will post results for the Profiling and Optimization phases.

For our Sampling phase, we assume the following candidate servers for the proxy node: EC2 Large (Large), EC2 Extra Large (XL), EC2 High-CPU Extra Large (CPU XL) and EC2 Quadruple Extra Large (Quad). The candidates for the storage node are: EC2 Micro (Micro), EC2 Small (Small) and EC2 Medium (Medium).

Therefore, the total number of combinations of proxy and storage nodes is 4 * 3 = 12, and we need to find the "magic" value of N that produces the lowest $ per MB/s of running a 1:N Swift Cloud for each combination. We start the sampling for each combination at N=5 (note that a production Swift Cloud requires at least 5 storage nodes) and increase N until the throughput of the 1:N Swift Cloud stops increasing, which happens when the proxy node is fully loaded and adding more storage nodes can no longer improve throughput.

We use Amanda Enterprise as our application to back up a 10GB data file to the 1:N Swift Cloud. Amanda Enterprise runs on an EC2 Quad instance to ensure that one Amanda Enterprise server can fully load the 1:N Swift Cloud in all cases. For this analysis we assume the cloud builder is optimizing the cloud storage for backup operations; users of the Swift Advisor should change the test workload based on the expected mix of application workloads when the cloud storage goes into production. We first look at the throughput for different values of N for each combination of EC2 instance sizes for proxy and storage nodes.

(1) Proxy node runs on EC2 Large instance and the three curves are for the three different sizes for the storage node:

[Figure: throughput curves with the proxy node on an EC2 Large instance]

Observations with EC2 Large Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(2) Proxy node runs on EC2 XL instance:

[Figure: throughput curves with the proxy node on an EC2 XL instance]

Observations with EC2 XL Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(3) Proxy node runs on EC2 CPU XL instance:

[Figure: throughput curves with the proxy node on an EC2 CPU XL instance]

Observations with EC2 CPU XL Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(4) Proxy node runs on EC2 Quad instance:

[Figure: throughput curves with the proxy node on an EC2 Quad instance]

Observations with EC2 Quad Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 60
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 20
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 10

Looking at the above graphs, we can already draw some conclusions. For example, if the only storage nodes available to you were equivalent to EC2 Micro instances and you wanted your storage cloud to scale beyond 30 storage nodes per proxy node, you should pick a proxy node at least equivalent to an EC2 Quad instance. Now let's look at figures (1) - (4) from another view: fix the EC2 instance size of the storage node and vary the EC2 instance size of the proxy node.

(5) Storage node runs on EC2 Micro instance and the four curves are for the four different EC2 instance sizes of the proxy node:

Observations with EC2 Micro Instance based Storage Node:

  1. Large Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  2. XL Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  3. CPU XL Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  4. Quad Instance based Proxy nodes: Throughput stops increasing at # storage node = 60

From the above graphs, we can conclude that (a) when the proxy node runs on a Quad instance, it has the capability, especially the network bandwidth, to accommodate more storage nodes and achieve higher throughput (MB/s) than with other instance types for the proxy node; and (b) different EC2 instance sizes for the storage node load the same proxy node at different rates: for example, when the proxy node runs on a Quad instance, we need 60 Micro instances as storage nodes to fully load the proxy node.

In contrast, if we use Small or Medium instances for the storage nodes, we only need 20 or 10 storage nodes, respectively, to fully load the proxy node. Based on the above throughput results, we now look at the $ per MB/s for different values of N for each combination of EC2 instance sizes for proxy and storage nodes. Here, $ is defined as the EC2 usage cost of running the 1:N Swift Cloud for 30 days. In this blog we are only showing numbers with the proxy node set to an EC2 Quad instance; we will publish numbers for the other combinations in a more detailed report.

(6) Proxy node runs on EC2 Quad instance:

[Figure: $ per MB/s with the proxy node on an EC2 Quad instance]

Observations with EC2 Quad Instance based Proxy Node:

  1. Micro Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 60
  2. Small Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 15
  3. Medium Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 5

Overall, the lowest $ per MB/s in the above figure is achieved by using Medium Instance based storage nodes with # storage node = 5. This result provides the Profiling phase with inputs of N=5, N=15 and N=60 for the proxy/storage node combinations EC2 Quad/Medium, EC2 Quad/Small and EC2 Quad/Micro, respectively.

So one can conclude that, when using 1 Quad Instance based proxy node, it may be better to use 5 Medium Instance based storage nodes to achieve the lowest $ per MB/s, rather than using a larger number of Micro Instance based storage nodes. The above graphs are a small subset of the overall performance numbers collected during the Sampling phase.

The overall objective here is to give you a summary of our recommended approach to building an optimized Swift Cloud. As mentioned above, we will be publishing detailed results in another report, as well as more conclusions and best practices in future blogs in this series.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com