How swift is your Swift? Benchmarking OpenStack Swift.

October 8th, 2012

The OpenStack Swift project has been developing at a tremendous pace. Version 1.6.0 was released in August, followed by 1.7.4 (Folsom) just two months later! These two releases brought many important features, for example optimizations for SSDs, object versioning, StatsD logging and more; many of these features have significant implications for performance planning by cloud builders and operators.

As an integral part of deploying a cloud storage platform based on OpenStack Swift, benchmarking a Swift cluster implementation is essential before the cluster is deployed for production use. Preferably the benchmark should simulate the eventual workload that the cluster will be subjected to.

In this blog, we discuss the following Swift benchmarking concepts:
(1)    Benchmark dimensions for a Swift cluster: performance, scalability and degraded-mode performance (e.g. when hardware or software failures occur).
(2)    Sample workloads for a Swift cluster

Benchmark Tools for Swift

There are currently two Swift benchmark tools available: swift-bench and COSBench.

swift-bench is a command-line benchmark tool that ships with the Swift distribution. Recently, we improved swift-bench to allow for random object sizes and better usability.

COSBench is a fairly new web-based benchmark tool developed by researchers at Intel. We obtained a trial version, and based on our initial experience we believe COSBench is a very helpful tool that may become the de facto Swift benchmarking tool in the future.

Benchmark Dimensions

Dimension 1 – Performance

The performance dimension measures how the Swift cluster behaves while it is under a certain load. The performance metrics can be specified in many ways. In most cases, cloud operators will be interested in the following four performance metrics:

(1)    The average throughput (number of operations per second)
(2)    The average bandwidth (MB/s)
(3)    The average response time of all requests.
(4)    Response time for a given percentile of requests (e.g. the 95th percentile).
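As an illustration, the percentile metric can be computed from a log of per-request response times with a one-liner like the following; the file name and the one-value-per-line format are assumptions about how you collect the latencies, not something the benchmark tools emit in exactly this form:

  # Sort the per-request response times (ms, one value per line) and print the value at the 95% mark
  sort -n response_times.txt | awk '{a[NR]=$1} END {print "95th percentile:", a[int(NR*0.95)]}'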

To measure the performance, we first need to populate a Swift cluster with some data (i.e. objects) to simulate an initial stage. The size of the initially loaded objects can be controlled by the inputs of the benchmark client. Subsequently, a pre-defined workload is executed against the Swift cluster while the performance is measured.

When measuring performance, one key issue deserves attention: the number of threads must be adjusted carefully, because it determines how much load the benchmark clients generate against the Swift cluster. Since we want to measure the Swift cluster when it is under load or saturated, we increase the number of threads until the bandwidth/throughput levels off and the average response time starts to increase sharply.

As the number of threads increases, the benchmark client gets busier. We need to make sure that it has enough resources (CPU, memory, network bandwidth) so that the client itself does not become the performance bottleneck.
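As a concrete sketch, the sweep over thread counts can be scripted around swift-bench; the auth endpoint, credentials, object size and object counts below are placeholders to adapt to your own cluster:

  # Run swift-bench with increasing concurrency until throughput flattens and latency spikes
  for c in 16 32 64 128 256 512; do
      echo "=== concurrency: $c ==="
      swift-bench -A http://proxy.example.com:8080/auth/v1.0 -U test:tester -K testing \
                  -c $c -s 10240 -n 100000 -g 100000
  done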

While the performance of the client software connecting to Swift (Cyberduck, cloud backup software, etc.) is an important factor in the overall usability of the storage cloud, the scope of this blog is the performance of the storage cloud platform itself.

Dimension 2 – Scalability

The scalability benchmark tests whether a Swift cluster can scale out gracefully by adding more servers and other resources. We conduct this benchmark in the following steps: proportionally add more servers for each type of node in the Swift cluster, for example by doubling the number of storage nodes and proxy nodes with the same hardware and software configurations, and then run the same workloads and measure the performance. If a Swift cluster scales out nicely, its bandwidth/throughput will increase in proportion to the number of servers added; otherwise, the cloud operators should analyze which bottleneck prevents it from scaling well.

To simulate a real-world scenario, we need to test the scalability of a Swift cluster while it is running. As suggested in a blog from SwiftStack, cloud operators may consider adding new servers gradually in order to avoid the performance degradation caused by data movement between the existing and new servers. During the measurement, we want to observe (1) whether the Swift cluster operates normally (i.e. no period of service disruption) and (2) the increase in performance as the new servers are added to the Swift cluster.

Dimension 3 – Degraded Mode Performance

Cloud operators will face hardware or software failures at some point. If their objective is to ensure that their clusters perform at a certain level (e.g. abide by a performance SLA) even in the face of failures, they should benchmark their Swift cluster accordingly up front.

The most straightforward way to measure the availability of a Swift cluster is to intentionally shut down some nodes and measure the number of errors (e.g. failed operations) and the performance degradation while Swift is running in degraded mode.

Several factors increase the complexity of benchmarking a degraded Swift cluster. Failures can happen at every system level: I/O devices, the OS, Swift processes or even an entire server, and their impact differs depending on the level at which they occur. So failure scenarios at all system levels need to be considered. For example, to simulate a disk failure, we may intentionally unmount the disk; to simulate a Swift process failure, we kill some or all Swift processes on a node; to simulate an OS or whole-server failure, the server can be temporarily powered off; or a whole zone could be powered off (to simulate power failure of an entire rack of servers).
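For illustration, these are the kinds of commands one might run on a storage node to inject such failures; the device path and the choice of services are placeholders for your own layout:

  # Simulate a failed disk by unmounting one of the object devices
  umount /srv/node/sdb1

  # Simulate a Swift process failure by stopping one service, or all of them
  swift-init object-server stop
  swift-init all stop

  # Simulate an OS or whole-server failure by powering the node off
  poweroff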

Combining the above considerations, we notice that the total problem space of failure scenarios can be very large for a large-scale Swift cluster. It is therefore more practical to prioritize the failure scenarios, for example evaluating the worst-case or most common scenarios first.

In our presentation at the coming OpenStack Summit, we will present our empirical results to show how a Swift cluster performs when the hardware failures occur.

Sample Workloads

The COSBench tool allows users to define a Swift workload based on the following two aspects: (1) the range of object sizes in the workload (e.g. from 1MB to 10MB) and (2) the ratio of PUT, GET and DELETE operations (e.g. 1:8:1).

The object sizes in a workload may follow various distributions, for example uniform, Zipfian and others. At this point, based on our experience, COSBench assumes that object sizes are uniformly distributed within the pre-defined range and that all objects are equally likely to be accessed by GET operations. Adding more choices of distribution for object sizes and access patterns would be a good direction for COSBench.

The following table provides some sample Swift workloads.

Small objects (size range: 1KB to 100KB)

  Upload intensive (GET: 5%, PUT: 90%, DELETE: 5%). Example: an online gaming hosting service, where game sessions are periodically saved as small files recording user profiles and game information as a time series.

  Download intensive (GET: 90%, PUT: 5%, DELETE: 5%). Example: a website hosting service, where a newly published webpage receives a large number of read requests.

Large objects (size range: 1MB to 10MB)

  Upload intensive (GET: 5%, PUT: 90%, DELETE: 5%). Example: enterprise backup, where small files are compressed into large chunks of data and backed up to cloud storage; recovery and delete operations are needed only occasionally.

  Download intensive (GET: 90%, PUT: 5%, DELETE: 5%). Example: an online video sharing service, where newly uploaded video clips generate heavy download traffic as people watch them.

In addition, benchmark users are free to define their own workloads based on the two inputs: the range of object sizes and the ratio of PUT, GET and DELETE operations.

We will discuss the above dimensions and benchmark workloads in detail in future blogs, as well as in our presentation at the OpenStack Summit in San Diego (4:10 PM on October 18th). We hope to see you there.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com.

ZCB 4.3 now available; ever easier cloud backup!

September 5th, 2012

The past few months have been really busy around here. After releasing Zmanda Cloud Backup (ZCB) 4.2, which added a brand new purchasing option with a new cloud storage provider (Google), we had our work clearly cut out: make ZCB even easier to understand and use. So we got back to our laundry list and picked up a few items our users were requesting the most. The result is our latest release, 4.3, available for download to all current and new ZCB customers.

So what’s new? Well, quite a lot:

A brand new help portal

Before we deep dive into the product improvements, let me introduce you to our new help portal:

This portal is available at http://help.zmanda.com and has all the documentation available in an easily searchable format. So whether you need any help in configuring ZCB or getting through a technical issue, please search this portal to see if your question has already been answered. We plan to constantly add to the knowledgebase, so please leave us your comments on the pages there!

Encryption made easier

We added a new, user-friendly encryption method: passphrase-based AES 256-bit encryption. Compared to our existing PFX-based encryption method, this method is easier to get started with. Just create your encryption key using a strong password:

AES encryption

and begin using it with your backup sets:

AES encryption use

As with our PFX encryption method, you own your encryption key completely. It is never transferred to Zmanda servers or to cloud storage. What’s great is that you can change the encryption key whenever your security policies want you to (just make sure you save a copy for future restores!).


Interruption-free data protection for laptops

If you are using ZCB to back up laptops, you will get two significant benefits from ZCB 4.3:

  • When you lose your network connection (say, when you step out of a WiFi zone), you don't have to worry about your ongoing upload tasks. These tasks will resume as soon as you get a network connection (on Windows Vista, Windows 7 and Windows 2008 Server) or at the next hour (on Windows XP or Windows 2003 Server).
  • While adding backup schedules, you will see a new option to wake up your computer (if it happens to be sleeping or hibernating at the scheduled backup time) to run the backup job:

wake up computer

Feeling unlucky? Verify your backups!

Moved your external or network drives around? Or simply need reassurance that your cloud data is safe and sound? ZCB 4.3 includes a new “Verify Backup Data” operation which can check the health status of your backups. To run this operation, go to the Report page, right click on a backup run and select the Verify Backup Data option:

verify backup data

If all is well, you will get this reassuring message:

verify backup data result

And if ZCB finds something out of order (we hope not!), you will see this message:

verify backup data result fail

If you see the above message, check the Details column of the backup runs to know the exact error.

Manage your cloud data better

Many ZCB customers value the flexible retention policy provided by ZCB for your data. To keep your costs low, it is important to configure desired retention policy before you start backing up. This will ensure that as your data grows older and stops being valuable to you, it is automatically removed by ZCB from the cloud. But what if you forget to set retention policies before backing up? In that case, your data will keep getting accumulated on the cloud forever, taking up valuable cloud storage space.

There may also be cases when your backup policies change (for example, your old Excel reports prove to be more valuable than you earlier thought and you now want to retain them for 1 year instead of 6 months), and hence you need to change the retention policy of backups you have already made.

To support such cases, ZCB 4.3 now has a new operation which you can use to change the retention policy of already run backups. Just go to the Report page, select your backup runs (you can even select multiple runs), right click on them and click on the Change Retention Period option, as shown below:
change retention

This will bring up a window where you can change the retention policy as you desire:

change retention 2

Note that ZCB allows you to set independent retention policies for local backups and cloud backups.


Delete your old, unnecessary backup data easily

Sometimes you just want to delete backups older than a particular date. This may happen when you didn’t set retention policies or you just wish to free up some cloud storage space.

With ZCB 4.3, now you can delete such backups in a single shot by using a new handy option in the File menu:

purge all backup runs before

This will bring up this window with self explanatory options:


purge all backup runs before window

Just provide your preferences above and all older data belonging to your computer will be deleted from the cloud. Like other deletions from the cloud, this process is irreversible, so proceed with caution!


Locate a backed up file and restore it quickly

If you are looking for a specific file (say a meeting document from 1 year ago), it is super easy in ZCB 4.3. Just select your backup set, go to Restore tab and click on the “Locate a Backup File” button:

locate a backed up file

In the next window, just type in what you remember from the desired file name (such as “team_meeting”) and hit the Find button. ZCB will then show you all the matching file names along with the most recent time of their backup. You can select the desired file from the list and restore it in a couple of mouse clicks.


Backup one computer, upload to cloud from another

If some of your computers (such as an old desktop at your home) don't have the necessary internet bandwidth to finish cloud backups, you don't have to leave them unprotected. Using ZCB 4.3, you can upload the backup data of such computers from a different computer that has a better internet connection (such as a server in your office).

The steps to do this are described here and can be particularly valuable to finish the upload of large backups (such as the first full backup).


Other important improvements

In addition to many bug fixes and usability improvements, ZCB 4.3 also features:

More robust backups to your network drives

While storage clouds provide unparalleled reliability, network drives are an excellent way to store your backups locally for faster restores. But unfortunately due to fluctuating network conditions and changing user demands, network drives are prone to read/write errors.

ZCB 4.3 has a retransmission mechanism to handle such cases better and hence should result in more reliable backups to your network drives.

Mixing of Differential and Incremental backups is now possible

Before 4.3, you had to choose one of the two backup levels, incremental or differential, for a backup set. In 4.3 you can combine them in the same backup set.

The benefit? For starters, you can implement hierarchical backup strategies such as the Grandfather-Father-Son scheme, which provides strong redundancy and quicker restores. The combination allows you to reduce your cost of cloud storage while keeping your backup windows to a minimum.

As always, please let us know your comments and feature requests at zcb@zmanda.com.

Also, to receive product related news from us, please subscribe to this blog and our tweets. We look forward to your feedback!

-Nik

Cyberduck with support for Keystone-based OpenStack Swift

August 28th, 2012

Cyberduck is a popular open source storage browser for several cloud storage platforms. For OpenStack Swift, Cyberduck is a neat and efficient client that enables users to upload/download objects to/from their Swift storage clouds. However, the latest version (4.2.1) of Cyberduck does not support the Keystone-based authentication method. Keystone is an identity service used by OpenStack for authentication and authorization, and we expect Keystone to be the standard identity service for future Swift clouds.

There have been intensive discussions on how to make Cyberduck work with Keystone-based Swift, for example [1], and this issue has been listed as the highest priority for the next release of Cyberduck.

So, we decided to dig into making Cyberduck work with Keystone-based Swift. First, we thank the Cyberduck team for making the compilation tools available to enable this task. Second, special thanks to David Kocher for guiding us through the process.

The key is to first make the java-cloudfiles API support Keystone, because Cyberduck relies on the java-cloudfiles API to communicate with Swift. We thank AlexYangYu for providing the initial version of the modified java-cloudfiles API that supports Keystone. We made several improvements based on that, and our fork is available here:

https://github.com/zmanda/java-cloudfiles.git

The high-level steps are to replace the older cloudfiles-1.9.1.jar in the lib directory of Cyberduck with the Keystone-enabled java-cloudfiles.jar, and to copy org-json.jar from the lib directory of java-cloudfiles to the lib directory of Cyberduck.
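A minimal sketch of these steps is shown below; the side-by-side directory layout and the dist/ location of the built jar are assumptions about your checkout, not fixed paths:

  # Replace the stock cloudfiles jar with the Keystone-enabled build and copy its JSON dependency
  rm cyberduck/lib/cloudfiles-1.9.1.jar
  cp java-cloudfiles/dist/java-cloudfiles.jar cyberduck/lib/
  cp java-cloudfiles/lib/org-json.jar cyberduck/lib/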

To make sure Cyberduck uses the modified java-cloudfiles API, Cyberduck needs to be re-compiled after making the above changes. Generally, we need to follow the steps here to set the authentication context path, and we need to add the following setting to the AppData\Cyberduck.exe_Url_*\[Version]\user.config file:

<setting name="cf.authentication.context" value="/v2.0/tokens" />

After that, we can run the re-compiled Cyberduck and associate it with a Swift cloud. For example,

In the Username field, we need to use the following format: username:tenant_name. The API Access Key is the password for that username. If the authentication information is correct, Cyberduck will connect successfully to the Keystone-based Swift cloud storage.

You can use Cyberduck to store any kind of file, e.g. pictures and documents, in your Swift cloud storage. You can even rename files and open them for editing.

You can download our version of Cyberduck for Windows with support for Keystone by running git clone https://github.com/zmanda/cyberduck or from here. Once the file is unzipped, you can execute cyberduck.exe to test against your Keystone-based Swift.

If you want to know more detail about how we made this work, or you would like to compile or test for other platforms, e.g. OS X, please drop us a note at swift@zmanda.com

Storing Pebbles or Boulders: Optimizing Swift Cloud for different workloads

August 23rd, 2012

Many storage clouds are built as multi-purpose clouds, e.g. to store backup files, images, documents etc., but a cloud builder may be tasked with building and optimizing the cloud for a specific purpose. For example, a storage cloud made for cloud backup may be optimized for large object sizes and frequent writes, whereas a storage cloud built for storing images may be optimized for relatively small objects with frequent reads.

OpenStack Swift provides a versatile platform to build storage clouds for various needs. As we discussed in our last blog, a cloud builder can choose faster I/O devices for storing their Container database to enhance performance under some scenarios. However, a careful analysis is required to determine under what scenarios the investment in the faster I/O devices for the container DB makes sense. More broadly, we are interested in how to properly provision the Swift cloud for different workloads.

In the first part of this blog, we will focus on how to provision the I/O devices for the container DB. After that, our discussion generalizes to how to provision the storage nodes under workloads that contain either small or large objects. We understand that in the real world the object sizes in a workload may vary over a wide range. However, in order to study the broad question of provisioning the Swift cloud, it is instructive to separate two extreme workloads in which most objects are either pebble-sized or boulder-sized.

We will first present the experiments to show how to provision the I/O devices for the container DB with the workloads differing in object sizes.

Experimental Results

Workload Generator

As we did in our last blog, we use swift-bench as the workload generator to benchmark the Swift cloud in terms of PUT operations per second. We configured swift-bench for our experiments as follows (a sample configuration sketch appears after the parameter descriptions):

object_size: we use 10KB or 1MB as the average object size to simulate two different workloads: (1) the objects in the workload are relatively small, and (2) the objects in the workload are relatively large. Real-world examples of small objects include PDFs, MS Word documents and JPEG pictures, while backup or archive data is usually large. (Note that real production workloads may have an even larger average object size, but comparing Swift's behavior for 10KB objects vs. 1MB objects provides useful insight for predicting behavior as objects get larger. Also, an application like Amanda Enterprise will typically chunk archives into smaller objects before transferring them to the cloud.)

concurrency: we set this parameter to 500 in order to saturate the Swift cloud.

num_containers: we use 10 containers in our experiments. This could, for example, correspond to 10 users of the storage cloud.

num_put: when the object size is 10KB, we upload (PUT) 10 million such objects to the Swift cloud; when the object size is 1MB, we upload 100K objects. As discussed in [2], the performance of the container DB degrades (e.g. to 5-10 updates per second) when the number of objects in each container is on the order of millions. Since we have 10 containers, our target is 1 million objects per container, so we set num_put to 10 million for 10KB objects. To keep the total upload size equivalent, we set num_put to 100K when uploading 1MB objects.
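A sketch of a swift-bench configuration along these lines is shown below; the auth endpoint and credentials are placeholders, and we use the stock parameter name num_objects for what we call num_put above:

  [bench]
  auth = http://proxy.example.com:8080/auth/v1.0
  user = test:tester
  key = testing
  concurrency = 500
  # 10KB objects; use 1048576 for the 1MB workload
  object_size = 10240
  # 10 million PUTs for 10KB objects; 100000 for the 1MB workload
  num_objects = 10000000
  num_containers = 10
  # upload-only run
  num_gets = 0
  delete = no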

Testing bench

Two types of EC2 instances are used to implement a small-scale Swift cloud with 1 proxy node and 4 storage nodes.

Proxy node: EC2 Cluster Compute Quadruple Extra large Instance (23 GB of memory, 33.5 EC2 Compute Units)

Storage node: High-CPU Extra large Instance (7GB of memory, 20 EC2 Compute Units)

Recently, AWS released Provisioned IOPS EBS volumes, which let the AWS user specify the IOPS (from 100 to 1000) for each EBS volume attached to an EC2 instance. For example, an EBS volume with 1000 IOPS can achieve a maximum of 1000 IOPS (for a 16KB I/O request size) regardless of the I/O access pattern. A cloud builder can therefore use an EBS volume with higher IOPS to simulate a faster I/O device.
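As an illustration, here is how such a volume can be created and attached with today's AWS CLI (at the time of these tests the same was done through the EC2 console or API tools); the availability zone, size, IDs and device name are placeholders:

  # Create a 100GB Provisioned IOPS (io1) volume with 1000 IOPS and attach it to an instance
  aws ec2 create-volume --availability-zone us-east-1a --size 100 --volume-type io1 --iops 1000
  aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf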

As mentioned in our last blog, the current version of swift-bench only allows one account, although an unlimited number of containers can be stored in that account. So our benchmark executes the following sequence: log into the existing account, create 10 containers, and then upload 10 million or 100K objects (depending on the object size). In our experiments, we measure the upload (PUT) operations per second.

Two Swift cloud implementations are compared: (1) Swift with a 1000-IOPS volume for the container DB (we call this the 1000-IOPS Swift) and (2) Swift with a 500-IOPS volume for the container DB (we call this the 500-IOPS Swift).

The 1000-IOPS Swift is implemented with 1 proxy node and 4 storage nodes. Each storage node attaches nine 1000-IOPS EBS volumes for storing the objects, one 200-IOPS EBS volume for the account DB and one 1000-IOPS EBS volume for the container DB.

The 500-IOPS Swift is implemented with 1 proxy node and 4 storage nodes. Each storage node attaches nine 1000-IOPS EBS volumes for storing the objects, one 200-IOPS EBS volume for the account DB and one 500-IOPS EBS volume for the container DB.

The proxy node has 10Gbps Ethernet, while the storage node has 1Gbps Ethernet.

Software Settings

We use OpenStack Swift version 1.6.1 and the authentication method on the proxy node is TempAuth. All proxy, container, account and object-related parameters are set to their defaults, except: in proxy-server.conf, workers = 500; in account-server.conf, workers = 32; in container-server.conf, workers = 32 and db_preallocation = on; in object-server.conf, workers = 32.
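For reference, the corresponding excerpts of the configuration files look roughly like this, with everything else left at its default:

  # /etc/swift/proxy-server.conf
  [DEFAULT]
  workers = 500

  # /etc/swift/account-server.conf and /etc/swift/object-server.conf
  [DEFAULT]
  workers = 32

  # /etc/swift/container-server.conf
  [DEFAULT]
  workers = 32
  db_preallocation = on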

The Swift-bench, proxy and authentication services run on the proxy node and we ensure that the proxy server is never the bottleneck of the Swift cloud. The account, container, object and rsync services run on the storage nodes.

The number of replicas in the Swift cloud is set to two and the main memory of each node is fully utilized for caching the files and data.

Benchmark results

Figure 1 shows the operation rate (operations per second on the Y-axis) of the PUT operation for the two Swift implementations over the benchmark window, when the object size is 10KB. Overall, we notice that with 10KB objects the 1000-IOPS Swift achieves a higher operation rate than the 500-IOPS Swift, with a 68% higher operation rate observed by the time 10 million objects have been uploaded.

Figure 1: Comparing two Swift implementations when the object size is 10KB


To compare with Figure 1, we also plot the operation rate of the PUT operation when the object size is 1MB, as shown in Figure 2.

Figure 2: Comparing two Swift implementations when the object size is 1MB


In contrast with Figure 1, the two Swift implementations show the same performance when the object size is 1MB. Moreover, in Figure 1, while the 10KB objects are being uploaded, the performance of both Swift implementations keeps decreasing from the first object upload onwards. However, when the object size is 1MB (see Figure 2), the performance of both implementations increases initially and then becomes stable.

We summarize the results in Figure 1 and Figure 2 as follows:

(1) For an upload workload that mostly contains small objects (e.g. 10KB in our test), it is good practice to use faster I/O devices for the container DB: because each small object is written to the object devices quickly, the container DB needs a faster I/O device to keep up with the high rate of small-object uploads.

(2) For an upload workload that mostly contains large objects, using faster I/O devices for the container DB does not make much sense. The storage node spends more time writing the large objects to the object devices and, consequently, the container DB is updated relatively infrequently, so there is no need to supply it with faster I/O devices.

Beyond provisioning the I/O device for the container DB, we also want to discuss how to provision other types of resources in the storage node for these two workloads. To this end, we monitored the CPU usage, network bandwidth and the I/O devices used for storing the objects on the storage nodes during our benchmark runs, and we summarize our observations below.

CPU: Compared to the case of uploading large objects, CPU usage is higher when small objects are being uploaded, because (1) the object service is much busier handling the larger number of newly uploaded objects every second and (2) the container service has to deal with more updates to the container DB. Thus, more CPU resources in the storage node are consumed when uploading small objects.

Network bandwidth: Uploading large objects consumes more network bandwidth. This can be verified from Figure 1 and Figure 2: in Figure 1, by the time the 10 million objects are uploaded, the operation rate of the 1000-IOPS Swift is 361 ops/sec with 10KB objects, so the total network bandwidth is about 3.5 MB/s. In contrast, while uploading the large objects (see Figure 2), by the time the 100K objects are uploaded, the operation rate of the 1000-IOPS Swift is 120 ops/sec with 1MB objects, so the total network bandwidth is around 120 MB/s.

I/O devices for storing the objects: The I/O pattern on these devices is more random when small objects are being uploaded. This can be verified from Figure 3, where we plot the distribution of the logical block number (LBN) distance between two successive I/Os. As seen from Figure 3, when uploading 1MB objects, only 9% of successive I/Os are separated by more than 2.5 million LBN, whereas when uploading 10KB objects about 38% of successive I/Os are more than 2.5 million LBN apart. This comparison shows that the I/O pattern generated by uploading 1MB objects is much less random. For reference, we also plot the pattern for a large sequential write on the same storage node: for 1MB objects, 70% of successive I/Os are between 80 and 160 LBN apart, which is also the range into which most of the successive I/Os of the sequential write fall.

Figure 3: The distribution of the logical block number (LBN) distance between two successive I/Os for the 1MB and 10KB object sizes. ("M" denotes million on the x-axis)


To summarize, the important take-away points from the above discussion are:

For an upload workload that mostly contains small objects (pebbles), you will be rewarded with a higher operation rate by provisioning the storage node with a faster CPU, faster I/O devices for the container DB and only moderate network bandwidth. Avoid using I/O devices with very low random IOPS for storing the objects, because the I/O pattern on those devices is far from sequential and they will become the bottleneck of the storage node. So, the use of SSDs can be considered for this workload.

For an upload workload that mostly contains large objects, it is adequate to provision the storage node with a commodity CPU and moderate I/O speed for the container DB. To achieve better throughput (MB/s) from the Swift cloud, it is recommended to choose a high-bandwidth network and I/O devices with high sequential throughput (MB/s) for storing the objects. IOPS are not critical for this workload, so standard SATA drives may be sufficient.

Of course, these choices have to be aligned with higher level choices e.g. number of storage and proxy nodes. Overall, cloud builders can benefit from the optimization practices we mentioned in our Swift Advisor blog.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com.

Should You Consider SSDs for OpenStack Swift Cloud Storage?

August 6th, 2012

With the industry's ongoing move towards building data centers with SSD-based storage, there has been a lot of interest in the OpenStack Swift community in considering faster I/O devices when deploying a Swift-based storage cloud, as in the discussions in [1], [2] and [3]. Cloud builders are interested in significantly increasing the limit on objects per container by using faster I/O devices. In these discussions, a general consensus is to place the account and container databases on the faster I/O devices (e.g. HDD RAID 10 or SSD) for better performance. We expect the desire for ever faster storage clouds to increase as faster network infrastructures, e.g. Google Fiber, become commonplace.

In this series of blogs we focus on using faster I/O devices in Swift and seek to answer the following questions:

(1) What kinds of workloads can benefit from using the faster I/O devices for the container and account services?

(2) How much extra performance can be expected?

We also evaluate which approach provides the best ROI when upgrading from non-RAID based HDD:

(1) Switch to the SSD-based container and account services, i.e. replace some regular HDDs with SSDs?

(2) Upgrade to HDD RAID (from non-RAIDED HDD) with lower upfront costs?

(3) Or take a potentially more convenient approach and install more main memory in each storage node to increase the file-caching effect, without upgrading the I/O subsystem?

Let’s start with answering the question on how much extra performance can be expected by using SSDs.

Experimental Results

1.    Workload Generator

Our workload generator for this benchmark is swift-bench, which can directly send workloads to a Swift instance. Using swift-bench, we benchmark three common operations for the Swift cloud: uploading an object (#PUTs/sec), downloading an object (#GETs/sec) and deleting an object (#DELETEs/sec). swift-bench allows us to tune the workloads based on the following parameters:

object_size: defines the size of an object (or a file) that will be uploaded. In our tests, we use 40 KB as the object size parameter.

concurrency: defines how many concurrent threads will be launched for a benchmark test. In order to saturate the Swift Cloud, we set 256 to be the concurrency parameter.

num_objects: the number of objects in total that will be uploaded to Swift. Here, we use 1 million for this parameter. Given each object size is 40 KB, the total size of the objects to be uploaded is 40GB.

num_gets: the number of objects in total that will be downloaded from Swift. To be equivalent to the num of objects parameter, we use 1 million here.

num_containers: the number of containers that will be created for storing the objects. The objects are uniformly distributed across all containers. Since we want to see if this parameter will affect the performance of the Swift Cloud, in our tests we use two values for this parameter: 10K and 20K.

2.    Testing bench

(Some of the initial testing for this blog was done on SSDs provided by STEC, Inc. We thank STEC for their contribution to this study.)

We leverage the following two types of EC2 instances to implement a small-scale Swift cloud with one proxy node and two storage nodes.

Proxy node: EC2 Cluster GPU Quadruple Extra Large Instance (22 GB of memory, 33.5 EC2 Compute Units)
Storage nodes: EC2 High I/O Quadruple Extra Large Instance (60.5 GB of memory, 35 EC2 Compute Units, 2 SSD-based storage volumes)

Each storage node attaches 20 EBS volumes for storing the objects. All EBS volumes are homogeneous, and each EBS volume is 25GB.

The current version of swift-bench only allows using one account, with an unlimited number of containers in that account. So one benchmark run follows this sequence: log into the existing account, create 10K or 20K containers, upload 1 million objects, then download 1 million objects, and finally delete 1 million objects. We measure the upload, download and delete operations per second.

We will compare two Swift Cloud implementations: (1) Swift with HDD-based container DB (we call this HDD-based Swift for short) and (2) Swift with SSD-based container DB (we call this SSD-based Swift for short).

The HDD-based Swift is implemented with 1 proxy node and 2 storage nodes. Each storage node attaches 20 EBS volumes for storing the objects, 1 EBS volume for the account DB, and 1 EBS volume for the container DB.

The SSD-based Swift is implemented with 1 proxy node and 2 storage nodes. Each storage node attaches 20 EBS volumes for storing the objects, 1 EBS volume for the account DB, and 1 SSD-based volume for the container DB.

The proxy and storage nodes are connected via 10Gbps network.

3.    Software Settings

We use OpenStack Swift version 1.5.1 and the authentication method on the proxy node is TempAuth. All proxy, container, account and object-related parameters are set to their defaults, except workers, which defines how many worker processes to fork: in proxy-server.conf, workers = 256; in account-server.conf, workers = 32; in container-server.conf, workers = 32; in object-server.conf, workers = 32.

The swift-bench, proxy and authentication services run on the proxy node, while the account, container, object and rsync services run on the storage nodes.

The number of replicas is set to two. So, each storage node holds one replica.

The main memory (which holds the file cache) can also be a very important factor in benchmark performance. In this blog, we focus on the case where the total size of the objects stored on a storage node is much larger than the memory size (a valid scenario for a PB-scale data center). In this case, since most objects cannot be cached in memory, the file cache effect is not very significant. To simulate this scenario, we effectively disable the file cache in the OS by running "echo 3 > /proc/sys/vm/drop_caches" every second to drop the page cache, dentries and inodes; the overhead of running this command itself is minimal and does not impact the benchmark results. In later blogs, we will consider the file cache effect of using different memory sizes for the storage node.
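One simple way to script this on each storage node (run as root for the duration of the benchmark) is:

  # Drop the page cache, dentries and inodes once a second to keep the file cache cold
  while true; do
      echo 3 > /proc/sys/vm/drop_caches
      sleep 1
  done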

4.    Benchmark results

Figure 1. Comparing two Swift implementations (the file cache is disabled)

Figure 1 shows the operation rate (operations per second on the Y-axis) for three different operations: PUT, GET and DELETE for each Swift implementation, when the number of containers is set to 10K and 20K. As seen in Figure 1, we found that the SSD-based Swift outperforms the HDD-based Swift across all three operations.

Moreover, for the SSD-based Swift, the different number of containers does not affect its performance very much. For example, the PUT, GET and DELETE rates for the SSD-based Swift (shown as the blue bar in Figure 1) are stable when the number of containers is increased from 10K to 20K.

However, the different number of containers does matter to the HDD-based Swift. As seen from Figure 1, the PUT, GET and DELETE rates for the HDD-based Swift (shown as the red bar in Figure 1) at 10K containers are higher than the case of 20K containers. This behavior shows that as the number of containers increases, more overheads will be incurred that negatively impact the performance of the HDD-based Swift.

To further study the overhead for the HDD-based Swift when the number of containers is increased from 10K to 20K, we use the blktrace command to record the logical block number (LBN) of each I/O on the EBS volume that stores the container DB. We then calculate the percentage of successive I/O pairs that are separated by more than 125,000 LBN, in order to observe how random the I/O pattern generated by the container DB is. For completeness, we additionally run the benchmark with 5K containers. The distribution is shown in Figure 2.
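A sketch of that tracing workflow is shown below; the device name is a placeholder for the EBS volume backing the container DB:

  # Capture block-level I/O on the container-DB volume while the benchmark runs
  blktrace -d /dev/xvdf -o container_db &
  # ... run the benchmark, then stop the trace ...
  kill %1
  # Convert the binary trace into per-I/O records (including the sector/LBN of each request)
  blkparse -i container_db -o container_db.txt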

Figure 2. The percentage of successive I/O pairs separated by more than 125,000 LBN

As seen from Figure 2, about 86% of successive I/O pairs are separated by more than 125,000 LBN for 20K containers, compared to 77% for 10K containers and 58% for 5K containers. So, as the number of containers increases, the I/O pattern generated by the container DB becomes more random.

Since HDDs tend not to perform well under random I/O access patterns, while SSDs (especially enterprise SSDs) can handle random I/O efficiently, a large number of containers tends to hinder the performance of the container DB in the HDD-based Swift, while the SSD-based Swift has no such issue.

To have a better view of Figure 1, we summarize the speedups of the SSD-based Swift over the HDD-based Swift, as shown in Table 1.

Operation    10K containers    20K containers
PUT          25%               70%
GET          7%                12%
DELETE       49%               109%

Table 1. Comparing two Swift implementations

From Table 1, we note that the speedup from using an SSD (as the backend for the container DB) is most significant for DELETE operations (49% for 10K containers and 109% for 20K containers), followed by PUT operations (25% for 10K containers and 70% for 20K containers). The speedup for GET operations is very small (7% for 10K containers and 12% for 20K containers). This behavior has been explained in [1]: the B-trees maintained by the container DB need to be updated as objects are put into or deleted from the containers, whereas reading objects (the GET operation) does not update the B-trees.

As a conclusion for Table 1 and Figure 1, we note that:

(1) When the file cache effect is minimal, the SSD-based Swift largely outperforms the HDD-based Swift. The extra performance from using SSDs depends on the type of operation: PUT and DELETE operations benefit most, because they incur a large number of concurrent updates to the B-trees, which generates a lot of mixed random read and write I/O. Since SSDs are far more efficient than HDDs at handling random I/O patterns, these types of workloads clearly benefit from the SSD-based Swift. For the GET operation, however, the extra performance from using SSDs is small, because GETs do not incur any updates to the B-trees.

(2) Since the I/O speed of HDDs is sensitive to the randomness of the I/O pattern, the performance of the HDD-based Swift can be negatively impacted by an increasing number of containers, which generates a more random I/O pattern on the storage backing the container DB. When the container DB is placed on SSD, the SSD-based Swift is not affected by the increasing number of containers.

Conclusions and Next Steps

Overall, we have identified the scenarios in which workloads benefit most from the SSD-based Swift: PUT and DELETE are the dominant operations, a large number of containers are used (so the container DB generates a more challenging I/O pattern), and the storage nodes have relatively little main memory (so the cache effect is minimal). One example of such a workload is a storage cloud used for backups of a large number of users. In most such implementations, each user has a separate container, most operations are PUTs, and DELETE operations are also executed frequently to implement the retention policies set by the users.

It is likely that when the object size is large (e.g. 50-100MB), the storage node will be bottlenecked at the I/O devices for storing the large objects. Then, the frequency of updating the container DB will be low. In this case, using HDD as the backend for container DB is adequate, because the workload for the container DB is very light. More generally, when the Swift is not always saturated by the workloads (e.g. low concurrency of active users), it may also be fine to place the container DB on HDD.

It is an interesting question to identify the trade-offs (both performance and cost) between an SSD-based Swift with a small amount of main memory per node and an HDD-based Swift with a large amount of main memory per node. Both SSDs and large main memory help push up performance: SSDs greatly reduce the wait time of each outstanding I/O (especially random I/O), while large main memory can cache the "hot data" (e.g. the container DB, the account DB and the most frequently accessed objects) to eliminate many actual disk accesses and serve user requests directly from memory.

Since SSDs are just one example of the broad class of faster I/O devices, our future direction is to discuss the following question: for a given type of workload, what is the most cost-effective way to implement a Swift cloud when the cloud builder can choose among several different combinations of faster I/O devices?

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com

Sumo nodes for OpenStack Swift: Mixing storage and proxy services for better performance and lower TCO

June 25th, 2012

The ultimate goal of cloud operators is to provide high-performance and robust computing resources for the end users while minimizing the build and operational costs. Specifically OpenStack Swift operators want to build storage clouds with higher throughput bandwidth but with lower initial hardware purchase cost and lower ongoing IT-related costs (e.g. IT admin, power, cooling etc.).

In this blog, we present a novel way to achieve the above goal by using nodes in a Swift cloud that mix storage and proxy services. This runs contrary to the common practice in Swift deployments of running proxy and storage services exclusively on separate nodes, where the proxy and authentication services are suggested to run on hardware with high-end CPU and networking resources, while the objects are stored on storage nodes that do not need comparable compute and networking resources. Our proposal provides a new alternative for building cost-effective Swift clouds with higher bandwidth, without compromising reliability. This method can easily be adopted by cloud operators who have already built a Swift instance on existing hardware or who are considering buying new hardware to build their Swift clouds.

Our idea is based on the following principle: when uploading an object to Swift with M replicas, (M/2)+1 of the M writes need to succeed before the uploader/client is told that the upload was successful. For example, in a default Swift environment that enforces 3 data replicas, when a client sends an object to Swift, once 2 writes complete successfully Swift declares success to the client; the remaining write can be delayed, and Swift relies on the replication process to ensure that the third replica is eventually created.

Based on the above observation, our idea is to speed up the 2 successful writes so that Swift can declare a success as soon as possible in order to increase the overall bandwidth. The third write is allowed to be finished at a slower pace and its goal is to guarantee the data availability when 2 zones fail. To speed up the 2 successful writes, we propose to run storage, proxy and authentication services on high-end nodes.

We call these mixed nodes Sumo Nodes: Nodes in a Swift cloud which provide proxy and authentication services as well as provide storage services.

The reason we want to mix the storage and proxy services on these high-end nodes is based on the following observation: proxy nodes are usually provisioned with high-end hardware, including a 10Gbps network, powerful multi-core CPUs and a large amount of memory. In many scenarios these nodes are over-provisioned (as we discussed in our last "pitfall" blog). Thus, we want to take full advantage of their resources by consolidating both proxy and storage services on one sumo node, with the goal of completing the 2 successful writes faster.

Sumo nodes will typically be interconnected via 10Gbps network. Writes to sumo nodes will be done faster because they are either local writes (proxy server sends data to storage service within the same node) or the write is done over a 10Gbps network instead of transferring to a storage node connected via a 1Gbps network. So, if two out of three writes for an upload are routed to the sumo nodes, the Swift cloud will return success much faster than the case when an acknowledgement from a traditional storage node is needed.

In the following discussion, we consider a production Swift cloud with five zones, three object replicas and two nodes with proxy services. (Five zones provide redundancy and fault isolation to ensure that three-replica requirement for objects is maintained in the cloud even when an entire zone fails and the repair for that zone takes significant time.)

Key considerations when designing a Swift cloud based on sumo nodes

How should I setup my zones with sumo nodes and storage nodes?

Each sumo node needs to belong to a separate zone (i.e. one zone should not have two sumo nodes). In our example Swift cloud, the two sumo nodes represent two zones, and the remaining three zones are distributed across the three storage nodes.

Can you guarantee that all “2 successful writes” will be done on the sumo nodes?

No. Upon each upload, every zone will have a chance to take one copy of the data. Based on the setup of two zones on the two sumo nodes and three zones on the storage nodes, it is even possible that all 3 writes will go to the three zones on the storage nodes.

However, we can increase the probability of having the "2 successful writes" done on the sumo nodes. One straightforward but potentially costly way is to buy and use more sumo nodes. Another way is to tune the "weight" parameter associated with the storage devices on the sumo nodes. (The weight parameter decides how much data is placed on a storage device; e.g. a device with weight 200 will take twice as much data as a device with weight 100.) By giving a higher weight value to the devices on the sumo nodes than to those on the storage nodes, more data will be written to the sumo nodes, thereby increasing the probability that the "2 successful writes" are done on the sumo nodes.
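As a sketch, the higher weight is given when the devices on the sumo nodes are added to the object ring; the zone numbers, IP addresses, ports and device names below are illustrative (Essex-era swift-ring-builder syntax):

  # A device on a sumo node (zone 1) gets 3x the weight of a device on a storage node (zone 3)
  swift-ring-builder object.builder add z1-10.0.0.11:6000/sdb1 300
  swift-ring-builder object.builder add z3-10.0.0.21:6000/sdb1 100
  swift-ring-builder object.builder rebalance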

What is the impact of using higher weight value on the sumo nodes?

As discussed above, when using a higher weight value on the sumo nodes, the probability of the "2 successful writes" being done on those nodes is higher, so the bandwidth of the Swift cloud should be better. However, a higher weight value also speeds up disk space consumption on the sumo nodes, causing these nodes to reach their space limit sooner. Once the space limit on the sumo nodes is reached, all remaining uploads go to the storage nodes, so the performance of the sumo-node based Swift cloud degrades to that of a typical Swift cloud (or even worse if the storage nodes being used are wimpy). The bottom line is that the storage nodes should have enough storage space to guarantee that the data can be completely replicated on them and to accommodate the failure of a sumo node. For example, assume each sumo node has 44 available HDD bays and each HDD bay is populated with a 2TB HDD. Once the available 88TB of space on the sumo nodes is used up, the users of the Swift cloud will stop enjoying the performance improvements from the sumo nodes (the "sumo effect"), even though there might be additional usable storage capacity in the cloud.

The disk usage of the sumo nodes can easily be controlled through the weight value. Overall, the right value of the weight parameter will be determined by individual cloud operators depending on (1) how much total data they want to store in the sumo-node based Swift cloud and (2) the raw space limit of the sumo nodes.

In a sumo-based Swift cloud, the function of storage nodes is to help guarantee the data availability. We expect the storage nodes to be wimpy nodes in this configuration. However, the storage nodes should have enough total disk space to handle the capacity requirements of the cloud. At a minimum, storage nodes should have the combined capacity of at least one sumo node.

While performance will increase when using sumo nodes, what is the impact on the total cost of ownership of the Swift cloud?

Since the sumo nodes provide functionality and capacity of multiple storage nodes, cloud builders will save on the cost of purchasing and maintaining some storage nodes.

Using sumo nodes will not compromise data availability, because we still keep the same number of zones and data replicas at all times as in the traditional Swift setup. However, we should pay attention to the robustness and health status of the sumo nodes. Failure of a sumo node is much more impactful than failure of a traditional proxy node or a storage node: if a sumo node fails, the cloud loses one of its proxy processes and also needs to replicate all of the objects stored on the failed sumo node to the remaining nodes in the cloud. Therefore, we recommend investing more in robustness features on the sumo nodes, e.g. dual fans, redundant power supplies etc.

To choose the right hardware for sumo nodes and storage nodes, based on your cost, capacity and performance considerations, we recommend using the techniques and process of the Swift Advisor.

What use cases are best suited for sumo-based Swift configurations?

Sumo-based Swift configurations can be best applied in the following use cases: (1) high performance is a key requirement from the Swift cloud, but a moderate amount of usable capacity is required (e.g. less than 100TB) or (2) the total data stored in the Swift cloud will not change dramatically over time, so a cloud builder can plan ahead and pick the right sized sumo nodes for their cloud.

Sumo Nodes and Wimpy Storage Nodes in action

To validate the effectiveness of the sumo based Swift cloud, we ran extensive benchmarks in our labs. We looked at the impact of using Sumo nodes on the observed bandwidth and net usable storage space of the Swift cloud (we assume that cloud operators will not use the additional capacity, if any available, in their Swift cloud once the “sumo effect” wears off).

As before, we use Amanda Enterprise as our application to back up and recover a 15GB data file to/from the Swift cloud to test its write and read bandwidth respectively. We ensure that one Amanda Enterprise server can fully load the Swift cloud in all cases.

Our Swift cloud is built on EC2 compute instances. We use Cluster Compute Quadruple Extra Large Instance (Quad Instance) for the sumo nodes and use Small or Medium Instance for the storage nodes.

To build a minimal production-ready sumo-node based Swift cloud, we use 1 load balancer (a pound-based load balancer running on a Quad Instance), 2 sumo nodes and 3 storage nodes. We set 2 zones on the 2 sumo nodes, and the remaining 3 zones are on the 3 storage nodes. To do an apples-to-apples comparison, we also set up a Swift cloud with the traditional layout (storage services running exclusively on designated storage nodes), using the same EC2 instances for the load balancer and proxy nodes (Quad Instance) and storage services running on five storage nodes (Small or Medium Instance). All storage nodes in the traditional setup have the same, default value of the weight parameter.

We use Ubuntu 12.04 as base OS and install the Essex version of Keystone and Swift across all nodes.

We first look at the results of the upload workload with different weight values on the sumo nodes. In the sumo-based Swift, the weight value on the storage nodes is always set to 100. We use "sumo (w:X)" in Figures 1 and 2 to denote that the weight value on the sumo nodes is X; e.g. sumo (w:300) represents a Swift cloud whose sumo nodes have their weight set to 300.

Figure 1 shows the trade-offs between the observed upload bandwidth and the projected storage space of a sumo-node based Swift cloud before the space limit of the sumo nodes is reached; this is the point at which the "sumo effect" of performance improvement tapers off. The projected storage space is calculated assuming 44 HDD bays on each node, each populated with a 2TB HDD. Readers can do their own storage space calculations for their specific situation.

As we can see from Figure 1, with either Small or Medium Instances used for the storage nodes, the sumo-based Swift cloud with weight 300 always provides the highest upload bandwidth and the lowest usable storage space. On the other hand, the traditional setup and "sumo (w:100)" both provide the largest usable storage space, but even in this case the sumo-based Swift cloud has higher upload bandwidth, as it puts some objects on the faster sumo nodes.

In Figure 1, we also observe that as the weight on the sumo nodes increases, the bandwidth gap between using Small Instances and using Medium Instances for the storage nodes shrinks. For example, the bandwidth gap at "sumo (w:200)" is about 24 MB/s, but it is reduced to 13 MB/s at "sumo (w:300)". This implies that in a sumo-based Swift cloud you may consider using wimpy storage nodes, as they may be more cost-effective as long as your performance requirements are met. The reason is that when the weight of the sumo nodes is set high, the effective bandwidth of a sumo-based Swift cloud is mostly determined by the sumo nodes.

For completeness, we also show the results of the download workload in Figure 2. As in Figure 1, the sumo-based Swift cloud delivers the highest download bandwidth and the lowest storage space when the sumo nodes use weight 300. When using weight 100, it has the same usable capacity as the traditional setup but still delivers higher download bandwidth than the traditional setup.

From the above experiments, we observe interesting trade-offs between the bandwidth achieved and the usable storage space, up to the point at which the Swift cloud enjoys the “sumo effect”. Overall, the sumo-based Swift cloud is able to (1) significantly boost the overall bandwidth with relatively straightforward modifications to the traditional setup, and (2) reduce the TCO by using a smaller number of nodes overall and only wimpy storage nodes, while not sacrificing the data-availability characteristics of the traditional setup.

The above analysis and numbers should give Swift builders a new option for deploying and optimizing their Swift clouds, and should seed other ideas on how to allocate various Swift services across nodes and choose hardware for those nodes.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com

Building a Swift Storage Cloud? Avoid Wimpy Proxy Servers and Five other Pitfalls

May 23rd, 2012

We introduced the OpenStack Swift Advisor in a previous blog: a set of methods and tools to assist cloud storage builders in selecting appropriate hardware based on their goals for their Swift Cloud. Here we describe six pitfalls to avoid when choosing components for your Swift Cloud:

(1) Do not use wimpy servers for proxy nodes

The key function of a proxy node is to process a very large number of API requests, receive data from user applications and send it out to the corresponding storage nodes. The proxy node makes sure that the minimum number of required replicas gets written to the storage nodes. Reply traffic (e.g. restore traffic, if the Swift Cloud is used for cloud backup) also flows through the proxy nodes. Moreover, the authentication services (e.g. Keystone, swauth) may also be integrated into the proxy nodes. Considering these performance-critical functions performed by proxy nodes, we strongly advise cloud storage builders to use powerful servers as proxy nodes. For example, a typical proxy node can be provisioned with 2 or more multi-core Xeon-class CPUs, large memory and 10G Ethernet.

There is some debate about whether a small number of powerful servers or a large number of wimpy servers should be used as proxy nodes. The initial cost outlay of a large number of wimpy proxy nodes may be lower than that of a smaller number of powerful nodes while still providing acceptable performance. But for data center operators, a large number of wimpy servers will inevitably incur higher IT-related costs (personnel, server maintenance, space rental, cooling, energy and so on). Additionally, more servers need more network switches, which erodes some of the cost benefit and increases the failure rate. And as your cloud storage service gets popular, scaling will be challenging with wimpy proxy nodes.

(2) Don’t let your load-balancer be overloaded

The load-balancer is the first component of a Swift cluster that directly faces the user applications. Its primary job is to take all API requests from the user applications and distribute them evenly across the underlying proxy nodes. In some cases it also has to perform SSL termination to authenticate users, which is a very CPU- and network-intensive job. An overloaded load-balancer defeats the purpose by becoming the bottleneck of your Swift cluster’s performance.

As we discussed in a previous blog (Next Steps with OpenStack Swift Advisor), the linear performance scalability of a Swift cluster can be seriously inhibited by a load-balancer that doesn’t keep up with the load. To reap the benefits of your investment in proxy and storage nodes, make sure that the load-balancer is not underpowered, especially for peak load conditions on your storage cloud.

(3) Do not under-utilize your proxy nodes

The proxy node is usually one of the most expensive components in the Swift cluster. Therefore, it is desirable for cloud builders to fully utilize the resources of their proxy nodes. A good question our customers ask is: how many storage nodes should I attach to a proxy node, or what is the best ratio between proxy and storage nodes? If your cloud is built with too few storage nodes per proxy node, you may be under-utilizing your proxy nodes, as shown in Figure 1(a). (While we have simplified the illustrations, the performance factors indicated in the following figures are based on actual observations in our labs.) In this example, the Swift cluster initially consists of 3 nodes: 1 proxy node and 2 storage nodes (we use capital P and S in the figures to denote proxy and storage nodes respectively). The write throughput of this 3-node cluster is X MB/s. However, if we add two more storage nodes, as shown in Figure 1(b), the throughput of the 5-node cluster becomes 2X MB/s. So the throughput, along with the capacity, of the Swift cluster can be doubled (2X) simply by adding two storage nodes. In terms of cost per throughput and cost per GB, the 5-node Swift cluster in this example will likely be more efficient.

(4) Do not over-utilize the proxy nodes

On the other hand, you can’t keep attaching storage nodes without eventually increasing the number of proxy nodes. In Figure 2(a), 1 proxy node is well utilized by 4 storage nodes at 2X MB/s throughput. If more storage nodes are attached to the proxy node, as shown in Figure 2(b), throughput will not increase because the proxy node is already saturated by the 4 storage nodes. Therefore, attaching more storage nodes to a well-utilized (nearly 100% busy) proxy node only makes the Swift cluster less efficient in terms of cost per throughput. Note, however, that you may decide to over-subscribe proxy nodes if you are willing to forgo the performance gains of adding more proxy nodes and simply want to add capacity for now. But to increase capacity, first make sure you are adding enough disks to each storage node, as described in the next pitfall.

(5) Avoid disk-bounded storage nodes

Another common question we get is: how many disks should I put into my storage node? This is a crucial question, with implications for cost/performance and cost/capacity. In general, you want to avoid storage nodes whose performance is bottlenecked by too few disk spindles, as illustrated by the following figures.

Figure 3(a) shows a Swift cluster consisting of 1 proxy node and 2 storage nodes, each with 1 attached disk. Let’s assume the throughput of this Swift cluster is Y MB/s. If we add one more disk to each storage node, so that each node has two disks as shown in Figure 3(b), the throughput of the new cluster may, based on our observations, increase to as much as 1.5Y MB/s. The reason throughput improves simply by attaching more disks is that in Figure 3(a), the single disk in each storage node can easily be overwhelmed (i.e. 100% busy) when transferring data to or from the storage node, while other resources (e.g. CPU, memory) in the node are not fully utilized – the storage node is “disk-bound”. Once more disks are added and all disks can work in parallel during data transfers, the bottleneck of the storage node shifts from the disks to other resources, and the throughput of the Swift cluster improves. In terms of cost per throughput, Figure 3(b) is more efficient than Figure 3(a), since the cost of adding a disk is significantly less than the cost of a whole server.

An immediate follow-up question is: can the throughput keep increasing as more disks are attached to each storage node? The answer, of course, is no. Figure 4 shows the relationship between the number of disks attached to each storage node and the throughput of the Swift cluster. As the number of disks increases from 1, the throughput is indeed improved, but after some point (we call it the “turning point”) the throughput stops increasing and becomes almost flat.

Even though the throughput of the Swift cluster cannot keep improving as more disks are attached, some cloud storage builders may still want to put a large number of disks in each storage node, since doing so does not hurt performance. Another metric, cost per MB/s per GB of available capacity, tends to be minimized by adding more disks.
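
To make the metric concrete, here is a small illustration with made-up prices and a made-up throughput curve (the dollar figures and MB/s values below are placeholders, not our lab measurements); the point is only that the combined metric keeps falling even after raw throughput flattens out:

    # Hypothetical numbers, used only to illustrate the metric.
    SERVER_COST = 3000.0        # assumed cost of one storage node without disks (USD)
    DISK_COST = 100.0           # assumed cost of one 2 TB disk (USD)
    DISK_CAPACITY_GB = 2000.0

    # Assumed per-node throughput versus disk count: grows, then flattens
    # past the "turning point" described above.
    throughput_mb_s = {1: 50, 2: 75, 4: 95, 8: 100, 12: 100}

    for disks in sorted(throughput_mb_s):
        cost = SERVER_COST + disks * DISK_COST
        capacity_gb = disks * DISK_CAPACITY_GB
        metric = cost / throughput_mb_s[disks] / capacity_gb   # cost per MB/s per GB
        print("%2d disks: $%6.0f, %3d MB/s, %6.0f GB, metric = %.6f"
              % (disks, cost, throughput_mb_s[disks], capacity_gb, metric))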

(6) Do not rely on two replicas of data

One more question our customers frequently ask is: can we use 2 replicas of data in the Swift cluster in order to save on the cost of storage space? Our recommendation is: no. Here is why:

Performance: it may seem that a Swift cluster which maintains 2 replicas of data will have better write performance than a cluster which maintains 3 replicas (which has one more write stream to the storage nodes). In actuality, however, when the proxy node attempts to write N replicas, it only requires (N/2)+1 successful responses out of N to declare a successful write. That is to say, only (N/2)+1 of the N concurrent writes are synchronous; the remaining writes can complete asynchronously, and Swift relies on the replication process to ensure that the remaining copies are successfully created.

Based on the above, and confirmed in our tests comparing a “3-replica Swift cluster” with a “2-replica Swift cluster”: both generate 2 concurrent synchronous writes to the storage nodes.
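
A quick way to see this is to plug both replica counts into the quorum rule described above (a minimal sketch of the rule as stated, not Swift’s actual source code):

    def synchronous_writes(replicas):
        # Swift declares a PUT successful once (N/2)+1 of the N replica writes succeed.
        return replicas // 2 + 1

    for n in (2, 3):
        print("%d replicas -> %d synchronous writes before success" % (n, synchronous_writes(n)))
    # Both cases wait on 2 acknowledgements, so cutting replicas from 3 to 2
    # does not remove a synchronous write from the critical path.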

Risk of data loss: We recommend using commodity off-the-shelf storage for Swift storage nodes, without even using RAID, so the replicas maintained by Swift are your defense against data loss. Let’s say a Swift cluster has 5 zones (the minimum recommended number of zones) and 3 replicas of data. With this setup, up to two zones can fail at the same time without any data loss. However, if we reduce the number of replicas from 3 to 2, the exposure to data loss roughly doubles, because the data can now survive only a single zone failure.
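
The same point can be checked mechanically. Assuming Swift places each object’s replicas in distinct zones (its as-unique-as-possible placement), the sketch below enumerates every possible placement and every simultaneous two-zone failure:

    from itertools import combinations

    ZONES = range(5)   # 5 zones, as in the example above

    def survives(replica_zones, failed_zones):
        # Data survives if at least one replica lives outside the failed zones.
        return any(z not in failed_zones for z in replica_zones)

    for replicas in (3, 2):
        safe = all(survives(placement, failed)
                   for placement in combinations(ZONES, replicas)
                   for failed in combinations(ZONES, 2))
        print("%d replicas survive any 2-zone failure: %s" % (replicas, safe))
    # Expected output: True for 3 replicas, False for 2 replicas.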

Avoiding the above pitfalls will help you implement a high-performance and robust Swift Cloud that will scale to serve your cloud storage needs for years to come.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com

Zmanda “googles” cloud backup!

May 11th, 2012

Today, we are thrilled to announce a new version of Zmanda Cloud Backup (ZCB) that backs up to Google Cloud Storage. It feels great to support perhaps the first mainstream cloud storage service we were introduced to (via the breakthrough Gmail and other Google services), and given the huge promise shown by Google’s cloud services, we are sure that this version will be very useful to many of our customers.

However, a new cloud storage partner explains only part of the excitement. :) What makes this version more significant to us is its new packaging. As you may be aware, until now ZCB came only in a pay-as-you-go format. While this option has been great for customers who value its flexibility, we realized that other customers (such as government agencies) need a fixed amount to put down in their proposals and budget provisions. To put it differently, these customers would rather trade off some flexibility for certainty.

So with these customers in mind, we chose to offer this ZCB version in the following prepaid, usage-quota-based plans:

  • $75/year for 50 GB
  • $100/year for 100 GB
  • $1,000/year for 1000 GB
  • $10,000/year for 10000 GB

Note that the above GB values are the maximum amount of data users can store in the cloud at any point in time. The prices above are inclusive of all cloud storage costs and remain unaffected even if you wish to protect multiple (unlimited!) systems.

    So what are the benefits of this new pricing option? Here are some:

  • Budget friendly: Whether you are an IT manager submitting your annual IT budget for approval or a service provider vying for a client’s business, the all-inclusive yearly plans are a great option, one you can confidently put down in writing.
  • Cost effective: If you know your requirements well, this option turns out to be dramatically cost effective. Here is a rough comparison of our pricing with some other well-known providers:

    Note:
    Zmanda Cloud Backup: the annual-plan pricing for the Google Cloud Storage version was used.
    MozyPro: based on http://mozy.com/pro/pricing/; the “Server Pass” option was chosen since ZCB protects server applications at no extra cost.
    JungleDisk: based on https://www.jungledisk.com/business/server/pricing/; the Rackspace storage option was used since it was the only “all-inclusive” price option.

  • More payment options: In addition to credit cards, this version supports a variety of payment options (such as Bank transfer, checks, etc.). So whether you are a government agency or an international firm, mode of payment is never going to be an issue.
  • Simplified billing and account management: Since this aspect is entirely handled by Zmanda, managing your ZCB subscription is much easier and more user friendly. No more hassles of updating your credit card information and no need to manage multiple accounts. When you need help, just write to a single email id (zcb@zmanda.com) or open a support case with us, and we will assist you with everything you need.
  • Partner friendly: The direct result of all the above benefits is that reselling this ZCB version will be much more simplified and rewarding. If you are interested in learning more, do visit our new reseller page for more details.

So with all the great benefits above, do we still expect some customers to choose our current pay-as-you-go ZCB version for Amazon S3? Of course! As we said, if your needs are currently small or unpredictable, the flexibility of scaling up and down without committing to a long-term plan is a sensible option. And the 70 GB free tier and volume discount tier offered with that ZCB version can keep your monthly costs very low.

Oh, and I almost forgot: along with this version, we have also announced the availability of the ZCB Global Dashboard, a web interface to track the usage and backup activity of multiple ZCB systems in a single place. If you have multiple ZCB systems in your environment, or you are a reseller, it will be extra useful to you.

As we work on further enhancing our ZCB solution, please keep sending us your feedback at zcb@zmanda.com. Much more is cooking with cloud backup at Zmanda. We will be back with more exciting news soon!

-Nik

Great Combination for Cloud Storage: Ubuntu 12.04 + OpenStack Swift Essex Release

May 7th, 2012

We are very excited to see the release of Ubuntu 12.04 LTS and OpenStack Essex, especially the Essex version of OpenStack Swift and the brand-new Dashboard. We have not yet seen any performance review of OpenStack Swift Essex running on Ubuntu 12.04. The official Dashboard demo introduced the System Panel and Manage Compute components, without any details on the Object Store. So we did an apples-to-apples cloud backup performance comparison between OpenStack Swift Essex on Ubuntu 12.04 LTS and OpenStack Swift 1.4.6 on Ubuntu 11.10, and also demonstrated the functionality of the Object Store in the OpenStack Dashboard.

In the following, we first report our results on some select hardware configurations of proxy and storage nodes on EC2. Our previous blog (Next steps with the OpenStack Advisor) provides details about these hardware configurations; we use the following four configurations as example implementations of a “small-scale” Swift cloud:

  • 1 Large Instance based proxy node: 5 Small Instance based storage nodes
  • 1 XL Instance based proxy node: 5 Small Instance based storage nodes
  • 1 CPU XL Instance based proxy node: 5 Small Instance based storage nodes
  • 1 Quad Instance based proxy node: 5 Medium Instance based storage nodes

The Large, XL, CPU XL and Quad instances cover a wide range of CPU and memory configurations. For network I/O, the Large, XL and CPU XL instances are provisioned with Gigabit Ethernet (100-120 MB/s), while the Quad instance offers 10 Gigabit Ethernet (~1.2 GB/s) connectivity.

Again, we use Amanda Enterprise as our application to back up and recover a 10GB data file to/from the Swift cloud, to test its write and read throughput respectively. We ensure that one Amanda Enterprise server can fully load the Swift cloud in all cases.

The two systems involved in the comparison are: (1) Ubuntu 11.10 + OpenStack Swift 1.4.6; (2) Ubuntu 12.04 LTS + OpenStack Swift Essex (configuration parameters of the OS, OpenStack and Amanda Enterprise are identical). In the following, we use 11.10+1.4.6 and 12.04+Essex as labels for these two systems.

(1) Proxy node runs on the Large instance and 5 storage nodes run on the Small instances. (Note that the throughput values on the y-axis are not plotted from zero.)

(2) Proxy node runs on the XL instance and 5 storage nodes run on the Small instances.

(3) Proxy node runs on the CPU XL instance and 5 storage nodes run on the Small instances.

(4) Proxy node runs on the Quad instance and 5 storage nodes run on the Medium instances.

From the above comparisons, we find that 12.04+Essex performs better than 11.10+1.4.6 in terms of backup throughput, with the performance gap ranging from 2% to 20% and averaging 9.7%. For recovery throughput, the average speedup over 11.10+1.4.6 is not as significant as for backup.

We did not dig into whether Ubuntu 12.04 LTS or OpenStack Essex is responsible for this modest improvement in throughput, but we can see that the overall combination performs consistently better. Based on our initial testing of both the performance and the feature improvements, we encourage anyone running OpenStack Swift on Ubuntu to upgrade to the latest released versions to take advantage of the new updates. Five years of support for 12.04 LTS is a great assurance for maximizing the ROI of your cloud storage implementation.

Next, we demonstrate the functionality of Object Store within the OpenStack Dashboard.

After we log into the Dashboard, click the “Project” tab on the left and then “Containers” under “Object Store”, we see the screen below:

We can create a container by clicking the “Create Container” button, which brings up the following screen:

After creating a container, we can click the container name and browse the objects associated with that container. Initially, a newly-created container is empty.

We can upload an object to the container by clicking the “Upload Object” button:

Similarly, we can delete an object from the container by choosing “Delete Object” from its drop-down list in the “Actions” column.

We can also delete a container by choosing “Delete Container” from its drop-down list in the “Actions” column.

Here we have demonstrated the core functionality of the Object Store in the OpenStack Dashboard. From the above screenshots, we can see that the Dashboard provides a clean and friendly user interface for managing containers and objects, which saves a lot of time otherwise spent looking up command-line syntax for basic operations.

Congratulations, Ubuntu and OpenStack teams!  The Ubuntu 12.04 + OpenStack Swift Essex Release combination is a great contribution to Open Source and Cloud Storage communities!

Building an OpenStack Swift Cloud: Mapping EC2 to Physical hardware

May 4th, 2012

As we mentioned in an earlier blog, it may seem ironic that we are using a public compute cloud to come up with an optimized private storage cloud. But the ready availability of diverse types of EC2-based VMs makes AWS a great platform for running the Sampling and Profiling phases of the OpenStack Swift Advisor.

After an optimized Swift Cloud has been profiled and designed on virtualized hardware (for example, EC2 instances in our lab), cloud builders will eventually want to build it on physical hardware. The question is: how do you preserve the cost-effectiveness and guaranteed throughput of the Swift Cloud on the new physical hardware, with new data center parameters?

A straightforward answer is to keep the same hardware and software resources in the new hosts. But the challenge is that EC2 (the same challenge applies if other cloud compute platforms, e.g. OpenStack Compute, are used for profiling) provisions the CPU resource for each instance type in terms of “EC2 Compute Units”; e.g. a Large instance has 4 EC2 Compute Units and a Quad instance has 33.5 EC2 Compute Units. How do you translate 33.5 EC2 Compute Units into GHz when you purchase physical CPUs for your servers? Another ambiguous resource definition in EC2 is network bandwidth. EC2 defines 4 levels of network bandwidth (Low, Moderate, High and Very High); for example, it allocates Low bandwidth to Micro instances and Moderate bandwidth to Small instances. But what does “Low bandwidth” mean in terms of MB/s? The EC2 specs provide no answers.

Here we propose a method to translate these ambiguous resource definitions (e.g. EC2 Compute Units) into standard specifications (e.g. GHz) that can be referenced when choosing physical hardware. We focus on 3 types of hardware resources: CPU, disk and network bandwidth.

CPU: We first choose a CPU benchmark (e.g. PassMark) and run it on a given type of EC2 instance to get a benchmark score. Then we look up the published scores for that benchmark to find which physical CPUs achieve a similar score. To be safe, we can choose a physical CPU with a slightly higher score to ensure it performs no worse than the virtualized CPU in the EC2 instance.

Disk: We roughly assume the I/O patterns on storage nodes are close to sequential, so we can use the “dd” Linux command to benchmark the sequential read and write bandwidth on a given type of EC2 instance. Based on the resulting I/O bandwidth in MB/s, cloud builders can buy physical storage drives with matching I/O bandwidth.
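
A minimal version of this check, run on the instance being profiled, might look like the following (the scratch path and transfer size are placeholders; oflag=direct/iflag=direct bypass the page cache so the disk itself is measured, and dd reports the achieved MB/s when it finishes):

    import subprocess

    TESTFILE = "/mnt/ddtest.bin"   # assumed scratch location on the instance's data disk

    # Sequential write: 1 GB of zeroes, bypassing the page cache.
    subprocess.run(["dd", "if=/dev/zero", "of=" + TESTFILE,
                    "bs=1M", "count=1024", "oflag=direct"], check=True)

    # Sequential read of the same file, again bypassing the cache.
    subprocess.run(["dd", "if=" + TESTFILE, "of=/dev/null",
                    "bs=1M", "count=1024", "iflag=direct"], check=True)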

Network: To test the maximum bandwidth of a given EC2 instance within the Swift Cloud, we set up another EC2 instance with very high network bandwidth, e.g. an EC2 Quad instance. First, we install Apache and create a test file (whose size depends on the memory size, as discussed below) on both EC2 instances. Then, to benchmark the maximum incoming network bandwidth of the EC2 instance, we issue a wget command on that instance to download the test file hosted on the Quad instance. wget reports the average network bandwidth after the download finishes, and we use that as the maximum incoming bandwidth. To test the maximum outgoing network bandwidth, we run the test in the reverse direction: the Quad instance downloads the test file from the EC2 instance we want to benchmark. The reason we choose wget (instead of e.g. scp) is that wget involves less CPU overhead. Note that, to remove interference from disk I/O, we ensure the test file fits into the memory of the EC2 instance so that no read I/O is needed, and we always execute wget with “-O /dev/null” to bypass write I/O. Once we have the maximum incoming and outgoing network bandwidths, we can choose the right Ethernet components to provision the storage and proxy nodes.
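
The incoming-bandwidth half of this test reduces to a single command on the instance under test; a sketch (the URL is a placeholder for the test file served by Apache on the Quad instance):

    import subprocess

    # Download the in-memory test file and throw the bytes away, so neither read
    # nor write I/O interferes with the network measurement. wget prints the
    # average transfer rate when it finishes; that figure is the incoming bandwidth.
    subprocess.run(["wget", "-O", "/dev/null",
                    "http://quad-instance.example.com/testfile.bin"], check=True)

    # For outgoing bandwidth, run the same command on the Quad instance, pointed
    # at the instance being benchmarked.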

Memory: As for the virtualized memory of an EC2 instance, if 10 GB of memory is associated with the instance, it is straightforward to provision 10 GB of memory in the physical server, so no translation is needed for virtualized memory.

Other cloud management platforms may offer several types of instances (e.g. large, medium, small) based on their own terminology. We can use similar methods to benchmark each instance type they offer and find the matching physical hardware.

To fully preserve the throughput of the Swift Cloud when mapping from EC2 instances, we advise cloud builders to provision the physical hardware with at least 10% better specs than deduced from the above translation.
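
In practice this is just a 10% margin applied to every measured figure; for example (the dd and wget values below are taken from the c1.xlarge example that follows):

    def physical_target(measured, headroom=0.10):
        # Minimum spec to look for in physical hardware, given an EC2 measurement.
        return measured * (1.0 + headroom)

    ec2_measurements = {
        "PassMark CPU score": 7295,
        "sequential read (MB/s)": 110,
        "sequential write (MB/s)": 75,
        "network bandwidth (MB/s)": 100,
    }

    for name, value in ec2_measurements.items():
        print("%s: measured %s, provision at least %.0f" % (name, value, physical_target(value)))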

Here, we show an example of how to map an EC2 c1.xlarge instance to physical hardware:

CPU: We ran the PassMark CPU benchmark on c1.xlarge; the resulting CPU score is 7295. Allowing for 10% more resources when translating from virtualized to physical hardware, some choices for the physical CPU include: Intel Xeon E3-1245 @ 3.30 GHz, Intel Xeon L5640 @ 2.27 GHz, Intel Xeon E5649 @ 2.53 GHz, etc.

Memory: As a c1.xlarge instance is allocated 7GB of memory, we could choose 8GB of memory (4GB x 2 or 2GB x 4) for the physical machine.

Disk: Using the “dd” command, we found that a c1.xlarge instance delivers 100-120 MB/s for sequential reads and 70-80 MB/s for sequential writes, which matches a typical 7,200 RPM drive. Therefore, most HDDs on the market are safe to use as data disks in the physical machine.

Network: A c1.xlarge instance has around 100 MB/s of network bandwidth for both incoming and outgoing traffic, which corresponds to a 1 Gigabit Ethernet interface. So a typical 1 Gigabit Ethernet connection should be sufficient for the physical machine.

If you are thinking of putting together a storage cloud service, we would love to discuss your challenges and share our observations. Please drop us a note at  swift@zmanda.com