How swift is your Swift? Benchmarking OpenStack Swift.

The OpenStack Swift project has been developing at a tremendous pace. Version 1.6.0 was released in August, followed by 1.7.4 (Folsom) just two months later! These two releases implemented many important features, for example optimizations for SSD use, object versioning, StatsD logging and much more. Many of these features have significant implications for performance planning by cloud builders and operators.

As an integral part of deploying a cloud storage platform based on OpenStack Swift, benchmarking a Swift cluster implementation is essential before the cluster is deployed for production use. Preferably the benchmark should simulate the eventual workload that the cluster will be subjected to.

In this blog, we discuss the following Swift benchmarking concepts:
(1)    Benchmark dimensions for a Swift cluster: performance, scalability and degraded-mode performance (e.g. when hardware and software failures happen).
(2)    Sample workloads for a Swift cluster

Benchmark Tools for Swift

There are currently two Swift benchmark tools available: swift-bench and COSBench.

swift-bench is a command-line benchmark tool that ships with the Swift distribution. Recently, we improved swift-bench to allow for random object sizes and better usability.

COSBench is a fairly new web-based benchmark tool whose development is led by researchers at Intel. Fortunately, we obtained a trial version of COSBench. Based on our initial experience, we believe it is a very helpful tool and may become the de facto Swift benchmarking tool in the future.

Benchmark Dimensions

Dimension 1 – Performance

The performance dimension measures how the Swift cluster performs under a given load. The performance metrics can be specified in many ways. In most cases, cloud operators will be interested in the following four performance metrics:

(1)    The average throughput (number of operations per second)
(2)    The average bandwidth (MB/s)
(3)    The average response time of all requests.
(4)    Response time for a certain percentage of requests (e.g. the 95th percentile).
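As a rough illustration, the four metrics can be computed from a list of per-request records as sketched below. The Sample record layout, the summarize() helper and the nearest-rank percentile definition are our own assumptions for this sketch; they are not the output format of swift-bench or COSBench.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One completed request (hypothetical record layout)."""
    latency_s: float   # response time in seconds
    size_bytes: int    # payload bytes transferred by the request


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, wall_clock_s):
    """Return the four metrics for a run that lasted wall_clock_s seconds."""
    latencies = [s.latency_s for s in samples]
    total_bytes = sum(s.size_bytes for s in samples)
    return {
        "throughput_ops_s": len(samples) / wall_clock_s,
        "bandwidth_mb_s": total_bytes / 1e6 / wall_clock_s,  # decimal megabytes
        "avg_response_s": sum(latencies) / len(latencies),
        "p95_response_s": percentile(latencies, 95),
    }


if __name__ == "__main__":
    # Synthetic data only: 100 requests of 1 MB each over a 10-second run.
    run = [Sample(latency_s=0.01 * i, size_bytes=1_000_000) for i in range(1, 101)]
    print(summarize(run, wall_clock_s=10.0))
```

Note that the average and the 95th percentile can tell very different stories: a small number of very slow requests barely moves the average but shows up clearly in the percentile.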

To measure the performance, we first need to populate a Swift cluster with some data (i.e. objects) to simulate an initial stage. The size of the initially loaded objects can be controlled by the inputs of the benchmark client. Subsequently, a pre-defined workload is executed against the Swift cluster while the performance is measured.

When measuring the performance, there is one key issue we need to pay attention to: the number of threads must be adjusted carefully, because it determines how much load the benchmark clients generate against the Swift cluster. Since we want to measure the performance of the Swift cluster when it is under load or saturated, we need to increase the number of threads until the bandwidth/throughput becomes stable and the average response time starts to increase sharply.
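The thread ramp-up described above could be sketched as follows. This is a minimal illustration, not how swift-bench or COSBench is implemented: the operation argument is a hypothetical callable standing in for one Swift request, and the 5% minimum-gain threshold used to declare the throughput "stable" is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_level(operation, threads, requests_per_thread):
    """Run `operation` from `threads` workers; return (ops/s, avg latency in s)."""
    def worker(_worker_id):
        latencies = []
        for _ in range(requests_per_thread):
            start = time.perf_counter()
            operation()  # hypothetical stand-in for one PUT/GET/DELETE
            latencies.append(time.perf_counter() - start)
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_worker = list(pool.map(worker, range(threads)))
    elapsed = time.perf_counter() - started

    latencies = [x for chunk in per_worker for x in chunk]
    return len(latencies) / elapsed, sum(latencies) / len(latencies)


def find_saturation(results, min_gain=0.05):
    """Find where adding threads stops paying off.

    results: list of (threads, throughput, avg_latency), ascending by threads.
    Returns the first thread count after which throughput grows by less than
    min_gain (an assumed threshold), or None if no plateau was reached.
    """
    for (t_prev, tp_prev, _), (_, tp_next, _) in zip(results, results[1:]):
        if tp_next < tp_prev * (1 + min_gain):
            return t_prev
    return None


if __name__ == "__main__":
    fake_request = lambda: time.sleep(0.001)  # placeholder, not a real Swift call
    results = []
    for threads in (1, 2, 4, 8):
        throughput, latency = run_level(fake_request, threads, 20)
        results.append((threads, throughput, latency))
    print(results, find_saturation(results))
```

In practice, the measured average latency at each level should be inspected alongside the throughput plateau, since the sharp rise in response time is the second sign of saturation.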

As the number of threads increases, the benchmark client gets busier. We need to make sure that the client has enough resources (CPU, memory, network bandwidth) so that it does not become the performance bottleneck.

The performance of client software that connects to Swift (Cyberduck, cloud backup software, etc.) is an important factor in the overall usability of the storage cloud, but the scope of this blog is the performance of the storage cloud platform itself.

Dimension 2 – Scalability

The scalability benchmark tests whether a Swift cluster can scale out gracefully as more servers and other resources are added. We can conduct this benchmark in the following steps. First, we proportionally add more servers for each type of node in the Swift cluster; for example, we double the number of storage nodes and proxy nodes, using the same hardware and software configurations. Then, we run the same workloads to measure the performance. If the Swift cluster scales out well, its bandwidth/throughput will increase in proportion to the number of servers added. Otherwise, the cloud operators should analyze which bottleneck prevents it from scaling well.
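One simple way to quantify "in proportion" is a scaling efficiency: the measured speedup divided by the ideal, linear speedup. The helper below is our own illustrative definition, and the numbers in the example are made up.

```python
def scaling_efficiency(base_nodes, base_throughput, scaled_nodes, scaled_throughput):
    """Measured speedup divided by ideal (linear) speedup.

    1.0 means perfectly proportional scaling; values well below 1.0 suggest
    a bottleneck that the added servers did not relieve.
    """
    ideal_speedup = scaled_nodes / base_nodes
    actual_speedup = scaled_throughput / base_throughput
    return actual_speedup / ideal_speedup


if __name__ == "__main__":
    # Made-up example: doubling from 4 to 8 nodes raises throughput
    # from 1000 to 1800 operations per second.
    print(scaling_efficiency(4, 1000.0, 8, 1800.0))
```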

To simulate a real-world scenario, we need to test the scalability of a Swift cluster while it is running. As suggested by a blog from SwiftStack, cloud operators may consider adding new servers gradually in order to avoid the performance degradation caused by data movement between the existing and new servers. During the measurement, we want to observe: (1) whether the Swift cluster operates normally (i.e. no period of service disruption) and (2) the increase in performance when the new servers are added to the Swift cluster.
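To check for periods of service disruption during such a live expansion, one option is to bucket the timestamps of successful requests into fixed windows and flag the windows in which nothing succeeded. The sketch below assumes the benchmark client can export those timestamps; the function name and the 5-second window are illustrative choices only.

```python
def disruption_windows(success_times, start_s, end_s, window_s=5.0):
    """Return [(window_start, window_end)] for windows with no successful request.

    success_times: completion times (in seconds) of successful requests.
    Only whole windows between start_s and end_s are considered.
    """
    n_windows = int((end_s - start_s) // window_s)
    counts = [0] * n_windows
    for t in success_times:
        idx = int((t - start_s) // window_s)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return [
        (start_s + i * window_s, start_s + (i + 1) * window_s)
        for i, count in enumerate(counts)
        if count == 0
    ]


if __name__ == "__main__":
    # Synthetic timestamps with a gap between 10 s and 15 s.
    print(disruption_windows([0.5, 1.2, 6.0, 16.0, 17.5], 0.0, 20.0))
```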

Dimension 3 – Degraded Mode Performance

Cloud operators will face hardware or software failures at some point. If their objective is to ensure that their clusters perform at a certain level (e.g. abide by the performance SLA) even in the face of failures, they should benchmark their Swift cluster appropriately up front.

The most straightforward way to measure the availability of a Swift cluster is to intentionally shut down some nodes and measure the number of errors (e.g. failed operations) and the performance degradation while Swift is running in degraded mode.
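Given one run against the healthy cluster and one run with some nodes shut down, the two quantities above can be derived as in this sketch. The dictionary keys and the function name are assumptions made for the example, and the figures are invented.

```python
def degradation_report(baseline, degraded):
    """Compare a degraded-mode run with a healthy baseline run.

    Each run is a dict with 'ops_ok', 'ops_failed' and 'duration_s'
    (a hypothetical layout for this sketch).
    """
    def success_rate(run):
        return run["ops_ok"] / run["duration_s"]

    def error_ratio(run):
        total = run["ops_ok"] + run["ops_failed"]
        return run["ops_failed"] / total if total else 0.0

    return {
        # Fraction of requests that failed while the cluster was degraded.
        "error_ratio": error_ratio(degraded),
        # Relative loss of successful operations per second versus baseline.
        "throughput_drop": 1.0 - success_rate(degraded) / success_rate(baseline),
    }


if __name__ == "__main__":
    healthy = {"ops_ok": 10000, "ops_failed": 0, "duration_s": 100.0}
    two_nodes_down = {"ops_ok": 7500, "ops_failed": 500, "duration_s": 100.0}
    print(degradation_report(healthy, two_nodes_down))
```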

Several factors increase the complexity of benchmarking a degraded Swift cluster. For example, failures can happen at every system level: I/O devices, the OS, Swift processes or even the entire server. The impact of a failure differs depending on the level at which it occurs, so failure scenarios at all system levels need to be considered. For instance, to simulate a disk failure, we may intentionally unmount the disk; to simulate a Swift process failure, we can kill some or all Swift processes on a node; to simulate an OS or entire server failure, the server could be temporarily powered off; and a whole zone could be powered off to simulate a power failure of an entire rack of servers.

Combining the above considerations, the total problem space of failure scenarios can be very large for a large-scale Swift cluster. So, it is more practical to prioritize the failure scenarios, for example by evaluating only the worst or the most common scenarios first.
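To give a feel for the size of the problem space and for one possible way to prioritize, the sketch below counts the distinct sets of simultaneously failed components and ranks the failure levels by a simple likelihood-times-impact score. The 1-5 scores are purely hypothetical placeholders, not measurements; each operator would substitute estimates for their own cluster.

```python
from math import comb

# Failure levels from the discussion above, with the way each one is simulated.
INJECTION = {
    "disk": "unmount the disk",
    "swift_process": "kill some or all Swift processes on a node",
    "server": "temporarily power off the server",
    "zone": "power off the whole zone",
}

# Hypothetical 1-5 scores; replace them with estimates for your own cluster.
LIKELIHOOD = {"disk": 5, "swift_process": 3, "server": 2, "zone": 1}
IMPACT = {"disk": 2, "swift_process": 2, "server": 4, "zone": 5}


def scenario_count(components, max_simultaneous):
    """Number of distinct sets of 1..max_simultaneous failed components."""
    return sum(comb(components, k) for k in range(1, max_simultaneous + 1))


def prioritized_levels():
    """Failure levels ordered by likelihood x impact, highest score first."""
    return sorted(
        INJECTION,
        key=lambda level: LIKELIHOOD[level] * IMPACT[level],
        reverse=True,
    )


if __name__ == "__main__":
    # 100 disks with up to 3 failing together already gives 166750 scenarios.
    print(scenario_count(100, 3))
    for level in prioritized_levels():
        print(level, "->", INJECTION[level])
```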

In our presentation at the upcoming OpenStack Summit, we will present empirical results showing how a Swift cluster performs when hardware failures occur.

Sample Workloads

The COSBench tool allows users to define a Swift workload based on the following two aspects: (1) the range of the object sizes in the workload (e.g. from 1MB to 10MB) and (2) the ratio of PUT, GET and DELETE operations (e.g. 1:8:1).

The object sizes in a workload may follow a certain distribution, for example uniform, Zipfian and more. At this point, based on our experience with COSBench, it assumes that object sizes are uniformly distributed within the pre-defined range and that all objects are equally likely to be accessed by a GET operation. It may be a good direction for COSBench to offer more distribution choices for users who want to specify the object size and access pattern.
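To make the two workload inputs and the distribution question concrete, here is a small generator sketch. It is not COSBench's configuration format or API; the function name, parameters and the simple 1/rank**s Zipfian weighting are our own assumptions. Object sizes are drawn uniformly from the range, while object popularity can be either uniform or Zipfian.

```python
import random


def make_workload(n_ops, size_range, ratio, n_objects=1000, zipf_s=None, seed=0):
    """Yield (operation, object_index, size_bytes) tuples.

    size_range: (low, high) object size in bytes, sampled uniformly for PUTs.
    ratio:      weights for (PUT, GET, DELETE), e.g. (1, 8, 1).
    zipf_s:     None gives every object the same access probability; a positive
                exponent weights the object of rank r by 1 / r**zipf_s.
    """
    rng = random.Random(seed)
    ops = ("PUT", "GET", "DELETE")
    if zipf_s is None:
        popularity = [1.0] * n_objects
    else:
        popularity = [1.0 / (rank ** zipf_s) for rank in range(1, n_objects + 1)]
    low, high = size_range
    for _ in range(n_ops):
        op = rng.choices(ops, weights=ratio)[0]
        obj = rng.choices(range(n_objects), weights=popularity)[0]
        size = rng.randint(low, high) if op == "PUT" else 0  # only PUTs carry a body
        yield op, obj, size


if __name__ == "__main__":
    # 1MB to 10MB objects with a 1:8:1 PUT:GET:DELETE mix, Zipfian popularity.
    for item in make_workload(5, (1_000_000, 10_000_000), (1, 8, 1), zipf_s=1.0):
        print(item)
```

With a Zipfian exponent, a small set of "hot" objects receives most of the accesses, which may exercise caching very differently from the uniform case.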

We provide some sample Swift workloads below, organized by object size and by upload-intensive versus download-intensive access.

Small Objects (size range: 1KB-100KB)

Upload Intensive: GET: 5%, PUT: 90%, DELETE: 5%
Example: Online gaming hosting service. The game sessions are periodically saved as small files that record the user profiles and game information in time-series order.

Download Intensive: GET: 90%, PUT: 5%, DELETE: 5%
Example: Website hosting service. Once a new webpage is published by the owner, lots of read requests will hit the new webpage.

Large Objects (size range: 1MB-10MB)

Upload Intensive: GET: 5%, PUT: 90%, DELETE: 5%
Example: Enterprise backup. Small files are compressed into large chunks of data and backed up to cloud storage. Occasionally, recovery and delete operations are needed.

Download Intensive: GET: 90%, PUT: 5%, DELETE: 5%
Example: Online video sharing service. Once new video clips are uploaded, lots of download traffic will be generated when people watch those new video clips.

In addition, benchmark users are free to define their own workloads based on the two inputs: the range of object sizes and the ratio of PUT, GET and DELETE operations.
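For readers who script their benchmarks, the four sample workloads can be written down as data, which also makes it easy to add custom ones. The Workload class below is a hypothetical structure of our own, not a COSBench or swift-bench format, and it assumes decimal units for KB and MB.

```python
from dataclasses import dataclass

KB, MB = 1000, 1000 * 1000  # decimal units are an assumption of this sketch


@dataclass(frozen=True)
class Workload:
    name: str
    min_size_bytes: int
    max_size_bytes: int
    get_pct: int
    put_pct: int
    delete_pct: int

    def __post_init__(self):
        # Reject definitions whose operation mix or size range is inconsistent.
        if self.get_pct + self.put_pct + self.delete_pct != 100:
            raise ValueError(f"{self.name}: percentages must sum to 100")
        if not 0 < self.min_size_bytes <= self.max_size_bytes:
            raise ValueError(f"{self.name}: invalid size range")


SAMPLE_WORKLOADS = [
    Workload("small-upload", 1 * KB, 100 * KB, get_pct=5, put_pct=90, delete_pct=5),
    Workload("small-download", 1 * KB, 100 * KB, get_pct=90, put_pct=5, delete_pct=5),
    Workload("large-upload", 1 * MB, 10 * MB, get_pct=5, put_pct=90, delete_pct=5),
    Workload("large-download", 1 * MB, 10 * MB, get_pct=90, put_pct=5, delete_pct=5),
]

if __name__ == "__main__":
    for workload in SAMPLE_WORKLOADS:
        print(workload)
```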

We will discuss the above dimensions and benchmark workloads in detail in future blogs, as well as in our presentation at the OpenStack Summit in San Diego (4:10PM on October 18th). We hope to see you there.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at

