Archive for August, 2012

Cyberduck with support for Keystone-based OpenStack Swift

Tuesday, August 28th, 2012

Cyberduck is a popular open source storage browser for several cloud storage platforms. For OpenStack Swift, Cyberduck is a neat and efficient client that enables users to upload/download objects to/from their Swift storage clouds. However, the latest version (4.2.1) of Cyberduck does not support the Keystone-based authentication method. Keystone is an identity service used by OpenStack for authentication and authorization. We expect Keystone to be the standard identity service for future Swift clouds.

There have been intensive discussions on how to make Cyberduck work with Keystone-based Swift (for example, [1]), and this issue has been listed as the highest priority for the next release of Cyberduck.

So, we decided to dig into making Cyberduck work with Keystone-based Swift. First, we thank the Cyberduck team for making the compilation tools available that enabled this task. Second, special thanks to David Kocher for guiding us through the process.

The key is to first make the java-cloudfiles API support Keystone, because Cyberduck uses the java-cloudfiles API to communicate with Swift. We thank AlexYangYu for providing the initial version of the modified java-cloudfiles API that supports Keystone. We made several improvements on top of that, and our fork is available here:

https://github.com/zmanda/java-cloudfiles.git

The high-level steps are to replace the older cloudfiles-1.9.1.jar in the lib directory of Cyberduck with the java-cloudfiles.jar that supports Keystone authentication. In addition, we also need to copy org-json.jar from the lib directory of java-cloudfiles to the lib directory of Cyberduck.
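A minimal sketch of the jar swap, assuming the two repositories are checked out side by side, an Ant-based build for both projects, and that the built jar lands in a dist directory (the paths, build targets and jar names below are assumptions and may differ in your checkout):

# Build the Keystone-enabled java-cloudfiles fork
git clone https://github.com/zmanda/java-cloudfiles.git
cd java-cloudfiles
ant jar                          # hypothetical target; check the project's build file

# Swap the library bundled with Cyberduck and rebuild
cd ../cyberduck
rm lib/cloudfiles-1.9.1.jar
cp ../java-cloudfiles/dist/java-cloudfiles.jar lib/
cp ../java-cloudfiles/lib/org-json.jar lib/
ant build                        # hypothetical target; see the Cyberduck build instructions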

To make sure Cyberduck uses the modified java-cloudfiles API, Cyberduck needs to be re-compiled after making the above changes. Generally, we need to follow the steps here to set the Authenticate Context Path, but we also need to add the following setting to the AppData\Cyberduck.exe_Url_*\[Version]\user.config file:

<setting name="cf.authentication.context" value="/v2.0/tokens" />

After that, we can run the re-compiled Cyberduck and associate it with a Swift cloud. For example,

In the Username field, we need to use the following format: username:tenant_name. The API Access Key is the password for that username. If the authentication information is correct, Cyberduck will successfully connect to the Keystone-based Swift cloud storage.
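As a side note, a quick way to verify the credentials outside of Cyberduck is to request a token from Keystone directly. A minimal curl sketch, assuming a Keystone v2.0 endpoint at http://keystone.example.com:5000 (the hostname, tenant, username and password below are placeholders):

curl -s -X POST http://keystone.example.com:5000/v2.0/tokens \
  -H "Content-Type: application/json" \
  -d '{"auth": {"tenantName": "demo_tenant",
                "passwordCredentials": {"username": "demo_user", "password": "secret"}}}'
# A successful response contains a token plus the service catalog with the Swift
# (object-store) endpoint, which is what the Keystone-enabled java-cloudfiles
# library retrieves on Cyberduck's behalf.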

The following images show that you can use Cyberduck to store any kind of file, e.g. pictures and documents, on your Swift cloud storage. You can even rename files and open them for editing.

You can download our version of Cyberduck for Windows with support for Keystone by running git clone https://github.com/zmanda/cyberduck or from here. Once the archive is unzipped, you can run cyberduck.exe to test it against your Keystone-based Swift.

If you want to know more details about how we made this work, or you would like to compile or test it for other platforms, e.g. OS X, please drop us a note at swift@zmanda.com

Storing Pebbles or Boulders: Optimizing Swift Cloud for different workloads

Thursday, August 23rd, 2012

While many storage clouds are built as multi-purpose clouds, e.g. to store backup files, images, documents etc., a cloud builder may be tasked to build and optimize a cloud for a specific purpose. E.g. a storage cloud built for cloud backup may be optimized for large object sizes and frequent writes, whereas a storage cloud built for storing images may be optimized for relatively smaller objects with frequent reads.

OpenStack Swift provides a versatile platform to build storage clouds for various needs. As we discussed in our last blog, a cloud builder can choose faster I/O devices for storing the container database to enhance performance under some scenarios. However, a careful analysis is required to determine under which scenarios the investment in faster I/O devices for the container DB makes sense. More broadly, we are interested in how to properly provision the Swift cloud for different workloads.

In the first part of this blog, we will focus on how to provision the I/O devices for the container DB. After that, we will generalize the discussion to how to provision the storage nodes for workloads that contain either small or large objects. We understand that in the real world, the object sizes in a workload may vary over a wide range. However, in order to study the broad question of provisioning the Swift cloud, it is instructive to consider two extreme workloads, in which most objects are either pebble-sized or boulder-sized.

We will first present experiments that show how to provision the I/O devices for the container DB for workloads differing in object size.

Experimental Results

Workload Generator

As we did in our last blog, we use Swift-bench as the workload generator to benchmark the Swift cloud in terms of PUT operations per second. We configured Swift-bench for our experiments as follows (a sample configuration sketch follows the parameter list):

object_size: we use 10KB or 1MB as the average object size to simulate two different workloads: (1) workloads where the average object size is relatively small, and (2) workloads where the average object size is relatively large. Real-world examples of small objects are PDF or MS Word documents and JPEG pictures, while backup or archive data is usually large. (Note that real production workloads may have an even larger average object size, but comparing Swift's behavior for 10KB objects vs. 1MB objects provides useful insights for predicting behavior as object size grows. Also, an application like Amanda Enterprise will typically chunk archives into smaller objects before transferring them to the cloud.)

concurrency: we set this parameter to 500 in order to saturate the Swift cloud.

num_container: we use 10 containers in our experiments. This may, for example, correspond to 10 users of this storage cloud.

num_put: when the object size is 10KB, we upload (PUT) 10 million such objects to the Swift cloud; when the object size is 1MB, we upload (PUT) 100K such objects. As discussed in [2], the performance of the container DB degrades (e.g. to 5-10 updates per second) when the number of objects in each container is on the order of millions. Since we have 10 containers, our target is to have 1 million objects in each container, so we set the num_put parameter to 10 million for 10KB objects. In order to have an equivalent total upload size, we set num_put to 100K when we upload 1MB objects.
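For reference, a minimal swift-bench configuration sketch for the 10KB workload. This assumes the standard swift-bench option names (num_objects and num_containers correspond to the num_put and num_container parameters described above) and placeholder credentials; adjust to your swift-bench version and cluster:

[bench]
auth = http://proxy.example.com:8080/auth/v1.0   # TempAuth endpoint (placeholder)
user = test:tester                               # placeholder account:user
key = testing                                    # placeholder key
concurrency = 500
object_size = 10240            # 10KB; use 1048576 for the 1MB workload
num_objects = 10000000         # 10 million PUTs; 100000 for the 1MB workload
num_containers = 10
num_gets = 0                   # this benchmark measures uploads (PUTs) only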

Testing bench

Two types of EC2 instances are used to implement a small-scale Swift cloud with 1 proxy node and 4 storage nodes.

Proxy node: EC2 Cluster Compute Quadruple Extra large Instance (23 GB of memory, 33.5 EC2 Compute Units)

Storage node: High-CPU Extra large Instance (7GB of memory, 20 EC2 Compute Units)

Recently, AWS released new EBS volumes with Provisioned IOPS, which let the AWS user specify the IOPS (from 100 to 1000) for each EBS volume that will be attached to an EC2 instance. For example, an EBS volume with 1000 IOPS can achieve a maximum of 1000 IOPS (for a 16KB I/O request size) regardless of the I/O access pattern. So, a cloud builder can use an EBS volume with higher IOPS to simulate a faster I/O device.
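As an illustration, a hedged sketch of provisioning and attaching such a volume with the AWS CLI (the volume and instance IDs, size, availability zone and device name are placeholders; the EC2 API tools of that era offered equivalent commands):

# Create a 25GB Provisioned IOPS (io1) volume rated at 1000 IOPS
aws ec2 create-volume --volume-type io1 --iops 1000 --size 25 \
    --availability-zone us-east-1a

# Attach it to a storage node, e.g. to hold the container DB
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf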

As mentioned in our last blog, the current version of Swift-bench only allows using one account, but an unlimited number of containers can be stored in that account. So, our benchmark is executed in the following sequence: log into the one existing account, create 10 containers, and then upload 10 million or 100K objects (depending on the object size). In our experiments, we measure the upload (PUT) operations per second.

Two implementations of the Swift cloud are compared: (1) Swift with a 1000-IOPS based container DB (we call this 1000-IOPS Swift) and (2) Swift with a 500-IOPS based container DB (we call this 500-IOPS Swift).

The 1000-IOPS Swift is implemented with 1 proxy node and 4 storage nodes. Each storage node attaches nine 1000-IOPS EBS volumes for storing objects, one 200-IOPS EBS volume for storing the account DB, and one 1000-IOPS EBS volume for storing the container DB.

The 500-IOPS Swift is implemented with 1 proxy node and 4 storage nodes. Each storage node attaches nine 1000-IOPS EBS volumes for storing objects, one 200-IOPS EBS volume for storing the account DB, and one 500-IOPS EBS volume for storing the container DB.

The proxy node has 10Gbps Ethernet, while the storage node has 1Gbps Ethernet.

Software Settings

We use OpenStack Swift version 1.6.1 and the authentication method on the proxy node is TempAuth. All proxy, container, account and object-related parameters are set to defaults, except: in proxy-server.conf, workers = 500; in account-server.conf, workers = 32; in container-server.conf, workers = 32 and db_preallocation = on; in object-server.conf, workers = 32.
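A sketch of the corresponding configuration fragments, assuming these options live in the [DEFAULT] section of each file as in the stock Swift 1.6 sample configs (only the non-default values are shown):

# proxy-server.conf
[DEFAULT]
workers = 500

# account-server.conf
[DEFAULT]
workers = 32

# container-server.conf
[DEFAULT]
workers = 32
db_preallocation = on

# object-server.conf
[DEFAULT]
workers = 32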

The Swift-bench, proxy and authentication services run on the proxy node and we ensure that the proxy server is never the bottleneck of the Swift cloud. The account, container, object and rsync services run on the storage nodes.

The number of replicas in the Swift cloud is set to two and the main memory of each node is fully utilized for caching the files and data.

Benchmark results

Figure 1 shows the operation rate (operations per second, on the Y-axis) of the PUT operation for the two Swift implementations over the benchmark window when the object size is 10KB. Overall, as seen from Figure 1, when the object size is 10KB the 1000-IOPS Swift achieves a higher operation rate than the 500-IOPS Swift, with about a 68% higher rate observed by the time 10 million objects have been uploaded.

Figure 1: Comparing two Swift implementations when the object size is 10KB


For comparison with Figure 1, we also plot the operation rate of the PUT operation when the object size is 1MB, as shown in Figure 2.

Figure 2: Comparing two Swift implementations when the object size is 1MB


In contrast with Figure 1, the two Swift implementations show the same performance when the object size is 1MB. Moreover, in Figure 1, as the 10KB objects are being uploaded, the performance of both Swift implementations keeps decreasing from the first object upload onwards. However, when the object size is 1MB (see Figure 2), the performance of both Swift implementations increases initially and then becomes stable.

We summarize the results in Figure 1 and Figure 2 as follows:

(1) For an upload workload that mostly contains small objects (e.g. 10KB in our test), it is good practice to use faster I/O devices for the container DB: because each small object is written to the object devices quickly, the container DB needs a faster I/O device to keep up with the high rate of container updates that small-object uploads generate.

(2) For an upload workload that mostly contains larger objects, using faster I/O devices for the container DB does not make much sense. This is because the storage node spends more time storing the large objects on the object I/O devices, and consequently the container DB is updated relatively infrequently. So, there is no need to provision the container DB with faster I/O devices.

Besides the discussion on how to provision the I/O device for the container DB, we also want to discuss how to provision other types of resources in the storage node for these two workloads. To this end, we also monitored the CPU usage, network bandwidth, and the I/O devices used for storing the objects on the storage nodes during our benchmark runs, and we summarize our observations below.

CPU: Compared to the case of uploading large objects, CPU usage is higher when small objects are being uploaded. The reasons are: (1) the object service is much busier handling the larger number of newly uploaded objects every second, and (2) the container service has to deal with more updates to the container DB. Thus, more CPU resources in the storage node are consumed when uploading small objects.

Network bandwidth: Uploading large objects consumes more network bandwidth. This can be verified from Figure 1 and Figure 2: in Figure 1, by the time 10 million objects have been uploaded, the operation rate of the 1000-IOPS Swift is 361 and the object size is 10KB, so the total network bandwidth is about 3.5 MB/s. However, while uploading large objects (see Figure 2), when 100K objects have been uploaded, the operation rate of the 1000-IOPS Swift is 120 and the object size is 1MB, so the total network bandwidth is around 120 MB/s.

I/O devices for storing the objects: The I/O pattern on these devices is more random when small objects are being uploaded. This can be verified from Figure 3, where we plot the distribution of the logical block number (LBN) distance between two successive I/Os. As seen from Figure 3, when uploading 1MB objects, only 9% of successive I/Os are separated by more than 2.5 million LBN. However, when uploading 10KB objects, about 38% of successive I/Os are more than 2.5 million LBN apart. This comparison shows that the I/O pattern generated by uploading 1MB objects is much less random. For reference, we also plot the pattern for a large sequential write on the same storage node. We observe that when uploading 1MB objects, 70% of successive I/Os are more than 80 and less than 160 LBN apart, which is also the range into which most of the successive I/Os for the sequential write fall.
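A hedged sketch of how the Figure 3 data can be gathered, assuming blktrace/blkparse are installed on the storage node, that /dev/xvdf is one of the object devices, and that the sector number appears in the 8th field of default blkparse output for dispatch ("D") records (field positions can vary with the blkparse version):

# Capture the block trace on an object device during the benchmark run
blktrace -d /dev/xvdf -o objdev

# Extract sector numbers of dispatched I/Os and print successive LBN distances
blkparse -i objdev | awk '$6 == "D" {
    if (prev != "") { d = ($8 > prev) ? $8 - prev : prev - $8; print d }
    prev = $8
}' > lbn_distances.txt
# lbn_distances.txt can then be bucketed to produce a histogram like Figure 3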

Figure 3: The distribution of logical block number (LBN) distance between two successive I/Os for the 1MB object size and 10KB object size. ("M" denotes million on the x-axis)


To summarize, the important take-away points from the above discussion are:

For an upload workload that mostly contains small objects (pebbles), provisioning the storage node with a faster CPU, faster I/O devices for the container DB, and only moderate network bandwidth will be rewarded with a higher operation rate. We should avoid using I/O devices with very low random IOPS for storing the objects, because the I/O pattern on those devices is far from sequential and they will become the bottleneck of the storage node. So, use of SSDs can be considered for this workload.

For an upload workload that mostly contains large objects, it is adequate to provision the storage node with a commodity CPU and moderate I/O speed for the container DB. In order to get better throughput (MB/s) from the Swift cloud, it is recommended to choose a high-bandwidth network and I/O devices with high sequential throughput (MB/s) for storing the objects. IOPS are not critical for this workload, so standard SATA drives may be sufficient.

Of course, these choices have to be aligned with higher-level choices, e.g. the number of storage and proxy nodes. Overall, cloud builders can benefit from the optimization practices we mentioned in our Swift Advisor blog.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com

Should You Consider SSDs for OpenStack Swift Cloud Storage?

Monday, August 6th, 2012

With an ongoing move in the industry towards building data centers with SSD-based storage, there has been a lot of interest in the OpenStack Swift community in using faster I/O devices when deploying Swift-based storage clouds, as seen in the discussions in [1], [2] and [3]. Cloud builders are interested in significantly increasing the limit on objects per container by using faster I/O devices. In these discussions, the general consensus is to place the account and container databases on faster I/O devices (e.g. HDD RAID 10 or SSD) for better performance. We expect the desire for ever-faster storage clouds to increase as faster network infrastructures, e.g. Google Fiber, become commonplace.

In this series of blogs we focus on using faster I/O devices in Swift and seek to answer the following questions:

(1) What kinds of workloads can benefit from using the faster I/O devices for the container and account services?

(2) How much extra performance can be expected?

We also evaluate which approach provides the best ROI when upgrading from non-RAID based HDD:

(1) Switch to the SSD-based container and account services, i.e. replace some regular HDDs with SSDs?

(2) Upgrade to HDD RAID (from non-RAIDed HDDs), which has lower upfront costs?

(3) Or take a potentially more convenient approach and install more main memory in each storage node to increase the file-caching effect, without upgrading the I/O subsystem?

Let’s start with answering the question on how much extra performance can be expected by using SSDs.

Experimental Results

1.    Workload Generator

Our workload generator for these benchmarks is Swift-bench, which can directly send workloads to a Swift instance. Using Swift-bench, we benchmark three common operations of the Swift cloud: uploading an object (#PUTs/sec), downloading an object (#GETs/sec) and deleting an object (#DELETEs/sec). Swift-bench allows us to tune the workloads with the following parameters:

object_size: defines the size of an object (or a file) that will be uploaded. In our tests, we use 40 KB as the object size parameter.

concurrency: defines how many concurrent threads will be launched for a benchmark test. In order to saturate the Swift cloud, we set the concurrency parameter to 256.

num_objects: the total number of objects that will be uploaded to Swift. Here, we use 1 million. Given that each object is 40 KB, the total size of the objects to be uploaded is 40GB.

num_gets: the total number of objects that will be downloaded from Swift. To match the num_objects parameter, we use 1 million here.

num_containers: the number of containers that will be created for storing the objects. The objects are uniformly distributed across all containers. Since we want to see whether this parameter affects the performance of the Swift cloud, we use two values in our tests: 10K and 20K.

2.    Testing bench

(Some of the initial testing for this blog was done on SSDs provided by STEC, Inc. We thank STEC for their contribution to this study.)

We leverage the following two types of EC2 instances to implement a small-scale Swift cloud with one proxy node and two storage nodes.

Proxy node: EC2 Cluster GPU Quadruple Extra Large Instance (22 GB of memory, 33.5 EC2 Compute Units)
Storage nodes: EC2 High I/O Quadruple Extra Large Instance (60.5 GB of memory, 35 EC2 Compute Units, 2 SSD-based storage volumes)

Each storage node attaches 20 EBS volumes for storing the objects. All EBS volumes are homogenous and each EBS volume is 25GB.

The current version of Swift-bench only allows using one account and an unlimited number of containers in that account. So, one benchmark run is based on the following sequence: log into the one existing account, create 10K or 20K containers, and then upload 1 million objects. After that, download 1 million objects, and finally delete 1 million objects. We measure the upload, download and delete operations per second.

We compare two Swift cloud implementations: (1) Swift with an HDD-based container DB (HDD-based Swift for short) and (2) Swift with an SSD-based container DB (SSD-based Swift for short).

The HDD-based Swift is implemented with 1 proxy node and 2 storage nodes. Each storage node attaches 20 EBS volumes for storing objects, 1 EBS volume for storing the account DB, and 1 EBS volume for storing the container DB.

The SSD-based Swift is implemented with 1 proxy node and 2 storage nodes. Each storage node attaches 20 EBS volumes for storing objects, 1 EBS volume for storing the account DB, and 1 SSD-based volume for storing the container DB.

The proxy and storage nodes are connected via 10Gbps network.

3.    Software Settings

We use OpenStack Swift version 1.5.1 and the authentication method on the proxy node is TempAuth. All proxy, container, account and object-related parameters are set to defaults, except the workers setting, which defines how many processes each server forks: in proxy-server.conf, workers = 256; in account-server.conf, workers = 32; in container-server.conf, workers = 32; in object-server.conf, workers = 32.

The swift-bench, proxy and authentication services run on the proxy node, while the account, container, object and rsync services run on the storage nodes.

The number of replicas is set to two. So, each storage node holds one replica.

The main memory (containing the file cache) can also be a very important factor in benchmark performance. In this blog, we focus on the case where the total size of the objects stored on a storage node is much larger than the memory size (a valid scenario for a PB-scale data center). In this case, since most objects cannot be cached in memory, the file cache effect is not very significant. To simulate this scenario, we disable the file cache in the OS by running "echo 3 > /proc/sys/vm/drop_caches" every second to drop the page cache, dentries and inodes. (The overhead of running this command itself is minimal and does not impact the benchmark results.) In later blogs, we will consider the file cache effect of using different memory sizes for the storage node.
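A minimal sketch of the cache-dropping loop, run as root on each storage node for the duration of a benchmark (the sync call is our addition to flush dirty pages before dropping the caches):

#!/bin/sh
# Drop the page cache, dentries and inodes once per second while the benchmark runs
while true; do
    sync                                   # flush dirty pages first
    echo 3 > /proc/sys/vm/drop_caches      # 3 = pagecache + dentries + inodes
    sleep 1
done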

4.    Benchmark results

Figure 1 . Comparing two Swift implementations (the file cache is disabled)

Figure 1 shows the operation rate (operations per second on the Y-axis) for three different operations, PUT, GET and DELETE, for each Swift implementation when the number of containers is set to 10K and 20K. As seen in Figure 1, the SSD-based Swift outperforms the HDD-based Swift across all three operations.

Moreover, for the SSD-based Swift, the number of containers does not affect performance very much. For example, the PUT, GET and DELETE rates for the SSD-based Swift (the blue bars in Figure 1) are stable when the number of containers is increased from 10K to 20K.

However, the number of containers does matter for the HDD-based Swift. As seen from Figure 1, the PUT, GET and DELETE rates for the HDD-based Swift (the red bars in Figure 1) at 10K containers are higher than at 20K containers. This behavior shows that as the number of containers increases, more overhead is incurred, which negatively impacts the performance of the HDD-based Swift.

To further study the overhead for the HDD-based Swift when the number of containers is increased from 10K to 20K, we use the blktrace command to keep track of the logical block number (LBN) of each I/O on the EBS volume that stores the container DB. Then, we calculate the percentage of successive I/Os that are more than 125,000 LBN apart, in order to observe how random the I/O pattern generated by the container DB is. For completeness, we additionally run the benchmark with 5K containers. The distribution is shown in Figure 2.
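A hedged sketch of this measurement, assuming the container DB lives on /dev/xvdg and the default blkparse output format (sector number in the 8th field of dispatch records; adjust the device name and field index to your setup):

# Trace the container DB volume for the duration of the benchmark
blktrace -d /dev/xvdg -o containerdb

# Percentage of successive I/Os whose sector numbers differ by more than 125,000
blkparse -i containerdb | awk '
    $6 == "D" {
        if (prev != "") {
            d = ($8 > prev) ? $8 - prev : prev - $8
            total++
            if (d > 125000) far++
        }
        prev = $8
    }
    END { if (total > 0) printf "%.1f%% of successive I/Os are > 125000 LBN apart\n", 100 * far / total }'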

Figure 2. The percentage of successive I/Os that are more than 125,000 LBN apart

As seen from Figure 2, about 86% of successive I/Os are more than 125,000 LBN apart for 20K containers, compared to 77% for 10K containers and 58% for 5K containers. So, as the number of containers increases, the I/O pattern generated by the container DB becomes more random.

Since HDDs tend not to perform well with random I/O access patterns, while SSDs (especially enterprise SSDs) can handle random I/Os efficiently, a large number of containers hinders the performance of the container DB in the HDD-based Swift, while the SSD-based Swift has no such issue.

To give a better view of Figure 1, we summarize the speedups of the SSD-based Swift over the HDD-based Swift in Table 1.

              10K containers    20K containers
PUT                25%               70%
GET                 7%               12%
DELETE             49%              109%

Table 1. Comparing two Swift implementations (speedup of the SSD-based Swift over the HDD-based Swift)

From Table 1, we note that the speedup from using an SSD (as the backend for the container DB) is most significant for DELETE operations (49% for 10K containers and 109% for 20K containers), followed by PUT operations (25% for 10K containers and 70% for 20K containers). The speedup for GET operations is very small (7% for 10K containers and 12% for 20K containers). This behavior is explained in [1]: the B-trees maintained by the container DB need to be updated as objects are put into or deleted from containers, whereas reading objects (GET operations) does not update the B-trees.

In conclusion, from Table 1 and Figure 1 we note that:

(1) When the file cache effect is minimal, the SSD-based Swift largely outperforms the HDD-based Swift. The extra performance from using SSDs depends on the type of operation: PUT and DELETE operations benefit most, because they incur a large number of concurrent updates on the B-trees, which generate lots of mixed random read and write I/Os. As SSDs are far more efficient than HDDs at handling random I/O patterns, these workloads clearly benefit from the SSD-based Swift. However, for GET operations, the extra performance from using SSDs is small, because a GET does not incur any update on the B-trees.

(2) Since the I/O speed of HDDs is sensitive to the randomness of the I/O pattern, the performance of the HDD-based Swift can be negatively impacted by an increasing number of containers, which generates a more random I/O pattern on the storage for the container DB. However, when an SSD is used for the container DB, the SSD-based Swift is not affected by the increasing number of containers.

Conclusions and Next Steps

Overall, we have identified the scenarios in which workloads benefit most from the SSD-based Swift: PUT and DELETE are the dominant operations, a large number of containers is used (hence a more challenging I/O pattern from the container DB), and the main memory is small (so the cache effect is minimal). One example of such a workload is a storage cloud used for backups of a large number of users. In most such implementations, users have separate containers on the cloud, they mostly perform PUT operations, and DELETE operations are also executed frequently to implement the retention policies set by the users.

It is likely that when the object size is large (e.g. 50-100MB), the storage node will be bottlenecked at the I/O devices that store the large objects. In that case, the frequency of updates to the container DB will be low, and using an HDD as the backend for the container DB is adequate, because the workload on the container DB is very light. More generally, when Swift is not always saturated by the workloads (e.g. a low concurrency of active users), it may also be fine to place the container DB on an HDD.

It is an interesting question to identify the tradeoffs (both performance and cost) between using the SSD-based Swift with small main memory per node and using the HDD-based Swift with large main memory per node, because SSDs and large main memory can both help push up performance: SSDs greatly reduce the wait time of each outstanding I/O (especially random I/O), while a large main memory can cache the "hot data" (e.g. the container DB, account DB and the most frequently accessed objects) to eliminate many actual disk accesses and serve user requests directly from memory.

Since SSDs are just one example of the large body of faster I/O devices, our future direction is to broadly discuss the following question: for a certain type of workload, what is the most cost-effective way to implement a Swift cloud when the cloud builder can choose among several different combinations of faster I/O devices?

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com