Should You Consider SSDs for OpenStack Swift Cloud Storage?

With the industry's ongoing move toward building data centers with SSD-based storage, there has been a lot of interest in the OpenStack Swift community in using faster I/O devices when deploying Swift-based storage clouds, as seen in the discussions in [1], [2] and [3]. Cloud builders are interested in significantly increasing the limit of objects per container by using faster I/O devices. The general consensus in these discussions is to place the account and container databases on faster I/O devices (e.g. HDD RAID 10 or SSD) for better performance. We expect the desire for ever faster storage clouds to grow as faster network infrastructures (e.g. Google Fiber) become commonplace.

In this series of blogs we focus on using faster I/O devices in Swift and seek to answer the following questions:

(1) What kinds of workloads can benefit from using the faster I/O devices for the container and account services?

(2) How much extra performance can be expected?

We also evaluate which approach provides the best ROI when upgrading from non-RAIDed HDDs:

(1) Switch to SSD-based container and account services, i.e. replace some regular HDDs with SSDs?

(2) Upgrade to HDD RAID (from non-RAIDed HDDs), which has lower upfront costs?

(3) Or take the potentially more convenient approach and install more main memory in each storage node to increase the file-caching effect, without upgrading the I/O subsystem?

Let’s start by answering the question of how much extra performance can be expected from SSDs.

Experimental Results

1.    Workload Generator

Our workload generator is swift-bench, which sends workloads directly to a Swift instance. Using swift-bench, we benchmark three common operations on the Swift cloud: uploading an object (PUTs/sec), downloading an object (GETs/sec) and deleting an object (DELETEs/sec). Swift-bench lets us tune the workload with the following parameters (a sample configuration is sketched after this list):

object_size: the size of an object (or file) that will be uploaded. In our tests, we use 40 KB as the object size.

concurrency: how many concurrent threads will be launched for a benchmark test. To saturate the Swift cloud, we set the concurrency to 256.

num_objects: the total number of objects that will be uploaded to Swift. Here, we use 1 million. Given that each object is 40 KB, the total size of the uploaded objects is 40 GB.

num_gets: the total number of objects that will be downloaded from Swift. To match num_objects, we use 1 million here.

num_containers: the number of containers that will be created for storing the objects. The objects are uniformly distributed across all containers. Since we want to see whether this parameter affects the performance of the Swift cloud, we test two values: 10K and 20K.
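
For reference, here is a minimal sketch of how these settings could be captured in a swift-bench configuration file, generated here with Python's configparser. The [bench] section and option names follow the stock swift-bench.conf; the auth endpoint and credentials are placeholders for our TempAuth setup.

    # Sketch: generate a swift-bench configuration matching the parameters above.
    # The auth URL, user and key are placeholders; adjust them to your deployment.
    import configparser

    conf = configparser.ConfigParser()
    conf["bench"] = {
        "auth": "http://proxy-node:8080/auth/v1.0",  # TempAuth endpoint (placeholder)
        "user": "test:tester",                       # account:user (placeholder)
        "key": "testing",                            # key (placeholder)
        "object_size": "40960",     # 40 KB objects
        "concurrency": "256",       # 256 concurrent benchmark threads
        "num_objects": "1000000",   # 1 million uploads (~40 GB in total)
        "num_gets": "1000000",      # 1 million downloads
        "num_containers": "10000",  # 10K containers (20000 for the 20K runs)
        "delete": "yes",            # delete the uploaded objects at the end
    }

    with open("swift-bench.conf", "w") as f:
        conf.write(f)

The resulting file can then be handed to swift-bench on the proxy node.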

2.    Test Bed

(Some of the initial testing for this blog was done on SSDs provided by STEC, Inc. We thank STEC for their contribution to this study.)

We leverage the following two types of EC2 instances to implement a small-scale Swift cloud with one proxy node and two storage nodes.

Proxy node: EC2 Cluster GPU Quadruple Extra Large Instance (22 GB of memory, 33.5 EC2 Compute Units)
Storage nodes: EC2 High I/O Quadruple Extra Large Instance (60.5 GB of memory, 35 EC2 Compute Units, 2 SSD-based storage volumes)

Each storage node attaches 20 EBS volumes for storing objects. All EBS volumes are homogeneous, and each is 25 GB.

The current version of swift-bench only allows one account and an unlimited number of containers within that account. So, one benchmark run follows this sequence: log in to the single account, create 10K or 20K containers, upload 1 million objects, then download 1 million objects, and finally delete 1 million objects. We measure the upload, download and delete operations per second.
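
For illustration, the sequence above looks roughly like the following single-threaded sketch written against python-swiftclient (swift-bench drives the same steps with 256 concurrent workers); the auth endpoint, credentials and naming scheme are placeholders.

    # Sketch of one benchmark sequence: create containers, then PUT, GET and
    # DELETE 1 million 40 KB objects spread uniformly across the containers.
    from swiftclient import client

    conn = client.Connection(authurl="http://proxy-node:8080/auth/v1.0",
                             user="test:tester", key="testing")  # placeholders

    num_containers, num_objects = 10000, 1000000
    payload = b"x" * 40 * 1024                      # 40 KB object body

    for c in range(num_containers):                 # create 10K (or 20K) containers
        conn.put_container("bench_%06d" % c)

    for i in range(num_objects):                    # upload 1 million objects
        conn.put_object("bench_%06d" % (i % num_containers),
                        "obj_%07d" % i, contents=payload)

    for i in range(num_objects):                    # download 1 million objects
        conn.get_object("bench_%06d" % (i % num_containers), "obj_%07d" % i)

    for i in range(num_objects):                    # finally delete them
        conn.delete_object("bench_%06d" % (i % num_containers), "obj_%07d" % i)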

We compare two Swift cloud implementations: (1) Swift with an HDD-based container DB (HDD-based Swift for short) and (2) Swift with an SSD-based container DB (SSD-based Swift for short).

The HDD-based Swift uses 1 proxy node and 2 storage nodes. Each storage node attaches 20 EBS volumes for storing objects, 1 EBS volume for the account DB, and 1 EBS volume for the container DB.

The SSD-based Swift likewise uses 1 proxy node and 2 storage nodes. Each storage node attaches 20 EBS volumes for storing objects, 1 EBS volume for the account DB, and 1 SSD-based volume for the container DB.
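
The only structural difference between the two implementations is which device the container ring points at on each storage node. Below is a hedged sketch of how the container and account rings might be built with swift-ring-builder (wrapped in Python for consistency with the other examples); the IPs, device names, partition power and weights are illustrative, not our exact values.

    # Sketch: build container and account rings with 2 replicas across the two
    # storage nodes. In the HDD-based setup the container device is a plain EBS
    # volume; in the SSD-based setup it is the SSD-backed volume.
    import subprocess

    def build_ring(builder, port, devices, part_power=18, replicas=2, min_part_hours=1):
        subprocess.check_call(["swift-ring-builder", builder, "create",
                               str(part_power), str(replicas), str(min_part_hours)])
        for zone, ip, dev in devices:
            subprocess.check_call(["swift-ring-builder", builder, "add",
                                   "z%d-%s:%d/%s" % (zone, ip, port, dev), "100"])
        subprocess.check_call(["swift-ring-builder", builder, "rebalance"])

    storage_nodes = [(1, "10.0.0.2"), (2, "10.0.0.3")]   # placeholder zones/IPs

    # Container ring: "sdc" stands in for the dedicated container-DB volume
    # (EBS in the HDD-based Swift, SSD in the SSD-based Swift).
    build_ring("container.builder", 6001, [(z, ip, "sdc") for z, ip in storage_nodes])

    # Account ring: the account DB stays on its own EBS volume in both setups.
    build_ring("account.builder", 6002, [(z, ip, "sdb") for z, ip in storage_nodes])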

The proxy and storage nodes are connected via a 10 Gbps network.

3.    Software Settings

We use OpenStack Swift version 1.5.1, and the authentication method on the proxy node is TempAuth. All proxy, container, account and object-related parameters are left at their defaults, except for the number of workers (workers defines how many processes each server forks): in proxy-server.conf, workers = 256; in account-server.conf, container-server.conf and object-server.conf, workers = 32.
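
As a quick sanity check, the sketch below reads each server configuration and reports its worker count; the file paths follow the stock Swift layout and are assumptions about the deployment.

    # Sketch: report the workers setting in each Swift server configuration.
    # Interpolation is disabled because Swift configs may contain '%' characters.
    import configparser

    expected = {
        "/etc/swift/proxy-server.conf": 256,
        "/etc/swift/account-server.conf": 32,
        "/etc/swift/container-server.conf": 32,
        "/etc/swift/object-server.conf": 32,
    }

    for path, want in expected.items():
        conf = configparser.ConfigParser(interpolation=None)
        conf.read(path)
        have = conf["DEFAULT"].get("workers", "<default>")
        print("%-40s workers = %s (expected %d)" % (path, have, want))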

The swift-bench, proxy and authentication services run on the proxy node, while the account, container, object and rsync services run on the storage nodes.

The number of replicas is set to two. So, each storage node holds one replica.

The main memory (which holds the file cache) can also be an important factor in benchmark performance. In this post, we focus on the case where the total size of the objects stored on a storage node is much larger than the memory size (a realistic scenario for a PB-scale data center). In this case, since most objects cannot be cached in memory, the file-cache effect is not significant. To simulate this scenario, we effectively disable the OS file cache by running “echo 3 > /proc/sys/vm/drop_caches” every second to drop the page cache, dentries and inodes (a sketch of this loop is shown below; the overhead of running it is minimal and does not affect the benchmark results). In later posts, we will consider the file-cache effect of different memory sizes on the storage nodes.
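
A minimal sketch of that cache-dropping loop, run as root on each storage node for the duration of a benchmark run:

    # Sketch: drop the page cache, dentries and inodes once per second so the
    # file cache never builds up during a benchmark run (requires root).
    import time

    while True:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")      # 3 = drop page cache plus dentries and inodes
        time.sleep(1)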

4.    Benchmark results

Figure 1. Comparing two Swift implementations (the file cache is disabled)

Figure 1 shows the operation rate (operations per second, on the Y-axis) for three operations, PUT, GET and DELETE, for each Swift implementation, with the number of containers set to 10K and 20K. As seen in Figure 1, the SSD-based Swift outperforms the HDD-based Swift across all three operations.

Moreover, for the SSD-based Swift, the number of containers does not affect performance very much: the PUT, GET and DELETE rates (shown as the blue bars in Figure 1) remain stable when the number of containers is increased from 10K to 20K.

However, the number of containers does matter for the HDD-based Swift. As seen in Figure 1, its PUT, GET and DELETE rates (shown as the red bars) are higher with 10K containers than with 20K containers. This shows that as the number of containers increases, additional overhead is incurred that degrades the performance of the HDD-based Swift.

To further study this overhead when the number of containers is increased from 10K to 20K, we use the blktrace command to record the logical block number (LBN) of each I/O on the EBS volume that stores the container DB. We then calculate the percentage of successive I/Os whose LBNs are more than 125,000 apart, to gauge how random the I/O pattern generated by the container DB is (a sketch of this calculation is shown below). For completeness, we additionally run the benchmark with 5K containers. The distribution is shown in Figure 2.
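
The sketch below shows one way to compute that percentage from blkparse text output; it assumes blkparse's default format (the starting sector appears just before the "+ <size>" field) and filters on queue ("Q") events, so adjust the parsing to your trace.

    # Sketch: fraction of successive I/Os on the container-DB volume whose
    # starting logical block numbers are more than 125,000 apart.
    # Assumed usage: blkparse -i <trace prefix> | python lbn_gap.py
    import sys

    THRESHOLD = 125000

    def far_apart_fraction(lines, threshold=THRESHOLD):
        prev, total, far = None, 0, 0
        for line in lines:
            fields = line.split()
            if "+" not in fields or len(fields) < 8 or fields[5] != "Q":
                continue                                   # keep queue events only
            try:
                lbn = int(fields[fields.index("+") - 1])   # starting sector of this I/O
            except ValueError:
                continue
            if prev is not None:
                total += 1
                if abs(lbn - prev) > threshold:
                    far += 1
            prev = lbn
        return far / total if total else 0.0

    if __name__ == "__main__":
        pct = 100 * far_apart_fraction(sys.stdin)
        print("%.1f%% of successive I/Os are more than %d LBNs apart" % (pct, THRESHOLD))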

Figure 2. The percentage of successive I/Os whose LBNs are more than 125,000 apart

As seen in Figure 2, about 86% of successive I/Os are more than 125,000 LBNs apart with 20K containers, versus only 77% with 10K containers and 58% with 5K containers. So, as the number of containers increases, the I/O pattern generated by the container DB becomes more random.

Since HDDs do not perform well on random I/O access patterns, while SSDs (especially enterprise SSDs) handle random I/O efficiently, a large number of containers hinders the performance of the container DB in the HDD-based Swift, while the SSD-based Swift has no such problem.

To give a clearer view of Figure 1, we summarize the speedups of the SSD-based Swift over the HDD-based Swift in Table 1.

             10K containers   20K containers
PUT          25%              70%
GET          7%               12%
DELETE       49%              109%

Table 1. Comparing two Swift implementations

From Table 1, we note that the speedup from using an SSD (as the backend for the container DB) is most significant for the DELETE operation (49% for 10K containers and 109% for 20K containers), followed by the PUT operation (25% for 10K containers and 70% for 20K containers). The speedup for the GET operation is small (7% for 10K containers and 12% for 20K containers). This behavior is explained in [1]: the B-trees maintained by the container DB must be updated as objects are put into or deleted from containers, whereas reading objects (the GET operation) requires no B-tree updates.

From Table 1 and Figure 1, we conclude that:

(1) When the file-cache effect is minimal, the SSD-based Swift largely outperforms the HDD-based Swift. The extra performance from using SSDs depends on the type of operation: PUT and DELETE operations benefit most, because they incur a large number of concurrent updates to the B-trees, which generate many mixed random read and write I/Os. Since SSDs handle random I/O far more efficiently than HDDs, these workloads clearly benefit from the SSD-based Swift. For the GET operation, however, the extra performance from using SSDs is small, because GETs do not update the B-trees.

(2) Since the I/O speed of HDDs is sensitive to the randomness of the I/O pattern, the performance of the HDD-based Swift degrades as the number of containers increases, because more containers generate a more random I/O pattern on the device that stores the container DB. With an SSD backing the container DB, the SSD-based Swift is not affected by the increasing number of containers.

Conclusions and Next Steps

Overall, we have identified the scenarios in which workloads benefit most from the SSD-based Swift: PUT and DELETE are the dominant operations, a large number of containers is used (creating a more challenging I/O pattern for the container DB), and the storage nodes have little main memory (so the cache effect is minimal). One example of such a workload is a storage cloud used for backups by a large number of users. In most such deployments, each user has a separate container, the workload consists mostly of PUT operations, and DELETE operations are also executed frequently to implement the users' retention policies.

When the object size is large (e.g. 50-100 MB), the storage node will likely be bottlenecked at the I/O devices storing the objects, and the container DB will be updated infrequently. In that case, using an HDD as the backend for the container DB is adequate, because its workload is very light. More generally, when Swift is not saturated by the workload (e.g. a low concurrency of active users), it may also be fine to place the container DB on HDD.

It is also interesting to identify the tradeoffs (in both performance and cost) between the SSD-based Swift with small main memory per node and the HDD-based Swift with large main memory per node, because both SSDs and large main memory can boost performance: SSDs greatly reduce the wait time of each outstanding I/O (especially random I/O), while large main memory can cache the “hot data” (e.g. the container DB, the account DB and the most frequently accessed objects), eliminating many disk accesses and serving user requests directly from memory.

Since SSDs are just one example of the broad category of faster I/O devices, our future direction is to address the following question: for a given type of workload, what is the most cost-effective way to implement a Swift cloud when cloud builders can choose among several combinations of faster I/O devices?

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com
