Building a Swift Storage Cloud? Avoid Wimpy Proxy Servers and Five Other Pitfalls

We introduced OpenStack Swift Advisor in a previous blog: a set of methods and tools that help cloud storage builders select appropriate hardware based on the goals for their Swift Cloud. Here we describe six pitfalls to avoid when choosing components for your Swift Cloud:

(1) Do not use wimpy servers for proxy nodes

The key job of a proxy node is to process a very large number of API requests, receive data from user applications and send it out to the corresponding storage nodes. The proxy node also makes sure that the minimum number of required replicas gets written to the storage nodes. Reply traffic (e.g. restore traffic, if the Swift Cloud is used for cloud backup) flows through the proxy nodes as well. Moreover, authentication services (e.g. Keystone, swauth) may also be integrated into the proxy nodes. Considering these performance-critical functions performed by proxy nodes, we strongly advise cloud storage builders to use powerful servers as proxy nodes. For example, a typical proxy node can be provisioned with 2 or more multi-core Xeon-class CPUs, large memory and 10G Ethernet.
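
To make this concrete, here is a minimal back-of-the-envelope sketch for checking whether a proxy node's CPUs or its NIC will be the first limit. The per-core throughput figure and core count are illustrative assumptions, not measurements:

```python
# Rough proxy-node sizing check. The per-core throughput figure is an assumed
# placeholder for illustration; measure your own workload before relying on it.

def proxy_node_ceiling(cores=16, nic_gbps=10, mb_per_sec_per_core=75):
    cpu_cap_mb_s = cores * mb_per_sec_per_core   # what the CPUs can push
    nic_cap_mb_s = nic_gbps * 1000 / 8           # 10 GbE is roughly 1250 MB/s
    return min(cpu_cap_mb_s, nic_cap_mb_s), cpu_cap_mb_s, nic_cap_mb_s

ceiling, cpu_cap, nic_cap = proxy_node_ceiling()
print(f"CPU cap ~{cpu_cap} MB/s, NIC cap ~{nic_cap:.0f} MB/s -> node ceiling ~{ceiling:.0f} MB/s")
```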

There is some debate on whether a small number of powerful servers or a large number of wimpy servers should be used as proxy nodes. The initial cost outlay of a large number of wimpy proxy nodes may be lower than that of a smaller number of powerful nodes, while providing acceptable performance. But for data center operators, a large number of wimpy servers will inevitably incur higher IT-related costs (personnel, server maintenance, space rental, cooling, energy and so on). Additionally, more servers will need more network switches, eroding some of the cost benefits and increasing the failure rate. And as your cloud storage service gets popular, scaling out with wimpy proxy nodes will be challenging.
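
As a hedged illustration of the operating-cost argument, the sketch below compares three-year costs for the two approaches; every price and count is a hypothetical placeholder, not a quote:

```python
# Hypothetical 3-year cost comparison: many wimpy proxies vs. a few powerful ones.
wimpy = {"servers": 12, "unit_cost": 2500, "yearly_opex_per_server": 700}
beefy = {"servers": 3,  "unit_cost": 8000, "yearly_opex_per_server": 700}

def three_year_cost(cfg, years=3):
    capex = cfg["servers"] * cfg["unit_cost"]
    opex = cfg["servers"] * cfg["yearly_opex_per_server"] * years
    return capex + opex

print("wimpy proxies:", three_year_cost(wimpy))   # 30000 capex + 25200 opex = 55200
print("beefy proxies:", three_year_cost(beefy))   # 24000 capex +  6300 opex = 30300
```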

(2) Don’t let your load-balancer be overloaded

The load-balancer is the first component of a Swift cluster that directly faces the user applications. Its primary job is to take all API requests from the user applications and distribute them evenly across the underlying proxy nodes. In some cases, it also has to perform SSL termination to authenticate users, which is a very CPU- and network-intensive job. An overloaded load-balancer defeats its own purpose by becoming the bottleneck of your Swift cluster's performance.

As we discussed in a previous blog (Next Steps with OpenStack Swift Advisor), the linear performance scalability of a Swift cluster can be seriously inhibited by a load-balancer that does not keep up with the load. To reap the benefits of your investment in proxy and storage nodes, make sure the load-balancer is not underpowered, especially for peak load conditions on your storage cloud.
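
One quick way to reason about this is to compare the load-balancer's capacity against the aggregate capacity of the proxy tier; the figures below are assumptions for illustration:

```python
# If the load-balancer's ceiling is below the aggregate proxy capacity, linear
# scaling stops at the load-balancer (illustrative figures, not measurements).

def lb_limits_scaling(lb_cap_mb_s, proxy_cap_mb_s, num_proxies):
    aggregate_proxy_mb_s = proxy_cap_mb_s * num_proxies
    return lb_cap_mb_s < aggregate_proxy_mb_s

# A single 10 GbE load-balancer (~1250 MB/s) in front of 4 proxies at ~800 MB/s each:
print(lb_limits_scaling(1250, 800, 4))   # True -> the cluster cannot scale past the LB
```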

(3) Do not under-utilize your proxy nodes

The proxy node is usually one of the most expensive components in the Swift cluster. Therefore, it is desirable for cloud builders to fully utilize the resources in their proxy nodes. A good question our customers ask is: how many storage nodes should I attach to a proxy node, or what is the best ratio between proxy and storage nodes? If your cloud is built with too few storage nodes per proxy node, you may be under-utilizing your proxy nodes, as shown in Figure 1(a). (While we have simplified the illustrations, the performance changes indicated in the following figures are based on actual observations in our labs.) In this example, the Swift cluster initially consists of 3 nodes: 1 proxy node and 2 storage nodes (we use capital P and S in the figures to denote proxy and storage nodes respectively). The write throughput of this 3-node Swift cluster is X MB/s. If we add two more storage nodes, as shown in Figure 1(b), the throughput of the resulting 5-node Swift cluster becomes 2X MB/s. So both the throughput and the capacity of the Swift cluster can be doubled simply by adding two storage nodes. In terms of cost per throughput and cost per GB, the 5-node Swift cluster in this example will likely be more efficient.
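
The cost-per-throughput arithmetic behind Figure 1 can be sketched as follows, with node prices that are purely hypothetical:

```python
# Cost per MB/s for the Figure 1 example (hypothetical node prices).
PROXY_COST, STORAGE_COST = 8000, 3000     # USD per node, assumed for illustration
X = 400                                   # MB/s of the 3-node cluster, per Figure 1(a)

def cost_per_mb_s(num_proxies, num_storage, throughput_mb_s):
    total_cost = num_proxies * PROXY_COST + num_storage * STORAGE_COST
    return total_cost / throughput_mb_s

print(cost_per_mb_s(1, 2, X))       # Figure 1(a): 14000 / 400 = 35.0 $ per MB/s
print(cost_per_mb_s(1, 4, 2 * X))   # Figure 1(b): 20000 / 800 = 25.0 $ per MB/s
```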

(4) Do not over-utilize the proxy nodes

On the other hand, you can't keep attaching storage nodes without, at some point, adding proxy nodes. In Figure 2(a), 1 proxy node is well utilized by 4 storage nodes, delivering 2X MB/s throughput. If more storage nodes are attached to that proxy node, as shown in Figure 2(b), the throughput will not increase, because the proxy node is already saturated by the 4 storage nodes. Therefore, attaching more storage nodes to a well-utilized (nearly 100% busy) proxy node only makes the Swift cluster less efficient in terms of cost per throughput. Note, however, that you may decide to over-subscribe proxy nodes if you are willing to forgo the performance gains that additional proxy nodes would bring, and you simply want to add capacity for now. But to increase capacity, first make sure you are adding enough disks to each storage node, as described in the next pitfall.
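
A simplified model of this saturation effect, using assumed per-node figures, looks like this:

```python
# Once the proxy tier is saturated, extra storage nodes no longer raise throughput.
# The per-storage-node throughput and the proxy ceiling below are assumptions.

def cluster_throughput(num_storage_nodes, per_storage_mb_s=200, proxy_ceiling_mb_s=800):
    return min(num_storage_nodes * per_storage_mb_s, proxy_ceiling_mb_s)

for n in (2, 4, 6, 8):
    print(n, "storage nodes ->", cluster_throughput(n), "MB/s")
# 2 -> 400, 4 -> 800, 6 -> 800, 8 -> 800: flat once the proxy node is ~100% busy
```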

(5) Avoid disk-bounded storage nodes

Another common question we get is: how many disks should I put into each storage node? This is a crucial question with implications for cost/performance and cost/capacity. In general, you want to avoid storage nodes whose performance is bottlenecked by too few disk spindles, as illustrated in the figures below.

Figure 3(a) shows a Swift cluster consisting of 1 proxy node and 2 storage nodes, with each storage node attached to 1 disk. Let's assume the throughput of this Swift cluster is Y MB/s. If we add one more disk to each storage node, we have two disks per storage node, as shown in Figure 3(b). Based on our observations, the throughput of the new Swift cluster may increase to as much as 1.5Y MB/s. The reason the throughput improves simply by attaching more disks is that in Figure 3(a), the single disk in each storage node can easily be overwhelmed (i.e. 100% busy) when transferring data, while other resources in the storage node (e.g. CPU, memory) are not fully utilized; the storage node is "disk-bound". When more disks are added to each storage node, all disks can work in parallel during data transfers, so the bottleneck of the storage node shifts from the disks to other resources and the throughput of the Swift cluster improves. In terms of cost per throughput, Figure 3(b) is more efficient than Figure 3(a), since the cost of adding a disk is significantly less than the cost of a whole server.

An immediate follow-up question is: can the throughput keep increasing as we attach more disks to each storage node? Of course, the answer is no. Figure 4 shows the relationship between the number of disks attached to each storage node and the throughput of the Swift cluster. As the number of disks increases from 1, the throughput indeed improves, but after some point (we call it the "turning point"), the throughput stops increasing and stays almost flat.
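
The turning point in Figure 4 can be captured with the same kind of simple model; the per-disk throughput and the node ceiling below are assumptions for illustration:

```python
# Disks add throughput in parallel until some other resource in the storage node
# (CPU, memory, NIC) saturates; after that the curve in Figure 4 goes flat.

def storage_node_throughput(num_disks, per_disk_mb_s=70, node_ceiling_mb_s=350):
    return min(num_disks * per_disk_mb_s, node_ceiling_mb_s)

for disks in range(1, 9):
    print(disks, "disks ->", storage_node_throughput(disks), "MB/s")
# rises from 70 MB/s to 350 MB/s at 5 disks (the "turning point"), then stays flat
```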

Even though the throughput of the Swift cluster cannot keep improving as more disks are attached, some cloud storage builders may still want to put a large number of disks in each storage node, since doing so does not hurt performance. Another metric, cost per MB/s per GB of available capacity, tends to be minimized by adding more disks.
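
Under the same assumptions, the sketch below shows why cost per MB/s per GB keeps falling as disks are added, even after the throughput curve has flattened; all prices and disk sizes are hypothetical:

```python
# Cost per MB/s per GB keeps dropping as disks are added, because capacity grows
# while throughput merely stays flat (hypothetical prices and disk sizes).

def cost_per_mb_s_per_gb(num_disks, server_cost=3000, disk_cost=150, disk_gb=2000,
                         per_disk_mb_s=70, node_ceiling_mb_s=350):
    cost = server_cost + num_disks * disk_cost
    throughput = min(num_disks * per_disk_mb_s, node_ceiling_mb_s)
    capacity_gb = num_disks * disk_gb
    return cost / throughput / capacity_gb

for disks in (2, 5, 10, 20):
    print(disks, "disks ->", round(cost_per_mb_s_per_gb(disks), 6))
# the metric shrinks monotonically in this model as more disks are attached
```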

(6) Do not rely on two replicas of data

One more question we frequently get from our customers is: can we use 2 replicas of data in the Swift cluster in order to save on the cost of storage space? Our recommendation is: no. Here is why:

Performance: it may seem that a Swift cluster which maintains 2 replicas of data will write to the storage nodes faster than a cluster which maintains 3 replicas (which has one more write stream to the storage nodes). In actuality, however, when the proxy node attempts to write N replicas, it only requires (N/2)+1 successful responses (using integer division) out of N to declare a successful write. That is to say, only (N/2)+1 of the N concurrent writes are synchronous; the remaining writes can complete asynchronously, and Swift relies on its replication process to ensure that the remaining copies are successfully created.

Based on the above, in our tests comparing a "3-replica Swift cluster" and a "2-replica Swift cluster", both generate 2 concurrent synchronous writes to the storage nodes.
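
A quick sketch of the arithmetic described above:

```python
# Write quorum as described above: (N/2)+1 successful responses (integer division),
# which is why 2-replica and 3-replica clusters both wait on 2 synchronous writes.

def write_quorum(num_replicas):
    return num_replicas // 2 + 1

for n in (2, 3, 4, 5):
    print(f"{n} replicas -> quorum of {write_quorum(n)} synchronous writes")
# 2 -> 2, 3 -> 2, 4 -> 3, 5 -> 3
```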

Risk of data loss: We recommend using commodity off-the-shelf storage for Swift storage nodes, without even using RAID. So the replicas maintained by Swift are your defense against data loss. Let's say a Swift cluster has 5 zones (the minimum recommended number of zones) and 3 replicas of data. With this setup, up to two zones can fail at the same time without any data loss. However, if we reduce the number of replicas from 3 to 2, the risk of data loss increases substantially, because the data can now survive only a single zone failure.
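
As a simplified illustration (assuming each replica of an object lands in a distinct, randomly chosen zone), the sketch below compares the chance that a given object loses all of its replicas when two of the five zones fail:

```python
from math import comb

# Probability that all replicas of a given object sit inside the failed zones,
# assuming replicas are placed in distinct, randomly chosen zones (a simplification).
def object_loss_probability(zones, replicas, failed_zones):
    if replicas > failed_zones:
        return 0.0                        # not enough failed zones to hold every replica
    return comb(failed_zones, replicas) / comb(zones, replicas)

print(object_loss_probability(5, 3, 2))   # 0.0 -> 3 replicas survive any two-zone failure
print(object_loss_probability(5, 2, 2))   # 0.1 -> with 2 replicas, some objects are gone
```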

Avoiding the above pitfalls will help you implement a high-performance, robust Swift Cloud that will scale to serve your cloud storage needs for years to come.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com.
