In our last blog on building Swift storage clouds, we proposed the framework for the Swift Advisor - a technique that takes two of the three constraints (Capacity, Performance, Cost) as inputs, and provides hardware recommendations as output - specifically the count and configuration of systems for each type of node (storage and proxy) of the Swift storage cloud (Swift Cloud). We also provided a subset of our initial results for the Sampling phase.
In this blog, we will continue the discussion on Swift Advisor, first focusing on the impact of the load balancer on the aggregate throughput of the cloud (we will refer to it as “throughput”) and then provide a subset of outcomes for the profiling and optimization phases in our lab.
The load balancer distributes the incoming API requests evenly across the proxy servers. As shown below, the load balancer sits in front of the proxy servers to forward the API requests to them and can be connected with any number of proxy servers.
If a load balancer is used, it is the only entry point of the Swift Cloud and all user data goes through it. It is therefore a very important component to consider for the user-visible performance of your Swift Cloud. If it is not properly provisioned, it will become a severe bottleneck that inhibits the scalability of the Swift Cloud.
At a high-level, there are two types of load balancers:
Software Load Balancer: Runs load balancing software (e.g. Pound, Nginx) or round robin DNS on a server to evenly distribute the requests among proxy servers. The server running the software load balancer usually requires powerful multi-core CPUs and extremely high network bandwidth.
Hardware Load Balancer: Leverages the network switch/firewall or dedicated hardware with capability of load balancing to assign the incoming data traffic to the proxy servers of Swift Cloud.
Regardless of whether a software or hardware load balancer is used, the throughput of the Swift cloud cannot scale beyond the bandwidth of the load balancer. Therefore, we advise the cloud builders to deploy a powerful load balancer (e.g. with 10 Gigabit Ethernet) so that its “effective” bandwidth exceeds the expected throughput of the Swift cloud. We recommend that you pick your load balancer so that with a fully loaded (i.e. 100% busy) Swift Cloud, the load balancer still has around 50% unused capacity for future planning or sudden needs of higher bandwidth.
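The headroom rule above can be turned into a quick sizing calculation. The following is a minimal sketch, not part of the Swift Advisor itself: the function name and the `headroom` parameter are our own illustrative choices, and throughput is expressed in MB/s.

```python
def required_lb_bandwidth(expected_throughput_mb_s: float, headroom: float = 0.5) -> float:
    """Return the load balancer bandwidth (MB/s) needed so that the given
    fraction of its capacity is still unused when the cloud is 100% busy.

    `headroom` is an illustrative parameter; 0.5 mirrors the ~50% unused
    capacity recommended in the text.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    return expected_throughput_mb_s / (1.0 - headroom)


# A cloud expected to deliver 175 MB/s needs a load balancer with
# roughly twice that effective bandwidth to keep 50% in reserve.
print(required_lb_bandwidth(175.0))  # 350.0
```

Note that this sizes the "effective" bandwidth of the load balancer, which may be lower than the nominal speed of its network interface.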
To get a sense of how to properly provision the load balancer and how it impacts the throughput of the Swift Cloud, we show some results of running a Swift Cloud of c proxy servers and cN storage servers (c:cN Swift Cloud) with the load balancer. (N is the “magic” value for the 1:N Swift Cloud found in the Sampling phase.) These results are the “performance curves” for the profiling phase and can be directly used for optimizing your goal.
In our last article, we used some running examples to show how to get the output results from the Sampling phase. Here, we directly use the outputs of the Sampling phase (1:N Swift clouds) as the inputs of the profiling phase, as listed below:
- 1 Large Instance based proxy node: 5 Small Instance based storage nodes (N=5)
- 1 XL Instance based proxy node: 5 Small Instance based storage nodes (N=5)
- 1 CPU XL Instance based proxy node: 5 Small Instance based storage nodes (N=5)
- 1 Quad Instance based proxy node: 5 Medium Instance based storage nodes (N=5)
Based on the above 1:5 swift clouds, we profile the throughput curves of c:c5 Swift cloud (c = 2, 4, 6,…) with the following setups of load balancer:
- Using one “Cluster Compute Eight Extra Large Instance” (Eight) running Pound (a reverse proxy and load balancer) as the software load balancer (“1 Eight”), to which all proxy nodes are connected. (The Eight Instance is one level more powerful than the Quad Instance. Like the Quad Instance, it is equipped with 10 Gigabit Ethernet, but it has 2X the CPU resources, namely 2 x Intel Xeon E5-2670 eight-core “Sandy Bridge” processors, and 2X the memory.)
- Using two identical Eight Instances (each running Pound) as the load balancers (“2 Eight”). Half of the proxy nodes are connected to the first Eight Instance and the other half are connected to the second Eight Instance. The storage nodes are unaware of this split and accept data from all of the proxy nodes.
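To make the “2 Eight” setup concrete, here is a small sketch of how proxy nodes could be divided among load balancers. It is only an illustration: the proxy names are hypothetical, and the round-robin assignment is one simple way to get an even split, not necessarily how our lab setup was scripted.

```python
def split_proxies(proxies, num_load_balancers):
    """Assign proxy nodes to load balancers in round-robin order so that
    each load balancer fronts an (almost) equal share of the proxies."""
    if num_load_balancers < 1:
        raise ValueError("need at least one load balancer")
    groups = [[] for _ in range(num_load_balancers)]
    for index, proxy in enumerate(proxies):
        groups[index % num_load_balancers].append(proxy)
    return groups


# Hypothetical proxy names for a 4:20 Swift cloud behind two load balancers.
proxies = ["proxy-1", "proxy-2", "proxy-3", "proxy-4"]
for lb_number, group in enumerate(split_proxies(proxies, 2), start=1):
    print(f"load balancer {lb_number}: {group}")
```

With an even number of proxies, each load balancer ends up with exactly 50% of them, matching the setup described above.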
Again, we use Amanda Enterprise as our application to back up a 20GB data file to the c:c5 Swift Cloud. We concurrently run two Amanda Enterprise servers on two EC2 Quad instances to send data to the c:c5 Swift cloud, ensuring that the two Amanda Enterprise servers can fully load the c:c5 Swift cloud in all cases.
For this experiment, we focus on the backup operations, so the aggregate throughput of backup operations is simply regarded as “throughput” (MB/s) measured between the two Amanda Enterprise servers and the c:c5 Swift cloud.
Let’s first look at the throughput curves (throughput on Y-axis, values of c on X-axis) of the c:c5 Swift cloud with the two types of load balancers for each of the above-mentioned configurations of proxy and storage nodes.
(1) Proxy nodes run on the Large instance and the storage nodes run on the Small instance. The two curves are for the two types of load balancers (LB):
(2) Proxy nodes run on the XL instance and the storage nodes run on the Small instance.
(3) Proxy nodes run on the CPU XL instance and the storage nodes run on the Small instance.
(4) Proxy nodes run on the Quad instance and the storage nodes run on the Medium instance.
From the above 4 figures, we can see that the throughput of the c:c5 Swift cloud using 1 Eight instance as the load balancer cannot scale beyond 140 MB/s. With 2 Eight instances as the load balancers, however, the throughput of the c:c5 Swift Cloud scales linearly (for the values of “c” we tested with).
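The shape of these curves can be approximated with a very simple model: throughput grows linearly with the number of proxy nodes until it hits the load balancer ceiling. The sketch below is an illustration under stated assumptions, not a fit to our measured data. The 140 MB/s ceiling is the “1 Eight” limit reported above; the per-proxy rate is an assumed value, derived from the 175 MB/s at 8 Large proxy nodes example later in this post by assuming perfectly linear scaling.

```python
def modeled_throughput(c, per_proxy_mb_s, lb_cap_mb_s):
    """Simplified model: linear scaling in the number of proxies `c`,
    capped by the load balancer's effective bandwidth."""
    return min(c * per_proxy_mb_s, lb_cap_mb_s)


PER_PROXY_MB_S = 175.0 / 8   # assumption: 175 MB/s spread evenly over 8 proxies
ONE_EIGHT_CAP_MB_S = 140.0   # ceiling observed with the "1 Eight" load balancer

for c in (2, 4, 6, 8):
    capped = modeled_throughput(c, PER_PROXY_MB_S, ONE_EIGHT_CAP_MB_S)
    uncapped = modeled_throughput(c, PER_PROXY_MB_S, float("inf"))
    print(f"c={c}: with cap {capped:.1f} MB/s, without cap {uncapped:.1f} MB/s")
```

In this model the two curves coincide for small c and diverge once c times the per-proxy rate exceeds the load balancer ceiling, which is the qualitative behavior the figures show.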
Next, we combine the above results of “2 Eight” load balancer into one picture, and look at it from another point of view – throughput on Y-axis, cost ($) on X-axis. (As you may recall from our last blog, the cost is defined as the EC2 usage cost of running c:c5 swift cloud for 30 days.)
The above graph tells us several things:
(1) Using CPU XL instances for proxy nodes and Small instances for storage nodes is not a good choice: compared with using XL instances for proxy nodes and Small instances for storage nodes, it costs about the same but delivers lower throughput. We attribute this to our observation that XL instances provide better network bandwidth than CPU XL instances. AWS rates the I/O performance (including the network bandwidth) of both the XL instance and the CPU XL instance as “High”. In our pure network bandwidth testing, however, the XL instance showed a maximum of 120 MB/s for both incoming and outgoing bandwidth, while the CPU XL instance showed a maximum of 100 MB/s for both.
(2) Using Large instances for proxy nodes and Small instances for storage nodes is the most cost-effective configuration: within each throughput group (marked by a dotted circle in the figure), low, medium and high, it achieves similar throughput at a much lower cost. This configuration is cost-effective because a Large instance can provide a maximum of 100 MB/s for both incoming and outgoing network bandwidth, which is similar to the XL and CPU XL instances, but at about half their cost.
(3) While using Large instances for proxy nodes and Small instances for storage nodes is very cost-effective, using Quad instances for proxy nodes and Medium instances for storage nodes is also an attractive option, especially if you consider manageability and failure issues. To achieve 175 MB/s throughput, you can choose either 8 Large instance based proxy nodes and 40 Small instance based storage nodes (total 48 nodes), or 4 Quad instance based proxy nodes and 20 Medium instance based storage nodes (total 24 nodes). Hosting and managing more nodes in the data center may incur higher IT-related costs, e.g. power, number of server racks, failure rate and IT administration. Considering those costs, it may be more attractive to set up a Swift Cloud with a smaller number of more powerful nodes.
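The node totals in point (3) follow directly from the c:cN layout with N=5. A minimal sketch of the arithmetic (the function name is ours, for illustration only):

```python
def node_counts(c, n=5):
    """Return (proxy nodes, storage nodes, total nodes) for a c:cN Swift cloud."""
    proxies = c
    storage = c * n
    return proxies, storage, proxies + storage


print(node_counts(8))  # Large/Small option at 175 MB/s: (8, 40, 48)
print(node_counts(4))  # Quad/Medium option at 175 MB/s: (4, 20, 24)
```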
Based on the data in the above figure and considering the IT-related costs, the goal of the optimization phase is to choose the configuration that best optimizes your goal. For example, suppose you input the performance and capacity constraints and want to minimize the cost, and suppose that two configurations can both satisfy your capacity constraint: (1) Large instances for proxy nodes and Small instances for storage nodes, and (2) Quad instances for proxy nodes and Medium instances for storage nodes. The only thing left is to figure out which configuration fulfills the throughput constraint at a lower cost. The final result depends on your IT management costs. If your IT management cost is relatively high, you may want to choose the second configuration; otherwise, the first configuration will likely cost less.
In future articles, we will talk about how to map EC2 instances to physical hardware so that cloud builders can build an optimized Swift cloud running on physical servers.
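The trade-off described above can be sketched as a small cost comparison. This is only an illustration of the reasoning, not the Swift Advisor's actual optimization code. The node counts come from the 175 MB/s example in this post, but the EC2 cost figures below are made-up placeholders (not our measured costs), and the per-node management cost is a hypothetical input you would supply yourself.

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    nodes: int        # total proxy + storage nodes
    ec2_cost: float   # 30-day EC2 usage cost in $ (placeholder values below)


def total_cost(cfg, mgmt_cost_per_node):
    """EC2 cost plus a flat IT management cost for every node."""
    return cfg.ec2_cost + cfg.nodes * mgmt_cost_per_node


def cheapest(configs, mgmt_cost_per_node):
    """Pick the configuration with the lowest total cost."""
    return min(configs, key=lambda cfg: total_cost(cfg, mgmt_cost_per_node))


def break_even_mgmt_cost(a, b):
    """Per-node management cost at which configurations a and b cost the same."""
    if a.nodes == b.nodes:
        raise ValueError("configurations have the same node count")
    return (b.ec2_cost - a.ec2_cost) / (a.nodes - b.nodes)


# Placeholder EC2 costs, chosen only to illustrate the calculation.
large_small = Config("Large proxy / Small storage", nodes=48, ec2_cost=6000.0)
quad_medium = Config("Quad proxy / Medium storage", nodes=24, ec2_cost=9000.0)

print(break_even_mgmt_cost(large_small, quad_medium))   # 125.0
print(cheapest([large_small, quad_medium], 50.0).name)   # Large proxy / Small storage
print(cheapest([large_small, quad_medium], 200.0).name)  # Quad proxy / Medium storage
```

With these placeholder numbers, the configuration with fewer, more powerful nodes becomes cheaper once the management cost exceeds $125 per node for the 30-day period; your own break-even point depends on your real EC2 and IT costs.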
If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at email@example.com