Building an OpenStack Swift Cloud: Mapping EC2 to Physical Hardware

As we mentioned in an earlier blog, it may seem ironic that we are using a public compute cloud to design an optimized private storage cloud. But the ready availability of diverse types of EC2-based VMs makes AWS a great platform for running the Sampling and Profiling phases of the OpenStack Swift Advisor.

After an optimized Swift Cloud has been profiled and designed on virtualized hardware (for example, EC2 instances in our lab), cloud builders will eventually want to build it on physical hardware. The question is: how do we preserve the cost-effectiveness and guaranteed throughput of the Swift Cloud on the new physical hardware, with new data center parameters?

A straightforward answer is to keep the same hardware and software resources in the new hosts. But there is a challenge: EC2 provisions the CPU resource for each instance type in terms of “EC2 Compute Units” (the same challenge arises if other compute platforms, e.g. OpenStack Compute, are used for profiling). For example, a Large instance has 4 EC2 Compute Units and a Quad instance has 33.5 EC2 Compute Units. How do you translate 33.5 EC2 Compute Units into GHz when you purchase physical CPUs for the servers? Another ambiguous resource definition in EC2 is network bandwidth. EC2 has four tiers of network bandwidth: Low, Moderate, High and Very High; for example, EC2 allocates Low bandwidth to the Micro instance and Moderate bandwidth to the Small instance. But what does “Low bandwidth” mean in terms of MB/s? The EC2 specs provide no answers.

Here we propose a method to translate these ambiguous resource definitions (e.g. EC2 Compute Units) into standard specifications (e.g. GHz) that can be referenced when choosing the physical hardware. We focus on three types of hardware resources: CPU, disk and network bandwidth.

CPU: We first choose a CPU benchmark (e.g. PassMark) and run it on a given type of EC2 instance to get a benchmark score. Then we look up the published scores for that benchmark to find out which physical CPU achieves a similar score. To be safe, we can choose a physical CPU with a slightly higher score, to ensure it performs no worse than the virtualized CPU in the EC2 instance. A sketch of this step appears below.
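
For illustration only (our measurements used PassMark; sysbench is shown here as a stand-in, and its scores are on a different scale), a quick way to get a comparable CPU number on a Linux instance is:

    # Install a CPU benchmark (sysbench as an example stand-in; its
    # events/sec score is not on the PassMark scale, so always compare
    # like against like).
    sudo apt-get install -y sysbench

    # Stress all vCPUs for 60 seconds and note the events/sec result.
    sysbench cpu --threads=$(nproc) --time=60 run

The same benchmark, with the same parameters, must then be run on (or looked up for) the candidate physical CPUs so that the scores are directly comparable.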

Disk: We roughly assume that the I/O patterns on storage nodes are close to sequential, so we can use the Linux “dd” command to benchmark the sequential read and write I/O bandwidth on a given type of EC2 instance. Based on the measured bandwidth in MB/s, cloud builders can buy physical storage drives with matching I/O bandwidth, as sketched below.
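
For example, a minimal sketch of the dd test, assuming a data disk mounted at /mnt/data (the path and sizes are illustrative):

    # Sequential write: stream 1 GB to the disk, bypassing the page
    # cache with oflag=direct so the drive itself is measured.
    dd if=/dev/zero of=/mnt/data/ddtest bs=1M count=1024 oflag=direct

    # Sequential read of the same file, again bypassing the cache.
    dd if=/mnt/data/ddtest of=/dev/null bs=1M iflag=direct

dd prints the average throughput in MB/s when each transfer finishes.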

Network: To test the maximum network bandwidth of a given EC2 instance within the Swift Cloud, we set up another EC2 instance with Very High network bandwidth, e.g. the EC2 Quad instance. First, we install Apache and create a test file (whose size depends on the memory size, as discussed below) on both EC2 instances. Then, to benchmark the maximum incoming network bandwidth of the instance under test, we issue a wget command on that instance to download the test file hosted on the Quad instance. wget reports the average bandwidth once the download finishes, and we use that as the maximum incoming bandwidth. To test the maximum outgoing network bandwidth, we run the same test in the reverse direction: the Quad instance downloads the test file from the instance we want to benchmark. The reason we choose wget (instead of, e.g., scp) is that wget incurs less CPU overhead. Note that, to remove interference from disk I/O, we ensure the test file fits into the memory of the EC2 instance so that no read I/Os are needed, and we always execute wget with “-O /dev/null” to bypass the write I/Os. Once we have the maximum incoming and outgoing network bandwidths, we can choose the right Ethernet components to provision the storage and proxy nodes. A sketch of the test follows.
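
As a sketch, assuming Apache serves files from /var/www/html and the Quad instance is reachable at quad.example.com (both names are hypothetical):

    # On both instances: create a test file in Apache's document root.
    # 1 GB is small enough to fit in memory on the instances involved,
    # so the file is served from the page cache with no read I/O.
    dd if=/dev/zero of=/var/www/html/testfile bs=1M count=1024

    # On the instance under test: measure incoming bandwidth by pulling
    # the file from the Quad instance. -O /dev/null discards the data so
    # no write I/O occurs; wget prints the average rate at the end.
    wget -O /dev/null http://quad.example.com/testfile

Running the same wget in the opposite direction (from the Quad instance against the instance under test) gives the outgoing bandwidth.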

Memory: As for the virtualized memory of an EC2 instance, if 10 GB of memory is allocated to the instance, it is straightforward to provision 10 GB of memory in the physical server. So no translation is needed for virtualized memory.

Other cloud management platforms may offer several types of instances (e.g. large, medium, small) under their own terminology. We can use similar methods to those above to benchmark each type of instance they offer and find the matching physical hardware.

To fully preserve the throughput of the Swift Cloud when mapping from EC2 instances, we advise cloud builders to provision physical hardware with at least 10% better specs than deduced by the above translation.

Here, we show an example of how to map an EC2 c1.xlarge instance to physical hardware:

CPU: We run the PassMark CPU benchmark on c1.xlarge. The PassMark CPU score is 7295. Allowing for 10% more resources when translating from virtualized to physical hardware, some candidate physical CPUs include: Intel Xeon E3-1245 @ 3.30 GHz, Intel Xeon L5640 @ 2.27 GHz, Intel Xeon E5649 @ 2.53 GHz, etc.

Memory: As the c1.xlarge instance is allocated 7 GB of memory, we could choose 8 GB of memory (4 GB x 2 or 2 GB x 4) for the physical machine.

Disk: Using the “dd” command, we found that the c1.xlarge instance delivers 100-120 MB/s for sequential reads and 70-80 MB/s for sequential writes, which matches a typical 7,200 RPM drive. Therefore, most HDDs on the market are safe to use as data disks in the physical machine.

Network: The c1.xlarge instance has around 100 MB/s of network bandwidth for both incoming and outgoing traffic, which corresponds to a 1 Gigabit Ethernet interface (1 Gb/s ≈ 125 MB/s). So a typical 1 Gigabit Ethernet interface should be enough for the physical machine's networking.

If you are thinking of putting together a storage cloud service, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com.
