Amazon S3 Backup Testing Results


So, I’ve been playing around with Amazon’s new S3 service. It’s essentially an on-demand storage-and-bandwidth combination; Amazon will scale their service transparently to provide as much as you need of either. It’s pretty cheap, too, at $0.15 per GB per month for storage and $0.20 per GB for data transfer. I wanted to see what would be the best way to use Amazon S3 for backups with Amanda and MySQL ZRM, so I did some tests to evaluate the performance under various circumstances.

Amazon requires that you upload data in complete ‘objects’, and it wasn’t clear to me what the optimal object size was. I wrote a script that attempts S3 uploads while randomly varying the number of concurrent object uploads and size of each object. Data transfer amount was fixed at 100 MB per trial.
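A trial of this kind can be sketched roughly as follows. This is not the original script, just a minimal illustration: the candidate object sizes, the `plan_trial` and `run_trial` names, and the `fake_upload` stand-in are all assumptions, and a real test would replace `fake_upload` with an actual S3 PUT and time the whole trial.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TOTAL_BYTES = 100 * 1024 * 1024  # fixed 100 MB per trial, as in the post


def plan_trial(rng, max_threads=3, size_choices_mb=(1, 5, 10, 25, 50)):
    """Pick a random thread count and object size, then list every object size.

    The size choices are illustrative assumptions, not the values actually tested.
    """
    threads = rng.randint(1, max_threads)
    object_bytes = rng.choice(size_choices_mb) * 1024 * 1024
    full, rest = divmod(TOTAL_BYTES, object_bytes)
    # A final short object covers any remainder so the total stays at 100 MB.
    sizes = [object_bytes] * full + ([rest] if rest else [])
    return threads, sizes


def run_trial(upload, threads, sizes):
    """Upload all objects with `threads` concurrent workers; return bytes sent."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(upload, sizes))


def fake_upload(size):
    # Hypothetical stand-in for a real S3 upload; it only reports the size.
    return size


threads, sizes = plan_trial(random.Random())
sent = run_trial(fake_upload, threads, sizes)
```

Whatever combination is drawn, the object sizes always add up to the same 100 MB, so trials stay comparable.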

The first thing that I discovered in my testing is that requests to Amazon S3 frequently fail. There seems to be a variety of causes for this, but the best solution I could find was to simply retry failed requests. So I changed the script to retry any particular request up to 100 times. Eventually I want to find a better solution for the variety of transient error messages that come back.

Then I gave it a run. I ran it under three conditions: on a T1 office connection, on a home cable modem with no other network applications running, and on the same cable modem while already saturated with other requests (in both directions).
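The retry-on-failure approach mentioned above might look something like this sketch. The `TransientError` class and the `with_retries` helper are hypothetical names, and I am assuming "up to 100 times" means 100 retries after the first attempt; a real client would catch whatever errors its HTTP or S3 library raises.

```python
class TransientError(Exception):
    """Hypothetical marker for a failed S3 request that is worth retrying."""


def with_retries(request, max_retries=100):
    """Call `request()` until it succeeds or the retry limit is reached.

    Returns (result, attempts_used). Re-raises the last error once the
    limit is hit, which is how a trial would be recorded as failed.
    """
    for attempt in range(1, max_retries + 2):  # first try + max_retries retries
        try:
            return request(), attempt
        except TransientError:
            if attempt == max_retries + 1:
                raise


# Usage: a request that fails twice before succeeding.
failures_left = [2]


def flaky_put():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise TransientError("simulated transient failure")
    return "ok"


result, attempts = with_retries(flaky_put)
```

Retrying immediately is the simplest policy and matches what the post describes; it does nothing to distinguish between the different kinds of error that come back.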

Mostly because of the small amount of data collected (due in turn to lower bandwidth), results from the cable modem were not statistically significant, but I was able to make the key observation that of the tests on the congested connection, only one set of uploads succeeded (all the others reached the 100-retry limit on at least one object). The one successful upload was single-threaded with a relatively small object size.

Since the T1 is a faster connection, I collected more data (n=282) and was able to draw some statistically significant conclusions. Comparing the threaded and non-threaded cases, I used Student’s t-distribution with t=1.72 and df=29.24 to arrive at p=0.0481 for the hypothesis that performance is faster with threading. Note however that, although significant, the added performance was tiny: threaded trials were on average only 3% faster than unthreaded trials.
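For anyone wanting to repeat this kind of comparison, here is a minimal sketch using only the Python standard library. The fractional degrees of freedom suggest an unequal-variance (Welch) correction, but that is my assumption, as is reading p as one-sided. The function names are illustrative, and the p-value is approximated by numerically integrating the t density rather than by calling a statistics package.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance test of mean(a) > mean(b)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def one_sided_p(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule over [0, |t|] (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


# Plugging in the t and df reported above:
p = one_sided_p(1.72, 29.24)
```

With per-trial throughput lists for the threaded and unthreaded cases, `welch_t(threaded, unthreaded)` would supply the `t` and `df` to pass to `one_sided_p`.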

The standard deviation was also notably greater for the single-thread case, indicating that multiple threads may help guard against network changes.

There was no significant difference between the 2-thread and 3-thread cases (p=0.47), and although the 3-thread case had the highest mean, in general no significant correlation could be found among the threaded cases.

Outside of the congested network, object size did not seem to have a great impact on performance, but trials with smaller objects generally had more consistent performance (if not faster overall).

If anyone has comments on this work or has done similar research, please drop us a line at community@zmanda.com.

2 Responses to “Amazon S3 Backup Testing Results”

  1. Ian says:

    Richard,

    I didn’t collect data on the failure rate, but I can say from informal testing experience that sometimes as many as 1 in 10 requests fail (though obviously more for the congested network case). The good news is that when a request fails, it almost always comes back with a message right away — it’s not like you have to upload your entire object and only then find out that things have bombed.

    I only tested uploads in this test, and only using the REST API. YMMV otherwise.

    Cheers,

    –Ian

  2. Richard Moss says:

    Thanks for posting this. Do you have any numbers you can share regarding the fail rate observed in your tests? Also, did you only test uploads or did you test retrieval requests as well? Thanks,

    Richard