Archive for the ‘Open Source’ Category

OpenStack Swift Advisor: Building Cloud Storage with Optimized Capacity, Cost and Performance

Wednesday, April 18th, 2012

OpenStack Swift is an open source cloud storage platform, which can be used to build massively scalable and highly robust storage clouds. There are two key use cases of Swift:

  • A service provider offering cloud storage with a well-defined RESTful HTTP API - i.e. a Public Storage Cloud. An ecosystem of applications integrated with that API is offered to the service provider’s customers. The service provider may also choose to offer only a select service (e.g. Cloud Backup) and not offer access to the API directly. (A minimal sketch of using the API follows this list.)
  • A large enterprise building a cloud storage platform for use by internal applications - i.e. a Private Storage Cloud. The organization may do this because it is reluctant to send its data to a third-party public cloud provider, or because it wants to build a cloud storage platform that is closer to the users of its applications.
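To make the Public Storage Cloud use case concrete, here is a minimal sketch of storing an object through Swift’s RESTful HTTP API from Python. The account URL, auth token, and container/object names are illustrative assumptions, not values from this post - substitute the ones from your own Swift deployment.

# A minimal sketch of uploading an object via Swift's RESTful HTTP API.
# SWIFT_URL and TOKEN are hypothetical placeholders; obtain real values
# from your Swift/Keystone deployment.
import requests

SWIFT_URL = "https://swift.example.com/v1/AUTH_demo"  # hypothetical account URL
TOKEN = "AUTH_tk0123456789abcdef"                     # hypothetical auth token

def upload_object(container, name, data):
    """PUT an object into a Swift container and fail loudly on errors."""
    url = "%s/%s/%s" % (SWIFT_URL, container, name)
    resp = requests.put(url, headers={"X-Auth-Token": TOKEN}, data=data)
    resp.raise_for_status()
    return resp.headers.get("Etag")

if __name__ == "__main__":
    print(upload_object("backups", "notes.txt", b"hello, swift"))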

In both of the above cases, as you plan to build your cloud storage infrastructure, you will face one of these three problems:

  1. Optimize my cost: You know how much usable storage capacity you need from your cloud storage, and you know how much aggregate throughput you need for the applications using it, but you want to know the minimum budget required to achieve your capacity and throughput goals.
  2. Optimize my capacity: You know how much aggregate throughput you need for the applications using the cloud storage, and you know your budget constraints, but you want to know the maximum capacity you can get while meeting your throughput needs and budget constraints.
  3. Optimize my performance: You know how much usable storage capacity you need from your cloud storage, and you know your budget constraints, but you need to know the configuration that delivers the best aggregate throughput within your capacity and budget constraints.

Solving any of the three problems above is very complex because of the myriad choices that the cloud storage builder has to make, e.g. the size and number of various types of servers, network connectivity, SLAs, etc. We have done extensive work in our labs and with several cloud providers to understand the above problems and to address them with rigorous analysis. In this series of blogs we will provide some of the results of our findings, as well as descriptions of tools and services which can help you build, deploy and maintain your storage cloud with confidence.

Definitions

Since the terms used can be interpreted differently depending on context, below are the specific definitions used in this series of blogs for the three key parameters:

Capacity: This is the usable storage capacity, i.e. the maximum amount of application data that can be stored in the cloud storage. Usually, for better availability and durability, data is replicated across multiple systems in the cloud storage, so the raw capacity should be planned with this data redundancy in mind. For example, in OpenStack Swift, each object is replicated three times by default, so the total raw storage will be at least three times larger than the usable storage capacity.

Performance: It is the maximum aggregate throughput (MB/s or GB/s) that can be achieved by applications from the cloud storage. In this blog, we will also use the term throughput to denote aggregate throughput.

Cost: For this discussion we will only consider the initial purchase cost of the hardware for building the cloud storage. We expect the cloud storage to be in use for several years, but we are not amortizing the cost over a period of time. We will point out best practices to reduce ongoing maintenance and scaling costs. For this series of blogs we will use the terms “node” and “server” interchangeably; so, “storage node” is the same as “storage server”.

Introducing the framework for the Swift Advisor

The Swift Advisor is a technique that takes two of the three constraints (Capacity, Performance, Cost) as inputs, and provides a hardware recommendation as output - specifically, the count and configuration of systems for each type of node (storage and proxy) in the Swift storage cloud. This recommendation is optimized for the third constraint: e.g. minimize your budget, maximize your throughput, or maximize your usable storage capacity.

Before discussing the technical details of the Swift Advisor, let’s first look at a practical way to use it. In order to build an optimized Swift cloud storage deployment (Swift Cloud), an important feature of the Swift Advisor is that it considers a very large range of hardware configurations (e.g. a wide variety of CPU, memory, disk and network choices). However, it is unrealistic and very expensive to blindly purchase a large amount of physical hardware upfront and let the Swift Advisor evaluate the individual performance of each configuration, as well as the overall performance after putting them together. Therefore, we choose to leverage the virtualized and elastic environment offered by Amazon EC2, and build an optimized Swift Cloud on EC2 instances initially.

While it may seem ironic that we are using a public compute cloud to come up with an optimized private storage cloud, the reasons for choosing EC2 as the test-bed for the Swift Advisor are multi-fold: (1) EC2 provides many types of instances with different capacities of CPU, memory and I/O to meet various needs, so the Swift Advisor can try out many instance types on a pay-per-use basis instead of physically owning the wide variety of hardware needed. (2) EC2 has a well-defined pricing structure. This provides a good comparison point for cloud storage builders - they can look at the pricing information and justify the cost of owning their own cloud storage in the long run.

(3) The specification of each type of EC2 instance, including CPU, memory, disk and network, is well defined. Once an optimized Swift Cloud is built on EC2 instances under the input constraints, the specifications of those instances can effectively guide the purchase of physical servers to build a Swift Cloud running on physical hardware. In summary, you can use the elasticity of a compute cloud along with the Swift Advisor to get specifications for your physical-hardware-based storage cloud, while preserving your desired constraints.

The high-level workflow of the Swift Advisor is shown below. There are four important phases, which we explain as follows:

Sampling Phase: Our eventual goal is to build an optimized Swift Cloud consisting of quantity A of proxy servers and quantity B of storage servers - A and B are unknown initially, and we denote this as an A:B Swift Cloud. In this first phase we focus on the performance and cost characteristics of a 1:N Swift Cloud. We look for the “magic” value of N that gives a 1:N Swift Cloud the lowest cost per throughput ($ per MB/s). The reason we want to find the 1:N Swift Cloud with the lowest $ per MB/s is to avoid two potential pitfalls when building a Swift Cloud: (1) Under-provisioning: the proxy server is under-utilized and could still be attached to more storage servers to improve throughput. (2) Over-provisioning: the proxy server is overwhelmed by too many storage servers.

Since the combinatorial space of storage and proxy node choices is potentially huge, we use several heuristics to prune the candidates during the various phases of the Swift Advisor. For example, we do not consider very low-powered configurations (e.g. Micro instances) for proxy nodes.

After the sampling phase, for each combination of EC2 instance sizes for the proxy and storage servers, we know the “magic” value of N that produces the lowest $ per MB/s for running a 1:N Swift Cloud. You can run the sampling phase on any available virtual or physical hardware, but the larger the sample set, the better.
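To make the search for the “magic” N concrete, below is a simplified sketch of the sampling loop in Python. It is not the Swift Advisor’s actual implementation: measure_throughput, the per-hour prices and the 30-day cost window are stand-ins for a real benchmark harness and price list.

# Simplified sketch of the sampling phase for one proxy/storage combination:
# grow N until throughput stops increasing, then report the N with the lowest
# cost per MB/s. measure_throughput(n) is a placeholder for a real benchmark
# run against a 1:N Swift cloud.
HOURS = 30 * 24  # cost window used in this series: 30 days of EC2 usage

def find_magic_n(proxy_price, storage_price, measure_throughput,
                 start_n=5, step=5):
    best = None                       # (dollars_per_mbps, n, throughput)
    prev_tput = 0.0
    n = start_n
    while True:
        tput = measure_throughput(n)  # measured MB/s of the 1:N Swift cloud
        cost = (proxy_price + n * storage_price) * HOURS
        candidate = (cost / tput, n, tput)
        if best is None or candidate[0] < best[0]:
            best = candidate
        if tput <= prev_tput:         # proxy saturated: more nodes stop helping
            break
        prev_tput, n = tput, n + step
    return best

The same loop is repeated for every proxy/storage instance-size combination, yielding one “magic” N per combination.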

Profiling Phase: Given the “magic” values of N from the sampling phase, our goal in this phase is to profile throughput curves (throughput versus the size of the Swift Cloud) for several Swift Clouds consisting of c proxy servers and cN storage servers (a c:cN Swift Cloud) with various values of c.

Please note that each throughput curve corresponds to one combination of hardware configurations (EC2 instance sizes, in our case) for the proxy and storage servers. In our experiments, for each combination of EC2 instance sizes, the profiling starts from a 2:2N Swift Cloud and we double the number of proxy and storage servers each time (e.g. 4:4N, 8:8N, …). All cN EC2 instances for storage nodes are identical.

The profiling stops when the throughput of the c:cN Swift Cloud exceeds the throughput constraint. After that, we apply a linear or non-linear regression on the profiled throughputs to plot a throughput curve with c on the X-axis and throughput on the Y-axis. The output of the profiling phase is a set of throughput curves for c:cN Swift Clouds, where each curve corresponds to one combination of EC2 instance sizes for the proxy and storage servers.
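As an illustration of the regression step, the sketch below fits a low-degree polynomial to a handful of profiled points; the sample values of c and throughput are invented for the example and do not come from our measurements.

# Fit a throughput curve from profiled (c, throughput) points. The numbers
# below are made-up placeholders; degree 2 stands in for whichever linear or
# non-linear model fits the profiled data best.
import numpy as np

def fit_throughput_curve(cs, throughputs, degree=2):
    coeffs = np.polyfit(cs, throughputs, degree)
    return np.poly1d(coeffs)   # callable curve: estimated MB/s as a function of c

curve = fit_throughput_curve([2, 4, 8], [260.0, 480.0, 840.0])
print(curve(6))                # interpolated throughput at c = 6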

Optimization Phase: Taking the throughput curves from the profiling phase and the two input constraints, the optimization phase is where we figure out a Swift Cloud optimized for the third parameter. We do this by plotting the constraints on each throughput curve and looking for the optimized value across all curves.

For example, let’s say we are trying to optimize capacity given a maximum budget and a minimum throughput requirement: we input the minimum required throughput on each throughput curve and find the corresponding value of c, and then reject the throughput curves where the implied hardware cost exceeds the budget. Out of the remaining curves, we select the one resulting in the maximum capacity, based on cN times the storage capacity of the system used for the storage server.
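Continuing the same sketch, the selection logic for the “maximize capacity” case might look like the following; the fitted curves, field names, prices and the 30-day cost window are assumptions carried over from the earlier snippets, and usable capacity assumes Swift’s default three-way replication.

# Sketch of the "maximize capacity" optimization: each candidate bundles a
# fitted throughput curve with its magic N, per-node prices and per-node raw
# disk capacity. All field names and numbers are illustrative assumptions.
HOURS = 30 * 24

def optimize_capacity(candidates, min_throughput, budget):
    best = None
    for cand in candidates:
        # smallest c on this curve that meets the throughput constraint
        c = next((i for i in range(1, 1025)
                  if cand["curve"](i) >= min_throughput), None)
        if c is None:
            continue                      # this combination cannot hit the target
        cost = c * (cand["proxy_price"] + cand["n"] * cand["storage_price"]) * HOURS
        if cost > budget:
            continue                      # reject curves that exceed the budget
        raw_gb = c * cand["n"] * cand["storage_gb_per_node"]
        usable_gb = raw_gb / 3.0          # assuming Swift's default 3x replication
        if best is None or usable_gb > best[0]:
            best = (usable_gb, c, cand)
    return best                           # (usable capacity, c, winning combination)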

Validation and Refinement Phase: The validation phase checks if the optimized Swift cloud really conforms to the throughput constraint through a test run of the workloads. If the test run fails a constraint, then the Swift Advisor goes to the refinement phase. The refinement phase gets the average throughput measured from the test run and sends it to the profiling phase.

The profiling phase adds that information to the profiled data to refine the throughput curves. After that, we use the refined throughput curves as inputs to redo the optimization phase. The above four phases constitute the core of the Swift Advisor. However, there are some important remaining issues to be discussed:

(1) the choice of the load balancer;

(2) the mapping between EC2 instances and physical hardware when cloud operators finally want to move the optimized Swift Cloud to physical servers, while preserving the three constraints on the new hosting hardware; and

(3) SLA constraints.

We will address these and other issues in building an optimized storage cloud for your needs in our future blogs.

Some Sampling Observations

In this blog, we present some of the results from running the Sampling phase on a selected configuration of systems. In future blogs, we will post the results for the Profiling and Optimization phases.

For our sampling phase, we assume the following potential servers are available to us for the proxy node: EC2 Large (Large), EC2 Extra Large (XL), EC2 High-CPU Extra Large (CPU XL) and EC2 Quadruple Extra Large (Quad). The candidates for the storage node are: EC2 Micro (Micro), EC2 Small (Small) and EC2 Medium (Medium).

Therefore, the total number of combinations of proxy and storage nodes is 4 × 3 = 12, and we need to find the “magic” value of N that produces the lowest $ per MB/s for running a 1:N Swift Cloud for each combination. We start the sampling for each combination at N=5 (note that a production Swift Cloud implementation requires at least 5 storage nodes), and increase N until the throughput of the 1:N Swift Cloud stops increasing. This happens when the proxy node is fully loaded and adding more storage nodes cannot improve the throughput anymore.

We use Amanda Enterprise as our application to back up a 10 GB data file to the 1:N Swift Cloud. Amanda Enterprise runs on an EC2 Quad instance to ensure that one Amanda Enterprise server can fully load the 1:N Swift Cloud in all cases. For this analysis we are assuming that the cloud builder is building cloud storage optimized for backup operations; users of the Swift Advisor should change the test workload based on the expected mix of application workloads once the cloud storage goes into production. We first look at the throughput for different values of N for each combination of EC2 instance sizes on the proxy and storage nodes.

(1) Proxy node runs on EC2 Large instance and the three curves are for the three different sizes for the storage node:


Observations with EC2 Large Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(2) Proxy node runs on EC2 XL instance:

Observations with EC2 XL Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(3) Proxy node runs on EC2 CPU XL instance:

Observations with EC2 CPU XL Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 30
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 10
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 5

(4) Proxy node runs on EC2 Quad instance:

Observations with EC2 Quad Instance based Proxy Node:

  1. Micro Instance based Storage nodes: Throughput stops increasing at # storage node = 60
  2. Small Instance based Storage nodes: Throughput stops increasing at # storage node = 20
  3. Medium Instance based Storage nodes: Throughput stops increasing at # storage node = 10

Looking at the above graphs, we can already draw some conclusions. For example, if the only storage nodes available to you were equivalent to an EC2 Micro instance and you wanted your storage cloud to scale beyond 30 storage nodes (per proxy node), you should pick at least an EC2 Quad instance equivalent for the proxy node. Let’s look at figures (1) - (4) from another view: fix the EC2 instance size of the storage node and vary the EC2 instance size of the proxy node.

(5) Storage node runs on EC2 Micro instance and the four curves are for the four different EC2 instance sizes on the proxy node:

Observations with EC2 Micro Instance based Storage Node:

  1. Large Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  2. XL Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  3. CPU XL Instance based Proxy nodes: Throughput stops increasing at # storage node = 30
  4. Quad Instance based Proxy nodes: Throughput stops increasing at # storage node = 60

From the above graphs, we can conclude that: (a) when the proxy node runs on the Quad instance, it has the capability, especially the network bandwidth, to accommodate more storage nodes and achieve higher throughput (MB/s) than when using other instance types for the proxy node; and (b) different EC2 instance sizes for the storage node load the same proxy node at different rates - for example, when the proxy node runs on the Quad instance, we need 60 Micro instances as storage nodes to fully load the proxy node, while with Small or Medium instance sizes we only need 10 to 20 storage nodes to fully load it.

Based on the above throughput results, we now look at the $ per throughput (MB/s) for different values of N for each combination of EC2 instance sizes on the proxy and storage nodes. Here, $ is defined as the EC2 usage cost of running the 1:N Swift Cloud for 30 days. In this blog we are only showing numbers with the proxy node set to an EC2 Quad instance; we will publish numbers for the other combinations in a more detailed report.

(6) Proxy node runs on EC2 Quad instance:

Observations with EC2 Quad Instance based Proxy Node:

  1. Micro Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 60
  2. Small Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 15
  3. Medium Instance based Storage nodes: The lowest $ per MB/s is achieved at # storage node = 5

Overall, the lowest $ per MB/s in the above figure is achieved by using Medium Instance based Storage nodes at # storage node = 5. This specific result provides the profiling phase with inputs of N = 5, 15 and 60 for the proxy/storage node combinations EC2 Quad/Medium, EC2 Quad/Small and EC2 Quad/Micro, respectively.

So, one can conclude that when using one Quad Instance based Proxy node, it may be better to use 5 Medium based Storage nodes to achieve the lowest $ per MB/s, rather than using more Micro Instance based storage nodes. The above graphs are a small subset of the overall performance numbers gathered during the Sampling phase.

The overall objective here is to give you a summary of our recommended approach to building an optimized Swift Cloud. As mentioned above, we will be publishing detailed results in another report, as well as more conclusions and best practices in future blogs in this series.

If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at swift@zmanda.com.

Zmanda Cloud Backup adds Tokyo as its latest cloud storage location

Wednesday, March 16th, 2011

We are adding support for Asia Pacific (Tokyo) Region in Zmanda Cloud Backup (ZCB). This is the fifth worldwide location supported by ZCB.

This support provides faster uploads for ZCB users in Japan. Throughput will be significantly higher because of fewer hops along the way and the very high bandwidth connections typically available in Japan. Overall processing will be faster because of lower latency (expected to be single-digit millisecond latency for most end users in Japan).

Cloud Backup to Three Continents Now Includes Japan


This support enables users to ensure that their data does not leave Japan, e.g. if required for compliance reasons.

In summary, users in Japan now have an effective and scalable solution to back up their Windows filesystems, Microsoft applications and databases (MySQL, SQL Server, Oracle) to a robust storage cloud.

As an introductory offer to our customers in Japan, we are waiving all transfer and storage charges to the Tokyo location until April 30th, 2011. You only pay the initial setup fee ($4.95) and the pro-rated monthly fee ($4.95 per month). After April 30th, our regular charges will apply, on par with all other supported regions.

There is more on the horizon for our Japanese customers. We will soon offer a fully localized Japanese version of ZCB (the current shipping version has already been tested with Japanese file and folder names). Watch this space for an announcement on that within a few weeks.

Zmanda Cloud Backup with Japanese Files/Folders

MySQL Backup Webinar Series: Scalable backup of live databases

Thursday, October 14th, 2010


Setting up a good backup and recovery strategy is crucial for any serious MySQL implementation. This strategy can vary from site to site based on various factors, including the size of the database, rate of change, security needs, retention and other compliance policies, etc. In general, MySQL DBAs are also expected to ensure the least possible impact on the usability and performance of the database at the time of backup - i.e. MySQL and its dependent applications should remain hot during backup.

Join MySQL backup experts from Zmanda for two webinars dedicated to hot backup of MySQL:

MySQL Backup Essentials: In this webinar, we will go over best practices for backing up live MySQL databases. We will also cover the Zmanda Recovery Manager (ZRM) for MySQL product in detail, including a walk-through of the configuration and management processes. We will discuss various features of ZRM, including full backups using snapshots, point-in-time recovery, monitoring and reporting.

Register for MySQL Backup Essentials Webinar on November 23rd at 10:00AM PT

MySQL Backup to Cloud: In this webinar, we will focus on backing up MySQL databases running on Windows to the cloud. Cloud storage provides an excellent alternative to backing up to removable media and shipping it to a remote secure site. We will provide a live demonstration of the Zmanda Cloud Backup (ZCB) product backing up MySQL to Amazon S3 storage. ZCB is an easy-to-use cloud backup solution which supports all Windows platforms. We will also discuss recovering a MySQL database in the cloud, creating a radically low-cost disaster recovery solution for MySQL.

Register for MySQL Backup to Cloud Webinar on November 30th at 10:00AM PT

Taking a Snapshot of a Thousand Dancing Dolphins

Monday, April 12th, 2010

An increasing number of large MySQL applications, e.g. social networking and SaaS back-ends, use a distributed MySQL architecture. MySQL data is distributed logically or heuristically on multiple, and in some cases thousands of, real or virtual servers. Backing up such large and dynamic environments presents its own complexities.

In this blog, we will use the cluster terminology - but we do not imply that the NDB Cluster storage engine is being used for MySQL. Most implementations use InnoDB for data and MyISAM for the dictionary. The typical architecture for such applications uses Database Sharding - i.e. shared-nothing partitioning of data across similarly configured nodes.

In most sharded environments, high availability is built in - i.e. the cluster can continue to answer queries and commit transactions for all users in the face of a node failure. This is typically accomplished either by database-level replication or by designing the application so that each row is mirrored on two or more nodes. If MySQL Replication is being used, then slaves can be used for load balancing as well - as long as it is acceptable that some clients may not get the latest data from the master node. E.g. a profile update by a user may not be visible to all her friends right away.

But built-in high availability does not do away with the need for setting up a backup and recovery process. Just like RAID does not replace backup, Sharding with redundancy does not replace backup either. The inherent complexity of large scale distributed database environments makes errors (human, system, environmental) more probable. Also, the implied availability of these environments increases the stress during the recovery process.

Here are the backup and recovery needs for such environments, some of the needs conflict with each other:

  • Application managers desire a point-in-time restore which is coordinated across multiple servers.
  • IT managers want as identical a configuration as possible across all nodes - so the process of replacing nodes becomes simple.
  • Depending on the application, the retention policy could be several years.
  • The overall application should be able to recover from multiple node failures, human errors or sabotage, and geographic problems (disaster, connectivity, etc.)

Zmanda Recovery Manager for MySQL is designed to meet these challenging needs. It uses various backup methods for backing up individual shards, and manages backup and recovery of the overall MySQL environment.

For point-in-time restore capability, ZRM uses MySQL binary logs. In very update-heavy environments, these binary logs can become very big. In such environments, if the organization’s Recovery Point Objective (RPO) requires the ability to recover to any point within the past few weeks, it may not be possible to store these binary logs on the MySQL node itself. In any case, in order to be able to recover in the face of complete node failures, these logs need to be stored outside of the node. So, a storage environment which is physically or logically shared among the nodes is typically a requirement for storing the backup images. This shared secondary storage does not violate the shared-nothing principles of sharding, because it is not in the path of the actual application; it is out-of-band storage accessed and managed by the backup software. Also note that ZRM can automatically remove the binary logs from the MySQL node once they have been copied over to their archive location.

Taking a snapshot of multiple MySQL databases

ZRM can use two techniques to allow for point-in-time recovery of distributed MySQL environments: Coordinated Backups or Coordinated Restores:

Coordinated backup provides a backup image of all nodes consistent to a specific event. E.g. all rows are backed up until a specific Global Sequence Number (GSN) - assuming a GSN exists in the application. Another option is to create a checkpoint event specifically for backup purposes. Of course, having a GSN or a checkpoint event may create periodic brief hiccups which may or may not be acceptable for the business needs. But this process creates the cleanest backup images for the whole application.

Coordinated restore allows each individual node to be backed up completely independently of the others. This eliminates the need for a backup checkpoint event. However, at the time of recovery more processing is required to make sure all nodes are recovered to a point which is logically acceptable to the higher-level application. ZRM can be scripted to identify this point in the backed-up binary logs for every shard, and the visual log analyzer feature of ZRM helps DBAs efficiently search for these points. Note that the shards may not all be recovered to their state as it existed at the exact same time; however, they should be recovered to a state which is acceptable for the overall application. Having the clocks of the nodes synchronized will also help the DBAs identify points of recovery across nodes, by making it easy to correlate events.

Being able to back up a smaller shard instead of the whole dataset provides some opportunities from both a technical and a logical perspective. Since the size of each shard may be relatively small, a particular backup method may be acceptable even though it would not have been acceptable if the whole dataset were in one monolithic database. If data is distributed among shards using some external criteria (e.g. users of each zip code go to a particular shard), then the backup image of each shard may be individually usable by an application. ZRM creates portable backup images - a key need for backing up shards - so backups from one node can be restored on another.

If recovery from a site wide disaster is also an objective, then suitable backup images need to be securely transported to the remote site. This can be done via the new Disaster Recovery Option now available for ZRM. This option replicates backup images, backup catalog and configuration data to the remote site - enabling full disaster recovery on an as-needed basis. Individual nodes need not be replicated, saving huge hassle and cost.

If your show is backed by a pod of dancing dolphins, a well implemented and documented backup and disaster recovery process is a good investment.

What’s New in Amanda Community: Postgres Backups

Thursday, March 25th, 2010

Second installment in a series of posts about recent work on Amanda.

The Application API allows Amanda to back up structured data — data that cannot be handled well by dump or tar. Most databases fall into this category, and with the 3.1 release, Amanda Community Edition ships with ampgsql, which supports backing up Postgres databases using the software’s point-in-time recovery mechanism.

The how-to for this application is on the Amanda wiki.

Operation

Postgres, like most “advanced” databases, uses a logging system to ensure consistency even in the face of (some) hardware failures. In essence, it writes every change that it makes to the database to the logfile before changing the database itself. This is similar to the operation of logging filesystems. The idea is that, in the face of a failure, you just replay the log to re-apply any potentially corrupted changes.

Postgres calls its log files WAL (write-ahead log) files. By default, they are 16MB. Postgres runs a shell command to “archive” each logfile when it is full.

So there are two things to back up: the data itself, which can be quite large, and the logfiles. A full backup works like this (a rough sketch of these steps follows the list):

  • Execute PG_START_BACKUP(ident) with some unique identifier.
  • Dump the data directory, excluding the active WAL logs. Note that the database is still in operation at this point, so the dumped data, taken alone, will be inconsistent.
  • Execute PG_STOP_BACKUP(). This archives a text file with the suffix .backup that indicates which WAL files are needed to make the dumped data consistent again.
  • Dump the required WAL files
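Here is a rough, hypothetical sketch of that sequence using psycopg2 and tar. The paths, backup label and connection string are assumptions for illustration; ampgsql’s real implementation handles many more details (property parsing, streaming, error recovery).

# Hypothetical sketch of a Postgres full backup: start the backup, tar the
# data directory (excluding live WAL), stop the backup, then tar the archived
# WAL files. All paths and the label are illustrative assumptions.
import subprocess
import psycopg2

DATA_DIR = "/var/lib/pgsql/data"            # assumed data directory
WAL_ARCHIVE = "/var/lib/pgsql/wal_archive"  # where archive_command stores WAL files

conn = psycopg2.connect("dbname=postgres")
conn.autocommit = True
cur = conn.cursor()

cur.execute("SELECT pg_start_backup(%s)", ("example-full",))
try:
    # The database stays live here, so this dump alone is inconsistent.
    subprocess.check_call(["tar", "-czf", "/tmp/data.tar.gz",
                           "--exclude=pg_xlog", "-C", DATA_DIR, "."])
finally:
    cur.execute("SELECT pg_stop_backup()")  # archives the .backup history file

# Dump the archived WAL files named by the .backup file.
subprocess.check_call(["tar", "-czf", "/tmp/wal.tar.gz", "-C", WAL_ARCHIVE, "."])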

An incremental backup, on the other hand, only requires backing up the already-archived WAL files.

A restore is still a manual operation — a DBA would usually want to perform a restore very carefully. The process is described on the wiki page linked above, but boils down to restoring the data directory and the necessary WAL files, then providing postgres with a shell command to “pull” the WAL files it wants. When postgres next starts up, it will automatically enter recovery mode and replay the WAL files as necessary.

Quiet Databases

On older Postgres versions, making a full backup of a quiet database is actually impossible. After PG_STOP_BACKUP() is invoked, the final WAL file required to reconstruct a consistent database is still “in progress” and thus not archived yet. Since the database is quiet, postgres does not get any closer to archiving that WAL file, and the backup hangs (or, in the case of ampgsql, times out).

Newer versions of Postgres do the obvious thing: PG_STOP_BACKUP() “forces” an early archiving of the current WAL file.

The best solution for older versions is to make sure transactions are being committed to the database all the time. If the database is truly silent during the dump (perhaps it is only accessed during working hours), then this may mean writing garbage rows to a throwaway table:

CREATE TABLE push_wal AS SELECT * FROM GENERATE_SERIES(1, 500000);
DROP TABLE push_wal;

Note that using CREATE TEMPORARY TABLE will not work, as temporary tables are not written to the WAL file.

As a brief encounter in #postgres taught me, another option is to upgrade to a more modern version of Postgres!

Log Incremental Backups

DBAs and backup admins generally want to avoid making frequent full backups, since they’re so large. The usual pattern is to make a full backup and then dump the archived log files on a nightly basis for a week or two. As the log files are dumped, they can be deleted from the database server, saving considerable space.

In Amanda terms, each of these dumps is an incremental, and is based on the previous night’s backup. That means that the dump after the full is level 1, the next is level 2, and so on. Amanda currently supports 99 levels, but this limit is fairly arbitrary and can be increased as necessary.

The problem in ampgsql, as implemented, is that it allows Amanda to schedule incremental levels however it likes. Amanda considers a level-n backup to be everything that has changed since the last level-n-1 backup. This works great for GNU tar, but not so well for Postgres. Consider the following schedule:

Monday level 0
Tuesday level 1
Wednesday level 2
Thursday level 1

The problem is that the dump on Thursday, as a level 1, needs to capture all changes since the previous level 0, on Monday. That means that it must contain all WAL files archived since Monday, so those WAL files must remain on the database server until Thursday.

The fix to this is to only perform level 0 or level n+1 dumps, where n is the level of the last dump performed. In the example above, this means either a level 0 or level 3 dump on Thursday. A level 0 is a full backup and requires no history. A level 3 would only contain WAL files archived since the level 2 dump on Wednesday, so any WAL files before that could be deleted from the database server.
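As a tiny illustration of that rule, a scheduler restricted this way only ever has two choices after each run; the example mirrors the Monday-Thursday schedule above.

# After a dump at level n, the only levels that keep the WAL chain short are
# 0 (a fresh full) or n + 1; anything else forces older WAL files to be kept.
def allowed_next_levels(last_level):
    return {0, last_level + 1}

print(allowed_next_levels(2))   # after Wednesday's level 2: {0, 3}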

Summary

A powerful open source database system and the open source ampgsql plugin combine to produce a powerful, protected storage system for your mission-critical data. We will continue to develop additional Application API plugins, and encourage you and other members of the community to do the same!

What’s New in Amanda: Automated Tests

Friday, March 12th, 2010

This is the first in what will be a series of posts about recent work on Amanda. Amanda has a reputation as old and crusty — not so! Hopefully this series will help to illustrate some of the new features we’ve completed, and what’s coming up. I’m cross-posting these on my own blog, too.

Among open-source applications, Amanda is known for being stable and highly reliable. To ensure that Amanda lives up to this reputation, we’ve constructed an automated testing framework (using Buildbot) that runs on every commit. I’ll give some of the technical details after the jump, but I think the numbers speak for themselves. The latest release of Amanda (which will soon be 3.1.0) has 2936 automated tests!

These tests range from highly-focused unit tests, for example to ensure that all of Amanda’s spellings of “true” are parsed correctly, all the way up to full integration: runs of amdump and the recovery applications.

The tests are implemented with Perl’s Test::More and Test::Harness. The result for the current trunk looks like this:

=setupcache.....................ok
Amanda_Archive..................ok
Amanda_Changer..................ok
Amanda_Changer_compat...........ok
Amanda_Changer_disk.............ok
Amanda_Changer_multi............ok
Amanda_Changer_ndmp.............ok
Amanda_Changer_null.............ok
Amanda_Changer_rait.............ok
Amanda_Changer_robot............ok
Amanda_Changer_single...........ok
Amanda_ClientService............ok
Amanda_Cmdline..................ok
Amanda_Config...................ok
Amanda_Curinfo..................ok
Amanda_DB_Catalog...............ok
Amanda_Debug....................ok
Amanda_Device...................ok
        211/428 skipped: various reasons
Amanda_Disklist.................ok
Amanda_Feature..................ok
Amanda_Header...................ok
Amanda_Holding..................ok
Amanda_IPC_Binary...............ok
Amanda_IPC_LineProtocol.........ok
Amanda_Logfile..................ok
Amanda_MainLoop.................ok
Amanda_NDMP.....................ok
Amanda_Process..................ok
Amanda_Recovery_Clerk...........ok
Amanda_Recovery_Planner.........ok
Amanda_Recovery_Scan............ok
Amanda_Report...................ok
Amanda_Tapelist.................ok
Amanda_Taper_Scan...............ok
Amanda_Taper_Scan_traditional...ok
Amanda_Taper_Scribe.............ok
Amanda_Util.....................ok
Amanda_Xfer.....................ok
amadmin.........................ok
amarchiver......................ok
amcheck.........................ok
amcheck-device..................ok
amcheckdump.....................ok
amdevcheck......................ok
amdump..........................ok
amfetchdump.....................ok
amgetconf.......................ok
amgtar..........................ok
amidxtaped......................ok
amlabel.........................ok
ampgsql.........................ok
        40/40 skipped: various reasons
amraw...........................ok
amreport........................ok
amrestore.......................ok
amrmtape........................ok
amservice.......................ok
amstatus........................ok
amtape..........................ok
amtapetype......................ok
bigint..........................ok
mock_mtx........................ok
noop............................ok
pp-scripts......................ok
taper...........................ok
All tests successful, 251 subtests skipped.
Files=64, Tests=2936, 429 wallclock secs (155.44 cusr + 31.48 csys = 186.92 CPU)

The skips are due to tests that require external resources - tape drives, database servers, etc. The first part of the list contains tests for almost all perl packages in the Amanda namespace. These are generally unit tests of the new Perl code, although some tests integrate several units due to limitations of the interfaces. The second half of the list is tests of Amanda command-line tools. These are integration tests, and ensure that all of the documented command-line options are present and working, and that the tool’s behavior is correct. The integration tests are necessarily incomplete, as it’s simply not possible to test every permutation of this highly flexible package.

The =setupcache test at the top is interesting: because most of the Amanda applications need some dumps to work against, we “cache” a few completed amdump runs using tar, and re-load them as needed during the subsequent tests. This speeds things up quite a bit, and also removes some variability from the tests (there are a lot of ways an amdump can go wrong!).

The entire test suite is run at least 54 times for every commit by Buildbot. We test on 42 different architectures - about a dozen linux distros, in both 32- and 64-bit varieties, plus Solaris 8 and 10, and Darwin-8.10.1 on both x86 and PowerPC. The remaining tests are for special configurations — server-only, client-only, special runs on a system with several tape drives, and so on.

Fast Backups of MySQL Running on Amazon EC2

Saturday, January 23rd, 2010

If you are running your MySQL databases on the Amazon EC2 compute cloud, Zmanda Recovery Manager (ZRM) for MySQL can perform fast full backups of these databases by using Elastic Block Store (EBS) snapshots. ZRM takes only a momentary read lock on the MySQL database during the creation of the snapshot, in order to ensure consistency of the backed-up database archive. MySQL backups using Amazon EBS snapshots are differential backups, meaning that only the blocks that have changed since your last full backup (via EBS snapshot) will be saved. For example, if you have a database with 100 GB of data, but only 5 GB has changed since your last snapshot, only the additional 5 GB of snapshot data will be stored to Amazon S3 during the current full backup run.

EC2 to S3 mysql backup diagram
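For illustration, here is a hypothetical sketch of the core snapshot step: take a momentary global read lock on MySQL, initiate the EBS snapshot, then release the lock. The volume ID, credentials and libraries (pymysql, boto3) are assumptions for the example; ZRM’s actual integration also handles log flushing, cataloging and retention.

# Hypothetical sketch: hold a brief read lock while an EBS snapshot of the
# volume holding the MySQL data directory is initiated, then release it.
import boto3
import pymysql

VOLUME_ID = "vol-0123456789abcdef0"   # assumed EBS volume with the MySQL datadir

db = pymysql.connect(host="localhost", user="backup", password="secret")
cur = db.cursor()

cur.execute("FLUSH TABLES WITH READ LOCK")   # momentary, consistency-only lock
try:
    ec2 = boto3.client("ec2")
    snap = ec2.create_snapshot(VolumeId=VOLUME_ID,
                               Description="MySQL full backup via EBS snapshot")
finally:
    cur.execute("UNLOCK TABLES")             # released as soon as the snapshot starts

print("snapshot started:", snap["SnapshotId"])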

ZRM automatically deletes EBS snapshots (containing full backups of MySQL) according to the configured retention policy. Just like other snapshot based full backups, ZRM intelligently correlates EBS Snapshots with incremental backups using MySQL logs, enabling you to recover your MySQL instances running on EC2 to any point in time.
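To sketch the point-in-time side of this, the snippet below replays archived binary logs up to a target time after the snapshot-based full backup has been restored. The binlog paths and target timestamp are assumptions; ZRM drives the equivalent steps for you.

# Hypothetical sketch of point-in-time recovery: after restoring the EBS
# snapshot, replay archived binary logs up to the desired moment.
import subprocess

BINLOGS = ["/archive/mysql-bin.000042", "/archive/mysql-bin.000043"]  # assumed
TARGET = "2010-01-20 14:30:00"                                        # assumed

replay = subprocess.Popen(
    ["mysqlbinlog", "--stop-datetime=" + TARGET] + BINLOGS,
    stdout=subprocess.PIPE)
subprocess.check_call(["mysql", "-u", "root"], stdin=replay.stdout)
replay.stdout.close()
replay.wait()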

Backups made using EBS snapshots can be recovered on the original EC2 instance or on a new EC2 instance. This also provides a quick and convenient mechanism to instantiate new MySQL database servers based on the database state from a desired point-in-time.

ZRM can run on the same EC2 instance as the MySQL database. On the other hand, if you have multiple EC2 instances with MySQL databases, you can run ZRM on one centralized EC2 instance dedicated for backup purposes. In this case, backup configuration and management for all MySQL databases is performed via Zmanda Management Console from this centralized backup server.

We have created an Amazon Machine Image (AMI) with ZRM pre-configured. This makes implementing a MySQL backup solution in the cloud even simpler. We have used the “EC2 Small Instance”, which is powerful enough to back up most MySQL workloads in the cloud; this also makes it a very cost-effective option. This AMI is available to all ZRM customers as part of the ZRM Enterprise subscription. You will need to create your own Amazon EC2 account and pay the standard per-hour price to Amazon to run an instance based on this AMI. Note that you can configure your backup server instance to run only during the backup window. So, if you are backing up your databases once a week, and your backups take less than an hour, then you can have this instance up only during that hour. EC2 pricing is per instance-hour consumed, from the time an instance is launched until it is terminated; each partial instance-hour consumed is billed as a full hour. In addition to the EC2 compute capacity, you will pay standard Amazon S3 storage charges (to store the EBS snapshots).

Join us on January 28th for a webinar on MySQL Backups (hosted by Sun/MySQL). Along with an introduction to Zmanda Recovery Manager, we will also discuss backing up MySQL applications on the cloud, and demonstrate the new ZRM AMI.

Red Hat Enterprise Linux and Amanda Enterprise: IT Manager’s Backup Solution

Thursday, January 14th, 2010

A backup server is a very important component of any IT infrastructure. You need to pick the right components to implement a scalable, robust and secure backup server, and the choice of operating system has crucial implications. Red Hat Enterprise Linux (RHEL) provides many of the features needed from an ideal OS for a backup server. Some of these include:

Virtualization: RHEL includes a modern hypervisor (Red Hat Enterprise Virtualization Hypervisor) based on Kernel-based Virtual Machine (KVM) technology. The Amanda backup server can be run as a virtual machine on this hypervisor, and this virtual backup server can be brought up as needed. This provides optimal resource management, e.g. you can bring up the backup server just for the backup window or for restores. A virtualized backup server also makes it much more flexible to change resource levels depending on business needs, e.g. if more oomph is needed from the backup server prior to a data center move.

High I/O Throughput: A backup server drives huge I/O loads, typically characterized by large sequential writes. RHEL, both as a real and a virtual system, provides the high I/O throughput needed for a backup server workload. RHEL 5 allows switching I/O schedulers on the fly, so a backup administrator can fine-tune I/O activity to match the higher-level function (e.g. write-heavy backups vs. read-heavy restores).

Security: Securing the backup server is critical in any overall IT security plan. In a targeted attack, a backup server is a juicy target, because data that an organization deems important can all be had from one place. Security-Enhanced Linux (SELinux) in RHEL implements a variety of security policies, including U.S. Department of Defense style mandatory access controls, through the use of Linux Security Modules (LSM) in the Linux kernel. Amanda supports the RHEL SELinux configuration, allowing users to run the backup server in a secure environment.

Scalable Storage: The storage technologies built into RHEL provide the scalability needed for backup storage. The Ext3 filesystem supports file systems of up to 16TB. Logical Volume Manager (LVM) allows backup storage to sit on a pool of devices which can be extended when needed. System administrators can also leverage Global File System (GFS) to give the backup server direct access to the data to be backed up, bypassing the production network.

Compatibility: RHEL is found on the compatibility matrix of any modern secondary storage device - whether it is a tape drive, a tape library or a NAS device. RHEL also supports a wide variety of SAN architectures, including iSCSI and Fibre Channel. This, along with Amanda’s use of native drivers to access secondary media, gives IT managers the widest choice in the market for devices to store backup archives.

Manageability: An easy update mechanism, e.g. using yum with Red Hat Network, makes it easier for the administrator to keep the backup server updated with the latest fixes (including security patches). Amanda depends on some of the system libraries and tools to perform backup and recovery operations. A system administrator can pare down a RHEL environment to the bare-minimum set of packages needed for Amanda, and then use RHN to keep those packages up to date.

Long Retention Lifecycle: Many organizations need to retain their backup archives for several years for business or compliance reasons. Each version of RHEL comes with seven years of support. This, combined with the open formats used by Amanda Enterprise, makes it practical for IT managers to implement truly long-term retention policies, with confidence that they will be able to recover their data several years from now.

In summary, if you are in the process of making a choice for your backup server, RHEL should certainly be on the short-list for operating systems, and (yes, we are biased) Amanda on the short-list for backup software. We will discuss this combination in detail in a webinar on January 21st. Red Hat is warming up this webinar by offering a $10 Starbucks card to every attendee. Join us!

Our price increased today. Now we are one-tenth the cost of Symantec.

Friday, July 10th, 2009

Today we increased the price of the Amanda Enterprise Backup Server. The new price for our Standard subscription level is $500 per year. Our online store is a place to quickly check prices for all our products on a single page. This price increase was done in conjunction with the release of Amanda Enterprise 3.0, which represents several man-years of R&D on the backup server, including advanced media management such as D2D2T. Our subscription provides access to the software and enterprise-class support.

Amanda Enterprise is used by businesses of all shapes and sizes. But a typical scenario is the following:

  • Backup Server on Linux
  • One tape library with one or two tape drives. Or VTL on a NAS device.
  • A mix of Linux & Windows servers and desktops to be protected
  • A mix of applications (e.g. Exchange) and databases (e.g. MySQL or Postgres) to be protected
  • Encryption on the server to protect data at rest

In the above scenario, customers often consider NetBackup from Symantec as a potential product. Let’s compare the new Amanda Enterprise pricing with NetBackup pricing.

First of all, finding prices for NetBackup for a particular configuration is a harrowing experience. There is no place on the Symantec website that provides prices for all NetBackup options and features in one consolidated location. Rumor has it that the internal licensing guide for NetBackup is more than 40 pages long!

The least expensive way to buy NetBackup is one of the “Starter Packs”. Their 5-client starter pack with 1 NetBackup server and 1 tape drive license costs $3995. This price does not include any support; maintenance is priced separately at $720 per year (a similar support level to our Standard subscription). This pack restricts the NetBackup server to “Tier 1 and Tier 2” systems. Tiering is one of the several confusing aspects of Symantec pricing: if your backup server has four or more CPUs, you are out of luck on the Starter Packs. A la carte pricing for the NetBackup server and clients is significantly more expensive - a standard NetBackup server for a Tier 3 Linux server lists at $3200 plus a maintenance contract. Amanda Enterprise Servers and Clients have no tiered pricing; you can choose as hefty a server as your requirements dictate and pay us the same standard price.

Encryption on the backup server is a desired option for IT managers. It protects critical data at rest (or in transit - e.g. when a backup tape is being transferred to a remote location) from unauthorized access. By encrypting on the backup server, you relieve the CPUs on production clients from the burden of encryption. With NetBackup you need to buy the Media Server Encryption Option, whose list price is $10K+ (and that does not include the maintenance cost). Encryption is a built-in feature of the Amanda Enterprise Server.

Per-Library Charge: NetBackup’s pricing options for tape drives and tape libraries are at best confusing. The starter pack above gives access to one tape drive. If you want to use another tape drive in a tape library, you need to buy a Tape Library Option for a $3K list price (and again, that does not include maintenance). You can drive as many tape drives and libraries as you want from a single Amanda Enterprise server.

Want to use a VTL? You need to buy the NetBackup Standard Disk or Enterprise Disk option. The Standard Disk option is $995 for up to 1TB of protected data - Symantec doesn’t even pass the savings of data compression on to you. (On top of that, you guessed it, that does not include maintenance.) Amanda Enterprise has a built-in capability to transform disk into virtual tapes, and you can also use a VTL of your choice at no additional cost.

This recent blog gives more color on NetBackup Licensing (Note: We don’t have any affiliation with this blog or its author).

In summary, a $500 Standard subscription for Amanda Enterprise gives you a backup server which runs on any tier of server, including a Linux or a Solaris server, which can back up to as many tape drives and libraries as you want - including VTLs - with server-side encryption support, unlimited disk-based backup, and vaulting (D2D2T) support. You will not get anything close to that for $5K from Symantec.

How are we able to do this? Our open source development, marketing and distribution model allows us to innovate aggressively with a huge community of developers and users providing extensive feedback on a regular basis. Proprietary software companies spend 60% to 80% of their budget on sales and marketing (Source: Time for a New Software Model), and these costs are passed on to the customers. We have a different equation. Our freely downloadable community editions represent the bulk of our marketing budget. This marketing budget is spent in the form of R&D to innovate and add features and usability to both our community and enterprise editions. So, instead of paying for sales and marketing overhead of proprietary backup software companies, you only pay for the R&D which provides direct benefit to you.

Symantec’s excessive use of tiering, options, different licensing models, components, and packs can be mind-numbing for a hapless IT manager just looking to protect their systems and applications. This Doonesbury strip captures Symantec’s nickel-and-diming pricing strategy for their backup software.

Comprehensive Backup and Recovery for ZFS filesystems

Tuesday, February 3rd, 2009


The Amanda 2.6.1 release provides the Application API and the flexibility to configure multiple backup methods for a filesystem or an application. One example of this flexibility is the backup of ZFS filesystems on OpenSolaris. Amanda performs backups using platform tools to archive a filesystem, and in the case of ZFS multiple such tools are available to extract data for backup purposes.

Amanda 2.6.1 supports four different methods to back up ZFS filesystems/directories. Each method has its pros and cons; they are summarized in the table below.

ZFS backup methods

In addition to providing four different backup methods, a ZFS snapshot pre-backup plugin is available (regardless of which method the administrator chooses) to back up open files; it provides a crash-consistent backup of the filesystem/directory. Amanda allows the OpenSolaris administrator to configure the appropriate backup method for each DLE (disk list entry), which is the unit of Amanda backup configuration.

This work illustrates how the availability of open APIs, such as Amanda’s Application API, fosters development and testing contributions from the Amanda community.