Taking a Snapshot of a Thousand Dancing Dolphins

An increasing number of large MySQL applications, e.g. social networking and SaaS back-ends, use a distributed MySQL architecture. MySQL data is distributed logically or heuristically across multiple - in some cases thousands of - real or virtual servers. Backing up such large and dynamic environments presents its own complexities.

In this blog, we will use cluster terminology, but we do not imply that the NDB Cluster storage engine is being used for MySQL. Most such implementations use InnoDB for data and MyISAM for dictionary tables. The typical architecture for these applications uses database sharding - i.e. shared-nothing partitioning of data across similarly configured nodes.

In most sharded environments, high availability is built in - i.e. the cluster can continue to answer queries and commit transactions for all users in the face of a node failure. This is typically accomplished either by database-level replication or by designing the application so that each row is mirrored on two or more nodes. If MySQL Replication is being used, then slaves can also be used for load balancing - as long as it is acceptable that some clients may not see the latest data from the master node right away. E.g. a profile update by a user may not be visible to all of her friends immediately.

But built-in high availability does not do away with the need for a backup and recovery process. Just as RAID does not replace backup, sharding with redundancy does not replace backup either. The inherent complexity of large-scale distributed database environments makes errors (human, system, environmental) more probable. Also, the implied availability of these environments increases the stress during the recovery process.

Here are the backup and recovery needs for such environments; note that some of them conflict with each other:

  • Application managers desire a point-in-time restore that is coordinated across multiple servers.
  • IT managers want node configurations to be as identical as possible, so that the process of replacing a node stays simple.
  • Depending on the application, the retention policy could be several years.
  • The overall application should be able to recover from multiple node failures, human error or sabotage, and geographic problems (disasters, loss of connectivity, etc.).

Zmanda Recovery Manager (ZRM) for MySQL is designed to meet these challenging needs. It uses various backup methods for backing up individual shards, and manages backup and recovery of the overall MySQL environment.

For point-in-time restore capability, ZRM uses MySQL binary logs. In environments with very high update rates, these binary logs can grow very large. In such environments, if the organization’s Recovery Point Objective (RPO) requires the ability to recover to any point within the past few weeks, it may not be possible to store these binary logs on the MySQL node itself. In any case, in order to recover in the face of complete node failures, these logs need to be stored outside the node. So, a storage environment that is physically or logically shared among the nodes is typically a requirement for storing the backup images. This shared secondary storage does not violate the shared-nothing principles of sharding, because it is not in the path of the actual application. It is out-of-band storage accessed and managed by the backup software. Also note that ZRM can automatically remove the binary logs from the MySQL node once they have been copied over to their archive location.
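To make the point-in-time replay concrete, here is a minimal sketch - not ZRM's internal implementation - of restoring a single shard to a chosen moment by replaying its archived binary logs with the standard mysqlbinlog and mysql client tools. The archive directory, credentials, and timestamp are placeholders for illustration.

```python
import glob
import subprocess

# Hypothetical locations - adjust for your environment.
ARCHIVE_DIR = "/backup/shard42/binlogs"   # binary logs copied off the MySQL node
STOP_AT = "2009-06-15 13:00:00"           # desired point in time (server local time)

def replay_binlogs_until(stop_datetime):
    """Replay archived binary logs up to stop_datetime against the restored server.

    Assumes the most recent full backup has already been restored and the
    MySQL server is running and reachable by the local mysql client.
    """
    binlogs = sorted(glob.glob(ARCHIVE_DIR + "/mysql-bin.*"))
    # mysqlbinlog converts the logs to SQL; --stop-datetime cuts replay off
    # at the chosen point in time. The output is piped into the mysql client.
    decode = subprocess.Popen(
        ["mysqlbinlog", "--stop-datetime=" + stop_datetime] + binlogs,
        stdout=subprocess.PIPE,
    )
    apply_sql = subprocess.Popen(["mysql", "-u", "root", "-p"], stdin=decode.stdout)
    decode.stdout.close()
    apply_sql.communicate()

if __name__ == "__main__":
    replay_binlogs_until(STOP_AT)
```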

Taking a Snapshot of Multiple MySQL Databases

ZRM can use two techniques to allow for point-in-time recovery of distributed MySQL environments: coordinated backups and coordinated restores.

Coordinated backup provides a backup image of all nodes that is consistent with respect to a specific event. E.g. all rows up to a specific Global Sequence Number (GSN) are backed up - assuming a GSN exists in the application. Another option is to create a checkpoint event specifically for backup purposes. Of course, having a GSN or a checkpoint event may create brief periodic hiccups, which may or may not be acceptable for the business. But this process creates the cleanest backup images for the whole application.
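To make the checkpoint-event idea concrete, here is a minimal sketch - not part of ZRM itself - that writes the same checkpoint marker row to every shard at roughly the same moment. Each shard can then be backed up independently, and at restore time its binary log is replayed only up to that marker. The shard list, credentials, and checkpoint table are assumptions for illustration; the sketch uses the mysql-connector-python driver.

```python
import time
import mysql.connector  # assumes the mysql-connector-python package is installed

# Hypothetical shard hosts and backup credentials.
SHARDS = ["shard01.example.com", "shard02.example.com", "shard03.example.com"]
USER, PASSWORD = "backup", "secret"

def write_checkpoint(checkpoint_id):
    """Insert the same checkpoint marker into every shard.

    The marker row acts as the backup's consistency event: during a restore,
    each shard's binary log is replayed only up to this INSERT.
    """
    for host in SHARDS:
        conn = mysql.connector.connect(host=host, user=USER,
                                       password=PASSWORD, database="admin")
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO backup_checkpoint (checkpoint_id, created_at) "
            "VALUES (%s, NOW())",
            (checkpoint_id,),
        )
        conn.commit()
        cur.close()
        conn.close()

if __name__ == "__main__":
    # A timestamp-based id gives every backup run a unique, searchable marker.
    write_checkpoint("ckpt-" + time.strftime("%Y%m%d-%H%M%S"))
```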

Coordinated restore allows each individual node to be backed up completely independently of the others. This eliminates the need for a backup checkpoint event. However, at recovery time, more processing is required to make sure all nodes are recovered to a point that is logically acceptable to the higher-level application. ZRM can be scripted to identify this point in the backed-up binary logs for every shard. Also, the visual log analyzer feature of ZRM helps DBAs search efficiently for these points. Note that the shards may not all be recovered to their state as it existed at the exact same time; however, they should be recovered to a state that is acceptable for the overall application. Keeping the clocks of the nodes synchronized also helps DBAs identify points of recovery across nodes, by making it easier to correlate events.
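As an example of the kind of script ZRM can be set up to call, the following sketch scans one shard's archived binary logs for the checkpoint marker used in the previous example and reports the file and position at which the marker's event begins; a DBA can use that position when deciding where to stop replay for the shard. The paths and marker value are, again, assumptions for illustration.

```python
import glob
import subprocess

ARCHIVE_DIR = "/backup/shard42/binlogs"   # archived binary logs for one shard
MARKER = "ckpt-20090615-130000"            # checkpoint id written at backup time

def find_marker_position(marker):
    """Return (binlog_file, position) of the event that contains the marker.

    `mysqlbinlog -v` decodes row-based events into readable pseudo-SQL; the
    "# at NNN" lines give the starting position of each event, so the last
    position seen before the marker text is the start of the marker's event.
    """
    for logfile in sorted(glob.glob(ARCHIVE_DIR + "/mysql-bin.*")):
        output = subprocess.run(["mysqlbinlog", "-v", logfile],
                                capture_output=True, text=True).stdout
        position = None
        for line in output.splitlines():
            if line.startswith("# at "):
                position = int(line[5:])
            elif marker in line and position is not None:
                return logfile, position
    return None

if __name__ == "__main__":
    print(find_marker_position(MARKER))
```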

Being able to back up a smaller shard instead of the whole dataset provides opportunities from both a technical and a logical perspective. Since the size of each shard may be relatively small, a particular backup method may be acceptable even though it would not have been if the whole dataset were in one monolithic database. If data is distributed among shards using some external criterion (e.g. users in each zip code go to a particular shard), then the backup image of each shard may be individually usable by an application. ZRM creates portable backup images - a key need for backing up shards - so backups from one node can be restored on another.

If recovery from a site-wide disaster is also an objective, then suitable backup images need to be securely transported to a remote site. This can be done via the new Disaster Recovery Option now available for ZRM. This option replicates backup images, the backup catalog, and configuration data to the remote site - enabling full disaster recovery on an as-needed basis. Individual nodes need not be replicated, saving considerable hassle and cost.

If your show is backed by a pod of dancing dolphins, a well-implemented and well-documented backup and disaster recovery process is a good investment.
