
Backup Zen

Monday, April 20th, 2009

One of the questions that comes up often in the backup world is “Why can’t I just write a script to do this myself?!” Well, as a do-it-yourselfer myself, my answer is “Absolutely. A customized backup script can be written, and in fact, the first version of it won’t be that complicated to develop either.” However, a home-grown backup script can quickly become tedious to enhance and maintain.

Let’s look at the progression of events that follows when a system administrator or DBA decides to develop their own backup scripts. Take the example of Joe, a MySQL Database Administrator at an e-commerce company, SuperWidgets:

1. January: SuperWidgets has been around for one year, and sales of their super widgets have been increasing. The CEO of SuperWidgets, Mary, has faced serious consequences of losing customer data before, and she tells Joe to make sure their MySQL database, which powers their web store, is being backed up.

2. Joe starts looking at ways to back up the web store database (running on Red Hat Linux), and discovers several tools that came with his MySQL installation: mysqldump, mysqlhotcopy and MySQL replication. After a couple of days of research, he decides to use the mysqlhotcopy utility to make a quick raw backup of the database.

3. Joe digs into the syntax for mysqlhotcopy, and within a day has a script running as a cron job that backs up the web store database at midnight every day.

4. February: SuperWidgets uses Windows as the development platform for its web applications. One morning, Tom, the database developer, finds that filesystem-level corruption has taken out his database. He had been using mirroring, but because the corruption was logical rather than physical, it was faithfully copied to the mirror, and both copies are damaged beyond repair. This causes two days of downtime for the development team. Mary instructs Joe to make sure that all three databases on the development platform are also backed up.

5. Joe discovers that mysqlhotcopy doesn’t work on Windows. So, after further research, he writes another custom script that uses mysqldump to back up the development databases nightly via Windows Task Scheduler.

6. Since MySQL database backup has become a hot-button issue for Mary, Joe starts monitoring the status of each of his MySQL backup scripts. He periodically logs onto each of the five systems where his scripts back up MySQL instances and makes sure that archives were successfully created the previous night.

7. March: SuperWidgets decides to use Alfresco as the Content Management System (CMS) for an internal project, with MySQL as the underlying database. Tom is in charge of the Alfresco implementation. The data stored in this CMS is sensitive and important. Mary gives Tom the responsibility of backing up and, when needed, restoring the CMS data.

8. Joe ports one of his backup scripts to the system running the CMS, and trains Tom on the nuances of feeding and caring for it.

9. April: Tom upgrades his MySQL database and discovers that one of the options for mysqldump has changed, causing the backup scripts to fail. He fixes the script to work with the new mysqldump syntax.

10. May: The business of SuperWidgets has gone through the roof. But one afternoon the web store is brought to its knees by an application error that leaves the database with inconsistent data. Fortunately, Joe’s script worked, and he is able to recover the database using the archive from the previous night. Unfortunately, this means that every transaction made since then, from hundreds of customers, is lost. This forces Mary and the rest of the management team to take several steps to protect the reputation of the now well-known SuperWidgets.

11. As a result, Joe is instructed to ensure that MySQL can be recovered to any point in time, rather than just to the previous night’s state. He is also instructed to send management a high-level summary of MySQL backups every week, and to reduce the amount of time the MySQL database is locked while backups run. To add to Joe’s woes, Tom has decided to leave the organization, so Joe must now take over the backup of the CMS. Joe discovers that Tom modified the original backup scripts for the CMS without providing any documentation.
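Step 6 is a good example of the glue that accumulates around home-grown scripts. Below is a minimal Python sketch of the kind of freshness check Joe might write instead of logging onto each system by hand. It assumes, hypothetically, that each host’s archives land in one directory the checker can read (for example over a network mount); the directory paths, host names and the 26-hour threshold are illustrative assumptions, not part of any product.

```python
import time
from pathlib import Path
from typing import Dict, Optional

# Assumption: backups run nightly, so allow 24 hours plus some slack.
MAX_AGE_HOURS = 26.0


def newest_archive_age_hours(backup_dir: Path) -> Optional[float]:
    """Age in hours of the newest file in backup_dir, or None if there is none."""
    if not backup_dir.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in backup_dir.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (time.time() - max(mtimes)) / 3600.0


def check_backups(dirs: Dict[str, Path], max_age_hours: float = MAX_AGE_HOURS) -> Dict[str, str]:
    """Map each (hypothetical) host name to "OK", "STALE (...)" or "MISSING"."""
    report: Dict[str, str] = {}
    for host, backup_dir in dirs.items():
        age = newest_archive_age_hours(backup_dir)
        if age is None:
            report[host] = "MISSING"  # directory absent or holds no files
        elif age > max_age_hours:
            report[host] = f"STALE ({age:.1f}h old)"
        else:
            report[host] = "OK"
    return report


if __name__ == "__main__":
    # Hypothetical archive locations; adjust to wherever each script writes.
    hosts = {
        "webstore": Path("/backups/webstore"),
        "cms": Path("/backups/cms"),
    }
    for host, status in check_backups(hosts).items():
        print(f"{host}: {status}")
```

Note that this only confirms that a recent file exists. It says nothing about whether the archive is complete or restorable, and it does nothing for point-in-time recovery, cataloging or weekly reporting, each of which would be yet another script to maintain.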

Joe’s situation is not atypical, and it shows how a backup solution involves more than wrapping a simple script around a utility that copies data. For workloads of even moderate importance, any organization will find cataloging of backup archives, monitoring and reporting to be vital. A common user interface across various backup methods, one that is easy for new personnel to learn, has huge long-term value as well. This is precisely where our backup solutions come in. Specifically for MySQL, our Zmanda Recovery Manager offers a great solution to Joe’s woes by providing:

– An intelligent MySQL backup solution which figures out the best way to back up a particular MySQL database
– A common user interface across all platforms, whether they are Linux, Solaris or Windows
– A common user interface across all backup methods, whether they are raw backups, logical backups or snapshot based backups
– Integration between backup methods (e.g. snapshots) and MySQL logs to be able to recover MySQL to any point in time
– Role based access control, enabling management and DBAs to have control over who has access to what data
– A centralized backup solution, enabling a quick and automated health check of the entire backup infrastructure
– A customizable Reporting module, enabling automated reporting at the desired level of detail

One of Zmanda’s customers, Zen Innovations, initially backed up their data using scripts and manual backup procedures, but soon found that this approach did not scale, and opted for Zmanda’s backup solutions. According to Sergio Laberer, Managing Director of Zen Innovations: “Zmanda’s ability to manage multiple platforms over a web based GUI was exactly what we were looking for. Our initial manual processes, scripts and cron jobs quickly started to get complicated as we grew our infrastructure. We needed to do backups regularly and be in a position to recover quickly without too much manual intervention. Our initial approach was neither scalable nor suitable to work efficiently.”