Subversion incremental backup
By Pierrick Le Gall on Tuesday, April 17 2007, 11:48

As your Subversion repository grows, you need a way to back up only what's new rather than the whole repository. Thanks to Subversion revisions, we can easily identify what has changed since the last backup. I've used this principle to write a Perl script that makes incremental backups.
Note: tos means Talend Open Studio. It is the Subversion repository I need to back up.
[perl]
#!/usr/bin/perl
use strict;
use warnings;
my $tos_repos = '/home/fbonan/svn/tos';
my $savedir = '/home/fbonan/save/tos';
my $last_saved_file = $savedir.'/last_saved.txt';
# read the last saved revision from the file
open(LAST_SAVED, '<', $last_saved_file)
    or die "Cannot read $last_saved_file: $!";
my $last_saved = <LAST_SAVED>;
chomp $last_saved;
close(LAST_SAVED);
# get informed of the current last revision (head)
my $head = `svnlook youngest $tos_repos`;
chomp $head;
# of course, if the head is not younger than the last saved revision
# there is nothing new to back up
if ($last_saved >= $head) {
    exit();
}
# if the last saved is 1000 and the head is 1023, we want the backup
# from 1001 to 1023
my $from = $last_saved + 1;
my $to = $head;
# the backup filename looks like tos-01001_01023.svndump
my $dumpfile = sprintf(
    $savedir.'/tos-%05u_%05u.svndump',
    $from,
    $to
);
my $command = sprintf(
    'svnadmin dump -q -r%u:%u --incremental %s > %s',
    $from,
    $to,
    $tos_repos,
    $dumpfile
);
system($command);
# here is the most interesting part of this script, how to make sure
# the dump is a success? I've chosen to read its content to find the
# information that the head revision was saved. If yes, I consider the dump
# backup as a success. It sounds quite reliable.
if (grep /^Revision-number: $to$/, `grep --text ^Revision-number: $dumpfile`) {
    open(LAST_SAVED, '>', $last_saved_file);
    print LAST_SAVED $to, "\n";
    close(LAST_SAVED);
    # here we compress the dump file
    system('gzip '.$dumpfile);
    # let's add the md5sum of the file to the MD5SUMS file, which stores
    # the md5sums of all Subversion backups
    chdir($savedir);
    use File::Basename;
    system('md5sum '.basename($dumpfile).'.gz >> '.$savedir.'/MD5SUMS');
}
At each backup session, the script records the last revision it successfully dumped in the file last_saved.txt. The interesting part of this script is how it checks that the backup succeeded. I needed this check because my remote provider sometimes kills the process. To validate the dump, I check that the last revision I wanted to save is actually present in it, by grepping the dump content.
This script needs at least one previous backup. The first one must be done by hand:
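The same check can be sketched in shell, for instance to re-verify an uncompressed dump by hand. This is only a minimal sketch and not part of the script above: the function name dump_has_revision is hypothetical, and it assumes a grep that supports --text (GNU grep does).

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): succeed only if the
# dump file contains a "Revision-number: N" header for exactly revision N.
dump_has_revision() {
    local dumpfile=$1 rev=$2
    # --text: treat the dump as text even though it may hold binary data
    # -x:     the whole line must match, so 102 does not match 1023
    # -q:     print nothing, only the exit status matters
    grep --text -q -x "Revision-number: $rev" "$dumpfile"
}
```

Called as dump_has_revision tos-01001_01023.svndump 1023, it returns success only when the head revision made it into the dump. Matching the whole line avoids a false positive when the requested number is a prefix of another revision number.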
[bash]
$ svnadmin dump -r0:9 --incremental /home/fbonan/svn/tos > tos-00000_00009.svndump
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
* Dumped revision 3.
* Dumped revision 4.
* Dumped revision 5.
* Dumped revision 6.
* Dumped revision 7.
* Dumped revision 8.
* Dumped revision 9.
$ gzip tos-00000_00009.svndump
$ echo 9 > last_saved.txt
Add the incremental backup script to your crontab (crontab -e):
[bash]
$ crontab -l
# m h dom mon dow command
# every three hours, incremental backup of TOS repository
0 */3 * * * /home/fbonan/script/tos_svndump.pl
After a few days, your backup directory will look like:
[bash]
$ ls -1
MD5SUMS
last_saved.txt
tos-00000_00009.svndump.gz
tos-00010_01999.svndump.gz
tos-02000_02399.svndump.gz
[...]
tos-02783_02787.svndump.gz
tos-02788_02792.svndump.gz
tos-02793_02801.svndump
tos-02793_02809.svndump.gz
tos-02810_02818.svndump.gz
$ cat last_saved.txt
2818
Here you can see that there was a problem during the backup of revisions 2793 to 2801: that dump was never compressed, which means the check failed. Three hours later (due to the crontab configuration), the backup script ran again and saved revisions 2793 to 2809.
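Since a failed run leaves an uncompressed file behind, it can be reassuring to confirm that the compressed dumps still form an unbroken chain of revisions. The following is only a sketch under a few assumptions: the function name check_backup_chain is hypothetical, file names follow the tos-FROM_TO.svndump.gz pattern used above, the chain starts at revision 0 as in the manual first backup, and names arrive sorted (which ls -1 gives you as long as revision numbers fit in five digits).

```bash
#!/bin/bash
# Hypothetical helper: read file names on stdin and report the first gap
# in the revision chain. Only compressed dumps are considered, so a
# leftover .svndump from a killed run is ignored.
check_backup_chain() {
    awk '
        /^tos-[0-9]+_[0-9]+\.svndump\.gz$/ {
            range = $0
            sub(/^tos-/, "", range)
            sub(/\.svndump\.gz$/, "", range)
            split(range, rev, "_")
            from = rev[1] + 0   # + 0 turns "00010" into the number 10
            to   = rev[2] + 0
            if (from != expected) {
                print "gap: expected revision " expected ", found " $0
                bad = 1
                exit
            }
            expected = to + 1
        }
        END {
            if (bad) exit 1
            print "chain complete up to revision " (expected - 1)
        }
    '
}
```

Run from the backup directory as ls -1 | check_backup_chain. The reported last revision should match the content of last_saved.txt.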