In this post, I'll be focusing on RSyncBackup, written by Colin Stewart. It lets you drive the powerful functionality of rsync from Python, and it shines as a tool for writing custom backup scripts that fit whatever situation you have. It builds the rsync command for you, handles archive rotation (including deletion) after intervals you specify, determines whether it's been at least 24 hours since the last backup (great for systems that are online sporadically, like laptops), logs its transactions, and alerts you by email should something go wrong.
Download
We will be editing the RSyncBackup module to add functionality. If you'd rather just download an already-edited archive, you may do so by clicking here, downloading it, reading the Decompress section, and then skipping down to the Installation section of this post.
Elsewise, download Colin's original code here and follow along with me as I make changes. At the time of this writing, his latest version is 1.3.
Decompress
For Windows users, I suggest cygwin's tar utility, in which case you can follow the same instructions as Linux. But if you don't want to go through that hassle right now, download 7-Zip, which is a superb compression/decompression utility in its own right.
For Mac, you can just open the file when it's downloaded and it will decompress itself.
For Linux, I suggest using the command line. cd to the directory where the downloaded file lives and unwrap it with the tar utility. For example, if your file downloaded to your user's Downloads directory:
# cd ~/Downloads
# tar -zxvf RSyncBackup-1.3.tar.gz
# cd RSyncBackup-1.3/
I won't be going into usage of the tar utility right now. There are lots of tar guides on the Google machine if you're interested.
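If tar isn't at hand (say, on Windows without cygwin), Python's own tarfile module can unpack the archive too. This is only a sketch: the function name unpack is mine, not part of RSyncBackup, and it is written for Python 3.

```python
import os
import tarfile


def unpack(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Newer Python versions offer a safer "data" extraction filter;
        # fall back to the plain call where it is not available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest_dir, **kwargs)
    return names
```

Calling unpack() with the path to RSyncBackup-1.3.tar.gz and your Downloads directory should leave you with the same RSyncBackup-1.3/ directory the tar command produces.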
Edit the Module
Using your favorite text editor, open lib/RSyncBackup.py inside the decompressed directory.
We want to add functionality to use rsync's --password-file switch so we can securely input a password when transferring over the network. All I can figure is that Mr. Stewart didn't account for the possibility that one might need to sync to an rsync server existing on another system.
In RSyncBackup.py, find the text that reads
def backup (self, source, destination, archive = None, excludeList = None):
(line 90 in v1.3) and edit it to match the following
def backup(self, source, destination, archive = None, excludeList = None, passwordFile = None):
What this says is, "If the script calls the backup function and doesn't specify a passwordFile parameter, then forget I ever said anything about it." Next, we need to write in code so the function knows what to do if you do specify a passwordFile.
First, let's add the passwordFile parameter to the documentation which lives just below the function definition. On line 98, I added the following:
passwordFile - (Optional) A path to a file containing only a password Rsync should use if a password is required.
Format the lines however you wish to make them look good.
Down below, find these lines
if (excludeList is not None):
    for exclude in excludeList:
        cmnd = '%s --exclude="%s"' % (cmnd, exclude)
cmnd = "%s '%s' '%s'" % (cmnd, source, destination)
and add text to make it match this
if (excludeList is not None):
    for exclude in excludeList:
        cmnd = '%s --exclude="%s"' % (cmnd, exclude)
if (passwordFile is not None):
    cmnd = '%s --password-file="%s"' % (cmnd, passwordFile)
cmnd = "%s '%s' '%s'" % (cmnd, source, destination)
As of now, that's the only edit I wish to make to Colin's code.
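To see what the edited lines produce without touching the module, here is a standalone sketch that repeats the same string formatting. The function name build_command and the base value "/usr/bin/rsync -a" are placeholders of mine; the real module puts its own flags into cmnd before these lines run.

```python
def build_command(source, destination, exclude_list=None, password_file=None,
                  base="/usr/bin/rsync -a"):
    """Assemble an rsync command line the way the edited lines do.

    base is a stand-in for whatever the module has already put in cmnd.
    """
    cmnd = base
    if exclude_list is not None:
        for exclude in exclude_list:
            cmnd = '%s --exclude="%s"' % (cmnd, exclude)
    if password_file is not None:
        # The new switch is only appended when a password file was given.
        cmnd = '%s --password-file="%s"' % (cmnd, password_file)
    cmnd = "%s '%s' '%s'" % (cmnd, source, destination)
    return cmnd


print(build_command("/home/", "rsync://brad@mybackupshare.com/backups",
                    exclude_list=["colin/media"],
                    password_file="/home/brad/.rsync_password"))
```

Leave out password_file and the switch disappears from the command, which is exactly the "forget I ever said anything" behavior we wanted from the None default.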
Installation
Make sure you're in the decompressed directory (e.g. ~/Downloads/RSyncBackup-1.3/ if the .tar.gz file was untarred in your Downloads folder). Execute the following:
# python setup.py install
Now the module is installed and we can write or execute scripts that use it.
Configuration
The time's come to write a script for our backups. We'll start with the backup.py script provided in the examples/ directory because, well, it's a good example. Copy backup.py to some other filename and open the copy with your favorite text editor.
# cp backup.py backup_test.py
global variables
At the top of the example backup.py, we import the helper modules and specify the paths to some of our helper files. Feel free to change the path to your helper files. LOG_FILE will be, of course, the location of the log that keeps a record of all the script's actions. LAST_RUN_FILE will keep a timestamp of the last time the script ran. This is used so that you can schedule your backup script to run very frequently (e.g. every hour), but it will consult with LAST_RUN_FILE to determine if it's been long enough to run an actual backup again (default is once every 24 hours, but you can specify your own time amount).
logging
Next up, the script sets up its usage of the Python logging module. If you haven't used logging in your own scripts, now's the time to change your ways. It's absolutely fantastic, and I recommend leaving all this in your backup script. It can be the single most useful tool for debugging, and it makes it so easy to flip on and off like a switch. I'll do a feature on useful logging usage later.
Skip down to the "Logging to email" section. If you want to use this (and I do recommend it), you'll have to change the values to work for you. You can read about configuring the SMTPHandler here. You'll have to be running a mail server on the mailhost you specify. It can even be gmail.com, but you'll have to make sure you configure that correctly.
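If you'd like to see the general shape of such a setup outside the example script, here's a minimal sketch using only the standard logging module. The logger name, file path, and email addresses are made up for illustration, and the example script's own setup may well differ.

```python
import logging
import logging.handlers

LOG_FILE = "backup.log"  # hypothetical path; the example script defines its own


def configure_logging(log_file=LOG_FILE, mailhost="localhost",
                      fromaddr="backup@example.com",
                      toaddrs=("admin@example.com",)):
    """Log everything to a file and send only errors by email."""
    logger = logging.getLogger("backup")  # hypothetical logger name
    logger.setLevel(logging.INFO)

    file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(file_handler)

    # The SMTP handler only contacts the mail server when a record is emitted.
    mail_handler = logging.handlers.SMTPHandler(
        mailhost, fromaddr, list(toaddrs), "Backup failed")
    mail_handler.setLevel(logging.ERROR)
    logger.addHandler(mail_handler)
    return logger
```

For a mailhost that requires a login, SMTPHandler also takes credentials and secure arguments; the logging.handlers documentation covers the details.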
the RSyncBackup object
Now we get to the meat of the script: the actual backup(s). You'll notice the line
backup = RSyncBackup.RSyncBackup(lastRunFile = LAST_RUN_FILE, rsync="/usr/bin/rsync", testRun=1)
This is where we define our RSyncBackup object for usage within our script. You'll want to make sure that rsync= is actually pointing to your rsync binary. For cygwin users, /usr/bin/rsync is the correct path.
testRun=1 indicates that no file operations should be taken: it should only tell us what it would do. testRun will also prevent the timeToBackup() function from stopping the train. In short, if you're just setting up, setting testRun as enabled is a good idea.
timeToBackup()
Down below, we see where timeToBackup() comes into play:
if (backup.timeToBackup()):
This is where the script determines if it's been long enough since the last backup to continue on to the backup commands. As I said, the default is 24 hours, but if you'd like it to be different, you have to specify the interval, in minutes, when invoking the timeToBackup() function. For instance, if you wanted only a 30-minute window, this line should read as
if (backup.timeToBackup(backupInterval=30)):
Or if you wanted 3 days
if (backup.timeToBackup(backupInterval=3*24*60)):
For me, I'm fine with the once-per-24-hour backup, so I'll leave it be.
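If you're curious how a check like this can work, the idea is simple enough to sketch. This is not Colin's implementation: the function names and the file format (a single timestamp in seconds) are assumptions of mine, and only the interval-in-minutes convention comes from the examples above.

```python
import os
import time


def time_to_backup(last_run_file, backup_interval=24 * 60, now=None):
    """Return True when at least backup_interval minutes have passed."""
    now = time.time() if now is None else now
    if not os.path.exists(last_run_file):
        return True  # no record yet, so a backup is due
    with open(last_run_file) as handle:
        last_run = float(handle.read().strip())
    return (now - last_run) >= backup_interval * 60


def record_run(last_run_file, now=None):
    """Store the current time so the next check can measure the gap."""
    now = time.time() if now is None else now
    with open(last_run_file, "w") as handle:
        handle.write(repr(now))
```

With a check like this in place, a script that is started every hour simply returns early until the interval has elapsed.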
exclude
The next important piece is the exclude directive. The script includes a way to exclude certain files and directories from your backup. I use this when I want to skip my digital archive of media, which changes rarely and therefore should not be considered for backup. Looks like Colin had a similar idea because his example exclusion is
exclude = ['colin/media']
If he also wanted to exclude his Documents folder, he would have written it as such:
exclude = ['colin/media', 'Documents']
So it's just a list of strings denoting paths to folders or files we do not want backed up. Note that all these exclusions are paths relative to the source path we'll define below.
the backup command
Here we go. This will show the backup command I'd use if I wanted to
- backup everything under my system's home directory
- to a remote host named mybackupshare.com using the username brad in a directory called backups
- with a specific archive directory on that remote host (a path where files that have changed since the last backup will be stored)
- excluding the files listed above, and
- using a particular password file
backup.backup(source="/home/", destination="rsync://brad@mybackupshare.com/backups", archive="backups/archives/", excludeList=exclude, passwordFile="/home/brad/.rsync_password")
Very cool. Let's say I want to back up from one folder on my computer to another without archives and without excluding anything:
backup.backup(source="/one/folder/", destination="/another/folder/")
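One caution about passwordFile: a path that starts with ~ would likely reach rsync unexpanded, because our edit wraps the value in double quotes and a shell doesn't expand a tilde inside quotes. A small sketch for getting an absolute path first (the variable name is just for illustration):

```python
import os

# expanduser replaces the leading "~" with the current user's home directory,
# so the value handed to backup() is already an absolute path.
password_file = os.path.expanduser("~/.rsync_password")
print(password_file)
```

You'd then pass passwordFile=password_file to backup.backup().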
Colin's example includes a MySQL backup without archives. This may actually be very handy out of the box for a lot of you. Though I might even run it with archives.
trimArchives()
Now we can "trim" our archives, which will delete archives older than a specified amount of time so we have a rotating archive of changed files. I recommend intentionally creating some archives first that can be deleted (by making a few backups in a row and changing a small file between each backup) and playing around with this to make sure it works how you want. Remember that if you want to make several backups in a row, you'll have to change or exclude the timeToBackup() function to allow multiple backups within a relatively short period of time.
Remember: using the trimArchives() function will delete data where you specify.
The included examples show keeping 5 backups worth of "evolution" archives using a filter. Really, I don't know what "evolution" is, but the filter parameter will only collect archive directories for trimming that match my filter, which is expressed as a regular expression as read by Python's re module. So, let's say I want to keep only 60 backups worth of archive data overall, but I only want to keep the 5 most recent versions of any Photoshop file since they are large. I could do that with two trimArchives() calls (note the escaped dot, so the pattern matches a literal ".psd" ending rather than any character followed by "psd"):
backup.trimArchives('/backup/archives', filter=r'\.psd$', entriesToKeep=5)
backup.trimArchives('/backup/archives', entriesToKeep=60)
The optional filter parameter can be an extremely powerful tool. And dangerous. Be careful not to write a regex that will match everything (unless you want it to apply to everything).
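Before letting a filter loose on real data, it can help to try the pattern on a plain list of names. The sketch below is not the module's logic. It's a dry run of the same idea, and it assumes the names sort chronologically (timestamped directories, for instance). The function name select_for_trim is mine.

```python
import re


def select_for_trim(entries, entries_to_keep, pattern=None):
    """Return the entries that would be deleted, oldest first.

    Assumes entry names sort chronologically.
    """
    if pattern is not None:
        # Only names matching the regular expression are candidates.
        entries = [name for name in entries if re.search(pattern, name)]
    ordered = sorted(entries)
    if entries_to_keep <= 0:
        return ordered
    if len(ordered) <= entries_to_keep:
        return []
    return ordered[:-entries_to_keep]
```

Feed it a directory listing and the filter you plan to use; if the result contains names you didn't expect, the regex is too greedy.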
the end
If you're done trimming and backing up, throw this into the script
backup.finish()
And if you're using logging, I would suggest leaving the stuff at the bottom of the example script alone.
Create A Password File
If you're planning on using the passwordFile parameter we added, you'll need to create a password file. Open a new document. Name it whatever you want, but I recommend something relevant so you'll know not to delete it later. I also like to keep mine in my user home directory and start the name with a dot so it won't show up in a file explorer with default settings. So, for example, the path to and name of my file might be
/home/brad/.rsync_password
Create the file and enter a single line of text: your password with no escapes, exactly as written. Now we take care of security and make sure we're the only ones who have access to it:
sudo chmod 600 /home/brad/.rsync_password
Run Your Script
Like I said before, if you're still experimenting to make sure your script will work, I recommend keeping the testRun=1 parameter set in your RSyncBackup object. If you have tested and checked the log output to make sure it will run the rsync command the way you want, you can set testRun=0.
Make sure you're in the directory where your backup script lives and run it like this
# python backup_test.py
Hopefully everything will work. Check your log output. Check your backup and backup archive directories. Make sure everything went as planned.
If it did, add your backup script as a cron job:
# crontab -e
(Warning: crontab uses vi-type controls; if you don't know how to use them and you're stuck in the file, type in <ESC><ESC>:q! and press Enter to get out, then Google for some instructions to use vi)
The line in your crontab should read something like this if you want the script to run every hour on the hour:
0 * * * * /path/to/your/backup/script/backup.py
Remember, it's okay to have it run every hour. It will only actually run the backup once every 24 hours at most, or however long you specified, so you won't be using up a bunch of resources every hour.
Windows users, you may have installed cron with cygwin, but if not, you can do something similar with Scheduled Tasks.
Relax
Now you can rest a little easier knowing that your data is being safely backed up elsewhere. Remember, a backup isn't a backup if you're only copying the data to another hard drive in the same location. A truly safe backup exists in three physical locations: the original and two separate, off-site locations.
Let me know if you have any questions. I'd expect this document to change over time as it has such a huge scope and I just know there will be typos, mistakes, and many unclear things. So if you are using this information in your own blog or article, link directly to this post rather than copy and paste parts of it.