Rsync is the Python of the backup world. It's flexible, simple, and focuses on the user's experience. It only seems natural the two should be best friends. However, there's a distinct lack of rsync modules for Python, and an even greater lack of documentation and usage examples for those that do exist.

In this post, I'll be focusing on RSyncBackup, written by Colin Stewart. It allows you to access the powerful functionality of rsync using Python. It shines as a tool for writing custom backup scripts that will fit whatever situation you need. It manages the parsing of the rsync command, handles archive rotation (including deletion) after specifiable intervals, determines if it's been at least 24 hours since the last backup (great for systems that are online sporadically, like laptops), logs the transactions, and alerts you by email should something go wrong.

Download

We will be editing the RSyncBackup module to add functionality. If you'd rather just download an already-edited archive, you may do so by clicking here, downloading it, reading the Decompress section, and then skipping down to the Installation section of this post.

Otherwise, download Colin's original code here and follow along with me as I make changes. At the time of this writing, his latest version is 1.3.

Decompress

For Windows users, I suggest Cygwin's tar utility, in which case you can follow the same instructions as Linux. If you don't want to go through that hassle right now, download 7-Zip, which is a superb compression/decompression utility in its own right.

For Mac, you can just open the file when it's downloaded and it will decompress itself.

For Linux, I suggest using the command line. cd to the directory where the downloaded file lives and unwrap it with the tar utility. For example, if your file downloaded to your user's Download directory:
# cd ~/Downloads
# tar -zxvf RSyncBackup-1.3.tar.gz
# cd RSyncBackup-1.3/
I won't be going into usage of the tar utility right now. There are lots of tar guides on the Google machine if you're interested.

Edit the Module

Using your favorite text editor, open lib/RSyncBackup.py inside the decompressed directory.

We want to add functionality to use rsync's --password-file switch so we can securely input a password when transferring over the network. All I can figure is that Mr. Stewart didn't account for the possibility that one might need to sync to an rsync server existing on another system.

In RSyncBackup.py, find the text that reads
def backup (self, source, destination, archive = None, excludeList = None):
(line 90 in v1.3) and edit it to match the following
def backup(self, source, destination, archive = None,
           excludeList = None, passwordFile = None):
What this says is, "If the script calls the backup function and doesn't specify a passwordFile parameter, then forget I ever said anything about it." Next, we need to write in code so the function knows what to do if you do specify a passwordFile.
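The same behavior is easy to see in miniature. Here's a toy sketch (my own, not Colin's actual code) of how a None-defaulted keyword argument lets old calls keep working while new calls opt in:

```python
def backup(source, destination, passwordFile=None):
    # A toy stand-in for the real method: build the command string,
    # adding --password-file only when the caller supplies one.
    cmnd = 'rsync -a'
    if passwordFile is not None:
        cmnd = '%s --password-file="%s"' % (cmnd, passwordFile)
    return "%s '%s' '%s'" % (cmnd, source, destination)
```

Calls written before the parameter existed behave exactly as they always did.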

First, let's add the passwordFile parameter to the documentation which lives just below the function definition. On line 98, I added the following:
passwordFile - (Optional) A path to a file containing only a password Rsync
               should use if a password is required.
Format the lines however you wish to make them look good.

Down below, find these lines
        if (excludeList is not None):
            for exclude in excludeList:
                cmnd = '%s --exclude="%s"' % (cmnd, exclude)
        cmnd = "%s '%s' '%s'" % (cmnd, source, destination)
and add text to make it match this
        if (excludeList is not None):
            for exclude in excludeList:
                cmnd = '%s --exclude="%s"' % (cmnd, exclude)
        if (passwordFile is not None):
            cmnd = '%s --password-file="%s"' % (cmnd, passwordFile)
        cmnd = "%s '%s' '%s'" % (cmnd, source, destination)
As of now, that's the only edit I wish to make to Colin's code.

Installation

Make sure you're in the decompressed directory (e.g. ~/Downloads/RSyncBackup-1.3/ if the .tar.gz file was untarred in your Downloads folder). Execute the following:
# python setup.py install
Now the module is installed and we can write or execute scripts that use it.

Configuration

The time's come to write a script for our backups. We'll start with the backup.py script provided in the examples/ directory because, well, it's a good example. Copy backup.py to some other filename and open the copy with your favorite text editor.
# cp backup.py backup_test.py
global variables
At the top of the example backup.py, we import the helper modules and specify the paths to some of our helper files. Feel free to change the path to your helper files. LOG_FILE will be, of course, the location of the log that keeps a record of all the script's actions. LAST_RUN_FILE will keep a timestamp of the last time the script ran. This is used so that you can schedule your backup script to run very frequently (e.g. every hour), but it will consult with LAST_RUN_FILE to determine if it's been long enough to run an actual backup again (default is once every 24 hours, but you can specify your own time amount).

logging
Next up, the script sets up its usage of the Python logging module. If you haven't used logging in your own scripts, now's the time to change your ways. It's absolutely fantastic, and I recommend leaving all this in your backup script. It can be the single most useful tool for debugging, and it makes it so easy to flip on and off like a switch. I'll do a feature on useful logging usage later.
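If logging is new to you, here's the shape of what the example script sets up — a minimal, self-contained sketch (the logger name, format, and in-memory stream are my own; the real script writes to LOG_FILE):

```python
import io
import logging

log = logging.getLogger('backup')
log.setLevel(logging.DEBUG)          # flip to logging.WARNING to quiet it down

# The real script logs to a file; a StringIO stream stands in here
# so this sketch is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
log.addHandler(handler)

log.info('Backup starting')
log.debug('rsync binary found at /usr/bin/rsync')
```

Swapping the handler for a FileHandler pointed at LOG_FILE gives you the script's behavior.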

Skip down to the "Logging to email" section. If you want to use this (and I do recommend it), you'll have to change the values to work for you. You can read about configuring the SMTPHandler here. You'll have to be running a mail server on the mailhost you specify. It can even be gmail.com, but you'll have to make sure you configure that correctly.
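Wiring that up looks roughly like the following; every hostname and address below is a placeholder you must replace. Note that merely constructing the handler sends nothing — mail goes out only when a record at ERROR or above is actually logged.

```python
import logging
import logging.handlers

# All values here are placeholders - substitute your own.
mail_handler = logging.handlers.SMTPHandler(
    mailhost='smtp.example.com',
    fromaddr='backup@example.com',
    toaddrs=['you@example.com'],
    subject='Backup script error')
mail_handler.setLevel(logging.ERROR)   # only email when something breaks

log = logging.getLogger('backup')
log.addHandler(mail_handler)
```

Per the SMTPHandler documentation, a server like Gmail would additionally need credentials=('user', 'app-password') and secure=() to start TLS.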

the RSyncBackup object
Now we get to the meat of the script: the actual backup(s). You'll notice the line
backup = RSyncBackup.RSyncBackup(lastRunFile = LAST_RUN_FILE,
                                 rsync="/usr/bin/rsync",
                                 testRun=1)
This is where we define our RSyncBackup object for usage within our script. You'll want to make sure that rsync= is actually pointing to your rsync binary. For cygwin users, /usr/bin/rsync is the correct path.

testRun=1 indicates that no file operations should be taken: the script should only tell us what it would do. testRun will also prevent the timeToBackup() function from stopping the train. In short, if you're just setting up, leaving testRun enabled is a good idea.

timeToBackup()
Down below, we see where timeToBackup() comes into play:
if (backup.timeToBackup()):
This is where the script determines if it's been long enough since the last backup to continue onto the backup commands. As I said, the default is 24 hours, but if you'd like it to be different, you have to specify when invoking the timeToBackup() function. For instance, if you wanted only a 30-minute window, this line should read as
if (backup.timeToBackup(backupInterval=30)):
Or if you wanted 3 days
if (backup.timeToBackup(backupInterval=3*24*60)):
For me, I'm fine with the once-per-24-hour backup, so I'll leave it be.
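For the curious, the gating amounts to comparing now against the timestamp saved in LAST_RUN_FILE. Here's a simplified sketch of the idea (my own code, not Colin's implementation; the interval is in minutes, matching the examples above):

```python
import os
import time

def time_to_backup(last_run_file, backup_interval=24 * 60):
    """Return True if at least backup_interval minutes have passed
    since the timestamp recorded in last_run_file."""
    if not os.path.exists(last_run_file):
        return True                      # never backed up before
    with open(last_run_file) as f:
        last_run = float(f.read().strip())
    return (time.time() - last_run) >= backup_interval * 60
```

After a successful backup, the current time gets written back to the file so the next run has something to compare against.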

exclude
The next important piece is the exclude directive. The script includes a way to exclude certain files and directories from your backup. I use this when I want to skip my digital archive of media, which changes rarely and therefore should not be considered for backup. Looks like Colin had a similar idea because his example exclusion is
exclude = ['colin/media']
If he also wanted to exclude his Documents folder, he would have written it as such:
exclude = ['colin/media', 'Documents']
So it's just a list of strings denoting paths to folders or files we do not want backed up. Note that all these exclusions are paths relative to the source path we'll define below.

the backup command
Here we go. This will show the backup command I'd use if I wanted to

  • backup everything under my system's home directory
  • to a remote host named mybackupshare.com using the username brad in a directory called backups
  • with a specific archive directory on that remote host (a path where files that have changed since the last backup will be stored)
  • excluding the files listed above, and
  • using a particular password file
backup.backup(source="/home/",
              destination="rsync://brad@mybackupshare.com/backups",
              archive="backups/archives/",
              excludeList=exclude,
              passwordFile="~/.rsync_password")
Very cool. Let's say I want to back up from one folder on my computer to another, without archives and without excluding anything:
backup.backup(source="/one/folder/", destination="/another/folder/")
Colin's example includes a MySQL backup without archives. This may actually be very handy out of the box for a lot of you. Though I might even run it with archives.

trimArchives()
Now we can "trim" our archives, which will delete archives older than a specified amount of time so we have a rotating archive of changed files. I recommend intentionally creating some archives first that can be deleted (by making a few backups in a row and changing a small file between each backup) and playing around with this to make sure it works how you want. Remember that if you want to make several backups in a row, you'll have to change or bypass the timeToBackup() check to allow multiple backups within a relatively short period of time.

Remember: using the trimArchives() function will delete data where you specify.

The included examples show keeping 5 backups worth of "evolution" archives using a filter. Really, I don't know what "evolution" is, but the filter parameter will only collect archive directories for trimming that match my filter, which is expressed as a regular expression as read by Python's re module. So, let's say I want to keep only 60 backups worth of archive data overall, but I only want to keep the 5 most recent versions of any Photoshop file since they are large. I could do that by using two trimArchives() calls:
backup.trimArchives('/backup/archives', filter=r'\.psd$', entriesToKeep=5)
backup.trimArchives('/backup/archives', entriesToKeep=60)
The optional filter parameter can be an extremely powerful tool. And dangerous. Be careful not to write a regex that will match everything (unless you want it to apply to everything).
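Since filter is an ordinary Python regular expression, it's worth sanity-checking it against a few sample names before pointing it at real archives. Note the escaped dot: a bare . matches any character, so '.psd$' would also match a name ending in, say, 'apsd'.

```python
import re

psd_filter = r'\.psd$'   # literal dot, anchored to the end of the name

assert re.search(psd_filter, 'header.psd')
assert re.search(psd_filter, 'art/banner.psd')
assert not re.search(psd_filter, 'notes.txt')
assert not re.search(psd_filter, 'mockup.psd.bak')   # extension must end the name
```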

the end
If you're done trimming and backing up, throw this into the script
backup.finish()
And if you're using logging, I would suggest leaving the stuff at the bottom of the example script alone.

Create A Password File

If you're planning on using the passwordFile parameter we added, you'll need to create a password file. Open a new document. Name it whatever you want, but I recommend something relevant so you'll know not to delete it later. I also like to keep mine in my user home directory and name it with a leading dot so it won't show up in a file explorer with default settings. So, for example, the path to and name of my file might be
/home/brad/.rsync_password
Create the file and enter a single line of text: your password with no escapes, exactly as written. Now we take care of security and make sure we're the only ones who have access to it:
sudo chmod 600 /home/brad/.rsync_password
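If you'd rather script those two steps, the equivalent in Python looks like this (the path and password are placeholders). rsync will refuse a password file that other users can read, so the chmod matters:

```python
import os

def write_password_file(path, password):
    # Create the file, then restrict it to owner read/write only (0600).
    with open(path, 'w') as f:
        f.write(password + '\n')
    os.chmod(path, 0o600)
```

Called as, e.g., write_password_file(os.path.expanduser('~/.rsync_password'), 'secret').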
Run Your Script

Like I said before, if you're still experimenting to make sure your script will work, I recommend keeping the testRun=1 parameter set in your RSyncBackup object. If you have tested and checked the log output to make sure it will run the rsync command the way you want, you can set testRun=0.

Make sure you're in the directory where your backup script lives and run it like this
# python backup_test.py
Hopefully everything will work. Check your log output. Check your backup and backup archive directories. Make sure everything went as planned.

If it did, add your backup script as a cron job:
# crontab -e
(Warning: crontab uses vi-style controls; if you don't know them and you're stuck in the file, press Esc, type :q!, and press Enter to get out, then Google for some instructions on using vi)
The line in your crontab should read something like this if you want the script to run every hour on the hour:
0 * * * * python /path/to/your/backup/script/backup_test.py
Remember, it's okay to have it run every hour. It will only actually run the backup once every 24 hours at most, or however long you specified, so you won't be using up a bunch of resources every hour.

Windows users, you may have installed cron with cygwin, but if not, you can do something similar with Scheduled Tasks.

Relax

Now you can rest a little easier knowing your data is being safely backed up elsewhere. Remember, a backup isn't a backup if you're only copying the data to another hard drive in the same location. A truly safe backup exists in three physical locations: the original and two separate, off-site locations.

Let me know if you have any questions. I'd expect this document to change over time as it has such a huge scope and I just know there will be typos, mistakes, and many unclear things. So if you are using this information in your own blog or article, link directly to this post rather than copy and paste parts of it.

  1. I'll come out with a more "classy" version, but for now, if you want to generate APRS-encoded AFSK with your sound card, this can do it.
    class Bell202:
        import numpy
        import pyaudio
    
        def __init__(self, markFreq=1200, spaceFreq=2200, sampleRate=48000,
                     baud=1200, format=pyaudio.paFloat32):
            self.markFreq = float(markFreq)
            self.spaceFreq = float(spaceFreq)
            self.sampleRate = float(sampleRate)
            self.baud = float(baud)
            self.format = format
            
            # bitLengthSamples is the length of a single bit measured in samples.
            self.bitLengthSamples = self.sampleRate / self.baud
            # bitLengthTime is the length of a single bit measured in seconds.
            self.bitLengthTime = 1.0 / self.baud
            # sampleLengthTime is the length of a single sample in seconds.
            self.sampleLengthTime = 1.0 / self.sampleRate
            self.currentPhase = self.numpy.pi / 2.0
            
        def play(self, audioData):
            try:
                p = self.pyaudio.PyAudio()
                stream = p.open(format=self.format,
                                channels=1,
                                rate=int(self.sampleRate),
                                output=1)
                audio = self.numpy.concatenate(audioData)
                stream.write(audio.astype(self.numpy.float32).tobytes())
                stream.close()
                p.terminate()
            except IOError as e:
                print('failed: %s' % e)
                
        def send(self, binaryData):
            self.currentPhase = self.numpy.pi / 2.0
            audio = [self.createSine(self.markFreq, phase=self.currentPhase) if bit == '1'
                     else self.createSine(self.spaceFreq, phase=self.currentPhase)
                     for bit in binaryData]
            self.play(audio)
        
        def createSine(self, frequency, amplitude=1.0, bits=1, phase=0.0):
            w = 2.0 * self.numpy.pi * frequency
            t = self.numpy.arange(bits * self.bitLengthSamples) / self.sampleRate
            audioData = self.numpy.sin(w * t + phase)
            self.currentPhase += w * bits * self.bitLengthSamples / self.sampleRate
            return audioData
    
    def fcs(data):
        fcs = 0xffff
        for bit in data:
            shiftBit = fcs & 0x01
            fcs = fcs >> 1
            if str(shiftBit) != bit:
                fcs = fcs ^ 0x8408
        fcs = fcs ^ 0xffff
        return bin(fcs)[2:].zfill(16)[::-1] # Send FCS low-byte and LSB first
    
    def stuffBits(data):
        from re import sub
        return sub('11111', '111110', data) # Add a 0 after any five 1s
    
    def nrzi(data):
        # Encodes the data in NRZI where 0s in the original data indicate
        # a change in tone and 1s indicate staying on the same tone
        currentState = '0'
        bits = ''
        for bit in data:
            if bit == '0':
                if currentState == '0':
                    currentState = '1'
                else:
                    currentState = '0'
            bits += currentState
        return bits
    
    def sendCoordinates(source, latitude, longitude,
                        destination = 'APRS', digis = ['WIDE1 1','WIDE2 1'],
                        control = 0x03, protocol = 0xf0):
        for address in [source, destination] + digis:
            if len(address) > 7:
                print(('Your address %s is greater than 7 characters.\n' +
                       'The maximum length for an address is 7 characters.') % address)
                return False
        SOURCE = source.ljust(7) # 7 characters total (pad with spaces at the end)
        DESTINATION = destination.ljust(7) # Also 7 characters
        DIGIS = ''
        for d in digis:
            DIGIS += d.ljust(7)
        CONTROL = control
        PROTOCOL = protocol
    
        print((latitude,longitude))
        
        from datetime import datetime
        now = datetime.utcnow()
        INFORMATION = '/%02d%02d%02dz%s/%s>Testing new software TNC' % (
                now.day, now.hour, now.minute, latitude, longitude)
        
        # bytearray.extend() needs bytes, so encode the address/info strings
        data = bytearray()
        data.extend(DESTINATION.encode('ascii'))
        data.extend(SOURCE.encode('ascii'))
        data.extend(DIGIS.encode('ascii'))
        data.append(CONTROL)
        data.append(PROTOCOL)
        data.extend(INFORMATION.encode('ascii'))
    
        for i in range(len(SOURCE) + len(DESTINATION) + len(DIGIS)):
            data[i] = data[i] << 1 # Addresses' bytes are shifted left
        # The last address should end with high bit to indicate end of addresses
        data[len(SOURCE) + len(DESTINATION) + len(DIGIS) - 1] += 1
        
        binaryData = ''
        for byte in data:
            binaryData += bin(byte)[2:].zfill(8)[::-1] # Send each byte LSB first
        
        binaryData += fcs(binaryData)
        binaryData = stuffBits(binaryData)
        everything = '01111110' * 75 + binaryData + '01111110'
        everything = nrzi(everything)
    
        b = Bell202()
        b.send(everything)
    
    I know that's a bit ugly. But it does work. I only had marginal success this weekend because my transmitter was neither properly placed nor properly antenna'd. It's only 5W. All in all, after ~7 hours of transmitting, only 6 or 7 packets were picked up. But coming from nothing, I'd say that's pretty good.

    pyaudio and numpy are the only required non-standard libraries in this version.

    So, the sendCoordinates method will do just that: send coordinates. You feed it the source callsign, the latitude, and the longitude at the very least. The lat & lon should be in APRS form (degrees, minutes, and a decimal fraction of minutes, followed by N/S/E/W), where latitude has two digits for degrees and longitude has three digits for degrees, e.g. latitude = 3256.75N, longitude = 01123.01E
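    If your GPS hands you decimal degrees instead, converting to that form is a small exercise. Here's a helper of my own (not part of the code above):

```python
def to_aprs(lat, lon):
    """Convert decimal degrees to APRS ddmm.mmH / dddmm.mmH strings."""
    def fmt(value, deg_digits, pos, neg):
        hemi = pos if value >= 0 else neg
        value = abs(value)
        degrees = int(value)
        minutes = (value - degrees) * 60   # fractional degrees -> minutes
        return '%0*d%05.2f%s' % (deg_digits, degrees, minutes, hemi)
    return fmt(lat, 2, 'N', 'S'), fmt(lon, 3, 'E', 'W')
```

    For example, to_aprs(32.9458333, 11.3835) gives the pair from the example above.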

    This method will also have APRS receivers make your map icon a little car. If you want to change that, look up the APRS symbol tables and how the symbols are encoded, then change the appropriate characters in the INFORMATION string.

    I'll note at the top of this post when I've made progress on my pytnc package. It'll be nicer and all that jazz.

  2. If you're seeing this post, it might be because you wandered here from an APRS bulletin board message. If so, celebrate with me, because it was my first successfully sent packet over APRS and the audio was generated with software I wrote rather than a hardware TNC.

    I'll be travelling for a few hours tomorrow and I'm hoping to continually transmit my position as a test for my trip across the country later this summer. I still have to write a parser for the GPS data, but I don't anticipate any problems.

    When I return at the beginning of next week, I'll post working code to generate the packets. It will be limited to broadcasting location. Later on, I hope to develop a full-on TNC Python package (pyTNC) implemented in Python, with complete bindings for AX.25 and APRS. I may have my head in the clouds, but the air's a lot clearer up here.

    A few months ago, I slapped a cheap, ~30GB SSD into my now four-year-old MacBook. For the same reasons I am building my own APRS soft-modem (being a determined cheapskate), I want to get every last ounce of life possible from this laptop.

    The SSD has livened things up quite a bit (~8 second bootup, performance increases in applications). Because the SSD was particularly cheap, and therefore small, I have my /Users folder mounted as a separate volume on my original platter disk, which sits where the Macbook's SuperDrive lived (it had died anyhow).

    As great as things have been, I have been experiencing a problem which would have sent most people to the trash bin with their computer. At least once a month (but usually more often), the OS goes completely wacko and will not boot up despite several attempts to fsck the partition. I have taken to making whole disk backups of the SSD on my Users volume. I usually keep two, for a total of 60GB lost to backup. This isn't really a problem, but it sure is annoying to have to stop what I'm doing for half an hour while dd restores to my last checkpoint.

    I still don't know what's causing it. But maybe (maybe) I've figured it out. This restoration cycle (beginning this past Saturday, I believe) has shown more weird symptoms. Sometimes, when I try to open a new tab in Chrome, it won't take me where I ask. During those same times, I try to open a new tab in Terminal, and I'm greeted with the following message:
    -bash: fork: Resource temporarily unavailable
    
    Hwhuh? This message means I've reached my personal maximum processes. I decided I was fed up with it and consulted Dr. Google, who referred me to this Facebook post by Kernel Panic Consulting. I've followed the instructions there and thought I'd post all this information before rebooting so I could avoid looking up the link again.

    So that makes me cheap and lazy.

    If I don't report otherwise or remove this post, assume the positive: it worked. And if you guy(s) at KPC visit this post to find from where a very little bit of traffic may be originating, thanks for the tip!

    Edit: Well, it solved my process problem (expected), but an hour after posting, I rebooted and it wouldn't boot into the OS again. Too bad! I hoped that might have been related.


  4. After a few days of plodding, I am successfully generating continuous phase AFSK. The next step is forming and encoding AX.25 frames correctly, a process for which there are several examples on the web. I just thought I'd share my success story!

    Hopefully, in a few days, I'll share the software I've written.

  5. My wife and I are preparing to move across the country in a month or two. Naturally, instead of working on useful aspects of the move such as packing logistics or finding a place to live, I've seen the opportunity as a chance to delve further into the world of ham radio and set up an APRS tracker for my vehicle.

    APRS, or Automatic Packet Reporting System, is a protocol whereby packets are transmitted, usually over the 2m band (144.390 MHz in North America), and retransmitted (repeated) to other repeaters. The packet may contain a variety of information, but its most common uses are relaying GPS data or weather station information. For now, I'm hoping to transmit the former.

    The great thing about APRS is that so many people have set up internet gateways, where packets heard are uploaded to a central server and placed appropriately on a map (check it out: aprs.fi). Although there is a lack of coverage in many areas, the paths tracing the interstate system are pretty dense. The bad thing about APRS is that it is usually implemented with special hardware. Special hardware = $.

    I have the use of my brother's Bluetooth GPS unit from which my computer can obtain my current coordinates. I also have a Quansheng TG-UV2 handheld radio with which I may transmit my coordinates. I just need something to encode or modulate the data into the audio FSK signal.

    So I set out on a Great Internet Adventure to find a software modem for APRS or at least a software Bell 202 modem or some prewritten module for speaking the AX.25 protocol. I was surprised to find no great options. There's an old piece of Windows software I couldn't make work, and there's the AX.25 modem in the Linux kernel without a default way of speaking through the sound card. All in all, any existing solutions are not cross-platform and I couldn't make any work.

    Next step: write my own. I begin with two main requirements:

    1. The software must be cross-platform, playing nicely with the big 3 (Linux, Windows, and Mac).
    2. It must talk and listen through the sound card, using no specialty hardware between the computer and the radio.

    The first broad goal is to be able to encode data into an audio signal. Eventually, I'll also want it to hear packets so it can become a digipeater or internet gateway. Far, far, down the road, I wouldn't mind having written a piece of software that can talk all kinds of protocols.

    But was I beaten to the punch? GNU Radio is a suite of software radio tools, the products of which can yield exactly what I'm looking for. Its advantages are that it already includes much of the backbone I'd need to create APRS modulations. GNU Radio is also cross-platform, one of my main requirements. In fact, most applications for GNU Radio are written in Python. The extreme disadvantage I see with GNU Radio is the lack of pre-built binaries. One way to make sure a piece of software aimed at the amateur radio community won't get used is telling them they'll have to compile software to use it. For this reason, right now, I lean away from GNU Radio.

    For cross-platform-ness and familiarity, I want to build the software in Python. To meet my second requirement, talking through the sound card, I'd need to find a Python module capable of accessing the raw sound card I/O. There aren't many out there, but PyAudio stands out among them. Cross-platform but easily installed, I can use PyAudio to capture and push audio from and to the sound card. It's possible to create audio using math and analyze audio that's been grabbed.
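    The "create audio using math" half needs nothing exotic. Here's a pure-stdlib sketch (my own, independent of PyAudio) of synthesizing one bit's worth of 1200 Hz sine samples at 48 kHz; PyAudio's job is then just to write such samples to the output stream:

```python
import math

SAMPLE_RATE = 48000   # samples per second
BAUD = 1200           # bits per second -> 40 samples per bit

def sine_bit(freq, phase=0.0):
    """One bit period of a sine wave as a list of float samples."""
    samples_per_bit = SAMPLE_RATE // BAUD
    w = 2.0 * math.pi * freq              # angular frequency
    return [math.sin(w * (n / SAMPLE_RATE) + phase)
            for n in range(samples_per_bit)]
```

    At 1200 Hz and 1200 baud, each bit is exactly one full cycle of the mark tone, which is what makes continuous-phase AFSK tractable.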

    I think this is the path I'll take, but I wanted to put my reasoning in writing. Also, I'll feel a little more committed to completing the project after having announced it. I'll keep you informed as I (read: if I) make progress.

  6. Rsync is the Python of the backup world. It's flexible, simple, and focuses on the user's experience. It only seems natural the two should be best friends. However, there's a distinct lack of rsync modules for Python, and an even greater lack of documentation and usage examples for those that do exist.

    In this post, I'll be focusing on RSyncBackup, written by Colin Stewart. It allows you to access the powerful functionality of rsync using Python. It shines as a tool for writing custom backup scripts that will fit whatever situation you need. It manages the parsing of the rsync command, handles archive rotation (including deletion) after specifiable intervals, determines if it's been at least 24 hours since the last reboot (great for systems that are online sporadically like laptops), logs the transactions and alerts you by email should something go wrong.

    Download

    We will be editing the RSyncBackup module to add functionality. If you'd rather just download an already-edited archive, you may do so by clicking here, downloading it, reading the Decompress section, and then skipping down to the Installation section of this post.

    Elsewise, download Colin's original code here and follow along with me as I make changes. At the time of this writing, his latest version is 1.3.

    Decompress

    For Windows users, I suggest cygwin's tar utility, in which case you can follow the same instructions as Linux. But if you don't want to go through that hassle right now, download 7zip, which is a superb compression/decompression utility in its own right.

    For Mac, you can just open the file when it's downloaded and it will decompress itself.

    For Linux, I suggest using the command line. cd to the directory where the downloaded file lives and unwrap it with the tar utility. For example, if your file downloaded to your user's Download directory:
    # cd ~/Downloads
    # tar -zxvf RSyncBackup-1.3.tar.gz
    # cd RSyncBackup-1.3/
    
    I won't be going into usage of the tar utility right now. There are lots of tar guides on the Google machine if you're interested.

    Edit the Module

    Using your favorite text editor, open lib/RSyncBackup.py inside the decompressed directory.

    We want to add functionality to use rsync's --password-file switch so we can securely input a password when transferring over the network. All I can figure is that Mr. Stewart didn't account for the possibility that one might need to sync to an rsync server existing on another system.

    In RSyncBackup.py, find the text that reads
    def backup (self, source, destination, archive = None, excludeList = None):
    
    (line 90 in v1.3) and edit it to match the following
    def backup(self, source, destination, archive = None,
               excludeList = None, passwordFile = None):
    
    What this says is, "If the script calls the backup function and doesn't specify a passwordFile parameter, then forget I ever said anything about it." Next, we need to write in code so the function knows what to do if you do specify a passwordFile.

    First, let's add the passwordFile parameter to the documentation which lives just below the function definition. On line 98, I added the following:
    passwordFile - (Optional) A path to a file containing only a password Rsync
                   should use if a password is required.
    
    Format the lines however you wish to make them look good.

    Down below, find these lines
            if (excludeList is not None):
                for exclude in excludeList:
                cmnd = '%s --exclude="%s"' % (cmnd, exclude)
            cmnd = "%s '%s' '%s'" % (cmnd, source, destination)
    
    and add text to make it match this
            if (excludeList is not None):
                for exclude in excludeList:
                    cmnd = '%s --exclude="%s"' % (cmnd, exclude)
            if (passwordFile is not None):
                cmnd = '%s --password-file="%s"' % (cmnd, passwordFile)
            cmnd = "%s '%s' '%s'" % (cmnd, source, destination)
    
    As of now, that's the only edit I wish to make to Colin's code.

    Installation

    Make sure you're in the decompressed directory (e.g. ~/Downloads/RSyncBackup-1.3/ if the .tar.gz file was untarred in your Downloads folder). Execute the following:
    # python setup.py install
    
    Now the module is installed and we can write or execute scripts that use it.

    Configuration

    The time's come to write a script for our backups. We'll start with the backup.py script provided in the examples/ directory because, well, it's a good example. Copy backup.py to some other filename and open the copy with your favorite text editor.
    # cp backup.py backup_test.py
    
    global variables
    At the top of the example backup.py, we import the helper modules and specify the paths to some of our helper files. Feel free to change the path to your helper files. LOG_FILE will be, of course, the location of the log that keeps a record of all the script's actions. LAST_RUN_FILE will keep a timestamp of the last time the script ran. This is used so that you can schedule your backup script to run very frequently (e.g. every hour), but it will consult with LAST_RUN_FILE to determine if it's been long enough to run an actual backup again (default is once every 24 hours, but you can specify your own time amount).

    logging
    Next up, the script sets up its usage of the Python logging module. If you haven't used logging in your own scripts, now's the time to change your ways. It's absolutely fantastic, and I recommend leaving all this in your backup script. It can be the single most useful tool for debugging, and it makes it so easy to flip on and off like a switch. I'll do a feature on useful logging usage later.

    Skip down to the "Logging to email" section. If you want to use this (and I do recommend it), you'll have to change the values to work for you. You can read about configuring the SMTPHandler here. You'll have to be running a mail server on the mailhost you specify. It can even be gmail.com, but you'll have to make sure you configure that correctly.
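    Roughly, the email piece is a standard SMTPHandler attached to the logger; the addresses below are placeholders you must replace, and nothing is actually sent until a record at ERROR level or above is emitted:

    ```python
    import logging
    import logging.handlers

    logger = logging.getLogger('backup')
    logger.setLevel(logging.DEBUG)

    # Placeholder mailhost and addresses: substitute your own. Constructing the
    # handler makes no network connection; email goes out only on an ERROR.
    smtp = logging.handlers.SMTPHandler(mailhost='localhost',
                                        fromaddr='backup@example.com',
                                        toaddrs=['me@example.com'],
                                        subject='Backup script error')
    smtp.setLevel(logging.ERROR)  # only failures trigger an email
    logger.addHandler(smtp)
    ```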

    the RSyncBackup object
    Now we get to the meat of the script: the actual backup(s). You'll notice the line
    backup = RSyncBackup.RSyncBackup(lastRunFile = LAST_RUN_FILE,
                                     rsync="/usr/bin/rsync",
                                     testRun=1)
    
    This is where we define our RSyncBackup object for usage within our script. You'll want to make sure that rsync= is actually pointing to your rsync binary. For cygwin users, /usr/bin/rsync is the correct path.

    testRun=1 means no file operations are performed: the script only logs what it would do. testRun also keeps the timeToBackup() check from stopping the train. In short, if you're just setting up, leaving testRun enabled is a good idea.

    timeToBackup()
    Down below, we see where timeToBackup() comes into play:
    if (backup.timeToBackup()):
    
    This is where the script determines whether it's been long enough since the last backup to continue on to the backup commands. As I said, the default is 24 hours; if you'd like a different interval, specify it (in minutes) when invoking timeToBackup(). For instance, if you wanted only a 30-minute window, the line would read
    if (backup.timeToBackup(backupInterval=30)):
    
    Or if you wanted 3 days
    if (backup.timeToBackup(backupInterval=3*24*60)):
    
    For me, I'm fine with the once-per-24-hour backup, so I'll leave it be.
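    Since backupInterval is given in minutes, other intervals are just arithmetic:

    ```python
    # backupInterval is expressed in minutes, so convert other units by hand.
    half_hour = 30
    three_days = 3 * 24 * 60   # days * hours/day * minutes/hour
    one_week = 7 * 24 * 60
    print(half_hour, three_days, one_week)
    ```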

    exclude
    The next important piece is the exclude directive. The script includes a way to exclude certain files and directories from your backup. I use this when I want to skip my digital archive of media, which changes rarely and therefore should not be considered for backup. Looks like Colin had a similar idea because his example exclusion is
    exclude = ['colin/media']
    
    If he also wanted to exclude his Documents folder, he would have written it as such:
    exclude = ['colin/media', 'Documents']
    
    So it's just a list of strings denoting paths to folders or files we do not want backed up. Note that all these exclusions are paths relative to the source path we'll define below.

    the backup command
    Here we go. This will show the backup command I'd use if I wanted to

    • backup everything under my system's home directory
    • to a remote host named mybackupshare.com using the username brad in a directory called backups
    • with a specific archive directory on that remote host (a path where files that have changed since the last backup will be stored)
    • excluding the files listed above, and
    • using a particular password file
    backup.backup(source="/home/",
                  destination="rsync://brad@mybackupshare.com/backups",
                  archive="backups/archives/",
                  excludeList=exclude,
                  passwordFile="~/.rsync_password")
    
    Very cool. Let's say I want to back up from one folder on my computer to another, without archives and without excluding anything:
    backup.backup(source="/one/folder/", destination="/another/folder/")
    
    Colin's example includes a MySQL backup without archives. This may actually be very handy out of the box for a lot of you. Though I might even run it with archives.

    trimArchives()
    Now we can "trim" our archives, deleting archives older than a specified age so we keep a rotating window of changed files. I recommend intentionally creating some archives first that can be deleted (make a few backups in a row, changing a small file between each) and playing around with this to make sure it works how you want. Remember that if you want to make several backups in a row, you'll have to shorten the interval passed to timeToBackup(), or temporarily bypass the check, to allow multiple backups within a short period of time.

    Remember: using the trimArchives() function will delete data where you specify.

    The included examples show keeping 5 backups' worth of "evolution" archives using a filter. Really, I don't know what "evolution" is, but the filter parameter will only collect archive directories for trimming that match my filter, expressed as a regular expression as read by Python's re module. So, let's say I want to keep only 60 backups' worth of archive data overall, but only the 5 most recent versions of any Photoshop file, since they are large. I could do that with two trimArchives() calls (note the escaped dot: a bare . in a regex matches any character):
    backup.trimArchives('/backup/archives', filter='\.psd$', entriesToKeep=5)
    backup.trimArchives('/backup/archives', entriesToKeep=60)
    
    The optional filter parameter can be an extremely powerful tool. And dangerous. Be careful not to write a regex that will match everything (unless you want it to apply to everything).
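    Before trusting a filter with deletions, it's worth trying it against sample names with Python's re module directly (the entries below are made up for illustration):

    ```python
    import re

    # Hypothetical archive entries. The pattern r'\.psd$' (escaped dot) matches
    # only names ending in ".psd"; an unescaped '.psd$' would also match
    # names like 'xpsd', since . matches any character.
    entries = ['2011-01-01.psd', 'notes.txt', 'archive-psd', 'photo.psd']
    psd_only = [e for e in entries if re.search(r'\.psd$', e)]
    print(psd_only)
    ```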

    the end
    If you're done trimming and backing up, throw this into the script
    backup.finish()
    And if you're using logging, I would suggest leaving the stuff at the bottom of the example script alone.

    Create A Password File

    If you're planning on using the passwordFile parameter we added, you'll need to create a password file. Name it whatever you want, but I recommend something relevant so you'll know not to delete it later. I also like to keep mine in my user home directory and prefix the name with a dot so it won't show up in a file browser with default settings. So, for example, the path to and name of my file might be
    /home/brad/.rsync_password
    Create the file and enter a single line of text: your password, exactly as it is, with no escaping or quoting. Then lock down the permissions so we're the only ones who have access to it:
    sudo chmod 600 /home/brad/.rsync_password
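    The same two steps in Python, should you prefer to script them (the example writes to a throwaway temp directory rather than your real home, and the password is obviously a placeholder):

    ```python
    import os
    import stat
    import tempfile

    # Create a throwaway password file and restrict it to 0600 (owner
    # read/write only), the equivalent of the chmod command above.
    path = os.path.join(tempfile.mkdtemp(), '.rsync_password')
    with open(path, 'w') as f:
        f.write('my-secret-password\n')
    os.chmod(path, 0o600)

    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(oct(mode))
    ```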
    
    Run Your Script

    Like I said before, if you're still experimenting to make sure your script will work, I recommend keeping the testRun=1 parameter set in your RSyncBackup object. If you have tested and checked the log output to make sure it will run the rsync command the way you want, you can set testRun=0.

    Make sure you're in the directory where your backup script lives and run it like this
    # python backup_test.py
    
    Hopefully everything will work. Check your log output. Check your backup and backup archive directories. Make sure everything went as planned.

    If it did, add your backup script as a cron job:
    # crontab -e
    (Warning: crontab opens your entries in a vi-type editor; if you don't know the controls and get stuck in the file, press <ESC>, then type :q! and press Enter to get out, and Google for some instructions on using vi)
    The line in your crontab should read something like this if you want the script to run every hour on the hour (make the script executable with a #!/usr/bin/env python first line, or prefix the path with python):
    0 * * * * /path/to/your/backup/script/backup.py
    
    Remember, it's okay to have it run every hour. It will only actually run the backup once every 24 hours at most, or however long you specified, so you won't be using up a bunch of resources every hour.
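    For the curious, the gating idea can be sketched in a few lines; this is my own illustration of the concept, not the module's code:

    ```python
    import os
    import tempfile
    import time

    def time_to_backup(last_run_file, interval_minutes=24 * 60):
        """True if at least interval_minutes have passed since the timestamp
        recorded in last_run_file, or if no record exists yet."""
        if not os.path.exists(last_run_file):
            return True
        with open(last_run_file) as f:
            last_run = float(f.read().strip())
        return (time.time() - last_run) >= interval_minutes * 60

    def record_run(last_run_file):
        with open(last_run_file, 'w') as f:
            f.write(str(time.time()))

    # Quick demonstration against a throwaway file:
    last_run = os.path.join(tempfile.mkdtemp(), 'backup.lastrun')
    print(time_to_backup(last_run))   # no record yet, so a backup is due
    record_run(last_run)
    print(time_to_backup(last_run))   # just ran: not due for another 24 hours
    ```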

    Windows users, you may have installed cron with cygwin, but if not, you can do something similar with Scheduled Tasks.

    Relax

    Now you can rest a little easier knowing your data is being safely backed up elsewhere. Remember, a backup isn't a backup if you're only copying the data to another hard drive in the same location. A truly safe backup exists in three physical locations: the original and two separate, off-site locations.

    Let me know if you have any questions. I expect this document to change over time, as it has such a huge scope and I just know there will be typos, mistakes, and unclear passages. So if you use this information in your own blog or article, link directly to this post rather than copying and pasting parts of it.

  7. For years, I used Perl's excellent WWW::Mechanize library simply because I didn't think a good alternative existed. I hear people say a lot that a good programmer doesn't learn just one language, but instead uses whatever language best fits the job. I don't know that I'm a good programmer, but I know a good programmer has to know what else is available to be able to decide what's best.

    Enter Python. Beloved for its readability, its classic one-liners, and the self-similar, universal nature of the Python object, it was my second programming language. It was preferred in my workplace simply because there was a lot of turnover: we were left with bits of code for projects we'd have to start over because the code was nearly unreadable. Maybe when someone codes in Python, they're under the impression that it should be more readable, so they concentrate on making it so; all I know is it was much better for the workplace.

    I naturally wanted to know how to scrape with Python so I wouldn't be limited to the use of Perl. I tried BeautifulSoup and the like, but it just didn't do what I expected out of the box. Eventually, I came across a post on StackOverflow detailing how to do what I wanted. Making a small edit, this is how I do it now.

    At the beginning of each Python script where I want to scrape about, especially when I need to authenticate or perform some function that requires the storing of cookies, I add this:

    import urllib
    import urllib2
    import cookielib
    
    # Python 2: cookielib's CookieJar stores cookies between requests, and
    # build_opener wires the jar into every request this opener makes.
    cj = cookielib.CookieJar()
    browser = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    
    # Note: this deliberately shadows the builtin open() within this script.
    def open(url, postdata=None):
        if postdata is not None:
            postdata = urllib.urlencode(postdata)
        return browser.open(url, postdata).read()
    


    I'll then go on and write a login algorithm specific to the site I wish to read. In the past, I've mostly used the TamperData plugin to track requests I send to the website using a regular browser so I can duplicate it with the Python script.

    So let's say http://example.com has a login that requires me to send my username, password, and the date (including the time of day in a certain format, something I've never seen, but it makes for a good hypothetical situation). First, I will GET http://example.com, and then I will build my POST request and send it to the same URL.

    import urllib
    import urllib2
    import cookielib
    import datetime
    
    cj = cookielib.CookieJar()
    browser = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    
    def open(url, postdata=None):
        if postdata is not None:
            postdata = urllib.urlencode(postdata)
        return browser.open(url, postdata).read()
    
    USER = 'username'
    PASS = 'password'
    URL = 'http://example.com'
    now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M')
    
    open(URL)
    
    POST = {'username': USER,
            'password': PASS,
            'date': now}
    
    loggedIn = open(URL, POST)
    
    print(loggedIn)
    


    After logging into a page, you usually want to GET other pages, but I'll stop here so I don't keep spinning a hypothetical situation. Suffice it to say that this method has worked well for me.
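    If you're on Python 3, the same pattern lives under different module names (urllib2 became urllib.request, cookielib became http.cookiejar); a minimal sketch, with fetch as my own name to avoid shadowing the builtin open():

    ```python
    import urllib.parse
    import urllib.request
    from http.cookiejar import CookieJar

    # Same cookie-aware opener as above, using the Python 3 module names.
    cj = CookieJar()
    browser = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

    def fetch(url, postdata=None):
        # In Python 3, POST bodies must be bytes; None still means a plain GET.
        if postdata is not None:
            postdata = urllib.parse.urlencode(postdata).encode('ascii')
        return browser.open(url, postdata).read()
    ```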

    Now, all I ask is that if you mention this implementation you link it back to this post in case I change something. Furthermore, please don't use this work to do something illegal, immoral, or against the terms of service for a particular site. I've had my say.

  8. Let's have a show of hands: all those who use Windows? Okay. All those who use cygwin on Windows? Well, if your hand's not raised, give cygwin a look.

    Cygwin is a collection of Linux tools, libraries, and software in general compiled for the Windows environment. It won't let you run Linux executables natively, but during installation, you can choose to install tools such as rsync, scp, shutdown, ssh, ftp, tftp, and more. My main motivation for working with cygwin was to stop using PuTTY, which is a great piece of software, but I wanted a more "integrated" method of using my Linux-y tools (more on integrating cygwin with your PATH in another post).

    Of course, no CLI is complete without vim. Hate it or love it (read: be unfamiliar with it or be familiar with it), vi and its variants, including vim, have an important place in CLI usage. You may read more about vim here. The problem with the default vim installation in cygwin, however, is that it ships unconfigured: most OSes ship vim with a sensible standard keymap and color scheme, but cygwin does not.

    The fix is fairly simple, though. Just execute the following in your cygwin shell:

    cp /usr/share/vim/vim73/vimrc_example.vim ~/.vimrc
    


    What you're doing here is copying an example vim configuration to the .vimrc file in your cygwin home directory. When you start vim, it checks for this file and, finding one, reads its contents to determine how vim should look and behave. (The vim73 directory corresponds to vim 7.3; adjust the path if your installed version differs.) Note again that you must execute this command in the cygwin shell for your computer to make any sense of it.

    So there's another Top Gear Top Tip-- er, I mean cygwin tip. Thanks for this one go to Scott, who also owes his thanks to someone else.

  9. For years, I've been fixing some stuff and breaking much more. I've sometimes kept sparse notes as to how I fixed something and more importantly, how I broke it.

    There comes a time when those notes are too difficult to read any longer, when their format doesn't lend itself well to searching, indexing, recalling, and following. Thusly, another tech blog enters the world. What makes this one different? Nothing, except it's written by me, Brad Gentry.

    I'm a student of most computer technologies: hardware interfacing, software programming, printer smashing, the works. My interests take me everywhere from building Roomba security guards to ham radio telemetry for high altitude balloons.

    I'll begin by organizing the notes I've kept into posts every few days. From there, whenever I have to learn something new or think of something clever, I'll post it. So thanks for visiting. I'll try and learn something for you.
