Thursday, September 20, 2012

rsnapshot - It's for pulling, obviously

It's been a few months since I learned about rsnapshot, a neat little utility that uses hard links to create multiple point-in-time snapshots of your data. It builds on rsync and packages it into something that does more than just mirror data.
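To make the hard-link idea concrete, here is a minimal Python sketch of one snapshot cycle. It is not rsnapshot's own code, and the helper names (snapshot_with_hardlinks, replace_file, demo) are hypothetical. The first helper imitates cp -al: it recreates the directories and hard-links every file. The second writes a changed file to a temporary name and renames it into place, which is what rsync does by default. That rename is why the older snapshot keeps the old version while unchanged files stay shared between snapshots.

```python
import os
import shutil
import tempfile


def snapshot_with_hardlinks(src: str, dest: str) -> None:
    """Recreate the tree at src under dest, hard-linking every file (like cp -al)."""
    for root, _dirs, files in os.walk(src):
        target_dir = os.path.normpath(os.path.join(dest, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            # A hard link is a second name for the same inode: no file data is copied.
            os.link(os.path.join(root, name), os.path.join(target_dir, name))


def replace_file(path: str, data: str) -> None:
    """Write a new version of path the way rsync does by default."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(data)
    # Renaming over the old name gives path a new inode; other hard links
    # (older snapshots) still point at the old inode and keep the old content.
    os.replace(tmp, path)


def read(path: str) -> str:
    with open(path) as f:
        return f.read()


def demo() -> dict:
    base = tempfile.mkdtemp()
    try:
        daily0 = os.path.join(base, "daily.0")
        daily1 = os.path.join(base, "daily.1")
        os.makedirs(daily0)
        for name, text in [("budget.xls", "v1"), ("notes.txt", "unchanged")]:
            with open(os.path.join(daily0, name), "w") as f:
                f.write(text)

        snapshot_with_hardlinks(daily0, daily1)  # yesterday's copy, no file data copied
        replace_file(os.path.join(daily0, "budget.xls"), "v2")  # today's edit

        def same_inode(name: str) -> bool:
            return (os.stat(os.path.join(daily0, name)).st_ino
                    == os.stat(os.path.join(daily1, name)).st_ino)

        return {
            "notes_shared": same_inode("notes.txt"),
            "budget_shared": same_inode("budget.xls"),
            "budget_today": read(os.path.join(daily0, "budget.xls")),
            "budget_yesterday": read(os.path.join(daily1, "budget.xls")),
        }
    finally:
        shutil.rmtree(base)


if __name__ == "__main__":
    print(demo())
```

The unchanged notes.txt ends up as one inode with two names, while budget.xls diverges: daily.0 holds the new version and daily.1 still holds yesterday's.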

If you've ever had clients who suffer from "data decay", you know that having only the most recent version of a file can be useless. An Excel spreadsheet could have been corrupted weeks ago, and people will keep writing changes to it, despite the warning messages, until it really fails hard, which is always too late.

Conceptually, I was having a hard time understanding the topology of an rsnapshot server and the clients it's supposed to back up. My questions were:

  1. Are you supposed to have rsnapshot on every client machine, connecting to the server in a push fashion?
  2. Is /etc/rsnapshot.conf supposed to exist on every machine you are backing up?
  3. Or does the server log in to all client machines and initiate a pull backup?
  4. Can I store the backup repo on a network share?
  5. Does it use only rsync? 

Regarding #4, I was quite foolishly modifying the snapshot_root directive in /etc/rsnapshot.conf to point to a network location (a CIFS share), even though the comments in that file clearly state that it's supposed to be a local root. I guess this should have been a no-brainer, but while skimming the documentation I couldn't see why snapshot_root couldn't be a network location!

Only when I tried to use rsnapshot in conjunction with a TeraStation Live did I learn the truth! Behold: rsnapshot sits on the server, manages its own backup root, and communicates with clients using only rsync, or rsync over SSH (the answer to question #5). This makes sense, because the rsnapshot server must scan its local repositories for changes, and it does the folder rotation itself. The server can't know what has changed on the client unless it stores state information over there, and if that client dies, you would lose that data!
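The pull model can be sketched in Python as a server-side loop that builds one rsync-over-SSH command per client. This is a simplified illustration, not rsnapshot's actual code: the host names, paths, and the pull_command helper are hypothetical. The real tool reads its sources from backup lines in /etc/rsnapshot.conf on the server and passes more options (its default rsync arguments also include --relative and --delete-excluded). What matters is the direction: the server connects out to each client, and the destination is always a directory on the server's local disk.

```python
import shlex

# Hypothetical values for illustration. In rsnapshot these come from the
# snapshot_root and backup lines of /etc/rsnapshot.conf on the server.
SNAPSHOT_ROOT = "/.snapshots/"
BACKUPS = [
    ("root@client1:/home/", "client1/"),
    ("root@client2:/etc/", "client2/"),
]


def pull_command(source: str, dest_subdir: str, interval: str = "daily") -> list[str]:
    """Build a simplified rsync-over-SSH pull command for one client.

    The server runs this: rsync uses SSH to start its counterpart on the
    client, reads the client's files, and writes them under the server's
    local snapshot root.
    """
    dest = f"{SNAPSHOT_ROOT}{interval}.0/{dest_subdir}"
    return ["rsync", "-a", "--delete", "--numeric-ids", "-e", "ssh", source, dest]


if __name__ == "__main__":
    for source, dest_subdir in BACKUPS:
        print(shlex.join(pull_command(source, dest_subdir)))
```

Because the client only has to provide rsync and an SSH server, nothing rsnapshot-specific, not even /etc/rsnapshot.conf, needs to be installed there.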

For example:

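Say you set retention to 5 daily snapshots, daily.0/ (the newest) through daily.4/ (the oldest). The next daily run causes a rearrangement: daily.4/ is deleted, daily.3/ becomes daily.4/, daily.2/ becomes daily.3/, daily.1/ becomes daily.2/, daily.0/ becomes daily.1/, and the fresh backup is written into daily.0/.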
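The rotation can be simulated in a few lines of Python. This sketch only shuffles labels in a dictionary, and the rotate function and day labels are made up for illustration. In rsnapshot itself the step from daily.0/ to daily.1/ is done with hard links, so unchanged files cost no extra space.

```python
def rotate(snapshots: dict[str, str], interval: str, retain: int, new_backup: str) -> dict[str, str]:
    """Return the snapshot set after one run of `interval`, keeping `retain` copies.

    `snapshots` maps directory names such as "daily.0" to a label for what they hold.
    Slot 0 is always the newest; whatever is in the last slot is discarded.
    """
    rotated: dict[str, str] = {}
    # Shift each snapshot one slot older. The loop stops before the last slot,
    # so the oldest snapshot (e.g. daily.4 when retain=5) is not carried over.
    for i in range(retain - 1):
        older = snapshots.get(f"{interval}.{i}")
        if older is not None:
            rotated[f"{interval}.{i + 1}"] = older
    # The fresh backup always lands in slot 0.
    rotated[f"{interval}.0"] = new_backup
    return rotated


if __name__ == "__main__":
    state = {"daily.0": "Fri", "daily.1": "Thu", "daily.2": "Wed",
             "daily.3": "Tue", "daily.4": "Mon"}
    print(rotate(state, "daily", 5, "Sat"))
```

Starting from Monday through Friday, Saturday's run drops Monday and shifts everything else one slot older.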

I'll be posting a second note on how to initiate a VSS snapshot of a Windows drive over SSH in preparation for a pull backup from a Windows client to a Linux server. For reference, I'm looking at TimeDicer, a GPL tool for doing rdiff-backup of a running Windows system.
