
Saturday, April 5, 2014

Storage Debate: NAS build or buy?

You're a storage guru. You eat storage for breakfast, lunch, and dinner. Storage fears you. I/O buses cower in your presence.

Okay, perhaps you aren't this intense. You don't have a VNX storage array in your bedroom. You aren't any less of a man or woman because of that.

But you still need to store and back up your stuff! So you've decided you want to do it in-house. What are some considerations? Read on to find out.

Well, what about the cloud?

The cloud is in the full throes of popularity, but with the recent NSA revelations and privacy violations by a multitude of companies, it's no wonder that many people, like myself, prefer to keep their data in their own hands.

In the arena of self-hosted data, it's a balance between cost, functionality, and complexity.

Here, I'll look at the difference between Network Attached Storage (NAS) devices--appliances that simply let you store files over the network--and a home-built PC file server.

Cutting Costs

The geeks among you may be saying, "Why not just buy a computer with lots of storage bays, create an array, and share it? I mean, a NAS is a computer!"

Of course, this is all true. It's also true that a PC is cheaper to build, will be faster than the mostly ARM-based NASes on the market today, and can also do double duty with other roles and services. A PC contains commodity hardware that is both affordable and readily available.

When it comes to cost, a PC is almost always cheaper.

However, this does not factor in how valuable your time is or whether your intent is to learn the ins-and-outs of storage configuration, which can be a tiresome road to travel.

In my case, I do not have the luxury of spending days tweaking a storage server, or logging in every week to kill runaway processes. I do not have the interest currently in researching all of the best-practice configuration file flags for critical file sharing services.

Future-proof & easy maintenance

NAS RAIDs are standard software RAIDs running on Linux's MD infrastructure. The MD subsystem is tried and true, but it lacks some neat features offered a level higher up--features a NAS will not support. One such feature is Linux's Logical Volume Manager (LVM), which allows very flexible allocation and de-allocation of storage into logical (virtual) volumes. I use LVM quite frequently to grow shared storage by replacing drives, with almost zero downtime.
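To make that concrete, here's a rough sketch of what such a drive-replacement grow can look like on a generic MD + LVM setup (the names /dev/md0 and vg0/shares are placeholders, not from any particular system):

  # After swapping a RAID member for a larger drive and letting MD rebuild:
  mdadm --grow /dev/md0 --size=max        # let the array use the new capacity
  pvresize /dev/md0                       # expand the LVM physical volume to match
  lvextend -l +100%FREE /dev/vg0/shares   # grow the logical volume into the free space
  resize2fs /dev/vg0/shares               # grow the ext4 filesystem while it's mounted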

Do you really want your NAS to be a full-blown computer, subject to routine upgrades, maintenance, and viruses? Do you want to manage a server? Are you a sysadmin?

I've compared Synology, QNAP, and Thecus, the 3 underdogs of storage (and NETGEAR, but I would relegate their NAS offerings to duty as very well-built doorstops). My current favorite is QNAP.

The boxes are well-built and performant, and QNAP participates in the community. The hardware has gotten so slick, with such cool features as HDMI output, that they've actually pre-installed the XBMC Media Center! I applaud QNAP for going with a best-of-breed existing solution rather than trying to home-bake their own multimedia management solution. That would have simply ended up being terrible and caused them lots of negative press. XBMC is almost universally recognized as one of the best unified entertainment centers among both the commercial and open-source offerings. It really is that good, and the documentation is even better; the XBMC wiki is second to none.

As I hear, Synology is offering similar features, but if you're the type to consider who got there first (as I am), then you choose that vendor to encourage them to stay ahead of the curve.

Joining a domain

If you are building your own storage server and want to integrate it into a domain, you are on your own in trying to get Kerberos and Samba talking Active Directory to your Domain Controllers. It's a secret sauce whose recipe Microsoft has kept under wraps for quite a while.

Don't get me wrong, it's certainly possible, and many smart people can get this running very quickly on a Linux box, but I am not one of those people.
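For the brave, the Samba side of an AD member join boils down to something like this minimal sketch (EXAMPLE.COM and the account name are placeholders, and Kerberos must already be configured for the same realm):

  # /etc/samba/smb.conf (abbreviated)
  [global]
      security = ads
      realm = EXAMPLE.COM
      workgroup = EXAMPLE

  # Then get a ticket and join the domain:
  kinit Administrator@EXAMPLE.COM
  net ads join -U Administrator

Of course, the devil is in the DNS, time-sync, and keytab details, which is exactly where the days disappear.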

Finesse and Purposeful Duty

Any PC can be cheap, but a power-miserly, living-room-quiet, compact mini PC starts to get into NAS cost territory anyway. A small ITX motherboard, compact case, and other small-form-factor PC components are expensive.

In that respect, a NAS is really great value!

But even if you thought the PC was the cheaper solution, you've still gotta configure the RAID, keep track of which drives in which bays are on which SATA ports, and hope you don't destroy your array by removing the wrong drive when you go to do maintenance.
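One hedge against pulling the wrong drive on a DIY box is to map array members to physical serial numbers before touching anything, along these lines (/dev/md0 and /dev/sdb are placeholders):

  mdadm --detail /dev/md0             # list the member devices and their state
  ls -l /dev/disk/by-id/ | grep sdb   # match /dev/sdb to the serial printed on the drive's label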

Add to this that most NASes are hot-plug ready, meaning zero downtime if you are just swapping out a drive on a RAID5 volume for example.

Plug and Play Apps

Most NAS devices these days come with cool add-ons, one of which is XBMC, as mentioned earlier. I use my QNAP to stream webcams and create time lapses. I use it as a remote music player, since it has a nice streaming interface. I rsync my backups to it, and I use it to tunnel SSH ports into my home network as a pseudo-VPN. I use it for real VPN too (OpenVPN). I use it to torrent things on a schedule so I am not torrenting during peak evening hours when the Internet is slow.
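To give a flavor of the rsync and tunneling bits, my setup amounts to something like this (hostnames, paths, and ports are placeholders from my own network, nothing QNAP-specific):

  # Push backups to a share on the NAS over SSH
  rsync -az --delete /home/me/backups/ admin@qnap:/share/Backups/

  # Pseudo-VPN: reach an internal machine by forwarding a port through the NAS's SSH server
  ssh -N -L 5900:192.168.1.50:5900 admin@qnap.example.com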

You can plug in a USB audio stick and stream tunes straight from the NAS into your stereo system. It talks to your USB UPS and will shut down gracefully.

There are so many neat things you can use a low power, embedded Linux system for, it's quite amazing.

What to do with your hard-earned money?

Give it to QNAP or Synology, browse their respective forums, and enjoy easier and safer access to your files, along with a slew of great add-on apps.

Sunday, March 17, 2013

Product Pick: High-Rely 2-bay AMT disk-to-disk backup appliance

A great majority of my worrying for my clients comes in the form of backup disaster preparedness. It happens far too often in my line of work that companies don't grasp how a few hundred dollars now will save them (and me) countless hours of worry, lost productivity, lost money, and extreme anxiety.

I don't think I'm alone as a sysadmin in feeling guilt about saying to a client "Sorry, there's nothing that I can do." I feel like it's my fault...sigh.

Hardware is replaceable; data is not. One cannot make up data: it was created for a purpose, you pay your employees to create it, and your customers expect you to deliver it. If you need to make a shipment tomorrow and the important details were in an Excel file lost to a failed hard disk, all the money in the world won't bring it back. And even if it could, would the data be recreated in time for your deadlines?

Please, back up, and test that you can get your data back. See some of my other posts for hints on software packages that can help with this.

The Dinosaur that Won't Die - Tape

Sysadmins like myself really dislike tape backup, yet this has been the 'go-to' technology for the last, say, 40 years. Linear tape really is an old-fashioned medium. It is reliable, but difficult to manage. Since the data is stored in one long stream (linearly) along the tape, individual file restores are very time consuming. The fact that tape is so cumbersome to write to means that you need a big backup software package to control what goes onto each tape and to manage the indexes of which files are located where. The indexes themselves are not usually stored to tape, so they have to be recreated if you lose your server, which means manually spending hours feeding tapes while your backup software re-indexes all of the files on each one.

Then of course there's the human factor with tape, where you have to train people to rotate tapes, keep to the schedule, report errors, and possibly interact with the server to see status messages (dangerous!). These users may also become "forgetful" and not take the tapes off-site, or will forget to bring them back on-site to actually update the tape data. These elaborate retention and rotation schemes are just asking for trouble, and the incremental nature of most tape backups means you need all of your tapes to get all of your data back; the most recent tapes will only contain recently changed files. Yikes!

Another thing to consider is that your disaster recovery plan probably also includes contingencies for your building burning down. Unfortunately, tapes are no good without a compatible tape drive; they require special hardware to be read. Tape drives run from $2000-$4000 new and aren't something you can pick up from your local computer fix-it store. This will delay your server/data restoration, unless of course you purchase an extra tape drive that sits and depreciates in an off-site closet for an event that hopefully never occurs.

So you have a lot of things working against you with tape:
  1. Specialized hardware
  2. Recreation of indexes
  3. Inability/difficulty in restoring
  4. Incremental backups
  5. Laborious and error-prone physical management of media
And yet even Google, in a data loss incident a few years ago, went back to tape to restore data for the roughly 0.02% of Gmail users who had lost emails due to a software update!

While tape is good enough for Google, I believe they have the money for nice automated tape libraries (think "tape jukebox") and dedicated personnel to manage their tapes. Most SMBs don't.

Here I'm going to talk about disk-to-disk backup, because for most of my clients, their Internet connections are not beefy enough to do real on-line backup, and furthermore most online backup houses will not ship you a hard disk when you need to restore everything; you have to download it all. For the businesses I work for, which average 1TB of data across shares and Outlook PSTs, that's just not reasonable.

So I recommend disk-to-disk backup to all of my clients.

More on Disk Backup

Disk-to-disk backup is as simple as it sounds: you use additional hard disks to back up the hard disks in your computers and servers. We are up to 4TB density on 3.5" hard disks these days, so data density is very good, at a competitive price. In combination with modern block-level backup (instead of file-based, like tape), only the changes to data are stored, and we can do these differential block-based backups granularly on the disk because hard disks are made for random access, unlike tape.

In my own business, I first tried USB external drives with clients. USB drives always had to be replugged, would spontaneously change drive letters, would have their internal boards fail (Western Digital, I'm looking at you), and people had this nasty habit of knocking them over while they were running, which would just toast them. On top of that, the USB connectors would be destroyed by constant replugging, as they are really not designed for a high number of replug cycles.

Then I looked at purpose-made devices like RDX drives. These seem like an intelligent solution until you notice that you need custom software and have to buy media from the manufacturer at inflated prices, in storage capacities that are 12 months behind the storage curve. Plus, RDX drives were 2.5", meaning reduced capacity to begin with, and the portability argument of 2.5" drives over 3.5" wasn't convincing to me.

How could we combine the reliability and robustness of tape with the random access performance and low price of commodity hard disks?

Highly Reliable Systems to the Rescue

After scouring the Internet for a few hours, I came upon a company based in Reno, Nevada, making a wide range of devices, including a nice appliance that houses 2 x 3.5" standard SATA drives in a hot-pluggable configuration. For extra usability, they include LCD displays and LEDs right on the device to tell the end user the status of drives and replication.



They call it the High-Rely 2-bay AMT. I call it common sense.

The way it works is that one drive always remains in the AMT. To the OS, the AMT looks like any other eSATA drive. The swapping of drives is not visible to the OS, meaning no problems with backups being missed because drive letters change or because something wonky happened on the USB bus.

The trays are nice and beefy, and on High-Rely's site you can see them chuck a tray with a disk inside off a roof and then demonstrate that the hard disk still functions perfectly.

Because the High-Rely looks to the host system like a regular fixed disk, compatibility with backup software is ten-fold better than tape, RDX, or USB. You can use Windows' new built-in backup to great effect. Not only that, but you can use it in 'exotic' scenarios, like hooking it up to a NAS or SAN and doing seamless backups of those devices as well!

And what does the user do to manage the High-Rely? When they come into work and both drives are in a green state, they just unlock either drive, remove it, insert another one, and watch the lights furiously blink until replication is complete. There is always one drive off-site, just like we IT people like.

All of the RAID1 replication is done inside the High-Rely, so there is no load to the OS, and no management of RAID or the replication process. However, it should be noted that doing a backup to the High-Rely while it is replicating between drives will increase the time until both drives are ready, and will slow the backup job as well.

The sleds are aluminum, with an LCD in front and a hot-swap connector on the back. They don't use the drive's actual SATA connector, to reduce the likelihood of damaging it through continued replugging. There are four screws holding the disk in the caddy, so you can remove and replace the drive with a higher-capacity one down the line. I asked High-Rely support about this and, surprisingly, they didn't threaten me with claims of invalidating the warranty; they actually laughed and said that's the point! When you need to recover, you can simply take the SATA drive out, connect it to a computer, and get your data. What a concept!

Oh, and with Windows Server Backup (wbadmin.exe) you get unlimited retention (until the disk is full), so each disk contains weeks of revisions of every file on your server. One client of mine has 800GB of data, and with daily "full" block-based backups gets 12 days' worth of complete snapshots on each 2TB sled. Very cool.
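For reference, the scheduled job behind those numbers is roughly this one-liner (X: stands in for whatever drive letter the AMT appears as; adjust the volume list to your server):

  :: Nightly block-based backup of the critical and data volumes to the High-Rely
  wbadmin start backup -backupTarget:X: -include:C:,E: -allCritical -quiet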

Some Issues

The one thing I can complain about is that the optional host software is silly looking. I use it anyway, because it offers features like email notification and a visual status display separate from the front panel.


I called their HQ, spoke to the owner, and asked him about integrating other software with this communication channel over SATA. He said the Chinese company that makes the RAID solution used in the High-Rely will not disclose how they send info over SATA, so we're stuck with this.

It seems that after a power failure, the High-Rely goes into a state where it doesn't know how to replicate anymore. There is a process I have to go through every few months with a client to get it back to normal. It is a simple fix, and no data is lost; I don't have to reconfigure or reset backups. With a proper UPS and a stable power grid this would normally never happen, I think.

Summary

So, in conclusion, I always recommend this solution to clients if they have lots of data (over 500GB) and are willing to shell out the approx. $900 for the appliance and 3 sleds with drives.

High-Rely also offers some more sophisticated NAS-based devices with large swappable cages that hold 3 drives each. That's 3x4TB, or 12TB, for those in the graphics or video industry.

Tuesday, October 19, 2010

rsnapshot - it's for pulling, obviously!



It's been a few months since I learned about rsnapshot, a neat little utility that uses hard linking to create multiple point-in-time snapshots of your data. It combines the neat features of rsync into a package that does more than just mirror data.

If you've ever had clients who suffer from "data decay", you'd know that just having the most recent version of a file can be useless. An Excel spreadsheet could have been corrupted weeks ago, and people will continue to write changes to it--despite warning messages--until it really fails hard, which is always too late.

Conceptually I was having a hard time understanding the topology of rsnapshot server and the clients it's supposed to back up. Are you supposed to have rsnapshot on every client machine, connecting to the server using only rsync? Actually, no.

Quite foolishly I was modifying the snapshot_root directive in /etc/rsnapshot.conf to point to a network location (CIFS share), even though the comments clearly state that it's supposed to be a local root. I guess this should have been a no-brainer, but in my skimming of the documentation it wasn't clear why I couldn't set the snapshot_root to be a network location!

Only when I tried to use rsnapshot in conjunction with a TeraStation Live did I learn the truth! Behold: rsnapshot sits on the server, manages its own backup root, and communicates with clients using only rsync, or rsync over SSH. This makes sense, because the rsnapshot server must scan its local repositories for changes, and it does folder rotation. For example:

daily.0/
daily.1/
daily.2/
daily.3/
daily.4/

Say you set retention to 5 days. This means that the next daily backup will cause a rearrangement: the oldest snapshot, daily.4/, is removed, daily.3/ becomes daily.4/, daily.2/ becomes daily.3/, and so on, making room for a fresh daily.0/. On a local filesystem, these are simple inode reference changes for moves and renames, but on a remote FS, all bets are off.
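To make the topology concrete, a pull-style /etc/rsnapshot.conf sketch might look like the following (the hostname and paths are placeholders, and note that rsnapshot demands tabs, not spaces, between fields):

  # /etc/rsnapshot.conf (abbreviated)
  snapshot_root   /backups/snapshots/     # must be a local filesystem
  cmd_ssh         /usr/bin/ssh            # enables rsync-over-SSH pulls
  interval        daily   5               # keep daily.0 through daily.4
  # Pull the client's /home into <snapshot_root>/daily.0/client1/
  backup          root@client1:/home/     client1/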

On a related note, I'll have a quick guide on how to get root access to a TeraStation Live, and how to install rsnapshot.

A second post will go over how to initiate a VSS snapshot of a Windows drive over SSH in preparation for a pull backup from a Windows client to a Linux server.

Monday, October 4, 2010

Windows Server Backup - MS gives you less

I have a dream: one where I can back up a server with minimal fuss, with desirable and delicious features like backing up locked files, storing multiple points in time, only backing up changed sectors, and having a command-line tool to control jobs.

Believe it or not, there are tons of vendors out there selling backup software that recopies entire files when only a few bytes have changed. So with VM images, this could mean tens of GBs of unnecessary data copies every time you do a backup. Kind of a bummer.

Anyone who's tried to run ntbackup.exe in Server 2008 has discovered the buried, curious, new tool called Windows Server Backup. Maybe this is what we were looking for all along? Maybe not.

Great stuff that MS removed since ntbackup.exe:

  • Backup to tape -- no longer supported, you need a 3rd party backup program to do this
  • Backup to network (CIFS) share -- no longer offered, see note below

A workaround for the network share feature is to use the wbadmin.exe tool to manually execute backups to network destinations, with the huge limitation that it will completely overwrite the previous backup stored there. So for 1TB of data, you are copying 1TB each day!
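The manual invocation is roughly this (the share path is a placeholder, and I've omitted the credential flags for brevity):

  :: One-shot backup of C: to a network share; replaces whatever backup was there before
  wbadmin start backup -backupTarget:\\nas\serverbackup -include:C: -quiet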

Other "features"

  • If you select a complete system backup (suitable for bare-metal restore), then all drives on my server get selected, because Windows thinks that there are system files on my E:\ drive. But Server Backup won't tell you what or where they are!

Luckily, there are a few bits of candy that Microsoft is going to tease you with:

  • VHD format backups
  • VSS support for getting data in a consistent state (Hyper-V and SQL Server)
  • It's free with Windows Server

I'm a nerd, and I get nerd-fanciful about the fact that VHD is the new backup format. I could conceivably use qemu-img to convert the VHD files to a raw disk image suitable to dd to a new server from a Linux live CD. Seeing as it's used in Hyper-V as well, I can see a lot of 3rd party developers offering tools to recover data from VHDs in the event of corruption.
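That conversion would presumably go something like this (file names and the target device are placeholders; triple-check the dd target before pressing Enter):

  # Convert the backup VHD to a raw image, then write it to the new server's disk
  qemu-img convert -f vpc -O raw backup.vhd backup.raw
  dd if=backup.raw of=/dev/sdX bs=4M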

Having said all of this, the About dialogue shows Windows Server Backup @ v1.0, so maybe all this will be fixed the next time around. How about you? Have you found the perfect backup tool that has the features mentioned at the top of the article?