Tell Me Honestly ... Do You Have a (Current) Backup of your Files?

Why Might You Need a Backup?

  • Hardware failure
    • Probably the most common reason
    • If you have data on a traditional hard drive with spinning platters, those moving parts will break one day. Guaranteed.
    • If you have a solid state drive (SSD), it is less of a concern, but the flash memory in an SSD can only be written to so many times before it wears out and can't be written to anymore. Granted, most consumers will never reach this point because most SSDs are designed to be pretty resilient, but it can still happen.
  • Accidental deletion of a file
  • You mess up the file in some way
    • You'd be using the backup as a form of version control. However, if you are making lots of changes and want to be able to revert to old versions, you should probably use a real version control system like Git or SVN.
  • Theft or loss of your laptop or other device
  • Destruction of your laptop or device via house fire
    • The National Fire Protection Association (NFPA) provides some interesting statistics on house fires, such as
      • The average household will see 5 home fires over a lifetime (though this includes all home fires, not just ones that completely burn down your home)
      • The average household has roughly a 1 in 4 chance of having a home fire large enough to be reported to a fire department
  • Your little children messing with your stuff. This will probably involve one of the other bullet points above

But Maybe You Don't Need One?

I'm pretty sure there are some people out there who don't need a backup - at least not a traditional backup. These people rely on the cloud. For example, if they use Google Docs and Sheets for all their documents and spreadsheets, and use an app like Google Photos on their phone that automatically uploads photos and videos to their Google Photos account (which provides unlimited storage [1]), then they may not even need a traditional backup[2].

But everyone else... needs a backup.

Characteristics of a Good Backup System

Backups Occur Regularly

Backups should occur at least daily, and ideally more often for critical files.

Automated

Running daily backups manually is a pain, and will likely lead to skipping backups. Automate them! Most backup software is able to schedule backups.

Follows the 3-2-1 Rule

You should have at least three copies of your data, on at least two different media, with at least one copy off site.

Stores old versions of files

Replication is not backup. If your backup system only copies the files from your primary computer to the backup destination, and doesn't preserve the old versions of files, then your backup system can't help you if you mess up a file and you don't restore it before the next backup run.
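
To make the difference concrete, here is a minimal Python sketch of a copy step that keeps old versions instead of overwriting them. The function name `backup_with_versions` and the `.versions` folder layout are made up for illustration; real backup tools store versions far more efficiently.

```python
import shutil
from pathlib import Path


def backup_with_versions(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir, keeping the previous copy as a numbered version."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    if target.exists():
        # Plain replication would overwrite the old copy here.
        # Instead, move it aside under the next free version number.
        versions_dir = backup_dir / ".versions"
        versions_dir.mkdir(exist_ok=True)
        n = 1
        while (versions_dir / f"{source.name}.{n}").exists():
            n += 1
        target.rename(versions_dir / f"{source.name}.{n}")
    # copy2 also preserves the modification time of the source file.
    shutil.copy2(source, target)
    return target
```

With this approach, a file you messed up yesterday is still recoverable after tonight's backup run, at the cost of storing a full copy of every old version.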

Local Backup Options

Media

These days the only viable local backup media for most people are external hard drives[3] and flash drives. You have to be careful with flash drives though. Even though solid state drives (SSDs) and flash drives are both built from flash memory, SSDs are designed to handle many writes, while flash drives generally are not. If you have some files that don't change very often, a flash drive will work fine. But if you are backing up files that change a lot and often, I wouldn't recommend using a flash drive.

If you have multiple terabytes of data, then you may want to consider buying a NAS appliance (or possibly building one yourself). These are standalone machines that connect to your network and present themselves as a network drive. I will go into more detail about these in a future post.

Software

Software I've used

Windows

On Windows machines I'm a fan of Syncback

  • Can back up to any local folder, a network drive, or an FTP server
  • Can do safe copies - if a file has changed, it can write to a temporary file and then rename it as opposed to writing over the file. This is important if the system crashes during the write.
  • Can compare files via checksums
    • By default it will compare files via file sizes and modification dates. If either of these differ the file will be copied
    • A checksum is like a very short summary of a file. By comparing checksums of files Syncback can determine if a file has changed even if its file size or modification date has not changed
  • Can compress files before transfer
  • Can schedule backups
  • Can email logs

They also have paid versions that add features such as cloud storage support (including Google Drive, Microsoft OneDrive, and Dropbox), real-time file synchronization, file versioning, incremental and differential backups, and detection of renamed files. It seems to me that the Pro versions of Syncback may support efficiently storing updated files (by storing only the differences), but it wasn't clear from the documentation and I didn't download a trial.
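
The checksum comparison and safe copy ideas from the feature list above can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Syncback implements it; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(source: Path, dest: Path) -> bool:
    """True if dest is missing or its contents differ from source."""
    if not dest.exists():
        return True
    # Unlike a size/date comparison, this catches changes that leave
    # the file size and modification date untouched.
    return file_checksum(source) != file_checksum(dest)


def safe_copy(source: Path, dest: Path) -> None:
    """Copy to a temporary file next to dest, then rename it over dest."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(dest.parent), suffix=".tmp")
    os.close(fd)
    try:
        shutil.copy2(source, tmp_name)
        # If the system crashes before this line, the old dest is still intact.
        os.replace(tmp_name, dest)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

The trade-off with checksums is speed: every file has to be read in full on both sides, which is why comparing sizes and dates is the usual default.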

When I used Syncback, I kept old versions of files by making multiple copies of my backups. I had four different sets - one made daily, one every four days, one every seven days, and one every nine days. Because four, seven, and nine are pairwise coprime (no two of them share a common factor other than 1), I maximized the number of days I could go before all the backups occurred on the same day (4 × 7 × 9 = 252 days). Of course, the big disadvantage of this approach is that I needed the disk space for four copies of my data.
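
The 252-day figure is just the least common multiple of the backup intervals. A quick Python check, where the second set of intervals is a made-up counterexample that is not coprime:

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def days_until_all_coincide(intervals):
    """Days until backups running on these intervals all land on the same day."""
    return reduce(lcm, intervals, 1)


print(days_until_all_coincide([1, 4, 7, 9]))  # 252
print(days_until_all_coincide([1, 4, 6, 8]))  # 24
```

Intervals that share factors, like 4, 6, and 8, line up again after only 24 days, so the backup sets spend much more time holding nearly identical data.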

Linux (and Mac)

For Linux users (and Mac users as well), rsync is the tried and true solution. Unfortunately it's a command line only program. I assume that if you're a Linux user you're probably comfortable with the command line, though it's certainly possible to use some Linux distributions like Ubuntu without ever touching it. It's capable of

  • Comparing files via checksum
  • Updating files in place (important for a copy on write (CoW) filesystem like ZFS to reduce disk usage when using snapshots)
  • Preserving all Linux file permissions
  • Preserving symlinks
  • Preserving hardlinks
  • Compressing files before transfer
  • Resuming interrupted transfers
  • Sending files over SSH

Software I Haven't Used

I recently learned about a program called Restic. It is supported on Windows, Linux, OpenBSD, FreeBSD, and Mac, and can back up to both local folders and almost any cloud storage provider you could imagine by optionally using rclone (it's like rsync but for cloud storage). Reminiscent of ZFS, it's designed around snapshots, which makes storing differences between files very easy and efficient. It also has a "rolling checksum" feature so that it can continue to deduplicate blocks of files even if you insert bytes into the file (such that fixed-size blocks would no longer line up). And according to a podcast I listened to, it's apparently very fast. I've been meaning to look into it but my to-do list is constantly growing.
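
To give a feel for why a rolling checksum helps, here is a simplified Python sketch of content-defined chunking. It is not Restic's actual chunker; the window size, the polynomial hash, and the mask are arbitrary choices for illustration. The idea is that chunk boundaries are chosen from the bytes themselves, so inserting data near the start of a file only disturbs the chunks around the insertion.

```python
import hashlib
import random
from typing import List, Set

WINDOW = 16          # bytes covered by the rolling hash (assumed value)
BASE = 257
MOD = (1 << 31) - 1
MASK = (1 << 6) - 1  # a boundary roughly every 64 bytes on random data
TOP = pow(BASE, WINDOW - 1, MOD)


def split_chunks(data: bytes) -> List[bytes]:
    """Split data where the rolling hash of the last WINDOW bytes hits the mask."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Drop the byte that just slid out of the window.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> List[bytes]:
    """Naive alternative: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_ids(chunks: List[bytes]) -> Set[str]:
    """Identify each chunk by its SHA-256 so identical chunks are stored once."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = b"hello" + original  # insert five bytes at the front

for name, splitter in (("content-defined", split_chunks), ("fixed-size", fixed_chunks)):
    before = chunk_ids(splitter(original))
    after = chunk_ids(splitter(edited))
    print(name, "chunks reused after the insert:", len(before & after), "of", len(before))
```

With fixed-size blocks, the five inserted bytes shift every later block, so almost nothing can be reused. With content-defined boundaries, the chunking falls back into step shortly after the insertion and the remaining chunks deduplicate as before.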

Offsite Backup

The Buddy System

Notice how I didn't say cloud backup, I said offsite. For most people, offsite backup will be cloud backup. However, if like me you have a limited upload speed and a 1TB/month bandwidth cap, cloud backup may not be feasible if you have multiple terabytes of data to back up. To get around this issue, send a hard drive with your backup on it to a friend or family member. Unfortunately you'd have to mail this back and forth every so often to keep the backup up to date. If you're feeling adventurous, you could set up a Raspberry Pi as a file server at their place so you could back up data over the Internet (probably via SSH). A Raspberry Pi is substantially slower than the typical laptop, but your and your friend's Internet connections are probably the limiting factor anyway.
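
A rough back-of-the-envelope calculation shows why an initial cloud upload can be impractical. The 1 TB/month cap comes from the paragraph above; the 4 TB of data and the 10 Mbps upload speed are hypothetical numbers picked for illustration.

```python
def months_under_cap(data_tb: float, cap_tb_per_month: float) -> float:
    """Minimum months needed if the whole cap were spent on the backup."""
    return data_tb / cap_tb_per_month


def days_at_speed(data_tb: float, upload_mbps: float) -> float:
    """Days of continuous uploading at the given speed (decimal units)."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (upload_mbps * 1e6)
    return seconds / 86400


# Hypothetical example: 4 TB of data, 10 Mbps upload, 1 TB/month cap.
print(months_under_cap(4, 1))         # 4.0 months at the very least
print(round(days_at_speed(4, 10), 1)) # 37.0 days of nonstop uploading
```

In practice the cap is shared with everything else you do online, so the real figure would be longer than four months.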

Cloud Backup Services

  • For photos only: Amazon Photos and Google Photos provide unlimited storage
    • Google Photos will downsample photos that are greater than 16 megapixels (MP) to 16 MP. They will also recompress your photo regardless of its pixel count. On a computer screen it still looks fine from the photos I tested, but if you get a big print, it might make a difference
    • Google Photos also has unlimited storage of videos, but it will downsample videos that are greater than 1080p to 1080p. It will also recompress any video you upload regardless of its size
    • Google Photos is free
    • Amazon Photos is not free - it is bundled with Amazon Prime
    • Unlike Google Photos, Amazon Photos will not recompress or downsample your photos, or otherwise change them in any way [4]
    • Amazon Photos does not have unlimited storage of video
    • Amazon Photos also provides unlimited storage for RAW photos
    • Both Google Photos and Amazon Photos have mobile apps that can automatically upload all photos from your phone to your account
  • Your own Google Drive or other cloud storage account + a backup client capable of uploading to cloud storage
    • My alma mater uses Google to provide email, file storage, etc. to its students. This includes an unlimited Google Drive account that I still have. If your alma mater used Google, check if you have an unlimited Google Drive account
    • Backup clients include
      • Restic as discussed earlier
      • Rclone, but given how Restic offers additional features and can use any destination that rclone supports, I'd suggest giving Restic a try first
      • Arq Backup
        • I used Arq Backup for a while. The main reason I abandoned it was that I was tired of running it in a Windows virtual machine on my Linux computer (Arq Backup doesn't support Linux)
        • Supports encryption of your files before upload
          • Encryption is mandatory
        • Supports compression of your files before upload
        • Deduplicates data before upload
        • Only uploads changed portions of files
        • Supports the following cloud storage providers
          • Amazon Cloud Drive - 5 GB free, $60/yr per TB
          • Google Drive - 15 GB free, $120/yr per TB
          • Dropbox - 2 GB free, $120/yr per TB
          • OneDrive - 15 GB free, $104/yr per TB
          • and a couple other less common ones
        • Costs a one time $50
          • You can add free upgrades to new versions of Arq for $30
  • Backblaze (affiliate link - thank you for supporting my blog)
    • I used Backblaze for several years before Arq Backup. I abandoned it because I started spending more time in Linux than Windows (I dual boot) and Backblaze doesn't support Linux, so I had concerns about my backups not being done frequently enough
    • It costs just $5/mo, $50/yr, or $95/2yrs for unlimited backup
    • They store older versions of your files
    • They provide a program that will automatically back up all your files in the background
    • You can restore just a single file if you wish
    • If you need to restore a lot of data, instead of downloading it all over the Internet you can have your data shipped to you on a 128 GB flash drive for $99 or a 4TB USB hard drive for $189.
    • All your data is encrypted. You can generate your own passphrase to prevent even Backblaze from reading your data
    • There are no restrictions on sizes of individual files (some services will restrict this)
    • I truly think this is the best backup service out there. Very affordable and it's unlimited storage! Also, I haven't seen other services offer shipping a flash or hard drive with your data. If you have a hardware failure or other complete data loss event and you need to get up and running immediately, that feature will save your bacon
      • The one caveat is that Backblaze's prices are per computer. There are some other services that may end up being cheaper if you have multiple computers, but you'll give up the option of having a flash or hard drive shipped to you

Image Backups

Most people probably don't need these, but image backups are really powerful. Image backups essentially create a bit for bit copy of a partition of your hard drive (or the whole disk if you prefer). This creates a snapshot in time of your partition that you can always restore.

Suppose you created an image, and then installed a driver for your graphics card. For whatever reason, the driver installation goes wrong and your system won't display anything. That's ok, just restore the image and you're back to before you installed the driver. Because I have image backups I never spend more than about 10 minutes trying to fix an issue with my computer.

Or suppose your computer gets infected with a virus or ransomware. No problem, just restore from your image backup.

Clonezilla is a free program to create and restore image backups. It does require you to install it to a live USB and boot from it. It's fairly intuitive to use, and is capable of compressing and encrypting your images.

If you're going to image backup your operating system, you should separate out your documents onto a separate partition. If you don't, then when you restore an image, you're going to restore the version of your documents at the time of the backup as well.


  1. With the caveat that photos above 16 megapixels are downsampled to 16 megapixels, and all photos are recompressed (the file sizes change, sometimes substantially, before and after upload), and videos are downsampled to 1080p

  2. Yes, Google Photos is serving as a backup here, but given it can't store anything other than photos or videos, it's not your standard backup

  3. You can use a second internal hard drive too, but considering most people don't have desktops and laptops with multiple hard drives are pretty rare, I left them out of the discussion

  4. I verified this via checksums before and after uploading
