This article documents the steps I took to set up a daily backup to my Backblaze B2 bucket using Duplicacy on Linux.
For the sake of this article, let's say that my Backblaze B2 bucket is called duplicacy (it isn't, since someone else took that name) and that the path to the directory with all of the data I want to back up is /path/to/repo/.
Installation
Binaries are available from here: https://github.com/gilbertchen/duplicacy/releases
I downloaded the latest version (2.1.0 as of when I'm writing this) and moved it to /usr/local/bin/duplicacy.
Initialization
Repo/Storage Initialization
There are a variety of options that need to be decided on before initializing your repository and storage; you can see them by running duplicacy help init. The main ones are what you want to call your backup and whether or not you want your storage to be encrypted; the upside of encryption is that not just anyone with access to your storage can reconstruct the contents of your backup.
To initialize a Duplicacy repo at /path/to/repo/ with a repository id of foobar and encrypted storage in my duplicacy bucket on Backblaze B2:
$ cd /path/to/repo
$ duplicacy init foobar b2://duplicacy -e
Duplicacy will prompt you for your B2 bucket id and key, along with a password used to encrypt the key that in turn encrypts your files. The encryption key is stored in the config file in your storage bucket. If/when you need to restore from your backup you won't necessarily need to keep a copy of the config file, but you will need the chosen storage password, the repository id (or access to the bucket to look it up), and the Backblaze B2 account ID and key.
If you don't want encrypted storage, then don't use the -e option.
Saving passwords
By default, you will be prompted for your B2 id, key, and storage password every time you run your backup; this is obviously an impediment to automated backups. If you feel comfortable with being able to protect the .duplicacy directory, then all of this information can be stored in plain text in a .duplicacy/preferences file using the following commands. Duplicacy also has the option to save these credentials in certain keychains if they're present; more information on these options is on the project wiki.
duplicacy set -storage b2://duplicacy -key b2_id -value accountidgoeshere
duplicacy set -storage b2://duplicacy -key b2_key -value keygoeshere
duplicacy set -storage b2://duplicacy -key password -value passwordgoeshere
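Since the preferences file then holds these credentials in plain text, one straightforward way to protect the .duplicacy directory is to lock down its permissions; a minimal sketch, assuming the repository lives at the placeholder path /path/to/repo:

```shell
# allow only the owner to enter the .duplicacy directory
chmod 700 /path/to/repo/.duplicacy
# allow only the owner to read the stored credentials
chmod 600 /path/to/repo/.duplicacy/preferences
```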
Filtering
There are a lot of options for configuring exclusions. For detailed documentation on configuring exclusions that do what you’re looking for, see the wiki page on it. By default it will include everything in the local repository, which might be perfectly fine depending on your use case.
These filters are specified in a .duplicacy/filters file.
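As a hypothetical example (these patterns are illustrative, not defaults), a filters file that skips cache directories and temporary files while letting everything else through might look like:

```
# exclude cache directories anywhere in the repository
-*/.cache/
# exclude temporary files
-*.tmp
# with only exclude (-) patterns listed, everything else is included
```

See the wiki page mentioned above for the full pattern syntax before relying on this.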
Scripting
Duplicacy expects certain scripts to live in a scripts directory within the .duplicacy directory. In order to keep everything in one place, I store the backup script that ends up getting called by cron in there too.
Backup
At least as of Duplicacy 2.1.0, it does not nicely handle the situation where a backup starts running before the previous one completes; there are more details in an issue on the project's GitHub.
Here's what my backup script currently looks like. I should probably update it to use flock instead of the DIY lockfile technique from Stack Overflow, but I haven't tested and implemented that yet.
This script will run a backup with log-style output, 2 threads, and otherwise
default settings. If you want a slower and more thorough backup that hashes all
files to determine what needs to be backed up rather than looking at file sizes
and timestamps, then add the -hash
option to the duplicacy backup
command. Note that the -hash
option will
generate a lot more output which might be relevant if you’re logging the output
like I’m doing.
The paths will need to be updated for your environment.
/path/to/repo/.duplicacy/scripts/backup.sh
#!/bin/bash
# https://stackoverflow.com/a/185473/1388019
lockfile="/tmp/duplicacy.lock"
if [ -e "${lockfile}" ] && kill -0 "$(cat "${lockfile}")"; then
    echo "duplicacy already running"
    exit
fi

# make sure the lockfile is removed when we exit and then claim it
trap "rm -f ${lockfile}; exit" INT TERM EXIT
echo $$ > "${lockfile}"

# run the backup with default settings
cd /path/to/repo || exit 1
/usr/local/bin/duplicacy -log backup -threads 2

# clean up lockfile
rm -f "${lockfile}"
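For reference, the flock variant mentioned above might look like the following. This is an untested sketch, and /path/to/repo is the same placeholder path used throughout:

```shell
#!/bin/bash
lockfile="/tmp/duplicacy.lock"

# open the lock file on file descriptor 9 and try to take an exclusive
# lock; -n makes a second invocation exit immediately instead of waiting
exec 9>"${lockfile}"
if ! flock -n 9; then
    echo "duplicacy already running"
    exit 0
fi

# the lock is released automatically when the script (and fd 9) exits,
# so no trap or manual cleanup is needed
cd /path/to/repo || exit 1
/usr/local/bin/duplicacy -log backup -threads 2
```

Unlike the PID-file approach, this avoids the race between checking and claiming the lock.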
Pruning
Duplicacy will execute scripts that you create in a file called .duplicacy/scripts/post-backup. The one I currently use, shown below, will remove all snapshots older than 1 year and keep 1 snapshot every 7 days for snapshots older than 60 days.
#!/bin/sh
# Purge old snapshots after backup
# keep none after a year
/usr/local/bin/duplicacy prune -keep 0:365
# keep a snapshot every 7 days after 60 days
/usr/local/bin/duplicacy prune -keep 7:60
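The two invocations can also be collapsed into one, since duplicacy prune accepts multiple -keep options (per the wiki, they should be ordered from the largest age to the smallest):

```shell
#!/bin/sh
# equivalent post-backup script using a single prune invocation:
# keep no snapshots past 365 days, one every 7 days past 60 days
/usr/local/bin/duplicacy prune -keep 0:365 -keep 7:60
```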
Cron
To actually cause the backup script to run, I call it from cron with an entry in my /etc/crontab file. This will run the script at 2am every morning.
0 2 * * * root /path/to/repo/.duplicacy/scripts/backup.sh >> /var/log/duplicacy/backup.log 2>&1
Before this can work, you must first create the /var/log/duplicacy/
directory for the backup script output to be logged.
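Creating that directory is a one-liner (root is needed since it lives under /var/log, and the cron entry above runs the script as root anyway):

```shell
# create the log directory referenced by the cron entry
sudo mkdir -p /var/log/duplicacy
```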
When you run the backup script, here’s an example of what the output looks like when no changes have been made to the local repository since the last backup. If there have been changes, each file will be listed in the output.
2018-04-10 02:00:01.338 INFO STORAGE_SET Storage set to b2://duplicacy
2018-04-10 02:00:17.504 INFO BACKUP_START Last backup at revision 24 found
2018-04-10 02:00:17.504 INFO BACKUP_INDEXING Indexing /path/to/repo
2018-04-10 02:00:17.513 INFO SNAPSHOT_FILTER Loaded 20 include/exclude pattern(s)
2018-04-10 02:03:54.995 INFO BACKUP_THREADS Use 2 uploading threads
2018-04-10 02:04:46.302 INFO BACKUP_END Backup for /path/to/repo at revision 25 completed
2018-04-10 02:04:46.302 INFO SCRIPT_RUN Running script /path/to/repo/.duplicacy/scripts/post-backup
2018-04-10 02:05:17.876 INFO SCRIPT_OUTPUT Storage set to b2://duplicacy
2018-04-10 02:05:17.876 INFO SCRIPT_OUTPUT Keep no snapshots older than 365 days
2018-04-10 02:05:17.876 INFO SCRIPT_OUTPUT No snapshot to delete
2018-04-10 02:05:17.876 INFO SCRIPT_OUTPUT Storage set to b2://duplicacy
2018-04-10 02:05:17.876 INFO SCRIPT_OUTPUT Keep 1 snapshot every 7 day(s) if older than 60 day(s)
2018-04-10 02:05:17.876 INFO SCRIPT_OUTPUT No snapshot to delete
Log maintenance
The following section can be added to /etc/logrotate.conf
to give you 1 year
(12 months) of logs which are compressed monthly.
/var/log/duplicacy/*.log {
compress
monthly
dateext
dateformat -%Y-%m-%d.log
rotate 12
copytruncate
}
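To check the new stanza without waiting a month for a rotation, logrotate's debug mode prints what it would do without actually rotating anything:

```shell
# dry run: show which files would be rotated and how
logrotate -d /etc/logrotate.conf
```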
Restoring
Should your computer die and you need to use your backup to restore from scratch, you can do so with:
- your repository id (e.g., foobar)
- the information needed to access your Backblaze B2 bucket
- your storage password (if you encrypted your backup)
In a different folder or on a new computer, cd to the directory you want to restore to:
$ cd /path/to/new/folder
Initialize the folder as you did the initial repository. The -e option is only needed if your backup is encrypted.
$ duplicacy init foobar b2://duplicacy -e
List the available snapshots:
$ duplicacy list
Then either restore everything using revision 20 (for example)
$ duplicacy restore -r 20
or, if you only want to restore certain files, for example everything in a folder called Documents/Important/, you can do so like below.
$ duplicacy restore -r 20 +Documents/Important/* +Documents/
For more details, see the project wiki.
Other References
- https://github.com/gilbertchen/duplicacy/wiki/
- https://www.reddit.com/r/DataHoarder/comments/6xmq1y/duplicati_and_alternatives/dmhxf7z/