I also back up multiple hosts to the same repository, which actually results in insane storage space savings. One thing I'm missing though is being able to specify multiple repositories for one snapshot such that I have consistency across the multiple backup locations. For now the snapshots just have different ids.
I haven't tried that recently (~3 years ago): does that work with concurrency, or do you need to ensure only one backup runs at a time? Back when I tried it I got the sense that it wasn't really meant to have many machines accessing the repo at once, and decided it was probably worth wasting some space in exchange for potentially more robust backups, especially for my home use case where I only have a couple of machines to back up. But it'd be pretty cool if I could replace my main backup servers (using rsync --inplace and zfs snapshots) with restic and get deduplication.
Locks are created e.g. when you want to forget/prune data or when running a check. The way I handle this is with systemd timers for my backup jobs: before I run e.g. a check command, I use an ansible ad-hoc command to pause the systemd units on all hosts and then wait until their in-flight operations are done. After doing my modifications to the repos I enable the units again.
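That coordination step is roughly the following (a minimal sketch; the host group backup_hosts and the unit names restic-backup.timer/.service are made up):

    # stop the timers everywhere so no new backups start
    ansible backup_hosts -m systemd -a "name=restic-backup.timer state=stopped"

    # wait for any in-flight backup run to finish
    ansible backup_hosts -m shell \
      -a "while systemctl is-active --quiet restic-backup.service; do sleep 10; done"

    # ... run restic forget/prune/check against the repos ...

    # start the timers again
    ansible backup_hosts -m systemd -a "name=restic-backup.timer state=started"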
Another tip is that you can create individual keys for your hosts for the same repository. Each host gets its own key, so a host compromise only compromises that key, which can then be revoked after the breach. And as I said, I use rest-servers in append-only mode, so an attacker can only "waste storage" in case of a breach. And I also back up to multiple different locations (sequentially), so if a backup location is compromised I could recover from that.
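The commands for that are roughly these (a sketch; the repository URL, path, and key ID are placeholders):

    # on the backup server: serve the repository in append-only mode
    rest-server --path /srv/restic --append-only

    # on a new host: add its own key (run with an already-known key)
    restic -r rest:https://backup.example.com/myrepo key add

    # after a breach: list the keys and remove the compromised one
    restic -r rest:https://backup.example.com/myrepo key list
    restic -r rest:https://backup.example.com/myrepo key remove <key-id>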
I don't back up the full hosts, mainly application data. I use tags to tag by application, backup type, etc. One pain point is, as I mentioned, that the snapshot IDs in the different repositories/locations are different. Also, because I back up sequentially, data may have already changed between writing to the different locations. But this is still better than syncing them with another tool as that would be bad in case one of the backup locations was compromised. The tag combinations help me deal with this issue.
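The tagging itself is just restic's --tag flag, something like this (application name, tag names, and paths are invented for illustration):

    restic -r rest:https://backup.example.com/myrepo backup /srv/nextcloud \
      --tag nextcloud --tag appdata --tag daily

    # later, find the matching snapshots in each repository by tag combination
    restic -r rest:https://backup.example.com/myrepo snapshots --tag nextcloud,daily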
Restic really is an insanely powerful tool and can do almost everything other backup tools can!
The only major downside to me is that it is not available in library form to be used in a Go program. But that may change in the future.
Also, what would be even cooler for the multiple backup locations, is if the encrypted data could be distributed using e.g. something like shamir secret sharing where you'd need access to k of n backup locations to recreate the secret data. That would also mean that you wouldn't have to trust whatever provider you use to back up to (e.g. if it's amazon s3 or something).
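As far as I know restic can't do anything like that today. A poor man's approximation is to split only the repository password with Shamir secret sharing (e.g. the ssss tool), so no single provider plus a single share is enough to read the data; the encrypted blobs themselves are still fully replicated rather than split. A sketch:

    # split the repository password into 5 shares, any 3 of which reconstruct it
    echo -n "$RESTIC_PASSWORD" | ssss-split -t 3 -n 5 -q

    # later: recombine by pasting any 3 shares when prompted
    ssss-combine -t 3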
Backups are append-only and each host gets its own key; the keys can be individually revoked.
Edit: I have to correct myself. After further research, it seems that append-only != write-only. So you are correct that a single host could possibly access/read data backed up by another host. I suppose it depends on the use case whether that is a problem.
I believe that using non-ECC RAM is a potential cause of silent disk errors. If you read a sector without error and then a cosmic ray flips a bit in the RAM holding that sector, you now have a bad copy of the sector with no error indication. Even if the backup software hashes the bad data and records the hash alongside it, it's too late: the hash is of bad data. If you are lucky and the hash is computed before the RAM bit flip, at least the hash won't match the bad data, so if you try to restore the file you'll get an error at restore time. It's impossible to recover the correct data, but at least you'll know that.
The good news is that if you back up the bad data again, it will be read correctly and will differ from the previous backup. The bad news is that most backup software skips files based on metadata such as ctime and mtime, so until the file changes, it won't be re-saved.
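Most tools do have an escape hatch for this, though. restic, for example, has a flag that forces it to re-read file contents even when the metadata says nothing changed (repo path and source path are placeholders):

    # re-read all source files instead of trusting ctime/mtime against the parent snapshot
    restic -r /path/to/repo backup /data --force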
We are so dependent on computers these days that it's a real shame all computers don't come standard with ECC RAM. The real reason for that is that server manufacturers want to charge data centers higher prices for "real" servers with ECC.
Also, not sure why this was posted; was a new version released or something?
And that's what I did myself. Organically it grew to ~200 lines, but it sits in the background (I created a systemd unit for it, too) and does its job. I also use rclone to store the encrypted backups in an AWS S3 bucket.
I forget about it so completely that sometimes I have to remind myself to test whether it still works (it does).
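The moving parts boil down to little more than this (a sketch; paths, the remote name, and unit names are made up):

    # run-backup.sh, started by a oneshot backup.service + backup.timer unit pair
    set -euo pipefail

    # 1. create/update the local encrypted backup here (borg, restic, tar+gpg, ...)

    # 2. mirror the repository to S3
    rclone sync /var/backups/repo s3remote:my-backup-bucket/repo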
                   Original size   Compressed size   Deduplicated size
    All archives:        2.20 TB           1.49 TB            52.97 GB
The last time I used restic, a few years ago, it choked on a not-so-large data set with high memory usage. I read that Borg doesn't choke like that.
The files are on an HDD and the machine doesn't have a lot of RAM; given the high I/O wait times and low overall CPU load, I'm pretty sure the bottleneck is loading filesystem metadata off disk.
I wouldn't back up billions of files or petabytes of data with either restic or borg; stick to ZFS for anything at that scale.
I don't remember what the initial scan time was (it was many years ago), but it wasn't unreasonable — pretty sure the bottleneck also was in disk I/O.
Cheap, reliable, and almost trouble-free.
Not affiliated, just a happy user.
I switched over from Duplicati a long while back, when my laptop's sole HDD failed and Duplicati was giving me 143-year estimates for the restore to complete. This was true whether I tried to restore the whole drive or just a single file.
Fair point though, both have enough of a user base that they could be considered safe at this point.
"Baqpaq takes snapshots of files and folders on your system, and syncs them to another machine, or uploads it to your Google Drive or Dropbox account. Set up any schedule you prefer and Baqpaq will create, prune, sync, and upload snapshots at the scheduled time.
"Baqpaq is a tool for personal data backups on Linux systems. Powered by BorgBackup, RSync, and RClone it is designed to run on Linux distributions based on Debian, Ubuntu, Fedora, and Arch Linux."
At: https://store.teejeetech.com/product/baqpaq/
Though personally I use Borg, Rsync, and some scripts I wrote based on Tar.
What are the current recommendations here for doing periodic backups of a NAS with lower (not lowest) costs for about 1 TB of data (mostly personal photos and videos), ease of use, and robustness one can depend on? (I know this sounds like a “pick two” situation.) I also want the backup to be completely private.
I've been mostly using restic over the past five years to back up two dozen servers plus several desktops (one of them Windows), no problems so far, and it's been very stable in both senses of the word (absence of bugs and an unchanging API, both "technical" and "user-facing").
https://github.com/restic/restic
The important thing is to run periodic scrubs with a full data read to check that your data can actually be restored (I do it once a week; once a month is probably the upper limit).
restic check --read-data ...
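If a full weekly read is too heavy for your storage, recent restic versions can also verify a rotating subset, and the whole thing is easy to put on a schedule (a sketch with a placeholder repo path):

    # read and verify all pack files
    restic -r /srv/restic-repo check --read-data

    # or spread the load: verify a random 10% per run
    restic -r /srv/restic-repo check --read-data-subset=10%

    # e.g. from cron, every Sunday at 03:00
    # 0 3 * * 0  restic -r /srv/restic-repo check --read-data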
Some suggestions for the receiver, unless you want to go for your own hardware: https://www.rsync.net/signup/order.html?code=experts
(the code is NOT a referral, it's their own internal thingy that cuts the price in half)
There are plenty of storage server providers where you can get ssh access and 1-2TB for a few dollars per TB per month. You can run multiple repositories from a single server.
As the data is encrypted, even if the storage server is compromised, your data can't be read by others without the key.
We are doing our best to complement existing solutions :)
If files named like this:

    DSC009847.JPG

were actually named like this:

    DSC009847-b3-73ea2364d158.JPG

where "-b3-" means "what comes before the extension is the first x bits (choose as many hex digits as you want) of the Blake3 cryptographic hash of the file"... we'd be living in a better world.
I do that for many of my files. Notably family pictures and family movies, but also .iso files, tar/gzip'ed files, etc.
This makes detecting bitflips trivial.
I've created little shell scripts for verification, backups, etc. that work with files having such a naming scheme.
It's bliss.
My world is a better place now. I moved to such a scheme after I had a series of 20 pictures from vacation with old friends that were corrupted (thankfully I had backups, but the concept of "determining which one is the correct file" programmatically is not that easy).
And, yes, it detected one bitflip since I'm using it.
I don't always verify all the checksums, but I've got a script that does random sampling: it picks x% of the files with such a naming scheme and verifies the checksums of just those randomly picked files.
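The verification can be as simple as this (a sketch; it samples a fixed number of files rather than a percentage, and assumes lowercase hex in the names):

    # pick 200 random files carrying a -b3-<hex> tag and re-check them
    find . -type f -name '*-b3-*' | shuf -n 200 | while read -r f; do
        want=$(printf '%s\n' "$f" | sed -E 's/.*-b3-([0-9a-f]+)\.[^.]+$/\1/')
        got=$(b3sum --no-names "$f" | cut -c1-"${#want}")
        [ "$want" = "$got" ] || echo "CHECKSUM MISMATCH: $f"
    done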
It's not incompatible with ZFS: I still run ZFS on my Proxmox server. It's not incompatible with restic/borg/etc. either.
This solves so many issues, including "How do you know your data is correct?" (answer: "Because I already watched that family movie after the cryptographic hash was added to its name").
Not a panacea but doesn't hurt and it's really not much work.
I prefer sidecar files [1], like

    DSC009847.JPG.b3sum

or per-directory checksum files like B3SUMS, because they can be verified with standard tools.
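"Standard tools" here means b3sum itself; the sidecar is just its normal output format (a sketch):

    # create the sidecar, then verify it later
    b3sum DSC009847.JPG > DSC009847.JPG.b3sum
    b3sum --check DSC009847.JPG.b3sum

    # or one B3SUMS file per directory
    b3sum ./*.JPG > B3SUMS && b3sum --check B3SUMS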
This scheme also allows you to checksum files whose names you can't or don't want to change.
(Though in that situation you have the alternative of using a symlink for either the original name or the name with the checksum.)
I have used the scheme less since I adopted ZFS. I do use a very similar scheme, example.com/foo/bar/b3-abcd0123.html for https://example.com/foo/bar, in the archival tool for outgoing links on my website. It avoids the need for a date prefix like in the Wayback Machine while preventing duplication.
Speaking of .iso files. A recent PR [2] to my favorite Linux USB-disk-image burning tool Caligula has added support for detecting and verifying sidecar files like foo.iso.sha256 (albeit not Blake).
It doesn't really make much sense for BitTorrent uploads (BitTorrent already provides its own, much stronger hashes); it's a holdover from the era of IRC bots.