How should I do backups?

I have a server running Debian with 24 TB of storage. Ideally I would like to back up all of it, though much of it is torrents, so only the ones with few seeders really need to be backed up. I know about the 3-2-1 rule, but it sounds like it would be expensive. What do you do for backups? Also, if anyone uses tape drives for backups, I am kinda curious about that, potentially for offsite backups in a safe deposit box or something.

TLDR: title.

Edit: You have mentioned Borg and rsync, and while Borg looks good, I want to go with rsync, as it seems to be more actively maintained. I would also like to have my backups encrypted, but rsync doesn't seem to have that built in. Does anyone know what to do for encrypted backups?

CameronDev ,

It depends on the value of the data. Can you afford to replace them? Is there anything priceless on there (family photos etc)?
Will the time to replace them be worth it?

If it's not super critical, RAID might be good enough, as long as you have some redundancy. Otherwise, categorize your data into critical/non-critical and back up the critical stuff first?

taladar ,

RAID is not backup. Many failure sources, from theft to electrical issues to water or fire, can affect multiple RAID drives equally, not to mention silent data corruption or accidental deletions.

tal ,

Yeah...I've never totally lost my main storage and had to recover from backups. But on a number of occasions, I have been able to recover something that was inadvertently wiped. RAID doesn't provide that.

Also, depending upon the structure of your backup system, if someone compromises your system, they may not be able to compromise your backups.

If you need continuous uptime in the event of a drive failure, RAID is an entirely reasonable thing to have. It's just...not a replacement for backups.

taladar ,

Oh, all my drives are RAID too, mostly for the convenience of being able to keep using them while I order a replacement for a failed drive, without having to restore from backup once it arrives.

CameronDev ,

It's not, but if the value of the data is low, it's good enough. There is no point backing up Linux ISOs, but family photos should definitely be properly backed up according to 3-2-1.

cybersandwich ,

I don't have nearly that much worth backing up (5TB, and realistically only 2TB is probably critical), but I have a Synology NAS (12TB RAID 1) and a TrueNAS box (ZFS striped/mirrored) that I back my stuff up to (and they back up to each other).

Then I have a Raspberry Pi with a USB drive (8TB) at my parents' house four hours away that my Synology backs up to (over Tailscale).

Oh, and I have a USB HDD (8TB) that I plug in, back up my Synology NAS to, and throw in my fireproof safe. But that's a manual backup I do once every quarter, or every six months if I remember. That's a very, very last-resort backup.

My offsite backup is at my parents'.

And no, I have not tested it because I don't know how I'm actually supposed to do that.

tal ,

Synology NAS (12TB RAID 1)

I have to say that I was really surprised that apparently there isn't a general solution for gluing together different-sized drives into an array reasonably efficiently other than Synology's Hybrid RAID. I mean, you can build something that works similarly on a Linux machine, but there apparently isn't an out-of-the-box software package that does it. It seems like the kind of thing that'd be useful, but...shrugs

cybersandwich ,

I think unRAID does that, but I never looked into it much, tbh.

7Sea_Sailor ,
@7Sea_Sailor@lemmy.dbzer0.com avatar

Both UnraidFS and mergerfs can merge drives of different types and sizes into one array. They also allow removing/adding drives without disturbing the array. None of this is possible with traditional RAID (or at least not without a significant time sink to re-make the array), no matter which type of RAID you use.

Dunstabzugshaubitze ,

And no, I have not tested it because I don't know how I'm actually supposed to do that.

depends on what you back up and how.

if it's just "dumb" files (videos, music, pictures etc.), just retrieve them from your backups and check whether you can open the files.

complex stuff? probably try to rebuild the complex stuff from a backup and check that it works as expected and is in the state you expect it to be in. how to do that really depends on the complex stuff.

i'd guess for most people it's enough to back up the dumb files and configurations, so they can rebuild their stuff, rather than being able to restore a complex system in exactly the same state it was in before bad things happened. for the dumb-files case, a spot check can be as simple as the sketch below.
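a minimal sketch, assuming the backup is plain files reachable over SSH (host and paths are made up):

    # pull a sample of the backup into a scratch directory
    mkdir -p /tmp/restore-test
    rsync -a backuphost:/backups/photos/2023/ /tmp/restore-test/

    # compare it with the live data; no output means they match
    diff -r /tmp/restore-test /srv/photos/2023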

narc0tic_bird ,

I back up my /home folder on my PC to my NAS using restic (I used to use Borg, but restic is more flexible). I back up somewhat important data to an external SSD on a weekly basis and very important data to cloud storage on a nightly basis. I don't back up my *arr media at all (unless you count the automated snapshots on my NAS), as it's not really important to me and can simply be redownloaded in most cases.

So I don't and wouldn't apply the 3-2-1 rule to all data, as it's simply too expensive for the amount of data I have, and it'd take months to upload over my non-fiber internet connection. But you should definitely apply it to data that's important to you.
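For anyone curious, the restic side of that is only a few commands. A minimal sketch, with the repository path and retention values as placeholders:

    # one-time: create an encrypted repository on the NAS (over SFTP)
    export RESTIC_PASSWORD='pick-a-real-passphrase'
    restic init --repo sftp:nas:/volume1/backups/restic

    # nightly: back up /home; only changed chunks are uploaded
    restic --repo sftp:nas:/volume1/backups/restic backup /home

    # occasionally: verify integrity and trim old snapshots
    restic --repo sftp:nas:/volume1/backups/restic check
    restic --repo sftp:nas:/volume1/backups/restic forget \
        --keep-daily 7 --keep-weekly 4 --prune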

taladar ,

I have just been using Borg with a Hetzner Storage Box as the target. That has the advantage of being off-site and not using up a lot of space, since it deduplicates. It also encrypts the backup. The initial backup might take a while at 24TB though, depending on your connection.
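The whole flow is only a few commands. A rough sketch (u123456 stands in for your Storage Box account, and port 23 is, as far as I recall, the SSH port Hetzner wants for Borg):

    # one-time: create an encrypted, deduplicating repository
    borg init --encryption=repokey \
        ssh://u123456@u123456.your-storagebox.de:23/./backups/server

    # per run: one archive, named after the host and timestamp
    borg create --compression zstd \
        ssh://u123456@u123456.your-storagebox.de:23/./backups/server::'{hostname}-{now}' \
        /home /etc /srv

    # keep space usage bounded by pruning old archives
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
        ssh://u123456@u123456.your-storagebox.de:23/./backups/server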

sturlabragason ,

Shit, I've never heard of Hetzner, but their pricing makes de-Googling all my decades of family photos a viable option! Thanks!

ponchow8NC ,

Damn, never heard of them; looks great. Is there any catch, or is it a small company that might go out of business in a few years? I still haven't had to back up more than 4TB, but once I do get up to those numbers they might be a better option than the offsite hard drives I've been using.

taladar ,

They are anything but small. They are probably one of the biggest German hosting companies out there.

buedi ,

As mentioned already, Hetzner is a very big hoster in Germany. I have been a customer for nearly 15 years now, and in all that time they raised the price of the package I use only once (I think it was only recently, in 2023 or so, when it went from €4.90 to €5.39). Their Storage Box also seems to be one of the cheapest I have seen, and as far as I remember you do not have to pay for the traffic if you want to restore your data, as you do with other hosters. They also have good service: they were responsive when I opened tickets in the past, and I cannot remember ever having problems with the service I use (a web hosting package).

7Sea_Sailor ,

Can confirm that there are no ingress or egress fees, since this is not an S3-style object storage service but a simple FTP server that also has a Borg and restic module, so it simply doesn't fall into the ingress/egress cost model.

dan ,

is it a small company that might go out of business in a few years?

Hetzner is one of the largest hosting companies in the world.

qaz ,

I have been using their Nextcloud service for several years now and it works great.

Deckweiss , (edited )

The software borgbackup does some insane deduplication and compression.

It is even more effective if you back up multiple machines, tbh
(my 3 Linux computers with ~600GB used each get deduplicated and compressed down to a single ~350GB backup, because most of the files are the same programs and data over and over again)

But it might do a decent enough job in your case.

So one of the solutions might be getting a NAS and setting up borgbackup.

You could also get a second one and put it in your parents' or best friend's home for an offsite backup.

That way you don't have to buy as large a drive capacity, and will only have fixed costs (+ electricity) instead of ongoing costs for rented server storage.

I guess that would be about $400 per device, if you get a used office PC and buy new drives for it.


Tape seems to be about half the price per TB, but then you need a special reader/writer for it, which is usually connected via SAS and is FUCKING EXPENSIVE (over $4000 as far as I can see).

It only beats HDDs on price after something like ~600TB.

taladar ,

How do you handle the cache invalidation issue with Borg when backing up multiple systems to one repo? For me, if I access a Borg repository from multiple computers (and write from each), it has to rebuild the cache each time, which can take a long time.

Deckweiss ,

I separate them by archive name prefix and never had the issue you describe.

Edit: it seems I just never noticed it, but the docs suggest you're right. Now I am confused myself, lol.

https://borgbackup.readthedocs.io/en/stable/faq.html#can-i-backup-from-multiple-servers-into-a-single-repository
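For reference, the prefix trick is just baking the hostname into each archive name, e.g. (shared repo path made up):

    # every machine writes archives under its own prefix
    borg create ssh://backup/./borg/shared::"$(hostname)-{now}" /home /etc

    # prune per machine by matching only that machine's archives
    # (older Borg versions use --prefix instead of --glob-archives)
    borg prune --glob-archives "$(hostname)-*" --keep-daily 7 \
        ssh://backup/./borg/shared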

zeluko ,

A big reason why I switched to Kopia: Borg just doesn't cut it anymore...

Moonrise2473 ,

Easy: I make a Borg repository not just for a single server but for each directory. That way, if I need a file from Nextcloud with an extremely generic name like "config", I only search in that repo rather than sifting through 100k similarly named files.
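Sketched out, with the service names and backup host made up:

    # one repo per service directory (each needs a one-time borg init)
    for svc in nextcloud immich paperless; do
        borg create "ssh://backup/./borg/$svc::{now}" "/srv/$svc"
    done

    # later: search for "config" only in the nextcloud repo's newest archive
    last=$(borg list --last 1 --short ssh://backup/./borg/nextcloud)
    borg list "ssh://backup/./borg/nextcloud::$last" | grep config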

Decronym Bot , (edited )

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer letters   More letters
Git             Popular version control system, primarily for code
NAS             Network-Attached Storage
PSU             Power Supply Unit
RAID            Redundant Array of Independent Disks (for mass storage)
SSD             Solid State Drive (mass storage)
VPS             Virtual Private Server (as opposed to shared hosting)

6 acronyms in this thread; the most compressed thread commented on today has 10 acronyms.


Vilian ,

Do you use a CoW fs? Using snapshots could be helpful.

exscape ,

Helpful yes, but far from enough. It only helps in some scenarios (like accidental deletes or malware), but not in many others (filesystem corruption, multiple disks dying at once due to e.g. lightning, a bad PSU, or a fire).

Offsite backup is a must for data you want to keep.

raldone01 ,

Just want to note here:

Snapshots are NOT a backup.

While btrfs is quite stable, corruption or disk failure can always happen. Bcachefs had a little oopsie-daisy that caused some FS-level corruption. Snapshots won't help in that case.

Snapshots are great for quick restoration after user error.

Vilian ,

bcachefs is very early though, and the corruption was reversible because of the CoW nature, but I agree.

solrize , (edited )

I've been using Borg and a Hetzner Storage Box. There are some small VPS hosts that actually beat Hetzner's pricing, but I have been happy with Hetzner so am staying there for now. With 24TB of data you could also look at Hetzner's SX64 dedicated server. It has a 6-core Ryzen CPU and 4x 16TB HDDs for €81/month. You could set it up as RAID 10, which would give you around 29 TiB of usable storage, and then you also have a fairly beefy processor that you can use for transcoding and the like. You don't want to seed from it, though, since Hetzner is touchy about any complaints they might get.

Tape drives are too expensive unless you have hundreds of TB of data, I think. Hard drives are too unreliable: if you leave one in a closet for a few years, there's a good chance it won't spin back up.

dan , (edited )

for €81/month.

You can probably find something cheaper from their auction servers.

I've got a storage VPS with HostHatch for my backups. It's one of their Black Friday deals from a few years ago: 10TB of storage for $10/month. Not sure they'll offer that pricing again, but they did have something similar for around double the price during sales last year (still a good deal!).

Tape drives are too expensive unless you have hundreds of TB of data, I think

The drives are expensive, and some manufacturers have expensive proprietary software, but the tapes themselves are cheaper per TB than hard drives, and they usually come with a 20- or 30-year life guarantee. People seem to think tape is old technology, but modern tapes can fit 18TB uncompressed (they say 45TB compressed, but idk).

The default tier of AWS Glacier uses tape, which is why data retrieval takes a few hours from when you submit the request to when you can actually download the data, and costs a lot.

mea_rah ,

The default tier of AWS Glacier uses tape, which is why data retrieval takes a few hours from when you submit the request to when you can actually download the data, and costs a lot.

AFAIK Glacier is unlikely to be tape-based. A bunch of offline drives is the more realistic scenario. But generally it's not public knowledge, unless you found some trustworthy source for the tape theory?

douglasg14b ,

I might be crazy, but I have a 20TB WD Red Pro in a padded, waterproof, locking case that I take a full backup on and then drive over to a family member's 30m away once a month or so.

It's a fully encrypted backup of all my important stuff in a relatively different geographic location.

All of my VM data backs up hourly to my NAS as well, which then gets backed up onto the large drive monthly.

Monthly granularity isn't that good, to be fair, but it's better than nothing. I should probably back up the more important, rapidly changing stuff online daily.

dan ,

30m away

30 minutes, 30 miles, or 30 metres?

douglasg14b ,

Yes.

I'm sure one can reasonably infer that I do not mean 30 meters.

Conveniently, at highway speeds, 30 minutes and 30 miles away are essentially equal.

I'll try and use appropriate notation next time.

dan ,

I was just joking :)

30 minutes can vary a lot depending on traffic. If there's traffic, it can take me 30-40 minutes to get home from work even though it's only 11 miles away and ~15 mins with no traffic.

sepi ,

I put the prndl in r and just goose it

ramble81 , (edited )

I have my BD/DVD/CD collection backed up to S3 Glacier. It's incredibly cheap and offsite, and they worry about the infrastructure. The hard drive and infrastructure space you'd need to back up nearly that amount would cost you about the same, give or take. Yes, it'll cost a bit in the event of a catastrophic restore, but if something happens at the house, at least I have an offsite backup.

dan ,

How much does Glacier cost you? Last time I checked, some hosts had warm storage for around the same price, at least during Black Friday or New Year sales.

bandwidthcrisis ,

I can't recall the storage costs (they're on the website somewhere but are not straightforward).

I was paying maybe $7 a month for a few hundred GB, although not all of that was Glacier.

But retrieval was a pain. There's no straightforward way to convert a lot of files back from Glacier, and there's a delay: the process creates a non-Glacier copy with a limited lifespan that you retrieve from.

Then the access costs were maybe $50 to move stuff out.

I moved to rsync.net for the convenience and simplicity. It even supports setting up rclone to access S3 directly, so I could do a cloud-to-cloud copy of the files.
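The cloud-to-cloud copy is a one-liner once both remotes exist in rclone config (remote and bucket names here are placeholders):

    # "s3" is the AWS remote, "rsyncnet" an SFTP remote for rsync.net
    rclone copy s3:my-old-bucket rsyncnet:backups/from-s3 --progress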

rambos ,

I use Kopia to back up all personal data (Nextcloud, Immich, configs, etc.) daily to another disk in the same server and also to Backblaze B2. It's not proper 3-2-1, but it feels good enough. I don't back up downloadable content because that's expensive.

kylian0087 ,

What I use is Borg. I use Borg to back up the server to a local NAS. Then I have a NAS at my grandparents' house which I use to store the backups of the NAS itself.

dan ,

I have a storage VPS with HostHatch - 10TB for $10/month. That pricing was from a Black Friday sale a few years ago. They may not offer it that cheap again, but it's worth keeping an eye out for their sales. They had something similar last year but double the price, which is still a good deal.

I use Borgbackup to back up the data to the HostHatch VPS. The most important data has a second copy stored with pcloud - I've got a lifetime 2TB storage plan with them. I know lifetime accounts are kinda sketchy which is why it's just a secondary backup and not the primary one.

I don't have any "disposable" files like torrents though. All the stuff I back up are things like servers that run my websites and email, family photos, CDs I've ripped myself, etc. I've only got a few TB total.

ancoraunamoka ,

I am a simple man, so I use rsync.

I set up a mergerfs drive pool of about 60 TiB and rsync to it weekly (a minimal version of the command is sketched at the end of this comment).

Rsync seems daunting at first, but then you realize how powerful and, most importantly, how reliable it is.

It's important that you try to restore your backups from time to time.

One of the main reasons why I avoid software such as Kopia or Borg or Restic or whatever is in fashion:

  • they go unmaintained
  • they are not simple: so many of my friends struggled restoring backups because you are not dealing with files anymore, but with encrypted or compressed blobs
  • rsync has an easy mental model and extremely good defaults
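As promised, a minimal sketch of the weekly run, with made-up mount points:

    # -a preserves permissions and times, -H hard links, -AX ACLs/xattrs
    rsync -aHAX --info=progress2 /mnt/pool/ /mnt/backup/pool/
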
mea_rah ,

FWIW the restic repository format already has two independent implementations, restic (in Go) and rustic (in Rust), so the chances of both going unmaintained are hopefully pretty low.

RootBeerGuy ,

Two questions, and please don't take this as criticism; I am just curious about rsync, and also about one point you make.

"They go unmaintained" seeing as Borg is in use for quite some time, how does this look safer for rsync? For me it looks like the risk for that is similar, but I might not know background of development for these.

Second question, more something I am asking myself: a lot of people seem to use rsync for backing up, but it doesn't do incremental backups, or does it? I saw some mention of a "time machine"-like implementation on top of rsync, but then we are back at your argument that it might go unmaintained, since it's a separate niche implementation. Or does mainline rsync support incremental backups? If not, don't you miss that, and how do you deal with it when just one file changes? Is a new copy of it transferred, or is it handled some other way?

sloppy_diffuser ,

One method depends on your storage provider. Rsync may have incremental snapshots, but I haven't looked, because my storage provider has them.

Sometimes a separate tool like rsnapshot (though probably not rsnapshot itself, as I don't think its hard links interact well with rsync) might be used to manage snapshots locally, which are then rsynced.

On to storage providers or back ends: I use Backblaze B2, configured to never delete. When a file changes, it uploads the new version, renames the old version with a timestamp, and hides it. There are tools to recover the old file versions or to delete any history. Again, it only uploads the changed files, so these are not full snapshots.

ancoraunamoka ,

how does rsync look safer in that respect? To me the risk looks similar, but I might not know the development background of these tools.

Rsync is available out of the box in most Linux distros and is widely used not only for backups but for a lot of other things, such as repository updates and transfers from file hosts. This means a lot more people are interested in it. Also, looking at the source code, the implementation is cleaner and easier to understand.

how do you deal with it when just one file changes?

I think you should consider that not all files are equal. Rsync is great for me because I end up with a bunch of disks that contain an exact copy of the files I have on my own server. Those files don't change frequently; they are movies, pictures, songs, and so on.

Other files, such as code, configuration, and the files on my smartphone, are backed up differently. I use git for most stuff that fits its model, and Syncthing for my temporary folders and my mobile phone.

Not every file suits the same backup model. I trust that files that get corrupted or lost are in my weekly rsync backup; a configuration file I messed up two minutes ago is in git.

RootBeerGuy ,

Thanks for elaborating, the part about the pictures and movies not changing makes a lot of sense actually. Thanks for sharing!

lemmyvore ,

As long as you understand that simply syncing files does not protect against accidental or malicious data loss the way incremental backups do.

I also hope you're not using --delete because I've heard plenty of horror stories about the source dir becoming unmounted and rsync happily erasing everything on the target.

I used to use rsync for years, thinking just like you that having plain old files beats having them in fancy obscure formats. I'm switching to Borg nowadays, btw, but that's my choice; you've got to make yours.

rsync can work incrementally, it just takes a bit more fiddling. Here's what I did. First of all, no automatic --delete; I did run it every once in a while, but only manually. The sync setup was:

  • Nightly sync source into nightly dir.
  • Weekly sync nightly dir into weekly dir.
  • Monthly tarball the weekly dir into monthly dir.
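In shell terms that could look something like this (paths are illustrative; the mountpoint check guards against the unmounted-source scenario mentioned above):

    # nightly cron job: refuse to run if the source isn't mounted
    mountpoint -q /mnt/source && rsync -a /mnt/source/ /backup/nightly/

    # weekly cron job: roll the nightly tree into the weekly tree
    rsync -a /backup/nightly/ /backup/weekly/

    # monthly cron job: freeze the weekly tree into a dated tarball
    tar czf "/backup/monthly/$(date +%Y-%m).tar.gz" -C /backup weekly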

It's not bad, but it's limited in certain ways, and of course you need lots of space for backups, or you have to pick and choose what you back up.

Borg can't really get around the space requirement either, but it's always incremental, and between compression and deduplication it can save you a ton of space.

Borg also has built-in backup checking and recovery parity, which rsync doesn't; you'd have to figure out your own manual solution, like par2 checksums (and those take up space too).

bandwidthcrisis ,

Re needing lots of space: you can use --link-dest to make a new directory with hard links to the unchanged files in a previous backup, so you end up with deduplicated incremental backups.
But Borg handles all that transparently; with rsync you need to carefully plan relative target directory paths to get it to work correctly.
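A minimal sketch of that pattern (paths made up; using an absolute --link-dest path sidesteps the relative-path planning):

    today=$(date +%Y-%m-%d)

    # unchanged files become hard links into the previous snapshot, so
    # each dated directory looks complete but only changed files use space
    rsync -a --link-dest=/backup/latest /mnt/source/ "/backup/$today/"

    # repoint "latest" at the new snapshot for the next run
    ln -sfn "/backup/$today" /backup/latest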

lemmyvore ,

Yeah Borg will see the duplicate chunks even if you move files around.

ancoraunamoka ,

As long as you understand that simply syncing files does not protect against accidental or malicious data loss the way incremental backups do.

Can you show me a scenario? I don't understand how incremental backups cover malicious data loss cases

lemmyvore ,

Let's say you're syncing your personal files into another location once a day.

On Monday you delete files. On Tuesday you edit a file. On Wednesday you maybe get some malware that (unknown to you) encrypts some files (or all of them).

A week later you realize that things went wrong and you want the deleted files back, or the old versions of the file you edited, and of course you'd want back the files that the ransomware has encrypted.

If you simply sync files, you have no way to get back deleted files. Every day it synced whatever was there, overwriting what was there before. If you also sync deletions, then the sync deletes those files; if you don't sync deletions, then files keep piling up when you delete or move them around.

An incremental backup system like Borg looks at small file chunks, not at files. Whenever a file changes, it makes a copy of only the chunks in it that changed. That way it can give you the latest version of the file but also all the versions before, and it doesn't store the same file over and over, only the chunks that really changed, and only one copy of each chunk. If you move a file to another folder, it still has the same chunks, so Borg records the move without storing the chunks twice. Also, if several files have identical chunks, those chunks are only stored once each. And of course it never deletes files unless you explicitly tell it to.

Borg can give you perfect recall of all past versions of every file, and can do it in a way that saves tremendous amounts of space (between avoiding the duplication of chunks and compression).
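In practice that recall is a couple of commands (repo path and archive name are placeholders):

    # list every archive (one per backup run)
    borg list /backup/borg-repo

    # restore last Tuesday's version of one file into the current directory
    borg extract /backup/borg-repo::pc-2024-03-26 home/user/letter.odt

    # or mount the whole history and browse every version read-only
    borg mount /backup/borg-repo /mnt/borg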

HumanPerson OP , (edited )

I was heavily considering Borg, but I just looked up rsync and it looks like everything I need. Thank you.

Edit: Actually, encryption would also be nice. Is there any way to do that with rsync?

sloppy_diffuser ,

Yes. You compose an encrypted vault on top of your storage vault. I pay about $1/mo for Backblaze B2, for around 150GB last I checked.
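One common way to compose such a vault is rclone's crypt remote layered over a B2 bucket. A sketch of the config (remote names, bucket, and credentials are placeholders; rclone config generates and obscures these entries for you):

    # ~/.config/rclone/rclone.conf
    [b2]
    type = b2
    account = 0123456789ab
    key = REDACTED

    [b2-crypt]
    type = crypt
    remote = b2:my-backup-bucket/vault
    password = REDACTED

    # usage: anything synced through "b2-crypt" is encrypted at rest
    #   rclone sync /mnt/data b2-crypt: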

ancoraunamoka ,

What other people are saying is that you rsync onto an encrypted file system or other encrypted storage. What are your backup targets? In my case I own the disks, so I use LUKS partition -> ext4 -> mergerfs to end up with a single volume I can mount on a folder.
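Assembled roughly like this (device names and mount points are examples):

    # one-time per disk: encrypt, open, and create a filesystem inside
    cryptsetup luksFormat /dev/sdb1
    cryptsetup open /dev/sdb1 disk1
    mkfs.ext4 /dev/mapper/disk1

    # at mount time: unlock and mount each disk, then pool them
    mount /dev/mapper/disk1 /mnt/disk1
    mount /dev/mapper/disk2 /mnt/disk2
    mergerfs -o category.create=mfs /mnt/disk1:/mnt/disk2 /mnt/backup

    # then plain rsync onto the pooled, encrypted volume
    rsync -a /srv/data/ /mnt/backup/data/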

ShortN0te ,

One of the main reasons why I avoid software such as Kopia or Borg or Restic or whatever is in fashion:

  • they go unmaintained
  • they are not simple: so many of my friends struggled restoring backups because you are not dealing with files anymore, but with encrypted or compressed blobs
  • rsync has an easy mental model and extremely good defaults

Going unmaintained is a non-issue, since you can still restore from your backup. It is not like a subscription or proprietary software, which is no longer usable when you stop paying for it or the company owning it goes down.

The design of restic is quite simple and easy to understand. The original dev gave multiple talks about it; they're quite interesting.

Imho the additional features of dedup, encryption and versioning outweigh the points you mentioned by far.

ancoraunamoka ,

Going unmaintained is a non-issue, since you can still restore from your backup. It is not like a subscription or proprietary software, which is no longer usable when you stop paying for it or the company owning it goes down.

Until they hit a hard bug or don't support newer transport formats or scenarios. Also, the community dries up eventually.

ShortN0te ,

Until they hit a hard bug or don't support newer transport formats or scenarios. Also, the community dries up eventually.

That is why you test your backup. It is unrealistic that in a stable software release there is suddenly, after you have tested your backup, a hard bug which prevents recovery.

Yes, unmaintained software will not support new features.

I think you misunderstood me. You should not adopt unmaintained software as your backup tool, but IMO it is no problem when it suddenly goes unmaintained; your backup will most likely still work.
Same as with any other software that goes unmaintained: look for an alternative.

ancoraunamoka ,

It is unrealistic that in a stable software release there is suddenly, after you have tested your backup, a hard bug which prevents recovery.

How is that unrealistic? Think of this:

  • day 1: you back up your files, test the backup, and everything is fine
  • day 2: you store a new file that triggers a bug in the compression/encryption algorithm of whatever software you use; now the backups are corrupted, at least for this file

Unless you test every backup you do, and consequently can't back up fast enough, I don't see how you can predict that future files and situations won't trigger bugs in the software.

ShortN0te ,

We are talking about software that is considered stable, that has verification checks for the backup, and that is used by thousands of people. It is unrealistic.
