Having an effective backup scheme is one of the hallmarks of a competent Linux administrator. We've all been there – staring in disbelief at a failed drive that took months of critical configurations and data with it right before a big launch. Seasoned professionals have usually felt that sting early on and developed rigorous processes afterwards, but it still catches many off guard after years of smooth sailing.
In this extensive guide, I'll share hard-won insights on how to implement robust and verifiable Arch Linux backups. We'll cover everything from foundational strategies with rsync to advanced integrity checking and offsite distribution methods. Follow these comprehensive best practices and sleep soundly knowing your data is safe.
An Introduction to Rsync Backups
The rsync utility has long been the Swiss Army knife for flexible system-level file copying and synchronization across machines. First released by Andrew Tridgell and Paul Mackerras in 1996, it has been a staple tool for administrators ever since.
Some core capabilities make rsync uniquely suited for backup use cases compared to simpler solutions like cp:
Efficient transfers – After the first baseline copy, rsync can detect identical files in subsequent runs and only transfer differences at the byte level without wasting bandwidth resending unchanged data. This makes scheduled backup jobs very fast.
Exactly preserved file attributes – Backed up mirrors retain original timestamps, ownership, permissions, etc. This facilitates flawless restoration of the full environment.
Verifiable transfers – All data copied with rsync is checked end-to-end via checksums to catch any corruption. More on this critical validation concept later.
Encrypted transport – rsync gracefully supports encryption during transfers for securing backups sent over the network to remote servers.
Precise control over inclusions/exclusions – Admins can define exactly which directories get backed up or ignored from the file tree. No surprise changes.
With capabilities tailored for common backup needs, rsync is often the first tool that comes to mind. Now let's see how to wield it effectively.
Configuring Your Backup Destination
The first practical decision is choosing the physical storage media where archives will be stored. This could be a directly connected spare hard disk or solid state drive, an internal drive in a NAS if available, an external USB drive, or even cloud object storage like S3.
For local destinations, I recommend using enterprise-grade drives with vibration sensors and dedicated backup duty cycles. Consumer hard drives fail surprisingly often – backup media endures constant read/write churn so investing in quality drives avoids headaches. Solid state drives are fantastic options given their shock resistance and performance.
I also strongly advise against dated filesystems like ext2/3 for backup media. Prefer self-validating filesystems like ZFS or Btrfs, which checksum all data and catch silent corruption; Linux's ubiquitous ext4 is an acceptable default, though it only checksums metadata. More details on protecting integrity later.
For demonstration, our example will backup to a mounted external USB hard disk formatted with ext4:
# fdisk /dev/sdb - Create GPT partition table
# mkfs.ext4 /dev/sdb1
# mkdir /backups
# mount /dev/sdb1 /backups
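Before trusting that mount in scripts, it is worth confirming it programmatically: if the drive fails to mount, an unattended rsync run would happily fill the root filesystem instead. A small sketch (the is_mounted helper is defined here for illustration, not a standard command; /backups matches the example above):

```shell
# Succeed only if the given path appears as a mount point in /proc/mounts.
is_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

# Guard a backup run on whether the destination volume is really there:
if is_mounted /backups; then
    echo "/backups is mounted; safe to back up"
else
    echo "/backups is not mounted; aborting backup" >&2
fi
```

Dropping a guard like this at the top of any backup script turns a missing drive into a loud, early failure rather than a silent disk-space disaster.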
Adjust mount paths and disk devices accordingly. Now let's look at rsync flag essentials.
Key Rsync Backup Options
Here are some key options that should almost always be enabled for backup usage:
Archive mode (-a) – This recursively copies all subfolders and files while preserving original attributes like permissions, ownership, timestamps, and symlinks. This ensures full system fidelity is retained.
Compression (-z) – Compresses file data in transit, saving bandwidth on network transfers (the stored copies remain uncompressed). Typical line-by-line text configuration files in /etc compress very well.
Excluding non-essential directories – We can explicitly ignore folders not needed in the backup like /dev, /proc, and /sys, which get rebuilt at boot anyway, plus the /backups mount point itself so the archive is never copied into its own destination. Keeps copies lean.
Deleting extra files on destination (--delete) – This risky but useful flag will delete anything in the destination archives that doesn't exist anymore on the live source filesystem. Carefully test first!
Test runs (--dry-run) – Always do a dry run without changes first to preview what rsync would transfer. Ensure excludes are correct.
Now let's employ these common flags to safely back up the root filesystem:
Executing Your Initial Full Rsync Backup
First, simulate everything that would happen with a dry run:
$ sudo rsync -az --delete --dry-run --exclude={"/dev/*","/proc/*","/sys/*","/backups/*"} / /backups
building file list ... done
[LONG LISTING OF ALL FILES/FOLDERS THAT WOULD GET COPIED]
Carefully verify the output looks correct: /dev, /proc, /sys, and /backups are excluded properly, users' /home directories are included, and --delete holds no surprises.
The output should match the source filesystem tree except for the excluded paths. If anything looks wrong, fix the excludes and test again before running for real.
Once the dry-run output looks good, remove --dry-run and execute:
$ sudo rsync -az --delete --exclude={"/dev/*","/proc/*","/sys/*","/backups/*"} / /backups
[ACTUALLY STARTS COPYING DATA]
sent xx bytes received yy bytes zz.zzK bytes/sec
total size is aa.aaG speedup is bb.bb
Congratulations – you now have a full copy of the root filesystem in /backups, mirroring permissions, ownership, and other metadata! Subsequent runs will only incrementally copy changes.
Implementing Automated Backups
While periodic manual rsync jobs work, automating them ensures backups keep happening despite admin forgetfulness. We could employ the classic cron utility, but I prefer systemd timers for accuracy and granular control.
First, create a shell script that executes the rsync command (/root/backup.sh):
#!/bin/bash
rsync -az --delete --exclude={"/dev/*","/proc/*","/sys/*","/backups/*"} / /backups >> /var/log/backup.log 2>&1
Then a systemd service file to invoke it (/etc/systemd/system/backup.service):
[Unit]
Description=Daily backup with rsync
[Service]
Type=oneshot
ExecStart=/root/backup.sh
And finally a systemd timer for scheduling (/etc/systemd/system/backup.timer):
[Unit]
Description=Daily backup timer
[Timer]
OnBootSec=15min
OnUnitActiveSec=1d
Unit=backup.service
[Install]
WantedBy=timers.target
Enable the timer so it persists on reboot:
# systemctl enable --now backup.timer
Created symlink from /etc/systemd/system/timers.target.wants/backup.timer to /etc/systemd/system/backup.timer.
This will now automatically run a daily incremental rsync backup of root! Tweak schedule as needed with OnUnitActiveSec or create additional timers.
Verifying Backup Integrity with Checksums
While the above achieves a recurring backup pipeline, simply copying files isn't enough. We must validate the integrity of archives through checksums to catch rare but inevitable data corruption issues.
The rsync utility already computes checksums during transfers for basic validation. However, dedicated tools like par2 and simple sha256sum manifests let us periodically scan archives for tampering or random bit errors known as bit rot.
Filesystems like ZFS even bake automatic scrubbing into the filesystem itself with checksums on all data and metadata. This catches scary silent corruption problems that otherwise evade detection – like database entries getting subtly altered over time.
Here is an example using par2 (successor to the classic par1 format) to create redundancy files for an archive and later verify it. Note that par2 operates on files rather than whole directories, so point it at archive files (the tarball name below is just an example):
# par2 create backup-verification.par2 /backups/monthly-archive.tar
# par2 verify backup-verification.par2
All files are correct, repair is not needed.
I recommend periodically generating new par2 files when creating monthly or yearly archives as an extra sanity check. Beyond data corruption, these methods also detect unauthorized tampering with backups or faulty drives writing nonsense.
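The checksum-manifest approach can be sketched in a few lines of shell. This demo uses a temporary directory as a stand-in for /backups; the manifest filename is an arbitrary choice:

```shell
# Stand-in for the real /backups volume in this demo:
BACKUP_DIR=$(mktemp -d)
echo "config data" > "$BACKUP_DIR/etc.conf"

# After each backup run, record a SHA-256 checksum for every file:
( cd "$BACKUP_DIR" && \
  find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256 )

# Re-verify on demand; a non-zero exit means corruption or tampering:
( cd "$BACKUP_DIR" && sha256sum --quiet -c MANIFEST.sha256 ) && echo "manifest OK"
```

Unlike par2, a plain manifest cannot repair damage, but it makes periodic integrity audits trivial to automate from the same timer infrastructure shown earlier.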
Now that we are confident in integrity, let's discuss backup security.
Securing Backups with Encryption
Protecting physical backup media is prudent for privacy and security compliance. Even bare drives should be encrypted these days via LUKS containers to guard data at rest outside servers.
However, when sending backups over networks or uploading them to cloud providers, we need encryption integrated into the transfer process itself. Luckily, rsync natively supports the secure ssh protocol for end-to-end encrypted transport and authentication.
By passing the -e ssh option, rsync will invoke ssh to create encrypted tunnels between hosts for safe transit along the way. Verify at minimum 3072-bit RSA keys are configured on servers; modern best practice is Ed25519 keys.
Here is an example push backup from localhost to a remote system, forcing a modern authenticated cipher alongside rsync compression:
rsync -az -e "ssh -c aes256-gcm@openssh.com" /backups username@remote-server:/volume1/backups
Note that legacy CBC ciphers have been removed from modern OpenSSH defaults. If your organization requires FIPS 140-2 compliance, select an approved cipher such as aes256-ctr.
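Generating a suitable Ed25519 key pair is a one-liner; the comment string and output path below are illustrative choices, and the temporary directory simply keeps the demo self-contained:

```shell
# Create a passphrase-less Ed25519 key pair for unattended transfers.
KEYDIR=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -C "backup@archbox" -f "$KEYDIR/backup_key"

# The public half ($KEYDIR/backup_key.pub) goes into the remote server's
# ~/.ssh/authorized_keys; rsync then authenticates with the private key:
#   rsync -az -e "ssh -i $KEYDIR/backup_key" /backups user@remote:/volume1/backups
```

For unattended jobs, consider restricting the key in authorized_keys (command= and no-pty options) so a stolen backup key cannot open a full shell.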
The takeaway is that rsync over ssh provides robust authentication and encryption for backups in transit; pair it with LUKS or encrypted repositories for protection at rest. Rotate keys periodically! Next let's discuss storage scalability.
Scaling Storage with Rotation Schemes
While daily incremental runs only transfer changes, storage needs still grow over years of accumulating history. One strategy is simply adding more local disks or expanding cloud volumes on demand, which is often cheaper than massive upfront allocation.
Another approach is implementing a rotation scheme that cycles through a fixed set of media. This also provides snapshots reflecting distinct periods over time. A common convention is called grandfather-father-son:
Grandfather: Monthly full backups retained for 2-5 years
Father: Weekly full backups held for 1-2 months
Son: Daily incremental backups kept for 1-2 weeks
By cascading full vs incremental backups across this hierarchy, you retain long term records as well as recent history within a predictable storage range. Automate the generation, validation, and rotation using simple shell scripts.
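Rsync can produce the "son" snapshots for such a scheme cheaply via --link-dest: unchanged files in the new snapshot become hard links to the previous day's copy, so each daily snapshot looks like a full backup while only changed files consume new space. A minimal sketch with temporary directories standing in for the backup volume:

```shell
# Stand-in directories for the backup volume and the live source:
ROOT=$(mktemp -d)
SRC="$ROOT/src"; mkdir -p "$SRC"
echo "v1" > "$SRC/app.conf"

# Day 1: a plain full snapshot.
rsync -a "$SRC/" "$ROOT/daily-1/"

# Day 2: unchanged files are hard-linked against day 1 instead of copied.
rsync -a --link-dest="$ROOT/daily-1" "$SRC/" "$ROOT/daily-2/"

# app.conf did not change, so both snapshots share a single inode:
stat -c %i "$ROOT/daily-1/app.conf" "$ROOT/daily-2/app.conf"
```

Rotating then becomes a matter of renaming or deleting the oldest snapshot directory; files remain on disk as long as any snapshot still links to them.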
An additional tip for improving performance of daily jobs is disk short stroking – constraining backups to only the high-speed outer tracks of a drive. Because a platter spins at constant angular velocity, the outer tracks pass under the heads faster and sustain much higher throughput. Some enterprise SATA drives are even tuned for this approach.
Exploring Alternative Backup Utilities
While rsync is a venerable staple for good reason, other newer open source backup tools merit consideration as well:
Restic – Stores backups as flexible incremental snapshots in a repository. Supports encryption and portable repositories, and deduplicates data via content-defined chunking. Written in Go.
Borg – A community-maintained fork of the Attic project, Borg focuses on compressed, deduplicated, and authenticated backups of file trees. Python-based.
Duplicity – Optimized for encrypted bandwidth-efficient backups to remote storage services. Adaptable to various cloud providers. Also powered by Python.
The landscape continues expanding with promising innovations tailored for specific use cases. While rsync serves traditional system backups well, evaluating these other solutions can be worthwhile for your environment depending on scale and distribution needs.
Monitoring Backup Jobs
Once backups are automated, attention often shifts elsewhere until something breaks. Stay vigilant against backup decay! Here are tactics to continually validate reliability:
Systemd notifications – Configure warning messages for the systemd service on failure or timeouts. These appear prominently on headless servers.
Prometheus metric collection – Record backup durations, sizes, and success metrics in Prometheus for visualization.
Dashboards and Grafana alerts – Import metrics into Grafana dashboards with thresholds to alert on anomalies.
Log monitoring – Tools like Promtail automatically gather backup job logs for filtering and alerts.
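The systemd-notification tactic above can be wired up with an OnFailure hook. A sketch (the notifier unit name and its logging command are illustrative, not from this guide):

```ini
# /etc/systemd/system/backup.service — add the OnFailure line to [Unit]:
[Unit]
Description=Daily backup with rsync
OnFailure=backup-notify.service

# /etc/systemd/system/backup-notify.service — a hypothetical notifier:
[Unit]
Description=Alert when the backup service fails

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo "backup failed on $(hostname)" | systemd-cat -p err'
```

Swap the echo for a mail command or webhook call to surface failures wherever your team actually looks.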
Armed with metrics and visibility, you can rapidly detect issues like failing drives or exceeded storage capacity to address promptly. Don't get lulled into complacency!
Offsite and Georedundant Strategies
Thus far we have focused on local destinations for archives. While certain businesses can survive with on-premise backups alone, many require offsite mirrors for resilience if entire datacenters go offline from fires, floods, or other regional disasters.
Shipping drives with backups to other offices seems antiquated now. Modern cloud storage mediates this nicely. My preferred tools for cloud backups are S3 compatible providers like Wasabi for cost-effectiveness paired with encryption tools like Duplicity or Restic. AWS S3 works fine but charges egress fees.
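As a hypothetical sketch of that cloud workflow, restic can push encrypted, deduplicated archives to an S3-compatible bucket. The endpoint, bucket name, and credential values below are placeholders, not real resources:

```shell
# Credentials for the S3-compatible provider and the repository passphrase
# (restic encrypts everything client-side before upload):
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export RESTIC_PASSWORD=...

REPO=s3:https://s3.wasabisys.com/example-backup-bucket

restic -r "$REPO" init          # one-time repository setup
restic -r "$REPO" backup /backups
restic -r "$REPO" check         # verify repository integrity remotely
```

The built-in check command gives offsite copies the same periodic integrity auditing we applied to local archives earlier.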
For servers spread across divergent regions, distribute backups across sites instead of funneling all to one central repository. This honors data residency laws in regulated industries as well. Periodically validate archives in all locations.
Going further, I suggest contractual SLAs with managed disaster recovery partners to restore cloud or physical backups at a secondary commercial site if company datacenters suffer extended outages. Test annually!
Validating Restore Procedures
The best laid backup plans mean nothing if archives cannot actually be restored when catastrophe strikes. Treat verification with the same discipline as running backups themselves:
- Procure identical spare hardware
- Quarterly, rebuild temporary servers or VMs
- Restore latest archives
- Validate correct data and configurations
- Record metrics like recovery time and size
By rehearsing routinely, your team avoids nasty surprises right when uptime matters most. Familiarity with restoration quirks can save precious hours; ad hoc, unrehearsed recovery attempts are far more likely to fail.
I cannot stress enough the importance of testing recovery procedures before an actual crisis. Yes it requires non-trivial effort but proves well worth it after seeing how many moving pieces get forgotten over time. Enforce expectations through quarterly fire drills.
Some best practices around safer restores worth mentioning:
- When restoring onto existing, non-backed-up systems, use rsync's --ignore-existing option to prevent accidentally overwriting newer data created after the archives were made.
- For individual file recovery, use targeted --include=FILE filters instead of overlaying entire directories when feasible. This minimizes changes.
- For bare metal restores, lay down a fresh operating system install first, then selectively rsync critical folders like /etc, /opt, /root, /boot, and /var.
Adjust workflows accordingly when staging environments to avoid transplant surprises. With rehearsal, this all becomes routine.
Closing Recommendations
I aimed to provide a comprehensive reference guide taking Arch Linux backups from the basics through integrity checking, automation, monitoring, and offsite replication. Apply the core principles here and you can satisfy even stringent backup objectives.
Before closing, let me underscore key guiding tenets:
- Verify, don't trust – Rigorously confirm backup integrity with tools like par2 and test restores quarterly.
- Monitor backup health – Create visibility via dashboards and alerts to rapidly detect problems.
- Practice recovery drills – Rehearse full and selective restoration before you desperately need it.
- Encrypt everything – Secure data in transit and at rest without exceptions.
- Distribute offsite – Move archives off premises for resilience. Cloud storage is inexpensive.
Stay vigilant in following these guidelines and you will be prepared when disaster eventually strikes (and it will). Please reach out with any other questions!