As a Linux system administrator, being able to seamlessly copy and synchronize files between servers is an essential skill. Whether copying user directories, replicating databases, or maintaining remote backups, the venerable rsync tool simplifies the process.
Developed in 1996 for efficiently mirroring data, rsync remains a staple in every admin's toolkit for good reason: it is a versatile tool for taming file operations. Let's dive into rsync mastery!
Understanding the rsync Utility
At its core, rsync provides fast incremental file transfer by minimizing the data moved between source and target files/directories. It accomplishes this with a delta-transfer algorithm that uses rolling checksums to quickly compare existing files and transfer only the differences.
This makes rsync extremely efficient for copying new and changed files in very large directory structures. It is also resilient to interruptions and can be resumed rather than starting transfers from scratch.
Key capabilities include:
- Local file copying as well as remote transfers over ssh
- Preservation of symbolic links, permissions, ownership, timestamps, etc.
- Powerful include/exclude rules for precise file selection
- Optional deletion of extraneous files on the destination to mirror the source
- Daemon mode for running a persistent backup server
- Return codes for scripting into larger automated workflows
Learning rsync is a rite of passage for Linux admins. While it is characterized by a sea of complex-looking options, we will break down practical use cases so you can gain confidence applying rsync for common needs.
Rsync Command Syntax Overview
The syntax structure for rsync commands takes the basic form:
rsync [options] [source] [destination]
Where classic use cases involve:
- Local file copying from one directory to another
- Remote server file transfer and synchronization
- Remote incremental backups from source server to destination
Common scenario examples:
# Local file copy
rsync -azvh /usr/local /backup
# Remote server file copy
rsync -azvh /home user@host:/backup/home
# Remote incremental backup
rsync -azvh --delete /data user@host:/backups/data
We will work through more realistic examples in the sections below. But first, let's ensure rsync is installed and ready.
Installing rsync on Ubuntu
Current versions of Ubuntu and most other Linux distributions ship with rsync pre-installed. But if needed, use apt to install:
sudo apt update
sudo apt install rsync
Verify with:
rsync --version
# rsync version 3.1.3 (protocol 31)
With rsync installed, let's unpack some key options.
Understanding rsync Options
With 30+ command options available, rsync functionality is extremely flexible but the abundant options can seem overwhelming.
Let's demystify some of the commonly used ones:
Archive mode (-a):
- Recursively transfer files while preserving symbolic links, permissions, ownership, timestamps, etc.
- Essential for maintaining an exact mirror backup copy.
Verbose mode (-v):
- Increases verbosity to monitor the transfer progress.
- Use -vv or -vvv for even more detailed logs.
Compress (-z):
- In-transit file compression for faster transfer of remote data.
Delete (--delete):
- Removes files from the destination that no longer exist on the sender.
- Important for maintaining mirror copies and pruning outdated backups. Exercise caution when using this recursively.
Bandwidth limit (--bwlimit):
- Limits transfer speed in kilobytes per second. Useful for reducing impact when running over metered or shared network links.
Exclude (--exclude):
- Specifies a pattern of files/dirs to exclude from the transfer. Crucial for fine-tuning backups.
Stats log (--log-file=FILE):
- Writes verbose statistics to a log file for later analysis.
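Several of these options combine naturally. As a sketch (the paths, exclude patterns, and bandwidth cap here are illustrative, not from any real server), a safe preview run might look like:

```shell
# Preview a sync without changing anything (--dry-run), skipping
# temp files and a cache directory, and capping bandwidth at ~5 MB/s
rsync -azvh --dry-run \
  --exclude='*.tmp' \
  --exclude='cache/' \
  --bwlimit=5000 \
  --log-file=/tmp/rsync-preview.log \
  /var/www/ /backup/www/
```

Dropping --dry-run then performs the real transfer with identical selection rules.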
There are many more options worth reviewing via man rsync once you are comfortable with the basics. But we are now armed with enough to demonstrate practical examples!
Copying Local Files with rsync
A simple way to start harnessing rsync is copying local files from one directory to another on your filesystem.
For example, let's copy our Downloads folder to an external USB drive mounted at /media/backups.
rsync -azvh /home/user/Downloads/ /media/backups
Breaking down the key parts:
Source:
- /home/user/Downloads/: path the files are copied from
Destination:
- /media/backups: path the files are copied to
Options:
- -a: archive mode, preserves metadata
- -z: compress for faster transfer
- -v: verbose output for monitoring
- -h: human-readable file sizes
This will recursively copy all contents from the Downloads folder to our backup USB drive while showing verbose output like:
building file list ... done
./
file1
file2
file3
...
Total transferred file size: 15,238 bytes
...
sent 138 bytes received 122 bytes 142.00 bytes/sec
total size is 15,238 speedup is 58.61
Note a few bits from the output:
- Total transferred file size – total size of the changes copied over
- speedup – the ratio of the total data size to the bytes actually sent over the wire
Because rsync leverages rolling checksums to only transfer differences, we see the speedup demonstrating major gains!
Now let's explore more use cases.
Remote Server Backups with rsync
Where rsync truly excels is performing data backups and transfers between remote servers rather than just locally. Its ability to mirror directory structures makes rsync well-suited for maintaining offsite copies.
For demonstration, we will back up key data from our web server web1 onto a separate host backup1 located at IP 192.168.1.150.
There are two common methods for enabling remote server rsync:
1. Rsync over Remote Shell
The most ubiquitous method runs rsync over a remote shell transport – typically ssh. This allows securely contacting any remote server reachable over the network, with no extra services required on the destination.
General syntax for rsync file transfer over remote shell:
rsync [option...] /path/to/source remoteuser@remotehost:/remote/destination
As an example, let's do a full recursive mirror copy of our web1 codebase to backup1:
rsync -azh --delete /var/www/ webuser@192.168.1.150:/backups/web1/
Now everything under /var/www on web1 will be copied to the /backups/web1 path on backup1 via ssh.
Key points of note:
- The trailing / on the source path copies the directory's contents only, rather than the directory itself
- The --delete option cleans up stale files left over from previous backups
For scheduled backup scripts, it's wise to set up ssh public key authentication between hosts rather than relying on password logins.
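A minimal key setup, assuming the webuser account and backup host from the example above (the key filename is illustrative), might look like:

```shell
# Generate a dedicated key pair with no passphrase for unattended jobs
ssh-keygen -t ed25519 -f ~/.ssh/backup_key -N ''

# Install the public key on the backup host (prompts for the password once)
ssh-copy-id -i ~/.ssh/backup_key.pub webuser@192.168.1.150

# Point rsync at the key explicitly via the -e (remote shell) option
rsync -azh -e "ssh -i ~/.ssh/backup_key" /var/www/ webuser@192.168.1.150:/backups/web1/
```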
2. Rsync Daemon Method
An alternative approach is configuring the destination host to run a persistent rsync daemon process. This listens on a defined TCP port allowing source servers to directly contact it without ssh.
On the backup1 recipient server, edit rsyncd.conf with your modules and parameters. Then launch the daemon:
/etc/rsyncd.conf
[backups]
path = /backups
read only = yes
Start daemon:
rsync --daemon --config=/etc/rsyncd.conf
Once running, rsync clients can push data to the daemon server like:
web1 source server:
rsync [options] /local/path/ backup1::backups
The daemon method has some advantages such as avoiding remote ssh and credentials per transfer. The downside is needing to open firewall ports.
Automating Incremental Backups
One of rsync's major advantages over other mirroring tools is built-in support for incremental transfers. By calculating differences at the file level before transfer, rsync minimizes bandwidth needs for ongoing backup tasks.
It determines changes through its rolling-checksum algorithm, comparing files block by block. When a difference is discovered, only the changed blocks are transferred.
Let's look at an example backup script automating incremental copies:
/home/user/bin/backup.sh
#!/bin/bash
# Backup script via rsync
# Config
SRC=/home # Source dir to backup
USER=remoteuser
HOST=192.168.50.10
DEST=/backups/$HOST
# Create logfile
LOG="$(date +%Y-%m-%d)_backup.log"
# Rsync opts
RSYNC_OPTS="-azh --del --stats --log-file=$LOG"
echo "*** Daily backup from $(hostname) started ***" >> $LOG
# Initial full backup
if [[ ! -d $DEST ]]; then
echo "Running initial full backup..."
rsync $RSYNC_OPTS $SRC $USER@$HOST:$DEST >> $LOG
else
# Incremental backup
echo "Running daily incremental backup..."
rsync $RSYNC_OPTS --ignore-existing $SRC $USER@$HOST:$DEST >> $LOG
fi
echo "*** Backup completed at $(date) ***" >> $LOG
The key points that enable incrementals:
- First initial seed backup transfers full data
- Subsequently only changed files copied with
--ignore-existing - Log stats to analyze over time
We could then schedule this daily using cron. The same logic can extend to hourly backups of critical data silos as needed.
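As a sketch, a crontab entry (added via crontab -e) like the following would run the script daily at 2:30 AM; the script path matches the location assumed above:

```shell
# m  h  dom mon dow  command
30   2  *   *   *    /home/user/bin/backup.sh
```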
Bidirectional Sync with Rsync
A common scenario is maintaining an active-passive server pair with mirrored data between them – for example dual webheads or a hot standby database replica.
Rsync itself is one-way, but you can approximate bidirectional synchronization by running a mirrored transfer in each direction using the double-colon :: daemon addressing (this assumes a www module is defined in each host's rsyncd.conf).
For example on web1 as the primary:
rsync -azvv --delete /var/www/ standby1.ex::www/
And the reverse direction sync configured on standby1:
rsync -azvv --delete /var/www/ web1.ex::www/
With this, files changed on either server update the counterpart on the next run. Note that rsync has no conflict detection, so overlapping edits on both sides can overwrite each other; schedule the two directions carefully. Very useful for keeping high availability clusters in sync!
Common rsync Pitfalls & Issues
While extremely powerful, rsync does come with some best practices worth calling out:
1. Accidental large file deletions
Using --delete can lead to data loss if the sync directories are misconfigured. Always test without the delete option before trusting it live. Some tips:
- Start with --dry-run first to preview the impact
- Use --max-delete=10 to limit the number of files deleted
- Set up exclude rules to protect files from removal
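Combining the dry-run and deletion-cap tips, a safety check before a live mirror might look like this (remote host and paths illustrative):

```shell
# -n (--dry-run) with -v prints "deleting ..." lines without touching
# anything; --max-delete caps removals if the command is later run live
rsync -azvn --delete --max-delete=10 /var/www/ user@host:/backups/web1/
```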
2. Timestamp truncation and daylight saving time (DST)
Filesystems such as HFS+ on macOS that truncate timestamp precision can confuse rsync's change detection around DST clock changes. Work around this with --modify-window=1, and avoid scheduling runs across the local clock-change hour.
3. Resuming aborted transfers
If a file transfer aborts mid-stream, rerun the same command with the --partial flag; rsync will keep the partially transferred file and resume from it rather than starting the file over.
4. Remote shell locale confusion
If rsync throws strange character-encoding errors during remote transfers, force LANG=C before the rsync call. Locale differences between hosts can otherwise mangle filenames.
Conclusion & Next Steps
In closing, hopefully this guide has equipped you with both a broad conceptual grasp and readily applicable examples for harnessing rsync. Mastery of rsync will enable you to slash time previously lost to clumsy data copying operations.
Some recommended next steps:
- For protecting backups, combine rsync with client-side encryption tools like gocryptfs before sending data over the wire
- To prevent accidental data destruction, leverage filesystem snapshots on destination backup volumes
- Build redundancy by cascading backups across multiple remote servers
- Containerize rsync as a Docker image for easier portability and availability
- For insights into rsync runtime performance, analyze the output stats logs using tools like rsyncstats
What are your favorite use cases or optimizations for rsync? I welcome any feedback for improving this guide!


