How to Use rsync to Back Up Files
Copy-pasting is a straightforward way to back up important files and directories, but rsync takes the process further. This page describes the basic usage of the rsync utility.
- Updated: September 3, 2024
- Created: July 14, 2020
Why Simple Copy-Pasting May Not Be Enough
Imagine a folder with important data and an empty external hard drive for backups. Creating a backup of the folder is easy, right? Right-click on the folder, select Copy, right-click on the empty hard drive window and select Paste. Wait until the transfer completes, and you are done. This is what many people in my family seem to do.
While the previous approach works, it is not well suited to incremental backups. Imagine you change the contents of the important folder after it was copied to the backup hard drive. Now you may want to back up the important folder again. Copy-paste, and you will be greeted with a message telling you that a folder with the same name already exists on the destination hard drive. Usually, you can decide whether to skip files with the same names or overwrite them. The dialog does not tell you whether the content of the files differs. You end up either rewriting every file or skipping files whose content may have changed since the previous backup.
If the cp command were used, existing files would be overwritten without asking you anything.
Backup Files With rsync
I will cover only the basic functionality of rsync here. For more options, try man rsync.
Read the man pages (man rsync) before using any commands you don't understand. Incorrect usage of any of the parameters could result in data loss!
First-time Backup
Let's say I want to back up a data directory to an external hard drive mounted at /mnt/BHD. I want to preserve permissions and other attributes, so I will use the -a switch. To avoid permission errors, I use sudo.
sudo rsync -a ./data /mnt/BHD/
Note that there is no forward slash after the source data directory path! Without it, rsync copies the directory itself, so the files end up in /mnt/BHD/data. A forward slash after the destination path is optional.
Future Backups
I prefer to perform the backup in three steps.
First, copy only the new files that do not exist in the destination and skip those that exist. Note that now there is a forward slash after the data directory path, so rsync copies the contents of the directory rather than the directory itself. Also, the data directory is specified in the destination path (here the slash is optional). The first command is a dry run (-n): it only lists the files that would be copied without copying anything. The second command copies the files.
sudo rsync -anv --ignore-existing ./data/ /mnt/BHD/data/
sudo rsync -a --ignore-existing ./data/ /mnt/BHD/data/
Second, copy all remaining files that exist in the destination but have been changed since the last backup. This ignores files that have not changed, which is the biggest advantage over plain cp or copy-paste.
sudo rsync -anv ./data/ /mnt/BHD/data/
sudo rsync -a ./data/ /mnt/BHD/data/
Lastly, remove files from the destination that do not exist in the data directory anymore.
sudo rsync -anv --delete ./data/ /mnt/BHD/data/
sudo rsync -a --delete ./data/ /mnt/BHD/data/
Compare Contents of Two Directories
To quickly compare the contents of two directories, data and /mnt/BHD/data, use the following command. It uses file sizes and modification times to decide whether a file has been changed. Because of -n, nothing is copied or deleted; the command only prints the files that differ.
sudo rsync -anvi --delete ./data/ /mnt/BHD/data/
To compare the contents of the files (instead of relying on their sizes and modification times), use the following command. It calculates and compares checksums of the data. This is a reliable method to discover if files differ in at least a single bit. It can take a very long time, though, because every source and destination file must be read in full to determine whether they differ.
sudo rsync -anvci --delete ./data/ /mnt/BHD/data/
Bonus: Preserve More Attributes, Copy Sparse Files, Show Progress
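The heading says it all: -A preserves ACLs, -X preserves extended attributes, -H preserves hard links, -S handles sparse files efficiently, and --info=progress2 shows the progress of the whole transfer.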
sudo rsync -aAXHS --delete --info=progress2 ./data/ /mnt/BHD/data/
Bonus: Filter Extended Attributes
It is possible to tell rsync to ignore some extended attributes if the -X (preserve extended attributes) parameter is specified.
- Ignore SELinux labels on files: --filter='-x security.selinux'.
- Ignore BTRFS attributes like file compression: --filter='-x btrfs.compression'.
- Ignore all BTRFS attributes: --filter='-x btrfs.*'
Bonus: Copy Only the Changed Parts of Files
By default, rsync copies the whole file when both the source and the destination are local paths. In situations when only the part of a file that differs between the source and the destination should be copied, use --inplace together with --no-whole-file. This is useful for backing up large files on CoW file systems. With these parameters, only the difference between the source file and the destination file is written, and the unchanged part of the destination file remains intact.
Bonus: Rsync from macOS to Linux
First, I will mention that macOS ships with a very old version of rsync. This version lacks some features. Use Homebrew to install the latest rsync.
The --iconv=utf-8-mac,utf-8 parameter is very useful if the file names contain accented characters. This is because macOS uses UTF-8 NFD (decomposed) normalization form, while Linux uses UTF-8 NFC (composed) normalization form. This means that a UTF-8 string that contains accented letters is encoded into different bytes on macOS and on Linux, even though the text is displayed the same.
rsync -anv --delete --iconv=utf-8-mac,utf-8 --exclude='.DS_Store' /Users/david/FILES/ pc4:macos-files/FILES/
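The byte-level difference between the two normalization forms can be seen without rsync at all. This small sketch prints the UTF-8 bytes of the letter é in both forms; it uses octal escapes because they work with any POSIX printf.

```shell
# "é" as one code point (NFC, U+00E9) ...
nfc=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')
# ... and as "e" followed by a combining acute accent (NFD, U+0065 U+0301).
nfd=$(printf 'e\314\201' | od -An -tx1 | tr -d ' \n')

echo "NFC bytes: $nfc"   # prints c3a9
echo "NFD bytes: $nfd"   # prints 65cc81
```

Both sequences are displayed as the same letter, but a file name stored one way will not match a file name stored the other way unless it is converted.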
Rsync on Linux doesn't have the utf-8-mac character encoding available, so the conversion can't be used when the rsync transfer is initiated from the Linux host. In that case, the file names can be converted after they have been transferred to Linux:
convmv -r --nfc -f utf8 -t utf8 --notest FILES/