Create a backup server with GNU/Linux and rsync

Every GNU/Linux user knows how important it is to back up a server, and rsync is a good tool for making backup copies of files on GNU/Linux.

rsync is a tool for synchronizing files and directories on GNU/Linux, which makes it very useful for backup copies. To copy a file, for example, we write:

rsync -v filename backup_filename

The next time we execute that same command, it will copy the file only if it was modified.

If we want to make a backup copy of a directory:

rsync -av directory_name/ backup_directory_name

If, for example, we delete a file from the directory, rsync will not delete it from the backup directory unless we use the --delete option:

rsync -av --delete directory_name/ backup_directory_name
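
Because --delete removes files from the backup, it can be worth previewing a run before doing it for real. The following is a minimal sketch, assuming rsync's standard --dry-run (-n) option. The function name preview_backup is made up for this example, and the directory names are placeholders.

```shell
# Hypothetical helper: show what a mirrored backup would change,
# without actually changing anything.
# --dry-run (-n) makes rsync list the transfers and deletions it would perform.
preview_backup() {
    src=$1    # source directory, e.g. directory_name
    dest=$2   # backup directory, e.g. backup_directory_name
    rsync -av --delete --dry-run "$src/" "$dest"
}
```

Calling preview_backup directory_name backup_directory_name lists the files that would be copied and marks with "deleting" those that exist only in the backup. Running the same rsync command without --dry-run then applies the changes.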

If we want to make a copy on a remote computer using ssh:

rsync -av --delete directory_name/ user@remote_computer_IP:/backup_directory_name/

Or the other way around, from the remote computer to the local one:

rsync -av --delete user@remote_computer_IP:/backup_directory_name/ directory_name/

Lastly, GRsync is one of the graphical front ends for rsync on GNU/Linux. On Debian-based systems we install it with:

sudo apt-get install grsync

For more information, see http://www.vicente-navarro.com/blog/2008/01/13/backups-con-rsync/

Backups with rsync

What if your laptop is stolen one day, along with all your precious data?

I have mentioned that I like to make my backups with rsync. I use it for both local backups (copying files from one directory to another on the same system) and remote ones (copying files from one system to another), on both Linux and Windows (using Cygwin) without problems, with either system serving as the backup destination for the other.

rsync is a tool for synchronizing the files and directories stored in one place with those in another, while minimizing the amount of data transferred.

At the level of a directory tree and its files, the idea is simple. rsync reproduces the files and directories at the new location, but instead of copying everything it copies only what has changed at the source with respect to the destination. Plainly copying the files and directories, even remotely through a shared folder, would give the same result, but it would mean transferring much more data.

At the level of individual files, imagine a very large database file (several GiB, for example). Without a tool like rsync we would have to copy the whole file for every backup, when in many cases the vast majority of its blocks have not changed. rsync, on the other hand, analyzes the file at source and destination and transmits only the parts that have actually changed (and can compress them as well).

In short, rsync is an excellent command-line utility for making local and remote backups.

The list of special features is:

Support for copying links, device files, owners, groups and permissions
Exclusion options (--exclude and --exclude-from) similar to those of GNU tar
A CVS exclude mode that ignores the same files CVS would ignore
Can use any transparent remote shell, such as ssh or rsh
Does not require root privileges
Pipelining of file transfers to minimize latency
Support for anonymous or authenticated users through the rsync daemon (ideal for mirroring)
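
As a small illustration of the exclusion options in the list above, the sketch below skips files by pattern. The function name, the *.tmp pattern and the pattern file are assumptions made for this example.

```shell
# Hypothetical helper: back up a directory while leaving out unwanted files.
#   $1 = source directory
#   $2 = backup directory
#   $3 = text file with one exclude pattern per line
backup_with_excludes() {
    rsync -a --exclude='*.tmp' --exclude-from="$3" "$1/" "$2"
}
```

For instance, with a pattern file that contains the single line *.log, calling backup_with_excludes src bak patterns.txt would copy everything in src except the .tmp and .log files.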

In its simplest form, rsync works much like cp. To synchronize one file to another we can simply run the following (-v makes rsync print more detail about what it does):

$ ll file1
-rw-r----- 1 vicente users 7625431 2008-01-13 11:40 file1
$ rsync -v file1 file2
file1
sent 7626448 bytes  received 42 bytes  15252980.00 bytes/sec
total size is 7625431  speedup is 1.00
$ ll file?
-rw-r----- 1 vicente users 7625431 2008-01-13 11:40 file1
-rw-r----- 1 vicente users 7625431 2008-01-13 11:41 file2

But if we run the command as another user (root in this example), we see that the owner is not preserved, although the permissions are, and that the modification time differs as well:

# rsync file1 file3
# ll file?
-rw-r----- 1 vicente users 7625431 2008-01-13 11:40 file1
-rw-r----- 1 vicente users 7625431 2008-01-13 11:41 file2
-rw-r----- 1 root root 7625431 2008-01-13 11:44 file3

Nor does it do anything with directories:

$ rsync dirA dirB
skipping directory dirA

Therefore, for backups rsync is in most cases used with the -a option:

-a, --archive archive mode; same as -rlptgoD (no -H, -A)

This option combines -r, to recurse through the entire directory structure we indicate; -l, to copy symbolic links as symbolic links; -p, to preserve permissions; -t, to preserve file modification times; -g, to preserve the group; -o, to preserve the owner; and -D, to preserve device files (root only). Neither hard links (-H) nor ACLs (-A) are preserved by default. In short, with the -a option we obtain a faithful copy of a hierarchy of files and directories.
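
To make the shorthand concrete, the sketch below spells out -a with rsync's long option names. The wrapper name archive_copy is hypothetical, and the wrapper is meant to behave the same as rsync -a.

```shell
# Hypothetical wrapper spelling out what -a (-rlptgoD) stands for:
#   -r --recursive   descend into directories
#   -l --links       copy symbolic links as symbolic links
#   -p --perms       preserve permissions
#   -t --times       preserve modification times
#   -g --group       preserve the group
#   -o --owner       preserve the owner
#   -D               same as --devices --specials
archive_copy() {
    rsync --recursive --links --perms --times --group --owner \
          --devices --specials "$@"
}
```

For example, archive_copy -v dirA/ dirB/ should do the same work as rsync -av dirA/ dirB/. Hard links and ACLs would still need -H and -A added explicitly.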

Let's see an example of synchronizing a directory called dirA, which contains other directories and files, into another called dirB that does not exist yet:

$ rsync -av dirA/ dirB/
building file list ... done
created directory dirB
dirA/
dirA/file1
dirA/file2
dirA/dirA1/
dirA/dirA1/file3
dirA/dirA2/
dirA/dirA2/file4
sent 6540542 bytes  received 126 bytes  13081336.00 bytes/sec
total size is 6539349  speedup is 1.00

If we now modify just one of the files slightly and run exactly the same command again, we will see that this time only the modified file is copied:

$ echo test >> dirA/file1
$ rsync -av dirA/ dirB/
building file list ... done
file1
sent 65884 bytes  received 42 bytes  131852.00 bytes/sec
total size is 6539356  speedup is 99.19

However, we see that even though the file is only slightly different, rsync copies the entire file each time:

$ rm file2
$ rsync -av file1 file2
file1
sent 7626462 bytes  received 42 bytes  15253008.00 bytes/sec
total size is 7625445  speedup is 1.00
$ echo test >> file1
$ rsync -av file1 file2
file1
sent 7626469 bytes  received 42 bytes  15253022.00 bytes/sec
total size is 7625452  speedup is 1.00

This is not a defect in the algorithm. For local copies, rsync uses the -W option by default, because it assumes that computing the differences between the files takes more effort than simply copying the whole file:

-W, --whole-file copy files whole (without rsync algorithm)

If we override -W with --no-whole-file, we will see that it now copies only the block where it found the change:

$ echo test >> file1
$ rsync -av --no-whole-file file1 file2
building file list ... done
file1
sent 13514 bytes  received 16620 bytes  20089.33 bytes/sec
total size is 7625459  speedup is 253.05

And if we also add the -z option, rsync will compress the block before sending it:

$ echo test >> file1
$ rsync -avz --no-whole-file file1 file2
building file list ... done
file1
sent 843 bytes  received 16620 bytes  34926.00 bytes/sec
total size is 7625466  speedup is 436.66

Using the -z option can help or hurt, because the reduction in transferred data comes at the cost of higher CPU usage.

By the way, how does rsync decide that a file has changed? Normally it looks only at the file's modification time and size, so if neither changes, rsync will not copy the file by default. It is very rare for two files with the same modification time and size to differ, but it can happen. If that can occur in our environment, we will have to use the -c option so that rsync compares checksums to determine whether the files are really the same:

-c, --checksum skip based on checksum, not mod-time & size

But of course, this will also significantly increase CPU usage.
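
To make the difference concrete, the following sketch imitates the two comparison strategies with ordinary tools. It is only an illustration, not how rsync is implemented. The function names are made up for the example, stat -c assumes GNU coreutils, and cksum merely stands in for rsync's own checksum.

```shell
# Default quick check: two files count as unchanged when size and mtime match.
same_by_quick_check() {
    [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

# What -c does conceptually: compare a checksum of the contents instead.
same_by_checksum() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Build the rare problem case: same size, same timestamp, different content.
demo=$(mktemp -d)
printf 'aaaa\n' > "$demo/source"
printf 'bbbb\n' > "$demo/backup"
touch -r "$demo/source" "$demo/backup"   # copy the modification time

if same_by_quick_check "$demo/source" "$demo/backup"; then
    echo "quick check: files look identical"
fi
if ! same_by_checksum "$demo/source" "$demo/backup"; then
    echo "checksum: contents differ"
fi
```

In this situation a plain rsync -a from source to backup would skip the file, while adding -c would detect the difference and transfer it.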

The slash at the end of directory names

When passing directory names to rsync, we must pay special attention to whether or not we put a slash at the end of the name, since the two forms mean different things.
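
As a quick illustration of rsync's general behavior here: with a trailing slash on the source, rsync copies the contents of the directory, and without it rsync copies the directory itself into the destination. The sketch below shows both cases in a throwaway directory. The function name and the with_slash and without_slash names are hypothetical, and dirA is the example name used earlier.

```shell
# Hypothetical demo: copy the same directory with and without a trailing slash.
trailing_slash_demo() {
    demo=$(mktemp -d)
    mkdir "$demo/dirA"
    echo hello > "$demo/dirA/file1"

    # Trailing slash on the source: copy the *contents* of dirA.
    rsync -a "$demo/dirA/" "$demo/with_slash"

    # No trailing slash: copy the directory dirA itself into the destination.
    rsync -a "$demo/dirA" "$demo/without_slash"

    # Print the resulting file paths relative to the demo directory.
    (cd "$demo" && find with_slash without_slash -type f)
}
```

Calling trailing_slash_demo should list with_slash/file1 and without_slash/dirA/file1, which shows that only the second form created an extra dirA level in the destination.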