W3C Systems Team


Tests of backup-pool (a script that backs up the BackupPC files using rsync)

Rsync versions

We ran tests using 2 different versions of the rsync tool.

The latest stable version is 2.6.3; this is also the current version in Debian Sarge. We discovered a bug with that version, where the transfer hangs when the verbosity level is too high.
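
We did not pin this bug down precisely; the assumption below (untested) is that it only triggers above the single -v used in all our tests:

 # hypothetical illustration: -vv or more may trigger the hang with 2.6.3
 rsync -avvzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/
 # a single -v (via -avzRH) is what all the commands below use
 rsync -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/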

We also ran some tests using the latest development version (ie: 2.6.4-pre2) compiled from source. There were some improvements in this version, but I switched back to the stable version.

Transfer between hosts plugged into the same switch

The first time, with no files on the destination server (tony)

Backup from apu to tony (2 servers located on the same switch, using gigabit Ethernet controllers) using rsync 2.6.3 (the Debian Sarge version).
There are 35 GB of files on apu and none on tony.
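
For reference, here is what the options used throughout these tests do (as documented in the rsync manpage):

 -a             archive mode: recursion plus preservation of permissions, times, owner/group, symlinks and devices
 -v             verbose output
 -z             compress file data during the transfer
 -R             use relative path names (recreates backup/backuppc under the destination directory)
 -H             preserve hard links; essential here, since the BackupPC pool is built almost entirely out of hard links
 --delete       delete files on the destination that no longer exist on the source
 --numeric-ids  transfer numeric uid/gid instead of mapping user/group names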

apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --timeout 1000 /backup/backuppc root@tony:/backup/backup-apu/
(....)
backup/backuppc/pc/w3cstag5/3/fprofile-stagiaire/attrib => backup/backuppc/pc/w3cstag5/4/fprofile-stagiaire/attrib

sent 38151666336 bytes  received 17836400 bytes  2206010.85 bytes/sec
total size is 184659474425  speedup is 4.84

real    288m22.328s
user    100m34.107s
sys 15m2.230s
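
As a sanity check on these numbers: rsync's "speedup" is the total size of the files divided by the bytes actually exchanged on the wire, which matches here:

 184659474425 / (38151666336 + 17836400) ≈ 4.84

Almost everything had to be sent on this first pass, hence the low speedup; the incremental runs below reach speedups in the hundreds.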

rsyncing the changes

We ran rsync a 2nd time, as there were some changes during the night. We also ensured that the data would not change during the transfer by stopping the BackupPC process.

apu:~# /etc/init.d/backuppc stop
Stopping backuppc: ok.
apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --timeout 1000 /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done
io timeout after 1019 seconds - exiting
rsync error: timeout in data send/receive (code 30) at io.c(153)

real    97m43.888s
user    0m53.909s
sys 2m39.045s

rsyncing the changes (2nd attempt) without the rsync timeout option

It failed due to the rsync timeout option (which was set to 1000 s).
As the remote server (ie: tony) was building its file list, no data was transferred for more than 1000 s, so rsync aborted the transfer.
I had added this option when we encountered network timeouts between apu and louie, but that problem seems to have been resolved by using the ssh "BatchMode" option.

From the rsync manpage:

 --timeout=TIMEOUT
              This option allows you to set a maximum I/O timeout in  seconds.
              If no data is transferred for the specified time then rsync will
              exit. The default is 0, which means no timeout.
(...)
EXIT VALUES
       30     Timeout in data send/receive
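
Rather than dropping the option entirely, two alternatives come to mind (neither tested here): a timeout comfortably larger than the list-building phase, or ssh-level keepalives against the kind of network timeouts we originally saw between apu and louie. Sketches, assuming the same command as above:

 # a generous rsync timeout (here 6 hours) instead of none at all
 /usr/bin/rsync --timeout 21600 --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/

 # ssh keepalives (OpenSSH 3.8 or later); note that these keep the TCP
 # session alive but do not feed rsync's own I/O timer
 /usr/bin/rsync --rsh="ssh -o BatchMode=yes -o ServerAliveInterval=60" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/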

Trying again without the rsync timeout option.

apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done
(...)
backup/backuppc/cpool/6/7/7/6771af55e03a7e0c4214e933e4feacc4 => backup/backuppc/pc/mygale/46/f%2fhome/attrib

sent 263265988 bytes  received 31344 bytes  8788.74 bytes/sec
total size is 184721733847  speedup is 701.57

real    499m18.153s
user    1m21.650s
sys 2m55.245s

Yoohoo!! This time it seems to work!!
It took about twice as long, as rsync now computes the file list on both servers and then compares the 2 lists to see which files have changed.
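
To measure only the list-building and comparison work, a dry run should do, since --dry-run (supported by both rsync versions) performs everything except the actual data transfer. An untested sketch:

 time nice /usr/bin/rsync --dry-run --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/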

same test but with rsync 2.6.4pre3

This new version, released on March 15, 2005, is supposed to improve the handling of hard links and the "building file list" stage.

apu:~# time nice /usr/local/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --rsync-path=/usr/local/bin/rsync /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done
(...)
/backup/backuppc/cpool/1/9/7/1978ee2b2025caf24c964e353cba2029 => backup/backuppc/pc/yoda/59/f%2fusr%2flocal%2f/attrib

sent 1638186999 bytes  received 1001435 bytes  42561.40 bytes/sec
total size is 182296454549  speedup is 111.21
rsync error: some files could not be transferred (code 23) at main.c(780)

real    641m53.215s
user    5m16.200s
sys     3m55.995s

I'm surprised that it took more time than the previous test with rsync 2.6.3.
I see 2 possible causes:

  1. As I ran BackupPC just before, the number of files is higher, so the "building file list" stage takes more time (I don't think the transfer speed is the cause: we only transferred a few files, which was already the case in the previous test, and it was done on a very fast network). Possible, but the difference is quite large.
  2. This version of rsync is simply slower than the previous one. To test that, I'm going to run the same command with each version of rsync, ensuring there is no modification of the source and remote files (see the next test, and the quick check sketched after this list).
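
A quick way to verify that nothing changed between the two runs (untested sketch, with BackupPC stopped on apu) is to count the entries on each side before and after:

 find /backup/backuppc | wc -l
 ssh -o BatchMode=yes root@tony 'find /backup/backup-apu | wc -l'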

Speed comparison of the "building file list" stage between rsync 2.6.3 & 2.6.4pre3

with rsync 2.6.4pre3

apu:~# time nice /usr/local/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --rsync-path=/usr/local/bin/rsync /backup/backuppc root@tony:/backup/backup-apu/
(...)
/backup/backuppc/pc/yoda/59/f%2fusr%2flocal%2f/fshare/fxml/

sent 231277621 bytes  received 54308321 bytes  5210.61 bytes/sec
total size is 182296454549  speedup is 638.32

real    913m28.074s
user    1m25.535s
sys     3m14.011s

with rsync 2.6.3

apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done

sent 175147810 bytes  received 20 bytes  6014.59 bytes/sec
total size is 182296454549  speedup is 1040.81

real    485m19.787s
user    0m59.184s
sys     2m38.756s

It's faster with the old version of rsync, but I'm not sure I correctly understand the "sent xxxxxxx bytes received xxxxxx bytes" figures. I have a hard time believing that the new version, which claims improved hard-link handling, is actually a lot slower. Definitely more tests are required.
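
The --stats option (available in both versions) would break these figures down further (file list size, literal vs. matched data, number of files actually transferred). An untested sketch:

 time nice /usr/bin/rsync --stats --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/

As far as I understand the totals, "sent" counts everything written towards the remote side and "received" everything coming back, so the ~54 MB received by 2.6.4pre3 against 20 bytes for 2.6.3 probably reflects a difference in the protocol rather than extra file data.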

Transfer between hosts in the W3C network and the INRIA "Externe" network

The first time, with no files on the destination server (louie)

With no files on the remote server (ie: louie.w3.org), this test completed successfully in approximately 27 hours.
This is due to the ever-increasing number of files, and also to the slow network connection between these 2 servers (linked by two 10 Mbit/s routers doing QoS, limiting our bandwidth to only 4 Mbit/s).
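
A rough sanity check, assuming the 4 Mbit/s ceiling is actually reached:

 4 Mbit/s ≈ 0.5 MB/s
 27 h ≈ 97200 s
 0.5 MB/s × 97200 s ≈ 48 GB

so at most about 48 GB can cross the link in that time; given that the full transfer to tony already sent ~38 GB, the network is clearly the bottleneck here.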

rsyncing the changes

apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@louie:/
(...)
backup/backuppc/cpool/0/4/a/04a5e661255bd96e99679542ab6205f6 => backup/backuppc/pc/tony/55/f%2ffilez/attrib

sent 1544910085 bytes  received 305984 bytes  25448.85 bytes/sec
total size is 207814247858  speedup is 134.49

real    1011m58.010s
user    4m46.761s
sys     3m40.980s

To complete all my previous tests I had to stop BackupPC, so between the full transfer and this test there were 6 days of changes applied to the BackupPC files.
This test ran successfully in approximately 17 hours.

Previous tests

[[
louie:~# time nice rsync -avzRH --delete --numeric-ids /backup/backuppc/ root@w3c4-bis.w3.org:/u/
building file list ... done

sent 147238698 bytes  received 20 bytes  6262.41 bytes/sec
total size is 127215093253  speedup is 864.01

real  391m51.037s
user  0m58.902s
sys   16m44.631s
louie:~#
]]
# Stats from louie to w3c4-bis (both plugged into the same 100 Mbit/s switch in the 'externe' network).
# 1- full rsync of /backup              ->      11h20
# 2- another rsync where the data exists on both sides and is identical  ->      5h30

# Stats from apu to louie (apu is plugged into our W3C network, louie into the INRIA 'externe' network).
# 1- full rsync of /backup              ->      25 hours

W3C Europe Systems Team <team-weu-system@w3.org>
$Id: RSync-tests.html,v 1.6 2005/03/24 15:45:42 vivien Exp $