We ran tests using two different versions of the rsync tool.
The latest stable version is 2.6.3, which is also the current version in Debian Sarge. We discovered a bug in that version: the transfer hangs when the verbosity level is too high.
We ran some tests using the latest version (i.e. 2.6.4-pre2) compiled from source. There were some improvements in this version, but I switched back to the stable version.
Backup from apu to tony (two servers connected to the same switch, both with gigabit Ethernet controllers) using rsync 2.6.3 (Debian Sarge version).
There are 35 GB of files on apu and none on tony.
apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --timeout 1000 /backup/backuppc root@tony:/backup/backup-apu/
(....)
backup/backuppc/pc/w3cstag5/3/fprofile-stagiaire/attrib => backup/backuppc/pc/w3cstag5/4/fprofile-stagiaire/attrib
sent 38151666336 bytes received 17836400 bytes 2206010.85 bytes/sec
total size is 184659474425 speedup is 4.84
real 288m22.328s
user 100m34.107s
sys 15m2.230s
We ran rsync a second time because some files had changed during the night. This time we made sure the data would not change during the transfer by stopping the backuppc process first.
apu:~# /etc/init.d/backuppc stop
Stopping backuppc: ok.
apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --timeout 1000 /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done
io timeout after 1019 seconds - exiting
rsync error: timeout in data send/receive (code 30) at io.c(153)
real 97m43.888s
user 0m53.909s
sys 2m39.045s
It failed due to the rsync timeout option (which was set to 1000s).
While the remote server (i.e. tony) was building its file list, no data was transferred for more than 1000s, so rsync aborted the transfer.
I had started using this option when we encountered network timeouts between apu and louie, but that problem seems to have been resolved by the ssh "BatchMode" option.
From the rsync manpage:
--timeout=TIMEOUT
This option allows you to set a maximum I/O timeout in seconds.
If no data is transferred for the specified time then rsync will
exit. The default is 0, which means no timeout.
(...)
EXIT VALUES
30 Timeout in data send/receive
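A wrapper script around these backups could tell the timeout apart from other failures by looking at the exit code. The following is only a minimal sketch: the helper names `describe_exit` and `build_rsync_command` are hypothetical, the table lists only the exit codes that appear on this page, and nothing here actually runs rsync.

```python
# Exit codes seen in the transcripts on this page, with the wording
# used in the rsync manpage. Other codes exist and are not listed here.
RSYNC_EXIT_CODES = {
    0: "Success",
    23: "Partial transfer due to error",
    30: "Timeout in data send/receive",
}


def describe_exit(code):
    """Return a readable message for an rsync exit code (hypothetical helper)."""
    return RSYNC_EXIT_CODES.get(code, "Unlisted exit code %d" % code)


def build_rsync_command(src, dest, timeout=0):
    """Build the argument list used in these tests (hypothetical helper).

    timeout=0 leaves out --timeout entirely, which matches the rsync
    default of "no timeout".
    """
    cmd = [
        "/usr/bin/rsync",
        "--rsh=ssh -o BatchMode=yes",
        "-avzRH",
        "--delete",
        "--numeric-ids",
    ]
    if timeout > 0:
        cmd.append("--timeout=%d" % timeout)
    cmd.extend([src, dest])
    return cmd


if __name__ == "__main__":
    print(describe_exit(30))
    print(build_rsync_command("/backup/backuppc", "root@tony:/backup/backup-apu/"))
```

Passing the resulting list to something like `subprocess.call` and checking for a return value of 30 would show that the run died during a long quiet phase (such as the remote file list being built) rather than from a real transfer error.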
Trying again without the rsync timeout option.
apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done
(...)
backup/backuppc/cpool/6/7/7/6771af55e03a7e0c4214e933e4feacc4 => backup/backuppc/pc/mygale/46/f%2fhome/attrib
sent 263265988 bytes received 31344 bytes 8788.74 bytes/sec
total size is 184721733847 speedup is 701.57
real 499m18.153s
user 1m21.650s
sys 2m55.245s
Yoohoo !! This time it seems to work !!
It takes nearly twice as long as the first run (about 499 minutes versus 288), because rsync now builds the file list on both servers and then compares the two lists to see which files have changed.
The new version (the one compiled from source and installed in /usr/local/bin), released on March 15, 2005, is supposed to improve the handling of hard links and the "building file list" stage.
apu:~# time nice /usr/local/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --rsync-path=/usr/local/bin/rsync /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done
(...)
/backup/backuppc/cpool/1/9/7/1978ee2b2025caf24c964e353cba2029 => backup/backuppc/pc/yoda/59/f%2fusr%2flocal%2f/attrib
sent 1638186999 bytes received 1001435 bytes 42561.40 bytes/sec
total size is 182296454549 speedup is 111.21
rsync error: some files could not be transferred (code 23) at main.c(780)
real 641m53.215s
user 5m16.200s
sys 3m55.995s
I'm surprised that it takes longer than the previous test with rsync 2.6.3.
I see 2 possible causes:
apu:~# time nice /usr/local/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids --rsync-path=/usr/local/bin/rsync /backup/backuppc root@tony:/backup/backup-apu/
(...)
/backup/backuppc/pc/yoda/59/f%2fusr%2flocal%2f/fshare/fxml/
sent 231277621 bytes received 54308321 bytes 5210.61 bytes/sec
total size is 182296454549 speedup is 638.32
real 913m28.074s
user 1m25.535s
sys 3m14.011s
apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@tony:/backup/backup-apu/
building file list ... done
sent 175147810 bytes received 20 bytes 6014.59 bytes/sec
total size is 182296454549 speedup is 1040.81
real 485m19.787s
user 0m59.184s
sys 2m38.756s
The old version of rsync is faster, but I'm not sure I correctly understand the "sent xxxxxxx bytes received xxxxxx bytes" line. I have a hard time believing that the new version, which claims improved hard-link handling, is actually a lot slower. More tests are definitely required.
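As far as I can tell from the numbers on this page, the summary line is self-consistent: "speedup" is the total size divided by the bytes sent plus the bytes received, and "bytes/sec" is that same byte count divided by the elapsed wall-clock time. The sketch below checks this against the last rsync 2.6.3 run. The function names `parse_summary` and `derived` are hypothetical, and the regular expression only handles the plain-digit format shown in these transcripts.

```python
import re

# Matches the two summary lines printed at the end of an rsync -v run,
# in the format shown on this page (plain digits, no thousands separators).
SUMMARY_RE = re.compile(
    r"sent\s+(\d+)\s+bytes\s+received\s+(\d+)\s+bytes\s+([\d.]+)\s+bytes/sec\s+"
    r"total size is\s+(\d+)\s+speedup is\s+([\d.]+)"
)


def parse_summary(text):
    """Extract the raw figures from an rsync summary (hypothetical helper)."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no rsync summary found")
    sent, received, rate, total, speedup = match.groups()
    return {
        "sent": int(sent),
        "received": int(received),
        "rate": float(rate),
        "total_size": int(total),
        "speedup": float(speedup),
    }


def derived(stats):
    """Recompute speedup and elapsed time from the raw figures."""
    wire_bytes = stats["sent"] + stats["received"]
    return {
        "wire_bytes": wire_bytes,
        "speedup": stats["total_size"] / wire_bytes,
        "elapsed_seconds": wire_bytes / stats["rate"],
    }


# Summary of the last rsync 2.6.3 run from apu to tony, copied from this page.
sample = (
    "sent 175147810 bytes received 20 bytes 6014.59 bytes/sec "
    "total size is 182296454549 speedup is 1040.81"
)

if __name__ == "__main__":
    stats = parse_summary(sample)
    print(derived(stats))
```

For that run the recomputed elapsed time is within a second or so of the "real 485m19.787s" reported by `time`, and the recomputed speedup rounds to the printed 1040.81. A low bytes/sec figure on an incremental run therefore mostly means that little data had to cross the wire during a long run, not that the network itself was slow.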
With no files on the remote server (i.e. louie.w3.org), this test completed successfully in approximately 27 hours.
This is due to the ever-increasing number of files and to the slow network connection between these two servers (linked by two 10 Mbit/s routers doing QoS, which limits our bandwidth to only 4 Mbit/s).
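A rough lower bound shows how much of that time the link alone accounts for. This is only a sketch under stated assumptions: the payload is taken to be the "sent" figure from the first full sync to tony (the volume actually sent to louie is not recorded on this page), the full 4 Mbit/s is assumed to be available for the whole run, and `min_transfer_hours` is a hypothetical helper.

```python
def min_transfer_hours(payload_bytes, link_mbit_per_s):
    """Hours needed to push payload_bytes through a link of the given speed.

    Ignores protocol overhead, file-list building and disk I/O, so the
    result is a lower bound rather than a prediction.
    """
    bytes_per_second = link_mbit_per_s * 1000000 / 8
    return payload_bytes / bytes_per_second / 3600


# Assumption: same on-the-wire volume as the first full sync to tony.
FULL_SYNC_BYTES = 38151666336
LINK_MBIT_PER_S = 4

if __name__ == "__main__":
    hours = min_transfer_hours(FULL_SYNC_BYTES, LINK_MBIT_PER_S)
    print("at least %.1f hours" % hours)
```

Under these assumptions the link alone needs roughly 21 hours, which would leave only a few hours of the measured run for building and comparing file lists.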
apu:~# time nice /usr/bin/rsync --rsh="ssh -o BatchMode=yes" -avzRH --delete --numeric-ids /backup/backuppc root@louie:/
(...)
backup/backuppc/cpool/0/4/a/04a5e661255bd96e99679542ab6205f6 => backup/backuppc/pc/tony/55/f%2ffilez/attrib
sent 1544910085 bytes received 305984 bytes 25448.85 bytes/sec
total size is 207814247858 speedup is 134.49
real 1011m58.010s
user 4m46.761s
sys 3m40.980s
To complete all my previous tests I had to stop backuppc, so six days of changes had accumulated in the backuppc files between the full transfer and this test.
This test ran successfully in approximately 17 hours.
louie:~# time nice rsync -avzRH --delete --numeric-ids /backup/backuppc/ root@w3c4-bis.w3.org:/u/
building file list ... done
sent 147238698 bytes received 20 bytes 6262.41 bytes/sec
total size is 127215093253 speedup is 864.01
real 391m51.037s
user 0m58.902s
sys 16m44.631s
louie:~#
# Stats from louie to w3c4-bis (both plugged into the same 100 Mbit/s switch in the 'externe' network).
# 1- full rsync of /backup -> 11h20
# 2- another rsync where the data exists on both sides and is identical -> 5h30
# Stats from apu to louie (apu is plugged into our W3C network, louie is plugged into the INRIA 'externe' network).
# 1- full rsync of /backup -> 25 hours