I have been using an rsync script to synchronize data on one host with the data on another host. The data consists of numerous small files that add up to almost 1.2 TB.
To sync those files, I have been using the following rsync command:
rsync -avzm --stats --human-readable --include-from proj.lst /data/projects REMOTEHOST:/data/
The contents of proj.lst are as follows:
+ proj1
+ proj1/*
+ proj1/*/*
+ proj1/*/*/*.tar
+ proj1/*/*/*.pdf
+ proj2
+ proj2/*
+ proj2/*/*
+ proj2/*/*/*.tar
+ proj2/*/*/*.pdf
...
...
...
- *
As a test, I picked two of those projects (8.5 GB of data) and executed the command above. Being a sequential process, it took 14 minutes 58 seconds to complete. So, for 1.2 TB of data, it would take several hours.
If I could run multiple rsync processes in parallel (using &, xargs, or parallel), it would save me time.
I tried the following command with parallel (after cd'ing to the source directory), and it took 12 minutes 37 seconds to execute:
parallel --will-cite -j 5 rsync -avzm --stats --human-readable {} REMOTEHOST:/data/ ::: .
This should have taken about a fifth of the time, but it didn't. I think I'm going wrong somewhere.
How can I run multiple rsync processes in order to reduce the execution time?
Running rsyncs in parallel is the primary focus now.