22 April, 2015

Use rsync instead of scp to resume copying big data files!

For my dissertation I am conducting experiments on big data sources, such as 10k time series with 100k+ records each. The corresponding files comprise several gigabytes of data. Copying such files may take very long, since I work from a remote location, not sitting next to the data centers where the data is to be processed. Therefore, I need to be able to resume big data file uploads to the machines of the data centers.

I usually use scp to copy files between machines:
scp data/*.csv
Unfortunately, scp can't resume any previous file transfers. However, you can use rsync with ssh to be able to resume:
rsync --rsh='ssh' -av --progress --partial data/*.csv \ 
If you cancel the upload, e.g., via CTRL+C, yo can later resume the upload using the --partial option for rsync.

Very simple. No GUI tools required. Ready for automation.

No comments: