
I am copying files from a remote server to my local server using the scp command below. I just type it in the terminal and it starts copying.

scp -r user@machineA:/data/process/* /data/process/

The remote server has around 100 files, each around 11 GB, and the above command copies them one at a time. Is there any way I can copy 5 files at a time in parallel, with a command I can run directly in the terminal?

I also have GNU parallel installed, but I'm not sure how to use it here to copy files in parallel directly from the terminal. If there is another way, I am open to that as well.

david
    You're quite possibly disk or network bound, in which case running multiple copies in parallel will gain you nothing. Have you done the maths to see if it's worth it? – roaima Dec 14 '17 at 23:20
  • Yes, it will help me for sure. And there is no harm in trying it out, since we do this very rarely, and whenever we do it takes time, so I want to experiment and see whether this parallel copy helps or not. If it doesn't help then we won't use it, but if it helps then it's a plus for us. – david Dec 14 '17 at 23:21
  • Are they text files or binary data? – roaima Dec 14 '17 at 23:22
  • They all are binary data, mostly memory mapped files. – david Dec 14 '17 at 23:23
  • Would you want to parallelise one file each from five servers, or five files from one server? – roaima Dec 15 '17 at 00:03
  • My command above copies from one remote server (machineA), and we have around 100 files to copy, so five files in parallel from that remote server, continuing until all the files are copied successfully to the local server. – david Dec 15 '17 at 00:06
  • It is true that there is no harm in trying, but it would be interesting to find out where the bottleneck actually is. How much bandwidth is left unused during the transfer? What speed are you achieving in MB/s? Do you have an SSD on the receiving box? What's the sustained write speed? – simlev Dec 15 '17 at 09:47
  • Yes we do have SSD on the local server. – david Dec 15 '17 at 21:07

1 Answer


Here's the command to be run on the remote server, using find and parallel:

find /data/process/ -type f | parallel scp {} user@machineB:/data/process/

Edit:

See the documentation on how to control the number of jobs to be executed in parallel.

The number of concurrent jobs is given with --jobs or the equivalent -j.
By default --jobs is the same as the number of CPU cores.
--jobs 0 will run as many jobs in parallel as possible.
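Putting that together for this case, here is a sketch of the command above with concurrency capped at the 5 transfers the question asks for (the paths, usernames, and hostnames are the asker's own, so adjust as needed):

```shell
# Run on machineA: find the files and hand them to GNU parallel,
# which keeps at most 5 scp transfers running at any one time.
find /data/process/ -type f | parallel --jobs 5 scp {} user@machineB:/data/process/

# To preview the generated scp commands without executing them:
find /data/process/ -type f | parallel --dry-run --jobs 5 scp {} user@machineB:/data/process/
```

The `--dry-run` preview is a cheap way to confirm the file list and destination before committing to 1.1 TB of transfers.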

Edit:

That should really be a separate question, and it has already been asked and answered: how do you run a command on a remote machine?

ssh user@machineA 'find /data/process/ -type f | parallel scp {} user@machineB:/data/process/'
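If you would rather drive the whole transfer from the local server (pulling rather than pushing), a sketch under the same assumptions is to list the remote files over ssh and have a local GNU parallel pull them with scp:

```shell
# Run on the local server: ssh prints the remote file paths, and parallel
# pulls 5 files at a time from machineA into the local directory.
ssh user@machineA 'find /data/process/ -type f' \
    | parallel -j 5 scp user@machineA:{} /data/process/
```

Note that each scp opens its own connection, so this works best with ssh keys or an agent; otherwise you will be prompted for a password per transfer.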
simlev
  • I need to run this command on the local server where I am copying to, right? And how many files will it copy? I don't see a number to control that. And what is `9001` here? – david Dec 15 '17 at 06:43
  • This command is to be run on the remote server containing the files to be copied. It will copy all files found by `find`, there's no limit on the total number. Ignore the `9001` as it is a leftover from a copy/paste. – simlev Dec 15 '17 at 06:46
  • If I have around 100 files, will it start copying all 100 in parallel? It would be great if I could control the number of parallel copies. Is there no way to run the command on the local server instead? – david Dec 15 '17 at 06:47
  • `--jobs` controls the number of `scp` processes running in parallel. It defaults to the number of CPU cores. – Ole Tange Dec 15 '17 at 09:35