
I have some files stored on a virtual machine that I'm downloading onto my PC. There are approximately 1 million files and I have been using the following command:

scp -r vm_user@IP:/home/vm_user/path_to_files /Users/documents

As you can imagine, this is slow as it downloads the files one by one. Are there quicker alternatives that can download the files asynchronously or apply concurrency to the downloads to increase download speed?

dollar bill

1 Answer


As you can imagine, this is slow as it downloads the files one by one.

Define "one by one": that's a single connection, with nothing to re-establish after each file. (By the way, although the program is called `scp`, the protocol used is almost certainly not SCP but SFTP, which is more modern.)

Note that SFTP already uses request queuing in all implementations I'm aware of, so there's no "dead time" between finishing one file's data / name / attribute transfer and starting the next.

Are there quicker alternatives that can download the files asynchronously

What would "asynchronous" mean in this context? Waiting for a thing to finish in the background is by no means faster than waiting for it blockingly.

or apply concurrency to the downloads to increase download speed?

Concurrency does not in itself increase download speed at all. On the contrary, it increases overhead, and potentially causes file system fragmentation on the receiver side and seek times / cache invalidation on the transmitter side.

Where concurrency does help is when, for example, a web server limits per-connection speed; then you're circumventing an artificial limit. I don't think you're limited artificially per connection here.
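For completeness, if you really were in that throttled-per-connection situation, concurrent transfers can be sketched with `xargs -P`. This is a hypothetical illustration only: the host `vm_user@IP` and the paths are placeholders from the question, and it opens one `scp` connection per file, which is exactly the overhead argued against above.

```shell
# Hypothetical sketch: list the remote files, then run up to 4 scp
# transfers concurrently (one connection per file). Only worthwhile
# if the server caps throughput per connection.
ssh vm_user@IP 'find /home/vm_user/path_to_files -type f' > filelist.txt
xargs -P 4 -I{} scp vm_user@IP:{} /Users/documents/ < filelist.txt
```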

Note that the assumption here is that these files are all small, a couple of network buffers in size. If that's not the case, SSH's internal buffer architecture, as offered to higher layers (SFTP), limits your throughput; in that case, just use rsync instead, or something like `ssh user@host 'tar --zstd -cf - folder/to/be/sent' | tar --zstd -xf -`. (Both options are still sequential, as sequentiality is not your problem.)

Marcus Müller
  • One by one, as in: it downloads the first file in the list, and once that completes it downloads the next in line, and so forth. Are there any methods to download multiple files at once? Or what do you suggest is the best approach in this case? – dollar bill Feb 24 '22 at 10:48
  • nothing's better than that; the solution is to do nothing. It's not slow because downloads are sequential, it's slow because that's a lot of files and presumably a lot of data. Can't fix that. You can try to instead use `tar` to create a compressed (e.g. `--zstd`) image and transfer it as data stream instead of files, but again, I doubt it's going to be much faster. – Marcus Müller Feb 24 '22 at 10:50
  • Are there other programming languages that can do this much quicker which can access the virtual machine? – dollar bill Feb 24 '22 at 11:05
  • Could you elaborate on what you mean? I gave an example of what works, and I mentioned rsync; both can be invoked from the programming language of your choice. – Marcus Müller Feb 24 '22 at 11:35