
How can I combine multiple existing zip files into a single file? I'm asking because I have many PDFs to zip, and I want to use `parallel` to save time zipping them all up (as in `ls ./*.pdf | parallel "zip {}.zip {}"`). This creates many zip files, but I want to combine them into a single file. I'll then send all these PDFs as one zip file (instead of as many files).

I've heard that two concatenated zip files form a single valid zip file, but in my testing I haven't been able to unzip a big concatenated zip file back into multiple files. As far as I can tell, for the most part only the first file gets extracted.

I've considered tarring all my zip files into a single bigfile.zip.tar, but I'd also prefer to avoid this method if possible because I don't think the standard decompression tools my recipient has access to will make it easy to restore a file like that back into many individual PDFs.

mttpgn
  • What's the requirement on the other end? Can they use https://unix.stackexchange.com/q/4367/117549 or do you need all of the native PDF files as-is in a top-level zip file? – Jeff Schaller Mar 03 '20 at 19:37
  • There's also https://unix.stackexchange.com/questions/151644/how-should-i-combine-many-compressed-files-into-one-archive?rq=1, if it helps – Jeff Schaller Mar 03 '20 at 19:38
  • Also: why not `zip all.zip *.pdf` and be done with it? – Jeff Schaller Mar 03 '20 at 19:39
  • @JeffSchaller That's what we're doing now. Just want to speed things up a bit. – mttpgn Mar 03 '20 at 21:12
  • [off topic] for your example, you can avoid the `ls` with `parallel "zip {}.zip {}" ::: *.pdf` – Larry Apr 11 '20 at 19:59
  • I think you're trying to get zip to act like a database: you're asking multiple zip processes running at the same time to add PDF files to the same zip file, like adding rows to a database in parallel. Zip doesn't have any mechanism (nor does tar) to reserve an unknown amount of disk space in a zip file for one PDF so that it doesn't interfere with another. Even if zip files could be concatenated, you'd either have to concatenate them one at a time, obviating the need for GNU parallel, or have zip allocate the unknown amount of disk space for each PDF. – Larry Apr 11 '20 at 20:04

2 Answers


I don't think you can concatenate ZIP files easily.

How about gzip files? Unlike ZIP files, they can be concatenated easily, as shown here: https://stackoverflow.com/questions/8005114/fast-concatenation-of-multiple-gzip-files. You can also use a parallel gzip implementation called pigz: https://zlib.net/pigz/. However, it is difficult to extract a single file from such a concatenated gzip file, as stated in the answer at the first link.
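
To illustrate both points, here is a minimal sketch. The file names and the scratch directory are made up for the demo. It compresses two files separately, concatenates the results, and decompresses the combined file. The output is a single stream with both contents run together, which is why pulling one original file back out is awkward.

```shell
# Scratch directory for the demo (hypothetical file names throughout).
tmp=$(mktemp -d)

# Two small stand-in input files.
printf 'first\n'  > "$tmp/a.txt"
printf 'second\n' > "$tmp/b.txt"

# Compress each file on its own; this is the step that could run in parallel.
gzip -c "$tmp/a.txt" > "$tmp/a.gz"
gzip -c "$tmp/b.txt" > "$tmp/b.gz"

# Plain concatenation gives a valid multi-member gzip file.
cat "$tmp/a.gz" "$tmp/b.gz" > "$tmp/both.gz"

# Decompressing yields one continuous stream; the original
# file boundaries and names are not restored.
gunzip -c "$tmp/both.gz" > "$tmp/out.txt"
```

Here `out.txt` holds the two inputs back to back. For PDFs that means one blob rather than separate documents.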

Are you sure your recipients are not able to decompress tar files? Most widely used compression and decompression programs now support tar archives. There is also an interesting option that combines tar with pbzip2, which lets you take advantage of pbzip2's parallel processing: https://linuxconfig.org/how-to-perform-a-faster-data-compression-with-pbzip2.
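
A rough sketch of that tar-plus-parallel-compressor pattern follows. To keep it runnable on most systems I have assumed pigz with a fallback to plain gzip, rather than pbzip2. pbzip2 should slot into the same pipeline and produce a `.tar.bz2` instead. The file names are placeholders.

```shell
# Scratch directory with two stand-in "PDF" files (hypothetical names).
tmp=$(mktemp -d)
printf '%%PDF-1.4 dummy one\n' > "$tmp/one.pdf"
printf '%%PDF-1.4 dummy two\n' > "$tmp/two.pdf"

# Use the parallel compressor if it is installed, otherwise plain gzip.
if command -v pigz >/dev/null 2>&1; then
    comp=pigz
else
    comp=gzip
fi

# tar bundles the files and keeps their names; the compressor
# then works on the single stream it receives on stdin.
tar -C "$tmp" -cf - one.pdf two.pdf | "$comp" -c > "$tmp/pdfs.tar.gz"

# Listing the archive shows the individual members are still there.
listing=$(tar -tzf "$tmp/pdfs.tar.gz" | sort)
echo "$listing"
```

Unlike concatenated gzip files, the tar layer keeps each PDF as a named member, so the recipient gets individual files back on extraction.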

nobody
  • You'd still have to add gzip'd PDFs one at a time to the gzip, and you'd end up with a bunch of concatenated PDFs that aren't very likely to be viewable as PDFs. Even if you used parallel compression it's not addressing the issue of adding multiple PDFs to the same zip/gzip/bzip2 file simultaneously. I don't think pixz (https://github.com/vasi/pixz) would be able to do that either with each PDF being a separate stream in the xz file, added simultaneously. – Larry Apr 11 '20 at 20:16

You basically want to concatenate zip files. The zip file format does not support that, but it does support storing files without compression. That may make sense for PDF files, which are often already compressed.

So use -0:

zip -0 my.zip *pdf
Ole Tange