75

How do I break a large file (over 4GB) into smaller files of about 500MB each?

And how do I re-assemble them again to get the original file?

Gilles 'SO- stop being evil'
Stefan

2 Answers

92

You can use split and cat.

For example, something like:

$ split --bytes 500M --numeric-suffixes --suffix-length=3 foo foo.

(where the input filename is foo and the last argument is the output prefix). This will create files like foo.000 foo.001 ...

The same command with short options:

$ split -b 500M -d -a 3 foo foo.

You can also specify "--line-bytes" if you want each piece to hold only whole lines, up to the given size, instead of being cut at an exact byte count.
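To illustrate the line-boundary behaviour, here is a minimal sketch. It assumes GNU split; the file name lines.txt, the prefix "lines." and the 25-byte limit are made up for the demonstration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: five lines of 10 bytes each (9 characters plus newline).
printf 'line %04d\n' 1 2 3 4 5 > lines.txt

# At most 25 bytes of whole lines per piece, so only two lines (20 bytes) fit.
split --line-bytes=25 --numeric-suffixes --suffix-length=3 lines.txt lines.

# Show the size of each piece in bytes.
wc -c lines.0*
```

No line is cut in half: the pieces are smaller than the limit rather than filled up to it.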

To re-assemble the generated pieces, you can use, for example:

$ cat foo.* > foo_2

(assuming that the shell sorts the results of globbing, and that the number of parts does not exceed the system-dependent limit on command-line arguments)

You can compare the result via:

$ cmp foo foo_2
$ echo $?

(which should output 0)
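The whole round trip can be tried on a small scale first. The following is only a sketch: it assumes GNU split, and it uses a 2500-byte random file and 1000-byte pieces as stand-ins for the real sizes.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the large file: 2500 bytes of random data.
head -c 2500 /dev/urandom > foo

# Split into pieces of at most 1000 bytes: foo.000, foo.001, foo.002.
split --bytes 1000 --numeric-suffixes --suffix-length=3 foo foo.

# Re-assemble the pieces; foo_2 does not match the glob foo.* itself.
cat foo.* > foo_2

# cmp prints nothing and exits with status 0 when the two files are identical.
cmp foo foo_2
status=$?
echo "cmp exit status: $status"
```

The last piece is simply whatever is left over, so it is usually smaller than the others.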

Alternatively, you can use a combination of find/sort/xargs to re-assemble the pieces:

$ find . -maxdepth 1 -type f -name 'foo.*' | sort | xargs cat > foo_3
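Another way around the argument limit, not part of the commands above, is a plain shell loop. This is a sketch under the same assumption that the shell sorts the glob results; the three small pieces are made up for the demonstration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical pieces standing in for the output of split.
printf 'first ' > foo.000
printf 'second ' > foo.001
printf 'third\n' > foo.002

# The loop is run by the shell itself, so the piece names are never handed
# to a single command as one long argument list; cat runs once per piece.
for piece in foo.*; do
    cat "$piece"
done > foo_3

cat foo_3
```

The redirection is attached to the whole loop, so foo_3 is opened once and every cat writes to it in turn.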
rogerdpack
maxschlepzig
  • 2
    Try this command: `man split cat md5sum` – Kevin M Sep 04 '10 at 19:13
  • 6
    When assembling, I recommend `cat foo.{000..NNN}` where `NNN` is the last expected piece. That way you get an error message if one of the pieces is missing. But note that `-d` to get numeric suffixes is specific to GNU split; on other platforms you have to make do with `foo.aaa`, `foo.aab`, etc. – Gilles 'SO- stop being evil' Oct 17 '10 at 11:16
  • 2
    And bear in mind that, for `split`, KB = 1000, K = 1024, MB = 1000*1000, M = 1024*1024 etc. – Zorawar Nov 29 '12 at 18:05
  • 1
    Shouldn't this `... cat > foo_3` be `... cat >>foo_3`? – alk Jul 08 '15 at 12:10
  • @alk, no, it should not. The part that `xargs` sees as arguments (and thus potentially forks/execs multiple times) is `cat`. The part `> foo_3` is interpreted by the shell (the shell creates the output redirection for the xargs process). Thus, everything is ok. – maxschlepzig Jul 08 '15 at 16:09
  • Ah yes, sure. Temporary brain lapse ... sorry. – alk Jul 08 '15 at 16:22
  • 1
    If you decide to ease the pain by using a utility, `rar` and `7zip` are often used to make such splits easier to reassemble cross-platform – infixed Jun 03 '16 at 18:05
4

You can also do this with Archive Manager if you prefer a GUI. Look under 'Save->Other Options->Split into volumes of'.