43

I have a big file and need to split it into two files. Suppose the first 1000 lines should be moved into another file and then deleted from the first file.

I tried using split, but it creates multiple chunks.

don_crissti
Aravind

4 Answers

60

The easiest way is probably to use head and tail:

$ head -n 1000 input-file > output1
$ tail -n +1001 input-file > output2

That will put the first 1000 lines of input-file into output1, and all lines from 1001 to the end into output2.
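The question also asks for the extracted lines to be deleted from the original file. A minimal sketch of that last step, assuming a sample file generated with seq and the hypothetical names big.txt and first1000.txt: write the remainder to a temporary file, then rename it over the original.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: 2500 numbered lines (a stand-in for the real big file).
seq 1 2500 > big.txt

# Copy the first 1000 lines out, then keep only the rest in big.txt.
# The temporary file is needed because tail cannot safely write to
# the same file it is still reading.
head -n 1000 big.txt > first1000.txt
tail -n +1001 big.txt > big.txt.tmp && mv big.txt.tmp big.txt

wc -l < first1000.txt   # line count of the extracted piece
wc -l < big.txt         # line count left in the original
```

The rename only happens if tail succeeds, so a failure partway through leaves the original file untouched.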

Michael Mrozek
29

I think that split is your best approach.

Try using the -l xxxx option, where xxxx is the number of lines you want in each file (default is 1000).

You can use the -n yy option if you are more concerned about the number of files created. Using -n 2 will split your file into only 2 parts, no matter how many lines end up in each file. (In GNU split, -n 2 divides by size and can cut a line in half; -n l/2 keeps lines whole.)

You can count the number of lines in your file with wc -l filename. wc is the word count command, and -l makes it count lines.
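As a rough illustration of the -l option, assuming a 2500-line sample file created with seq and the default output names (xaa, xab, ...), the following shows how a fixed line count per piece yields more than two files:

```shell
# Work in a scratch directory so the x?? pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: 2500 numbered lines.
seq 1 2500 > bigfile

# Count the lines first.
wc -l < bigfile

# Split into pieces of at most 1000 lines each.
split -l 1000 bigfile

# Three pieces result: xaa and xab (1000 lines each) and xac (the rest).
ls x??
wc -l < xac
```

This is the "multiple chunks" behaviour described in the question: split has no notion of "one piece of 1000 lines plus everything else".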

References

  • man split
  • man wc
slm
Lucien Raven
  • 1
    This is how to split into a bunch of files with a fixed number of lines, or how to split evenly into a fixed number of files. Is there a way to split into one 1000-line file and one file with everything else? That's what he was asking for; I couldn't find it in the man page – Michael Mrozek Oct 21 '14 at 17:05
  • You're correct, Michael. I think I took a simplistic view of the question. Your solution is the best one in this case. Another way would be to use the 'sed' command: sed -n '1,1000p' originalfile > first_1000_lines; sed '1,1000d' originalfile > remaining_lines. – Lucien Raven Oct 21 '14 at 17:17
  • Of course you could do `split -l 1000 bigfile && mv xaa piece1 && cat x?? > piece2 && rm x??`. – G-Man Says 'Reinstate Monica' Oct 21 '14 at 23:40
  • `split` is what I was looking for – Daniel Apr 08 '20 at 20:53
  • split with both -l and -n options doesn't run ('split: cannot split in more than one way'). Question wanted file into 2 parts, but at a specific line: split is the wrong tool for this job. csplit is the correct tool – RGD2 Jun 28 '21 at 23:40
17

This is a job for csplit:

csplit -s infile 1001 

will silently split infile into two pieces: xx00, containing everything up to but not including line 1001, and xx01, containing the remaining lines.
You can play with the options if you need different output file names e.g. using -f and specifying a prefix:

csplit -sf piece. infile 1001 

produces two files named piece.00 and piece.01.
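A quick way to check the result, sketched here against a 2500-line sample file generated with seq (the file contents are an assumption for illustration; the csplit invocation is the one above):

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: 2500 numbered lines.
seq 1 2500 > infile

# Split just before line 1001; -s suppresses the size report,
# -f sets the output file name prefix.
csplit -sf piece. infile 1001

wc -l < piece.00       # lines in the first piece
head -n 1 piece.01     # first line of the second piece
```

To finish what the question asks, piece.01 can then be renamed over infile with mv.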


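With a smart head (one that leaves the input positioned right after the last line it prints, instead of consuming extra buffered data), you could also do something like: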

{ head -n 1000 > 1st.out; cat > 2nd.out; } < infile
don_crissti
  • 2
    Wow, it really *is* a job for `csplit`. Very nice. (I'm just reading through the list of POSIX commands and had enormous trouble wrapping my head around the `csplit` command's purpose at first. Turns out it's really really simple.) :) – Wildcard Nov 02 '16 at 05:38
5

A simple way to do what the question asks for, in one command:

awk '{ if (NR <= 1000) print > "piece1"; else print > "piece2"; }' bigfile

or, for those of you who really hate to type long, intuitively comprehensible commands,

awk '{ print > ((NR <= 1000) ? "piece1" : "piece2"); }' bigfile
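If the cut-off point varies, the same idea can take the line count as a variable. A small sketch, assuming a 2500-line sample file generated with seq and a hypothetical awk variable named n:

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: 2500 numbered lines.
seq 1 2500 > bigfile

# n is the number of lines that go to the first piece;
# everything after line n goes to the second piece.
awk -v n=1000 '{ print > (NR <= n ? "piece1" : "piece2") }' bigfile

wc -l < piece1
wc -l < piece2
```

As with the other approaches, bigfile itself is left unchanged, so piece2 still has to be renamed over it if the lines should disappear from the original.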