I am writing a shell script that reads all the files in a directory, performs some criteria checks, and splits certain files by number of records (lines).

I want to split the file abc.txt into pieces named like abcAA.txt or abc01.txt (I don't mind the exact form as long as each name begins with abc and ends in .txt).

Is there a simple way to do this?

I am using split command:

split -l $line_count $file $????   

I am confused what should be in place of ????

I am open to other methods too, but I would prefer to change only the ???? because the other parts of the script are ready.

Many Thanks

kpython
    Check this question [Split a file by line and have control over resulting files extension](http://unix.stackexchange.com/questions/32626/split-a-file-by-line-and-have-control-over-resulting-files-extension) – MikeA Oct 17 '16 at 19:16

1 Answer

Try:

split -l 5 --additional-suffix=.txt abc.txt abc

Or, if you want numbers in place of letters:

split -l 5 -d --additional-suffix=.txt abc.txt abc

The abc that we added after the input file name serves as the prefix for the output files.

Because you wanted .txt as a suffix, we added the option --additional-suffix=.txt.

The optional -d tells split to use numbers instead of letters.

Example

Let's start with a directory with one file:

$ ls
abc.txt

Now, let's split that file:

$ split -l 5 -d --additional-suffix=.txt abc.txt abc
$ ls
abc00.txt  abc01.txt  abc02.txt  abc03.txt  abc.txt
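
Since the question asks what to put in place of ???? inside a loop over files, here is a minimal sketch of one way to derive the prefix from each file name. The variable names line_count, file and prefix, the temporary directory, and the 12-line sample file are assumptions made for illustration, and the sketch assumes a GNU split that supports --additional-suffix:

```shell
# Hypothetical setup: a scratch directory with one 12-line sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > abc.txt

line_count=5
# The glob is expanded once, before the loop runs, so the pieces that
# split creates are not picked up and split again.
for file in *.txt; do
    prefix=${file%.txt}    # strip the extension: "abc.txt" -> "abc"
    split -l "$line_count" -d --additional-suffix=.txt "$file" "$prefix"
done
ls
```

The ${file%.txt} expansion is standard POSIX parameter expansion, so it should behave the same in ksh and bash.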

Work-around 1: using shell

Current versions of GNU split support the --additional-suffix option (it was added in coreutils 8.16), and split is part of GNU coreutils. That means that this option will eventually be available on all Linux systems.

For systems that currently lack it, one work-around is to rename the files after split creates them. For example:

$ split -l 5 -d abc.txt abc
$ for f in ./abc??; do mv "$f" "$f.txt"; done
$ ls
abc00.txt  abc01.txt  abc02.txt  abc03.txt  abc.txt

The above assumes that the default suffix length of 2 applies. If not, change the number of ? to match the suffix length that you are using. For example, if you are using a suffix length of 5:

$ split -l 5 -a 5 -d abc.txt abc
$ for f in ./abc?????; do mv "$f" "$f.txt"; done
$ ls
abc00000.txt  abc00001.txt  abc00002.txt  abc00003.txt  abc.txt
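
If the script has to run on machines with both newer and older versions of split, the two approaches can be combined. The following is only a sketch: it assumes that the help text printed by split --help mentions --additional-suffix exactly when the option is supported, and the variable names and sample file are hypothetical:

```shell
# Hypothetical setup: a scratch directory with one 12-line sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > abc.txt

file=abc.txt
line_count=5
prefix=${file%.txt}

if split --help 2>&1 | grep -q -- '--additional-suffix'; then
    # Newer GNU split: let split add the extension itself.
    split -l "$line_count" -d --additional-suffix=.txt "$file" "$prefix"
else
    # Older split: create abc00, abc01, ... and rename them afterwards.
    split -l "$line_count" -d "$file" "$prefix"
    for f in ./"$prefix"??; do
        [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
        mv "$f" "$f.txt"
    done
fi
```

Either branch should leave the same set of files behind, so the rest of the script does not need to know which one ran.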

Work-around 2: using awk

Here, the awk variable l specifies the number of lines to include in each output file and d specifies the number of digits to use in the output file names. Make sure that d is large enough to number all of the pieces.

$ awk -v l=5 -v d=2 '{n="0000" int((NR-1)/l); f="abc" substr(n,length(n)+1-d) ".txt"; if (f!=old) close(old); old=f; print >f}' abc.txt
$ ls
abc00.txt  abc01.txt  abc02.txt  abc03.txt  abc.txt
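
As a quick sanity check on any of these methods, the pieces can be concatenated and compared with the original. This sketch reuses the awk command above on a hypothetical 12-line sample file in a temporary directory:

```shell
# Hypothetical setup: a scratch directory with one 12-line sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > abc.txt

# Same awk command as above: 5 lines per piece, 2-digit numbering.
awk -v l=5 -v d=2 '{n="0000" int((NR-1)/l); f="abc" substr(n,length(n)+1-d) ".txt"; if (f!=old) close(old); old=f; print >f}' abc.txt

# The glob sorts the pieces in numeric order; cmp -s is silent and
# only reports through its exit status.
if cat abc[0-9][0-9].txt | cmp -s - abc.txt; then
    echo "pieces match the original"
else
    echo "pieces differ from the original"
fi
```

The pattern abc[0-9][0-9].txt matches only the numbered pieces, so the original abc.txt is not included in the concatenation.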
John1024
  • This only applies to certain versions of `GNU split` so it may be distro specific. RHEL 6.7 for example does not have this option. – MikeA Oct 17 '16 at 19:15
  • @MikeA OK. Thanks for that info. I just checked with the GNU website and [GNU split](https://www.gnu.org/software/coreutils/manual/coreutils.html#split-invocation) does include the `--additional-suffix` option and `split` is part of coreutils. So, if RHEL doesn't have it yet, it will soon. – John1024 Oct 17 '16 at 19:20
  • Hopefully, it's a very nice and needed option. ;-) – MikeA Oct 17 '16 at 19:27
  • @John1024 It didnt work for me. unrecognized option '--additional-suffix=.txt' – kpython Oct 17 '16 at 19:31
  • perhaps like @MikeA mentioned, it is not available for all versions – kpython Oct 17 '16 at 19:35
  • @kpython I just updated the answer with a simple work-around. By the way, what OS are you on? – John1024 Oct 17 '16 at 19:46
  • @John1024 I am on Linux - writing ksh script – kpython Oct 17 '16 at 20:10