5

I have two text files, file1 and file2, each with several lines.

$ cat file1
line one
line two
line three
line four
line five

$ cat file2
line A
line B
line C
line D
line E
line F

I would like to substitute a range of lines of file1 (from line 1_start to line 1_end) with a range of lines of file2 (from line 2_start to line 2_end).

For example, substitute lines 2 through 4 of file1 with lines 3 through 5 of file2.

All I have managed so far is to extract the needed lines from file2 with

$ sed -n 3,5p file2

But that doesn't help with putting them into file1. Is it possible with sed? If not, is it possible with a similar tool?

BowPark

4 Answers

8

sed can print a given range of lines with something like this:

sed -n 'X,Yp' filename

Here X is the first line of the range and Y is the last, both inclusive. -n tells sed not to print anything unless explicitly told to, and the p following the range is what tells it to print those lines.
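As a quick check of the range address, here is a minimal sketch run against the question's file2. The scratch directory and the `range` variable are only there for illustration.

```shell
# Recreate file2 from the question in a scratch directory (illustrative only).
dir=$(mktemp -d)
printf 'line %s\n' A B C D E F > "$dir/file2"

# Print lines 3 through 5, both inclusive; -n suppresses all other output.
range=$(sed -n '3,5p' "$dir/file2")
printf '%s\n' "$range"

# Clean up the scratch directory.
rm -r "$dir"
```

This should print "line C", "line D" and "line E", one per line.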

So you can easily call this three times, appending to a temporary file, then move that file to wherever you want. You can also combine them all using cat and process substitution as this example shows (I'm using line numbers I just pulled out of thin air; $ is last line in a file):

cat <(sed -n '1,5p' file1) <(sed -n '10,12p' file2) <(sed -n '9,$p' file1) > file1.tmp && mv file1.tmp file1

Here, we'd be replacing lines 6, 7 and 8 in file1 with lines 10, 11 and 12 from file2.

Update: Thanks to @MiniMax for pointing out that cat and the process substitution can be avoided by doing the following:

{ sed -n '1,5p' file1; sed -n '10,12p' file2; sed -n '9,$p' file1; } > file1.tmp && mv file1.tmp file1

KISS, after all. :)
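If the line numbers vary, the same idea can be wrapped in a small function. This is only a sketch: the name `replace_range` and its argument order are made up for illustration, and it assumes the ranges are valid for both files.

```shell
# replace_range FILE1 S1 E1 FILE2 S2 E2   (hypothetical helper name)
# Replaces lines S1..E1 of FILE1 with lines S2..E2 of FILE2, via a temporary file.
replace_range() {
    f1=$1 s1=$2 e1=$3 f2=$4 s2=$5 e2=$6
    tmp=$(mktemp) || return 1
    {
        # Lines before the range. Skipped when the range starts at line 1,
        # because sed treats an address like '1,0' as matching line 1.
        if [ "$s1" -gt 1 ]; then sed -n "1,$((s1 - 1))p" "$f1"; fi
        # Replacement lines from the second file.
        sed -n "${s2},${e2}p" "$f2"
        # Lines after the range, through the end of the file.
        sed -n "$((e1 + 1)),\$p" "$f1"
    } > "$tmp" && mv "$tmp" "$f1"
}
```

For the question's example the call would be `replace_range file1 2 4 file2 3 5`. One thing to be aware of: mktemp creates the temporary file with restrictive permissions, so after the mv the mode of file1 may differ from what it was before.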

B Layer
  • This is inefficient, especially with huge input files (you're reading the entire `file1` twice and the entire `file2`). – don_crissti Dec 06 '17 at 13:34
  • 2
    @don_crissti Meh. You're nitpicking. Yes reading the file twice is relatively inefficient. So is using shell scripts. But go run a million liner through `sed` and see how long it takes. Less than a second on the very slow machine I'm sitting at. It's pointless to worry about performance here short of OP telling us they're dealing with _HUGE_ files. By the way, "especially with huge files"? What does that have to do with efficiency? Put a two liner through it or a 4B line monster...same efficiency. – B Layer Dec 06 '17 at 14:23
  • 1
    1. It's _not_ pointless to worry about efficiency here (or on any similar site). You've been here long enough to know that you're not posting only for this particular OP but for future visitors too. Even if this was the only person in the world to benefit from my answer I would still want to do it in the most efficient way (and it's no big deal really to add some `q`s to your `sed`s). 2. You're right. What I meant was "the difference between efficient and inefficient would be more visible with huge files"... I'm lazy (typing on my phone). – don_crissti Dec 06 '17 at 14:42
  • 1
    We're going to have to agree to disagree. I'm a professional sw dev/programmer for 20 years (you may have similar credentials...my point is not to have a size measuring contest just to tell you where I'm coming from) and KISS and "don't optimize until you need to" are the accepted best practices I embrace. I will always favor a bit less complexity over a modicum of performance gain. (I concede, though, that "pointless" was not the best choice of words.) – B Layer Dec 06 '17 at 15:11
  • Why not just: `{ sed -n '1,5p' file1; sed -n '10,12p' file2; sed -n '9,$p' file1; } > file1`? `cat` isn't needed, and neither is process substitution. – MiniMax Dec 06 '17 at 21:05
  • @BLayer I tested it just now and it doesn't work, because of this: [bash redirect input from file back into same file](https://stackoverflow.com/q/6696842/2913477). But your version has the same problem: check `file1` after running the commands. A temporary file is needed. – MiniMax Dec 06 '17 at 21:42
  • That's what I get for changing to `file1` without thinking about it. I should have left it the way it was originally. Thanks for the heads up. – B Layer Dec 06 '17 at 22:47
5

Another way to do it with sed is to use the r command, which is handy if the -i in-place option has to be used as well

$ sed -n '3,5p; 5q;' f2 | sed -e '2r /dev/stdin' -e '2,4d' f1
line one
line C
line D
line E
line five

$ # if /dev/stdin is not supported
$ sed -n '3,5p; 5q;' f2 > t1
$ sed -e '2r t1' -e '2,4d' f1

Thanks to don_crissti for the reminder that sed can quit as soon as the required lines have been read from file 2.
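The line numbers can also be kept in variables. The sketch below uses the temporary-file variant so that it doesn't depend on /dev/stdin; the variable names `m1`, `m2`, `n1`, `n2` and `result` and the scratch directory are assumptions made for illustration.

```shell
# Sample files from the question, in a scratch directory (illustrative only).
dir=$(mktemp -d)
printf 'line %s\n' one two three four five > "$dir/f1"
printf 'line %s\n' A B C D E F > "$dir/f2"

m1=2 m2=4   # range to replace in f1 (hypothetical variable names)
n1=3 n2=5   # range to take from f2

# Extract the replacement lines, quitting as soon as the last one is printed.
sed -n "${n1},${n2}p; ${n2}q" "$dir/f2" > "$dir/t1"

# Queue t1 for output at line m1, then delete lines m1..m2.
result=$(sed -e "${m1}r $dir/t1" -e "${m1},${m2}d" "$dir/f1")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$dir"
```

With the values above, this should give the same five lines as the pipeline shown earlier in this answer.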

Sundeep
2

With huge input files this may be faster:

# replacing lines m1,m2 from file1 with lines n1,n2 from file2
m1=2; m2=4; n1=3; n2=5
{ head -n $((m1-1)); { head -n $((n1-1)) >/dev/null; head -n $((n2-n1+1));
} <file2; head -n $((m2-m1+1)) >/dev/null; cat; } <file1

It's explained here, the only difference being the one-line ranges in that particular case.

don_crissti
1

I've started doing everything with Python lately, so here's a Python program that does what you want:

#!/usr/bin/env python2
# -*- coding: ascii  -*-
"""replace_range.py"""

import sys
import argparse

parser = argparse.ArgumentParser()

parser.add_argument(
    "matchfile",
    help="File in which to replace lines",
)
parser.add_argument(
    "matchrange",
    help="Comma-separated range of Lines to match and replace",
)
parser.add_argument(
    "replacementfile",
    help="File from which to get replacement lines"
)
parser.add_argument(
    "replacementrange",
    help="Comma-separated range of lines from which to get replacement"
)

if __name__=="__main__":

    # Parse the command-line arguments
    args = parser.parse_args()

    # Open the files
    with \
    open(args.matchfile, 'r') as matchfile, \
    open(args.replacementfile, 'r') as replacementfile:

        # Get the input from the match file as a list of strings 
        matchlines = matchfile.readlines()

        # Get the match range (NOTE: shift by -1 to convert to zero-indexed list)
        mstart = int(args.matchrange.strip().split(',')[0]) - 1
        mend = int(args.matchrange.strip().split(',')[1]) - 1

        # Get the input from the replacement file as a list of strings 
        replacementlines = replacementfile.readlines()

        # Get the replacement range (NOTE: shift by -1 to convert to zero-indexed list)
        rstart = int(args.replacementrange.strip().split(',')[0]) - 1
        rend = int(args.replacementrange.strip().split(',')[1]) - 1

        # Replace the match text with the replacement text
        outputlines = matchlines[0:mstart] + replacementlines[rstart:rend+1] + matchlines[mend+1:]

        # Output the result
        sys.stdout.write(''.join(outputlines))

And here's what it looks like in action:

user@host:~$ python replace_range.py file1 2,3 file2 2,4

line one
line B
line C
line D
line four
line five
igal