2

I have a file with the following contents:

..\..\src\modules\core\abc\abc.cpp
..\..\src\modules\core\something\xyz\xyz.cpp
..\..\src\other_modules\new_core\something\pqr\pqr.cpp
..\..\src\other_modules\new_core\something\pqr\abc.cpp

The result I am expecting is

..\..\src\abc\abc.cpp
..\..\src\xyz\xyz.cpp
..\..\src\pqr\pqr.cpp
..\..\src\pqr\abc.cpp

How can I achieve this using sed?

I am unable to write a regular expression that captures two groups at the same time:

  1. initial group (..\..\src) - this will be the same in all the lines
  2. variable group (abc\abc.cpp) or (xyz\xyz.cpp) or (pqr\pqr.cpp) or (pqr\abc.cpp)
Jeff Schaller

4 Answers

2

With BSD sed or recent versions of GNU sed (for older versions, replace -E with -r):

sed -E 's#(.*\\src).*(\\[^\]+\\[^\]+$)#\1\2#' file.txt
  • # is used as the delimiter for sed's substitution (s) command, to avoid ambiguity with the backslashes (\) in the input

  • (.*\\src) matches from the start of the line up to src and stores the match in capture group 1

  • (\\[^\]+\\[^\]+$) matches the last two \-separated components up to the end of the line and stores them in capture group 2; the .* before it matches everything between the two capture groups

  • The replacement, \1\2, joins the two capture groups, which drops everything in between
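To see what each group holds, the same pattern can be run with only \1 or only \2 in the replacement. This is a minimal sketch rather than part of the original answer; the variable names line, group1 and group2 are made up for illustration, and it assumes a sed that accepts -E.

```shell
# Hypothetical sample line, copied from the question's input.
line='..\..\src\modules\core\something\xyz\xyz.cpp'

# Same pattern as above, but keep only one captured group at a time.
group1=$(printf '%s\n' "$line" | sed -E 's#(.*\\src).*(\\[^\]+\\[^\]+$)#\1#')
group2=$(printf '%s\n' "$line" | sed -E 's#(.*\\src).*(\\[^\]+\\[^\]+$)#\2#')

printf 'group 1: %s\n' "$group1"
printf 'group 2: %s\n' "$group2"
```

Group 1 should come out as the ..\..\src prefix and group 2 as the final directory and file name, so concatenating them gives the line the question asks for.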

POSIX-ly, using a basic regular expression (\{1,\} is the portable spelling of GNU's \+):

sed 's#\(.*\\src\).*\(\\[^\]\{1,\}\\[^\]\{1,\}\)$#\1\2#' file.txt

Example:

% cat file.txt
..\..\src\modules\core\abc\abc.cpp
..\..\src\modules\core\something\xyz\xyz.cpp
..\..\src\other_modules\new_core\something\pqr\pqr.cpp
..\..\src\other_modules\new_core\something\pqr\abc.cpp

% sed -E 's#(.*\\src).*(\\[^\]+\\[^\]+$)#\1\2#' file.txt
..\..\src\abc\abc.cpp
..\..\src\xyz\xyz.cpp
..\..\src\pqr\pqr.cpp
..\..\src\pqr\abc.cpp
heemayl
  • Could you let me know why you used `sed -E`? – dhiraj suvarna Oct 04 '16 at 05:39
  • @phoenix The `-E` option lets us use ERE (Extended RegEx) patterns; otherwise we have to use BRE (Basic RegEx) patterns. In BRE we need to escape the `()` of the capture groups, the `+` token, and also `{}` and `?` if they are used as regex tokens; otherwise these are treated literally. Without `-E`: `sed 's#\(.*\\src\).*\(\\[^\]\+\\[^\]\+$\)#\1\2#' file.txt` – heemayl Oct 04 '16 at 05:43
  • @phoenix, not only that, but `\+` (as in the above comment) is much less portable than using `-E` and `+`. `-E` is [fairly standard](http://unix.stackexchange.com/q/310446/135943). – Wildcard Oct 04 '16 at 05:50
  • You can also use `sed -E 's#(.*\\src).*((\\[^\]+){2})$#\1\2#'` to avoid repeating the regex pattern; the number is then easy to change if the requirement changes. – Sundeep Oct 04 '16 at 07:36
0

Create a file containing the data:

-rwxr-xr-x. 1 sasi   webApp  190 Oct  4 13:42 file.txt

Then run the following command:

[sasi@localhost temp]$ sed -E 's#(.*\\src).*(\\[^\]+\\[^\]+$)#\1\2#' file.txt
..\..\src\abc\abc.cpp
..\..\src\xyz\xyz.cpp
..\..\src\pqr\pqr.cpp
..\..\src\pqr\abc.cpp
Sparhawk
0

Alternate solutions:

With GNU grep and paste

grep -o prints each match of .*\\src or (\\[^\]+){2}$ on its own line, so every input line yields two output lines. paste then joins each pair of lines back into one, with no delimiter.

$ grep -oE '.*\\src|(\\[^\]+){2}$' ip.txt | paste -d '' - -
..\..\src\abc\abc.cpp
..\..\src\xyz\xyz.cpp
..\..\src\pqr\pqr.cpp
..\..\src\pqr\abc.cpp
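The two stages can be looked at separately. The following is only a sketch, assuming GNU grep and GNU paste as in the answer; the variable names line, pieces and joined are hypothetical.

```shell
# Hypothetical one-line sample, copied from the question's input.
line='..\..\src\other_modules\new_core\something\pqr\pqr.cpp'

# Step 1: grep -o prints every match on its own line (the prefix, then the tail).
pieces=$(printf '%s\n' "$line" | grep -oE '.*\\src|(\\[^\]+){2}$')
printf '%s\n' "$pieces"

# Step 2: paste reads two lines at a time ("- -") and joins them with an empty delimiter.
joined=$(printf '%s\n' "$pieces" | paste -d '' - -)
printf '%s\n' "$joined"
```

This pairing relies on every input line producing exactly two matches; a line with no \src component would shift the pairs.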

With perl

$ perl -pe 's/.*\\src\K.*(?=(\\[^\\]+){2}$)//' ip.txt 
..\..\src\abc\abc.cpp
..\..\src\xyz\xyz.cpp
..\..\src\pqr\pqr.cpp
..\..\src\pqr\abc.cpp

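Here \K drops everything matched so far (.*\\src) from the part to be replaced, and the positive lookahead (?=(\\[^\\]+){2}$) requires the last two components to follow without consuming them, so only the text in between is deleted.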

Sundeep
0

Why bash this with regex? Path munging doesn't require regular expressions; OS kernels don't use regexes to follow paths.

With Awk, we just use backslash as a separator and components become fields:

awk 'BEGIN { FS = OFS = "\\" } { print $1, $2, $3, $(NF-1), $NF }'
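To make the field numbering concrete, here is a small sketch that first lists the fields of one line and then applies the command above. The variable names line, fields and result are invented for the example, and it assumes an awk that accepts a single backslash as FS.

```shell
# Hypothetical sample line, copied from the question's input.
line='..\..\src\modules\core\something\xyz\xyz.cpp'

# List each backslash-separated component with its field number.
fields=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\\" } { for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"

# Keep the first three fields and the last two, rejoined with backslashes.
result=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "\\" } { print $1, $2, $3, $(NF-1), $NF }')
printf '%s\n' "$result"
```

Note that this hardcodes the prefix as exactly three fields (.., .., src), which holds for the sample input but would need adjusting for a different prefix depth.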
Kaz