
What a long title. Essentially, what I have is a collection of files that need to be searched recursively with a regex, and replaced.

What I have so far works without capture groups, but it does nothing when I use them. I am currently using a command that I found on another question:

grep -rlP "/\* *(\d+) *\*/ (.*)" . | xargs sed -i "s/\/\* *(\d+) *\*\/ (.*)/$2 \/\/ JD $1/g"

This regex is very confusing because it contains a lot of escaped asterisks and slashes, but essentially it takes in the string (for example)

/*  73 */   private static int last = -1000;

and replaces it with

private static int last = -1000; // JD 73

However, as I said earlier, it simply does not work, and the files are unchanged.

It works fine with an alternate regex that does not utilize capture groups

grep -rl "/\* *\*/ " . | xargs sed -i "s/\/\* *\*\/ //g"

but as soon as I try to introduce capture groups, it just silently fails.

I can tell it's searching through the files, as I can hear the drive spin up for a moment like with the successful one, but in the end the files remain unchanged.

Could it be possible to modify the command such that it works, or must I do it in a completely different way? Also, ideally the solution wouldn't require a bash loop. Thanks.

schrodingerscatcuriosity
Moiré
  • `(` and `)` are literal in sed basic regular expressions - see [Why does my regular expression work in X but not in Y?](https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y) – steeldriver Apr 28 '20 at 22:33
  • So you are saying they must be escaped? – Moiré Apr 29 '20 at 01:00

3 Answers

  • In grep, replace -P with -E and use [[:digit:]]+ or [0-9]+ instead of (\d+): nothing else in the pattern needs Perl-compatible syntax, and the parentheses are unnecessary.
  • Remove (.*) from the grep pattern; it is redundant.
  • Add -E to sed; otherwise you have to escape the capturing groups (...) and the +.
  • sed doesn't understand \d+; replace it with [[:digit:]]+ or [0-9]+.
  • Replace the backreferences $1 with \1 and $2 with \2.
  • I think you can safely remove the g, since JD only creates one comment, at the beginning of the line.

grep -Erl '/\* *[[:digit:]]+ *\*/' . |
  xargs sed -Ei 's/\/\* *([[:digit:]]+) *\*\/ (.*)/\2 \/\/ JD \1/'
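
To try the pipeline without touching real sources, something like the following sketch may help. It assumes GNU grep and GNU sed; the temporary directory, the `src` subdirectory and the file name `Sample.java` are made up for illustration.

```shell
# Build a throwaway directory with one sample file (names are illustrative).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src"
printf '%s\n' '/*  73 */   private static int last = -1000;' > "$demo_dir/src/Sample.java"

# Same grep + sed pipeline as above, pointed at the demo directory instead of "."
grep -Erl '/\* *[[:digit:]]+ *\*/' "$demo_dir" |
  xargs sed -Ei 's/\/\* *([[:digit:]]+) *\*\/ (.*)/\2 \/\/ JD \1/'

# Read the rewritten file back, show it, and clean up.
result=$(cat "$demo_dir/src/Sample.java")
printf '%s\n' "$result"
rm -rf "$demo_dir"
```

Only one space after `*/` is consumed by the pattern, so any further indentation ends up at the start of `\2`.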
Freddy
  • Ah, so you recognized it's from JD! I was wondering if anybody would. I was under the impression the `g` would be required in order for it to work multiple times on the same file. And replacing the `$` with `\` creates an error: `sed: -e expression #1, char 42: invalid reference \2 on `s' command's RHS`. (Command is `grep -rlE "/\* *[0-9]+ *\*/ .*" . | xargs sed -iE "s/\/\* *([0-9]+) *\*\/ (.*)/\2 \/\/ JD \1/"`) – Moiré Apr 29 '20 at 01:05
  • Yes, guilty, Java guy. No, the `g` is only needed to apply the substitution more than once on each line. Your command looks okay; you even removed the parentheses I forgot. Just replace `-iE` with `-Ei`. The former confuses `sed`: it treats `E` as the backup suffix and runs the expression without the `-E` option. You may also remove the `.*` in `grep`. – Freddy Apr 29 '20 at 01:21
  • Hmm. That almost worked. My command is now `grep -rlE "/\* *[0-9]+ *\*/ " . | xargs sed -Ei "s/\/\* *([0-9]+) *\*\/ (.*)/\2 \/\/ JD \1/"`. However, it now puts `\1` on a new line after `\2`, and I am not sure why. I assume that it is picking up the newline at the end and including it in `\2`, but I don't know how to get rid of it. I tried putting a `\n` at the end of the capture group (`grep -rlE "/\* *[0-9]+ *\*/ .*\n" . | xargs sed -Ei "s/\/\* *([0-9]+) *\*\/ (.*)\n/\2 \/\/ JD \1/"`), but that made it never match; I assume the newline isn't given to sed. Maybe I could trim it? – Moiré Apr 29 '20 at 03:20
  • Check if your files have DOS CRLF line endings (`cat -A file` on a modified file). This would "move" the modified comment to the beginning of each line when printed with `cat` without `-A`. – Freddy Apr 29 '20 at 03:49
  • (Deleted comment was irrelevant) - Yes. It is DOS CRLF, as it displays `^M$` at the end of the line. After realizing this, I simply added a `\r` after the capture group so as not to capture it, and that solved it! Thank you for the help. – Moiré Apr 29 '20 at 16:46
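
The CRLF effect discussed in these comments can be reproduced on a single line. This is a rough sketch, assuming GNU sed (which accepts `\r` in a regex) and GNU `cat -A`; the variable names are illustrative. Note that matching `\r` without re-emitting it drops the carriage return from the rewritten lines.

```shell
# One sample line with a DOS CRLF ending (the content is illustrative).
crlf_line=$(printf '/*  73 */   private static int last = -1000;\r')

# Without handling \r, the carriage return is captured into \2,
# so it ends up in front of " // JD 73".
naive=$(printf '%s\n' "$crlf_line" |
  sed -E 's/\/\* *([0-9]+) *\*\/ (.*)/\2 \/\/ JD \1/')

# Matching \r after the group keeps it out of \2.
fixed=$(printf '%s\n' "$crlf_line" |
  sed -E 's/\/\* *([0-9]+) *\*\/ (.*)\r$/\2 \/\/ JD \1/')

# cat -A makes the carriage return visible as ^M.
printf '%s\n' "$naive" | cat -A
printf '%s\n' "$fixed" | cat -A
```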

In sed, captured groups are referenced with \1, \2, etc. instead of $1, $2, etc. See the "Back-references and Subexpressions" section of the GNU sed manual (Back_002dreferences-and-Subexpressions.html).
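
As a small illustration of that point, the sketch below runs the same substitution three ways on the sample line from the question. It assumes a sed that supports `-E` (GNU sed does); the variable names are made up for the example.

```shell
line='/*  73 */   private static int last = -1000;'

# Basic regular expressions (the default): groups are written \( ... \).
bre=$(printf '%s\n' "$line" |
  sed 's/\/\* *\([0-9][0-9]*\) *\*\/ \(.*\)/\2 \/\/ JD \1/')

# Extended regular expressions (-E): groups are written ( ... ).
ere=$(printf '%s\n' "$line" |
  sed -E 's/\/\* *([0-9]+) *\*\/ (.*)/\2 \/\/ JD \1/')

# $1 and $2 are not back-references in sed; in single quotes they are copied literally.
literal=$(printf '%s\n' "$line" |
  sed -E 's/\/\* *([0-9]+) *\*\/ (.*)/$2 \/\/ JD $1/')

printf '%s\n' "$bre" "$ere" "$literal"
```

Inside double quotes, as in the question, the shell would expand `$1` and `$2` itself (usually to empty strings) before sed ever sees them.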

simonz

Use only sed, as in this example:

echo "/*  73 */   private static int last = -1000;" | 
    sed 's#^/\*[[:blank:]]*\([0-9]*\)[[:blank:]]*\*/[[:blank:]]*\(.*\)$#\2 // JD \1#g'
private static int last = -1000; // JD 73
schrodingerscatcuriosity