In the comments to this question a case came up where various sed implementations disagreed on a fairly simple program, and we (or at least I) weren't able to determine what the specification actually requires for it.
The issue is the behaviour of a range beginning at a deleted line:
1d;1,2d
Should line 2 be deleted even though the start of the range was removed before reaching that command? My initial expectation was "no" in line with BSD sed, while GNU sed says "yes", and checking the specification text doesn't entirely resolve the matter.
Matching my expectation are (at least) macOS and Solaris sed, and BSD sed. Disagreeing are (at least) GNU and Busybox sed, and numerous people here. macOS and Solaris are SUS-certified, while the others are likely more widespread. Which behaviour is correct?
The relevant specification text says:
The sed utility shall then apply in sequence all commands whose addresses select that pattern space, until a command starts the next cycle or quits.
and
An editing command with two addresses shall select the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second. [...] Starting at the first line following the selected range, sed shall look again for the first address. Thereafter, the process shall be repeated.
Arguably, line 2 is within "the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second", regardless of whether the start point has been deleted. On the other hand, I expected the first d to move on to the next cycle and not give the range a chance to start. The UNIX™-certified implementations do what I expected, but potentially not what the specification mandates.
Some illustrative experiments follow, but the key question is: what should sed do when a range begins on a deleted line?
Experiments and examples
A simplified demonstration of the issue is this, which prints extra copies of lines rather than deleting them:
printf 'a\nb\n' | sed -e '1d;1,2p'
This provides sed with two lines of input, a and b. The program does two things:
- Deletes the first line with 1d. The d command will "Delete the pattern space and start the next cycle."
- Selects the range of lines from 1 to 2 and explicitly prints them out, in addition to the automatic printing every line receives. A line included in the range should thus appear twice.
My expectation was that this should print
b
only, with the range not applying because 1,2 is never reached during line 1 (because d jumped to the next cycle/line already) and so range inclusion never begins, while a has been deleted. The conformant Unix seds of macOS and Solaris 10 produce this output, as does the non-POSIX sed in Solaris and BSD sed in general.
GNU sed, on the other hand, prints
b
b
indicating that it has interpreted the range. This occurs both in POSIX mode and not. Busybox's sed behaves the same way here (though not in every case, so it doesn't seem to be a result of shared code).
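To make the comparison repeatable across systems, the probe above can be wrapped in a small script that labels whichever behaviour the local sed shows. This is only a sketch: the function name classify_sed_output and the SED override variable are made up for illustration, and the labels simply restate the two outputs described above.

```shell
#!/bin/sh
# classify_sed_output: hypothetical helper that maps the output of the
# '1d;1,2p' probe to one of the two behaviours discussed above.
classify_sed_output() {
    case "$1" in
        b)
            echo "range never starts (macOS/Solaris/BSD-style)" ;;
        "b
b")
            echo "range still applies to line 2 (GNU/Busybox-style)" ;;
        *)
            echo "unexpected output" ;;
    esac
}

# SED is an assumed, non-standard variable for pointing the probe at a
# different sed binary; it falls back to whatever "sed" is in PATH.
probe_output=$(printf 'a\nb\n' | "${SED:-sed}" -e '1d;1,2p')
classify_sed_output "$probe_output"
```

Running it as, say, SED=/usr/bin/sed sh probe.sh (the file name is arbitrary) reports which interpretation that binary follows, without saying anything about which one is correct.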
Further experimentation with
printf 'a\nb\nc\nd\ne\n' | sed -e '2d;2,/c/p'
printf 'a\nb\nc\nd\ne\n' | sed -e '2d;2,/d/p'
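suggests that GNU sed treats a range starting at a deleted line as though it starts on the following line. This is visible in the first command, where /c/ does not end the range: if the range only begins at line 3 (c), the end address is first tested against line 4, by which point c has already gone by. Using /b/ to start the range does not behave the same as 2.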
The initial working example I was using was
printf '%s\n' a b c d e | sed -e '1{/a/d;};1,//d'
as a way to delete all lines up to the first /a/ match, even if that is on the first line (what GNU sed would use 0,/a/d for — this was an attempted POSIX-compatible rendition of that).
It has been suggested that this should instead delete up to the second match of /a/ if the first line matches (or the whole file if there's no second match), which seems plausible - but again, only GNU sed does that. Both macOS sed and Solaris's sed produce
b
c
d
e
for that, as I expected. GNU sed produces empty output, from removing the unterminated range; Busybox sed prints just d and e, which is clearly wrong no matter what. Generally I'd assume that having passed the certification conformance tests means the certified implementations' behaviour is correct, but enough people have suggested otherwise that I'm not sure: the specification text isn't completely convincing, and the test suite can't be perfectly comprehensive.
Clearly it isn't practically portable to write that code today given the inconsistency, but theoretically it should be equivalent everywhere with one meaning or the other. I think this is a bug, but I don't know against which implementation(s) to report it. My view currently is that GNU and Busybox sed's behaviour is inconsistent with the specification, but I could be mistaken on that.
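For the practical goal alone (deleting everything up to and including the first /a/ match, even on line 1), a range address can be avoided entirely, which sidesteps the disagreement without resolving it. The following is a sketch under assumptions: the helper name delete_through_first_match is invented here, the pattern is taken to be a basic regular expression with no unescaped slash, and it relies on n quitting without autoprint at end of input when -n is given.

```shell
#!/bin/sh
# delete_through_first_match: hypothetical helper; $1 is a BRE that must
# not contain an unescaped "/". Nothing is printed until the first match;
# after it, a loop prints every remaining line. No range address is used,
# so the 1d;1,2 ambiguity never comes into play.
delete_through_first_match() {
    sed -n -e "/$1/{" -e ':l' -e 'n;p;bl' -e '}'
}

printf '%s\n' a b c d e | delete_through_first_match a
```

With the sample input this should print b through e, matching what 0,/a/d is meant to do in GNU sed. It says nothing about which reading of the range text is right.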
What does POSIX require here?