6

This is what I tried, when intending to replace /path/to/a with /path/to/b using NUL as the separator/delimiter:

$ cat pathsList| sed -r -e 's\0/path/to/a\0/path/to/b\0g'
sed: -e expression #1, char 27: number option to `s' command may not be zero

My reason for choosing NUL: NUL and / are the only characters that are disallowed in file names on ext4, and / is already used heavily as the pathname separator. I also want to avoid quoting and unquoting my data just to be able to use sed.

If NUL can't be used as a delimiter, I'll be okay with any workaround that is better than quoting and unquoting my data.

$ sed --version
sed (GNU sed) 4.4
Harry
  • could you clarify? this works... `echo "/path/to/a/folder" | sed 's/o/x/g'` .. am not getting why you need any other delimiter here.. perhaps a give a better example? – Sundeep Jul 31 '18 at 03:47
  • as per manual, `Any character other than backslash or newline can be used` as delimiter for `s` command.. however, I don't know how to specify ASCII NUL as delimiter.. may be you have to use some inserting special character trickery on the terminal.. – Sundeep Jul 31 '18 at 03:50
  • My example was not about my difficulty in replacing `o` with `x`, rather about my difficulty in using the `NUL` character. – Harry Jul 31 '18 at 09:45
  • Your example has no need for a `nul` delimiter in the `sed` expression. Do you have a better example? – Kusalananda Aug 01 '18 at 16:57
  • @Kusalananda Edited the example. – Harry Aug 02 '18 at 10:56

6 Answers

8

Unfortunately, it doesn't seem like it's possible to use NUL as a separator for the s/// command in sed.

If you want to create a string with a NUL character in it, you could use the $'...' form which bash and other shells recognize, so you might think this would work:

sed -r -e $'s\0o\0x\0g'

But the way arguments are passed in Linux (and Unix in general) makes it impossible to pass strings with embedded NULs. A program receives only argc (the number of arguments) and argv, an array of char * pointing to NUL-terminated strings (C strings), so the first NUL always marks the end of an argument. In other words, all that sed (or any other program) sees when passed $'s\0o\0x\0g' is simply "s", because it must take the NUL that follows as the end of the string.
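
The truncation is easy to observe. The following is only a sketch, assuming bash is installed; the names n_pipe and n_arg are made up for illustration. It sends the same seven bytes once through a pipe and once as a $'...' argument, and counts what arrives:

```shell
# Through a pipe, all 7 bytes (s NUL o NUL x NUL g) arrive intact.
# tr strips the padding that some wc implementations add.
n_pipe=$(printf 's\0o\0x\0g' | wc -c | tr -d ' ')

# As a bash $'...' word, the string ends at the first NUL, so printf
# only receives "s". The here-document makes sure bash parses the line.
n_arg=$(bash <<'EOF'
printf '%s' $'s\0o\0x\0g' | wc -c | tr -d ' '
EOF
)

echo "pipe: $n_pipe bytes, argument: $n_arg byte(s)"
```

With bash this should report 7 bytes for the pipe and 1 for the argument.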

I thought perhaps passing that as an external file to sed might work, since in that case sed can know that the NULs are embedded and potentially track the full string by its length, so I tried this:

$ cat -v script.sed 
s^@o^@x^@g

The ^@s are the NUL bytes. I inserted them in vim using Ctrl-V 000 (three zeroes), which is the vim keystroke for entering a character by its ASCII value.
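
The same file can also be produced without an editor. This is a sketch that assumes a printf whose format string accepts the \0 escape (the bash and dash builtins do); it writes script.sed into the current directory.

```shell
# Write s, NUL, o, NUL, x, NUL, g and a final newline to script.sed.
printf 's\0o\0x\0g\n' > script.sed

# Inspect the bytes; od -c shows each NUL as \0.
od -c script.sed
```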

But that doesn't seem to work either:

$ echo "/path/to/a/folder" | sed -r -f script.sed 
sed: file script.sed line 1: delimiter character is not a single-byte character

Interestingly, that is different from what happens when the script file contains only a single s, in which case sed complains of an unterminated `s' command. So sed does seem to keep track of the string by its length, but it still refuses to use NUL as the separator character.

Looking at the source code of sed, it's unclear whether this was intended or whether it was a bug. In function is_mb_char() which tries to detect whether the byte is part of a multi-byte character, handling for NUL goes like this:

case 0: /* Special case of mbrtowc(3): the NUL character */
  /* TODO: test this */
  return 1;

In this case, return 1 means "yes, it's a multi-byte char", which is not really the case.

A comment a few lines above says:

/*
 * Return zero in all other cases:
 *   CH is a valid single-byte character (e.g. 0x01-0x7F in UTF-8 locales);
 *   CH is an invalid byte in a multibyte sequence for the currentl locale,
 *   CH is the NUL byte.
 */

So perhaps return 0 was intended?

The commit which introduced this code doesn't have that much more context here...

The man page for mbrtowc(3) mentions L'\0', which is the wide-character NUL (what mbrtowc stores when it converts a NUL byte), so maybe that's why they decided to handle it this way?

I hope this information is still helpful!

filbranden
  • 4
    +1 for taking the time to look into its source and enlightening us about it here. This means, `sed` is broken. – Harry Jul 31 '18 at 07:42
  • sed does support \0 as delimiter since 2012 (sed 4.2.2) – Ding-Yi Chen Nov 29 '22 at 06:27
  • 3
    @Ding-YiChen I believe you're talking about using `\0` as a line separator, which is available with the `-z` command-line option and indeed available since 2012. This question is about using `\0` as a separator for a `s/.../.../` or `s#...#...#` command. While it's possible to use characters other than `/`, using a NUL character is not really possible here. – filbranden Nov 30 '22 at 20:26
  • 1
    @filbranden you are right – Ding-Yi Chen Jan 25 '23 at 06:36
3

While NUL can't appear in a file name (for much the same reason it can't appear in a command argument), characters such as . (very common), ^, *, [, $ and \ all can, and they would have to be escaped anyway because they are regular expression operators understood by sed's s command.

You can always do that escaping in an automated fashion.
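
As an illustration only, such automated escaping might look like the sketch below. The function names escape_pattern and escape_replacement are hypothetical, the pattern is assumed to be a basic regular expression (no -r or -E), / is assumed to be the delimiter of the final s command, and file names containing newlines are not handled.

```shell
# Escape text for use as the pattern of a basic-regex s/// command.
# The helper uses | as its own delimiter so / can sit in the bracket list.
escape_pattern() {
  printf '%s\n' "$1" | sed 's|[[\.*^$/]|\\&|g'
}

# Escape text for use as the replacement: only \, & and the delimiter matter.
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

old=$(escape_pattern '/path/to/a.b')
new=$(escape_replacement '/new&dir')

# Only the first line contains the literal text /path/to/a.b.
printf '%s\n' '/path/to/a.b/x' '/path/to/aXb/x' | sed "s/$old/$new/g"
```

The first input line becomes /new&dir/x, while the second is left alone because the escaped dot no longer matches an arbitrary character.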

Note that besides NUL, newline and all multi-byte characters can't be used as the delimiter in GNU sed either. Other implementations may have different limitations. POSIX also prohibits backslash (though it works for GNU sed), so I would recommend sticking with graphical characters other than backslash from the portable character set.
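
For the paths in the question, that advice could look like the following sketch. It picks | as the delimiter on the assumption that | never occurs in the data; dots and other regex operators in the pattern would still need escaping.

```shell
# With | delimiting the s command, the slashes in the paths need no escaping.
echo "/path/to/a/folder" | sed -e 's|/path/to/a|/path/to/b|g'
```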

Stéphane Chazelas
2

If you want to replace single characters (bytes) with single characters (bytes), use tr:

$ echo "/path/to/a/folder" | tr ao xy
/pxth/ty/x/fylder

For arbitrary strings, you could use Perl:

$ echo "/path/to/a/folder" | patt=o repl=xx perl -pe 's/$ENV{patt}/$ENV{repl}/g'
/path/txx/a/fxxlder

(I passed patt and repl through the environment, since perl -p implies taking the command line arguments as names of files to process.)

Here, of course, patt is taken as a regular expression, with everything that implies:

$ echo "/path/to/a/folder" | patt='a.' repl=x perl -pe 's/$ENV{patt}/$ENV{repl}/g'
/pxh/to/xfolder

So you'll need to either escape the dots (\.) and other special characters, or use \Q$ENV{patt}:

$ echo "/path/to/a/folder.txt" | patt=. repl=, perl -pe 's/\Q$ENV{patt}/$ENV{repl}/g'
/path/to/a/folder,txt

In both the above cases (command-line arguments and environment variables), the interface between the OS and the utility passes the strings as NUL-terminated strings, as used by the C standard library. This interface makes it impossible to inject literal NUL bytes in the arguments, and sed -e 's\a\x\g' has sed use the literal backslash as a separator for the s command.

ilkkachu
1

GNU sed has supported the -z option since 2012 (version 4.2.2). It makes sed use NUL instead of newline as the line separator.

Example:

$ printf 'foo\0bar\0' | sed -z 's/$/!/' | tr '\0' '\n'
foo!
bar!
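
For a NUL-separated list of paths, a sketch along the same lines (assuming GNU sed, since -z is not in POSIX, and using made-up sample paths) could be:

```shell
# Each NUL-terminated record is one path; | delimits the s command,
# and tr turns the NULs into newlines only for display.
printf '/path/to/a/one\0/path/to/a/two\0' |
  sed -z 's|^/path/to/a|/path/to/b|' |
  tr '\0' '\n'
```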

But in most cases it is better to use Perl.

$ printf '%s\n' path1 path2 | perl -pe 'BEGIN {($a, $b) = (shift, shift)} s($a){$b}g' 'path' $'some/fance/new/name\t'
some/fance/new/name     1
some/fance/new/name     2
ceving
0

@ceving's answer is close, but there is no need to use tr.

cat pathsList| sed -z 's/\n/\x0/g'

-z makes sed use \x0 (NUL) as the line delimiter. It essentially turns your file into one long string (provided pathsList does not already contain \x0), so the file should not be too big to fit in the available memory.

Ding-Yi Chen
-2

You can check whether this works:

$ echo "/path/to/a/folder" | sed -r -e 's/\0o/\0x/g'
slm
Nisha