
I have multiple files in multiple subdirectories from which I need to remove all instances of the control character "^@". A basic grep doesn't seem to be working. This is what I've tried most recently.

 grep -rl '\^@' ./ | xargs sed -i 's/[\^@]//g' 

Any suggestions?

Bryan

2 Answers


^@ is usually a representation of the NUL character (byte value 0).

Many non-GNU text utilities can't deal with that as it's not meant to be found in text.

Some versions of GNU grep could find it with:

grep -P '\0'

GNU sed can remove it with sed 's/\x0//g', so:

grep -rlZP '\0' . | xargs -r0 sed -i 's/\x0//g'

If your grep won't find them, try GNU awk:

find . -type f -exec gawk -vORS='\0' '
  /\0/{print FILENAME; nextfile}' {} + |
  xargs -r0 sed -i 's/\x0//g'
Stéphane Chazelas
  • Thanks, the grep works awesome. I forgot to mention, some of the files are gzipped and this solution seems to mess them up. Is there an easy fix for that? – Bryan Oct 09 '18 at 16:33
  • OK, I think I'm good here. The zipped files are not a concern. I just need to avoid them. I added -name *.dat and that seems to have solved my problem. – Bryan Oct 09 '18 at 17:21
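
For reference, here is a minimal sketch of the approach described in the comment above: restricting the operation to *.dat files so that compressed files are never touched. It assumes GNU sed (for -i and \x00), and the demo directory and file names are made up for illustration. Note that the pattern is quoted so that the shell passes it to find unexpanded.

```shell
#!/bin/sh
# Hypothetical demo tree: one .dat file with a NUL byte, plus a stand-in
# for a compressed file that must be left alone.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'null\000byte\n' > "$demo_dir/sub/a.dat"
printf 'bin\000ary\n' > "$demo_dir/sub/b.gz"

# Only regular files matching *.dat are handed to sed (GNU sed assumed).
find "$demo_dir" -type f -name '*.dat' -exec sed -i 's/\x00//g' {} +

# Inspect the results byte by byte.
od -c "$demo_dir/sub/a.dat"
od -c "$demo_dir/sub/b.gz"
```

This rewrites every matching file, including those that contain no NUL bytes, so modification times change for all of them. Filtering the list through grep first, as in the answer above, avoids that.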

^@ is how the NUL byte (numerical value 0) is often represented, e.g.

$ printf "null\000byte\n" > nullbyte
$ cat -A nullbyte 
null^@byte$

One problem in dealing with it is that you can't pass it literally on the command line. That is simply impossible, because the same byte is used to terminate command-line arguments. Instead, you have to escape it somehow (and \^@ will not work).

Regular expressions as supported by GNU grep on my system don't seem to provide a way to deal with it. GNU sed on the other hand appears to understand \x00, so this works to remove it:

$ sed -e 's/\x00//g' nullbyte  |cat -A
nullbyte$

tr should also work, though it doesn't have -i:

$ tr -d '\000' < nullbyte  | cat -A
nullbyte$
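
Since tr has no in-place option, one way to get a similar effect is to go through a temporary file. The following is only a sketch: the function name strip_nul is made up for illustration, and it assumes mktemp is available (it is common but not specified by POSIX).

```shell
#!/bin/sh
# Hypothetical helper: remove NUL bytes from one file using tr and a
# temporary file.
strip_nul() {
  tmp=$(mktemp) || return 1
  # Write the filtered copy, then overwrite the original's contents.
  # Using cat with a redirection keeps the original inode and permissions.
  tr -d '\000' < "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a throwaway file.
work_dir=$(mktemp -d)
printf 'null\000byte\n' > "$work_dir/nullbyte"
strip_nul "$work_dir/nullbyte"
od -c "$work_dir/nullbyte"
```

To apply it to many files, the function can be called in a loop or from a small script run by find -exec. The file is briefly inconsistent while it is being rewritten, so this is not suitable for files that other processes are reading at the same time.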
ilkkachu