
How can I embed an invisible mark in random lines of text? The mark has to be present in the data, yet invisible to someone reading the text printed out on the console.

I want to identify those lines by means of an invisible mark in order to, for instance, grep them in or out later.

I tried 0x00 without success. I expected grep to print lines matching 0x00 somewhere. But this didn't work:

$ echo -e "a\0b" | hexdump -C
00000000  61 00 62 0a                                       |a.b.|
00000004
$ echo -e "a\0b" | grep "a\0b"
(no output)
  • My thesis was on steganography in text data; rather than paste all of it I'll simply say "it's a difficult problem with multiple tradeoffs". – Bandrami Dec 30 '13 at 10:11

2 Answers


There's no fully reliable way to put an invisible mark in a text file. After all, a text file has no room for anything that isn't plain text. Even comments (text that doesn't belong to the main text) require some form of markup.

Null bytes are a bad idea, not only because they may be rendered as ^@ or as various other glyphs, but also because many text processing tools choke on them. The null byte is the end-of-string marker in the C programming language, and many programs, because they are written in C or use libraries written in C, treat it as the end of a chunk of text (often, but not necessarily, a line).
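A quick way to see this from the shell. This is a sketch that assumes bash, whose variables are C strings and therefore cannot hold a null byte:

```shell
#!/usr/bin/env bash
# The data itself contains the null byte: a, NUL, b, newline = 4 bytes.
printf 'a\0b\n' | wc -c

# Command substitution cannot keep the NUL: bash drops it, so the
# variable holds only "ab" (newer bash versions also warn on stderr).
var=$(printf 'a\0b')
echo "${#var}"
```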

If your text is encoded in Unicode, you can use one of its several zero-width characters:

  • U+200B ZERO WIDTH SPACE (a zero-width breaking space)
  • U+200C ZERO WIDTH NON-JOINER (a zero-width word constituent that prevents ligatures)
  • U+200D ZERO WIDTH JOINER (a zero-width word constituent that forces ligatures)
  • U+2060 WORD JOINER (a zero-width non-breaking space)

The two spaces (U+200B and U+2060) are not word constituents; the joiner and non-joiner are. Although none of these characters is visibly rendered (assuming a viewer with reasonable Unicode support), they do affect selecting text, moving the cursor, searching, etc. The breaking space (U+200B) may also be rendered as a line break.
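A minimal sketch of marking and filtering lines this way. The convention of appending U+200B to the end of a marked line is an assumption chosen for illustration, not a standard; any of the characters above would work the same way. The mark is written as its UTF-8 bytes, E2 80 8B, so the file must be UTF-8 for it to stay invisible:

```shell
#!/bin/sh
# Hypothetical convention: a line is "marked" when it ends with
# U+200B ZERO WIDTH SPACE, UTF-8 bytes E2 80 8B (octal 342 200 213).
ZWSP=$(printf '\342\200\213')

# Append the mark to line 2 only; on a UTF-8 terminal nothing looks different.
marked=$(printf 'first\nsecond\nthird\n' | sed "2s/\$/$ZWSP/")

# -F makes grep match the mark as a fixed byte string, no regex escaping.
printf '%s\n' "$marked" | grep -F "$ZWSP"    # grep marked lines in
printf '%s\n' "$marked" | grep -vF "$ZWSP"   # grep marked lines out
```

Piping the marked text through `hexdump -C`, as in the question, makes the mark visible as `e2 80 8b`.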

Gilles 'SO- stop being evil'
  • Another trick is Unicode characters that look like Latin characters. The interesting thing is that what fools humans is easily spotted by software, and vice versa. – Bandrami Dec 30 '13 at 10:13


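You can grep for null or other special characters using the -P (Perl-compatible regex) flag and the character's hex code; -a makes grep treat the input as text even though it contains a null byte: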

echo -e "a\0b\nhello" | grep -a -P '\x0'
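The same escape also lets you grep marked lines out with -v, which is the other half of the question. A sketch, assuming GNU grep built with PCRE support (BSD and macOS grep have no -P):

```shell
#!/bin/sh
# Sample input: line 1 contains a NUL byte, line 2 does not.
sample() { printf 'a\0b\nhello\n'; }

# -a: treat the input as text instead of reporting a binary file match.
# -P: Perl-compatible regex, in which \x00 denotes the NUL byte.
sample | grep -a -c -P '\x00'   # count lines containing NUL
sample | grep -a -v -P '\x00'   # print only lines without NUL
```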

You could also hide text by following each of its characters with a backspace character, so that whatever is printed next overwrites it, for example:

$ echo -e "the matrix\0\0\0\0\n\bh\ba\bs\b \by\bo\bu\b\0\0:-)"
the matrix
:-)

$ echo -e "the matrix\0\0\0\0\n\bh\ba\bs\b \by\bo\bu\b\0\0:-)"  | hexdump -C
00000000  74 68 65 20 6d 61 74 72  69 78 00 00 00 00 0a 08  |the matrix......|
00000010  68 08 61 08 73 08 20 08  79 08 6f 08 75 08 00 00  |h.a.s. .y.o.u...|
00000020  3a 2d 29 0a                                       |:-).|
00000024
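Because a backspace is an ordinary byte (0x08), lines carrying it can be grepped in or out without -P, and deleting the backspaces exposes the hidden text. A sketch using a simplified version of the example above without the null padding (an assumption made so that grep sees plain text):

```shell
#!/bin/sh
# Backspace can be passed to grep as a plain argument, unlike NUL.
BS=$(printf '\b')

# In a terminal, ":-)" overwrites the hidden letters on line 2.
sample() { printf 'the matrix\n\bh\ba\bs\b \by\bo\bu\b:-)\n'; }

sample | grep -c -F "$BS"   # count lines carrying backspaces
sample | grep -v -F "$BS"   # grep them out, leaving "the matrix"
sample | tr -d '\010'       # delete the backspaces to reveal "has you:-)"
```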
janos
  • Backspace characters only work when displaying text directly in a terminal or with some terminal-based programs. It doesn't work in most of the tools people use to view text files (editors, browsers). – Gilles 'SO- stop being evil' Dec 29 '13 at 23:18