17

I would like to replace a set of characters with corresponding characters from another set, something like this:

original set: ots
"target" set: u.x

foobartest → fuubar.ex.

Translations/transliterations like this are the specialty of the tr command:

$ echo 'foobartest' | tr 'ots' 'u.x'
fuubar.ex.

Unfortunately, tr doesn't support editing files in place the way sed does.
I would like to use sed so I don't have to reinvent the wheel of juggling temp files.

n.st
  • Self-answering this question since I couldn't seem to find any results for "sed translate characters". The magic keyword ended up being "transliterate", but I figured it's worth making this feature as easily findable as possible. – n.st Sep 13 '17 at 16:28
  • Something to keep in mind when trying to implement workarounds for this: `tr` (correctly) ignores recursion in the replacement sets: `echo 'abc' | tr ab bx` → `bxc`. A primitive solution might butcher that to `xxc` because it re-applies the translation to characters that have already been translated. – n.st Sep 13 '17 at 16:41
  • Related: [tr analog for unicode characters?](//unix.stackexchange.com/q/389615) (GNU `sed` contrary to GNU `tr` can transliterate multi-byte characters) – Stéphane Chazelas Sep 13 '17 at 16:51
  • If you want another possibility: perl can do translate, and -i, and (unless ancient) multibyte. Not POSIX, but pretty common. – dave_thompson_085 Sep 14 '17 at 09:26

3 Answers

28

sed has the y command, which works just like tr, at least in most implementations:

$ echo 'foobartest' | sed 'y/ots/u.x/'
fuubar.ex.

The y command is part of the POSIX sed specification, so it should work on just about any platform.
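To illustrate that y maps every character in a single pass, as tr does, here is a minimal sketch that compares it with a naive chain of s commands, using the ab → bx example from the comments on the question. The variable names are only for illustration.

```shell
# Single-pass transliteration: each input character is mapped at most once.
with_tr=$(printf 'abc\n' | tr 'ab' 'bx')
with_y=$(printf 'abc\n' | sed 'y/ab/bx/')

# Naive chain of substitutions: the second s command also rewrites
# the "b" that the first one has just produced.
with_s=$(printf 'abc\n' | sed -e 's/a/b/g' -e 's/b/x/g')

printf 'tr:    %s\n' "$with_tr"    # bxc
printf 'sed y: %s\n' "$with_y"     # bxc
printf 'sed s: %s\n' "$with_s"     # xxc
```

So when the two sets overlap, y (or tr) is the safer choice than a series of s commands.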

And since it's sed, you can have it replace a file with its edited version, sparing you the bothersome temp file business (provided your implementation of sed supports the -i option, which is not specified by POSIX):

$ sed -i 'y/ots/u.x/' some-file.txt
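Because -i is not standardized, its syntax differs between implementations: GNU sed takes an optional suffix attached directly to -i, while BSD sed expects an extension argument after it. As far as I know, attaching a backup suffix directly to the option is accepted by both, so a reasonably portable sketch (run on a scratch file, with illustrative variable names) would be:

```shell
# Work on a scratch copy so the example is safe to run anywhere.
file=$(mktemp)
printf 'foobartest\n' > "$file"

# -i with an attached suffix: the original is kept as "$file.bak".
sed -i.bak 'y/ots/u.x/' "$file"

edited=$(cat "$file")        # the transliterated version
backup=$(cat "$file.bak")    # the untouched original

printf 'edited: %s\n' "$edited"    # fuubar.ex.
printf 'backup: %s\n' "$backup"    # foobartest

rm -f "$file" "$file.bak"
```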

Note that the BSD implementation of sed currently does not mirror the behavior of tr in some corner cases.

Jason Hemann
n.st
  • Thanks, this is extraordinarily useful! I was expecting it to work in VIM (8.0.1092 on CentOS 7.3) but it doesn't. Shouldn't anything sed does, VIM do? – dotancohen Sep 14 '17 at 09:39
  • 1
    @dotancohen Just because Vim's *substitution* function is modelled after `sed`'s doesn't mean the other functions are as well. ;) The Vim mailing list has [a thread](http://vim.1045645.n5.nabble.com/equivalent-of-sed-y-abc-def-in-vim-td1178535.html) about finding a `y/abc/def/` equivalent; the best option seems to be `:%call setline(".", tr(getline("."),"abc","def"))`. – n.st Sep 14 '17 at 10:25
8

If, as in your case, you're transliterating characters without changing their size in bytes (and some implementations, like GNU tr, only support single-byte characters anyway), you can do:

tr 'ots' 'u.x' < file 1<> file

That is, have tr overwrite the file in place, writing each translated byte back over the original one.
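The 1<> operator opens the file for both reading and writing on standard output without truncating it, whereas a plain > truncates the file before tr gets to read anything. A small sketch on scratch files (the variable names are illustrative) shows the difference:

```shell
a=$(mktemp)
b=$(mktemp)
printf 'foobartest\n' > "$a"
printf 'foobartest\n' > "$b"

# ">" truncates the file while the shell sets up the redirections,
# so tr sees empty input and the content is lost.
tr 'ots' 'u.x' < "$a" > "$a"

# "1<>" opens the file read-write without truncating it, so tr reads
# the original bytes and writes the translated ones over them.
tr 'ots' 'u.x' < "$b" 1<> "$b"

truncated=$(cat "$a")    # empty
inplace=$(cat "$b")      # fuubar.ex.

printf 'with >:   "%s"\n' "$truncated"
printf 'with 1<>: "%s"\n' "$inplace"
rm -f "$a" "$b"
```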

That's better than sed -i on several counts:

  • it doesn't need extra disk space (except in special cases such as sparse files or copy-on-write filesystems)
  • it preserves inode numbers, ownership, permissions, ACLs...
  • it works OK with symlinks, it doesn't break hard links
  • it doesn't leave temp files lying about when killed.
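The point about inode numbers can be checked with a short sketch like the one below. It assumes a sed whose -i option writes a new file and renames it over the original (GNU and BSD sed behave this way, as far as I know); inode is a hypothetical helper defined only for this example.

```shell
# Hypothetical helper: print the inode number of a file.
inode() { ls -i "$1" | awk '{ print $1 }'; }

f=$(mktemp)
printf 'foobartest\n' > "$f"
before=$(inode "$f")

# In-place overwrite: the same file is written to, so the inode is unchanged.
tr 'ots' 'u.x' < "$f" 1<> "$f"
after_tr=$(inode "$f")

# sed -i: a new file takes the place of the old one, so the inode changes.
sed -i.bak 'y/u.x/ots/' "$f"
after_sed=$(inode "$f")

if [ "$before" = "$after_tr" ]; then echo 'tr 1<>: same inode'; fi
if [ "$before" != "$after_sed" ]; then echo 'sed -i: new inode'; fi

rm -f "$f" "$f.bak"
```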

One drawback is that if it's interrupted, the file will end up half-translated (in this case, though, you can run it again to finish it). Some sed implementations handle interruption better by making sure the original file remains unchanged unless the command succeeds.
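Whether re-running is safe depends on the two sets: it only works when no character of the target set also appears in the original set. A quick sketch (with illustrative variable names) that applies each translation twice makes the difference visible:

```shell
# Disjoint sets: a second pass finds nothing left to translate.
once=$(printf 'foobartest\n' | tr 'ots' 'u.x')
twice=$(printf '%s\n' "$once" | tr 'ots' 'u.x')

# Overlapping sets: the second pass translates the "b" produced by the first.
once_overlap=$(printf 'abc\n' | tr 'ab' 'bx')
twice_overlap=$(printf '%s\n' "$once_overlap" | tr 'ab' 'bx')

printf '%s -> %s\n' "$once" "$twice"                    # fuubar.ex. -> fuubar.ex.
printf '%s -> %s\n' "$once_overlap" "$twice_overlap"    # bxc -> xxc
```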

Stéphane Chazelas
  • 3
    Be careful re-running the translation if you've got recursion in the translation sets, e.g. `echo 'abc' | tr ab bx`. – n.st Sep 13 '17 at 16:44
  • 1
    @n.st, yes, that's why I said _in this case_, though I agree it's worth spelling it out. – Stéphane Chazelas Sep 13 '17 at 16:48
  • In the end, I had to work with temp files after all: https://gist.github.com/n-st/048facd0c12f105ac122030fb58b962f — The multibyte characters made it impossible to use GNU `tr` and in our symlink-heavy PXE environment, `sed -i` was a screw-up waiting to happen… :/ – n.st Sep 13 '17 at 17:32
  • @n.st, `iconv -t cp437` seems more appropriate for that. – Stéphane Chazelas Sep 13 '17 at 19:55
  • `iconv` breaks when the input file already contains cp437-encoded bytes, or a mixture of multiple encodings. So while it's preferable in the general case, it's more robust to do manual replacements in this case. – n.st Sep 14 '17 at 07:11
  • @n.st. I see, you're only translating a few characters whose encodings happen to be disjoint in all three charsets. You can't use `tr` (even those supporting multi-byte encoding) or `sed`'s `y` if they are characters in different encodings, so you're down to doing a number of translations of sequences of bytes. – Stéphane Chazelas Sep 14 '17 at 08:28
  • For anyone who is confused by the `1<>` part: https://stackoverflow.com/questions/5767180/opening-a-file-in-write-mode/5767227#5767227 – btwiuse Jan 10 '19 at 04:11
4

As another alternative, if your main issue is the lack of support for changing files in-place, you might be interested in the sponge tool from the moreutils package:

tr 'ots' 'u.x' < file | sponge file

will write to file, but only open file for writing once the input is complete. From the manpage:

sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

Unless you have really large files which cannot be held in memory, sponge could work for you.
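If sponge isn't available, or if you want the file left alone when tr fails (an issue raised in the comments), a small wrapper around a temp file is one option. This is only a sketch: tr_inplace is a hypothetical helper, not part of moreutils, and it writes the result back with a redirection so the existing file is rewritten instead of being replaced by a new one.

```shell
# tr_inplace SET1 SET2 FILE  (hypothetical helper for illustration)
tr_inplace() {
    tmp=$(mktemp) || return 1
    # Only touch FILE if tr succeeded; "cat >" rewrites the existing file,
    # so its inode, permissions and hard links are kept.
    if tr "$1" "$2" < "$3" > "$tmp"; then
        cat "$tmp" > "$3"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

f=$(mktemp)
printf 'foobartest\n' > "$f"
tr_inplace 'ots' 'u.x' "$f"
result=$(cat "$f")
printf '%s\n' "$result"    # fuubar.ex.
```

Unlike sponge, this keeps the whole output on disk rather than in memory, at the cost of the temp file handling the question was hoping to avoid.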

mindriot
  • 2
    One issue with `sponge` is that it still overwrites `file` if `tr` fails (for instance if you had write but not read access to `file`) – Stéphane Chazelas Sep 14 '17 at 08:36
  • Oh, indeed it does; I didn't expect that. Thanks. – mindriot Sep 14 '17 at 09:57
  • See the `cat file >; file` operator of ksh93 which writes the output to a tempfile which is renamed to the destination only if the command succeeds (but like `sed -i`, that creates a new file instead of overwriting the original). – Stéphane Chazelas Sep 14 '17 at 10:27