How do you remove all punctuation using the sed command?

Question

I'm trying to remove all punctuation from a text file using the sed command, but I don't quite know how.

Does this answer your question? [using sed with ampersand (&)](https://unix.stackexchange.com/questions/296705/using-sed-with-ampersand) — mashuptwice, Feb 25 '22 at 22:23

score 4 · Answer 1 · answered Feb 25 '22 at 22:40

If by "punctuation", you mean any of the characters in the set

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

(which is the set of "POSIX punctuation characters", written as [:punct:] inside a bracket expression, i.e. [[:punct:]] in a regular expression) and if by "remove" you mean "delete completely", then it would be more efficient to do this with tr, like so:

tr -d '[:punct:]' <file.in >file.out

This tells tr to delete all characters from the above set in its input stream, reading from a file called file.in and writing the result to some file file.out.
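To see the effect without touching any files, the same `tr` invocation can be fed a short string on standard input. This is only a sketch: the sample text and the variable names `sample` and `cleaned` are made up for illustration and do not come from the question.

```shell
# Hypothetical sample text, used only for illustration.
sample='Hello, world! Is this a "test"? Yes: 100%.'

# Delete every POSIX punctuation character from the input stream.
cleaned=$(printf '%s\n' "$sample" | tr -d '[:punct:]')

printf '%s\n' "$cleaned"
# prints: Hello world Is this a test Yes 100
```

Note that spaces, letters, and digits are left alone; only members of the punctuation class are dropped.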

With sed, you would do the same thing with

sed 's/[[:punct:]]//g' <file.in >file.out

... but I would expect this to be slightly slower (possibly only noticeably so on large input data).
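As a quick sanity check that the two commands produce the same result, both can be run on the same input and compared. The sample string and the variable names below are assumptions for illustration, and the snippet checks equivalence only; it does not measure speed.

```shell
# Hypothetical sample text, used only for illustration.
sample='a.b,c;d-e_f (g) [h] {i} 1+2=3'

# Same input through both tools.
with_tr=$(printf '%s\n' "$sample" | tr -d '[:punct:]')
with_sed=$(printf '%s\n' "$sample" | sed 's/[[:punct:]]//g')

printf 'tr:  %s\nsed: %s\n' "$with_tr" "$with_sed"

# Report whether the two results match.
if [ "$with_tr" = "$with_sed" ]; then
    echo 'same output'
else
    echo 'outputs differ'
fi
```

Both variables should hold `abcdef g h i 123` for this input.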

Kusalananda


Though in this case I'm usually interested in the words, and replace the punctuation with spaces or newlines, as this makes the result easier to process. So: `tr '[[:punct:]]' ' '` or `tr '[[:punct:]]' '\n'` might help the OP better. – JdeHaan Feb 26 '22 at 09:46
@JdeHaan The user in the question did not further specify what they wanted to do beyond removing the punctuation. Your `tr` command would be more correct if written as `tr '[:punct:]' '[\n*]'` (see the `tr` manual for that syntax). – Kusalananda Feb 26 '22 at 09:54
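
The replacement form discussed in the comments above can be tried the same way. In this sketch the second set uses the `[c*]` repeat syntax, so that every punctuation character maps to the same replacement; the sample string and variable names are invented for illustration, and support for `[c*]` is assumed (it is documented for GNU and BSD `tr`).

```shell
# Hypothetical sample text, used only for illustration.
sample='one,two;three.four'

# Replace each punctuation character with a space.
spaced=$(printf '%s\n' "$sample" | tr '[:punct:]' '[ *]')

# Replace each punctuation character with a newline (one word per line).
lines=$(printf '%s\n' "$sample" | tr '[:punct:]' '[\n*]')

printf '%s\n' "$spaced"
printf '%s\n' "$lines"
```

The first `printf` should show the four words separated by spaces, and the second should show them on four separate lines.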
