I'm trying to remove all punctuation from a text file using the sed command, but I don't know how to do it.
Kusalananda
Kris Omelia
Does this answer your question? [using sed with ampersand (&)](https://unix.stackexchange.com/questions/296705/using-sed-with-ampersand) – mashuptwice Feb 25 '22 at 22:23
1 Answer
If by "punctuation", you mean any of the characters in the set

    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

(which is the set of "POSIX punctuation characters", matched by the [:punct:] character class inside a bracket expression) and if by "remove" you mean "delete completely", then it would be more efficient to do this with tr like so:

    tr -d '[:punct:]' <file.in >file.out

This tells tr to delete every character in the above set from its input stream, reading from a file called file.in and writing the result to a file called file.out.
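As a quick check of the tr command, here is a minimal sketch that runs it on a made-up sample line through a pipe instead of files. The function name `strip_punct` and the sample text are my own, not from the question, and `LC_ALL=C` is an assumption added so that `[:punct:]` means exactly the ASCII set listed above regardless of your locale.

```shell
# strip_punct is a hypothetical helper name, not a standard utility.
# LC_ALL=C pins [:punct:] to the ASCII punctuation set (an assumption
# added here; the original command relies on the current locale).
strip_punct() {
    LC_ALL=C tr -d '[:punct:]'
}

# Made-up sample input.
printf '%s\n' 'Hello, world! (It works...)' | strip_punct
# prints: Hello world It works
```

Note that the spaces and the trailing newline survive, since they are not punctuation.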
With sed, you would do the same thing with

    sed 's/[[:punct:]]//g' <file.in >file.out

... but I would expect this to be slightly slower (possibly only noticeably so on large input data).
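To convince yourself that the two commands agree, a small sketch like the following may help. The function names and the sample string are hypothetical, and `LC_ALL=C` is again an assumption to keep the character class fixed to ASCII punctuation.

```shell
# Hypothetical wrappers around the two commands from the answer.
strip_punct_tr()  { LC_ALL=C tr -d '[:punct:]'; }
strip_punct_sed() { LC_ALL=C sed 's/[[:punct:]]//g'; }

# Made-up sample line containing several kinds of punctuation.
sample='Wait... what?! "Quoted" [text], and_more.'

a=$(printf '%s\n' "$sample" | strip_punct_tr)
b=$(printf '%s\n' "$sample" | strip_punct_sed)

# Report whether both tools produced the same text.
if [ "$a" = "$b" ]; then
    printf 'same result: %s\n' "$a"
else
    printf 'results differ\n'
fi
```

If speed matters for your data, you could prefix each real command with `time` and compare on your own input; I have not measured it here.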
Kusalananda
Though in this case I'm usually interested in the words, so I replace the punctuation with spaces or newlines, which makes the result easier to process. So: tr '[[:punct:]]' ' ' or tr '[[:punct:]]' '\n' might help the OP better. – JdeHaan Feb 26 '22 at 09:46
@JdeHaan The user in the question did not further specify what they wanted to do beyond removing the punctuation. Your `tr` command would be more correct if written as `tr '[:punct:]' '[\n*]'` (see the `tr` manual for that syntax). – Kusalananda Feb 26 '22 at 09:54
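For anyone who wants the replace-instead-of-delete variant discussed in these comments, here is a hedged sketch using the `[c*]` repeat syntax mentioned above. In tr, unlike in a regular expression, an extra pair of brackets around `[:punct:]` is taken literally, which is why the unbracketed class is used. The function names and sample text are made up, `LC_ALL=C` is an assumption, and I am assuming a tr that supports `[c*]` in the second string (GNU and BSD tr do).

```shell
# Hypothetical helpers: map every punctuation character to one
# replacement character. '[\n*]' and '[ *]' repeat the replacement
# as often as needed to match the length of the first set.
punct_to_newlines() { LC_ALL=C tr '[:punct:]' '[\n*]'; }
punct_to_spaces()   { LC_ALL=C tr '[:punct:]' '[ *]'; }

# Made-up sample input.
printf '%s\n' 'one,two;three' | punct_to_newlines
printf '%s\n' 'one,two;three' | punct_to_spaces
```

The first call puts each word on its own line, the second prints the words separated by single spaces. Runs of adjacent punctuation produce runs of newlines or spaces; adding the `-s` option to tr would squeeze them if that is a problem.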