I am using the following command to remove non-ASCII characters, single quotes and non-printable characters:

sed -i -e "s/'//g" -e's/&apos;//g' -e's/[\d128-\d255]//g' -e's/\x0//g' filename

However, I am getting an error:

sed: -e expression #3, char 18: Invalid collation character

How can I replace these characters?

MatthewRock
Azhar
  • The six characters `&apos;` are all ASCII. Why would you want to replace those? Please [edit] your question to explain. – roaima Jan 21 '16 at 15:37
  • @roaima, yes, I also want to remove the `&apos;` string from the file, because when I load this file into another flat file with the Informatica ETL tool, `&apos;` is loaded as a single quote, and I don't want single quotes in my file. – Azhar Jan 21 '16 at 15:42
  • Within an ASCII alphabet, `&apos;` has no special meaning. – roaima Jan 21 '16 at 16:59
  • I can't help thinking that starting off with `tr -cd '[[:print:]]'` instead of `sed` might be worth a look (`-d`: delete, `-c`: the complement of). – Ulrich Schwarz Jan 21 '16 at 17:05
  • The subject of the question asks one thing, then your attempts suggest another thing and your comments yet another thing. Try to write your question with clear and consistent requirements. – Stéphane Chazelas Oct 15 '21 at 12:31
  • You're probably in a multibyte locale (such as UTF-8). Set `LC_COLLATE=C` for the command. And I second Ulrich's recommendation to delete the non-printing characters using `tr`. – Toby Speight Oct 20 '21 at 07:57
  • Does this answer your question? [Replace non-printable characters in perl and sed](https://unix.stackexchange.com/questions/201751/replace-non-printable-characters-in-perl-and-sed) – Toby Speight Oct 20 '21 at 07:59
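
The `tr` suggestion in the comments above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: `strip_bytes` is a hypothetical helper name, and the sketch deletes by byte value (NUL, single quote, bytes 128-255) in the C locale rather than using `[:print:]`, because `tr -cd '[:print:]'` would also delete newlines.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): delete single quotes, NUL bytes
# and all bytes in the range 128-255 from stdin in one streaming pass.
# LC_ALL=C makes tr work on raw bytes instead of multibyte characters.
strip_bytes() {
    # "'" adds the single quote; '\000\200-\377' are octal escapes that tr
    # itself interprets (NUL, then the byte range 0x80-0xFF).
    LC_ALL=C tr -d "'"'\000\200-\377'
}

# Possible usage, assuming enough free disk space for a temporary copy:
#   strip_bytes < filename > filename.clean && mv filename.clean filename
```

Unlike the `sed` commands, this does not touch a literal `&apos;` string; that part would still need `sed` or a similar tool.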

1 Answer

Try it this way:

LANG=iso-8859-1 sed -i -e"s/'//g" -e's/&apos;//g' -e's/[\d128-\d255]//g' -e's/\x0//g' filename

or you might find this useful (it removes non-printable characters and single quotes):

sed -i 's/[^[:print:]]//g;s/'\''//g;s/&apos;//g' filename
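
As a rough sketch of the same idea with the locale pinned for a single command: the "Invalid collation character" error typically appears when a bracket range of high bytes is compiled in a multibyte locale such as UTF-8, and forcing the C locale makes GNU sed treat the range as raw bytes. The following assumes GNU sed; `clean_stream` is a hypothetical name, and `&apos;` is assumed to be the literal six-character string meant by the garbled quote in the commands.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): clean stdin with GNU sed in the
# C locale, so that [\x80-\xFF] is a plain byte range.
clean_stream() {
    LC_ALL=C sed -e "s/'//g" \
                 -e 's/&apos;//g' \
                 -e 's/[\x80-\xFF]//g' \
                 -e 's/\x00//g'
}

# Possible in-place usage on a file named "filename":
#   clean_stream < filename > filename.clean && mv filename.clean filename
```

The `\xHH` escapes are a GNU extension, so this is not expected to work with other `sed` implementations.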
rush
  • Are you removing the double quotes (`"`) in your sed command, i.e. the `sed -i 's/"//g'` statement? I don't want to remove the double quotes from the file. – Azhar Jan 21 '16 at 15:44
  • Thanks rush. Both commands work fine, but the performance is very slow. I have 12 files, each between 1 GB and 6 GB, and removing the non-printable characters and single quotes takes approximately 2 minutes per file. Is there any way to improve the performance? – Azhar Jan 21 '16 at 16:29
  • Hello everyone, the command takes around 20 minutes to complete for 12 files. Please advise if I can improve the performance; it's urgent. – Azhar Jan 21 '16 at 16:52
  • I don't think you can easily reduce the processing time. `sed -i` writes the edited content to a new file and then simply replaces the original with it. This means the whole process takes approximately as long as a simple copy. 12 files with an average size of 3 GB (from 1 to 6 :)) make 36 GB to copy, so taking about 20 minutes is fine. – rush Jan 21 '16 at 16:58
  • Hi, when I run the same script (`sed -i -e "s/'//g" -e's/&apos;//g' -e's/[\d128-\d255]//g' -e's/\x0//g' filename`) in the development environment, it runs fine and completes in 15 minutes. In the TEST environment, the same script fails with `sed: -e expression #3, char 18: Invalid collation character`. When I use `LANG=iso-8859-1 sed -i -e"s/'//g" -e's/&apos;//g' -e's/[\d128-\d255]//g' -e's/\x0//g'` in the TEST environment, it runs fine but takes around 30 minutes. Why is there a discrepancy? – Azhar Jan 22 '16 at 14:29
  • Are `sed` and the operating systems in the test and dev environments identical? – rush Jan 22 '16 at 14:33
  • Can you please elaborate on what exactly I need to check? – Azhar Jan 22 '16 at 15:13
  • DEV: OS Linux, `sed --version` reports GNU sed version 4.1.5. UAT: OS Linux, `sed --version` reports GNU sed version 4.2.1. – Azhar Jan 22 '16 at 15:22
  • Hey guys, I am using `sed -i 's/\o000//g' filename` or `sed -i 's/\x0//g' filename` to remove the NUL character. The command works fine in the DEV environment but doesn't work in UAT. OS and sed versions: DEV is Linux with GNU sed version 4.1.5; UAT is Linux with GNU sed version 4.2.1. Please advise. – Azhar Jan 27 '16 at 11:52
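
For the question of what to compare between the two environments, one possible checklist is the `sed` version plus the locale settings, since the same expression can behave differently under a UTF-8 locale and the C locale. A minimal sketch, with `env_report` as a hypothetical helper name; run it on both hosts and compare the output:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the settings that are
# worth comparing between two hosts when sed behaves differently.
env_report() {
    # First line of the version banner (empty if this sed has no --version).
    printf 'sed: %s\n' "$(sed --version 2>/dev/null | head -n 1)"
    # Character encoding of the current locale, e.g. UTF-8.
    printf 'charmap: %s\n' "$(locale charmap 2>/dev/null)"
    # Locale variables that influence regex ranges and character classes.
    printf 'LANG=%s LC_ALL=%s LC_COLLATE=%s LC_CTYPE=%s\n' \
        "${LANG-}" "${LC_ALL-}" "${LC_COLLATE-}" "${LC_CTYPE-}"
}

env_report
```

If the two reports differ only in the locale lines, prefixing the command with `LC_ALL=C` on both hosts is one way to make the runs comparable.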