I'm using Cygwin to connect to a tiny VM with limited RAM (512 MB).

I'm trying to import a 4 GB CSV file into a sqlite3 database. The import gives no errors except on 2 lines (out of 8,717,201 total).

It seems that I have a control-M character (^M) on those 2 lines, which breaks the CSV format and makes the import fail.

When I try sed 's|,^M|,|' file.csv, the control-M is written as literal ASCII text (2 characters, ^ and M), so the search-and-replace doesn't match.

When I do it on a test file opened in vi for search and replace, I can see that it is entered as a control code (a blue-colored ^M that acts like a single character).

How can I fix the CSV file? (Or how can I type the control-M sequence in Cygwin?)

Example problematic line:

$ cat -e test
keyword3,keyword1,keyword4$
keyword1,keyword2,keyword3^M$
,keyword4$
keyword5,keyword1,keyword2$

How it should be:

$ cat -e test
keyword3,keyword1,keyword4$
keyword1,keyword2,keyword3,keyword4$
keyword5,keyword1,keyword2$

PS: As you can see, English is not my native language, so sorry for any mistakes ¯\_(ツ)_/¯

user319660
  • Related: [What is `^M` and how do I get rid of it?](https://unix.stackexchange.com/questions/32001/what-is-m-and-how-do-i-get-rid-of-it). In the Cygwin terminal, you should be able to use `Ctrl-V` then `Enter`. At least with GNU sed, you can also use `\r` in place of `^M` – steeldriver Jun 08 '21 at 00:03
  • This recent question [How to remove \n in a string](https://unix.stackexchange.com/questions/653224/how-to-remove-n-in-a-string) is also relevant since you are both dealing with line endings that are (legally) embedded within [CT]SV files. – steeldriver Jun 08 '21 at 00:23
  • Why the comma in `sed 's|,^M|,|' file.csv`? It should be `sed 's|^M||' file.csv` – matzeri Jun 09 '21 at 06:09
  • If the control character is only on two lines, as you say, it might make sense with such a large file to only do substitution on the affected lines, e.g. `sed '1,2 s|^M||'` if they were on first two lines. As for the first sentence of your question, is that related in any way? If that's a separate issue you should create a new question for it. – B Layer Jun 09 '21 at 21:39

2 Answers

Actually, that carriage return helps you identify wrong line breaks:

sed '/^M$/{N;s/^M\n//;}' test

As steeldriver wrote, you can usually produce that ^M by typing Ctrl-V followed by Ctrl-M.

The command means:

  • /^M$/{...}: On lines with a carriage return at the end, execute the commands in curly brackets
  • N (next) appends the next line to the pattern space, with a newline embedded between the two lines
  • s/^M\n// substitutes the carriage return + newline with nothing (removes the line break)
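
To try the command described above without typing a literal control character, here is a minimal sketch that rebuilds the sample from the question in a temporary file. It assumes GNU sed, which accepts `\r` in a regex (as steeldriver mentions in the comments); the variable names `sample` and `fixed` are only illustrative.

```shell
# Recreate the sample from the question; \r is the carriage return shown as ^M.
sample=$(mktemp)
printf 'keyword3,keyword1,keyword4\nkeyword1,keyword2,keyword3\r\n,keyword4\nkeyword5,keyword1,keyword2\n' > "$sample"

# On a line ending in CR: append the next line (N), then delete the CR + newline pair.
# \r in the regex is a GNU sed feature (assumption: GNU sed is the sed in use).
fixed=$(sed '/\r$/{N;s/\r\n//;}' "$sample")
printf '%s\n' "$fixed"

rm -f "$sample"
```

The broken second record should come out rejoined as keyword1,keyword2,keyword3,keyword4, matching the expected file in the question.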

This simple script assumes that a line is broken at most once. Otherwise you'd need something like

sed 'H;1h;$!d;x;s/^M\n//g' file
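
Note that a script of the `H;1h;$!d;x` kind collects the whole file in sed's hold space before substituting, which may not fit a 4 GB file on a VM with 512 MB of RAM. As a hedged alternative (not part of the original answer), a loop with a label can rejoin lines that are broken several times while still processing the file as a stream. This sketch assumes GNU sed for `\r` and for `;` after the label; the sample data is made up for illustration.

```shell
# Hypothetical sample: the first record is broken twice by CR + newline.
sample=$(mktemp)
printf 'a,b\r\n,c\r\n,d\ne,f\n' > "$sample"

# :a   defines a label
# N    appends the next line whenever the current one ends in CR
# ba   jumps back to re-check the joined line for another trailing CR
joined=$(sed ':a;/\r$/{N;s/\r\n//;ba;}' "$sample")
printf '%s\n' "$joined"

rm -f "$sample"
```

Only the lines of one record are held in memory at a time, so memory use stays small as long as individual records are.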
Philippos

One way to type a literal ^M (the carriage return that Enter produces) for a replacement in sed or vi is to press:

Ctrl-V Enter
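
If typing the control character is awkward in a given terminal, a possible workaround is to let the shell generate it. The sketch below stores a carriage return in a variable with `printf` and uses it in a `grep` pattern to count the affected lines before fixing them. The variable names and sample data are assumptions for illustration only.

```shell
# Store a real carriage return in a variable; command substitution
# strips trailing newlines only, so the CR survives.
cr=$(printf '\r')

# Hypothetical sample with one line ending in CR.
sample=$(mktemp)
printf 'a,b\r\n,c\nd,e\n' > "$sample"

# Count the lines that end in a carriage return.
count=$(grep -c "${cr}\$" "$sample")
printf '%s\n' "$count"

rm -f "$sample"
```

Replacing `-c` with `-n` lists the line numbers instead, which helps locate the two bad lines in a large file.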

freeeflyer