
I pasted my data into the csvlint.io service because Weka reports the error "22 Problem encountered on line 2" when I try to import the CSV file in the following way.

`java -jar weka.jar` > Explorer > 
    Preprocess > Open file > [select file format CSV] 
    > [Choose CSV file]

A similar error message appears in the thread "Not recognised as an csv file in Weka", which I solved earlier by opening the data in LibreOffice, letting it autofix the file, and saving it as CSV. This time I would like a command-line solution. csvlint.io gives the following warning, although I generated the data on Debian 9.

Structural problem: Non-standard Line Breaks on row 1

Your CSV appears to use LF line-breaks. While this will be fine in most cases, RFC 4180 specifies that CSV files should use CR-LF (a carriage-return and line-feed pair, e.g. \r\n). This may be labelled as "Windows line endings" on some systems.

Data

Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

To remove horizontal whitespace, you can run `tr -d "[:blank:]"` on the data, but it should not be necessary. I think line endings are not the issue here, because converting the file with `dos2unix` or `unix2dos` (suggested by meuh) does not solve the problem.
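
The checks described above can be sketched as two small shell functions. This is only a minimal sketch: `field_counts` and `normalize` are hypothetical helper names, the file name `data.csv` is taken from the comments, and the comma split is naive (it assumes no quoted field contains a comma). A line whose field count differs from the header's points at a missing or extra comma rather than at line endings.

```shell
# Hypothetical helper: print "line-number: field-count" for each line.
# Naive split on commas; assumes no quoted field contains a comma.
field_counts() {
    awk -F, '{ print NR ": " NF }' "$1"
}

# Hypothetical helper: delete spaces, tabs and carriage returns.
# Note that this also removes blanks inside quoted fields.
normalize() {
    tr -d '[:blank:]\r' < "$1"
}
```

For example, `field_counts data.csv` lists the counts per line, and `normalize data.csv > clean.csv` writes a copy without blanks or CR characters (`clean.csv` is an arbitrary output name).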

OS: Debian 9

Léo Léopold Hertz 준영
  • 1
    check out `unix2dos` from package `dos2unix`. – meuh Jul 25 '17 at 18:51
  • @meuh Please, see the body for my attempt with `dos2unix`. – Léo Léopold Hertz 준영 Jul 25 '17 at 18:56
  • No, the command is `unix2dos` to go from Linux to Windows line-endings. – meuh Jul 25 '17 at 19:01
  • @meuh Yes, but I think Weka wants Linux line-endings because it is an open-source program; I am not sure though. What do you think? - - I also tried to open the file produced by `unix2dos`, but I still get the same error. – Léo Léopold Hertz 준영 Jul 25 '17 at 19:02
  • 1
    If you get the same error whether you pass the file through `dos2unix` or `unix2dos`, then it strongly suggests that line-endings are not the issue. – steeldriver Jul 25 '17 at 19:38
  • Doesn't your Weka error message say at what character position the error starts on line 2? Perhaps your data `24.10.2010` is not recognised as a valid date. In your previous question you converted `23072010-2200+0300` to a valid date format, but you do not show it in this post. – meuh Jul 25 '17 at 19:39
  • 1
    GNU [recutils](http://www.gnu.org/software/recutils/) may be able to do what you need (especially its `csv2rec` and `rec2csv` tools). BTW is the third field `Aika_l Un` supposed to be a date with a space and a number (`24072010-0600+0300 70`) - that would make it a string, not a date field. – cas Jul 26 '17 at 00:51
  • 1
    another idea is to fix up the date formats to common, easily-parsed forms. try something like `awk -F', ' -v OFS=, '{gsub(" ",",",$3)}; NR==1 {$1=$1;print}; NR > 1 {split($22,a,"."); $22 = a[3]"-"a[2]"-"a[1]; print }' data.csv`. That replaces the space in field 3 with a , (i.e. turning it into two fields, Aika_l and Un), and changes the date in field 22 from DD.MM.YYYY to YYYY-MM-DD. I'm guessing that this might stop csvlint.io from complaining about your csv file. – cas Jul 26 '17 at 01:17
  • @cas Sorry, typo in the data. Please, see again. What do you think? I used `tr` to remove spaces so I did not notice the mistake. – Léo Léopold Hertz 준영 Jul 26 '17 at 06:38
  • 2
    It looks like valid CSV to me. The date format is a bit weird, but that doesn't stop it being valid csv (will probably affect anything that tries to parse it though). You might want to try converting the dates in fields 2 & 3 to YYYY-MM-DD hh:mm +TZ or similar. – cas Jul 26 '17 at 08:39

1 Answer


cas's answer from the comments:

`awk -F', ' -v OFS=, '{gsub(" ",",",$3)}; NR==1 {$1=$1;print}; NR > 1 {split($22,a,"."); $22 = a[3]"-"a[2]"-"a[1]; print }' data.csv`
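
That command was written against the data as first posted, where a missing comma in the header left 22 fields. In the data as shown in the question there are 23 fields per line, so the DD.MM.YYYY date sits in the last field, not in field 22. A hedged variant that addresses the last field as `$NF`, and also rewrites fields 2 and 3 into the `YYYY-MM-DD hh:mm +TZ` form suggested in the comments, might look like the sketch below. `fix_dates` is a hypothetical function name, the field layout is assumed from the sample rows, and whether Weka then accepts the file is not confirmed here.

```shell
# Hypothetical helper: reads the CSV on stdin, writes the result to stdout.
# Assumes the layout of the sample data: fields 2 and 3 look like
# DDMMYYYY-hhmm+ZZZZ and the last field looks like DD.MM.YYYY.
fix_dates() {
    awk -F', *' -v OFS=, '
        # Turn DDMMYYYY-hhmm+ZZZZ into YYYY-MM-DD hh:mm +ZZZZ
        function iso(s,    d, t) {
            d = substr(s, 5, 4) "-" substr(s, 3, 2) "-" substr(s, 1, 2)
            t = substr(s, 10, 2) ":" substr(s, 12, 2)
            return d " " t " " substr(s, 14)
        }
        # Header line: only re-join the fields with OFS
        NR == 1 { $1 = $1; print; next }
        {
            $2 = iso($2)
            $3 = iso($3)
            split($NF, a, ".")            # DD.MM.YYYY in the last field
            $NF = a[3] "-" a[2] "-" a[1]  # becomes YYYY-MM-DD
            print
        }'
}
```

It can be run as `fix_dates < data.csv > fixed.csv`, where `fixed.csv` is an arbitrary output name. Using `$NF` instead of a fixed field number keeps the date conversion working even if a column is added or removed before the last one.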
Léo Léopold Hertz 준영