18

Excel files can be converted to CSV using:

$ libreoffice --convert-to csv --headless --outdir dir file.xlsx

Everything appears to work just fine. The encoding, though, is set to something wonky. Instead of the UTF-8 em dash (—) that I get if I do a "save as" manually from LibreOffice Calc, it gives me a single \227 byte (0x97). Running file on the CSV gives me "Non-ISO extended-ASCII text, with very long lines". So, two questions:

  1. What on earth is happening here?
  2. How do I tell libreoffice to convert to UTF-8?

The specific file that I'm trying to convert is here.

Scott Deerwester

2 Answers

17

Apparently LibreOffice tries to use ISO-8859-1 (or the closely related Windows-1252, in which byte 0x97 is the em dash) by default, which is causing the problem. In response to this bug report, a new parameter --infilter was added. The following command produces a U+2014 em dash:

libreoffice --convert-to csv --infilter=CSV:44,34,76,1 --headless --outdir dir file.xlsx

I tested this with LO 5.0.3.2. From the bug report, it looks like the earliest version containing this option is LO 4.4.
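To check the guess about the encoding, here is a minimal sketch that decodes the byte from the question as Windows-1252 and prints the resulting UTF-8 bytes. It assumes an iconv that knows the WINDOWS-1252 name (glibc's does); the variable name utf8_hex is only for illustration.

```shell
# The byte reported in the question: octal 227 = hex 97.
# Decode it as Windows-1252, re-encode as UTF-8, and dump the bytes as hex.
utf8_hex=$(printf '\227' | iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# U+2014 (em dash) is encoded in UTF-8 as the three bytes e2 80 94.
echo "$utf8_hex"   # prints e28094
```

If this prints e28094, the CSV is most likely Windows-1252 rather than strict ISO-8859-1, where 0x97 is a control character.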

See also: https://ask.libreoffice.org/en/question/13008/how-do-i-specify-an-input-character-coding-for-a-convert-to-command-line-usage/

Jim K
  • Thanks! Still no success though. With this command line: libreoffice --headless --convert-to csv --infilter=CSV:44,34,76,1 file.xlsx --outdir dir; it's still got 0x97 for the em dash. I'm baffled. I'm running LO 4.2.8.2 420m0(Build:2) on Ubuntu 14.04. – Scott Deerwester Feb 02 '16 at 22:00
  • You probably need to upgrade to LO 4.4 or newer, as mentioned in my answer. – Jim K Feb 02 '16 at 22:11
  • `loffice --convert-to xlsx --infilter=csv:44,34,76 input.csv` worked for me. [Reference](https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options). – Adobe May 20 '17 at 14:22
  • Do you have a link where these `infilter` options are listed? The link posted by @Adobe is long outdated. – kebs Aug 24 '18 at 14:51
  • `--infilter` seems to be about the input file, and that would be why @Adobe's command works (CSV input) and the OP's command (XLSX input) doesn't - just a guess – golimar Aug 26 '19 at 16:43
1

You could try:

    $ libreoffice --convert-to \
    > csv:"Text - txt - csv (StarCalc)":"44,34,0,1,,0" \
    > --headless --outdir dir file.xlsx 

Very detailed help about this is available here.
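If the filter options are not accepted by an older LibreOffice, one possible workaround is to re-encode the CSV after the conversion. This is only a sketch: it assumes the converter's output really is Windows-1252, and it creates a small stand-in file in a temporary directory (the names tmp, file.csv and file.utf8.csv are made up for the example) instead of running LibreOffice.

```shell
# Stand-in for the converter's output: "a", byte 0x97, "b", then ",1".
tmp=$(mktemp -d)
printf 'a\227b,1\n' > "$tmp/file.csv"

# Re-encode into a new file; iconv does not edit files in place.
iconv -f WINDOWS-1252 -t UTF-8 "$tmp/file.csv" > "$tmp/file.utf8.csv"

# Dump the result as hex: the single 0x97 byte should now be e2 80 94.
od -An -tx1 "$tmp/file.utf8.csv"
```

For a real run, replace the stand-in file with the CSV that libreoffice wrote to the --outdir directory.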

xae
  • Thanks for the reply. I'm still not getting it to accept the additional tokens. I've tried --convert-to "csv:Text - txt - csv (StarCalc):44,34,76,1,,0", --convert-to "csv:Text - txt - csv (StarCalc):44,34,76,1,1/2/2/2/3/2/4/2/5/2/6/2/7/2/8/2/9/1/10/3" and various other combinations. Any suggestions? – Scott Deerwester Feb 02 '16 at 21:54
  • `csv:"Text - txt - csv (StarCalc)":"44,34,0,1,,0"`, csv`:`"double quoted"`:`"double quoted" – xae Feb 02 '16 at 21:59
  • That's only going to be relative to the shell, but I tried it anyway with the same results. – Scott Deerwester Feb 02 '16 at 22:03
  • [Here](https://ask.libreoffice.org/en/question/21916/cli-convert-ods-to-csv-with-semicolon-as-delimiter/) they use `unoconv` and `soffice` directly for a related task; maybe that could help. – xae Feb 02 '16 at 22:08