
I have a programming book in EPUB format that I'm trying to convert to TXT. For this I'm using calibre's `ebook-convert` utility. The problem is that the standard invocation:

ebook-convert book.epub book.txt

removes the indentation from source code samples. For example, a sample in the book looks like this:

class A {
  private int a;
}

But in the resulting TXT it becomes:

class A {
private int a;
}

After reading the utility's man page I've tried the following options:

--keep-ligatures
--pretty-print
--change-justification=original

but none of them helped. How can I keep the indentation?

ka3ak
  • What OS and language settings are you using? Keep in mind that many documents use *non-breaking spaces* (NBSP), which are not ASCII and are encoded as multiple bytes (two in UTF-8). Try fiddling with your OS/terminal language or *locale* settings. – not2qubit May 02 '21 at 10:21
  • The book is in English. I'm using Ubuntu 20, and `locale` reports `LANG=en_US.UTF-8`. – ka3ak May 02 '21 at 10:24
  • @not2qubit Shouldn't the utility itself be responsible for this? For example, `pdftotext` has a `-layout` option that keeps the original formatting of a PDF in the TXT output. – ka3ak May 02 '21 at 10:27
  • I have no idea. I just had a similar issue with OCR on a PDF, where the program insisted on extracting *NBSP*s because the document was in a foreign language. – not2qubit May 02 '21 at 10:30
  • You could convert to HTML (or just unzip the EPUB and use the HTML inside it directly) and try your luck with `links -dump` or similar. If that doesn't work either, you might have to look at the HTML directly and write your own helper script for converting the code snippets. – frostschutz May 02 '21 at 10:35
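A minimal sketch for testing not2qubit's NBSP theory, assuming bash and a standard `grep`/`sed`. The helper names `count_nbsp_lines` and `nbsp_to_space` are hypothetical, not part of calibre. Both work on raw bytes (`LC_ALL=C`), so the result does not depend on the locale settings discussed above.

```shell
#!/usr/bin/env bash
# U+00A0 (NO-BREAK SPACE) is the two bytes 0xC2 0xA0 in UTF-8.
NBSP=$(printf '\302\240')

# Print how many lines read from stdin contain at least one NBSP.
# grep -c still prints 0 when nothing matches; "|| true" only hides
# the non-zero exit status in that case.
count_nbsp_lines() {
  LC_ALL=C grep -cF -- "$NBSP" || true
}

# Copy stdin to stdout with every NBSP replaced by an ordinary space.
nbsp_to_space() {
  LC_ALL=C sed "s/$NBSP/ /g"
}
```

For example, `unzip -p book.epub '*html' | count_nbsp_lines` (assuming `unzip` is installed; an EPUB is a ZIP archive) shows whether the book's markup contains raw NBSP characters, and `nbsp_to_space < book.txt > book-fixed.txt` normalizes a converted file if NBSPs do survive into it. Markup can also spell them as `&nbsp;` or `&#160;`, which this byte count will not see.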

1 Answer


Use pandoc instead of ebook-convert. For example:

$ pandoc -f epub -t plain -o filename.txt filename.epub

I just tested this with a Python EPUB, and it retained the indentation without a problem.

pandoc can also convert to other formats, including various flavours of Markdown, AsciiDoc, LaTeX, ODT (LibreOffice/OpenOffice text), reStructuredText, RTF, PDF, and more.
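A small sketch for converting several books at once with the pandoc command from this answer, assuming bash and `pandoc` on `PATH`. `epubs_to_txt` is a hypothetical helper name, and `--wrap=none` is an optional addition: it stops pandoc from re-wrapping prose paragraphs at its default 72 columns, while code blocks are not re-wrapped either way.

```shell
#!/usr/bin/env bash
# Convert each EPUB given as an argument to a .txt file next to it.
# Stops at the first file pandoc fails on.
epubs_to_txt() {
  local f
  for f in "$@"; do
    # "${f%.epub}.txt" swaps the .epub suffix for .txt
    pandoc -f epub -t plain --wrap=none -o "${f%.epub}.txt" "$f" || return 1
  done
}
```

Run it as `epubs_to_txt *.epub`; with `shopt -s nullglob` set first, a directory without EPUBs produces no pandoc calls instead of one for a literal `*.epub`. Swapping `-t plain` for `-t gfm` should give GitHub-flavoured Markdown, where the code samples end up in fenced blocks.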

cas
  • Yeah, I know the tool; not sure why I didn't try it this time. The output looks good and the original indentation is preserved. Thanks! – ka3ak May 02 '21 at 11:28