- Subject: Re: pdfttotext in pure lua?
- From: Nagaev Boris <bnagaev@...>
- Date: Sun, 23 Oct 2016 23:32:46 +0300
On Sun, Oct 23, 2016 at 11:03 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> 2016-10-23 20:26 GMT+02:00 Dietmar Segbert <didi.segbert@arcor.de>:
>
>> is there a module in pure lua, that converts a pdf-file to a text-file?
>
> I once spent a great deal of time, without getting as far as I wanted
> to, on a pure Lua program that produces Markdown starting from the
> XML output given by "pdftohtml -xml".
>
> Among the difficulties are: recognizing page headers and footers;
> reassembling words hyphenated at the end of a line; handling
> footnotes and citations; recognizing tabular input; etc.
>
> All that makes me doubt very strongly that the desired module exists.
>
Debian has the package poppler-utils [1], which provides the pdftotext utility.
[1] https://packages.debian.org/sid/poppler-utils
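This is not pure Lua, but a Lua script can call the utility and read its
output. A minimal sketch, assuming pdftotext is on the PATH, a POSIX shell,
and a Lua build where io.popen is available; the helper names shell_quote,
pdftotext_command and pdf_to_text are made up for illustration:

```lua
-- Quote a string for a POSIX shell by wrapping it in single quotes
-- and escaping any single quotes it contains.
local function shell_quote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Build the command line. Passing "-" as the output file makes
-- pdftotext write the extracted text to standard output.
local function pdftotext_command(pdf_path)
  return "pdftotext " .. shell_quote(pdf_path) .. " -"
end

-- Run pdftotext and return the extracted text.
-- Returns nil plus an error message if the pipe cannot be opened.
local function pdf_to_text(pdf_path)
  local pipe, err = io.popen(pdftotext_command(pdf_path), "r")
  if not pipe then
    return nil, err
  end
  local text = pipe:read("*a")
  pipe:close()
  return text
end
```

Calling pdf_to_text("paper.pdf") should then give the text as one string,
which can be post-processed in Lua. It does nothing about the difficulties
Dirk lists (headers, hyphenation, footnotes, tables).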
--
Best regards,
Boris Nagaev