Re: Unix cmd line utility for Multibyte PDF -> Text

Hi Michael,


Quoting Michael Monaghan <Michael.Monaghan@Sun.COM>:

> 
> Hi,
> 
> I need a pdf -> text command line utility for Unix/Solaris that
> won't corrupt non-ASCII characters.


A few years ago I used PDFBox (http://www.pdfbox.org/), a Java PDF
library, to extract text from PDF files. As far as I remember, it also
handled non-ASCII characters correctly.

Best regards,

Christophe

-- 
Christophe Strobbe
K.U.Leuven - Department of Electrical Engineering - Research Group on 
Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/ 

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

Received on Friday, 29 September 2006 05:05:08 UTC