PDF to txt

teemonie
Posts: 2
Joined: 2016-05-13T08:55:24-07:00
Authentication code: 1151

PDF to txt

Post by teemonie »

Hi -

I keep spinning my wheels on this one, and I apologize if it is an easy one. All I am trying to do is convert a PDF to txt. I don't need any additional settings for now; I just need to convert.

Windows box with 7.0.1-Q16

Thanks in advance.
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA

Re: PDF to txt

Post by fmw42 »

What do you mean by converting to txt? Do you mean ASCII text, or do you mean the txt: format? ImageMagick will not process a PDF to extract ASCII text. You need OCR software. If you want to convert the raster equivalent of the PDF to ImageMagick's txt: format, then

Code:

convert image.pdf txt:-
That will send the pixel data to the terminal. Or send it to a file with

Code:

convert image.pdf txt:- > textfile.txt
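
To make the distinction concrete, here is a minimal sketch of what txt: output contains and why it is not document text. The two-pixel sample is a hypothetical illustration, not output from a real PDF, and the exact header fields and value ranges differ between ImageMagick versions and quantum depths. A POSIX shell is assumed.

```shell
# Illustrative txt: output for a hypothetical 2x1 image.
# Real output has one such row per pixel of every rasterized page.
sample='# ImageMagick pixel enumeration: 2,1,255,srgb
0,0: (255,255,255)  #FFFFFF  white
1,0: (0,0,0)  #000000  black'

# Every non-comment line describes one pixel (position and color),
# so counting those lines gives the pixel count, not a character count.
pixel_count=$(printf '%s\n' "$sample" | grep -vc '^#')
echo "pixels: $pixel_count"   # prints "pixels: 2"
```

With ImageMagick 7 the same conversion is normally invoked through the magick command, for example magick image.pdf txt:-, which also avoids the name clash with the Windows system convert.exe.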
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Authentication code: 1151
Location: England, UK

Re: PDF to txt

Post by snibgo »

Does the PDF contain text as ASCII text? (You can check this with Adobe Viewer.) If so, then ImageMagick is the wrong tool for the job. A better tool is "pdftotext".
snibgo's IM pages: im.snibgo.com
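
For a PDF that already carries a text layer, a pdftotext call could look like the sketch below. The function name extract_text and the file names are hypothetical, and a POSIX shell is assumed (on Windows, for example, Git Bash or WSL). The -layout option is optional; it asks pdftotext to preserve the physical layout of the page as far as it can.

```shell
# Hypothetical helper: extract the text layer of a PDF with pdftotext.
extract_text() {
  # $1 = input PDF, $2 = output text file
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  if ! command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext not found (it ships with Poppler and Xpdf)" >&2
    return 127
  fi
  # -layout tries to keep the original physical layout of the page.
  pdftotext -layout "$1" "$2"
}

# Example call with placeholder file names:
# extract_text image.pdf textfile.txt
```

If the resulting file is empty or nearly empty, the PDF probably holds scanned images rather than text, and OCR is the remaining option.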
teemonie
Posts: 2
Joined: 2016-05-13T08:55:24-07:00
Authentication code: 1151

Re: PDF to txt

Post by teemonie »

@fmw42 I am trying to convert into a text format. The first thing I am trying to do is convert a PDF that contains text and pictures into all text in the .txt format. I hope that makes sense.

I thought that ImageMagick was running OCR in the background, and that is why I went this route.
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA

Re: PDF to txt

Post by fmw42 »

snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Authentication code: 1151
Location: England, UK

Re: PDF to txt

Post by snibgo »

As Fred says, IM doesn't do OCR. A program that does is "tesseract".
snibgo's IM pages: im.snibgo.com
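
Since tesseract reads raster images rather than PDFs, one possible workflow is to let ImageMagick rasterize the pages and then run OCR on each one. The sketch below assumes a POSIX shell, an ImageMagick install that can read PDFs (this requires Ghostscript), and tesseract on the PATH. The function name ocr_pdf, the page-NNN naming, and the 300 dpi density are illustrative choices, not recommendations from this thread.

```shell
# Hypothetical helper: rasterize a PDF with ImageMagick, then OCR each page.
ocr_pdf() {
  # $1 = input PDF, $2 = working directory for page images and text files
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  # On ImageMagick 7 (and on Windows, where convert.exe is a system tool),
  # replace "convert" with "magick" here and below.
  for tool in convert tesseract; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found" >&2
      return 127
    fi
  done
  mkdir -p "$2" || return 1
  # -density before the input sets the rasterization resolution;
  # %03d numbers the output files per page.
  convert -density 300 "$1" "$2/page-%03d.png" || return 1
  # tesseract takes an output base name and appends .txt itself.
  for png in "$2"/page-*.png; do
    tesseract "$png" "${png%.png}" || return 1
  done
}

# Example call with placeholder names:
# ocr_pdf image.pdf ocr_out
```

OCR only recovers text that is visible in the page images; pictures without lettering produce no text, and recognition quality depends heavily on the scan resolution.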