PDF to txt

Posted: 2016-05-13T09:27:40-07:00
by teemonie
Hi -

I keep spinning my wheels on this one, and I apologize if it is an easy one. All I am trying to do is convert a PDF to txt. I don't need any additional settings right now; I just need to convert.

Windows box with ImageMagick 7.0.1-Q16

Thanks in advance.

Re: PDF to txt

Posted: 2016-05-13T09:30:40-07:00
by fmw42
What do you mean by converting to txt? Do you mean ASCII text, or do you mean ImageMagick's txt: format? ImageMagick will not process a PDF to extract ASCII text. You need OCR software for that. If you want to convert the raster equivalent of the PDF to ImageMagick's txt: format, then

convert image.pdf txt:-
That will send the pixel data to the terminal. Or send it to a file with

convert image.pdf txt:- > textfile.txt

Re: PDF to txt

Posted: 2016-05-13T09:38:33-07:00
by snibgo
Does the PDF contain text as ASCII text? (You can check this with Adobe Viewer.) If so, then ImageMagick is the wrong tool for the job. A better tool is "pdftotext".
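
For illustration, here is a minimal sketch of how pdftotext might be called, assuming a POSIX shell (for example Git Bash or WSL on a Windows box) and that pdftotext from Poppler or Xpdf is on the PATH. The function name pdf_to_text and the file names are hypothetical. The -layout flag asks pdftotext to keep the original physical layout as far as it can.

```shell
# Hypothetical helper: extract the embedded text layer of a PDF.
# Usage: pdf_to_text input.pdf [output.txt]
pdf_to_text() {
  ptt_in="$1"
  # Default output name: same base name with a .txt extension.
  ptt_out="${2:-${ptt_in%.pdf}.txt}"

  if [ ! -f "$ptt_in" ]; then
    echo "no such file: $ptt_in" >&2
    return 1
  fi
  if ! command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext not found on PATH" >&2
    return 127
  fi

  pdftotext -layout "$ptt_in" "$ptt_out"
}

# Example call (assumes report.pdf exists):
#   pdf_to_text report.pdf report.txt
```

This only recovers text that is stored as text in the PDF. Anything that exists only as a picture (a scanned page, for example) will come out empty.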

Re: PDF to txt

Posted: 2016-05-13T10:41:56-07:00
by teemonie
@fmw42 I am trying to convert into a text format. The first thing I am trying to do is convert a PDF that contains text and pictures so that it is all text in a .txt file. I hope that makes sense.

I thought that ImageMagick was running OCR in the background, and that is why I went this route.

Re: PDF to txt

Posted: 2016-05-13T10:45:23-07:00
by fmw42

Re: PDF to txt

Posted: 2016-05-13T18:26:33-07:00
by snibgo
As Fred says, IM doesn't do OCR. A program that does is "tesseract".
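
A rough sketch of how the two tools could be combined, assuming a POSIX shell, ImageMagick with a working Ghostscript delegate for reading PDFs, and tesseract on the PATH. The function name ocr_pdf, the 300 dpi density and the page file naming are my own choices for illustration, not anything IM or tesseract requires. IM rasterizes each page to a PNG, and tesseract then reads each PNG and writes a .txt file next to it.

```shell
# Hypothetical helper: rasterize a PDF with ImageMagick, then OCR each page.
# Usage: ocr_pdf input.pdf   (writes input.txt)
ocr_pdf() {
  ocr_in="$1"
  ocr_base="${ocr_in%.pdf}"

  if [ ! -f "$ocr_in" ]; then
    echo "no such file: $ocr_in" >&2
    return 1
  fi
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH" >&2
    return 127
  fi

  # IM 7 installs "magick"; IM 6 uses "convert".
  if command -v magick >/dev/null 2>&1; then
    ocr_im=magick
  else
    ocr_im=convert
  fi

  # Render pages at 300 dpi on a white background, one PNG per page.
  "$ocr_im" -density 300 "$ocr_in" -background white -alpha remove \
    "$ocr_base-%03d.png" || return 1

  # tesseract takes an image and an output base name; it appends .txt itself.
  for ocr_page in "$ocr_base"-*.png; do
    tesseract "$ocr_page" "${ocr_page%.png}" || return 1
  done

  # Join the per-page text files in page order.
  cat "$ocr_base"-*.txt > "$ocr_base.txt"
}

# Example call (assumes scan.pdf exists):
#   ocr_pdf scan.pdf
```

The globs assume no unrelated files in the same folder share the "name-" prefix. OCR quality depends heavily on the scan, so expect to tune the density or clean up the images first.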