PDF to txt
Posted: 2016-05-13T09:27:40-07:00
by teemonie
Hi -
I keep spinning my wheels on this one, and I apologize if it is an easy one. All I am trying to do is convert a PDF to txt. I don't need any additional settings for now; I just need to convert.
Windows box with ImageMagick 7.0.1-Q16
Thanks in advance.
Re: PDF to txt
Posted: 2016-05-13T09:30:40-07:00
by fmw42
What do you mean by converting to txt? Do you mean ASCII text, or do you mean ImageMagick's txt: format? ImageMagick will not process a PDF to extract ASCII text. You need OCR software. If you want to convert the raster equivalent of the PDF to ImageMagick's txt: format, then write the output to txt:-
That will send the pixel information to the terminal. Or send it to a file as
Code:
convert image.pdf txt:- > textfile.txt
Re: PDF to txt
Posted: 2016-05-13T09:38:33-07:00
by snibgo
Does the PDF contain text as ASCII text? (You can check this with Adobe Viewer.) If so, then ImageMagick is the wrong tool for the job. A better tool is "pdftotext".
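For reference, a minimal sketch of how pdftotext might be called, assuming it is installed (it ships with Poppler and Xpdf) and that you run it from a POSIX-style shell. The function name pdf_to_text and the file names are placeholders, not anything from the original post.

```shell
# Sketch: pull the embedded text layer out of a PDF with pdftotext.
# pdf_to_text and the file names are placeholders for illustration.
pdf_to_text() {
    # $1 = input PDF, $2 = output text file
    # -layout asks pdftotext to keep the physical layout of the page
    pdftotext -layout "$1" "$2"
}

# Usage (placeholder names):
#   pdf_to_text input.pdf output.txt
```

This only works when the PDF really stores its text as text. A scanned page is just an image, and pdftotext will return little or nothing for it.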
Re: PDF to txt
Posted: 2016-05-13T10:41:56-07:00
by teemonie
@fmw42 I am trying to convert into a text format. The first thing I am trying to do is convert a PDF that contains text and pictures so that it is all text in the .txt format. I hope that makes sense.
I thought that ImageMagick was running OCR in the background, and that is why I went this route.
Re: PDF to txt
Posted: 2016-05-13T10:45:23-07:00
by fmw42
Re: PDF to txt
Posted: 2016-05-13T18:26:33-07:00
by snibgo
As Fred says, IM doesn't do OCR. A program that does is "tesseract".
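A rough sketch of how the two tools could be combined, assuming tesseract is installed, ImageMagick can rasterize PDFs on your machine (it normally relies on Ghostscript for that), and you are in a POSIX-style shell. The function name ocr_pdf, the file names, and the 300 dpi density are illustrative choices. On an ImageMagick 7 install the command may be "magick" rather than "convert".

```shell
# Sketch: rasterize a PDF with ImageMagick, then OCR each page with tesseract.
# ocr_pdf and the file names are placeholders for illustration.
ocr_pdf() {
    pdf="$1"     # input PDF
    base="$2"    # prefix for the page images and the text files

    # Render the pages as PNG images at 300 dpi;
    # %03d inserts a zero-padded page number into each file name.
    convert -density 300 "$pdf" "${base}-%03d.png" || return 1

    # tesseract takes an image and an output base name,
    # and writes the recognized text to <output base>.txt
    for img in "${base}"-*.png; do
        tesseract "$img" "${img%.png}" || return 1
    done
}

# Usage (placeholder names):
#   ocr_pdf input.pdf page
```

Each page ends up in its own .txt file next to its image. OCR quality depends heavily on the resolution, so the density value is worth experimenting with.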