Hi -
I keep spinning my wheels on this one, and I apologize if it is an easy one. All I am trying to do is convert a PDF to txt. I don't need additional settings for now; I just need to convert.
Windows box with 7.0.1-Q16
Thanks in advance.
PDF to txt
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: PDF to txt
What do you mean by converting to txt? Do you mean ASCII text, or do you mean ImageMagick's txt: format? ImageMagick will not process a PDF to extract ASCII text. You need OCR software. If you want to convert the raster equivalent of the PDF to ImageMagick's txt: format, then
Code:
convert image.pdf txt:-
That will send the pixel data to the terminal. Or send it to a file as
Code:
convert image.pdf txt:- > textfile.txt
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: PDF to txt
Does the PDF contain text as ASCII text? (You can check this with Adobe Viewer.) If so, then ImageMagick is the wrong tool for the job. A better tool is "pdftotext".
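For what it's worth, here is a minimal sketch of how pdftotext might be called, assuming it is installed (it ships with Poppler and Xpdf) and that you have a POSIX shell available (on Windows, something like Git Bash or WSL). The function name pdf_to_text and the file names are made up for illustration, and -layout is optional.

```shell
# Sketch only: pdf_to_text is a hypothetical helper name, not part of any tool.
# Extracts the text that is already embedded in a PDF (no OCR involved).
pdf_to_text() {
    src="$1"   # input PDF, e.g. input.pdf (placeholder name)
    dst="$2"   # output text file, e.g. output.txt (placeholder name)

    # Stop early with a clear message if pdftotext is not available.
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found on PATH" >&2
        return 127
    fi

    # -layout asks pdftotext to keep the original physical layout
    # of the text as far as it can.
    pdftotext -layout "$src" "$dst"
}

# Usage (assumed file names):
#   pdf_to_text input.pdf output.txt
```

At a plain Windows prompt the single command inside the function (pdftotext -layout input.pdf output.txt) should work the same way. If the output file comes back empty, the PDF probably holds scanned images rather than real text, and OCR is needed instead.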
snibgo's IM pages: im.snibgo.com
Re: PDF to txt
@fmw42 I am trying to convert into a text format. The first thing I am trying to do is convert a PDF that contains text and pictures to be all text in the .txt format. I hope that makes sense.
I thought that ImageMagick was running OCR in the background, and that is why I went this route.
- fmw42
Re: PDF to txt
No, IM does not do OCR. See http://www.imagemagick.org/Usage/formats/#vector
- snibgo
Re: PDF to txt
As Fred says, IM doesn't do OCR. A program that does is "tesseract".
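A rough sketch of how the two tools could be chained for a scanned PDF: ImageMagick rasterizes each page to a PNG, then tesseract does OCR on each PNG. This assumes a POSIX shell, that Ghostscript is installed so IM can read PDFs, and that tesseract is on the PATH. The function name ocr_pdf, the 300 dpi density and the file names are all illustrative choices, not anything required by either tool.

```shell
# Sketch only: ocr_pdf is a hypothetical helper name.
# Rasterizes every page of a PDF, OCRs each page, and joins the results.
ocr_pdf() {
    src="$1"    # input PDF, e.g. scan.pdf (placeholder name)
    base="$2"   # prefix for the page files, e.g. page (placeholder name)

    # Render each page at 300 dpi as <base>-000.png, <base>-001.png, ...
    # (With IM 7 the equivalent command is "magick" in place of "convert".)
    convert -density 300 "$src" "$base-%03d.png" || return 1

    for img in "$base"-*.png; do
        # tesseract takes an image and an output base name,
        # and writes the recognized text to <output base>.txt
        tesseract "$img" "${img%.png}" || return 1
    done

    # Join the per-page text files, in page order, into <base>.txt
    cat "$base"-*.txt > "$base.txt"
}

# Usage (assumed file names):
#   ocr_pdf scan.pdf page
```

OCR quality depends heavily on the scan, so a higher -density or some cleanup of the page images may be needed. On Windows, be aware that a bare "convert" can pick up the system's own convert.exe rather than ImageMagick's, depending on the PATH.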
snibgo's IM pages: im.snibgo.com