convert problem for pdf with text

vdvb
Posts: 3
Joined: 2012-12-18T05:31:17-07:00

convert problem for pdf with text

Post by vdvb »

I am trying to get a "screenshot" of the first page of a PDF file (a thumbnail for a web page).
These PDF files are scanned books that were later processed with an OCR tool.
The conversion only works if there are no text fields on the cover (first page) of the book.

Am I missing some parameters?
Thanks in advance.


convert command:
convert -colorspace rgb -quality 80 -thumbnail 150x150 ./BE-KBR00_A-0580834_0000-00-00.pdf[0] ./BE-KBR00_A-0580834_0000-00-00.jpg
OS: Linux
Files:
Thumbnail:
https://docs.google.com/open?id=0ByL7Fu ... nVjZkpwUHM
Original PDF (download for good resolution):
https://docs.google.com/open?id=0ByL7Fu ... UlSdUsxTlE
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: convert problem for pdf with text

Post by fmw42 »

I do not seem to be able to get to that PDF file.

Re: convert problem for pdf with text

Post by vdvb »

Same PDF; maybe this one is downloadable:
https://docs.google.com/open?id=0ByL7Fu ... DRjbVdvbjA

Re: convert problem for pdf with text

Post by fmw42 »

vdvb wrote:same PDF maybe this is downloadable
https://docs.google.com/open?id=0ByL7Fu ... DRjbVdvbjA


identify same.pdf

same.pdf[0] PDF 662x975 662x975+0+0 16-bit Bilevel Gray 569KB 0.010u 0:00.009
same.pdf[1] PDF 668x968 668x968+0+0 16-bit Bilevel Gray 569KB 0.000u 0:00.009
same.pdf[2] PDF 678x964 678x964+0+0 16-bit Bilevel Gray 569KB 0.000u 0:00.009
same.pdf[3] PDF 667x951 667x951+0+0 16-bit Bilevel Gray 569KB 0.000u 0:00.009
same.pdf[4] PDF 669x961 669x961+0+0 16-bit Bilevel Gray 569KB 0.000u 0:00.000
same.pdf[5] PDF 671x969 671x969+0+0 16-bit Bilevel Gray 569KB 0.000u 0:00.000
same.pdf[6] PDF 674x971 674x971+0+0 16-bit Bilevel Gray 569KB 0.000u 0:00.000


shows 7 pages, but

convert same.pdf +adjoin same_%d.tif

or

convert same.pdf[0-6] +adjoin same_%d.tif

Only gets the first page.


This probably indicates that your PDF file has alpha data (even though the alpha channel is perfectly opaque). You need to find your delegates.xml file and edit the ps:alpha line so that the device is pnmraw rather than pngalpha. IM cannot do both at the same time; it is one or the other: transparency in a single page, or no transparency across multiple pages.

On my system:

find /usr | grep "delegates.xml"
/usr/local/etc/ImageMagick/delegates.xml

<delegate decode="ps:alpha" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pnmraw&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

I edited it as above and then ran:

convert same.pdf +adjoin same_%d.tif

and got all seven tif files labeled same_0.tif ... same_6.tif
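
The manual edit described above can also be scripted. The following is only a sketch, assuming the ps:alpha delegate sits on a single line that contains -sDEVICE=pngalpha; the helper name switch_alpha_device is hypothetical and not part of ImageMagick.

```shell
#!/bin/sh
# Sketch: switch the ps:alpha delegate from pngalpha to pnmraw.
# switch_alpha_device is a hypothetical helper name, not part of ImageMagick.
switch_alpha_device() {
    file="$1"
    # Keep a backup so the original delegate line can be restored.
    cp "$file" "$file.bak" || return 1
    # Rewrite only lines that declare decode="ps:alpha"; all other
    # delegates keep their configured Ghostscript device.
    sed '/decode="ps:alpha"/ s/-sDEVICE=pngalpha/-sDEVICE=pnmraw/' \
        "$file.bak" > "$file"
}

# Usage (path taken from the find output above; it may differ on your
# system and usually needs root):
#   switch_alpha_device /usr/local/etc/ImageMagick/delegates.xml
```

The backup copy (delegates.xml.bak) makes it easy to revert if transparency in single-page PDFs is needed again.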

Re: convert problem for pdf with text

Post by vdvb »

Thanks for the help so far.
I changed delegates.xml, but without success.

The problem is not that convert fails, but the quality of the result.
Here is a screenshot of the converted PDF:
https://docs.google.com/open?id=0ByL7Fu ... Vp1eXEtbXc
and here is a screenshot of the original PDF:
https://docs.google.com/open?id=0ByL7Fu ... jJ0d0tuUmc
I don't know if you can view the images (there is a problem with Google Drive), but you can download them and see the difference.

Thanks in advance.

Re: convert problem for pdf with text

Post by fmw42 »

After editing my delegates.xml file to set sDEVICE=pnmraw, everything works just fine for me. The results are noise-free with or without the thumbnail. But note that -thumbnail must come after the input PDF has been read. I am on IM 6.8.1.0 Q16, Mac OS X Snow Leopard.


convert BE-KBR00_A-0580834_0000-00-00.pdf -thumbnail 150x150 -quality 80 +adjoin BE-KBR00_A-0580834_0000-00-00_%d.jpg


convert BE-KBR00_A-0580834_0000-00-00.pdf -quality 80 +adjoin BE-KBR00_A-0580834_0000-00-00_%d.jpg


Perhaps you need to upgrade your version of ImageMagick, Ghostscript, and/or libjpeg.
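
For the original goal of a single cover thumbnail, the same operator order can be combined with the [0] page selector from the first post. This is a minimal sketch, assuming convert is on the PATH; the helper name pdf_cover_thumbnail is hypothetical, and the size and quality values are the ones already used in this thread.

```shell
#!/bin/sh
# Sketch: thumbnail of the first page only. pdf_cover_thumbnail is a
# hypothetical helper name; the options are the ones used in this thread.
pdf_cover_thumbnail() {
    input="$1"
    output="$2"
    # "[0]" selects the first page; quoting keeps the shell from treating
    # the brackets as a glob pattern. Operators follow the input file.
    convert "${input}[0]" -thumbnail 150x150 -quality 80 "$output"
}

# Usage:
#   pdf_cover_thumbnail ./BE-KBR00_A-0580834_0000-00-00.pdf ./BE-KBR00_A-0580834_0000-00-00.jpg
```

Reading only page 0 also avoids rasterizing the remaining pages of a scanned book just to produce one thumbnail.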
plang
Posts: 1
Joined: 2013-01-04T09:48:34-07:00

Re: convert problem for pdf with text

Post by plang »

It worked for me too: after editing my delegates.xml file to set sDEVICE=pnmraw, my PNG thumbnail generation from a PDF file works again with ImageMagick 6.8, as it used to with ImageMagick 6.6.

Just a quick note: I don't understand exactly where this problem comes from, or whether ImageMagick or Ghostscript is the cause, but I have the feeling it will break quite a lot of software, and quickly. If it is feasible, I would suggest making the change in ImageMagick as soon as possible.

Re: convert problem for pdf with text

Post by fmw42 »

I believe it is a Ghostscript issue and not IM.