Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-24T12:57:08-07:00
by NoRulez
Hello,
I want to get the image size from an image which is embedded in a PDF file.
The size of the image itself is 1024 x 768.
If I open the PDF in Acrobat Reader, select the image and copy it to e.g. Paint, the image size is correct (1024x768).
However, if I use the API function PingImage, then columns and rows (which I expected to hold the correct size) are 1475x1106, which is definitely wrong.
Any hints?
Does anyone know how to get the correct image size?
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-24T13:13:18-07:00
by snibgo
Does the image occupy the entire page? IM will rasterize the page, and return those dimensions. If the page contains a smaller raster image, IM won't give that to you. Pdfimages is a better tool for that.
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-24T13:35:25-07:00
by NoRulez
But I need to get the image out of the PDF with IM. Is there another way?
The requirement is to use only the IM C API for that.
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-24T14:02:15-07:00
by snibgo
IM will rasterize the entire page, at whatever resolution you give it in "-density".
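As a rough sketch of the arithmetic behind this: PDF page sizes are expressed in points (1/72 inch), so the rasterized dimensions scale linearly with the density. The function name `raster_pixels` is made up for illustration, and round-to-nearest is an assumption; the rasterizer's exact rounding may differ by a pixel.

```c
/* Approximate pixel extent of a PDF page dimension after rasterizing.
   points  : page width or height in PDF points (1/72 inch)
   density : resolution in dots per inch, as given to "-density"
   Rounding to nearest is an assumption, not Ghostscript's documented rule. */
long raster_pixels(double points, double density)
{
    double pixels = points * density / 72.0;
    return (long)(pixels + 0.5);
}
```

For a US Letter page (612 x 792 points), a density of 72 gives 612x792 pixels and a density of 300 gives 2550x3300, regardless of how large any image embedded in the page is.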
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-25T03:57:38-07:00
by NoRulez
The image's information is in its dictionary in the PDF.
Code:
<</BitsPerComponent 8/ColorSpace/DeviceRGB/DecodeParms<</Blend 1/ColorTransform 1/Colors 3/Columns 1024/HSamples[1 1 1 1]/QFactor 0.0/Rows 768/VSamples[1 1 1 1]>>/Filter/DCTDecode/Height 768/ImageName/ab123.jpg/Intent/RelativeColorimetric/Length 599237/Name/ab123.jpg/Subtype/Image/Type/XObject/Width 1024>>stream
Is there a way to get the dictionary and the stream from the pdf itself as raw data?
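For illustration only, the Width and Height entries of a dictionary like the one above can be read with a few lines of plain C, outside ImageMagick. This is a minimal sketch, not an IM API: the helper name `pdf_dict_int` is hypothetical, it expects the dictionary as a NUL-terminated string (not the raw binary file), and it assumes the value is a direct integer rather than an indirect reference such as "12 0 R". It will not find dictionaries stored inside compressed object streams.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Look up "/Key <integer>" in a PDF dictionary given as a C string.
   key must include the leading slash, e.g. "/Width".
   Returns 1 and stores the integer in *value on success, 0 otherwise. */
int pdf_dict_int(const char *dict, const char *key, long *value)
{
    size_t klen = strlen(key);
    const char *p = dict;

    while ((p = strstr(p, key)) != NULL) {
        const char *q = p + klen;
        /* Require whitespace after the name so that "/Width" does not
           match a longer name such as "/WidthHint". */
        if (isspace((unsigned char)*q)) {
            char *end;
            long v = strtol(q, &end, 10);   /* strtol skips the whitespace */
            if (end != q) {
                *value = v;
                return 1;
            }
        }
        p += klen;
    }
    return 0;
}
```

Called with the dictionary text shown above, `pdf_dict_int(dict, "/Width", &w)` and `pdf_dict_int(dict, "/Height", &h)` would yield 1024 and 768.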
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-25T04:36:17-07:00
by snibgo
IM doesn't read the PDF file. It simply passes the PDF to Ghostscript, which does the rasterizing.
Perhaps Ghostscript has a command to do what you want.
I think pdfimages can do what you want. (You can test this.) Perhaps you can add an entry to delegates.xml to call pdfimages and pass the results back to IM.
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-25T05:37:05-07:00
by snibgo
snibgo wrote: Perhaps you can add an entry to delegates.xml to call pdfimages and pass the results back to IM.
Yes, that works for IM v6.9.5-3 on Windows 8.1. Assuming convert and pdfimages are both in the system path, I added this to delegates.xml:
Code:
<delegate decode="expdf" command="cmd.exe /c (pdfimages -all &quot;%i&quot; &quot;%u&quot; ) &amp; (convert &quot;%u-*.*&quot; &quot;miff:%o&quot; )" />
Note that the command string in the delegates.xml line is specific to Windows. It will be different for bash.
Now, when an input file to convert is prefixed with "expdf:" I get all the raster images from the PDF, with no resizing. For example:
Code:
f:\web\im>convert expdf:pump.pdf info:
expdf:pump.pdf[0] MIFF 1620x1306 1620x1306+0+0 8-bit TrueColor sRGB 11.41MB 0.031u 0:00.023
expdf:pump.pdf[1] MIFF 2000x633 2000x633+0+0 8-bit TrueColor sRGB 11.41MB 0.141u 0:00.141
expdf:pump.pdf[2] MIFF 2000x633 2000x633+0+0 8-bit Palette Gray 2c 11.41MB 0.156u 0:00.147
I am using "convert" as a delegate to ImageMagick. The command first calls pdfimages to extract all the images (which could be JPG, PNG, or whatever), then it calls "convert" to read all those images and put them in a single MIFF file, which is %o, the output of the delegated task. Then IM reads the miff file in the ordinary way.
EDIT: This leaves the images from pdfimages as files on disk (%u). They should probably be deleted.
EDIT2: Some versions of IM sanitise delegate commands, removing wildcards "*" and "?". For those, we can put the commands in a shell script, which we call with %i, %o and %u.
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-25T06:14:41-07:00
by NoRulez
Thanks for the hint
Are there C API calls too?
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-25T06:26:11-07:00
by snibgo
If you edit delegates.xml as I have shown, C API calls such as PingImage will indirectly do the extraction.
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-25T13:15:53-07:00
by NoRulez
After I changed the XML file, I got the following on the command line:
Code:
>convert expdf:pdf_test.pdf info:
pdfimages version 3.04
Copyright 1996-2014 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
-f <int> : first page to convert
-l <int> : last page to convert
-j : write JPEG images as JPEG files
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
convert: unable to open image 'C:\Users\user1\AppData\Local\Temp\magick-113316WkMC-8vc-UNR-*.*': Invalid argument @ error/blob.c/OpenBlob/3094.
convert: no images defined `miff:C:\Users\user1\AppData\Local\Temp\magick-11331617V09dbGPIik' @ error/convert.c/ConvertImageCommand/3254.
convert: delegate failed `cmd.exe /c (pdfimages -all "%i" "%u" ) & (convert "%u-*.*" "miff:%o" )' @ error/delegate.c/InvokeDelegate/1845.
convert: unable to open module file 'C:\Program Files\ImageMagick-7.0.5-Q16\modules\coders\IM_MOD_RL_EXPDF_.dll': No such file or directory @ warning/module.c/GetMagickModulePath/680.
convert: unable to open file 'C:/Users/user1/AppData/Local/Temp/magick-113316VJ90QPEiEmqR': No such file or directory @ error/constitute.c/ReadImage/549.
convert: no images defined `info:' @ error/convert.c/ConvertImageCommand/3254.
In my program I call
Code:
Image *image = PingImage(imageInfo, exception);
but I don't see any results.
Re: Wrong image size when reading ImageInfo from PDF
Posted: 2017-04-25T14:00:57-07:00
by snibgo
Your version of pdfimages (3.04, from Xpdf) doesn't have the "-all" option. The version I have (Poppler's, from Cygwin) says:
Code:
pdfimages version 0.30.0
Copyright 2005-2015 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
-f <int> : first page to convert
-l <int> : last page to convert
-png : change the default output format to PNG
-tiff : change the default output format to TIFF
-j : write JPEG images as JPEG files
-jp2 : write JPEG2000 images as JP2 files
-jbig2 : write JBIG2 images as JBIG2 files
-ccitt : write CCITT images as CCITT files
-all : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
-list : print list of images instead of saving
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-p : include page numbers in output file names
-q : don't print any messages or errors
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information