Converting PDF to image results in hidden differences
- discretiongrove
- Posts: 13
- Joined: 2013-02-19T01:20:09-07:00
- Authentication code: 6789
Converting PDF to image results in hidden differences
Hi there. I'm new to ImageMagick and have been using it to convert PDFs to JPGs so that I can compare them with the Python Imaging Library (PIL). However, when I run the comparison, two supposedly identical images don't match. PIL has an option to show the difference as white on a black image, but all I get is a black image, which should mean there is no difference. Then I tried Beyond Compare and, lo and behold, some sections show small discrepancies when I use Picture Compare with Tolerance Mode on.
My question is: if ImageMagick converts a PDF file into an image, shouldn't white be white, with nothing hidden behind it? How can I convert the PDFs so that the parts hidden in white are converted fully to white?
If you like, I can post sample images to show what I'm talking about.
I'm not really Stephen Malkmus.
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Converting PDF to image results in hidden differences
Yes, it would be best to give us an example. Please note that IM uses Ghostscript to convert PDFs to other formats, so be sure you have the latest Ghostscript installed. If your image is CMYK or has transparency, that could also be a factor. So it would be best to have one of your PDF files to examine.
Does the PDF have an embedded image, or is it a true vector PDF? That may also have some effect, as the format of the embedded image may be a factor.
You should post your PDF to a free hosting service such as Dropbox and then put a link to it in your next post here.
- discretiongrove
Re: Converting PDF to image results in hidden differences
The installed Ghostscript seems to be 9.05. How can I tell whether an image is CMYK? I tried saving the images in different file formats such as BMP, PNG and GIF, and with some of them the discrepancies seem to be reduced, but they never go away completely.
The PDF does have an embedded image, but that part doesn't get flagged as a discrepancy. It seems that some elements in the PDF, such as text, have an invisible coating wrapped around each letter. How would one know if a PDF is a true vector one?
I'm very sorry, kind sir, but I can't upload PDF samples because they contain highly confidential text.
Here are images taken from Beyond Compare:
Image 1:
This shows three images: the first and third are the ones being compared, and the one in the center is the difference. The blue marks the supposed difference. Notice that the two images look identical and that the differences are located inside the black box. The images show a colon and a black box. The box is drawn using PIL.
Image 2:
This image shows an underline and a black box beneath it. You can see that there are two differences on the whites and three on the blacks.
These images are supposed to be identical, and yet Beyond Compare's Tolerance Mode shows that they are not. In Binary Mode they appear to be identical. The problem is that I have no idea how to compare that way using PIL or ImageMagick.
I would like to add that I've tried the simple tutorials for comparing images with ImageMagick, but the images still aren't identical no matter what method I use.
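For a strict, binary-style check from ImageMagick itself, one option is `compare -metric AE`, which reports the number of pixels that differ, so 0 means the two images match pixel for pixel. The sketch below wraps that call from Python. It assumes the `compare` executable is on the PATH, and the helper names (`build_compare_cmd`, `parse_ae`, `images_identical`) are made up for illustration. `compare` writes the metric to stderr, and the special output name `null:` discards the difference image.

```python
import subprocess


def build_compare_cmd(image_a, image_b, diff_out="null:"):
    """Return the argument list for an exact (no -fuzz) pixel comparison."""
    return ["compare", "-metric", "AE", image_a, image_b, diff_out]


def parse_ae(stderr_text):
    """Extract the differing-pixel count from compare's stderr output.

    Assumes the count is the first token; warnings printed before it
    would need extra handling.
    """
    first_token = stderr_text.strip().split()[0]
    return int(float(first_token))  # large counts may be printed like 1.2e+06


def images_identical(image_a, image_b):
    """Run compare and report True when no pixel differs."""
    result = subprocess.run(
        build_compare_cmd(image_a, image_b),
        capture_output=True,
        text=True,
    )
    # compare exits with 0 (similar), 1 (dissimilar) or 2 (error).
    if result.returncode not in (0, 1):
        raise RuntimeError("compare failed: " + result.stderr)
    return parse_ae(result.stderr) == 0
```

Something like `images_identical("1A.png", "1B.png")` would then give a plain yes/no answer per page. A lossless format such as PNG is assumed here so that JPEG compression does not add differences of its own.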
- fmw42
Re: Converting PDF to image results in hidden differences
I have no idea what Beyond Compare is doing. Can you not create or get some non-proprietary PDF that shows this issue?
Note that PNG now has a rendering intent and that could cause differences with other formats. GIF has limited colors. JPG is lossy. So it is hard to say what differences you might see between formats.
IM has its own compare function. So perhaps you should check that out. See
http://www.imagemagick.org/Usage/compare/
http://www.imagemagick.org/Usage/compare/#statistics
To see if an image is CMYK, just look at the verbose information of that file.
identify -verbose yourimage
The colorspace will say whether the image is CMYK. But if you have some image embedded in the PDF, I cannot say for sure that IM will report the colorspace of the embedded image as opposed to that of the PDF itself. You can look at the verbose information and see if there is other information particular to the embedded image, such as profiles.
Profiles may also be an issue if there are any.
As a start, get the verbose info and report it back here.
What version of IM and on what platform are you doing this work?
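If the check has to run inside a script, the `Colorspace:` line can be pulled out of the `identify -verbose` report. The following is only a sketch: it assumes `identify` is on the PATH and that the report contains a line of the form `Colorspace: <name>`; the function names are invented for this example.

```python
import subprocess


def colorspace_from_verbose(verbose_text):
    """Return the value of the first 'Colorspace:' line, or None if absent."""
    for line in verbose_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Properties such as "jpeg:colorspace: 2" have the key "jpeg",
        # so they are not mistaken for the Colorspace field.
        if sep and key == "Colorspace":
            return value.strip()
    return None


def is_cmyk(path):
    """Ask identify for the verbose report and check the colorspace field."""
    report = subprocess.run(
        ["identify", "-verbose", path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return colorspace_from_verbose(report) == "CMYK"
```

For a multi-page PDF this only looks at the first page's entry, which may or may not be enough.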
- discretiongrove
Re: Converting PDF to image results in hidden differences
I tried comparing the two images using ImageMagick and viewing the differences by specifying a difference image output and here are the results which are the same as Beyond Compare's:
The version of ImageMagick seems to be the latest according to the verbose information, and I'm using Windows 7 to do the converting. I'm having a hard time finding a way to replicate the PDFs. What I'm thinking is that I'll create a Word document and convert it to PDF with some program. Any suggestions so I can try to replicate the PDF? PrimoPDF, PDFCreator and CutePDF are not available to me.
Here's the verbose information of both the images being compared and it doesn't say CMYK on both images:
Image 1
Image: 1A.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Class: DirectClass
Geometry: 612x792+0+0
Units: Undefined
Type: TrueColor
Endianess: Undefined
Colorspace: sRGB
Depth: 8-bit
Channel depth:
red: 8-bit
green: 8-bit
blue: 8-bit
Channel statistics:
Red:
min: 0 (0)
max: 255 (1)
mean: 218.127 (0.855401)
standard deviation: 87.3755 (0.342649)
kurtosis: 2.21227
skewness: -2.04777
Green:
min: 0 (0)
max: 255 (1)
mean: 218.145 (0.855472)
standard deviation: 87.3578 (0.342579)
kurtosis: 2.21479
skewness: -2.04827
Blue:
min: 0 (0)
max: 255 (1)
mean: 218.024 (0.854996)
standard deviation: 87.3708 (0.342631)
kurtosis: 2.19984
skewness: -2.04355
Image statistics:
Overall:
min: 0 (0)
max: 255 (1)
mean: 218.099 (0.85529)
standard deviation: 87.368 (0.34262)
kurtosis: 2.20896
skewness: -2.04653
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Interlace: None
Background color: white
Border color: srgb(223,223,223)
Matte color: grey74
Transparent color: black
Compose: Over
Page geometry: 612x792+0+0
Dispose: Undefined
Iterations: 0
Compression: JPEG
Quality: 75
Orientation: Undefined
Properties:
date:create: 2013-02-21T13:15:32+08:00
date:modify: 2013-02-21T12:57:09+08:00
jpeg:colorspace: 2
jpeg:sampling-factor: 2x2,1x1,1x1
signature: a834a61bf31084219c8623c0b71f8490d250846a9ef6256b8c2ccd8872667e90
Artifacts:
filename: 1A.jpg
verbose: true
Tainted: False
Filesize: 120KB
Number pixels: 485K
Pixels per second: 34.62MB
User time: 0.016u
Elapsed time: 0:01.013
Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
Image 2
Image: 1B.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Class: DirectClass
Geometry: 612x792+0+0
Units: Undefined
Type: TrueColor
Endianess: Undefined
Colorspace: sRGB
Depth: 8-bit
Channel depth:
red: 8-bit
green: 8-bit
blue: 8-bit
Channel statistics:
Red:
min: 0 (0)
max: 255 (1)
mean: 218.127 (0.855401)
standard deviation: 87.3754 (0.342649)
kurtosis: 2.21226
skewness: -2.04777
Green:
min: 0 (0)
max: 255 (1)
mean: 218.145 (0.855472)
standard deviation: 87.3577 (0.342579)
kurtosis: 2.21479
skewness: -2.04827
Blue:
min: 0 (0)
max: 255 (1)
mean: 218.024 (0.854996)
standard deviation: 87.3707 (0.34263)
kurtosis: 2.19984
skewness: -2.04355
Image statistics:
Overall:
min: 0 (0)
max: 255 (1)
mean: 218.099 (0.85529)
standard deviation: 87.3679 (0.342619)
kurtosis: 2.20896
skewness: -2.04653
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Interlace: None
Background color: white
Border color: srgb(223,223,223)
Matte color: grey74
Transparent color: black
Compose: Over
Page geometry: 612x792+0+0
Dispose: Undefined
Iterations: 0
Compression: JPEG
Quality: 75
Orientation: Undefined
Properties:
date:create: 2013-02-21T13:15:32+08:00
date:modify: 2013-02-21T12:57:09+08:00
jpeg:colorspace: 2
jpeg:sampling-factor: 2x2,1x1,1x1
signature: b11f55a5997d769ee240141c41f2e290de8571831c1f36d2a2173e587d95319c
Artifacts:
filename: 1B.jpg
verbose: true
Tainted: False
Filesize: 120KB
Number pixels: 485K
Pixels per second: 34.62MB
User time: 0.016u
Elapsed time: 0:01.013
Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
Last edited by discretiongrove on 2013-02-20T23:15:19-07:00, edited 1 time in total.
- discretiongrove
Re: Converting PDF to image results in hidden differences
Hey there, this is quite an update. I don't know how, but it seems that if I use ImageMagick to compare and do this:
compare -metric AE -fuzz 5% 1A.jpg 1B.jpg output.jpg
The value returned is 0, meaning there are no differences!
- fmw42
Re: Converting PDF to image results in hidden differences
discretiongrove wrote: Hey there, this is quite an update. I don't know how but it seems that if I use ImageMagick to compare and do this:
compare -metric AE -fuzz 5% 1A.jpg 1B.jpg output.jpg
The value returned is 0 meaning there are no differences!
That is probably due to the -fuzz 5%, which says not to consider pixels different if they are within 5% of being the same.
But there is perhaps a misunderstanding. I am not sure what you are comparing. Are you converting the same PDF to JPG with two different applications? Or are you converting two different PDF files? If two applications write the JPGs, they can compress quite differently or use different JPEG encoders.
Try converting to PNG or TIFF so that you do not get lossy-compression differences.
Also, I was asking for the verbose information from the PDF files, not the JPG files. I want to see what you are starting with.
Please clarify the process you are using and what the starting image or images are.
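To make the -fuzz remark above more concrete, here is a simplified model of what a tolerance does. It is not ImageMagick's actual code: it assumes 8-bit RGB, plain Euclidean distance, and a tolerance expressed as a percentage of the black-to-white distance. The names `within_fuzz` and `count_different` are hypothetical.

```python
import math

# Largest possible distance in 8-bit RGB: pure black versus pure white.
MAX_DISTANCE = math.sqrt(3 * 255 ** 2)


def within_fuzz(pixel_a, pixel_b, fuzz_percent):
    """True when two RGB pixels are no further apart than the tolerance."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(pixel_a, pixel_b)))
    return distance / MAX_DISTANCE <= fuzz_percent / 100.0


def count_different(pixels_a, pixels_b, fuzz_percent=0.0):
    """Count pixel pairs that differ by more than the tolerance."""
    return sum(
        1
        for a, b in zip(pixels_a, pixels_b)
        if not within_fuzz(a, b, fuzz_percent)
    )
```

Under this model a near-white pixel such as (250, 250, 250) compared with pure white falls inside a 5% tolerance, while a black pixel compared with white does not. So a tolerance of this kind would be expected to hide faint compression noise but still flag a solid black mark on a white page.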
- discretiongrove
Re: Converting PDF to image results in hidden differences
I'm sorry if I'm not clear. Now that you've mentioned it, the fuzz factor could affect my comparisons. Using a fuzz factor of 5 would not detect a stray dot or a comma as a difference then?
A little background on what I'm doing: I'm creating an automated regression testing suite that involves PDFs. I generate PDFs, and once they're approved as free of errors I store them as the baseline for checking whether anything changed in the next batch of generated PDFs. The same system generates all of these PDFs.
So, the automated PDF testing goes like this:
1. I take two PDFs, the error-free PDF (PDF A) and the newly generated PDF (PDF B).
2. I convert them both to JPEGs using ImageMagick. This produces two folders, each containing the pages of one PDF.
3. Using the Python Imaging Library (PIL), I draw black or white boxes over some parts of the images from both PDFs to hide the dynamically changing elements, because they shouldn't be compared.
4. I compare the pages using PIL. I'm using Python because it takes care of everything: getting the PDFs, invoking the ImageMagick command to convert the PDFs, comparing the images, counting the matching and non-matching pages, and generating a report at the end.
I did some research and it seems that there's no library yet that compares PDFs while ignoring regions. That's why I'm doing it with images, since there are lots of image comparison applications. DiffPDF actually compares the PDFs and even shows regions, but it doesn't have a developer library available.
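Steps 3 and 4 of the workflow above can be sketched in plain Python. To keep the example self-contained it works on pages represented as row-major lists of RGB tuples rather than on PIL images, which is an assumption made for illustration; in the real suite the pixel data would come from PIL. The function names and the inclusive box convention are likewise invented here.

```python
def mask_region(pixels, box, fill=(0, 0, 0)):
    """Return a copy of a row-major pixel grid with one box painted over.

    box is (left, top, right, bottom), with both corners inclusive.
    """
    left, top, right, bottom = box
    masked = [list(row) for row in pixels]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            masked[y][x] = fill
    return masked


def differing_pixels(pixels_a, pixels_b):
    """List the (x, y) positions where two equally sized grids disagree."""
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(pixels_a, pixels_b))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


def pages_match(reference, candidate, ignore_boxes):
    """Mask every ignored box in both pages, then compare exactly."""
    for box in ignore_boxes:
        reference = mask_region(reference, box)
        candidate = mask_region(candidate, box)
    return not differing_pixels(reference, candidate)
```

Returning the positions of the mismatches, rather than only a count, makes it easier to see whether a reported difference lies just outside a mask, which would suggest the box is slightly too small.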
Here's the verbose information of the PDF that I got using ImageMagick:
PDF1
Image: 1.pdf
Format: PDF (Portable Document Format)
Class: DirectClass
Geometry: 612x792+0+0
Resolution: 72x72
Print size: 8.5x11
Units: Undefined
Type: TrueColorAlpha
Endianess: Undefined
Colorspace: sRGB
Depth: 16/8-bit
Channel depth:
red: 8-bit
green: 8-bit
blue: 8-bit
alpha: 8-bit
Channel statistics:
Red:
min: 0 (0)
max: 65535 (1)
mean: 57546.6 (0.878105)
standard deviation: 21415.6 (0.326781)
kurtosis: 3.35258
skewness: -2.31303
Green:
min: 0 (0)
max: 65535 (1)
mean: 57548.9 (0.87814)
standard deviation: 21401.1 (0.32656)
kurtosis: 3.358
skewness: -2.31375
Blue:
min: 0 (0)
max: 65535 (1)
mean: 57524.3 (0.877765)
standard deviation: 21415.5 (0.32678)
kurtosis: 3.33692
skewness: -2.30845
Alpha:
min: 0 (0)
max: 65535 (1)
mean: 4667.68 (0.0712242)
standard deviation: 13904.9 (0.212175)
kurtosis: 8.74319
skewness: -3.10793
Image statistics:
Overall:
min: 0 (0)
max: 65535 (1)
mean: 58371.8 (0.890697)
standard deviation: 19802.8 (0.302171)
kurtosis: 4.4746
skewness: -2.5238
Alpha: srgba(255,255,255,0) #FFFFFFFFFFFF0000
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Interlace: None
Background color: white
Border color: srgba(223,223,223,1)
Matte color: grey74
Transparent color: none
Compose: Over
Page geometry: 612x792+0+0
Dispose: Undefined
Iterations: 0
Scene: 0 of 6
Compression: Undefined
Orientation: Undefined
Properties:
date:create: 2013-02-21T14:02:25+08:00
date:modify: 2013-02-21T14:02:25+08:00
pdf:HiResBoundingBox: 612x792+0+0
pdf:Version: PDF-1.2
signature: 7238be2ec5fa3f2ef83a75db908770232293229cfa1efc1461663070e81a28ff
Profiles:
Profile-icc: 2576 bytes
Description: Artifex Software sRGB ICC Profile
Manufacturer: Artifex Software sRGB ICC Profile
Model: Artifex Software sRGB ICC Profile
Copyright: Copyright Artifex Software 2011
Artifacts:
filename: 1.pdf
verbose: true
Tainted: False
Filesize: 66.5KB
Number pixels: 485K
Pixels per second: 4.53MB
User time: 0.109u
Elapsed time: 0:01.107
Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
PDF2
Image: 2.pdf
Format: PDF (Portable Document Format)
Class: DirectClass
Geometry: 612x792+0+0
Resolution: 72x72
Print size: 8.5x11
Units: Undefined
Type: TrueColorAlpha
Endianess: Undefined
Colorspace: sRGB
Depth: 16/8-bit
Channel depth:
red: 8-bit
green: 8-bit
blue: 8-bit
alpha: 8-bit
Channel statistics:
Red:
min: 0 (0)
max: 65535 (1)
mean: 57543.3 (0.878054)
standard deviation: 21419.5 (0.326841)
kurtosis: 3.34917
skewness: -2.3123
Green:
min: 0 (0)
max: 65535 (1)
mean: 57545.5 (0.878089)
standard deviation: 21405 (0.326619)
kurtosis: 3.35459
skewness: -2.31301
Blue:
min: 0 (0)
max: 65535 (1)
mean: 57521 (0.877714)
standard deviation: 21419.4 (0.326839)
kurtosis: 3.33352
skewness: -2.30772
Alpha:
min: 0 (0)
max: 65535 (1)
mean: 4668.28 (0.0712334)
standard deviation: 13905.2 (0.21218)
kurtosis: 8.74354
skewness: -3.10795
Image statistics:
Overall:
min: 0 (0)
max: 65535 (1)
mean: 58369.1 (0.890656)
standard deviation: 19806 (0.302221)
kurtosis: 4.47138
skewness: -2.52318
Alpha: srgba(255,255,255,0) #FFFFFFFFFFFF0000
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Interlace: None
Background color: white
Border color: srgba(223,223,223,1)
Matte color: grey74
Transparent color: none
Compose: Over
Page geometry: 612x792+0+0
Dispose: Undefined
Iterations: 0
Scene: 0 of 6
Compression: Undefined
Orientation: Undefined
Properties:
date:create: 2013-02-21T14:02:34+08:00
date:modify: 2013-02-21T14:02:34+08:00
pdf:HiResBoundingBox: 612x792+0+0
pdf:Version: PDF-1.2
signature: 9c96c4a131515698a79a0bf90c594ebbabaab89767a16980ee4e156b3000956b
Profiles:
Profile-icc: 2576 bytes
Description: Artifex Software sRGB ICC Profile
Manufacturer: Artifex Software sRGB ICC Profile
Model: Artifex Software sRGB ICC Profile
Copyright: Copyright Artifex Software 2011
Artifacts:
filename: 2.pdf
verbose: true
Tainted: False
Filesize: 66.5KB
Number pixels: 485K
Pixels per second: 4.039MB
User time: 0.109u
Elapsed time: 0:01.119
Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
- discretiongrove
Re: Converting PDF to image results in hidden differences
Update:
I tried converting to PNG and TIF and here are the results:
PNG - converting to this format and comparing them through the ImageMagick commands I specified above resulted in a value of 64. It is also worth noting that the output of the difference is the same as the JPG version.
TIF - converting and comparing using this format did reduce the difference but the first and last set of horizontal lines still exist as discrepancies. See the second image on the post with the ImageMagick outputs. The value now of the comparison is 32.
I think I found a way to create new PDFs and replicate the error. I'll post the PDFs once I have them.
- discretiongrove
Re: Converting PDF to image results in hidden differences
Hey there, I've finally duplicated the errors. Now I think I'm correct in assuming that there are phantom colors that surround the text that are invisible to the naked eye.
Here are the PDFs:
http://totaldrench.files.wordpress.com/2013/02/pdf1.pdf
http://totaldrench.files.wordpress.com/2013/02/pdf2.pdf
Here are the images produced:
Image1 converted from the PDF2
Image1 with a black box
Image2 converted from the PDF2
Image2 with a black box
Output of comparison through ImageMagick
Output of comparison through Beyond Compare
I'm using the black boxes to cover the last lines, which as you can see are different. As you can see on the black boxes there are differences even though it's all black. Any thoughts on how black would stay black or something? Let me know if you can't download any of the stuff posted here.
Update - Here's the verbose information of the PDF files:
PDF1
Image: PDF1.pdf
Format: PDF (Portable Document Format)
Class: DirectClass
Geometry: 612x792+0+0
Resolution: 72x72
Print size: 8.5x11
Units: Undefined
Type: PaletteAlpha
Endianess: Undefined
Colorspace: sRGB
Depth: 16/8-bit
Channel depth:
red: 1-bit
green: 1-bit
blue: 1-bit
alpha: 8-bit
Channel statistics:
Red:
min: 0 (0)
max: 65535 (1)
mean: 64776.2 (0.988422)
standard deviation: 7010.76 (0.106977)
kurtosis: 81.3809
skewness: -9.13132
Green:
min: 0 (0)
max: 65535 (1)
mean: 64776.2 (0.988422)
standard deviation: 7010.76 (0.106977)
kurtosis: 81.3809
skewness: -9.13132
Blue:
min: 0 (0)
max: 65535 (1)
mean: 64776.2 (0.988422)
standard deviation: 7010.76 (0.106977)
kurtosis: 81.3809
skewness: -9.13132
Alpha:
min: 0 (0)
max: 65535 (1)
mean: 442.002 (0.00674452)
standard deviation: 4634.13 (0.0707123)
kurtosis: 139.837
skewness: -11.563
Image statistics:
Overall:
min: 0 (0)
max: 65535 (1)
mean: 64855.4 (0.98963)
standard deviation: 6498.6 (0.0991623)
kurtosis: 92.222
skewness: -9.66482
Alpha: srgba(255,255,255,0) #FFFFFFFFFFFF0000
Colors: 26
Histogram:
1050: ( 0, 0, 0,65535) #000000000000 black
877: ( 0, 0, 0,34952) #0000000000008888 srgba(0,0,0,0.533333)
496: ( 0, 0, 0, 4369) #0000000000001111 srgba(0,0,0,0.0666667)
402: ( 0, 0, 0,48059) #000000000000BBBB srgba(0,0,0,0.733333)
379: ( 0, 0, 0,17476) #0000000000004444 srgba(0,0,0,0.266667)
329: ( 0, 0, 0,30583) #0000000000007777 srgba(0,0,0,0.466667)
313: ( 0, 0, 0,56797) #000000000000DDDD srgba(0,0,0,0.866667)
301: ( 0, 0, 0,52428) #000000000000CCCC srgba(0,0,0,0.8)
291: ( 0, 0, 0,13107) #0000000000003333 srgba(0,0,0,0.2)
242: ( 0, 0, 0, 8738) #0000000000002222 srgba(0,0,0,0.133333)
236: ( 0, 0, 0,39321) #0000000000009999 srgba(0,0,0,0.6)
232: ( 0, 0, 0,26214) #0000000000006666 srgba(0,0,0,0.4)
216: ( 0, 0, 0,61166) #000000000000EEEE srgba(0,0,0,0.933333)
137: ( 0, 0, 0,43690) #000000000000AAAA srgba(0,0,0,0.666667)
97: ( 0, 0, 0,21845) #0000000000005555 srgba(0,0,0,0.333333)
3: ( 0, 0, 0,12336) #0000000000003030 srgba(0,0,0,0.188235)
2: ( 0, 0, 0,32896) #0000000000008080 srgba(0,0,0,0.501961)
2: ( 0, 0, 0,42919) #000000000000A7A7 srgba(0,0,0,0.654902)
1: ( 0, 0, 0,20046) #0000000000004E4E srgba(0,0,0,0.305882)
1: ( 0, 0, 0,36751) #0000000000008F8F srgba(0,0,0,0.560784)
1: ( 0, 0, 0,36494) #0000000000008E8E srgba(0,0,0,0.556863)
1: ( 0, 0, 0,38807) #0000000000009797 srgba(0,0,0,0.592157)
1: ( 0, 0, 0,39835) #0000000000009B9B srgba(0,0,0,0.607843)
1: ( 0, 0, 0,46774) #000000000000B6B6 srgba(0,0,0,0.713725)
1: ( 0, 0, 0,31354) #0000000000007A7A srgba(0,0,0,0.478431)
479092: (65535,65535,65535, 0) #FFFFFFFFFFFF0000 srgba(255,255,255,0)
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Interlace: None
Background color: white
Border color: srgba(223,223,223,1)
Matte color: grey74
Transparent color: none
Compose: Over
Page geometry: 612x792+0+0
Dispose: Undefined
Iterations: 0
Compression: Undefined
Orientation: Undefined
Properties:
date:create: 2013-02-21T16:39:03+08:00
date:modify: 2013-02-21T16:39:03+08:00
pdf:HiResBoundingBox: 612x792+0+0
pdf:Version: PDF-1.2
signature: 6a5d37e948627efd81ba7a32ce337e489b19727eab07366d9c9c45316e0d5d96
Profiles:
Profile-icc: 2576 bytes
Description: Artifex Software sRGB ICC Profile
Manufacturer: Artifex Software sRGB ICC Profile
Model: Artifex Software sRGB ICC Profile
Copyright: Copyright Artifex Software 2011
Artifacts:
filename: PDF1.pdf
verbose: true
Tainted: False
Filesize: 14.9KB
Number pixels: 485K
Pixels per second: 7.694MB
User time: 0.016u
Elapsed time: 0:01.062
Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
PDF2
Image: PDF2.pdf
Format: PDF (Portable Document Format)
Class: DirectClass
Geometry: 612x792+0+0
Resolution: 72x72
Print size: 8.5x11
Units: Undefined
Type: PaletteAlpha
Endianess: Undefined
Colorspace: sRGB
Depth: 16/8-bit
Channel depth:
red: 1-bit
green: 1-bit
blue: 1-bit
alpha: 8-bit
Channel statistics:
Red:
min: 0 (0)
max: 65535 (1)
mean: 64769.2 (0.988315)
standard deviation: 7042.78 (0.107466)
kurtosis: 80.5881
skewness: -9.0878
Green:
min: 0 (0)
max: 65535 (1)
mean: 64769.2 (0.988315)
standard deviation: 7042.78 (0.107466)
kurtosis: 80.5881
skewness: -9.0878
Blue:
min: 0 (0)
max: 65535 (1)
mean: 64769.2 (0.988315)
standard deviation: 7042.78 (0.107466)
kurtosis: 80.5881
skewness: -9.0878
Alpha:
min: 0 (0)
max: 65535 (1)
mean: 446.799 (0.00681771)
standard deviation: 4662.76 (0.0711492)
kurtosis: 138.422
skewness: -11.5066
Image statistics:
Overall:
min: 0 (0)
max: 65535 (1)
mean: 64848.9 (0.989531)
standard deviation: 6529.62 (0.0996356)
kurtosis: 91.3056
skewness: -9.61787
Alpha: srgba(255,255,255,0) #FFFFFFFFFFFF0000
Colors: 26
Histogram:
1073: ( 0, 0, 0,65535) #000000000000 black
881: ( 0, 0, 0,34952) #0000000000008888 srgba(0,0,0,0.533333)
500: ( 0, 0, 0, 4369) #0000000000001111 srgba(0,0,0,0.0666667)
413: ( 0, 0, 0,48059) #000000000000BBBB srgba(0,0,0,0.733333)
383: ( 0, 0, 0,17476) #0000000000004444 srgba(0,0,0,0.266667)
331: ( 0, 0, 0,30583) #0000000000007777 srgba(0,0,0,0.466667)
317: ( 0, 0, 0,56797) #000000000000DDDD srgba(0,0,0,0.866667)
298: ( 0, 0, 0,13107) #0000000000003333 srgba(0,0,0,0.2)
298: ( 0, 0, 0,52428) #000000000000CCCC srgba(0,0,0,0.8)
242: ( 0, 0, 0, 8738) #0000000000002222 srgba(0,0,0,0.133333)
235: ( 0, 0, 0,26214) #0000000000006666 srgba(0,0,0,0.4)
233: ( 0, 0, 0,39321) #0000000000009999 srgba(0,0,0,0.6)
216: ( 0, 0, 0,61166) #000000000000EEEE srgba(0,0,0,0.933333)
135: ( 0, 0, 0,43690) #000000000000AAAA srgba(0,0,0,0.666667)
96: ( 0, 0, 0,21845) #0000000000005555 srgba(0,0,0,0.333333)
2: ( 0, 0, 0,32896) #0000000000008080 srgba(0,0,0,0.501961)
2: ( 0, 0, 0,12336) #0000000000003030 srgba(0,0,0,0.188235)
2: ( 0, 0, 0,42919) #000000000000A7A7 srgba(0,0,0,0.654902)
1: ( 0, 0, 0,20046) #0000000000004E4E srgba(0,0,0,0.305882)
1: ( 0, 0, 0,36751) #0000000000008F8F srgba(0,0,0,0.560784)
1: ( 0, 0, 0,36494) #0000000000008E8E srgba(0,0,0,0.556863)
1: ( 0, 0, 0,38807) #0000000000009797 srgba(0,0,0,0.592157)
1: ( 0, 0, 0,39835) #0000000000009B9B srgba(0,0,0,0.607843)
1: ( 0, 0, 0,46774) #000000000000B6B6 srgba(0,0,0,0.713725)
1: ( 0, 0, 0,31354) #0000000000007A7A srgba(0,0,0,0.478431)
479040: (65535,65535,65535, 0) #FFFFFFFFFFFF0000 srgba(255,255,255,0)
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Interlace: None
Background color: white
Border color: srgba(223,223,223,1)
Matte color: grey74
Transparent color: none
Compose: Over
Page geometry: 612x792+0+0
Dispose: Undefined
Iterations: 0
Compression: Undefined
Orientation: Undefined
Properties:
date:create: 2013-02-21T16:39:13+08:00
date:modify: 2013-02-21T16:39:13+08:00
pdf:HiResBoundingBox: 612x792+0+0
pdf:Version: PDF-1.2
signature: 29e98677c90f1c5a14a0dd2721c1dd965d70bd835bbb9428c94075479d84a36a
Profiles:
Profile-icc: 2576 bytes
Description: Artifex Software sRGB ICC Profile
Manufacturer: Artifex Software sRGB ICC Profile
Model: Artifex Software sRGB ICC Profile
Copyright: Copyright Artifex Software 2011
Artifacts:
filename: PDF2.pdf
verbose: true
Tainted: False
Filesize: 14.9KB
Number pixels: 485K
Pixels per second: 30.29MB
User time: 0.016u
Elapsed time: 0:01.016
Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Converting PDF to image results in hidden differences
I did the following, on Windows 7, IM v6.7.9:
"%IMG%convert" pdf1.pdf pdf1.png
"%IMG%convert" pdf2.pdf pdf2.png
"%IMG%compare" -metric AE pdf1.png pdf2.png pdfDiff.png
"%IMG%convert" pdf1.png -draw "rectangle 115,177 293,189" pdf1a.png
"%IMG%convert" pdf2.png -draw "rectangle 115,177 293,189" pdf2a.png
"%IMG%compare" -metric AE pdf1a.png pdf2a.png pdfDiffa.png
The first compare gave "791", so 791 pixels were different. The second gave 0; no pixels were different. As expected.
I suspect your black boxes are not entirely covering the text that changes.
The PDFs have a transparent background. This may cause a complication; how are you drawing the black boxes?
snibgo's IM pages: im.snibgo.com
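To make the logic of that test concrete, here is a minimal pure-Python sketch of the same idea: count differing pixels, then paint the same rectangle on both images and count again. The helper names `mask_rectangle` and `absolute_error` and the tiny 6x4 "pages" are made up for illustration; this is not how ImageMagick implements `-metric AE`, only a model of what the number means when no fuzz is applied.

```python
# Images are modelled as lists of rows; each pixel is an (r, g, b) tuple.

def mask_rectangle(image, x0, y0, x1, y1, fill=(0, 0, 0)):
    """Return a copy of image with the inclusive rectangle painted with fill."""
    masked = [list(row) for row in image]
    for y in range(max(y0, 0), min(y1, len(masked) - 1) + 1):
        for x in range(max(x0, 0), min(x1, len(masked[y]) - 1) + 1):
            masked[y][x] = fill
    return masked


def absolute_error(image_a, image_b):
    """Count the pixels whose values differ between two same-sized images."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


WHITE, BLACK = (255, 255, 255), (0, 0, 0)

# Hypothetical pages: identical except for two pixels of "changing text".
page1 = [[WHITE] * 6 for _ in range(4)]
page2 = [row[:] for row in page1]
page2[1][2] = BLACK
page2[1][3] = BLACK

print(absolute_error(page1, page2))  # 2

# A box that covers the whole changing region removes every difference.
full_box = (2, 1, 3, 1)
print(absolute_error(mask_rectangle(page1, *full_box),
                     mask_rectangle(page2, *full_box)))  # 0

# A box that is one pixel too narrow leaves a difference behind.
narrow_box = (2, 1, 2, 1)
print(absolute_error(mask_rectangle(page1, *narrow_box),
                     mask_rectangle(page2, *narrow_box)))  # 1
```

The last case is the situation suspected above: a box that misses even one changed pixel still produces a non-zero count.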
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Converting PDF to image results in hidden differences
If you can block out the white areas that seem to be different and get no errors using PNG, then I would have to assume the issue is in the created PDF files. So that goes back to whatever tool is creating the PDFs, assuming I am following your descriptions correctly.
Or the PDF to PNG conversion is introducing differences, which would be a Ghostscript issue if the conversion is done through IM.
If the PDF files have a transparent background, then Ghostscript needs to use sDEVICE=pngalpha rather than pnmraw, assuming there is just one page in each PDF. If there is more than one page, Ghostscript cannot correctly handle multiple transparent pages. See your delegates.xml file:
<delegate decode="ps:alpha" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pnmraw&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>
If using pnmraw, it is possible that GS gets rid of the alpha channel, leaving the data with the marks under the transparency.
Furthermore, the compare is ignoring the alpha channel. So if one image has color data that is hidden only because the alpha channel makes it transparent, the compare will still report it as a difference.
Perhaps you should flatten the PDF files against a white background when converting to 24-bit PNG. If you convert to 8-bit PNG that could also cause differences.
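As a rough illustration of why flattening helps, the following pure-Python sketch composites individual RGBA pixels over an opaque white background. It assumes plain "over" blending with alpha in the range 0 to 1; the function name `flatten_over` is invented for this example and is not an ImageMagick or PIL call. The pixel values are modelled on the `srgba(...)` entries in the histogram posted earlier.

```python
def flatten_over(pixel, background=(255, 255, 255)):
    """Composite one (r, g, b, a) pixel, with a in 0..1, over an opaque background."""
    r, g, b, a = pixel
    return tuple(
        round(channel * a + bg * (1.0 - a))
        for channel, bg in zip((r, g, b), background)
    )


# Two fully transparent pixels: both look blank, but the colour stored
# underneath the transparency is different.
hidden_black = (0, 0, 0, 0.0)
hidden_white = (255, 255, 255, 0.0)

# A comparison that looks only at the colour channels sees a difference.
print(hidden_black[:3] == hidden_white[:3])  # False

# After flattening over white, both pixels become the same plain white.
print(flatten_over(hidden_black))  # (255, 255, 255)
print(flatten_over(hidden_black) == flatten_over(hidden_white))  # True

# Partially transparent black (antialiased text edge) becomes light grey.
print(flatten_over((0, 0, 0, 0.188235)))  # (207, 207, 207)
```

Once every pixel is opaque, whatever was stored under the transparency can no longer leak into a comparison that ignores alpha.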
- discretiongrove
- Posts: 13
- Joined: 2013-02-19T01:20:09-07:00
- Authentication code: 6789
Re: Converting PDF to image results in hidden differences
Hi snibgo, I'm drawing the black boxes through PIL. Now that you made an example of it, I haven't tried using ImageMagick to draw the black boxes.
Hi fmw42, well I did just that but the problem is if some of the elements are really close to each other then the whites get differences too. See the image difference with three sets of horizontal lines? Those elements are close to each other, so I think there is an overlapping. What did you mean about Ghostscript not being able to handle multiple pages? It can't convert a PDF with multiple pages into jpg with pngalpha on? I'm converting the images to JPG, should I switch to PNG? I thought of converting the images to PNG because I remembered that one has transparent backgrounds. Still, the difference persisted.
When you say flatten, do you mean I should make a separate white JPG and then use it as a background for the image converted to PNG?
By the way, the PDFs are generated by iText, a developer library for generating PDFs.
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Converting PDF to image results in hidden differences
Ghostscript, so I am told, cannot process multiple pages with transparency using device=pngalpha. device=pnmraw can process multiple pages but not with transparency. I am not sure if the transparency is lost or what happens. With pngalpha, you will likely get only the first page if there are multiple pages with transparency.
Part of your problem may be you are converting to a lossy jpg format. Thus two different image conversions to jpg (from different pdfs) may cause different parts of the image to compress differently. Thus you may be seeing this difference.
PNG or TIF is not lossy and both will keep transparency. But you are limited to converting only single page pdfs with transparency by GS.
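If JPG output cannot be avoided, one option is to compare with a tolerance instead of requiring exact equality, much like the tolerance mode mentioned at the start of the thread. The sketch below is an assumption-laden illustration: `count_differences` is a made-up helper, the pixel values are invented to mimic slight compression noise, and the per-channel threshold is simpler than the colour-distance test behind ImageMagick's `-fuzz` option.

```python
def count_differences(pixels_a, pixels_b, tolerance=0):
    """Count pixel pairs where any channel differs by more than tolerance."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("pixel sequences must have the same length")
    return sum(
        any(abs(ca - cb) > tolerance for ca, cb in zip(pixel_a, pixel_b))
        for pixel_a, pixel_b in zip(pixels_a, pixels_b)
    )


# Hypothetical values: "noisy" mimics small JPEG artifacts in two pixels
# plus one genuine content change in the last pixel.
clean = [(255, 255, 255), (255, 255, 255), (0, 0, 0), (255, 255, 255)]
noisy = [(255, 255, 255), (253, 254, 255), (2, 1, 0), (120, 120, 120)]

print(count_differences(clean, noisy))               # 3
print(count_differences(clean, noisy, tolerance=4))  # 1
```

With a lossless format such as PNG the exact comparison is usually the better choice, since a tolerance can also hide small real changes.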
- discretiongrove
- Posts: 13
- Joined: 2013-02-19T01:20:09-07:00
- Authentication code: 6789
Re: Converting PDF to image results in hidden differences
Well, I tried changing that configuration to pngraw and when I tried to convert the PDF to PNG I got this error:
Code: Select all
Unknown device: pngraw
Unrecoverable error: undefined in .uninstallpagedevice
About my conversion to JPG I just use the simplest conversion method in ImageMagick:
Code: Select all
convert PDF.pdf PDF.jpg
I'll try doing that white on transparent thing and see if it helps.
By the way I tried converting to PNG using pngalpha and all the pages have their transparencies.