PDF to image causes 1st page margin

IMagick is a native PHP extension to create and modify images using the ImageMagick API. ImageMagick Studio LLC did not write nor does it maintain the IMagick extension, however, IMagick users are welcome to discuss the extension here.
BigLittle
Posts: 13
Joined: 2013-08-08T18:32:14-07:00
Authentication code: 6789

Re: PDF to image causes 1st page margin

Post by BigLittle »

I've found that the height of the document determines both the angle and the size of the watermark. I wonder if it's possible to pattern match the watermark and then remove it? It looks like that is what you were talking about with morphology. I don't understand it yet, but I'm going to go through http://www.imagemagick.org/Usage/morphology/ and try to figure it out.

Post by BigLittle »

Removing the majority of it seems reasonable actually. By looking at convert subimage-search and possibly morphology, it seems doable. I've toyed around with it but I can't get it to work.

I'd pay you (or anyone) who can figure out how to remove most of it so that OCR works well. I attached the pattern and several documents. The 11-0.png document has an exact match, while the others might differ slightly, which is the biggest challenge.
Attachments
57-0.png (280.06 KiB)
11-1.png (282.47 KiB)
11-0.png (378.05 KiB)
Pattern.png (232.99 KiB)
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA

Post by fmw42 »

You might be able to match the watermark-only image to the image with text and watermark by using compare -subimage-search. Then you need to use -compose subtract or -compose divide to remove the watermark. Morphology open or close will only try to remove small dots; that did not work well for me when I tested it.
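
A minimal sketch of the compositing step, assuming the offset of the watermark has already been found with compare -subimage-search. The helper name remove_watermark, the DRY_RUN and COMPOSE_METHOD variables, and the file names are hypothetical. Divide_Src is an assumption about the operator name on newer ImageMagick builds (check `convert -list compose` on your install), and whether divide or subtract works better depends on how the watermark was blended into the page.

```shell
#!/bin/sh
# Hypothetical helper: overlay the watermark-only pattern on the page at
# offset (x, y) and combine the two with a mathematical compose method.
remove_watermark() {
    page=$1; pattern=$2; x=$3; y=$4; out=$5
    # Assumption: Divide_Src divides the page by the pattern; override as needed.
    method=${COMPOSE_METHOD:-Divide_Src}
    set -- convert "$page" "$pattern" -geometry "+${x}+${y}" \
        -compose "$method" -composite "$out"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 remove_watermark 11-0.png Pattern.png 120 340 cleaned.png` prints the convert command without touching any files; the offset 120,340 is a made-up value.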

Post by BigLittle »

I tried Imagick compare, but got an error:

Code: Select all

$Page = new Imagick('Result-0.png');
$Page2 = new Imagick('SearchPatternPNG.png');
$Result = $Page2 -> compareImages($Page, Imagick::COMPOSITE_SATURATE);
$Result[0] -> setImageFormat('jpeg');
echo $Result[0];

Code: Select all

Fatal error: Uncaught exception 'ImagickException' with message 'Compare images failed' in /home/pitmanco/public_html/la/ndrin/search.php:9 Stack trace: #0 /home/pitmanco/public_html/la/ndrin/search.php(9): Imagick->compareimages(Object(Imagick), 44) #1 {main} thrown in /home/pitmanco/public_html/la/ndrin/search.php on line 9
I printed both images and they display fine, but the comparison fails, and the error message isn't very helpful. I also tried the command line, but I got no response and no files were created when I ran:
compare -subimage-search /fullpath/Result-0.png /fullpath/SearchPatternPNG.png /fullpath/ZZZ.png

Edit: I then ran "compare -subimage-search /path/Result-0.png /path/SearchPatternPNG.png /path/ZR-%d.png", which did execute and used a large amount of server resources, but returned no image.

Post by fmw42 »

Try setting -metric rmse (or some other metric). Also note that for -subimage-search, the two images must be different sizes (larger first).

compare -metric rmse -subimage-search largeimage smallimage resultimages

If you are running it via PHP exec(), you will likely need to redirect the result from stderr to stdout.

compare -metric rmse -subimage-search largeimage smallimage resultimages 2>&1

see
http://www.imagemagick.org/Usage/compare/
http://www.imagemagick.org/Usage/compare/#statistics
http://www.imagemagick.org/script/compare.php

I do not know much about doing compare in Imagick, but it does work on the command line.

see the following old example, but it now needs the addition of -subimage-search
viewtopic.php?f=1&t=14613&p=51076&hilit ... ric#p51076
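
A minimal sketch of capturing the match location from a script, assuming compare reports the best match on stderr in the form "metric (normalized) @ x,y". The function names parse_offset and find_watermark are hypothetical, and the file names in the usage note are the ones mentioned earlier in this thread.

```shell
#!/bin/sh
# Extract "x y" from a line such as "1184.3 (0.0180713) @ 120,340".
parse_offset() {
    printf '%s\n' "$1" |
        sed -n 's/.*@ *\([0-9][0-9]*\),\([0-9][0-9]*\).*/\1 \2/p'
}

# Run the search (larger image first) and print the offset of the best match.
find_watermark() {
    # compare writes the metric to stderr, so merge stderr into stdout.
    result=$(compare -metric rmse -subimage-search "$1" "$2" "$3" 2>&1)
    parse_offset "$result"
}
```

Calling `find_watermark Result-0.png SearchPatternPNG.png ZR.png` would then print the x and y offset separated by a space, or nothing if compare printed an error instead of a match.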

Post by BigLittle »

I tried it with an example photo, which worked. For some reason, the search pattern and search image I attached (SearchImage.jpg / SearchPattern.png) run until the server kills the process. I also tried a small version (attached as SearchImageZ / SearchPatternZ), which returned the error: images too dissimilar `/SearchImageZ.jpg' @ error/compare.c/CompareImageCommand/953.

I'll be trying different patterns to see if something works. Any thoughts on what I'm doing wrong?
Attachments
SearchImageZ.jpg (31.4 KiB)
SearchPatternZ.jpg (18.08 KiB)
SearchImage.jpg (430.65 KiB)
SearchPattern.png (143.54 KiB)

Post by fmw42 »

IM compare is set up for normal types of images and will stop if the images are too dissimilar. So add -dissimilarity-threshold 100% to the command. That should keep it from stopping too quickly. If you want to speed it up, you can also add -similarity-threshold somesmallvalue, if you use -metric rmse. It will then stop when it reaches a match whose metric value is smaller than or equal to your somesmallvalue. If you know you have a perfect match, you can use somesmallvalue=0 (in quantum range: 0 to 65535 for a Q16 compile or 0 to 255 for a Q8 compile) or 0% (in the range 0 to 100). So that value can be absolute or a percent. If you do not believe the match will be perfect, then raise the value to something bigger than 0 but still small; if it is set too high, it will stop at a close but not optimum match. Otherwise, leave off -similarity-threshold and just wait for it to finish.

see
http://www.imagemagick.org/script/comma ... -threshold
http://www.imagemagick.org/script/comma ... -threshold
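
A minimal sketch of a search command using the two thresholds described in this post. The function name search_watermark and the DRY_RUN variable are hypothetical, and the optional fourth argument is passed to -similarity-threshold unchanged; how that value is interpreted (fraction, percent, or quantum value) may differ between ImageMagick versions, so treat any particular value as an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: subimage search that does not give up on dissimilar
# images and optionally stops early once a good enough match is found.
search_watermark() {
    large=$1; small=$2; out=$3; stop_at=${4:-}
    set -- compare -metric rmse -subimage-search -dissimilarity-threshold 100%
    if [ -n "$stop_at" ]; then
        # Stop at the first match whose metric is at or below this value.
        set -- "$@" -similarity-threshold "$stop_at"
    fi
    set -- "$@" "$large" "$small" "$out"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        # The metric and match offset are written to stderr.
        "$@" 2>&1
    fi
}
```

Leaving off the fourth argument corresponds to waiting for the full search to finish; passing a small value trades accuracy for speed.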