Extract text from an image
Posted: 2015-09-09T15:09:40-07:00
So I want to extract the text from an image. It seems doable, since I can get multiple images that are different except for the text that I need. Sample:
I think I should be able to compare the two images and discard what is different, leaving the text. I did a forum search, and this was pretty much answered here: viewtopic.php?f=1&t=15584&start=15#p55396
In that case they wanted what was different rather than what was the same. The summary from user fmw42:
convert image2.png image1.png -alpha off +repage \ <--- read the images, turn off any existing alpha channel, and remove the virtual canvas using +repage so that the image sizes are the true canvas sizes
\( -clone 0 -clone 1 -compose difference -composite -threshold 0 \) \ <--- copy the two input images, get the absolute difference image, and threshold to black/white so that white is any difference and black is no change. This becomes a mask.
\( -clone 0 -clone 2 -compose multiply -composite \) \ <--- multiply the mask against a copy of image2 so that the difference areas remain and the rest is turned black, which is the color I deduced was not a current color anywhere in the image.
-delete 0,1 +swap -alpha off -compose copy_opacity -composite -trim +repage \ <--- delete the original two images, turn alpha off, swap the order as needed for the compose, then put the mask in as the alpha (transparency) channel of the image with the changes surrounded by black, trim, and reset the virtual canvas (typical when trimming)
image12diff2a.png <--- write output image
To get what is the same in each image, I should just be able to invert the mask, and this command should work. However, I don't get a good mask. It looks like this:
After some research, it seems the difference is very exact, and any mathematical difference in color is detected. These images come from compressed MPEG video, so I'm sure the compression artifacts are what is causing the discrepancy. I tried adding fuzz to counter this, but it made no real difference.
\( -clone 0 -clone 1 -compose difference -fuzz 10 -composite -threshold 0 \)
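For what it is worth, `-threshold 0` in the line above turns every nonzero difference white, so even a one-level compression wobble counts as "different". As far as I can tell, `-fuzz` only affects operators that match colors (such as `-trim` or `-transparent`), not the difference compose, so raising the threshold (for example `-threshold 10%`) is the more likely place to add tolerance. The following is only a plain-Python sketch of that idea, not ImageMagick itself; the tiny grayscale "frames", the function name `same_mask`, and the tolerance of 10 are all made up for illustration.

```python
# Sketch only: tiny grayscale "images" as lists of rows with values 0-255.
# The pixel values and the tolerance below are hypothetical.

def same_mask(img_a, img_b, tolerance):
    """Return 255 where the two images agree within `tolerance`, else 0."""
    mask = []
    for row_a, row_b in zip(img_a, img_b):
        mask.append([255 if abs(a - b) <= tolerance else 0
                     for a, b in zip(row_a, row_b)])
    return mask

# Two frames whose "text" pixels differ only by small compression noise,
# while the background pixels change a lot.
frame1 = [[200, 201, 40], [199, 90, 41]]
frame2 = [[203, 198, 120], [202, 10, 44]]

exact = same_mask(frame1, frame2, 0)      # analogous to a threshold of 0
tolerant = same_mask(frame1, frame2, 10)  # small differences are ignored

print(exact)
print(tolerant)
```

With a tolerance of 0 nothing is considered the same, which matches the bad mask I am seeing; with some tolerance only the pixels that really changed drop out.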
Am I using fuzz wrong? Or should I be going about this a different way? Also, I can extract more than two images from the video. Can difference take more than two sources for more data points? Here are the two original images:
Any help is appreciated. The Dropbox URLs don't appear to work. I'll try to fix that.
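On the question of more than two sources: one way to think about it is per pixel, keeping a pixel only if its value stays within a small range across all frames. In ImageMagick terms this might correspond to something like `-evaluate-sequence max` minus `-evaluate-sequence min` followed by a threshold, but I have not verified that pipeline here. The following plain-Python sketch only shows the logic; the frames, the name `stable_mask`, and the tolerance are hypothetical.

```python
# Sketch only: per-pixel stability across any number of grayscale frames.
# Frames are lists of rows with values 0-255; all values are made up.

def stable_mask(frames, tolerance):
    """Return 255 where a pixel varies by at most `tolerance` across frames."""
    height = len(frames[0])
    width = len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            spread = max(values) - min(values)
            row.append(255 if spread <= tolerance else 0)
        mask.append(row)
    return mask

frames = [
    [[250, 30], [248, 200]],
    [[252, 90], [251, 60]],
    [[249, 150], [250, 205]],
]

# Using only the first and last frame, the bottom-right pixel happens to match.
two_frames = stable_mask([frames[0], frames[2]], 10)
# Adding the middle frame reveals that it actually changes.
three_frames = stable_mask(frames, 10)

print(two_frames)
print(three_frames)
```

The extra frame removes a background pixel that matched by coincidence in two frames, which is the reason more data points should help.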
Version: ImageMagick 6.9.0-0 Q16 x86_64 2015-07-27
URLs to the images:
https://www.dropbox.com/s/jfo4guecqn175 ... e.png?dl=0
https://www.dropbox.com/s/z89vi8mcvmayo ... 1.jpg?dl=0
https://www.dropbox.com/s/ubrmoiohz9nt4 ... 2.jpg?dl=0
https://www.dropbox.com/s/s607p2a2g1zze ... 2.gif?dl=0