Need help to improve special scanned images
Posted: 2010-08-24T07:27:20-07:00
by amirian
Hi
I'm a newbie to ImageMagick, but it looks like a suitable tool for my requirements.
I have 300 scanned JPG images of a book (this is a sample:
) and need a script that performs the following operations on them to produce PNG image files (similar to this:
sorry, but I can't attach an image to my post):
1) rotate 270 degrees clockwise
2) split to two images
3) trim the images, removing the entire thick border
4) remove noise
5) convert to a 2-color image, with the characters as foreground and everything else as background (the image need not be stored as 2-color; it can be a 16-color image that uses only 2 of the colors)
6) remove the background (make it transparent)
7) resize the images (maintaining aspect ratio) to a width of 800 (they are initially larger and need to be made smaller, keeping the quality if possible)
8) crop or pad the images so that they are all equal in height
9) save in 16-color or 2-color PNG format
Any help would be appreciated. I really need a quick response.
Re: Need help to improve special scanned images
Posted: 2010-08-25T05:32:14-07:00
by amirian
I forgot the deskew method in the above list, it can be number 2.5
any guidelines?
Re: Need help to improve special scanned images
Posted: 2010-08-25T08:47:01-07:00
by fmw42
several steps will be hard if possible at all
split the image - unless you supply the split coordinates, an automated split will be hard
trim the border - IM will not know how to distinguish your border pattern from the characters you want unless you manually crop the image into two parts and leave out the border
Re: Need help to improve special scanned images
Posted: 2010-08-26T19:47:23-07:00
by anthony
If you have a rough idea of the paper color, you could try setting the paper to white and all other colors to black, then do the trim and rotate on that to find the 'page'.
Alternatively, match on the distinctive green border. It looks very different from everything else. Maybe work in HSV color space to extract a light 'green' shade. That border will then give you the crop and distortion points needed to extract the two (or more) pages.
Once you have the other junk removed, leaving just the page, you can then work on cleaning up the page.
Basically take things one step at a time.
Re: Need help to improve special scanned images
Posted: 2010-08-27T01:42:57-07:00
by amirian
Thanks for valuable responses.
Splitting the page will not be a hard task. It only has to cut the page in the middle, producing two new pages of equal size.
Detecting the border to crop could be done with a simple image processing algorithm that detects the 4 corners of a fixed-size square. But what is hard for me is that I AM A NEWBIE to IM and unfortunately don't have enough time to study. I'm sure learning IM will be my first subject of interest later; for now, I posted this message to get help getting started with IM for this specific script.
Thanx
Re: Need help to improve special scanned images
Posted: 2010-08-27T10:15:23-07:00
by fmw42
Try these steps:
1) manually crop the outside area. This would not be needed if your scan did not include black areas outside the image. You could help yourself here by putting a large white sheet on the scanner behind your pages.
convert 3486kh1.jpg[908x1192+108+272] 3486kh1_crop.png
2) make everything but the black letters become white
convert 3486kh1_crop.png -fuzz 45% -fill white +opaque black 3486kh1_crop_f45b.png
Note if I had added +repage at the end of the command(s) above, it would not be needed in the beginning of the next command. I should have remembered to do that to ensure you don't have a virtual canvas in your image.
3) manually crop into two halves (top and bottom)
convert 3486kh1_crop_f45b.png +repage -gravity north -crop x50%+0+0 +repage 3486kh1_crop_f45b_croptop.png
convert 3486kh1_crop_f45b.png +repage -gravity south -crop x50%+0+0 +repage 3486kh1_crop_f45b_cropbottom.png
4) rotate each image 90 degrees ccw
convert 3486kh1_crop_f45b_croptop.png -rotate -90 3486kh1_crop_f45b_croptop_rot90.png
convert 3486kh1_crop_f45b_cropbottom.png -rotate -90 3486kh1_crop_f45b_cropbottom_rot90.png
5) fuzz trim to the edges of the black letters (you may need to increase the 10%), then resize to fit within 800x800, then convert to 2 colors (black and white), add a white border of 10 pixels, then save as GIF
convert 3486kh1_crop_f45b_croptop_rot90.png -fuzz 10% -trim +repage -resize 800x800 -quantize gray +dither -colors 2 -bordercolor white -border 10 -depth 2 3486kh1_crop_f45b_croptop_rot90_trc2b10d2.gif
convert 3486kh1_crop_f45b_cropbottom_rot90.png -fuzz 10% -trim +repage -resize 800x800 -quantize gray +dither -colors 2 -bordercolor white -border 10 -depth 2 3486kh1_crop_f45b_cropbottom_rot90_trc2b10d2.gif
see commands at
http://www.imagemagick.org/Usage/
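Since there are about 300 scans, the five steps above could be wrapped in a loop. The following is only a minimal sketch, not a tested pipeline: the helper names out_name and process_scan and the out/ directory are hypothetical, and the crop geometry is the one measured on the sample scan, so it is an assumption that it fits every page.

```shell
#!/bin/sh
# Sketch only: batch version of steps 1-5 above, chained into one convert call per half.
# CROP is the geometry measured on the sample scan; other scans may need a different one.
CROP="908x1192+108+272"

# out_name FILE SUFFIX -> out/<basename without .jpg>_<SUFFIX>.gif (hypothetical naming)
out_name() {
    base=$(basename "$1" .jpg)
    printf 'out/%s_%s.gif\n' "$base" "$2"
}

# process_scan FILE: crop, whiten everything that is not near black, take the
# top (north) or bottom (south) half, rotate, trim, resize, reduce to 2 colors.
process_scan() {
    for half in north south; do
        convert "$1[$CROP]" +repage \
            -fuzz 45% -fill white +opaque black \
            -gravity "$half" -crop x50%+0+0 +repage \
            -rotate -90 \
            -fuzz 10% -trim +repage -resize 800x800 \
            -quantize gray +dither -colors 2 \
            -bordercolor white -border 10 -depth 2 \
            "$(out_name "$1" "$half")"
    done
}

# Usage (commented out so the definitions can be sourced safely):
# mkdir -p out && for f in *.jpg; do process_scan "$f"; done
```

Chaining the steps avoids the intermediate PNG files, at the cost of making it harder to inspect each stage; while tuning the fuzz values it may be easier to keep the separate commands.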
Re: Need help to improve special scanned images
Posted: 2010-08-28T11:13:20-07:00
by amirian
Thank you very much for your reply. I found it completely operational, but still 4 things remain:
1- deskewing the image due to the border tilt
2- removing the background: it is now white but I want it be transparent when shown on a wallpaper.
3- I want all characters with black color that appear on the page border or outside the border be removed. It can be done while removing the border, but I don't know how.
4- The most important issue is that I want the characters to remain as smooth as they are in the original image. I know this cannot be done by changing the 45% fuzz to some other percentage, because the original character edges are not sharp: the color of the characters changes gradually into the background color, and the smooth font edges will be damaged if I apply a fixed color threshold to divide the image into foreground and background.
can you help me with these problems?
Re: Need help to improve special scanned images
Posted: 2010-08-28T12:47:48-07:00
by fmw42
amirian wrote: Thank you very much for your reply. I found it completely operational, but still 4 things remain:
1- deskewing the image due to the border tilt
Add the -deskew option, with a threshold such as 40%, after the -rotate -90
see
http://www.imagemagick.org/script/comma ... u24#deskew
amirian wrote: 2- removing the background: it is now white but I want it be transparent when shown on a wallpaper.
In the last command add -transparent white. This will convert all white to transparent. If you see white halos around your black characters, add -fuzz XX% before the -transparent white so that colors close to white are also made transparent.
amirian wrote: 3- I want all characters with black color that appear on the page border or outside the border be removed. It can be done while removing the border, but I don't know how.
This is the hard part that I cannot help with. The manual crop removed the pattern border. That is the only way I know to remove it. Thus it removed the stuff outside the border that you want. Sorry! I don't think you are going to be able to do this.
You can try cropping so that you just remove the outside black area. Then process the same as before to make everything but near black go to white (or transparent). That might remove the patterned border and keep the outside characters. The key is to get rid of anything close to black by cropping first.
amirian wrote: 4- The most important issue is that I want the characters to remain as smooth as they are in the original image. I know this cannot be done by changing the 45% fuzz to some other percentage, because the original character edges are not sharp: the color of the characters changes gradually into the background color, and the smooth font edges will be damaged if I apply a fixed color threshold to divide the image into foreground and background.
can you help me with these problems?
You cannot do that if you want to have 2 colors. The -resize will keep everything nice and smooth, but as soon as you reduce colors, especially to 2, you will lose quality. You could go to 16 grays or 8 grays and that might help.
This is the best I can offer.
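Putting answers 1, 2 and 4 together, the last command might look roughly like the sketch below. It is an illustration under assumptions, not a verified recipe: the helper name finish_page is hypothetical, 40% is simply a commonly used -deskew threshold, and the -depth 2 from the earlier command is dropped because it would not fit 16 grays.

```shell
#!/bin/sh
# Sketch only: rotate, deskew, trim, resize, reduce to N grays, make white transparent.
# finish_page IN OUT [COLORS]   (hypothetical helper; COLORS defaults to 16)
finish_page() {
    colors=${3:-16}
    # Note: -fuzz 10% is still in effect when -transparent runs, so colors
    # within 10% of white also become transparent, not only pure white.
    convert "$1" -rotate -90 -deskew 40% \
        -fuzz 10% -trim +repage -resize 800x800 \
        -quantize gray +dither -colors "$colors" \
        -transparent white \
        "$2"
}

# Usage (commented out; the input name is the one produced in step 3 earlier):
# finish_page 3486kh1_crop_f45b_croptop.png 3486kh1_top.png 16
```

With more than 2 grays, the light gray edge pixels stay opaque, so they may show as a faint halo on a dark wallpaper; raising the fuzz trades that halo against thinner characters.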
Re: Need help to improve special scanned images
Posted: 2010-08-29T23:08:27-07:00
by amirian
fmw42 wrote:3- I want all characters with black color that appear on the page border or outside the border be removed. It can be done while removing the border, but I don't know how.
This is the hard part that I cannot help with. The manual crop removed the pattern border. That is the only way I know to remove it. Thus it removed the stuff outside the border that you want. Sorry! I don't think you are going to be able to do this.
You can try cropping so that you just remove the outside black area. Then process the same as before to make everything but near black go to white (or transparent). That might remove the patterned border and keep the outside characters. The key is to get rid of anything close to black by cropping first.
Thanks fmw42 for your guidelines. It was so difficult for a beginner, but I finally made the following solution to remove characters on (or outside) the border:
0) split the page to 2 parts:
convert source.png +repage -gravity north -crop x50%+0+0 +repage 1a.png
convert source.png +repage -gravity south -crop x50%+0+0 +repage 1b.png
1) rotate the page:
convert 1a.png -rotate -90 -deskew 40% 2a.png
2) change all colors except black to white, using a HIGH fuzz, and write the result to a temporary file; this file helps to find the main page dimensions. The appropriate fuzz was found by trial and error:
convert 2a.png -fuzz 31% -fill white +opaque black temp.png
3,4) Referring to the topic "Trimming 'Noisy' Images" (
http://www.imagemagick.org/Usage/crop/#trim_blur), crop the page using the coordinates obtained from the temporary file, adding a 35-pixel margin on each side so that no characters are lost. Here too, the appropriate fuzz and margin were found by trial and error:
convert 2a.png -crop `convert temp.png -virtual-pixel edge -blur 10x10 -fuzz 25% -trim -format '%[fx:w+70]x%[fx:h+70]+%[fx:page.x-35]+%[fx:page.y-35]' info:` +repage DestA.png
and I repeated steps 1 to 4 for 1b.png...
But the fuzz constants don't work well on similar pages because the pages differ. I now guess that using horizontal and vertical histograms of the black pixels could be a better substitute, because the characters outside the page bounds don't exceed a histogram threshold. This threshold would help me remove them from the page, and also add an offset to the page so that it lines up with the other pages.
But now I don't know how to involve histogram information to trim the pages using IM. Could you help me?
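One detail of the padded crop in step 3,4 is that page.x-35 or page.y-35 can go negative when the trimmed box lies closer than 35 pixels to the edge. A small shell helper could build the geometry with the offsets clamped at zero. This is only a sketch: the name pad_geometry is hypothetical, and the inputs are assumed to be the width, height, x and y reported for the trimmed temporary image.

```shell
#!/bin/sh
# Sketch only: build a crop geometry WxH+X+Y that pads a trimmed box by a margin,
# without letting the left/top offsets go below zero.
# pad_geometry W H X Y MARGIN   (hypothetical helper)
pad_geometry() {
    w=$1 h=$2 x=$3 y=$4 m=$5
    left=$(( x < m ? x : m ))   # margin actually available on the left
    top=$(( y < m ? y : m ))    # margin actually available on the top
    # The right and bottom sides are not clamped here; -crop is expected
    # to clip a region that extends past the image.
    printf '%dx%d+%d+%d\n' $(( w + left + m )) $(( h + top + m )) \
        $(( x - left )) $(( y - top ))
}

# Usage (commented out; same blur/trim settings as in the command above):
# set -- $(convert temp.png -virtual-pixel edge -blur 10x10 -fuzz 25% -trim \
#          -format '%w %h %[fx:page.x] %[fx:page.y]' info:)
# convert 2a.png -crop "$(pad_geometry "$1" "$2" "$3" "$4" 35)" +repage DestA.png
```

For a trimmed box of 100x200 at offset +50+60 and a margin of 35, the helper prints 170x270+15+25, which matches what the fx expressions would give.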
Re: Need help to improve special scanned images
Posted: 2010-08-30T10:52:50-07:00
by fmw42
I am not sure what you mean by using Histogram information. You can get a histogram easily. See
http://www.imagemagick.org/Usage/files/#histogram. But then you need to process it somehow.
The main IM function that uses histograms directly is -contrast-stretch, but I am not sure how that helps. See
http://www.imagemagick.org/script/comma ... st-stretch and
http://www.imagemagick.org/Usage/color/#histogram.
The other thing would be to try to use some automatic threshold technique. I have several scripts on my web site.
IM also has a local threshold technique, -lat, see
http://www.imagemagick.org/script/comma ... ns.php#lat
You might also see my textcleaner script and see if that might help
Re: Need help to improve special scanned images
Posted: 2010-08-30T11:58:08-07:00
by Wolfgang Woehl
For an alternative (to Fred's awesome suggestions) look at
http://unpaper.berlios.de/
Re: Need help to improve special scanned images
Posted: 2010-08-30T12:41:38-07:00
by amirian
Wow, you have a rich script collection, Fred. I had a look and found that the Whiteboard script comes close to my requirement, but it is not exactly what I'm looking for. Now the main problem is the characters outside the main text.
Sorry for my poor English; perhaps I did not explain my purpose clearly. The idea is simple. What I mean by the horizontal histogram is the number of black pixels (within some fuzz factor) in each row; plotted together, these counts form a graph along the y axis. The vertical histogram is the same thing along the x axis: the number of black pixels in each column of the image. The black characters on the border and outside it are few, so they do not reach a threshold in the histogram. Applying a threshold to the horizontal histogram and trimming it together with the original image therefore removes characters located to the left or right of the text; likewise, applying a threshold to the vertical histogram and trimming removes characters at the top or bottom of the page. So the histograms can be used to define the correct crop coordinates. Another advantage of the vertical histogram is that it can fix the y coordinate of the first text line on all pages, with a simple offset added for each page if needed. Do you understand me?
Re: Need help to improve special scanned images
Posted: 2010-08-30T12:46:50-07:00
by amirian
unpaper is a good tool, but not helping me, it may reduce the text font quality because I'm using a colorful image with multiple patterns and need to remove them and make a 2-color (or 8-color) image. Am I right?
Re: Need help to improve special scanned images
Posted: 2010-08-30T17:04:38-07:00
by fmw42
amirian wrote: What I mean by the horizontal histogram is the number of black pixels (within some fuzz factor) in each row; plotted together, these counts form a graph along the y axis. The vertical histogram is the same thing along the x axis: the number of black pixels in each column of the image. The black characters on the border and outside it are few, so they do not reach a threshold in the histogram. Applying a threshold to the horizontal histogram and trimming it together with the original image therefore removes characters located to the left or right of the text; likewise, applying a threshold to the vertical histogram and trimming removes characters at the top or bottom of the page. So the histograms can be used to define the correct crop coordinates. Another advantage of the vertical histogram is that it can fix the y coordinate of the first text line on all pages, with a simple offset added for each page if needed. Do you understand me?
I am still not sure I understand. However, here is a suggestion. Average the image down to one row and get the histogram of that row. Do the same by averaging the image down to one column and get the histogram.
convert yourimage.png -filter box -resize "WIDTHx1!" rowimage.png
convert yourimage.png -filter box -resize "1xHEIGHT!" columnimage.png
-scale may be faster and may do the same thing as -filter box -resize, I am not sure.
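Once a per-row or per-column profile is available as plain numbers, the thresholding described in the question could be done outside IM. The sketch below shows only that thresholding step; the name profile_bounds is hypothetical, and how the numbers are extracted from rowimage.png or columnimage.png (for example by parsing txt: output) is left out and assumed.

```shell
#!/bin/sh
# Sketch only: find the first and last position whose value reaches a threshold.
# profile_bounds THRESHOLD   (hypothetical helper)
# Reads one number per line on stdin (position 0 first) and prints "FIRST LAST",
# or nothing if no value reaches the threshold.
profile_bounds() {
    awk -v t="$1" '
        $1 + 0 >= t + 0 { if (first == "") first = NR - 1; last = NR - 1 }
        END { if (first != "") print first, last }
    '
}

# Example: eight positions, threshold 10; positions 3 to 5 reach it.
printf '%s\n' 0 1 0 40 55 60 2 0 | profile_bounds 10
```

For a column profile (one value per row), the bounds FIRST and LAST would translate into a crop such as -crop x$((LAST-FIRST+1))+0+FIRST; a row profile gives the corresponding width and x offset. Isolated positions below the threshold, such as stray characters near the border, fall outside the bounds.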
Re: Need help to improve special scanned images
Posted: 2010-08-31T18:51:56-07:00
by amirian
I finally succeeded in cropping the pages correctly, not by using the histogram, but by focusing on the green color of the borders. Two more questions:
fmw42 wrote:
2- removing the background: it is now white but I want it be transparent when shown on a wallpaper.
In the last command add -transparent white. This will convert all white to transparent. If you see white halos around your black characters, add -fuzz XX% before the -transparent white so that colors close to white are also made transparent.
1) Adding "-transparent color" increases the image bit depth from 1 to 32. How can this be fixed? The final PNG image contains only black and transparent, which requires a bit depth of one (2 different colors). If that is impossible, how could the bit depth be reduced to 4 instead of 1?
2) This is the old problem: quantizing the image to black and white does not preserve the smoothness of the characters (I mean a smooth curved edge separating the characters from the background, not the gray-level transition from black to white). I agree with you to some extent that smooth fonts need more gray levels, which makes the image monochrome. But the quality of my final 2-color image is very low, somewhat like a sub-sampled picture that has been resized to a larger size. To see this, take another look at the desired target image in my first post. Although it is 16-color, a simple quantization of that image to 2 colors preserves more quality than my latest images have. So I believe that a higher level of quality and font smoothness can be achieved, because the source images have a good resolution, but I don't know how.