Clean Up a Document for Faxing/OCR
Hi All,
I'm trying to clean up a document using the .NET library, where the image might have some darkness or color in the background. I want to make the background white and improve the clarity of the text if possible.
I've been trying to reproduce the commands from this post: viewtopic.php?f=2&t=26744&hilit=contrast which is basically Fred's Textcleaner script: http://www.fmwconcepts.com/imagemagick/textcleaner/
Has anyone had any luck in doing something like this using the .NET library?
Thanks,
Matt
Re: Clean Up a Document for Faxing/OCR
What have you tried so far? The function names in the post 'viewtopic.php?f=2&t=26744&hilit=contrast' most likely correspond to methods of the MagickImage class. For example, MagickQuantizeImage corresponds to MagickImage.Quantize.
Re: Clean Up a Document for Faxing/OCR
Hi, I've tried to recreate this from the previous post:
Code: Select all
MagickLevelImage(wand,0.0,0.25,MaxRGB);
MagickNegateImage(wand,false);
MagickAdaptiveThresholdImage(wand,30,30,10);
MagickNegateImage(wand,false);
as this in .NET:
Code: Select all
imgReceipt.AutoLevel();
imgReceipt.Negate();
imgReceipt.AdaptiveThreshold(30, 30, 10);
imgReceipt.Negate();
But it's taking the dark background and making it darker. My images are very similar to the ones in Fred's Textcleaner script page: http://www.fmwconcepts.com/imagemagick/ ... /index.php
Also Fred's 2 color threshold script might really be all I need, but I'm having trouble coming up with an equivalent for the .NET code to match:
Code: Select all
convert $infile +dither -colors 2 -colorspace gray -contrast-stretch 0 $outfile
For example I don't see how to specify +dither in .NET.
Thanks for your help dlemstra, ImageMagick rocks!
Re: Clean Up a Document for Faxing/OCR
+dither maps to the DitherMethod property of QuantizeSettings, and -colors 2 maps to its Colors property. You can pass the QuantizeSettings to the Quantize method of MagickImage.
Re: Clean Up a Document for Faxing/OCR
Thanks that did it. So I really need to use a technique more similar to Fred's TextCleaner and I'm looking at his sample of the ImageMagick command string:
Code: Select all
convert \( $infile -colorspace gray -type grayscale -contrast-stretch 0 \) \
\( -clone 0 -colorspace gray -negate -lat ${filtersize}x${filtersize}+${offset}% -contrast-stretch 0 \) \
-compose copy_opacity -composite -fill "$bgcolor" -opaque none +matte \
-deskew 40% -sharpen 0x1 \
$outfile
So for the first two lines I've got:
Code: Select all
MagickImage imgReceipt = new MagickImage("receipt.pdf");
QuantizeSettings qs = new QuantizeSettings();
qs.ColorSpace = ColorSpace.GRAY;
imgReceipt.Quantize(qs);
imgReceipt.ColorType = ColorType.Grayscale;
imgReceipt.ContrastStretch(0, 0);
QuantizeSettings qs2 = new QuantizeSettings();
MagickImage img2 = imgReceipt.Clone();
qs2.ColorSpace = ColorSpace.GRAY;
img2.Quantize(qs2);
img2.Negate();
img2.AdaptiveThreshold(15, 15, 10);
img2.ContrastStretch(0, 0);
But then on the third line I'm a little lost. I see that on the image I can set the Compose property, but there is no copy_opacity value. Also I can't find an equivalent for -opaque or +matte. Can you point me in the right direction?
Thanks again.
Re: Clean Up a Document for Faxing/OCR
+matte disables the alpha channel of the image (MagickImage.Alpha(AlphaOption.Deactivate))
-composite is the Composite method of MagickImage
copy_opacity has been renamed to copy_alpha (CompositeOperator.CopyAlpha)
-opaque is MagickImage.Opaque
Re: Clean Up a Document for Faxing/OCR
Thanks again, making more progress.
Is MagickImage.ContrastStretch(0,0) the equivalent of -contrast-stretch 0? MagickImage.ContrastStretch(0,0) seems to turn my image mostly white, while MagickImage.AutoLevel() seems to work much better.
Can you see any reason that this
Code: Select all
//Create Mask
MagickImage imgMask = imgReceipt.Clone();
imgMask.ColorSpace = ColorSpace.GRAY;
imgMask.Negate();
imgMask.AdaptiveThreshold(15, 15, 5); //lat
imgMask.ContrastStretch(0, 0);
//imgMask.AutoLevel();
is producing different results than this?
Code: Select all
receipt.jpg -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0
The image produced by the .NET code has a lot of white streaks in the background, whereas the command line version leaves a nice mask with almost all of the background black.
Re: Clean Up a Document for Faxing/OCR
The last parameter of AdaptiveThreshold should be 5% of the QuantumRange (Quantum.Max). I just submitted a patch to add an overload of AdaptiveThreshold that accepts a percentage.
It also seems that there is a bug in ImageMagick 7. I tried the following in IM6 and IM7 and it produces different results:
Code: Select all
convert logo: -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png
I will have to look into this.
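To illustrate the point above about the last parameter of AdaptiveThreshold, here is a small library-free Python sketch (not Magick.NET code). It assumes a Q16 build, so the quantum maximum is 65535, and it assumes a simplified local adaptive threshold in which a pixel becomes white only if it exceeds its neighbourhood mean plus the bias. The helper names and the sample data are made up for illustration.

```python
# Simplified sketch of a local adaptive threshold on a grayscale image stored
# as a list of rows. QUANTUM_MAX = 65535 assumes a Q16 build (an assumption).
QUANTUM_MAX = 65535


def percent_to_quantum(percent, quantum_max=QUANTUM_MAX):
    """Convert a percentage such as 5 (meaning 5%) into quantum units."""
    return quantum_max * percent / 100.0


def local_adaptive_threshold(pixels, size, bias):
    """White if a pixel exceeds its local mean plus bias, otherwise black.

    Neighbourhoods are clipped at the image border (a simplification).
    """
    height, width = len(pixels), len(pixels[0])
    half = size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                pixels[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(window) / len(window)
            row.append(QUANTUM_MAX if pixels[y][x] > mean + bias else 0)
        out.append(row)
    return out


# Slightly noisy dark background with one bright pixel (text after negation).
noisy = [
    [1000, 1200, 1000, 1200, 1000],
    [1200, 1000, 1200, 1000, 1200],
    [1000, 1200, 60000, 1200, 1000],
    [1200, 1000, 1200, 1000, 1200],
    [1000, 1200, 1000, 1200, 1000],
]

# Bias passed as the raw number 5: background noise survives as white specks.
raw_bias = local_adaptive_threshold(noisy, 3, 5)
# Bias passed as 5% of the quantum range: only the bright pixel stays white.
scaled_bias = local_adaptive_threshold(noisy, 3, percent_to_quantum(5))
```

With the raw value, tiny background variations clear the threshold, which is consistent with the white streaks described earlier in the thread. Scaling the bias to the quantum range suppresses them.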
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Clean Up a Document for Faxing/OCR
You left off the minus before colorspace (i.e. -colorspace rather than colorspace):
convert logo: colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png
Re: Clean Up a Document for Faxing/OCR
dlemstra wrote:
The last parameter of AdaptiveThreshold should be 5% of the QuantumRange (Quantum.Max). I just submitted a patch to add an overload of AdaptiveThreshold that accepts a percentage.
It also seems that there is a bug in ImageMagick 7. I tried the following in IM6 and IM7 and it produces different results:
Code: Select all
convert logo: colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png
I will have to look into this.
Awesome that was what I needed. It's generating a nice cleaned up document with a white background now.
Re: Clean Up a Document for Faxing/OCR
fmw42 wrote:
You left off the minus before colorspace (i.e. -colorspace rather than colorspace):
convert logo: colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png
Thanks Fred, I have narrowed it down to the following:
Code: Select all
convert logo: -lat 15x15+5% logo.png
Re: Clean Up a Document for Faxing/OCR
The bug in AdaptiveThreshold has been found and will be fixed in the next release of Magick.NET
Re: Clean Up a Document for Faxing/OCR
Awesome thanks!
And did you see my previous question yesterday about contrast stretch?
Is MagickImage.ContrastStretch(0,0) the equivalent of -contrast-stretch 0 ? MagickImage.ContrastStretch(0,0) seems to turn my image mostly white, while MagickImage.AutoLevel() seems to work much better.
Re: Clean Up a Document for Faxing/OCR
I did see it, but I forgot about it.
It looks like you will have to calculate the white point differently. I think it should be Width*Height. I will change the code in the next release of Magick.NET so this means you will have to change your code after the next release.
p.s. I have not tested this yet.
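The Width*Height suggestion above can be sketched in plain Python (not Magick.NET code). The sketch assumes that the underlying contrast stretch call takes both points as pixel counts measured from the dark end of the histogram, while the command line option takes the number of pixels to burn at each end. That assumption is not confirmed in this thread, and the function and parameter names are hypothetical.

```python
def contrast_stretch_points(width, height, burn_black=0.0, burn_white=0.0):
    """Return (black_point, white_point) as pixel counts.

    burn_black and burn_white are the numbers of pixels allowed to saturate
    to black and to white, as one would write them on the command line.
    Assumption: the white point is counted from the dark end of the
    histogram, so it equals the total pixel count minus burn_white.
    """
    total = float(width * height)
    return float(burn_black), total - float(burn_white)


# "-contrast-stretch 0" on a 640x480 image: burn no pixels at either end.
points = contrast_stretch_points(640, 480, 0, 0)
```

Under that assumption, a white point of 0 would ask for nearly every pixel to saturate to white, which would match the mostly white result reported for ContrastStretch(0,0).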
- fmw42
Re: Clean Up a Document for Faxing/OCR
Since -lat produces a binary image, -contrast-stretch at the end should do nothing. If you want to use it, it should be used before -lat.
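A short library-free Python sketch of why stretching a binary image changes nothing. It uses a plain min/max linear stretch as a stand-in for -contrast-stretch 0 and assumes a Q16 quantum maximum of 65535. The function name and sample values are made up for illustration.

```python
QUANTUM_MAX = 65535  # assumes a Q16 build


def stretch(values, quantum_max=QUANTUM_MAX):
    """Linearly map the darkest value to 0 and the brightest to quantum_max."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # flat image: nothing to stretch
    return [round((v - lo) * quantum_max / (hi - lo)) for v in values]


binary = [0, QUANTUM_MAX, 0, 0, QUANTUM_MAX]   # what a threshold produces
gray = [12000, 20000, 40000]                   # a grayscale image before thresholding

stretched_binary = stretch(binary)  # identical to the input
stretched_gray = stretch(gray)      # now spans the full range
```

A binary image already uses only the two extreme values, so the stretch has nothing left to expand. Applied before the threshold, it does change the data.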