Textcleaner sensitive to IM Version?
This is not *strictly* a textcleaner question, but textcleaner-adjacent, IMO.
I had the following command line (inspired by the TextCleaner script) that used to do a great job cleaning up text prior to running OCR (under IM version 6.7.7-10):
convert (infile.png -colorspace gray -type grayscale -contrast-stretch 0) (-clone 0--1 -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0) -compose copy_opacity -composite -opaque none -alpha off -deskew 40% -sharpen 0x1 outfile.png
But after I upgraded to IM 6.8.9-9, the result is a little less contrast with the images, and noticeably poorer OCR results. I couldn't find information in the changelogs that would explain this.
My question is, does anybody have any insight into what could account for the difference? I can supply image samples if that helps with the diagnosis.
Thanks much,
- Matt
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Textcleaner sensitive to IM Version?
This is not my exact code. You have added and removed things.
convert (infile.png -colorspace gray -type grayscale -contrast-stretch 0) (-clone 0--1 -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0) -compose copy_opacity -composite -opaque none -alpha off -deskew 40% -sharpen 0x1 outfile.png
Best guess is that you may have different libpng delegates. Also there is code in my script to deal with changes in colorspace over time in IM. See viewtopic.php?f=4&t=21269
-opaque typically needs to have a -fill somecolor setting before using -opaque. But why are you setting it to none and then turning alpha off?
Put a -alpha off before -compose copy_opacity
Looks like I was a bit sloppy in my code and need to clean it up a little.
Perhaps you should post your input image, so others can test with it.
Try upgrading to the latest IM 6 or IM 7 version and see what happens.
Re: Textcleaner sensitive to IM Version?
Thanks for the reply.
Yes, I did modify the code somewhat, but I *was* trying to preserve the original intent. The "alpha off" is to replace the +matte, which the docs say is obsolete but equivalent to "alpha off" (assuming I got that right). I put the alpha off in the same position that +matte was in the original.
Attaching the image I've been playing around with.
Thanks Again,
- Matt
- fmw42
Re: Textcleaner sensitive to IM Version?
You probably took your code from an older version of my script. I changed all the matte to alpha a while ago.
Can you post your outputs from your two IM versions?
Did you check your versions of libpng?
Code: Select all
convert -list format
should tell you the version numbers.
Re: Textcleaner sensitive to IM Version?
(Edit for future readers--had the IM versions swapped; corrected that. -Matt)
The version that gives better OCR results is below:
This is off IM 6.7.7-10, with libpng ver 1.2.49
The version that yields poorer OCR results is:
And that is produced by IM 6.8.9-9, and it looks like that has libpng ver 1.2.50.
The convert command line is the same between them, but the results are noticeably different.
By the way, I didn't grab the command line from the script itself, but from the command line snippet at the bottom of http://www.fmwconcepts.com/imagemagick/ ... /index.php, which still appears to have the +matte in it. Probably the script is updated, as you said.
Thanks again,
- RBW
Last edited by mstone on 2016-05-05T04:39:22-07:00, edited 1 time in total.
- fmw42
Re: Textcleaner sensitive to IM Version?
I will look at this further later today. But try with jpg or tiff output. Are they any different?
Also note that parentheses must have spaces on both sides. This could be just a typo in your post.
Code: Select all
convert (infile.png -colorspace gray -type grayscale -contrast-stretch 0) (-clone 0--1 -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0) -compose copy_opacity -composite -opaque none -alpha off -deskew 40% -sharpen 0x1 outfile.png
- fmw42
Re: Textcleaner sensitive to IM Version?
Your problem occurs because version 6.7.7.10 was released while IM was undergoing changes to colorspace handling and to linear vs. non-linear gray. At 6.7.7.10, it was using linear gray. In some earlier releases, and again from about 6.8.5 onward, it used non-linear gray. See the link I posted above about this issue.
The problem is with -colorspace gray. You can fix your results by using a linear grayscale in your later versions of IM by replacing it with -grayscale rec601luminance.
I tested this with the following:
convert page_Image_0.png -colorspace gray tmp6a.png
im67710 convert page_Image_0.png -colorspace gray tmp6b.png
convert page_Image_0.png -grayscale rec601luminance tmp6a2.png
tmp6a and tmp6b are different. But tmp6a2 and tmp6b are similar.
Choices of gray can be found by
Code: Select all
convert -list intensity
Luma is non-linear (equivalent of gray sRGB)
Luminance is linear. (equivalent of gray RGB)
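For readers who want to see the luma/luminance distinction numerically, here is a rough sketch. It assumes the standard sRGB transfer function and the Rec. 601 weights; the function names are made up for illustration, and it does not claim to reproduce how any particular IM build stores or re-encodes the result.

```javascript
// Rec. 601 weights for R, G, B.
const W = [0.299, 0.587, 0.114];

// Standard sRGB decode: gamma-encoded value in [0,1] -> linear light.
function srgbToLinear(v) {
  return v <= 0.04045 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
}

// Luma (non-linear): weights applied directly to the sRGB-encoded values.
function rec601Luma(rgb) {
  return rgb.reduce((sum, v, i) => sum + W[i] * v, 0);
}

// Luminance (linear): weights applied after decoding to linear light.
function rec601Luminance(rgb) {
  return rgb.reduce((sum, v, i) => sum + W[i] * srgbToLinear(v), 0);
}

const pixel = [0.5, 0.5, 0.5]; // mid-gray, expressed as sRGB-encoded values
console.log('luma:', rec601Luma(pixel));
console.log('luminance:', rec601Luminance(pixel));
```

For the same mid-gray pixel the luminance value comes out much darker than the luma value (roughly 0.21 against 0.5). A shift of that kind ahead of -lat and -contrast-stretch would plausibly change the final contrast.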
Re: Textcleaner sensitive to IM Version?
The missing spaces near the parens are just me reformatting the command line for the post. I'm fairly sure they are there in the real world, as the pieces of the command are assembled from an array, ala:
Code: Select all
childOutput = child_process.spawnSync('convert', [
  '(',
  operationObj.docpath,
  '-colorspace', 'gray',
  '-type', 'grayscale',
  '-contrast-stretch', '0',
  ')',
  '(',
  '-clone', '0--1',
  '-colorspace', 'gray',
  '-negate',
  '-lat', '15x15+5%',
  '-contrast-stretch', '0',
  ')',
  '-compose', 'copy_opacity',
  '-composite',
  '-opaque', 'none',
  // '+matte',
  '-alpha', 'off',
  '-deskew', '40%',
  '-sharpen', '0x1',
  newName], {timeout: operation_timeout} );
child_process.spawnSync should just dumbly assemble the elements into a space-separated list, and that ought to produce the command line with spaces where we need them.
I did try with jpg and tiff in both environments. In 6.7 it produced pretty much the same image, and pretty much identical (good) OCR results. In 6.8 it was weird--converting to tiff yielded a negative image (white on black), which produced terrible OCR results. Converting to jpg produced a pretty similar output as converting to png, which is to say, slightly too-cleaned-up and therefore bad OCR results again.
Thanks,
- Matt
- fmw42
Re: Textcleaner sensitive to IM Version?
Did you see my post above yours about using -grayscale rec601luminance rather than -colorspace gray?
Re: Textcleaner sensitive to IM Version?
Yes, I did. That's what I was referring to that made the difference. JPG / TIFF didn't do much different but -grayscale rec601luminance brought the results much more back in line with what 6.7 had been producing.
I'm also going to try IM 7 when I can afford to scrub my VM and reinstall everything, but from what you say I'll probably need to use -grayscale to get results under that version as well.
Thank you again. Very much appreciated.
Best,
- Matt
- fmw42
Re: Textcleaner sensitive to IM Version?
Yes, you will need to use -grayscale rec601luminance to get results similar to IM 6.7.7.10
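As a recap, here is a sketch of how the argument array from earlier in the thread might look with the suggested change applied. The helper names `buildCleanupArgs` and `runCleanup` are hypothetical. The thread does not say whether one or both occurrences of -colorspace gray were swapped; this sketch assumes both. All other options are copied unchanged from the posted command.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: builds the argument array for `convert`.
function buildCleanupArgs(inPath, outPath) {
  // Linear gray, as suggested in this thread, instead of '-colorspace', 'gray'.
  const gray = ['-grayscale', 'rec601luminance'];
  return [
    '(', inPath, ...gray, '-type', 'grayscale', '-contrast-stretch', '0', ')',
    '(', '-clone', '0--1', ...gray, '-negate', '-lat', '15x15+5%', '-contrast-stretch', '0', ')',
    '-compose', 'copy_opacity', '-composite',
    '-opaque', 'none',
    '-alpha', 'off',
    '-deskew', '40%',
    '-sharpen', '0x1',
    outPath,
  ];
}

// Hypothetical wrapper; requires ImageMagick's `convert` on the PATH.
function runCleanup(inPath, outPath, timeoutMs) {
  return spawnSync('convert', buildCleanupArgs(inPath, outPath), { timeout: timeoutMs });
}

// Print the equivalent command line without running it.
const args = buildCleanupArgs('infile.png', 'outfile.png');
console.log(['convert', ...args].join(' '));
```

Because spawnSync passes each array element as its own argument when no shell is involved, the parentheses arrive as separate tokens and need no shell escaping.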