
Removing JPEG artifacts from scanned text

Posted: 2016-06-26T05:41:57-07:00
by rustyx
I'm trying to convert images of greyscale text on a white background, which were (improperly) saved as JPG, into PNG. But that produces files larger than the originals because of the JPEG artifacts (the text otherwise looks OK, i.e. the artifacts are mild).

Currently it looks like this (gamma exaggerated for illustration purposes):

[Image: sample of the JPEG-saved text, gamma exaggerated to show the artifacts]

If I convert it as-is, the size increases because of the JPEG artifacts.
Example:
Original JPEG: ipsum.jpg, 10KB
Converted to PNG: ipsum.png, 14KB

What's the best way to reduce the noise without distorting the text? (I'd rather keep some noise than lose detail in the text.)

Re: Removing JPEG artifacts from scanned text

Posted: 2016-06-26T06:06:22-07:00
by snibgo
It depends on the objective. As the linked images look okay, why do you want to change them? Once an image has been saved as JPG, converting from that to almost any other format will increase the file size. This is because JPEG compression has already distorted the image: the artifacts add many small pixel variations that a lossless format such as PNG must store exactly, and that kind of noise compresses poorly.
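To see why mild artifacts inflate a lossless file, here is a minimal Python sketch using only the standard library. It assumes a synthetic 200x50 greyscale "page" of dark strokes on white (the helpers `synthetic_page` and `add_mild_noise` are hypothetical, not from any library) and models the JPEG artifacts crudely as uniform noise of up to 4 grey levels. Both versions are compressed with zlib, which implements DEFLATE, the compressor PNG uses internally; PNG's per-row filtering is skipped, so the absolute numbers will not match a real PNG.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 50
WHITE, INK, EDGE = 255, 30, 160  # hypothetical background, stroke and anti-aliased edge values


def synthetic_page() -> bytes:
    """Hypothetical stand-in for scanned text: 8-bit greyscale, row-major, dark strokes on white."""
    px = bytearray([WHITE]) * (WIDTH * HEIGHT)
    for top in (10, 20, 30):              # three "lines of text"
        for x in range(20, 180):
            if (x // 4) % 3:              # leave gaps between "glyphs"
                px[top * WIDTH + x] = INK
                px[(top + 1) * WIDTH + x] = INK
                px[(top + 2) * WIDTH + x] = EDGE  # anti-aliased lower edge
    return bytes(px)


def add_mild_noise(px: bytes, amplitude: int = 4, seed: int = 0) -> bytes:
    """Crude model of mild JPEG artifacts: shift every pixel by up to +/-amplitude, clipped to 0..255."""
    rng = random.Random(seed)
    return bytes(min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in px)


clean = synthetic_page()
noisy = add_mild_noise(clean)

# zlib implements DEFLATE, the compressor inside PNG (PNG's row filters are not applied here).
clean_size = len(zlib.compress(clean, 9))
noisy_size = len(zlib.compress(noisy, 9))
print(f"clean page: {clean_size} bytes, with mild noise: {noisy_size} bytes")
```

The two pages look almost identical, yet the noisy one is far less compressible because nearly every background pixel now differs slightly from its neighbours.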

If you want to save space, then thresholding at 50% will result in very good PNG compression. But, of course, that's another distortion, which removes anti-aliasing.
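A minimal Python sketch of what a 50% threshold does, assuming a hypothetical synthetic 200x50 greyscale page (dark strokes, a grey anti-aliased edge row, white background) with crude noise of up to 4 grey levels standing in for JPEG artifacts. The `threshold` function follows the documented behaviour of ImageMagick's `-threshold` (values above the threshold become white, all others black); it is an illustration, not ImageMagick's code, and the helper names are hypothetical.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 50
WHITE, INK, EDGE = 255, 30, 160  # hypothetical background, stroke and anti-aliased edge values


def synthetic_page() -> bytes:
    """Hypothetical stand-in for scanned text: 8-bit greyscale, row-major."""
    px = bytearray([WHITE]) * (WIDTH * HEIGHT)
    for top in (10, 20, 30):              # three "lines of text"
        for x in range(20, 180):
            if (x // 4) % 3:              # leave gaps between "glyphs"
                px[top * WIDTH + x] = INK
                px[(top + 1) * WIDTH + x] = INK
                px[(top + 2) * WIDTH + x] = EDGE  # anti-aliased lower edge
    return bytes(px)


def add_mild_noise(px: bytes, amplitude: int = 4, seed: int = 0) -> bytes:
    """Crude model of mild JPEG artifacts: shift every pixel by up to +/-amplitude."""
    rng = random.Random(seed)
    return bytes(min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in px)


def threshold(px: bytes, fraction: float = 0.5) -> bytes:
    """Values above fraction*255 become white (255); everything else becomes black (0)."""
    t = fraction * 255
    return bytes(255 if p > t else 0 for p in px)


clean = synthetic_page()
noisy = add_mild_noise(clean)
binary = threshold(noisy)

print("distinct values after threshold:", sorted(set(binary)))
print("compressed size:", len(zlib.compress(noisy, 9)), "->", len(zlib.compress(binary, 9)), "bytes")
```

In this toy case the noise disappears completely (the result equals the thresholded clean page), but every pixel of the grey anti-aliased edge row is forced to white, which is exactly the loss of anti-aliasing that makes thresholding a distortion of its own.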

convert ipsum.jpg -level 5%,95% out.png
This saves some space (mostly by converting the many near-white pixels, those at or above the 95% white point, to exactly white) without doing much damage.
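A rough Python model of `-level 5%,95%` on 8-bit greyscale data, assuming a hypothetical synthetic 200x50 page (dark strokes, a grey anti-aliased edge row, white background) with crude noise of up to 4 grey levels standing in for JPEG artifacts. The `level` function is a simplified assumption about the operator: values between the 5% black point and the 95% white point are stretched linearly to 0..255 and anything outside is clipped, with gamma left at 1.0. Helper names are hypothetical.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 50
WHITE, INK, EDGE = 255, 30, 160  # hypothetical background, stroke and anti-aliased edge values


def synthetic_page() -> bytes:
    """Hypothetical stand-in for scanned text: 8-bit greyscale, row-major."""
    px = bytearray([WHITE]) * (WIDTH * HEIGHT)
    for top in (10, 20, 30):              # three "lines of text"
        for x in range(20, 180):
            if (x // 4) % 3:              # leave gaps between "glyphs"
                px[top * WIDTH + x] = INK
                px[(top + 1) * WIDTH + x] = INK
                px[(top + 2) * WIDTH + x] = EDGE  # anti-aliased lower edge
    return bytes(px)


def add_mild_noise(px: bytes, amplitude: int = 4, seed: int = 0) -> bytes:
    """Crude model of mild JPEG artifacts: shift every pixel by up to +/-amplitude."""
    rng = random.Random(seed)
    return bytes(min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in px)


def level(px: bytes, black_pct: float = 5.0, white_pct: float = 95.0) -> bytes:
    """Stretch [black, white] linearly to [0, 255]; clip values outside that range (simplified model)."""
    black = black_pct / 100 * 255
    white = white_pct / 100 * 255
    return bytes(min(255, max(0, round((p - black) / (white - black) * 255))) for p in px)


clean = synthetic_page()
noisy = add_mild_noise(clean)
leveled = level(noisy)

print("compressed size:", len(zlib.compress(noisy, 9)), "->", len(zlib.compress(leveled, 9)), "bytes")
```

In this model every noisy background pixel (251 to 255) lies above the white point of about 242 and becomes exactly 255, while the strokes and the anti-aliased edge keep their intermediate greys, residual noise included. That matches the trade-off described: a smaller file with little damage to the text.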

Re: Removing JPEG artifacts from scanned text

Posted: 2016-06-26T08:00:31-07:00
by rustyx
Thanks for the suggestion, but thresholding loses too much detail.

So far I've come up with:

mogrify -despeckle -fuzz 5% -fill white -opaque white -gamma 0.8 -colorspace gray -depth 6 -format png *.jpg
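A rough Python model of the main pixel operations in that `mogrify` command, under loudly simplifying assumptions: `-fuzz 5% -fill white -opaque white` is treated as "any pixel within 5% of 255 becomes 255", `-gamma 0.8` as out = 255 * (in/255) ** (1/0.8) (so mid-tones get darker), and `-depth 6` as rounding to 64 levels. `-despeckle` is left out because its filter is not reproduced here, and `-colorspace gray` is a no-op because the hypothetical synthetic page is already greyscale. All function names are hypothetical; this is not ImageMagick's implementation.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 50
WHITE, INK, EDGE = 255, 30, 160  # hypothetical background, stroke and anti-aliased edge values


def synthetic_page() -> bytes:
    """Hypothetical stand-in for scanned text: 8-bit greyscale, row-major."""
    px = bytearray([WHITE]) * (WIDTH * HEIGHT)
    for top in (10, 20, 30):              # three "lines of text"
        for x in range(20, 180):
            if (x // 4) % 3:              # leave gaps between "glyphs"
                px[top * WIDTH + x] = INK
                px[(top + 1) * WIDTH + x] = INK
                px[(top + 2) * WIDTH + x] = EDGE  # anti-aliased lower edge
    return bytes(px)


def add_mild_noise(px: bytes, amplitude: int = 4, seed: int = 0) -> bytes:
    """Crude model of mild JPEG artifacts: shift every pixel by up to +/-amplitude."""
    rng = random.Random(seed)
    return bytes(min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in px)


def fuzz_to_white(px: bytes, fuzz_pct: float = 5.0) -> bytes:
    """Assumed model of -fuzz/-opaque white: pixels within fuzz_pct of 255 become exactly 255."""
    limit = fuzz_pct / 100 * 255
    return bytes(255 if 255 - p <= limit else p for p in px)


def apply_gamma(px: bytes, g: float = 0.8) -> bytes:
    """out = 255 * (in/255) ** (1/g); with g < 1 mid-tones darken, 0 and 255 are unchanged."""
    return bytes(round(255 * (p / 255) ** (1 / g)) for p in px)


def reduce_depth(px: bytes, bits: int = 6) -> bytes:
    """Quantize 8-bit values to 2**bits levels, then scale back to the 0..255 range."""
    top = (1 << bits) - 1
    return bytes(round(round(p * top / 255) * 255 / top) for p in px)


clean = synthetic_page()
noisy = add_mild_noise(clean)
result = reduce_depth(apply_gamma(fuzz_to_white(noisy)))

print("distinct values:", len(set(noisy)), "->", len(set(result)))
print("compressed size:", len(zlib.compress(noisy, 9)), "->", len(zlib.compress(result, 9)), "bytes")
```

PNG greyscale samples can only be 1, 2, 4, 8 or 16 bits deep, so a 6-bit result is presumably written as 8-bit samples restricted to 64 distinct values; any saving would come from the reduced number of values, which DEFLATE can exploit, rather than from smaller samples.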