OCR of typewriter copy
Posted: 2014-09-02T08:51:29-07:00
This is my first post here, so hi everyone.
I have been using Imagemagick to pre-process images in preparation for OCR. But the input is typewriter copy which is a problem because some characters are typed with less force and so are considerably lighter than others. The other problem is that the white background varies in brightness across the image (I have no control of the scanning process). The OCR output was crummy.
Then I explored a bit, and found this useful line:
$ convert 1345.jpg -colorspace gray \( +clone -blur 0x20 \) -compose Divide_Src -composite 1345photocopy1.jpg
# ref http://www.imagemagick.org/Usage/compose/#divide
After this, the OCR output is quite good, thanks!. But not perfect. Are there other things I could do in ImageMagick to improve this?
I have been using Imagemagick to pre-process images in preparation for OCR. But the input is typewriter copy which is a problem because some characters are typed with less force and so are considerably lighter than others. The other problem is that the white background varies in brightness across the image (I have no control of the scanning process). The OCR output was crummy.
Then I explored a bit, and found this useful line:
$ convert 1345.jpg -colorspace gray \( +clone -blur 0x20 \) -compose Divide_Src -composite 1345photocopy1.jpg
# ref http://www.imagemagick.org/Usage/compose/#divide
After this, the OCR output is quite good, thanks!. But not perfect. Are there other things I could do in ImageMagick to improve this?