Optimize dark (gray) image for OCR

Posted: 2019-05-25T04:24:19-07:00
by Jordan
Hi,

I'm trying to optimize img300.png for Tesseract OCR. The image will always look the same except for the text. So far I've managed to generate output.tiff. I've tried using Photoshop as a GUI guide, without any luck (the values in, e.g., Photoshop's Levels don't correspond to -level in ImageMagick).

Code: Select all

convert img300.png -grayscale Rec709Luminance -channel RGB -black-threshold 13% -white-threshold 12.9% -negate output.tiff
Output.tiff:
https://www.dropbox.com/s/mluzbmt6tuuus ... .tiff?dl=0

Original file img300.png:
https://www.dropbox.com/s/vdtt2w3kkfbix ... 0.png?dl=0

Optimal result img300_optimized.png:
https://www.dropbox.com/s/w5kkf2ty3mw7c ... d.png?dl=0

I'm hoping someone has some tips to get a better result. I'd love to hear from you!

Jordan

Re: Optimize dark (gray) image for OCR

Posted: 2019-05-25T04:46:53-07:00
by snibgo
Instead of "-black-threshold 13% -white-threshold 12.9%", you could use "-threshold 12.95%".
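
As a rough sketch of that substitution, the script below runs both variants and counts the pixels that differ. The file names two_step.tiff and one_step.tiff are made up for illustration, and the guard simply skips the ImageMagick calls when convert or img300.png is not available.

```shell
#!/bin/sh
# Single cut-off suggested in the reply above.
threshold="12.95%"

# Guarded so the script does nothing without ImageMagick or the input file.
if command -v convert >/dev/null 2>&1 && [ -f img300.png ]; then
    # Two-step version from the first post (hypothetical output name).
    convert img300.png -grayscale Rec709Luminance -channel RGB \
        -black-threshold 13% -white-threshold 12.9% -negate two_step.tiff
    # Simplified version with one -threshold (hypothetical output name).
    convert img300.png -grayscale Rec709Luminance -channel RGB \
        -threshold "$threshold" -negate one_step.tiff
    # Prints the number of differing pixels; 0 means identical output.
    compare -metric AE two_step.tiff one_step.tiff null: 2>&1
    echo
fi
```

A count of 0 from compare would confirm that the single threshold reproduces the two-step result for this image.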

In my experience of Tesseract, it works best when the height of capital letters such as ACSB is at least 20 pixels. Yours are only 8 pixels high.
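
One way to act on the 20-pixel guideline when no higher-resolution source exists is to upscale before OCR. This is only a sketch: it uses the 8-pixel and 20-pixel figures from above, while the output name img300_big.png and the Lanczos filter are assumptions, and whether interpolated pixels actually improve recognition needs to be tested.

```shell
#!/bin/sh
# Capital-letter heights taken from the reply above.
cap_now=8      # measured height in pixels
cap_want=20    # height suggested for Tesseract

# Percentage scale factor, rounded up using integer arithmetic.
scale=$(( (cap_want * 100 + cap_now - 1) / cap_now ))
echo "resize by ${scale}%"

# Only call ImageMagick if it is installed and the input exists.
if command -v convert >/dev/null 2>&1 && [ -f img300.png ]; then
    # img300_big.png is a hypothetical output name.
    convert img300.png -filter Lanczos -resize "${scale}%" img300_big.png
fi
```

With these numbers the factor works out to 250%, so the 8-pixel capitals would become roughly 20 pixels high.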

Re: Optimize dark (gray) image for OCR

Posted: 2019-05-25T05:53:07-07:00
by Jordan
Yes, that yields the same result!

Unfortunately I don't have a higher-resolution image (it will always be this format). I don't need capital letters, though; they only need to be detected correctly.

The OCR result is:

Code: Select all

Sell Offers:

Amount: Lf. )+ Total: 659,887 @ | accept
Name Amount Piece Price Total Price |Ends At
Anonymous 1 659,887 659,887 |2019-06-22, 20:45:46 ‘
Anonymous 3 659,888 1,979,664 | 2019-06-22, 20:44:41
Anonymous 2 659,900 1,319,800 | 2019-06-22, 20:15:47
Anonymous 4 659,998| 2,639,992 |2019-06-22, 20:12:32
Anonymous 1 670,000 670,000 | 2019-06-22, 13:13:18
Anonymous 1 700,000 700,000 | 2019-06-22, 07:40:52
Anonymous 1 800,000 800,000 | 2019-06-22, 01:47:39 é
Buy Offers:

Amount: Lf. |+ Total: 570,102 @ | accept
Name Amount Piece Price Total Price |Ends At
Anonymous 1 570,102 570,102 | 2019-06-22, 20:45:50 x
Anonymous 3 570,101 1,710,303 | 2019-06-22, 20:11:29
Anonymous 5 570,100 2,850,500 | 2019-06-22, 20:06:11 .
Anonymous 1 570,000 570,000 | 2019-06-22, 20:00:31
Anonymous 4 569,600 2,278,400 | 2019-06-22, 19:57:38
Anonymous 1 569,512 569,512 | 2019-06-22, 19:20:04
Anonymous 1 569,502 569,502 | 2019-06-22, 19:03:28 é
Create Offer:
@: Sell Amount: 0 Gross Profit: 0e
_! Buy I. |- Fee: 0e

Piece Price: e| Total Profit: Je

_| Anonymous | Corace
It now has some problems with the red text (the last two rows under Sell Offers). Is there a way to enhance the red, for example to make it bolder?

Re: Optimize dark (gray) image for OCR

Posted: 2019-05-25T10:31:51-07:00
by fmw42
I do not know if this will help you or not, but if you are on Linux, Mac OSX, or Windows 10 Unix (or Cygwin), you could try my textcleaner script at my link below.

Code: Select all

textcleaner -g -f 20 -o 10 -e normalize -i 1 img300.png img300_textclean_g_enorm_f20_o10_i1.png

You can threshold further if you want.

Re: Optimize dark (gray) image for OCR

Posted: 2019-05-25T10:54:26-07:00
by Jordan
Hi,

Thanks for your message! I have to say by the way that ImageMagick is just awesome <3

I haven't stopped tweaking. It seems the original image had been altered in some way, but I managed to isolate the right colors.
Original image
https://www.dropbox.com/s/9lnhrd1rrr6ld ... 3.png?dl=0

Code: Select all

convert screen2.png -fuzz 0% -fill "#30ff00" -opaque "#b01111" -opaque "#c0c0c0" -opaque "#f4f4f4" -opaque "#c87d7d" -opaque "#bebebe" -opaque "#808080" -fill none -fuzz 0% +opaque "#30ff00" -fuzz 0% -fill "#000000" -opaque "#30ff00" +profile "icc" -density 1200 output.png
Output image
https://www.dropbox.com/s/v637khxuk688t ... t.png?dl=0

Is there some way to smooth the text a bit? I have a feeling it might help for the OCR software.

OCR results:

Code: Select all

Sell Offers:
Amount: 1 Total: 316,900 :
Name Amount Piece Price Total Price Ends At
Anonymous 2 316,900 633,600 2019-06-24, 15:13:23
Anonymous 1 316,999 316,999 2019-06-23, 00:37:06
Anonymous 1 317,000 317,000 2019-06-23, 00:14:13
Anonymous 1 319,000 319,000 2019-06-22, 23:18:24
Anonyrnious 5 334,899 1,674,495 2019-06-22, 00:56:37
Anonymous 1 339,900 339,900 2019-06-20, 01:20:40
Anonymous 1 342,315 342,315 2019-06-19, 19:07:31
Buy Offers:
Amount: o Total: o
Name Amount Piece Price Total Price Ends At
Anonyrnious 4 251,851 1,007,404 2019-06-24, 16:31:26
Anonymous 1 251,850 251,850 2019-06-24, 15:48:12
Anonyrnious 4 251,847 1,007,388 2019-06-24, 15:08:52
Anonymous 2 251,804 503,608 2019-06-24, 14:02:42
Anonymous 1 251,700 251,700 2019-06-24, 12:46:48
Anonymous 2 250,601 501,202 2019-06-23, 00:37:12
Anonyrnious 5 250,000 1,250,000 2019-06-22, 01:20:15
Create Offer:
Sell Amount: 5 Price: 1,259,255
@ Buy Fee: 1,000
Piece Price: 251851] Total Price: 1,260,255
“ énonymous .

Re: Optimize dark (gray) image for OCR

Posted: 2019-05-31T11:36:28-07:00
by Jordan
Does anyone have an idea how to blur the picture (to make it less pixelated)?

Also, someone suggested the following:
"I suggest blurring the picture before processing with Tesseract (for example Gaussian blur, horizontal 0.5, vertical 2.0). Does the recognition then improve?"
I have no idea how to apply such a blur with ImageMagick, though.
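
For reference, a blur with different horizontal and vertical strength can be sketched with two one-dimensional Gaussian kernels via -morphology Convolve. This assumes that the quoted 0.5 and 2.0 are meant as sigma values and that the installed ImageMagick supports the Blur kernel with a 90-degree rotation argument; output_blurred.png is a made-up name.

```shell
#!/bin/sh
sigma_h=0.5   # horizontal sigma, assumed from the quoted suggestion
sigma_v=2.0   # vertical sigma, assumed from the quoted suggestion

# 1-D Gaussian kernels; the ",90" suffix rotates the kernel to vertical.
kernel_h="Blur:0x${sigma_h}"
kernel_v="Blur:0x${sigma_v},90"

# Only call ImageMagick if it is installed and the input exists.
if command -v convert >/dev/null 2>&1 && [ -f output.png ]; then
    # Apply the horizontal pass, then the vertical pass.
    convert output.png \
        -morphology Convolve "$kernel_h" \
        -morphology Convolve "$kernel_v" \
        output_blurred.png
fi
```

For an equal blur in both directions, the plain -blur 0xSIGMA option is the simpler choice.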

Re: Optimize dark (gray) image for OCR

Posted: 2019-05-31T16:17:41-07:00
by fmw42
Blurring the image will not do you any good. It will blur the text also and that will make it harder to OCR the characters. Try removing noise as follows:

Code: Select all

convert img300.png -enhance -enhance -enhance -enhance -enhance -enhance -enhance -enhance -enhance -enhance result.png
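
The repeated option can also be built in a loop, which makes the number of passes easy to vary. A small sketch, assuming the same input and output names as the command above; the variable names are illustrative.

```shell
#!/bin/sh
passes=10          # same count as in the command above
enhance_args=""
i=0
while [ "$i" -lt "$passes" ]; do
    enhance_args="$enhance_args -enhance"
    i=$((i + 1))
done

# Only call ImageMagick if it is installed and the input exists.
if command -v convert >/dev/null 2>&1 && [ -f img300.png ]; then
    # Left unquoted on purpose so the string splits into separate options.
    convert img300.png $enhance_args result.png
fi
```

Lowering passes and rerunning Tesseract after each change shows how much noise removal the image actually needs.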