I am trying to prepare the following image for OCR:
But I can't seem to get it right. Normally, without the background and the shadows on the side, the following parameters work to get just the text: -normalize -despeckle -despeckle -type grayscale -sharpen 1 -contrast to get:
but obviously this doesn't work for the first image.
Any ideas? Thanks for reading.
Removing Shadow and Background for OCR
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Removing Shadow and Background for OCR
Get a scanner. Life will be so much easier.
If that isn't feasible (perhaps horrible images like this come from clients), use an interactive editor to square up the image, crop it and curve it.
Automated processing is possible for this particular image, of course. But a generic solution for dodgy photographs of till receipts on noisy worktops with bad lighting is a lot of effort.
snibgo's IM pages: im.snibgo.com
Re: Removing Shadow and Background for OCR
Thanks for the input. However,
snibgo wrote: Get a scanner. Life will be so much easier.
The images are coming from a camera and as you said, it is not feasible to get a scanner.
snibgo wrote: If that isn't feasible (perhaps horrible images like this come from clients), use an interactive editor to square up the image, crop it and curve it.
There are many images. Cropping and curving individually is a full time job.
snibgo wrote: But a generic solution for dodgy photographs of till receipts on noisy worktops with bad lighting is a lot of effort.
Alas, that is what is needed.
snibgo wrote: Automated processing is possible for this particular image, of course.
Yes! Exactly what I want to know. Any suggestions?
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Removing Shadow and Background for OCR
If on Linux/Mac or Windows w/Cygwin, try my script, textcleaner, at the link below. Or use -lat, which is the basis of my script.
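For reference, a bare -lat call might look like the sketch below. It is not the textcleaner script, only a minimal illustration of local adaptive thresholding. The function name lat_clean, the MAGICK variable, and the default 25-pixel window and 10% offset are assumptions to be tuned per image.

```shell
# Hypothetical helper: local adaptive threshold with ImageMagick's -lat.
# $1 = input image, $2 = output image,
# $3 = window size in pixels (assumed default 25),
# $4 = offset in percent (assumed default 10).
# MAGICK selects the binary: "convert" for IM6 (default), "magick" for IM7.
lat_clean() {
    # -lat makes a pixel white when it is brighter than its local average
    # plus the offset, so dark text is negated first and negated back after.
    "${MAGICK:-convert}" "$1" -colorspace Gray -negate \
        -lat "${3:-25}x${3:-25}+${4:-10}%" -negate "$2"
}
```

Usage would be something like `lat_clean receipt.jpg receipt_clean.png 25 10`. A larger window generally copes better with broad shadows, while the offset controls how much faint background noise survives.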
Re: Removing Shadow and Background for OCR
Thanks for joining this thread Fred!
I did try your script earlier today with [ -g -e normalize -f 30 -o 12 -s 2 ] and many other variations of it, and generally got something like this:
I couldn't get the text to be less pixelated by changing the different parameters, as can be seen in the zoomed image:
Any idea which parameter has to be adjusted, or what operation can be done?
- fmw42
Re: Removing Shadow and Background for OCR
You are really limited by the size of the image and thus the resolution available in pixels. You would do better if the image size were much larger.
Try one of these (about the best I can get, depending upon your idea of less pixelated):
textcleaner -g -f 15 -o 15 -e normalize -t 10 4wtf21w.jpg show:
textcleaner -g -f 15 -o 15 -e normalize -t 10 -s 1 4wtf21w.jpg show:
- anthony
- Posts: 8883
- Joined: 2004-05-31T19:27:03-07:00
- Authentication code: 8675308
- Location: Brisbane, Australia
Re: Removing Shadow and Background for OCR
I would agree. The image size, while suitable for monitor displays, is not suitable for OCR.
Practically all cameras these days capture at a much, much higher resolution.
Also I would try to 'square up' the image more before trying to clean up the image. Perhaps looking for the edge of the docket and the workbench to find the rotation.
Finally, just how much control do you have over the photographing?
Can the work area be controlled?
Can you provide good strong lighting (or flash)?
Can you control the camera being used (resolution)?
Can the camera be mounted perfectly overhead?
How about providing a fixed solid edge the docket can be pushed up against so it will be square with the camera?
Can the workbench contrast be controlled?
E.g. made dark, or some specific color (green felt), for easier auto docket rotation and/or removal.
The more you can control the environment, the easier it is to automate the OCR conversion, even without going to the expense of a dedicated high-res scanner.
The simple use of an edge, for example, means it is fast to position the docket and take the photo, allowing fast turnover of dockets through the system. Perhaps even a real-time indication of a successful OCR conversion on the computer as you process each docket (even down to item codes being checked against the stock database).
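As a rough illustration of why a uniform backdrop helps: with a single-colored surround, ImageMagick's -trim combined with -fuzz can crop the border away automatically. This is a sketch only. The function name trim_docket, the MAGICK variable and the 20% default fuzz are assumptions, and it has not been tried on the image in this thread, where the noisy worktop would likely defeat it.

```shell
# Hypothetical helper: crop away a roughly uniform backdrop.
# $1 = input image, $2 = output image,
# $3 = fuzz tolerance in percent (assumed default 20).
# MAGICK selects the binary: "convert" for IM6 (default), "magick" for IM7.
trim_docket() {
    # -trim removes edges matching the corner color; -fuzz widens the match
    # so small lighting variations in the backdrop are still trimmed.
    # +repage resets the virtual canvas left over after trimming.
    "${MAGICK:-convert}" "$1" -fuzz "${3:-20}%" -trim +repage "$2"
}
```

For example, `trim_docket photo.jpg docket.png 15`. Too high a fuzz value will start eating into the docket itself, so it needs checking against real samples.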
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
Re: Removing Shadow and Background for OCR
Sorry, I'm a bit late to reply; got busy.
fmw42 wrote: You are really limited by the size of the image and thus the resolution available in pixels. You would do better if the image size were much larger.
I was finally able to get a higher quality image here: http://i.minus.com/ibiHhso9LbxL1f.png
fmw42 wrote: try one of these ...
They give about the same quality I originally got. I think this is an OCR issue rather than a preprocessing one? Perhaps it needs to be trained better.
anthony wrote: Also I would try to 'square up' the image more before trying to clean up the image. Perhaps looking for the edge of the docket and the workbench to find the rotation.
Yeah, I should probably add that, but currently I want to focus on straight images.
anthony wrote: Finally, just how much control do you have over the photographing?
Not much. I suppose most of the images are of the same quality as above. I have access only to the raw image files; I can't do much about the work area, flash, camera mounting, etc.
anthony wrote: E.g. made dark, or some specific color (green felt), for easier auto docket rotation and/or removal.
Is there any built-in function in ImageMagick to do this?
anthony wrote: The more you can control the environment, the easier it is to automate the OCR conversion, even without going to the expense of a dedicated high-res scanner.
I agree 1000%; scanned images would be much easier. Unfortunately, only the above type of camera images are available.
anthony wrote: The simple use of an edge, for example, means it is fast to position the docket and take the photo, allowing fast turnover of dockets through the system. Perhaps even a real-time indication of a successful OCR conversion on the computer as you process each docket
I heard this kind of system was developed by a Google Books developer using a hacked scanner.
- anthony
Re: Removing Shadow and Background for OCR
By 'edge' I mean just a raised bit of wood stuck to the work area. The docket is pushed onto that wood edge and thus is immediately perfectly aligned with the camera before the image is taken. No time spent by the user.
Such small changes (like green felt on the workbench) make the later processing that much easier.
But if you have little control, then the next step is to try auto rotation.
Have a look at Fred's (fmw42) scripts: the whiteboard script and the rotation scripts.
(see his link above)
If the photo can be rotated square before OCR then OCR software should work a lot better.
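Separately from Fred's scripts, ImageMagick also has a built-in -deskew operator that can straighten small rotations. The sketch below uses that operator, not the scripts mentioned above. The function name deskew_docket and the MAGICK variable are assumptions, and 40% is the threshold the ImageMagick documentation suggests as a general starting point.

```shell
# Hypothetical helper: straighten a slightly rotated photo before OCR.
# $1 = input image, $2 = output image,
# $3 = deskew threshold in percent (default 40).
# MAGICK selects the binary: "convert" for IM6 (default), "magick" for IM7.
deskew_docket() {
    # -deskew estimates the skew angle and rotates the image to correct it;
    # +repage resets the virtual canvas afterwards.
    "${MAGICK:-convert}" "$1" -deskew "${3:-40}%" +repage "$2"
}
```

For example, `deskew_docket photo.jpg straight.png`. This is only likely to help with modest skew; a strongly rotated or perspective-distorted photo needs the more involved approaches discussed in this thread.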