Hi,
I have a simple image (attached). When I try to extract its text with OCR, I get wrong characters, which results in misspelled words. Please suggest what I can do to extract the text correctly.
I am using Tesseract on Ubuntu, called from PHP as shown below:
exec('tesseract temp/' . $filename . '.png temp/' . $filename);
Thanks,
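For reference, here is a slightly safer way to build that same command in PHP. This is a sketch and is not part of the original post. The paths are quoted with escapeshellarg(), so a file name that contains spaces or shell metacharacters cannot break the call. The function name buildTesseractCommand is hypothetical, and $filename is assumed to be a base name without extension, as in the snippet above.

```php
<?php
// Hypothetical helper: builds the same tesseract call as in the post,
// but quotes both paths with escapeshellarg() for safety.
// $filename is assumed to be a base name without extension.
function buildTesseractCommand(string $filename): string
{
    $input  = 'temp/' . $filename . '.png';
    $output = 'temp/' . $filename; // tesseract appends ".txt" to this base name
    return 'tesseract ' . escapeshellarg($input) . ' ' . escapeshellarg($output);
}

// Usage (requires tesseract on the PATH):
// exec(buildTesseractCommand($filename), $lines, $status);
// if ($status !== 0) { /* OCR failed; inspect $lines */ }
```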
Extracting wrong characters from image
Re: Extracting wrong characters from image
In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high.
snibgo's IM pages: im.snibgo.com
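As a rough illustration of that rule of thumb, the sketch below computes an ImageMagick -resize percentage that scales text of a measured height (9 px here) up to about 20 px, and it also builds the matching convert command. The function names are hypothetical. You have to measure the text height in the image by hand, because the code does not detect it. With ImageMagick 7, the command is magick rather than convert.

```php
<?php
// Hypothetical helper: resize percentage needed to bring text of a
// measured height up to a target height (20 px, per the rule of thumb).
function resizePercent(int $measuredHeightPx, int $targetHeightPx = 20): int
{
    // Round up so the scaled text is never below the target height.
    return (int) ceil(100 * $targetHeightPx / $measuredHeightPx);
}

// Hypothetical helper: ImageMagick 6 command that scales the image
// (aspect ratio preserved) before it is passed to tesseract.
function buildResizeCommand(string $in, string $out, int $percent): string
{
    return 'convert ' . escapeshellarg($in) . ' -resize ' . $percent . '% ' . escapeshellarg($out);
}

echo resizePercent(9), "\n"; // 9 px text -> 223 (%), i.e. about 20 px
```

The scaled image can then be passed to tesseract instead of the original.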
Re: Extracting wrong characters from image
Try adding an extra border around the extracted text image. I seemed to get improved recognition with Tesseract when I added a 20x20 white border.
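In ImageMagick, that border can be added with the standard -bordercolor and -border options. The PHP sketch below builds such a command. The function name and file names are placeholders, and it assumes ImageMagick 6's convert is installed. Note that -bordercolor has to come before -border so that the colour setting applies to it.

```php
<?php
// Hypothetical helper: ImageMagick command that pads the text image with
// a white border (20x20 by default) before OCR.
function buildBorderCommand(string $in, string $out, int $w = 20, int $h = 20): string
{
    return 'convert ' . escapeshellarg($in)
        . ' -bordercolor white -border ' . $w . 'x' . $h . ' '
        . escapeshellarg($out);
}

// Usage: exec(buildBorderCommand('temp/text.png', 'temp/text_bordered.png'));
```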
Re: Extracting wrong characters from image
snibgo wrote: In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high.
Thanks, I have doubled the size of the image and now it's recognising the text correctly, but the space between two of the words is still missing.