Extracting wrong characters from image

Post by **seopower** » 2016-03-14T00:58:15-07:00

Hi,

I have a simple image (attached). When I try to extract text from it with OCR, I get wrong characters, which results in misspelled words. Please suggest how to extract the text correctly.

I am using Tesseract on Ubuntu, called from PHP as shown below:

exec('tesseract temp/' . $filename . '.png temp/' . $filename);

Thanks,

Post by **snibgo** » 2016-03-14T03:14:15-07:00

In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high.
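As a rough sketch of the point above: if the current character height is known, the image can be enlarged with ImageMagick before it is passed to Tesseract. The function names `scale_percent` and `upscale_for_ocr` are made up for illustration, and the sketch assumes the ImageMagick `convert` command is on the PATH. The 9 and 20 pixel figures come from the post above; everything else is an assumption.

```shell
# scale_percent CURRENT_HEIGHT TARGET_HEIGHT
# Prints the resize percentage needed so that characters CURRENT_HEIGHT
# pixels high become at least TARGET_HEIGHT pixels high (rounded up).
scale_percent() {
    current=$1
    target=$2
    # Integer ceiling of 100 * target / current.
    echo $(( (100 * target + current - 1) / current ))
}

# upscale_for_ocr IN OUT CURRENT_HEIGHT TARGET_HEIGHT
# Hypothetical helper: enlarges IN by the computed percentage and writes OUT.
upscale_for_ocr() {
    pct=$(scale_percent "$3" "$4")
    convert "$1" -resize "${pct}%" "$2"
}
```

For example, `upscale_for_ocr temp/input.png temp/input_big.png 9 20` (file names are placeholders) would resize by 223%, and the enlarged file can then be given to `tesseract` in place of the original.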

Post by **markt** » 2016-03-14T09:30:18-07:00

Try adding an extra border around the extracted text image. I seemed to get improved recognition with Tesseract after adding a 20x20 white border.
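One possible way to add such a border with ImageMagick is sketched below. The helper name `add_white_border` is hypothetical, the default size of 20 is taken from the post above, and the sketch assumes the `convert` command is available.

```shell
# add_white_border IN OUT [SIZE]
# Hypothetical helper: pads IN with a white border SIZE pixels wide on
# every side (default 20) and writes the result to OUT.
add_white_border() {
    size=${3:-20}
    convert "$1" -bordercolor white -border "${size}x${size}" "$2"
}
```

Calling `add_white_border temp/input.png temp/input_border.png` (placeholder file names) adds 20 pixels of white on each side, so the output is 40 pixels wider and 40 pixels taller than the input.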

Post by **seopower** » 2016-03-14T09:51:17-07:00

snibgo wrote: In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high.

Thanks, I have doubled the size of the image and now the characters are recognised correctly, but the space between two words is still missing.
