
PAID: Need to enhance text within Image/PDF

Posted: 2015-08-25T11:54:52-07:00
by osutra
Hi

We are looking for an experienced IM developer who can help with enhancing the text within TIFF/PDF files to improve OCR success rates.

The text quality in the files we use isn't very good, and some enhancement "might" help improve the OCR output quality.

Samples attached. Please PM me if interested and we can discuss this further.

100% Size : http://www.tiikoni.com/tis/view/?id=fd41a35
150% size: http://www.tiikoni.com/tis/view/?id=8211a93

This will be a paid/compensated effort.

Re: PAID: Need to enhance text within Image/PDF

Posted: 2015-08-25T12:07:49-07:00
by fmw42
You should not have any trouble with OCR on these two files. They are pretty good. The only thing you might need is to deskew them, that is, rotate the image so that the lines of text are horizontal. See -deskew at http://www.imagemagick.org/script/comma ... php#deskew

If you have images that have non-white backgrounds, and you are using unix (Linux, Mac OSX or Windows w/Cygwin), then you might try my script, textcleaner, at the link below.

Re: PAID: Need to enhance text within Image/PDF

Posted: 2015-08-25T13:31:17-07:00
by osutra
Hi Fred,

We have done the best we can and are using Google's Tesseract. Most words show up fine in the extraction, but quite a few get messed up.

The image I posted was a crop of the full PDF file. Here is the OCR output for the same crop: http://www.tiikoni.com/tis/view/?id=6a4e21d

Words like "-rash" at the end of the 5th line from the bottom show up as "-ra5h1". Same issue with "-cce", and with "+BS", which became "+35".

Any thoughts on how we can get this fixed?

Would you like me to PM the actual PDF?

Thanks a ton!

Re: PAID: Need to enhance text within Image/PDF

Posted: 2015-08-25T16:06:59-07:00
by fmw42
Sorry, I am not an OCR expert, nor have I done much OCR at all. I do not know what to suggest at this point beyond what I have already suggested.

Have you tried -deskew to see if better horizontal alignment helps? You might also try some sharpening using -unsharp.
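
As a rough sketch of that suggestion, the small Python helper below only builds the ImageMagick command line that deskews and then sharpens a page. The file names, the 40% deskew threshold and the 0x1 unsharp setting are assumptions chosen for illustration, not values tested on the posted samples, and the helper names are hypothetical. On ImageMagick 7 the executable is `magick` rather than `convert`.

```python
import subprocess


def deskew_sharpen_cmd(src, dst, deskew_threshold="40%", unsharp="0x1",
                       executable="convert"):
    """Build an ImageMagick argument list: deskew first, then unsharp mask.

    The default threshold and unsharp values are illustrative assumptions;
    tune them against your own scans.
    """
    return [executable, src,
            "-deskew", deskew_threshold,   # straighten the lines of text
            "-unsharp", unsharp,           # mild sharpening of glyph edges
            dst]


def run(cmd):
    """Run a prepared command; raises CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


# Hypothetical file names, printed here rather than executed.
cmd = deskew_sharpen_cmd("page.tif", "page_clean.tif")
print(" ".join(cmd))
```

Call `run(cmd)` once the printed command looks right, then compare the OCR output before and after to see whether either step actually helps.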

If the PDF is pure vector and not a raster image inside a PDF container, you could try rendering it at a higher density to make the text larger.
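
A minimal sketch of the density idea, assuming a vector PDF: the helper below builds a command that rasterizes the PDF at a chosen density, with -density placed before the input file so that it applies when the PDF is read. The 300 dpi default, the file names and the helper name are assumptions for illustration only. On ImageMagick 7 the executable is `magick` rather than `convert`.

```python
import subprocess


def rasterize_pdf_cmd(pdf, out_pattern, density=300, executable="convert"):
    """Build an ImageMagick argument list that renders a PDF at `density` dpi.

    -density must come before the input file to affect how it is rasterized.
    The 300 dpi default is an assumption, not a tested recommendation.
    """
    return [executable, "-density", str(density), pdf, out_pattern]


def run(cmd):
    """Run a prepared command; raises CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


# Hypothetical file names; %03d numbers the output pages.
cmd = rasterize_pdf_cmd("document.pdf", "page-%03d.tif")
print(" ".join(cmd))
```

If the PDF only wraps a scanned raster image, a higher density will just resample the existing pixels and is unlikely to add detail.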