Image database of scanned letters.


Post by miguellint »

Hello...

I'm not sure who to ask (and I'm not even sure this is a thing) but I'd appreciate any thoughts :-)

---

I have several dozen scanned images of old and faded microfiche. The microfiche hold lists of names and places.

Here is an example of the letter E. As can be seen there are five distinct shapes even though they are all for the same letter.

(Dropbox link so just X any pop-ups asking you to register)

http://tinyurl.com/jzpy5sl

If I collect all of these shapes, standardise their size (e.g. 30x40), and then put them in a database under "Uppercase letter E", could I then go through each of my original scanned images and compare each 30x40 pixel area to the database? When the comparison finds a match, I could then paste a "good" 30x40px copy of the letter E over the original "bad" 30x40px letter E.

Repeat for each letter of the alphabet.
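
A minimal sketch of the lookup-and-replace idea described above, written in plain Python rather than IM and not taken from the thread. It assumes each letter has already been cut out of the scan, thresholded to black and white, and resized to a common grid. The tiny 3x5 grids stand in for 30x40 tiles, and the names `mismatch`, `best_label`, `clean_tile` and the 0.2 threshold are all hypothetical choices for illustration.

```python
# Hypothetical sample data: 1 = ink, 0 = background, 3 wide by 5 high.
E_CLEAN = [[1, 1, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1]]
E_FADED = [[1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
L_CLEAN = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]]

# Every known shape of a letter is filed under that letter's label.
DATABASE = {"E": [E_CLEAN, E_FADED], "L": [L_CLEAN]}
# One "good" glyph per label, used as the replacement.
CLEAN = {"E": E_CLEAN, "L": L_CLEAN}


def mismatch(a, b):
    """Fraction of pixels that differ between two equal-sized grids."""
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def best_label(tile, database, threshold=0.2):
    """Label of the closest stored shape, or None if nothing is close enough."""
    best_name, best_score = None, 1.0
    for label, variants in database.items():
        for variant in variants:
            score = mismatch(tile, variant)
            if score < best_score:
                best_name, best_score = label, score
    return best_name if best_score <= threshold else None


def clean_tile(tile, database, clean_glyphs, threshold=0.2):
    """Return the clean glyph for a recognised tile, else the tile unchanged."""
    label = best_label(tile, database, threshold)
    return clean_glyphs[label] if label is not None else tile


# A smudged E (top-right pixel missing) is still closest to the E shapes.
smudged = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1]]
print(best_label(smudged, DATABASE))
```

Tiles that match nothing within the threshold are left untouched, so an unrecognised smudge is never replaced by a wrong letter. The exact-position comparison only works if the letters are cut out on a consistent grid.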

I know that when you are training OCR software you build up a database containing various images of each letter. I'm wondering if I can do something similar using IM.

Hope that all makes sense :-)

Any thoughts appreciated
Miguel

IM 6.8.9-9
Kubuntu 15.10

Post by snibgo »

I suppose you could do this, but why? People have written special-purpose software to read text. IM is a general-purpose image processor, and doesn't contain code to recognise, characterise and match letter-forms. Using a brute-force method (chop the document into individual letters, and compare all with every image in the database) will be massively slow.
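
To give a rough sense of scale for the brute-force cost, here is a back-of-the-envelope count in Python. It is only a sketch: the function name `brute_force_cost` and every number fed into it are assumptions, not measurements from this thread.

```python
def brute_force_cost(page_w, page_h, glyph_w, glyph_h,
                     n_templates, n_pages, stride_x=1, stride_y=1):
    """Count window positions, comparisons and pixel tests for a naive search.

    stride 1 means the window is tried at every pixel position;
    a stride equal to the glyph size means a fixed grid of letter cells.
    """
    across = (page_w - glyph_w) // stride_x + 1
    down = (page_h - glyph_h) // stride_y + 1
    positions = across * down
    comparisons = positions * n_templates * n_pages
    pixel_tests = comparisons * glyph_w * glyph_h
    return positions, comparisons, pixel_tests


# Assumed figures: 3000x4000 px scans, 30x40 px glyphs,
# 26 letters x 5 shapes = 130 templates, 36 scanned pages.
print(brute_force_cost(3000, 4000, 30, 40, 130, 36))
print(brute_force_cost(3000, 4000, 30, 40, 130, 36, stride_x=30, stride_y=40))
```

With those assumed figures, trying the window at every pixel position runs to tens of trillions of pixel tests, while restricting the search to a grid of letter cells cuts the work by a factor of roughly a thousand. That only helps if the letters really do sit on a regular grid.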

Post by miguellint »

Hello snibgo...

Thanks for replying :-)

It's not so much about OCR'ing the images. It's more about making the images look "nice" (for want of a better word).

And learning more about IM - specifically looping a copy/paste region by region, IM and databases, and comparing images pixel by pixel. I'm still a total noob.

The microfiche are not that legible so, as it stands, if I OCR a scanned image the software will come back with dozens of suspect letters which it will want me to correct. Multiply that by several dozen original scans and I'll soon have to make thousands of corrections.

When I eventually finish the OCR I'll have a text version of the microfiche (which admittedly is pretty cool) but the original scans will still look pretty shabby and still be fairly illegible.

---

I totally appreciate that it seems pointless, but I'm just hoping for some pointers, not full-blown code.

Thanks
Miguel

Post by fmw42 »

compare is not rotation or scale invariant. It can only find offsets (where a small image sits inside a larger one), and it is slow. But see the compare documentation:

http://www.imagemagick.org/script/compare.php
http://www.imagemagick.org/Usage/compare/
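
To illustrate what "can only find offsets" means, here is a small sliding-window search in plain Python. It is a sketch of the general technique, not IM's implementation; IM's compare offers this kind of search through its -subimage-search option. The helper names `place`, `scale2` and `find_best_offset` and the 3x5 glyph are hypothetical.

```python
E = [[1, 1, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1]]


def place(template, page_w, page_h, x, y):
    """Blank page (all 0) with the template pasted at column x, row y."""
    page = [[0] * page_w for _ in range(page_h)]
    for j, row in enumerate(template):
        for i, pixel in enumerate(row):
            page[y + j][x + i] = pixel
    return page


def scale2(grid):
    """Double a grid in both directions (each pixel becomes a 2x2 block)."""
    return [[p for p in row for _ in range(2)] for row in grid for _ in range(2)]


def find_best_offset(page, template):
    """Try every offset; return (x, y, fraction of mismatched pixels)."""
    th, tw = len(template), len(template[0])
    ph, pw = len(page), len(page[0])
    best = None
    for y in range(ph - th + 1):
        for x in range(pw - tw + 1):
            diff = sum(page[y + j][x + i] != template[j][i]
                       for j in range(th) for i in range(tw))
            score = diff / (th * tw)
            if best is None or score < best[2]:
                best = (x, y, score)
    return best


# Same-size copy: found exactly, whatever its position on the page.
print(find_best_offset(place(E, 8, 8, 2, 1), E))
# Double-size copy: no offset gives a perfect match.
print(find_best_offset(place(scale2(E), 10, 12, 2, 1), E))
```

The search copes with a letter being shifted, but a letter scanned at a different size or at a slight angle no longer matches at any offset, so the scans would have to be deskewed and scaled consistently first.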