Removing table borders

phrosty
Posts: 7
Joined: 2011-08-08T01:11:59-07:00

Removing table borders

Post by phrosty »

I'm trying to OCR some papers which have data in a table format, but the table has borders between rows/columns and it's messing up the OCR. I think there must be some way to have ImageMagick remove the borders for me -- anyone know?
Bonzo
Posts: 2971
Joined: 2006-05-20T08:08:19-07:00
Location: Cambridge, England

Re: Removing table borders

Post by Bonzo »

For a start, nobody will be able to help without an example. Also, are the tables the same on every page?
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing table borders

Post by fmw42 »

If all the tables have the same structure, can you scan an empty table? Then use compare to locate the offset (assuming no rotation or scale differences), and use the empty table to mask out the lines.
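
To make the empty-table idea concrete, here is a rough sketch in plain Python. It is only an illustration under stated assumptions: the page and the empty table are already thresholded into lists of rows with 1 for ink and 0 for paper, there is no rotation or scale difference, and the shift is small. The brute-force search in `find_offset` is a stand-in for what compare would report, not a description of how ImageMagick works internally, and both function names are made up for this example.

```python
def find_offset(page, template, max_shift):
    """Return the (dx, dy) shift, within +/- max_shift pixels, at which the
    ink of the empty-table template overlaps the most ink on the page."""
    height, width = len(page), len(page[0])
    # Coordinates of every ink pixel (table line) in the empty template.
    line_pixels = [(x, y)
                   for y, row in enumerate(template)
                   for x, value in enumerate(row) if value]
    best_score, best_shift = -1, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for x, y in line_pixels:
                px, py = x + dx, y + dy
                if 0 <= px < width and 0 <= py < height and page[py][px]:
                    score += 1
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift


def mask_out(page, template, shift):
    """Return a copy of the page with every pixel that lies under a shifted
    template line set to paper (0)."""
    dx, dy = shift
    height, width = len(page), len(page[0])
    cleaned = [row[:] for row in page]
    for y, row in enumerate(template):
        for x, value in enumerate(row):
            px, py = x + dx, y + dy
            if value and 0 <= px < width and 0 <= py < height:
                cleaned[py][px] = 0
    return cleaned


# Tiny demo: a 6x6 empty table with one horizontal and one vertical line,
# and an 8x8 page holding the same lines shifted by (1, 2) plus one "text" pixel.
template = [[0] * 6 for _ in range(6)]
for x in range(6):
    template[1][x] = 1
for y in range(6):
    template[y][1] = 1

page = [[0] * 8 for _ in range(8)]
for x in range(1, 7):
    page[3][x] = 1
for y in range(2, 8):
    page[y][2] = 1
page[5][5] = 1  # stands in for a character inside a cell

shift = find_offset(page, template, max_shift=3)
cleaned = mask_out(page, template, shift)
```

On a real scan the mask would probably need to be thickened by a pixel or two before erasing, since scanned lines rarely match an empty form exactly.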
phrosty
Posts: 7
Joined: 2011-08-08T01:11:59-07:00

Re: Removing table borders

Post by phrosty »

I've got thousands of scanned documents from various sources and they're all slightly different -- I can't count on fonts, sizes, positions (global or local), border thickness, etc. so it's not a one-mask-fits-all problem (I wish!).

I've thought of detecting straight lines and removing any that are beyond a size which could be text -- sounds easy in theory, but the algorithms I've seen (e.g. Hough transform) which might be used are beyond me. Maybe I'm over-complicating it.
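
For what it's worth, the "remove any line too long to be text" idea can be tried without a Hough transform. The sketch below swaps in a much simpler run-length test: walk each row and each column of a thresholded image and erase any unbroken run of ink longer than a chosen limit. It is plain Python and purely illustrative; the 1 = ink / 0 = paper representation, the function name `remove_long_runs`, and the `max_text_run` parameter are all assumptions made for this example. It also assumes the scan is close to straight, because a skewed line breaks into short runs that slip under the limit.

```python
def remove_long_runs(image, max_text_run):
    """Return a copy of a binary image (1 = ink, 0 = paper) in which every
    horizontal or vertical ink run longer than max_text_run is erased."""
    height = len(image)
    width = len(image[0]) if height else 0
    erase = [[False] * width for _ in range(height)]

    # Mark long horizontal runs, one row at a time.
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x]:
                start = x
                while x < width and image[y][x]:
                    x += 1
                if x - start > max_text_run:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs, one column at a time.
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                if y - start > max_text_run:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Runs are measured on the original image, so line crossings are
    # erased even though they belong to two runs at once.
    return [[0 if erase[y][x] else image[y][x] for x in range(width)]
            for y in range(height)]


# Tiny demo: a top border, a left border, and a 2x2 "character".
page = [[1] * 9] + [[1, 0, 0, 0, 0, 0, 0, 0, 0] for _ in range(6)]
for y in (3, 4):
    for x in (3, 4):
        page[y][x] = 1

cleaned = remove_long_runs(page, max_text_run=4)
```

The limit has to be larger than the longest stroke in the text (the bar of a large "T", an underscore) and smaller than the shortest border segment. Characters that touch a border will lose the pixels they share with it.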

I don't have the documents on me at the moment; I'll post some later tonight when I do. For now, just imagine HTML tables with borders, messed up by scanning.
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing table borders

Post by fmw42 »

Having one or two examples may help us understand better. Also it gives us something with which to test ideas.