Removing table borders
I'm trying to OCR some papers which have data in a table format, but the table has borders between rows/columns and it's messing up the OCR. I think there must be some way to have ImageMagick remove the borders for me -- anyone know?
Re: Removing table borders
For a start, nobody will be able to help without an example. Also, are all the tables the same on every page?
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Removing table borders
If all the tables have the same structure, can you scan an empty table? You could then use compare to locate the offset (assuming no rotation or scale differences) and use the empty table to mask out the lines.
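To make the idea concrete, here is a rough sketch of the two steps in plain Python rather than as an ImageMagick command. It assumes both scans are already binarized into lists of 0/1 rows (1 = ink) and that the page is only translated relative to the empty table. The names `find_offset`, `mask_lines`, and `max_shift` are made up for illustration, and the brute-force search merely stands in for what compare would do.

```python
def find_offset(page, template, max_shift=3):
    """Return the (dy, dx) shift at which template ink best overlaps page ink."""
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for y, row in enumerate(template):
                for x, ink in enumerate(row):
                    py, px = y + dy, x + dx
                    if (ink and 0 <= py < len(page)
                            and 0 <= px < len(page[0]) and page[py][px]):
                        score += 1
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def mask_lines(page, template, offset):
    """Return a copy of page with every pixel under shifted template ink cleared."""
    dy, dx = offset
    out = [row[:] for row in page]
    for y, row in enumerate(template):
        for x, ink in enumerate(row):
            py, px = y + dy, x + dx
            if ink and 0 <= py < len(out) and 0 <= px < len(out[0]):
                out[py][px] = 0
    return out


# Empty table: a top border and a left border.
template = [[1, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]

# Scanned page: the same borders shifted by (1, 2), plus one "text" pixel.
page = [[0] * 7 for _ in range(7)]
for i in range(5):
    page[1][2 + i] = 1
    page[1 + i][2] = 1
page[3][4] = 1

offset = find_offset(page, template)
cleaned = mask_lines(page, template, offset)
```

Any text that touches a border gets erased along with it, so in practice the mask probably should not be made much thicker than the scanned lines.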
Re: Removing table borders
I've got thousands of scanned documents from various sources and they're all slightly different -- I can't count on fonts, sizes, positions (global or local), border thickness, etc. so it's not a one-mask-fits-all problem (I wish!).
I've thought of detecting straight lines and removing any that are beyond a size which could be text -- sounds easy in theory, but the algorithms I've seen (e.g. Hough transform) which might be used are beyond me. Maybe I'm over-complicating it.
I don't have the documents on me at the moment; I'll post some later tonight when I do. But really, just imagine messed-up (due to scanning) HTML tables with borders.
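For what it's worth, the "remove any straight line too long to be text" idea can be sketched without a Hough transform, using a simple run-length scan. This is only an illustration in plain Python, not an ImageMagick command. It assumes the scan is already binarized into lists of 0/1 rows (1 = ink) and roughly deskewed, and `long_runs`, `remove_borders`, and `max_text_run` are hypothetical names.

```python
def long_runs(rows, max_run):
    """Return the set of (row, col) pixels that sit in ink runs longer than max_run."""
    hits = set()
    for r, row in enumerate(rows):
        start = None
        for c, ink in enumerate(list(row) + [0]):  # trailing 0 closes a run at the edge
            if ink and start is None:
                start = c
            elif not ink and start is not None:
                if c - start > max_run:
                    hits.update((r, i) for i in range(start, c))
                start = None
    return hits


def remove_borders(img, max_text_run):
    """Clear horizontal and vertical ink runs longer than max_text_run pixels."""
    horizontal = long_runs(img, max_text_run)
    # Transposing turns columns into rows, so the same scan finds vertical runs.
    vertical = {(r, c) for c, r in long_runs(list(zip(*img)), max_text_run)}
    doomed = horizontal | vertical
    return [[0 if (r, c) in doomed else px for c, px in enumerate(row)]
            for r, row in enumerate(img)]


# Toy 6x8 page: a top border, a left border, and a small blob of "text".
img = [[0] * 8 for _ in range(6)]
img[0] = [1] * 8
for y in range(6):
    img[y][0] = 1
img[2][3] = img[2][4] = img[3][3] = 1

cleaned = remove_borders(img, max_text_run=4)
```

Runs are detected on the original image and erased afterwards, so a border crossing does not split the other line into short pieces. Skewed scans break long lines into short stair-stepped runs, which this sketch would miss, so `max_text_run` would need tuning per document.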
- fmw42
Re: Removing table borders
Having one or two examples may help us understand better. Also it gives us something with which to test ideas.