Border removal for OCR
Posted: 2014-07-04T12:44:40-07:00
Hi All,
I have a legacy application in which I test button texts using our OCR service. However, the OCR fails to read the text correctly when the text image is enclosed in a border. We normally capture a large button area (including the border) to accommodate drift across different platforms. The button may or may not be highlighted.
I've implemented a border scan that checks pixels from the top, bottom, left and right to identify borders. Scanning is done on greyscale or binarized images. If a continuous run of pixels is detected along the image width (horizontal scan) or image height (vertical scan), it is assumed to be a border and trimmed. If the pixels along a scan line are discontinuous, the line may be the edge of a character (when the image is captured tightly around the text), so we stop scanning and treat it as a character boundary.
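To make the scan concrete, here is a minimal sketch of the idea in plain Python. It is not the actual code from the application. The names `classify_line`, `scan_edge` and `trim_borders`, the list-of-lists binary image (1 = dark pixel, 0 = background), the skipping of blank margin lines, and the `min_fraction` threshold for what counts as a "continuous" line are all assumptions made for illustration.

```python
def classify_line(line, min_fraction=0.8):
    """Label one row or column as 'blank', 'border' or 'broken'."""
    ink = [i for i, px in enumerate(line) if px]
    if not ink:
        return "blank"
    # One unbroken run: the dark pixels occupy consecutive positions.
    unbroken = ink[-1] - ink[0] + 1 == len(ink)
    # Assumption: a border run covers most of the line length.
    if unbroken and len(ink) >= min_fraction * len(line):
        return "border"
    return "broken"


def scan_edge(lines, min_fraction=0.8):
    """Scan inward from one side; return how many leading lines to trim."""
    cut = 0
    for i, line in enumerate(lines):
        kind = classify_line(line, min_fraction)
        if kind == "broken":
            break  # possible character edge (or a speckle): stop scanning
        if kind == "border":
            cut = i + 1  # trim everything up to and including this line
    return cut


def trim_borders(img, min_fraction=0.8):
    """Trim border lines from top and bottom, then from left and right."""
    rows = [list(r) for r in img]
    top = scan_edge(rows, min_fraction)
    bottom = scan_edge(rows[::-1], min_fraction)
    rows = rows[top:len(rows) - bottom]
    cols = [list(c) for c in zip(*rows)]  # transpose to scan columns
    left = scan_edge(cols, min_fraction)
    right = scan_edge(cols[::-1], min_fraction)
    cols = cols[left:len(cols) - right]
    return [list(r) for r in zip(*cols)]  # transpose back to rows


boxed = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cropped = trim_borders(boxed)

# A single stray pixel above the box stops the top scan immediately.
speckled = [[0, 0, 1, 0, 0, 0, 0]] + boxed
cropped_speckled = trim_borders(speckled)
```

With `boxed`, all four border lines are trimmed and only the inner area remains. With `speckled`, the stray pixel is classified as "broken", so the top scan stops before it reaches the border and the top border line survives the crop.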
[Images 1 to 4, each shown before crop and after crop]
For images 1, 2 and 3, I was able to crop the borders successfully using border line scanning, and the resulting OCR comparison succeeded. For image 4, however, cropping failed because there were speckles on the image boundary. Since these speckles are discontinuous, my cropping routine treats them as a character boundary and stops, so the border is not trimmed and the OCR comparison fails.
Is there a better way (noise filters, masks or other built-in tools) to remove the borders of a text image effectively, or to remove the speckles outside the border line so that my cropping routine works correctly? There can also be cases with no border at all, so I need a border removal solution that handles both cases.
Please guide and share your suggestions.