$2000 for cleaning up scanned images
Posted: 2014-05-14T03:15:11-07:00
Hello,
I can pay $2000 for an ImageMagick (IM) expert to help with some scripts for cleaning up scanned pages from books. I would be even more interested if anyone can combine IM with some programming or OpenCV, but IM alone would be OK too.
These scans can consist of single images containing text, pictures, noise, artifacts, etc.
Problems:
Background removal - currently I do this by manufacturing a predicted gradient background for the image and then dividing the original image by this predicted gradient. I think this could be optimized much further into something quite sophisticated.
Thresholding - probably local adaptive is the best bet, but will need some testing and optimization.
Dithering of pictures - I have a good technique for this, but perhaps someone can improve on it.
Noise removal - removing noise without affecting any "wanted" parts of the image, such as punctuation, pictures, borders, horizontal or vertical lines, etc.
Autorotation - this can be done without IM with a bit of trig on the OCR coordinates, but maybe IM has a good way of doing it. Worth a try, especially since it might be useful for picture-only pages, where I can't use the OCR results.
Removal of page edges - detecting where the actual page starts and removing everything that is not the page.
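As a rough illustration of the background division and the OCR-based skew estimate described above, here is a minimal sketch in plain Python. It is only an assumption about how these two steps could look, not the scripts currently in use: the function names, the list-of-rows grayscale image representation, and the idea of passing OCR baseline points as (x, y) pairs are all hypothetical.

```python
import math


def divide_by_background(image, background, white=255):
    """Flatten uneven illumination.

    Each pixel is divided by the estimated background value at the same
    position and rescaled, so pixels equal to the background map to `white`.
    `image` and `background` are lists of rows of grayscale values (0..255).
    """
    result = []
    for img_row, bg_row in zip(image, background):
        out_row = []
        for pixel, bg in zip(img_row, bg_row):
            bg = max(bg, 1)  # guard against division by zero
            out_row.append(min(white, round(white * pixel / bg)))
        result.append(out_row)
    return result


def skew_angle_degrees(points):
    """Estimate page skew from (x, y) points along a text baseline.

    Fits a least-squares line through the points and returns its angle in
    degrees. In image coordinates y grows downward, so the sign of the
    result depends on that convention.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan2(sxy, sxx))


if __name__ == "__main__":
    # Synthetic page: illumination rises left to right, one dark "ink" pixel.
    background = [[100, 150, 200], [100, 150, 200]]
    page = [[100, 30, 200], [100, 150, 200]]
    print(divide_by_background(page, background))

    # Baseline points taken from a line tilted by 2 degrees.
    slope = math.tan(math.radians(2))
    print(skew_angle_degrees([(x, x * slope) for x in (0, 100, 200)]))
```

The division step only works as well as the background estimate it is given, and the skew estimate needs text, so neither addresses picture-only pages.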
Please let me know if you are interested, and I will provide more details.
Thanks,
Alasdair