We are digitizing our records. The problem we are facing is that the documents have a watermark on them. When the documents are scanned and run through OCR, the watermark interferes with recognition and the text cannot be extracted. An example of the documents we are processing is at https://imgur.com/QqprgcR
You can see the diagonal "Approved" watermark. We need to remove this watermark while keeping the text printed over it, e.g. "Deputy" as shown in the image. I am new to ImageMagick. I have tried various tutorials on connected-component labeling and morphology but could not get the watermark removed.
Can somebody suggest an efficient way to remove the watermark from the document using ImageMagick?
Removing water mark from scanned image
-
- Posts: 2
- Joined: 2019-06-26T02:00:41-07:00
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Location: England, UK
Re: Removing water mark from scanned image
From the image you supply, I doubt there is a good solution. I can't see what distinguishes the large text from the small text. You could remove the long strokes of the large text, but cleanly removing them while leaving the "ep" of "Deputy" intact seems impossible.
snibgo's IM pages: im.snibgo.com
Re: Removing water mark from scanned image
How would I remove the long strokes of the large text?
Re: Removing water mark from scanned image
With "-connected-components".
Threshold the image, possibly after a blur, then use "-connected-components" with an area threshold to keep just the large black components. Negate that mask and combine it with the original using "-compose Lighten -composite". The result will have the largest black marks whitened, but this will also whiten the "ep" of "Deputy".
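To make the logic of that recipe concrete, here is a minimal pure-Python sketch of the same steps. It is not ImageMagick's implementation. It assumes the page is already loaded as a grayscale 2-D list (0 = black ink, 255 = white paper), and the function name `whiten_large_components` and its `threshold`/`area_threshold` parameters are hypothetical, chosen only for illustration. It uses an 8-connected flood fill in place of "-connected-components", skips the optional blur, and builds the already-negated mask directly, so the "Lighten" composite reduces to a per-pixel maximum.

```python
from collections import deque


def whiten_large_components(gray, threshold=128, area_threshold=50):
    """Hypothetical helper: whiten every dark component with >= area_threshold pixels.

    gray: list of rows of ints 0..255 (0 = black ink, 255 = white paper).
    """
    h, w = len(gray), len(gray[0])
    # Threshold step: True where the pixel counts as ink.
    ink = [[gray[y][x] < threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    # Negated mask: 255 where a pixel belongs to a large ink component, else 0.
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over 8-connected ink pixels.
            comp = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and ink[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Area threshold: only large components go into the mask.
            if len(comp) >= area_threshold:
                for cy, cx in comp:
                    mask[cy][cx] = 255
    # "Lighten" composite: per-pixel maximum of the original and the mask.
    return [[max(gray[y][x], mask[y][x]) for x in range(w)] for y in range(h)]


# Toy page: one long 6-pixel stroke (stands in for the watermark)
# and two isolated 1-pixel dots (stand in for the small text).
page = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [0, 0, 0, 0, 0, 0, 255, 255],
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255, 40, 255, 255, 255, 255, 40, 255],
]
cleaned = whiten_large_components(page, area_threshold=5)
```

On the toy page the stroke is whitened and the dots survive. Any small mark that touches a large stroke joins its component and is whitened with it, which is exactly why the "ep" of "Deputy" would be lost.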