Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T03:01:31-07:00
by ederhj
Hello,

I find your scripts very useful and great.

So I ask you to help me with an issue that I think could be solved with your scripts, but I don't know how.

I have a document scan as a JPG in which many areas are marked (with a text marker) in 3 different colors.
This scan is read by OCR software (tesseract).
Now I have the idea to tag and rename the file according to the colors in the file.
That means for example:
- the yellow marked text should be used for the filename
- the green for tagging the file
- and so on.

My idea is to crop all areas with a certain color and then run the OCR on them.
But how do I get a separate file for each colored area out of the source file?

Or is there another way to solve this problem?

It would be very nice if you could help me.

Thank U and best regards
ederhj

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T09:01:57-07:00
by ederhj
I hope it is clear what I want to do.
If not, please ask.

I think everyone who wants a paperless office has this need.

Thank U

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T09:18:51-07:00
by snibgo
Put up a sample image, and your expected results.

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T09:28:53-07:00
by ederhj
Hello,

The file is here: https://www.hightail.com/download/bXBaR ... V3lHR3NUQw

My expectation is that I get an image for each mark, which I can then run OCR on.

Thank U

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T09:52:29-07:00
by snibgo
I hope you have a better source than JPG.

The colours are more saturated than the background, so we can distinguish them by saturation and use that to mask out the rest of the image.

Code:

%IM%convert scan.jpg -colorspace HSL -channel G -separate +channel -threshold 30%% s.png

%IM%convert scan.jpg s.png -compose CopyOpacity -composite s2.png

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T11:50:50-07:00
by ederhj
Almost perfect.

Only one more thing I need:
I need this "extracting" for each color, that means in my example 4 export files for each color.

How does this work?

And: THANK U for your support.

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T12:05:27-07:00
by fmw42
You can use the following to extract the crop areas corresponding to the largest white areas in the s.png image. You will get too many, due to the large white region at the bottom right, but you can then throw out the ones not needed, or examine the bounding box coordinates and keep only shapes that are longer in x than in y. Skip the first one, which has color gray(0) and is the black background. Then take the following ones that are gray(255) until you reach one that is gray(0) again. See http://magick.imagemagick.org/script/co ... onents.php

Code:

convert s.png -define connected-components:verbose=true -connected-components 4 null:
Objects (id: bounding-box centroid area mean-color):
0: 885x1090+0+0 437.8,544.0 927407 gray(0)
286: 101x161+784+929 847.8,1024.5 9443 gray(255) <-- not likely what you want, not the right w/h aspect ratio
215: 176x32+278+359 365.2,373.9 4914 gray(255) <-- probably a good one
217: 193x29+524+384 618.3,398.1 4525 gray(255) <-- probably a good one
223: 130x35+361+544 425.3,561.1 3839 gray(255) <-- probably a good one
72: 207x28+469+170 575.2,182.4 3504 gray(255) <-- probably a good one
52: 157x25+239+115 316.9,126.7 3238 gray(255) <-- probably a good one
228: 83x26+347+636 387.0,648.5 1821 gray(255) <-- probably a good one
219: 84x25+136+421 177.2,432.4 1804 gray(255) <-- probably a good one
211: 80x26+139+228 177.3,240.4 1796 gray(255) <-- probably a good one
327: 17x21+836+939 843.5,950.1 168 gray(255) <-- probably too small and not the right w/h aspect ratio
505: 12x11+801+1045 807.2,1050.5 85 gray(255) <-- probably too small and not the right w/h aspect ratio
573: 26x8+774+1082 786.3,1086.6 75 gray(255) <-- probably too small and not the right w/h aspect ratio

96: 11x12+633+178 638.6,183.5 63 gray(0)
442: 12x13+793+1027 798.8,1033.4 63 gray(255)
377: 14x8+771+1000 777.9,1004.4 62 gray(255)
379: 9x7+804+1001 807.5,1004.3 55 gray(0)
...
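
The filtering described above (drop regions that are too small or not wider than tall) can be sketched as a small shell function. This is only an illustration, not part of the original reply: the function name `filter_marks` and the two thresholds are assumptions to be tuned, and the sample rows are copied from the listing above with the annotations removed.

```shell
# Sample rows in the format: id: bounding-box centroid area mean-color
listing='0: 885x1090+0+0 437.8,544.0 927407 gray(0)
286: 101x161+784+929 847.8,1024.5 9443 gray(255)
215: 176x32+278+359 365.2,373.9 4914 gray(255)
228: 83x26+347+636 387.0,648.5 1821 gray(255)
327: 17x21+836+939 843.5,950.1 168 gray(255)
573: 26x8+774+1082 786.3,1086.6 75 gray(255)'

# filter_marks MIN_AREA MIN_RATIO   (hypothetical helper)
# Reads listing rows on stdin and prints the bounding box of every
# gray(255) component whose area is at least MIN_AREA and whose
# width/height ratio is at least MIN_RATIO.
filter_marks() {
  awk -v min_area="$1" -v min_ratio="$2" '
    $5 == "gray(255)" && $4 >= min_area {
      split($2, g, /[x+]/)      # g[1] = width, g[2] = height
      if (g[2] > 0 && g[1] / g[2] >= min_ratio) print $2
    }'
}

# Keep regions of at least 1000 pixels that are at least twice as wide as tall.
printf '%s\n' "$listing" | filter_marks 1000 2
```

With these thresholds the tall 101x161 region and the small specks are dropped, and only the wide marker strokes remain. The printed geometries can be used directly as crop values.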

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-08-30T12:43:32-07:00
by fmw42
P.S. If you take all the top white areas and crop your image, you can then filter further by the average color of the cropped regions. Throw out any that are near gray, i.e. keep only those whose average color has high saturation.
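
As a rough sketch of that idea, the function below takes the mean R, G, B values (0 to 255) of a cropped region and prints either "gray" or a coarse colour name, which would also allow sorting the crops by marker colour. The function name `classify_rgb`, the 0.15 saturation cut-off and the hue ranges are all assumptions chosen for illustration, not values from this thread.

```shell
# classify_rgb R G B   (hypothetical helper; inputs are 0-255 mean values)
# Prints "gray" when the saturation is low, otherwise a coarse hue name.
# One possible way to obtain the mean values (an assumption, not verified
# against this scan) is to scale the cropped region down to a single pixel:
#   convert "scan.jpg[176x32+278+359]" -scale 1x1! \
#     -format "%[fx:int(255*r)] %[fx:int(255*g)] %[fx:int(255*b)]" info:
classify_rgb() {
  awk -v r="$1" -v g="$2" -v b="$3" 'BEGIN {
    max = r; if (g > max) max = g; if (b > max) max = b
    min = r; if (g < min) min = g; if (b < min) min = b
    d = max - min
    # HSV-style saturation: near zero means a gray (unmarked) region
    if (max == 0 || d / max < 0.15) { print "gray"; exit }
    # Standard RGB-to-hue conversion, result in degrees
    if (max == r)      h = 60 * (g - b) / d
    else if (max == g) h = 60 * (2 + (b - r) / d)
    else               h = 60 * (4 + (r - g) / d)
    if (h < 0) h += 360
    if      (h < 30 || h >= 330) print "red"
    else if (h < 90)             print "yellow"
    else if (h < 180)            print "green"
    else if (h < 270)            print "blue"
    else                         print "magenta"
  }'
}

classify_rgb 250 240 120   # a light yellow
classify_rgb 200 200 205   # almost neutral
```

The dark text inside a marked region lowers the mean brightness but should leave the hue largely intact, so the ranges may still need adjusting for a real scan.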

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-09-01T09:00:04-07:00
by ederhj
Hello,

I'll try.

But isn't it possible to simply crop out these areas and save each of them in a new file?

Thank U
Hans-Jürgen

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-09-01T09:52:12-07:00
by fmw42
It could be built into -connected-components, but it is not there now.

You will need to write a script to loop over the textual data and find the top group of gray(255) before the next gray(0), then separate the crop values for each and crop at those coordinates in your input image.

What platform are you on and what version of IM?

If on Unix, then the following will crop out the first group of gray(255) subsections in the list. If you want to be more selective, for example by setting a maximum (and minimum) area, you can filter further on area. Or, if you want, you can extract the W and H, compute the aspect ratio and set limits on that.

Code:

list=`convert s.png \
    -define connected-components:verbose=true -connected-components 4 null: |\
    tail -n +3 | sed -n 's/^ *//p'`
i=0
OLDIFS=$IFS
IFS=$'\n'
for row in $list; do
    color=`echo $row | cut -d\  -f5`
    cropvals=`echo $row | cut -d\  -f2`
    IFS=$OLDIFS
    if [ "$color" = "gray(255)" ]; then
        convert scan.jpg[$cropvals] scan_$i.jpg
        i=$((i+1))
    else
        break
    fi
done

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-09-02T05:06:27-07:00
by ederhj
Hi,

Well, that would be my preferred solution.

I'm working with Windows and ImageMagick 6.9.2-0 Q16 x86 2015-08-15.

If you could help me here again I would be very pleased.

Thank U

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-09-02T08:38:39-07:00
by fmw42
Sorry, I do not know Windows scripting.

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-09-02T08:41:14-07:00
by ederhj
Ah.

No problem. But can you tell me what steps your script does, so that I'll be able to transfer it to my scripting language?

Thank U

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-09-02T09:08:37-07:00
by fmw42

Code:

# line 1 -- read the input image
# line 2 -- process with -connected-components to get the list and pipe it to the next line
# line 3 -- remove the top two rows and remove the leading spaces of all rows
# line 4 -- set output image index to 0
# line 5 -- save the existing internal field separator (by default space, tab and newline)
# line 6 -- set the internal field separator to a new line so that each item in the list is a row
# line 7 -- loop over each row in the list
# line 8 -- extract the color field of the row
# line 9 -- extract the crop values field of the row
# line 10 - restore the saved internal field separator
# line 11 - test if the color variable is gray(255)
# line 12 - if test passes, then crop with the corresponding crop values
# line 13 - increment the output index by 1
# line 14 - else statement of test
# line 15 - if test fails, it means that the current color is not gray(255), so break the loop and quit
# line 16 - fi is end of if test
# line 17 - done is end of for loop

Code:

list=`convert s.png \
    -define connected-components:verbose=true -connected-components 4 null: |\
    tail -n +3 | sed -n 's/^ *//p'`
i=0
OLDIFS=$IFS
IFS=$'\n'
for row in $list; do
    color=`echo $row | cut -d\  -f5`
    cropvals=`echo $row | cut -d\  -f2`
    IFS=$OLDIFS
    if [ "$color" = "gray(255)" ]; then
        convert scan.jpg[$cropvals] scan_$i.jpg
        i=$((i+1))
    else
        break
    fi
done
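
For porting to another scripting language it may help to see the same loop as a dry run that only prints the crop commands instead of executing them. The sketch below is an illustration under assumptions: the function name `print_crop_commands` is made up, and the rows are sample data from earlier in this thread, fed in as if the header line and the background row had already been removed.

```shell
# print_crop_commands INPUT PREFIX   (hypothetical helper)
# Reads listing rows on stdin, splits each row into its five fields and
# prints one crop command per row until the first row that is not gray(255).
print_crop_commands() {
  i=0
  while read -r id box centroid area color; do
    [ "$color" = "gray(255)" ] || break
    printf 'convert %s[%s] %s_%d.jpg\n' "$1" "$box" "$2" "$i"
    i=$((i + 1))
  done
}

# Sample rows: two white regions, then a black one that ends the group.
rows='  286: 101x161+784+929 847.8,1024.5 9443 gray(255)
  215: 176x32+278+359 365.2,373.9 4914 gray(255)
  96: 11x12+633+178 638.6,183.5 63 gray(0)
  442: 12x13+793+1027 798.8,1033.4 63 gray(255)'

printf '%s\n' "$rows" | print_crop_commands scan.jpg scan
```

Each printed line shows the crop geometry (width x height + x offset + y offset) taken from the second field of the row, which is the value any port of the script needs to pass to the crop step.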

Re: Crop all areas with a certain color (get text out of colored areas)

Posted: 2015-09-12T05:45:53-07:00
by ederhj
I don't know if I understand it exactly.
You crop each pixel, is that right?

I want to crop a certain area that has a certain color, with a little bit of variance.

The $cropvals is always one pixel, so what do I do with that?

Thank U