crop columns out of dictionary page

Questions and postings pertaining to the usage of ImageMagick regardless of the interface. This includes the command-line utilities, as well as the C and C++ APIs. Usage questions are like "How do I use ImageMagick to create drop shadows?".
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA

Re: crop columns out of dictionary page

Post by fmw42 »

Sorry, I left in some debugging code. Try this:

Code:

infile="page-004.png"
inname=`convert -ping "$infile" -format "%t" info:`
suffix=`convert -ping "$infile" -format "%e" info:`
convert "$infile" -auto-level -morphology smooth diamond:1 \
-background white -deskew 40% +repage \
-fuzz 10% -trim +repage tmp.png
OIFS=$IFS
IFS=$'\n'
white_arr=(`convert tmp.png -auto-level -threshold 75% -scale x1! txt: |\
tail -n +2 | tr -cs "0-9\n" " " | grep -e '.* .* 255'`)
#echo "${white_arr[*]}"
num=${#white_arr[*]}
IFS=$OIFS
middle=`convert xc: -format "%[fx:round($num/2)]" info:`
#echo "middle=$middle"
xcrop=`echo "${white_arr[$middle]}" | cut -d\ -f1`
#echo "xcrop=$xcrop"
ww=`convert -ping tmp.png -format "%w" info:`
hh=`convert -ping tmp.png -format "%h" info:`
ww1=$((xcrop+1))
dim1="${ww1}x${hh}+0+0"
ww2=`convert xc: -format "%[fx:$ww-$xcrop-1]" info:`
xoff2=$ww1
dim2="${ww2}x${hh}+${xoff2}+0"
#echo "dim1=$dim1; dim2=$dim2;"
convert tmp.png \
\( -clone 0 -crop $dim1 +repage -fuzz 10% -trim +repage -write ${inname}_left.$suffix \) \
\( -clone 0 -crop $dim2 +repage -fuzz 10% -trim +repage -write ${inname}_right.$suffix \) \
null:
rm -f tmp.png
johnbent
Posts: 14
Joined: 2014-12-16T10:08:07-07:00
Authentication code: 6789

Post by johnbent »

:(

> /tmp/columnize4.sh page-004.png
cut: bad delimiter
convert: geometry does not contain image `tmp.png' @ warning/attribute.c/GetImageBoundingBox/247.

And of the output "columns", the left one is empty and the right one is the full original image.

Post by fmw42 »

What is columnize4.sh? I did not write that. I would need to see the script to comment. tmp.png is used only in the above command lines and is deleted at the end. I do not know what you are doing in columnize4.sh over and above my commands.

You are right, though, that the script is not working. I am not sure what has changed. It worked this morning for me. I will look into it further.

Post by johnbent »

Sorry for the confusion; columnize4.sh is indeed your code, which I copied out of your previous post. It is unedited except that I changed 'infile="page-004.png"' to 'infile=$1' so I could run it with bash and pass the filename as the command line argument.
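
For running the same commands over a whole directory of pages, a driver loop along these lines may help. It is only a sketch: split_all is a hypothetical helper name, and it assumes the per-page script exits non-zero when something goes wrong, which the commands as posted do not guarantee (their last step is rm -f, which always succeeds).

```shell
# split_all CMD FILE...: run CMD on each FILE and print the name of every
# file for which CMD returned a non-zero status.
# (Hypothetical helper, not from the thread; its variables are global.)
split_all() {
  cmd=$1; shift
  failed=0
  for f in "$@"; do
    if ! "$cmd" "$f"; then
      printf '%s\n' "$f"
      failed=1
    fi
  done
  return "$failed"
}

# Example (assumes columnize4.sh is executable in the current directory):
#   split_all ./columnize4.sh page-*.png > failed_pages.txt
```

The pages listed in failed_pages.txt would then be the ones to inspect or split by hand.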

Post by fmw42 »

OK. I have found it. Somehow, when pasting the code into the message, a space got left out of the cut -d\ command. It needs two spaces after the \ and before -f1 on the line starting with xcrop: the backslash escapes the first space, which makes it the delimiter, and the second space separates the two arguments. Here is the corrected code. When you copy and paste, be sure it has the needed two spaces.

Code:

infile="page-004.png"
inname=`convert -ping "$infile" -format "%t" info:`
suffix=`convert -ping "$infile" -format "%e" info:`
convert "$infile" -auto-level -morphology smooth diamond:1 \
-background white -deskew 40% +repage \
-fuzz 10% -trim +repage tmp.png
OIFS=$IFS
IFS=$'\n'
white_arr=(`convert tmp.png -auto-level -threshold 75% -scale x1! txt: |\
tail -n +2 | tr -cs "0-9\n" " " | grep -e '.* .* 255'`)
#echo "${white_arr[*]}"
num=${#white_arr[*]}
IFS=$OIFS
middle=`convert xc: -format "%[fx:round($num/2)]" info:`
#echo "middle=$middle"
xcrop=`echo "${white_arr[$middle]}" | cut -d\  -f1`
#echo "xcrop=$xcrop"
ww=`convert -ping tmp.png -format "%w" info:`
hh=`convert -ping tmp.png -format "%h" info:`
ww1=$((xcrop+1))
dim1="${ww1}x${hh}+0+0"
ww2=`convert xc: -format "%[fx:$ww-$xcrop-1]" info:`
xoff2=$ww1
dim2="${ww2}x${hh}+${xoff2}+0"
#echo "dim1=$dim1; dim2=$dim2;"
convert tmp.png \
\( -clone 0 -crop $dim1 +repage -fuzz 10% -trim +repage -write ${inname}_left.$suffix \) \
\( -clone 0 -crop $dim2 +repage -fuzz 10% -trim +repage -write ${inname}_right.$suffix \) \
null:
rm -f tmp.png
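
The effect of the missing space can be seen without ImageMagick. In this small sketch, show_args is a made-up helper that prints each argument it receives in brackets, so the way the shell splits the words becomes visible.

```shell
# show_args: print every argument on its own line, wrapped in brackets.
# (Illustrative helper only; it is not part of the script above.)
show_args() {
  for a in "$@"; do
    printf '[%s]\n' "$a"
  done
}

# One space after the backslash: the escaped space glues "-d" and "-f1"
# together, so cut would receive the single argument "-d -f1".
show_args -d\ -f1

# Two spaces: the first (escaped) space becomes the delimiter and the
# second separates the arguments, so cut receives "-d " and "-f1".
show_args -d\  -f1
```

The first call prints one bracketed argument and the second prints two, which is why the one-space version makes cut complain about a bad delimiter.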

Post by fmw42 »

Perhaps a safer method is to put the space in quotes: cut -d' ' -f1

Code:

infile="page-004.png"
inname=`convert -ping "$infile" -format "%t" info:`
suffix=`convert -ping "$infile" -format "%e" info:`
convert "$infile" -auto-level -morphology smooth diamond:1 \
-background white -deskew 40% +repage \
-fuzz 10% -trim +repage tmp.png
OIFS=$IFS
IFS=$'\n'
white_arr=(`convert tmp.png -auto-level -threshold 75% -scale x1! txt: |\
tail -n +2 | tr -cs "0-9\n" " " | grep -e '.* .* 255'`)
#echo "${white_arr[*]}"
num=${#white_arr[*]}
IFS=$OIFS
middle=`convert xc: -format "%[fx:round($num/2)]" info:`
#echo "middle=$middle"
xcrop=`echo "${white_arr[$middle]}" | cut -d' ' -f1`
#echo "xcrop=$xcrop"
ww=`convert -ping tmp.png -format "%w" info:`
hh=`convert -ping tmp.png -format "%h" info:`
ww1=$((xcrop+1))
dim1="${ww1}x${hh}+0+0"
ww2=`convert xc: -format "%[fx:$ww-$xcrop-1]" info:`
xoff2=$ww1
dim2="${ww2}x${hh}+${xoff2}+0"
#echo "dim1=$dim1; dim2=$dim2;"
convert tmp.png \
\( -clone 0 -crop $dim1 +repage -fuzz 10% -trim +repage -write ${inname}_left.$suffix \) \
\( -clone 0 -crop $dim2 +repage -fuzz 10% -trim +repage -write ${inname}_right.$suffix \) \
null:
rm -f tmp.png
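
Another way to sidestep the delimiter problem is to let the shell take the first field itself. This is only a sketch of that alternative, not what the script above does; first_field is a hypothetical helper, and the sample line is an assumed example of what one entry of white_arr might look like after the tr step.

```shell
# first_field LINE: print everything before the first space in LINE.
# (Hypothetical helper; uses POSIX parameter expansion instead of cut.)
first_field() {
  printf '%s\n' "${1%% *}"
}

# Assumed sample entry: x, y, then the channel values.
line="123 0 255 255 255"
xcrop=$(first_field "$line")
echo "xcrop=$xcrop"
```

Because no delimiter is passed on a command line, there is nothing for copy and paste to damage.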

Post by johnbent »

Thanks again so very much! I tried this and it works great. There are 500 pages and it seems to work correctly for about 90% of them! I'm happy to do the remaining fifty manually; actually, it will probably be only about 20, since 20 of the 50 are the weird "chapter" pages and probably another 10 are blank pages. I'm not sure what's up with the few pages that don't work, but again this is plenty good enough for me! If you're curious, one example where it's not working correctly is page-010.png:

[Image: page-010.png]

Post by johnbent »

Maybe the skew is too severe?

Post by fmw42 »

The problem with page-010 is that there is a black speck in the middle of the right margin, which prevents the trim from removing all the right white space and thus skewing the location of the middle. The middle should be the only place that is totally white, otherwise it will crop wrong.
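
To see why a leftover margin shifts the split point, the choice of the middle white column can be imitated with plain numbers. This sketch uses made-up column positions (a gutter at x=480..519 and an untrimmed right margin at x=960..999); middle_white_column is a hypothetical helper that mirrors the round($num/2) indexing in the script.

```shell
# middle_white_column X...: given the x positions of the all-white columns,
# print the one at index round(n/2), counting from 0 as the script does.
# (Hypothetical helper; its variables are global.)
middle_white_column() {
  n=$#
  if [ "$n" -eq 0 ]; then
    return 1
  fi
  idx=$(( (n + 1) / 2 ))          # integer form of round(n/2)
  if [ "$idx" -ge "$n" ]; then    # guard for very short lists (my addition)
    idx=$(( n - 1 ))
  fi
  shift "$idx"
  printf '%s\n' "$1"
}

gutter=$(seq 480 519)   # assumed: 40 white columns between the text columns
margin=$(seq 960 999)   # assumed: 40 white columns left over on the right

middle_white_column $gutter            # prints 500
middle_white_column $gutter $margin    # prints 960
```

With only the gutter in the list, the chosen column falls in the middle of the gutter; with the extra margin columns, it lands in the margin instead.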

So I added a shave step: trim once, then shave a few pixels off the image all around, and then trim again. This allows this page to work.

The argument shaving="10x10" means shave off 10 pixels at the left and right and 10 pixels at the top and bottom. You can adjust that to whatever makes things work, but it could be image dependent. If you have dark specks in the original, or specks that come from the scanner, then you will likely have trouble. Also, the page number at the bottom center is right in the area that needs to be totally white. It is fortunate that this seems to work anyway, but there are no guarantees. You may have to shave the page number off, which means shaving more on the top and bottom and less on the sides. If you need to cut differently on all four sides, I can modify this script to use -chop rather than -shave.
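
If the four sides do need different amounts, one possible approach is a -chop per side, with -gravity selecting the edge. The helper below only builds the option string so it can be inspected before use; chop_options and its argument order are assumptions for illustration, not the modified script offered above.

```shell
# chop_options LEFT RIGHT TOP BOTTOM: print ImageMagick options that remove
# the given number of pixels from each side. (Hypothetical helper.)
chop_options() {
  printf '%s ' \
    -gravity West  -chop "${1}x0" \
    -gravity East  -chop "${2}x0" \
    -gravity North -chop "0x${3}" \
    -gravity South -chop "0x${4}"
  printf '%s\n' +gravity
}

# Example: 10 pixels off the left, right and top, 60 off the bottom.
chop_options 10 10 10 60

# Possible use in place of "-shave $shaving" (untested on the actual pages):
#   convert in.png $(chop_options 10 10 10 60) +repage out.png
```

A larger bottom value like this would be one way to remove a page number without giving up more of the sides.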

If you want a few pixels of white margin put onto the left and right image, that can be done also.

Here is the code.

Code:

infile="page-010.png"
shaving="10x10"
inname=`convert -ping "$infile" -format "%t" info:`
suffix=`convert -ping "$infile" -format "%e" info:`
convert "$infile" -auto-level -morphology smooth diamond:1 \
-background white -deskew 40% +repage \
-fuzz 10% -trim +repage -shave $shaving -trim +repage tmp.png
OIFS=$IFS
IFS=$'\n'
white_arr=(`convert tmp.png -auto-level -threshold 75% -scale x1! txt: |\
tail -n +2 | tr -cs "0-9\n" " " | grep -e '.* .* 255'`)
#echo "${white_arr[*]}"
num=${#white_arr[*]}
IFS=$OIFS
middle=`convert xc: -format "%[fx:round($num/2)]" info:`
#echo "middle=$middle"
xcrop=`echo "${white_arr[$middle]}" | cut -d' ' -f1`
#echo "xcrop=$xcrop"
ww=`convert -ping tmp.png -format "%w" info:`
hh=`convert -ping tmp.png -format "%h" info:`
ww1=$((xcrop+1))
dim1="${ww1}x${hh}+0+0"
ww2=`convert xc: -format "%[fx:$ww-$xcrop-1]" info:`
xoff2=$ww1
dim2="${ww2}x${hh}+${xoff2}+0"
#echo "dim1=$dim1; dim2=$dim2;"
convert tmp.png \
\( -clone 0 -crop $dim1 +repage -fuzz 10% -trim +repage -write ${inname}_left.$suffix \) \
\( -clone 0 -crop $dim2 +repage -fuzz 10% -trim +repage -write ${inname}_right.$suffix \) \
null:
rm -f tmp.png
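
The two crop geometries in the script are plain integer arithmetic, so they can be checked by hand or computed with shell arithmetic alone. A minimal sketch, assuming made-up dimensions (1000x1500) and a split column of 500; split_geometry is a hypothetical helper name.

```shell
# split_geometry WIDTH HEIGHT XCROP: print the left and right crop
# geometries, using the same formulas as dim1 and dim2 above.
# (Hypothetical helper; its variables are global.)
split_geometry() {
  ww=$1; hh=$2; xcrop=$3
  ww1=$(( xcrop + 1 ))          # left part includes column xcrop
  ww2=$(( ww - xcrop - 1 ))     # remaining columns go to the right part
  printf '%sx%s+0+0\n' "$ww1" "$hh"
  printf '%sx%s+%s+0\n' "$ww2" "$hh" "$ww1"
}

split_geometry 1000 1500 500
# prints:
#   501x1500+0+0
#   499x1500+501+0
```

The two widths add up to the full width (501 + 499 = 1000), so no column is lost or duplicated at the split.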