Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-25T10:32:54-07:00
by bratpit
To convert PDF to PNG, use Ghostscript directly, not IM.
IM calls Ghostscript for this anyway, but is an order of magnitude slower.
This is the *nix command; on Windows it will be similar.
gs -sDEVICE=png16m -dDOINTERPOLATE -dQUIET -sOutputFile=%03d.png -dDownScaleFactor=3 -dSAFER -dBATCH -dNOPAUSE -r900 in.pdf
-r900 and -dDownScaleFactor=3 together do, on the fly and in memory, the same as convert's
-density 900 -resize 33%
to improve quality, but a lot faster.
For grayscale use
-sDEVICE=pnggray
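As a rough sketch of the arithmetic behind those two flags: rendering at 900 dpi and downscaling by 3 gives the same pixel dimensions as a 300 dpi render, with the extra samples used for smoothing. The helper names `effective_dpi` and `gs_command` are made up for illustration; the Ghostscript options are the ones from the command above, and nothing is executed, the command is only printed.

```bash
#!/bin/bash
# Hypothetical helpers for illustration; not part of Ghostscript or ImageMagick.

# Output resolution after rendering at $1 dpi and downscaling by factor $2.
effective_dpi() {
  local render_dpi="$1" factor="$2"
  echo $(( render_dpi / factor ))
}

# Print (without running) the gs command for a device, render resolution,
# downscale factor and input PDF, using the flags from the post above.
gs_command() {
  local device="$1" render_dpi="$2" factor="$3" pdf="$4"
  printf '%s ' gs "-sDEVICE=$device" -dDOINTERPOLATE -dQUIET -dSAFER -dBATCH -dNOPAUSE \
    "-r$render_dpi" "-dDownScaleFactor=$factor" "-sOutputFile=%03d.png"
  printf '%s\n' "$pdf"
}

effective_dpi 900 3
gs_command pnggray 900 3 in.pdf
```

Swapping `pnggray` for `png16m` in the call gives the colour variant.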
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-25T12:35:07-07:00
by snibgo
@isfando: You have reverted to small characters, as you had in your first posts. Why? Your later post had larger characters, which will give better quality, thus better OCR.
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-25T15:40:03-07:00
by isfando
snibgo wrote: ↑2018-09-25T12:35:07-07:00
@isfando: You have reverted to small characters, as you had in your first posts. Why? Your later post had larger characters, which will give better quality, thus better OCR.
@snibgo What shows that I have reverted to small characters? I am not able to see it.
I am trying to streamline the approaches you taught me, step by step.
I used your script in the 'EARLIER APPROACH' and got good, sharp results. But I am using convert twice.
In the 'CURRENT APPROACH' I am using convert once, but I am not able to apply the same parameters as step 1 of the 'EARLIER APPROACH', and my results are not good. My question is: how can I join step 1 and step 2 of the 'EARLIER APPROACH' into step 1 of the 'CURRENT APPROACH'?
********************EARLIER APPROACH******************************
1)
Code: Select all
convert -density 300 ./sam.pdf -depth 8 -strip -background white -alpha off -threshold 70% sam.png
The output image sam.png from this step is pretty crisp, so the result in step 3 is also crisp.
https://drive.google.com/open?id=1fBFFo ... HG-8w-6zGI
2)
Code: Select all
convert ^
sam.png ^
-strip ^
( +clone ^
-threshold 50%% ^
-write mpr:ORG ^
+delete ^
) ^
( mpr:ORG ^
-negate ^
-morphology Erode rectangle:200x1 ^
-mask mpr:ORG -morphology Dilate rectangle:200x1 ^
+mask ^
-morphology Dilate Disk:3 ^
) ^
-compose Lighten -composite ^
( +clone ^
-morphology HMT "1x4:1,0,0,1" ^
) ^
-compose Lighten -composite ^
( +clone ^
-morphology HMT "1x3:1,0,1" ^
) ^
-compose Lighten -composite ^
( +clone ^
-morphology HMT "3x1:1,0,1" ^
) ^
-compose Lighten -composite ^
-blur 0x0.5 out.png
3) Result image
https://drive.google.com/open?id=1obtnH ... hHFV0VsOCL
################################################################################################################
*********************CURRENT APPROACH*********************************
1) doonepage.bat
Code: Select all
convert ^
-density 300 ^
%1 ^
-depth 8 ^
-strip ^
( +clone ^
-threshold 50%% ^
-write mpr:ORG ^
+delete ^
) ^
( mpr:ORG ^
-negate ^
-morphology Erode rectangle:200x1 ^
-mask mpr:ORG -morphology Dilate rectangle:200x1 ^
+mask ^
-morphology Dilate Disk:3 ^
) ^
-compose Lighten -composite ^
( +clone ^
-morphology HMT "1x4:1,0,0,1" ^
) ^
-compose Lighten -composite ^
( +clone ^
-morphology HMT "1x3:1,0,1" ^
) ^
-compose Lighten -composite ^
( +clone ^
-morphology HMT "3x1:1,0,1" ^
) ^
-compose Lighten -composite ^
-blur 0x0.5 %2
2) domanypages.bat
Code: Select all
set INPDF=sam.pdf
for /F "usebackq" %%L in (`exiftool -args -PageCount %INPDF%`) do set %%L
set /A LASTPAGE=%-PageCount%-1
for /L %%I in (0,1,%LASTPAGE%) do call DoOnePage %INPDF%[%%I] out_%%I.png
3) Result
https://drive.google.com/open?id=1toqjB ... 5pItrdMeRi
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-25T15:42:33-07:00
by isfando
@bratpit
Okay, I will give it a try.
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-26T04:04:12-07:00
by isfando
@snibgo
Do you have a shell script alternative to this batch script?
Code: Select all
set INPDF=sam.pdf
for /F "usebackq" %%L in (`exiftool -args -PageCount %INPDF%`) do set %%L
set /A LASTPAGE=%-PageCount%-1
for /L %%I in (0,1,%LASTPAGE%) do call DoOnePage %INPDF%[%%I] out_%%I.png
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-26T04:14:30-07:00
by snibgo
I don't understand your question. That is a shell script, for the Windows BAT language. It could be translated to any other shell language.
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-26T04:24:46-07:00
by isfando
@snibgo
Sorry, I meant to ask: do you have an alternative to this script in the form of a bash script that can be run on a Linux server?
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-26T05:58:27-07:00
by snibgo
No.
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-26T06:07:36-07:00
by isfando
@snibgo Okay, thanks a lot. Sorry I had to ask so many questions; I have no previous experience with the topic.
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-26T07:00:46-07:00
by isfando
@snibgo I was able to put together the following script for bash. (It feels good to contribute something to the topic.)
Code: Select all
#!/bin/bash
INPDF=$1
# exiftool prints "-PageCount=N"; keep only N
PAGES=$(exiftool -args -PageCount $INPDF | cut -d'=' -f2)
N=$(( $PAGES ))
# ImageMagick page indexes start at 0, so loop over 0..N-1 (as the batch version does)
for ((I=0;I<$N;I++));
do
  convert -density 300 $INPDF[$I] -depth 8 -strip -background white -alpha off -threshold 70%% out_${I}.png
  convert \
    out_${I}.png \
    -strip \
    \( +clone \
      -threshold 50%% \
      -write mpr:ORG \
      +delete \
    \) \
    \( mpr:ORG \
      -negate \
      -morphology Erode rectangle:200x1 \
      -mask mpr:ORG -morphology Dilate rectangle:200x1 \
      +mask \
      -morphology Dilate Disk:3 \
    \) \
    -compose Lighten -composite \
    \( +clone \
      -morphology HMT "1x4:1,0,0,1" \
    \) \
    -compose Lighten -composite \
    \( +clone \
      -morphology HMT "1x3:1,0,1" \
    \) \
    -compose Lighten -composite \
    \( +clone \
      -morphology HMT "3x1:1,0,1" \
    \) \
    -compose Lighten -composite \
    -blur 0x0.5 out_${I}.png
done
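One detail worth checking when porting the batch loop: `exiftool -args -PageCount` prints a line of the form `-PageCount=N`, and ImageMagick numbers PDF pages from zero, so the page selectors run from `[0]` to `[N-1]` (the batch version loops from 0 to LASTPAGE for the same reason). Below is a minimal sketch of just that bookkeeping, with hypothetical helper names `parse_page_count` and `page_selectors`; it calls neither exiftool nor convert.

```bash
#!/bin/bash
# Hypothetical helpers for illustration only.

# Extract N from an exiftool "-args" line such as "-PageCount=12".
parse_page_count() {
  local line="$1"
  echo "${line#*=}"
}

# Print one zero-based ImageMagick page selector per line: file[0] .. file[N-1].
page_selectors() {
  local pdf="$1" count="$2" i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s[%d]\n' "$pdf" "$i"
    i=$(( i + 1 ))
  done
}

count=$(parse_page_count "-PageCount=3")
page_selectors "sam.pdf" "$count"
```

In a real script the argument to `parse_page_count` would presumably come from the exiftool call shown in the post.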
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-27T06:23:38-07:00
by isfando
@snibgo I am using two convert commands in my bash script. How can I feed the output of the first convert command to the second without making a temporary PNG image? I am executing my script in a multithreaded environment.
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-27T07:13:26-07:00
by snibgo
For bash, don't double the percent signs.
You can combine the two converts, so you have only one, removing the final write from the first and the initial read from the second, like this:
Code: Select all
convert -density 300 $INPDF[$I] -depth 8 -strip -background white -alpha off -threshold 70% \
-strip \
\( +clone \
{and so on}
The second strip is redundant, and can be removed.
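To spell out the merged command, here is a sketch that assumes the elided part is simply the rest of the morphology chain from the earlier script, unchanged. It builds the argument list in a bash array, with single percent signs, only one `-strip`, and quoted file names. The names `build_convert_args`, `process_page` and `ARGS` are invented for this sketch, and `-depth 8` is kept in its original position.

```bash
#!/bin/bash
# Sketch only: assumes the elided "{and so on}" is the unchanged morphology chain.
# build_convert_args, process_page and ARGS are hypothetical names.

# Fill the global array ARGS with the arguments for one combined convert call.
build_convert_args() {
  local input="$1" output="$2"
  ARGS=(
    -density 300 "$input"
    -depth 8 -strip -background white -alpha off -threshold 70%
    '(' +clone -threshold 50% -write mpr:ORG +delete ')'
    '(' mpr:ORG -negate
        -morphology Erode rectangle:200x1
        -mask mpr:ORG -morphology Dilate rectangle:200x1
        +mask
        -morphology Dilate Disk:3 ')'
    -compose Lighten -composite
    '(' +clone -morphology HMT '1x4:1,0,0,1' ')'
    -compose Lighten -composite
    '(' +clone -morphology HMT '1x3:1,0,1' ')'
    -compose Lighten -composite
    '(' +clone -morphology HMT '3x1:1,0,1' ')'
    -compose Lighten -composite
    -blur 0x0.5 "$output"
  )
}

# One page, one convert process, no intermediate PNG on disk.
process_page() {
  local pdf="$1" page="$2"
  build_convert_args "${pdf}[${page}]" "out_${page}.png"
  convert "${ARGS[@]}"
}
```

Calling `process_page sam.pdf 0` would then write out_0.png. Each call reads the PDF and writes only its own output, and `mpr:` images live in the memory of that one convert process, so parallel workers should not collide on a shared temporary file.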
"-depth 8" has no effect until the output is written. But do you really want that?
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-27T23:54:16-07:00
by bratpit
isfando wrote: ↑2018-09-27T06:23:38-07:00
@snibgo i am using two convert commands in my bash script. How can i feed the output of the first convert command to second convert command without making a temporary png image because i am executing my script in multithreaded environment
Check this with Ghostscript.
Besides, your script doesn't preserve spaces in filenames.
This way is better, IMHO.
Code: Select all
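#!/bin/bash
INPDF="$1"
# Render at 900 dpi and downscale by 3 inside Ghostscript: one grayscale PNG per page
gs -sDEVICE=pnggray -dDOINTERPOLATE -dQUIET -sOutputFile=%03d.png -dDownScaleFactor=3 -dSAFER -dBATCH -dNOPAUSE -r900 "$INPDF"
# Skip out_* so files written by this loop are never picked up as input
find . -mindepth 1 -maxdepth 1 -type f -name '*.png' ! -name 'out_*' |
while IFS= read -r file; do
  name="${file#./}"  # find prints ./001.png; drop the ./ so out_001.png is a valid name
  convert "$file" -depth 8 -strip -background white -alpha off -threshold 70% "out_$name"
done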
Re: Remove horizontal summation lines but keep a minus
Posted: 2018-09-28T04:23:07-07:00
by snibgo
Yes, if spaces are permitted in filenames, they should be quoted.
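A small demonstration of why the quoting matters, using a made-up helper `count_args` that just reports how many arguments it received. Unquoted, a name containing a space is split into two words; quoted, it stays one. Quoting also keeps the `[0]` page selector from being treated as a glob pattern.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it was given.
count_args() {
  echo "$#"
}

INPDF="my scan.pdf"
I=0

count_args $INPDF[$I]        # unquoted: split at the space into two words
count_args "${INPDF}[${I}]"  # quoted: a single argument, my scan.pdf[0]
```

The same applies to output names such as `"out_${I}.png"` whenever any part of them comes from a user-supplied string.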