
convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-10T23:10:03-07:00
by stozi
Hi, I want to convert multi-page colour scan-based PDFs to high-contrast black & white. I've tried
$ convert input.pdf -level 50%,50% output.pdf
and
$ convert input.pdf -density 300 -level 50%,50% output.pdf

but in both cases the text in the scans becomes unclear and pixelated. How do I retain the original sharpness of the text while making this conversion? Thank you.

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-10T23:39:55-07:00
by fmw42
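Try supersampling: provide a high density before reading the image, then resize afterwards. For example, 288 dpi is four times the default 72 dpi, so -resize 25% brings the pages back to the default pixel dimensions.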

Code: Select all

convert -density 288 image.pdf ... your processing ... -resize 25% output.pdf
or just

Code: Select all

convert -density 288 image.pdf ... your processing ... output.pdf

But perhaps you should post an example image and also provide us with your IM version and platform.

Perhaps you want -threshold 50% rather than -level.
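
A minimal sketch of that suggestion, assuming ImageMagick 6 `convert` with a working Ghostscript delegate. The function name `bw_pdf` and the file names are made up for illustration, and 288 dpi and 50% are only starting values. `-density` has to come before the input file because it controls how the PDF is rasterized while it is read. The `-colorspace Gray` step is an addition, not something from the posts above, to make the single-channel result explicit.

```shell
#!/bin/sh
# bw_pdf IN OUT [DENSITY] [THRESHOLD]
# Hypothetical helper: rasterize IN at DENSITY dpi, then force every
# pixel to pure black or pure white at THRESHOLD.
bw_pdf() {
    in=$1
    out=$2
    density=${3:-288}      # assumed default, taken from the post above
    threshold=${4:-50%}    # assumed default

    # -density must precede the input so it applies while the PDF is read.
    convert -density "$density" "$in" \
        -colorspace Gray -threshold "$threshold" \
        "$out"
}

# Example call (file names are placeholders):
# bw_pdf input.pdf output.pdf 288 50%
```

High densities on long PDFs need a lot of memory and temporary disk space, so it may be worth testing on a single page first, for example `bw_pdf 'input.pdf[0]' page0.pdf`.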

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T07:44:34-07:00
by stozi
ok. I'm on Arch Linux with imagemagick 6.9.2.0-1.
All the pdfs are quite similar to this one, mostly made with the same book-scanner http://vk.com/doc269758631_424827626
I got

Code: Select all

$ convert -density 288 input.pdf -density 300 -threshold 50% output.pdf
convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3390.

Code: Select all

$ convert -density 288 input.pdf -density 300 -level 50%,50% output.pdf
convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3390.

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T08:05:51-07:00
by snibgo
As each page contains a single raster image and nothing else, IM is not the obvious tool to extract the images. A more suitable tool is pdfimages:

Code: Select all

pdfimages -all Gilmour_fascist_italy.pdf gilm
Then process each image as you want.

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T09:51:07-07:00
by stozi
One by one? I have a few PDFs with over 600 pages. Is there any way to do them as a batch?

Code: Select all

$ convert in.pdf -threshold 50% out.pdf
does the job on a single image.

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T10:49:53-07:00
by fmw42
mogrify will do a whole folder of images. I suggest you create a new empty directory to hold the output images so you do not write over the originals. Say you have images in folder1 and create an empty folder2.

Change directory to folder1, then run the following, where path2/folder2 stands for the path to folder2:

Code: Select all

mogrify -path path2/folder2 -format pdf -threshold 50% *.pdf
See http://www.imagemagick.org/Usage/basics/#mogrify

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T11:34:56-07:00
by stozi
that gave me a complicated fail,

Code: Select all

$ mogrify -path ./ -format pdf -threshold 50% *.pdf
libpng error: Write Error
mogrify: PDFDelegateFailed `[ghostscript library 9.18] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72"  "-sOutputFile=/tmp/magick-2377mnK6ZlxULIWr%d" "-f/tmp/magick-2377AhoF2ccOVk4y" "-f/tmp/magick-2377UaWJjhh3SCfG"': Error: /VMerror in --showpage--
VM status: 3 12019364 13454248
Current allocation mode is local
Last OS error: 28
GPL Ghostscript 9.18: Unrecoverable error, exit code 1
 @ error/pdf.c/InvokePDFDelegate/271.
but this set of commands seems to work ok

Code: Select all

$ pdfimages -all input.pdf name

$ find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% {} outputdirectory/{} \;

$ mogrify -format pdf -- *.jpg

$ pdfunite *.pdf output.pdf
but can anyone tell me why, if I try

Code: Select all

find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% -format pdf {} ./{} \;
the threshold works but not the format, without giving any error

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T14:23:09-07:00
by fmw42
libpng error: Write Error
Try changing the delegates.xml entry for ps (pdf) to use pnmraw rather than pngalpha.

Or try downgrading GS to 9.15 or earlier (9.16 has been known to have bugs, and I do not know about 9.18).


The following seems to work for me with IM 6.9.2.3 Q16 on Mac OSX, GS 9.10 and these delegates:

Code: Select all

<delegate decode="pdf" encode="eps" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=epswrite&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="pdf" encode="ps" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=ps2write&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="ps:alpha" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

<delegate decode="ps:cmyk" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pamcmyk32&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

<delegate decode="ps:color" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pnmraw&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

<delegate decode="ps" encode="eps" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=epswrite&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="ps" encode="pdf" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pdfwrite&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="ps" encode="print" mode="encode" command="lpr &quot;%i&quot;"/>

<delegate decode="ps:mono" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pbmraw&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

Code: Select all

convert rose: rose: rose: rose: rose: rose.pdf
That creates a five-page test PDF. I then duplicated it twice, put the copies into test1 and created an empty test2.

Code: Select all

cd test1
mogrify -path ../test2 -format pdf -threshold 50% *.pdf
The results look fine


Also, for your last question:
convert needs an explicit input image and an explicit output image.

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T14:48:46-07:00
by snibgo
stozi wrote: ... can anyone tell me why, if I try

find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% -format pdf {} ./{} \;

the threshold works but not the format, without giving any error
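Because "-format" in convert does something different from "-format" in mogrify. In mogrify it sets the output file type; in convert it only sets the format string used when printing image information (e.g. with info:), and the output file type is taken from the output file name. See http://www.imagemagick.org/script/comma ... php#format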
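
To illustrate the difference: with `convert`, the output type follows from the output file name, so the loop has to build a `.pdf` name itself. A rough sketch, assuming a POSIX shell; the helper name `jpg_to_bw_pdf` is hypothetical and not part of ImageMagick.

```shell
#!/bin/sh
# jpg_to_bw_pdf FILE
# Hypothetical helper: threshold one JPEG and write the result next to
# the original with a .pdf extension. convert chooses the PDF writer
# from the ".pdf" suffix of the output name, not from -format.
jpg_to_bw_pdf() {
    src=$1
    dst=${src%.*}.pdf      # strip the last extension, append .pdf
    convert "$src" -threshold 50% "$dst"
}

# Apply it to every JPEG in the current directory (placeholder glob):
# for f in ./*.jpg; do jpg_to_bw_pdf "$f"; done
```

`mogrify -format pdf -threshold 50% *.jpg` does the equivalent renaming internally, which is why `-format` appeared to work there.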

Re: convert multi-page colour pdfs to high contrast black & white

Posted: 2015-10-11T15:21:52-07:00
by stozi
Ok, fmw42, that's all way above my head. Anyway, the important thing is I got what I wanted with

Code: Select all

$ pdfimages -all input.pdf name

$ find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% {} outputdirectory/{} \;

$ mogrify -format pdf -- *.jpg

$ pdfunite *.pdf output.pdf
Thank you all.
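
For reference, the steps above can be folded into one script. This is only a sketch under some assumptions: poppler's `pdfimages` and `pdfunite` and ImageMagick's `mogrify` are installed, every page comes out of `pdfimages -all` as a JPEG (as it did for these scans), and there are fewer than 1000 pages. The function name `pdf_to_bw` and the temporary work directory are invented for the example, and the threshold and PDF conversion are merged into a single `mogrify` call rather than run as two steps.

```shell
#!/bin/sh
# pdf_to_bw IN.pdf OUT.pdf [THRESHOLD]
# Hypothetical wrapper around the steps from the thread.
pdf_to_bw() {
    in=$1
    out=$2
    threshold=${3:-50%}    # assumed default

    # Scratch directory so the originals are never overwritten.
    work=$(mktemp -d) || return 1

    # 1. Pull the page scans out in their native format (page-000.jpg, ...).
    pdfimages -all "$in" "$work/page" || return 1

    # 2. Threshold each JPEG and write it as a one-page PDF beside it.
    mogrify -format pdf -threshold "$threshold" "$work"/page-*.jpg || return 1

    # 3. Join the pages; three-digit zero padding keeps the glob in page
    #    order as long as there are fewer than 1000 images.
    pdfunite "$work"/page-*.pdf "$out" || return 1

    # Only reached on success; on failure the directory is left for inspection.
    rm -rf "$work"
}

# Example call (file names are placeholders):
# pdf_to_bw input.pdf output.pdf
```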