convert multi-page colour pdfs to high contrast black & white

Post by stozi »

Hi, I want to convert multi-page colour scan-based PDFs to high-contrast black & white. I've tried
$ convert input.pdf -level 50%,50% output.pdf
and
$ convert input.pdf -density 300 -level 50%,50% output.pdf

but in both cases the text in the scans becomes unclear and pixelated. How do I retain the original sharpness of the text while making this conversion? Thank you.

Post by fmw42 »

Try supersampling: set a high density before reading the PDF, then resize afterwards. With the default density of 72, a density of 288 renders at 4x, so -resize 25% brings the result back to the default pixel size.

convert -density 288 image.pdf ... your processing ... -resize 25% output.pdf

or just

convert -density 288 image.pdf ... your processing ... output.pdf

But perhaps you should post an example image and also provide us with your IM version and platform.

Perhaps you want -threshold 50% rather than -level.
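
The supersampling suggestion above can be sketched as a small shell function. This is only a sketch: the function name bw_supersample and the file names are made up for illustration, and the 50% threshold is simply the value mentioned in this thread, not a tuned setting. Note that -density is placed before the input file, because it has to be in effect while the PDF is being read.

```shell
# Hypothetical helper (the name and arguments are illustrative, not part of ImageMagick).
# Rasterize at 4x the default 72 dpi, threshold, then shrink back by 4x.
bw_supersample() {
    in_pdf=$1
    out_pdf=$2
    # -density must precede the input so that it applies while the PDF is read.
    convert -density 288 "$in_pdf" -threshold 50% -resize 25% "$out_pdf"
}

# Example call (uncomment to run on a real file):
# bw_supersample input.pdf output.pdf
```

Resizing after the threshold leaves grey, anti-aliased edges, which is usually the point of supersampling. If strictly two-level output is wanted, one option is to drop the -resize and keep the larger pages.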

Post by stozi »

OK. I'm on Arch Linux with ImageMagick 6.9.2.0-1.
All the PDFs are quite similar to this one, mostly made with the same book scanner: http://vk.com/doc269758631_424827626
I got:

$ convert -density 288 input.pdf -density 300 -threshold 50% output.pdf
convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3390.

$ convert -density 288 input.pdf -density 300 -level 50%,50% output.pdf
convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3390.

Post by snibgo »

As each page contains a single raster image and nothing else, IM is not the obvious tool to extract the images. A more suitable tool is pdfimages:

pdfimages -all Gilmour_fascist_italy.pdf gilm
Then process each image as you want.
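
A rough sketch of "extract, then process each image" follows. The function name, the prefix and the output directory are invented for illustration, and it assumes the scans are embedded as JPEG, so that pdfimages -all writes files named PREFIX-NNN.jpg; other embedded formats would need their own loop.

```shell
# Hypothetical helper: extract the embedded page images from a PDF, then
# threshold every extracted JPEG into a separate output directory.
# Assumes the scans are stored as JPEG, so pdfimages -all writes PREFIX-NNN.jpg.
extract_and_threshold() {
    pdf=$1
    prefix=$2
    outdir=$3
    mkdir -p "$outdir" || return 1
    # -all keeps each image in its embedded format instead of re-rendering the page.
    pdfimages -all "$pdf" "$prefix" || return 1
    for img in "$prefix"-*.jpg; do
        [ -e "$img" ] || continue        # the glob matched nothing
        convert "$img" -threshold 50% "$outdir/$(basename "$img")" || return 1
    done
}

# Example call (uncomment to run on a real file):
# extract_and_threshold Gilmour_fascist_italy.pdf gilm bw
```

Because each page is handled on its own, only one image is held in memory at a time, which may matter for PDFs with hundreds of pages.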

Post by stozi »

One by one? I have a few PDFs with over 600 pages. Is there any way to do them as a batch?

$ convert in.pdf -threshold 50% out.pdf

does the job on a single image.

Post by fmw42 »

mogrify will process a whole folder of images. I suggest you create a new empty directory to hold the output images so that you do not overwrite the originals. Say you have images in folder1 and create an empty folder2.

Change directory to folder1, then run:

mogrify -path path2/folder2 -format pdf -threshold 50% *.pdf

See http://www.imagemagick.org/Usage/basics/#mogrify
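
The folder1/folder2 idea above might be wrapped up as follows. This is a sketch under assumptions: the function name is made up, and it works on *.jpg files (for example, images already extracted from a PDF) rather than on PDFs directly.

```shell
# Hypothetical helper: threshold every JPEG in SRC and write the results to DST,
# leaving the originals untouched.
batch_threshold() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Resolve DST to an absolute path, because mogrify is run from inside SRC.
    dst_abs=$(cd "$dst" && pwd) || return 1
    ( cd "$src" && mogrify -path "$dst_abs" -threshold 50% *.jpg )
}

# Example call (uncomment to run on real directories):
# batch_threshold folder1 folder2
```

The subshell keeps the cd from affecting the calling shell, and -path makes mogrify write into the second directory instead of overwriting its inputs.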

Post by stozi »

That gave me a complicated failure:

$ mogrify -path ./ -format pdf -threshold 50% *.pdf
libpng error: Write Error
mogrify: PDFDelegateFailed `[ghostscript library 9.18] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72"  "-sOutputFile=/tmp/magick-2377mnK6ZlxULIWr%d" "-f/tmp/magick-2377AhoF2ccOVk4y" "-f/tmp/magick-2377UaWJjhh3SCfG"': Error: /VMerror in --showpage--
VM status: 3 12019364 13454248
Current allocation mode is local
Last OS error: 28
GPL Ghostscript 9.18: Unrecoverable error, exit code 1
 @ error/pdf.c/InvokePDFDelegate/271.
But this set of commands seems to work OK:

$ pdfimages -all input.pdf name

$ find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% {} outputdirectory/{} \;

$ mogrify -format pdf -- *.jpg

$ pdfunite *.pdf output.pdf

But can anyone tell me why, if I try

find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% -format pdf {} ./{} \;

the threshold works but the format does not, and no error is reported?

Post by fmw42 »

libpng error: Write Error
Try changing the delegates.xml entry for ps (pdf) to use pnmraw rather than pngalpha.

Or try downgrading GS to 9.15 or earlier (9.16 has been known to have bugs, and I do not know about 9.18).
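
Separately from the delegate question, the Ghostscript output quoted above includes "Last OS error: 28". On Linux, errno 28 is ENOSPC ("No space left on device"), so it may be worth checking whether the temporary directory is filling up. The following is a hedged sketch: the function names and the /big/scratch path are invented for illustration, MAGICK_TMPDIR is the ImageMagick variable for its temporary directory, and TMPDIR is assumed here to be honoured by Ghostscript.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the free space, in kilobytes, on the filesystem holding DIR.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Point ImageMagick and Ghostscript at a roomier scratch directory.
use_scratch_dir() {
    mkdir -p "$1" || return 1
    MAGICK_TMPDIR=$1    # ImageMagick temporary files
    TMPDIR=$1           # general temp dir, assumed to be read by Ghostscript too
    export MAGICK_TMPDIR TMPDIR
}

# Example (the path is an assumption, substitute your own):
# free_kb /tmp
# use_scratch_dir /big/scratch && mogrify -path ../out -format pdf -threshold 50% *.pdf
```

If free_kb reports very little space on /tmp, that would be consistent with both the libpng write error and the VMerror, although it does not prove it.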


The following seems to work for me with IM 6.9.2.3 Q16 on Mac OS X with GS 9.10 and these delegates.xml entries:

<delegate decode="pdf" encode="eps" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=epswrite&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="pdf" encode="ps" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=ps2write&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="ps:alpha" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

<delegate decode="ps:cmyk" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pamcmyk32&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

<delegate decode="ps:color" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pnmraw&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

<delegate decode="ps" encode="eps" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=epswrite&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="ps" encode="pdf" mode="bi" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pdfwrite&quot; &quot;-sOutputFile=%o&quot; &quot;-f%i&quot;"/>

<delegate decode="ps" encode="print" mode="encode" command="lpr &quot;%i&quot;"/>

<delegate decode="ps:mono" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pbmraw&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

I created a five-page test PDF:

convert rose: rose: rose: rose: rose: rose.pdf

Then I duplicated it twice, put the copies into test1 and created an empty test2:

cd test1
mogrify -path ../test2 -format pdf -threshold 50% *.pdf

The results look fine.


Also, for your last question:
convert needs an explicit input image and output image.

Post by snibgo »

stozi wrote:... can anyone tell my why if try

find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% -format pdf {} ./{} \;

the threshold works but not the format, without giving any error
Because "-format" in convert does something different to "-format" in mogrify. See http://www.imagemagick.org/script/comma ... php#format
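
To make the difference concrete: with convert, the type of the output file is chosen by the output file name (normally its suffix), so writing to ./{} keeps the .jpg name and therefore stays JPEG. A minimal sketch, assuming JPEG inputs and using a made-up function name, is to derive a .pdf name from each input:

```shell
# Hypothetical helper: threshold one image and write it as a PDF next to it.
# The output type is chosen by the .pdf suffix of the output name, not by -format.
threshold_to_pdf() {
    img=$1
    out="${img%.*}.pdf"    # page-001.jpg -> page-001.pdf
    convert "$img" -threshold 50% "$out"
}

# With find, one way to get the same effect for every JPEG in a directory:
# find . -maxdepth 1 -iname '*.jpg' -exec sh -c 'convert "$1" -threshold 50% "${1%.*}.pdf"' _ {} \;
```

The sh -c wrapper is there only because find's {} cannot have its suffix changed directly; the shell's ${1%.*} expansion strips the old extension.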

Post by stozi »

OK, fmw42, that's all way above my head. Anyway, the important thing is that I got what I wanted with:

$ pdfimages -all input.pdf name

$ find . -maxdepth 1 -iname '*jpg' -exec convert -threshold 50% {} outputdirectory/{} \;

$ mogrify -format pdf -- *.jpg

$ pdfunite *.pdf output.pdf
Thank you all.
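
For anyone wanting the steps above in one go, here is a hedged sketch that chains them inside a temporary directory. The function name pdf_to_bw is invented, and it assumes every page is a single embedded JPEG, as in the scans discussed in this thread. It writes each thresholded page straight to a one-page PDF, which folds the separate mogrify step into the convert call.

```shell
# Hypothetical end-to-end helper; the name pdf_to_bw is illustrative.
# Assumes every page is a single embedded JPEG, as in the scans discussed here.
pdf_to_bw() {
    in_pdf=$1
    out_pdf=$2
    work=$(mktemp -d) || return 1
    # 1. Pull the page images out unchanged.
    pdfimages -all "$in_pdf" "$work/page" || { rm -rf "$work"; return 1; }
    # 2. Threshold each page and write it as a one-page PDF.
    for img in "$work"/page-*.jpg; do
        [ -e "$img" ] || continue
        convert "$img" -threshold 50% "${img%.jpg}.pdf" || { rm -rf "$work"; return 1; }
    done
    # 3. Join the pages; the glob expands in sorted order, so zero-padded names keep page order.
    pdfunite "$work"/page-*.pdf "$out_pdf"
    status=$?
    rm -rf "$work"
    return $status
}

# Example call (uncomment to run on a real file):
# pdf_to_bw input.pdf output.pdf
```

Page order relies on the zero-padded numbers in the extracted file names, so for very long documents it is worth checking that the pages still sort correctly before joining them.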