
[SOLVED] 6.8.8+Win7 x64 - converting PDF pages to PNM pipe

Posted: 2014-03-07T20:16:25-07:00
by ghorg
EDIT:
A short summary of the problems I was having and how they were solved.
  • PBMPlus/Netpbm tools (specifically pnmtopnm -plain) do not work well with convert's multi-image output stream.

    Skip pnmtopnm and instead pass the -compress none option to convert, which writes the plain (text) PBM form directly.
  • convert sends PNM output to STDOUT in a stream rather than "chunking" it.

    That is by design. From http://www.imagemagick.org/Usage/files/#adjoin:
    "A major problem with saving images, is that ImageMagick works with a ordered sequence (list) of images, not just one image at a time. Because of this IM will attempt to write ALL the images in the current image sequence into the filename given."
    It is the responsibility of the script/program ingesting the output from convert to buffer the input stream until a complete image has been obtained.
Original post follows:
------------------------

I am using the Windows precompiled binary ImageMagick-6.8.8-0-Q16-x64-dll.exe on Windows 7 Professional x64.

Code:

>convert -version
Version: ImageMagick 6.8.8-0 Q16 x64 2013-12-21 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules OpenMP
Delegates: bzlib cairo freetype jbig jng jp2 jpeg lcms lqr pangocairo png ps rsvg tiff webp xml zlib
I am trying to run a pipeline where I use "convert" to process each page of a PDF, crop a selection, convert the selection to a "raw" PBM stream (type P4), convert that to a "plain" PBM stream (type P1) with the pnmtopnm program from the GnuWin32 package, and then send that stream to a script for analysis.

Here is my pipeline:

Code:

convert pdf:2a_e9c_540.pdf -crop 10x10+150+150 +repage pbm:- | pnmtopnm -plain | BITS-N-BOBS.py
The problem is that convert appears to send all the separate pages from the PDF in one long stream instead of "chunks". The pnmtopnm program only manages to convert the first page image in the sequence, since the P4 header for each subsequent image arrives on the same line as the last row of the preceding image.
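For reference, a raw P4 stream like this can still be split by the consumer, because each header states the image size and a raw PBM raster occupies height × ceil(width / 8) bytes. The following is only a minimal sketch, assuming Python 3, a stream opened in binary mode, and headers without "#" comments; the names iter_raw_pbm and _read_token are made up for illustration and are not part of ImageMagick or Netpbm.

```python
import io


def _read_token(stream):
    """Return the next whitespace-delimited header token (b"" at end of stream).

    Exactly one whitespace byte after the token is consumed, which matches
    the single separator between the height and the raster in raw PBM.
    """
    token = b""
    while True:
        ch = stream.read(1)
        if ch == b"":
            return token
        if ch.isspace():
            if token:
                return token
            continue  # skip whitespace in front of a token
        token += ch


def iter_raw_pbm(stream):
    """Yield (width, height, raster_bytes) for each raw (P4) image in stream."""
    while True:
        magic = _read_token(stream)
        if magic == b"":
            return  # clean end of stream
        if magic != b"P4":
            raise ValueError("expected b'P4', got %r" % magic)
        width = int(_read_token(stream))
        height = int(_read_token(stream))
        # Each row is padded to a whole number of bytes.
        size = ((width + 7) // 8) * height
        raster = stream.read(size)
        if len(raster) != size:
            raise ValueError("stream ended in the middle of an image")
        yield width, height, raster
```

A consuming script could iterate over iter_raw_pbm(sys.stdin.buffer); on Windows the binary stream matters, since text-mode reads may alter raster bytes.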

Here is an example of the output from convert:
(converted to a plain P1 PBM stream for posting purposes; the actual output from convert is a binary P4 stream)

Code:

P1
10 10
0000000000
0110000110
0110000110
0110000110
0110000110
0110000110
0110000110
0111111110
0111111110
0000000000P1
10 10
0000000000
0000110000
0000110000
0000110000
0111111110
0111111110
0000110000
0000110000
0000110000
0000000000P1
10 10
0000000000
0111111110
0111111110
0110000110
0110000110
0110000110
0110000110
0111111110
0111111110
0000000000
I am at a bit of a loss as to how I should proceed.

Re: 6.8.8+Win7 x64 - issues converting PDF pages to PNM pipe

Posted: 2014-03-07T20:36:05-07:00
by snibgo
I don't know if it would help, but convert can create the text (uncompressed) form directly, with "-compress none".

"adjoin" may also be useful. See http://www.imagemagick.org/Usage/formats/#pbmplus

Re: 6.8.8+Win7 x64 - issues converting PDF pages to PNM pipe

Posted: 2014-03-07T21:32:50-07:00
by ghorg
snibgo wrote: I don't know if it would help, but convert can create the text (uncompressed) form directly, with "-compress none".

"adjoin" may also be useful. See http://www.imagemagick.org/Usage/formats/#pbmplus
Thanks for the information.

"-compress none" works great. I can do away with the whole "pnmtopnm" step now.

However, I could not get "+adjoin" to work.

I tried the following command:

Code:

convert pdf:2a_e9c_540.pdf -crop 10x10+150+150 +repage -compress none +adjoin pbm:-
convert still sends all the separate pages from the PDF in one long stream instead of "chunks".

The more I think about it, though, the more I believe what I am searching for is meaningless: STDOUT is simply a stream of characters/bytes without any notion of a discrete "image". It would really be the responsibility of the final script (BITS-N-BOBS.py) to buffer the stream from convert until a full PBM image is obtained.

Or is there a hidden/little-known command option that will "chunk" an image stream sent to STDOUT?

Re: 6.8.8+Win7 x64 - issues converting PDF pages to PNM pipe

Posted: 2014-03-08T07:19:12-07:00
by snibgo
ghorg wrote: ... what I am searching for is meaningless
Yes, I think so.

Your Python script should parse whatever it receives. When it sees "P" as the first character on a line, it should realise this is the start of a new image. The values it has received so far should be the correct number for the given width and height.
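
A rough sketch of that parsing approach for the plain (P1) output produced with "-compress none" is below. It is an illustration only, assuming Python 3, a text-mode stream, and no "#" comments in the data; iter_plain_pbm and _next_token are hypothetical names. One deviation from the description above: instead of looking for "P" at the start of a line, it reads the width and height and then counts exactly width × height bits, so it also copes with a header fused to the end of the previous row (as in the "0000000000P1" sample earlier in the thread).

```python
import io


def _next_token(stream):
    """Return the next whitespace-delimited token ("" at end of stream)."""
    token = ""
    while True:
        ch = stream.read(1)
        if ch == "" or (ch.isspace() and token):
            return token
        if not ch.isspace():
            token += ch


def iter_plain_pbm(stream):
    """Yield (width, height, rows) for each plain (P1) image in a text stream."""
    while True:
        magic = _next_token(stream)
        if magic == "":
            return  # no more images
        if magic != "P1":
            raise ValueError("expected 'P1', got %r" % magic)
        width = int(_next_token(stream))
        height = int(_next_token(stream))
        bits = []
        while len(bits) < width * height:
            ch = stream.read(1)
            if ch == "":
                raise ValueError("stream ended in the middle of an image")
            if ch.isspace():
                continue  # whitespace between bits is optional in plain PBM
            if ch not in "01":
                # For example, a "P" arriving before the image is complete.
                raise ValueError("unexpected character %r in raster" % ch)
            bits.append(int(ch))
        rows = [bits[r * width:(r + 1) * width] for r in range(height)]
        yield width, height, rows
```

The script would then loop with something like "for width, height, rows in iter_plain_pbm(sys.stdin):" and analyse each image as soon as its last bit has arrived.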

Re: 6.8.8+Win7 x64 - issues converting PDF pages to PNM pipe

Posted: 2014-03-08T14:35:33-07:00
by ghorg
snibgo wrote:
ghorg wrote: ... what I am searching for is meaningless
Yes, I think so.

Your Python script should parse whatever it receives. When it sees "P" as the first character on a line, it should realise this is the start of a new image. The values it has received so far should be the correct number for the given width and height.
Thanks Snibgo.
That is what I figured. Now that I know the proper way to process a PNM pipeline, I will mark this as [SOLVED].