
How to extract a page from a multipage tiff without decompressing?

Posted: 2016-10-09T05:14:55-07:00
by 6bUDNpXmWoj7N5jT
Dear all,

I have thousands of multipage tiff images which are compressed with JPEG. From time to time, I have to extract a single page from one of these documents.

My question is: How can I extract a single page from a multipage tiff which is JPEG-compressed without decompressing that page?

An example: I have two tiff files of 25 MB each. Using IM, I combine them into a multipage tiff using JPEG compression. The new multipage tiff is then about 4 MB. If I then use IM to extract one page from the new multipage tiff, the resulting file is again 25 MB (like the original one) instead of about 2 MB as expected.

This means that IM decompresses the page when extracting. This is not what I would like to do. I would like to extract the page (which already is compressed) in unaltered form.
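For background: a multipage TIFF stores each page as its own image file directory (IFD), and each IFD points to that page's already-compressed strips or tiles, so in principle a page's JPEG data can be copied out byte for byte. The Python sketch below walks the IFD chain and reports each page's Compression tag (259; value 7 is JPEG) and the total size of its compressed data. It assumes a classic (non-BigTIFF) file; `list_pages` and the other names are hypothetical and not part of IM or any library.

```python
import struct

# Sketch: list the pages (IFDs) of a classic TIFF and how each one is compressed.
# Assumes a classic TIFF (magic 42), not BigTIFF (magic 43).

TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}
COMPRESSION = {1: "none", 5: "LZW", 6: "old-style JPEG", 7: "JPEG", 8: "Deflate", 32773: "PackBits"}


def read_ifd(data, offset, endian):
    """Return ({tag: (type, count, raw_bytes)}, next_ifd_offset) for one IFD."""
    (n,) = struct.unpack_from(endian + "H", data, offset)
    entries = {}
    for i in range(n):
        tag, typ, count, field = struct.unpack_from(endian + "HHI4s", data, offset + 2 + 12 * i)
        size = TYPE_SIZES.get(typ, 1) * count
        if size > 4:  # value too large for the entry: the field is an offset to it
            (ptr,) = struct.unpack(endian + "I", field)
            field = data[ptr:ptr + size]
        entries[tag] = (typ, count, field[:size])
    (next_ifd,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * n)
    return entries, next_ifd


def tag_ints(entries, tag, endian):
    """Decode a BYTE/SHORT/LONG tag as a list of ints (empty if the tag is absent)."""
    if tag not in entries:
        return []
    typ, count, raw = entries[tag]
    return list(struct.unpack(endian + str(count) + {1: "B", 3: "H", 4: "I"}[typ], raw))


def list_pages(data):
    """Return one summary dict per page of a classic TIFF given as bytes."""
    endian = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF")
    pages, seen = [], set()
    while offset and offset not in seen:  # offset 0 ends the chain; `seen` guards loops
        seen.add(offset)
        entries, offset = read_ifd(data, offset, endian)
        comp = (tag_ints(entries, 259, endian) or [1])[0]  # Compression defaults to 1
        counts = tag_ints(entries, 279, endian) or tag_ints(entries, 325, endian)
        pages.append({
            "compression": COMPRESSION.get(comp, str(comp)),
            "chunks": len(counts),                 # number of strips or tiles
            "compressed_bytes": sum(counts),       # size of the stored (compressed) data
            "has_jpeg_tables": 347 in entries,     # JPEGTables tag present?
        })
    return pages
```

It could be run on the `multipage.tif` from the transcript, e.g. `list_pages(open("multipage.tif", "rb").read())`, to see how much compressed data each page actually occupies.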

I know that there are multiple other threads dealing with multipage tiffs, but only a few of them deal with JPEG compression, and I have only found one thread with an answer to this question (https://www.imagemagick.org/discourse-s ... hp?t=23053). Unfortunately, the answer does not work for me (see below).

This is on Windows 7 Pro x64 with the following version of IM:

X:\Scans (temp)>magick -version
Version: ImageMagick 7.0.3-3 Q16 x64 2016-10-08 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180040629
Features: Cipher DPC HDRI Modules OpenMP
Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib

Transcript of session showing the problem:

X:\Scans (temp)\test>dir
 Volume in drive X has no label.
 Volume Serial Number is 307E-16E3

 Directory of X:\Scans (temp)\test

09.10.2016  14:04    <DIR>          .
09.10.2016  14:04    <DIR>          ..
09.10.2016  08:45        26.120.362 Scan0936.tif
09.10.2016  11:54        26.120.362 Scan0937.tif
               2 File(s)     52.240.724 bytes
               2 Dir(s)  107.199.389.696 bytes free

X:\Scans (temp)\test>magick convert -compress jpeg -quality 70 Scan0936.tif Scan0937.tif multipage.tif

X:\Scans (temp)\test>dir
 Volume in drive X has no label.
 Volume Serial Number is 307E-16E3

 Directory of X:\Scans (temp)\test

09.10.2016  14:08    <DIR>          .
09.10.2016  14:08    <DIR>          ..
09.10.2016  14:08         3.766.898 multipage.tif
09.10.2016  08:45        26.120.362 Scan0936.tif
09.10.2016  11:54        26.120.362 Scan0937.tif
               3 File(s)     56.007.622 bytes
               2 Dir(s)  107.192.328.192 bytes free

X:\Scans (temp)\test>magick convert multipage.tif[0] page_1.tif

X:\Scans (temp)\test>magick convert multipage.tif series-%d.tif

X:\Scans (temp)\test>dir
 Volume in drive X has no label.
 Volume Serial Number is 307E-16E3

 Directory of X:\Scans (temp)\test

09.10.2016  14:10    <DIR>          .
09.10.2016  14:10    <DIR>          ..
09.10.2016  14:08         3.766.898 multipage.tif
09.10.2016  14:10        26.092.396 page_1.tif
09.10.2016  08:45        26.120.362 Scan0936.tif
09.10.2016  11:54        26.120.362 Scan0937.tif
09.10.2016  14:10        26.092.396 series-0.tif
09.10.2016  14:10        26.092.396 series-1.tif
               6 File(s)    134.284.810 bytes
               2 Dir(s)  107.114.037.248 bytes free

I would be very grateful if somebody could help me out here ...

Thank you very much,

Peter

Re: How to extract a page from a multipage tiff without decompressing?

Posted: 2016-10-09T07:28:57-07:00
by snibgo
6bUDNpXmWoj7N5jT wrote:I would like to extract the page (which already is compressed) in unaltered form.
That won't happen. When IM reads an image file, it decodes the image into memory, uncompressed. Then you can save the image in compressed format, if you want.

Output tiff format will not be jpeg compressed, unless you ask for it, eg:

magick convert multipage.tif[0] -compress jpeg page_1.tif
Reading a jpeg and saving a jpeg will generally change pixel data every time (because jpeg compression is lossy).
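To illustrate why re-saving loses information, here is a simplified Python sketch of JPEG's core step on one 8x8 block: the discrete cosine transform (DCT) itself is invertible, but quantizing the coefficients (dividing by a step size and rounding) discards detail. Real JPEG uses per-coefficient quantization tables, chroma subsampling and entropy coding; the single uniform `step` here is a simplification, and `encode_decode` is a hypothetical name.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def _c(k):
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """2-D DCT-II of an 8x8 block of numbers."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (2-D DCT-III)."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def encode_decode(pixels, step):
    """Simulate one lossy save/load of an 8x8 block of 0..255 values."""
    coeffs = dct2([[p - 128 for p in row] for row in pixels])  # level shift, as in JPEG
    quantized = [[round(c / step) * step for c in row] for row in coeffs]  # the lossy step
    return [[max(0, min(255, round(p + 128))) for p in row] for row in idct2(quantized)]
```

With a step of 40, a faint checkerboard (values 126 and 130) is flattened to uniform 128, while a negligible step returns the block unchanged; the loss comes from quantization, not from the transform.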

Re: How to extract a page from a multipage tiff without decompressing?

Posted: 2016-10-09T08:00:07-07:00
by 6bUDNpXmWoj7N5jT
Thank you very much for answering.
snibgo wrote:
6bUDNpXmWoj7N5jT wrote:I would like to extract the page (which already is compressed) in unaltered form.
That won't happen. When IM reads an image file, it decodes the image into memory, uncompressed. Then you can save the image in compressed format, if you want.
Thanks for expressing this in such a clear way. I think I have to use another software to extract a single page then.

The reason why I need the page in unaltered form is that I have to embed it in other documents, for example in a word processing application. The other applications I use are indeed able to embed image files without altering them. The rationale here is: accept no further loss of quality, but keep files as small as possible. The only way to achieve this is to extract the single page in compressed form (without decompressing and re-compressing) and to embed it into the new document (whatever that might be).
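Outside IM, the operation asked for here amounts to copying one IFD together with its compressed strips (or tiles) into a new file and rewriting the offsets. The Python sketch below illustrates that idea; it is not an existing tool, `extract_page` is a hypothetical name, and it assumes a classic (non-BigTIFF) file using new-style JPEG (Compression 7) or another self-contained codec, with no sub-IFDs or other offset-bearing tags apart from strips/tiles. The JPEGTables tag (347), if present, is copied along with the other tags. Treat it as a starting point, not a verified tool.

```python
import struct

# Sketch: copy page `page` of a classic TIFF into a new one-page TIFF, moving the
# already-compressed strip/tile bytes verbatim (no decode, no re-encode).
# Assumptions: classic TIFF (not BigTIFF); no old-style JPEG (Compression 6);
# tags that point to other data (SubIFDs, Exif, GPS) are dropped, not followed.

TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}
COUNT_TAG = {273: 279, 324: 325}  # StripOffsets -> StripByteCounts, TileOffsets -> TileByteCounts
SKIP_TAGS = {330, 34665, 34853}   # SubIFDs, Exif IFD, GPS IFD pointers


def extract_page(data, page):
    """Return the bytes of a one-page TIFF holding page `page` (0-based) of `data`."""
    endian = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, ifd = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("only classic TIFF is handled (BigTIFF uses magic 43)")
    for _ in range(page):  # follow the next-IFD links to the requested page
        (n,) = struct.unpack_from(endian + "H", data, ifd)
        (ifd,) = struct.unpack_from(endian + "I", data, ifd + 2 + 12 * n)
        if ifd == 0:
            raise IndexError("page out of range")

    # Read every entry of the chosen IFD as (type, count, raw value bytes).
    (n,) = struct.unpack_from(endian + "H", data, ifd)
    entries = {}
    for i in range(n):
        tag, typ, count, field = struct.unpack_from(endian + "HHI4s", data, ifd + 2 + 12 * i)
        if tag in SKIP_TAGS:
            continue
        size = TYPE_SIZES.get(typ, 1) * count
        if size > 4:  # the 4-byte field holds an offset to the value
            (ptr,) = struct.unpack(endian + "I", field)
            field = data[ptr:ptr + size]
        entries[tag] = (typ, count, field[:size])

    def ints(tag):
        typ, count, raw = entries[tag]
        return list(struct.unpack(endian + str(count) + {3: "H", 4: "I"}[typ], raw))

    if 259 in entries and ints(259)[0] == 6:
        raise NotImplementedError("old-style JPEG stores data via extra offset tags")
    off_tag = 273 if 273 in entries else 324
    chunks = [data[o:o + c] for o, c in zip(ints(off_tag), ints(COUNT_TAG[off_tag]))]
    # Store new offsets as LONG; a placeholder of the final size keeps the layout stable.
    entries[off_tag] = (4, len(chunks), bytes(4 * len(chunks)))

    # Layout: header (8) | IFD | out-of-line tag values | compressed image data.
    tags = sorted(entries)  # TIFF requires entries in ascending tag order
    pos = 8 + 2 + 12 * len(tags) + 4
    placed = {}
    for tag in tags:
        if len(entries[tag][2]) > 4:
            pos += pos % 2  # word-align each value
            placed[tag] = pos
            pos += len(entries[tag][2])
    new_offsets = []
    for chunk in chunks:
        pos += pos % 2
        new_offsets.append(pos)
        pos += len(chunk)
    entries[off_tag] = (4, len(chunks), struct.pack(endian + str(len(chunks)) + "I", *new_offsets))

    out = bytearray(data[:2] + struct.pack(endian + "HI", 42, 8))
    out += struct.pack(endian + "H", len(tags))
    for tag in tags:
        typ, count, raw = entries[tag]
        field = struct.pack(endian + "I", placed[tag]) if tag in placed else raw.ljust(4, b"\0")
        out += struct.pack(endian + "HHI", tag, typ, count) + field
    out += struct.pack(endian + "I", 0)  # next-IFD offset 0: this is the only page
    for tag in tags:
        if tag in placed:
            out += bytes(placed[tag] - len(out)) + entries[tag][2]
    for offset, chunk in zip(new_offsets, chunks):
        out += bytes(offset - len(out)) + chunk
    return bytes(out)
```

For example, `open("page_1.tif", "wb").write(extract_page(open("multipage.tif", "rb").read(), 0))` would write the first page with its JPEG data untouched. Tags such as PageNumber are copied unchanged.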
snibgo wrote:Output tiff format will not be jpeg compressed, unless you ask for it, eg:
OK, got that. The point is that under no circumstances do I want to re-compress. I just don't want to decompress in the first place ...

I would be interested in your opinion about making a related feature request. Do you think there is a chance that it would make it into IM? Probably the "decompress always" policy is a fundamental part of IM's software architecture ...

Thank you very much,

Peter

Re: How to extract a page from a multipage tiff without decompressing?

Posted: 2016-10-09T08:40:35-07:00
by snibgo
6bUDNpXmWoj7N5jT wrote:I just don't want to decompress in the first place ...
One of the tiff* tools may do what you want. When I want to simply manipulate tiff files without changing pixels, these are usually faster and use less memory than IM. They are available for Windows as part of the Cygwin toolset. (They may be available without having to download all the other Cygwin stuff.)
6bUDNpXmWoj7N5jT wrote:Do you think there is a chance that it would make it into IM?
I think there is no chance of that. As you say, the fundamental IM architecture is to decompress pixels into memory.

Incidentally, when you find a good solution, I encourage you to put a brief note on this thread: "My solution was the following command ..."

Re: How to extract a page from a multipage tiff without decompressing?

Posted: 2016-10-09T09:33:36-07:00
by 6bUDNpXmWoj7N5jT
snibgo wrote: One of the tiff* tools may do what you want. When I want to simply manipulate tiff files without changing pixels, these are usually faster and use less memory than IM. They are available for Windows as part of the Cygwin toolset. (They may be available without having to download all the other Cygwin stuff.)
This definitely is a great idea, notably because I am doing all sorts of things using Cygwin anyway. Obviously, I've been too fixated on IM (probably because it's my Swiss army knife for mass image manipulation).
snibgo wrote:
6bUDNpXmWoj7N5jT wrote:Do you think there is a chance that it would make it into IM?
I think there is no chance of that. As you say, the fundamental IM architecture is to decompress pixels into memory.
OK, thanks. Then I won't bother anyone with it.
snibgo wrote:Incidentally, when you find a good solution, I encourage you to put a brief note on this thread: "My solution was the following command ..."
I promise I will, probably in a few hours.

Thank you very much,

Peter

Re: How to extract a page from a multipage tiff without decompressing?

Posted: 2016-10-09T10:09:41-07:00
by 6bUDNpXmWoj7N5jT
OK, now I have installed the package "tiff" into Cygwin x64 (most recent version, all updates applied). The package contains, among others, the utilities "tiffsplit" and "tiffcp". Apparently, neither of them can handle tiff files with jpeg compression:

Using IM, I created two tiff files with jpeg compression. Then I used Cygwin's tiffcp to put them into a multipage tiff file. The resulting file contains garbage: neither IrfanView nor the tools bundled with Windows (Paint, Photo Editor etc.) nor any other application I have tested could open the file.

Then, again using IM, I put two tiff files into a multipage tiff file with jpeg compression. As expected, that file could be opened by any of the applications mentioned above. Then I used Cygwin's tiffsplit to split that multipage tiff file into single pages. The result was similar to the above: tiffsplit produced two files whose combined size was approximately the size of the multipage file, but the two files contained only garbage and could not be opened by any application I know.

So, unfortunately, Cygwin's tiff* tools cannot be used for such tasks.

Thank you very much,

Peter