I need to process some very large multipage TIFF documents (some exceed 10,000 frames/pages). When the collection is opened, MagickImageCollection appears to populate a pixel cache for every frame of the image (about 25 MB per image, so on the order of 250 GB for a 10,000-frame file). This quickly consumes large amounts of disk space and in some cases fills the disk completely, causing crashes. This generally happens when several high-page-count documents are processed at around the same time.
Is there a method in Magick.NET to get the total frame count from the file before MagickImageCollection processes/opens it? I haven't been able to find one. I would like to handle the pages in batches to minimize the disk impact, and the following code works quite well as long as I know the frame count up front. With a batch size of 25 it doesn't generate any cache files at all (which is ideal); with a batch size of 50 it generates 15 to 20 cache files, depending on the source material used.
```csharp
var fileName = "testFile.tiff";
var batchSize = 25;
var frameCount = this.GetFrameCount(fileName); // How do I do this part?

for (var i = 0; i < frameCount; i += batchSize)
{
    var settings = new MagickReadSettings();
    // Never request past the last frame: the final batch may be smaller.
    settings.FrameCount = Math.Min(batchSize, frameCount - i);
    settings.FrameIndex = i;

    using (var imageCollection = new MagickImageCollection(fileName, settings))
    {
        foreach (var image in imageCollection)
        {
            // Do Stuff Here.
        }
    }
}
```
Also open to any other suggestions on how to address this.
Additional note: the reason I need the full frame count is that if you request, for example, frame 101 from a 100-frame image, MagickImageCollection disregards the read settings and begins loading all of the frames from the file into the cache (this seems like a bug?).
This could be resolved by returning an empty collection when none of the requested frames exist (or a partial collection when only some of them exist). Throwing an exception would also work, but it doesn't help with this specific issue if you don't already know the full frame count.
I'm now getting the frame count from Microsoft's TiffBitmapDecoder, which has allowed me to move forward. I would still prefer to get this information from Magick.NET without populating the pixel cache, if at all possible.
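A dependency-free alternative to TiffBitmapDecoder is to walk the TIFF file's chain of image file directories (IFDs) directly. In a TIFF, each page has one top-level IFD, and each IFD ends with the offset of the next one (0 for the last), so counting pages only means reading a few header bytes per page; no pixel data is decoded. The following is a minimal sketch with a hypothetical `TiffPageCounter` helper (not part of Magick.NET). It assumes the pages you care about are top-level IFDs (SubIFDs such as embedded thumbnails are not counted), handles both classic TIFF and BigTIFF headers, and should be treated as an approximation of what ImageMagick reports as the frame count for unusual files.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not a Magick.NET API): counts the pages of a TIFF by
// following the chain of top-level IFDs. Only header bytes are read; no pixel
// data is decoded, so nothing is written to a pixel cache.
public static class TiffPageCounter
{
    public static int CountPages(string path)
    {
        using var stream = File.OpenRead(path);
        return CountPages(stream);
    }

    public static int CountPages(Stream stream)
    {
        // Bytes 0-1: byte order, "II" (little-endian) or "MM" (big-endian).
        byte[] order = ReadBytes(stream, 2);
        bool le;
        if (order[0] == 'I' && order[1] == 'I') le = true;
        else if (order[0] == 'M' && order[1] == 'M') le = false;
        else throw new InvalidDataException("Not a TIFF file.");

        // Bytes 2-3: 42 for classic TIFF, 43 for BigTIFF.
        ulong version = ReadUInt(stream, 2, le);
        bool big = version == 43;
        if (version != 42 && !big) throw new InvalidDataException("Unknown TIFF version.");
        if (big) ReadUInt(stream, 4, le); // BigTIFF: offset size (8) + reserved (0).

        int offsetSize = big ? 8 : 4;
        ulong offset = ReadUInt(stream, offsetSize, le); // Offset of the first IFD.
        var visited = new HashSet<ulong>();
        int pages = 0;

        while (offset != 0)
        {
            // Guard against malformed files whose IFD chain loops back on itself.
            if (!visited.Add(offset)) throw new InvalidDataException("IFD cycle detected.");
            stream.Seek((long)offset, SeekOrigin.Begin);
            ulong entries = ReadUInt(stream, big ? 8 : 2, le);
            // Skip the directory entries: 12 bytes each (classic) or 20 (BigTIFF).
            stream.Seek((long)entries * (big ? 20 : 12), SeekOrigin.Current);
            offset = ReadUInt(stream, offsetSize, le); // Next IFD, or 0 if this is the last.
            pages++;
        }
        return pages;
    }

    private static byte[] ReadBytes(Stream stream, int count)
    {
        var buffer = new byte[count];
        int read = 0;
        while (read < count)
        {
            int n = stream.Read(buffer, read, count - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return buffer;
    }

    private static ulong ReadUInt(Stream stream, int size, bool le)
    {
        byte[] b = ReadBytes(stream, size);
        return size switch
        {
            2 => le ? BinaryPrimitives.ReadUInt16LittleEndian(b) : BinaryPrimitives.ReadUInt16BigEndian(b),
            4 => le ? BinaryPrimitives.ReadUInt32LittleEndian(b) : BinaryPrimitives.ReadUInt32BigEndian(b),
            _ => le ? BinaryPrimitives.ReadUInt64LittleEndian(b) : BinaryPrimitives.ReadUInt64BigEndian(b),
        };
    }
}
```

With a helper like this, the batching loop in the question could call `TiffPageCounter.CountPages(fileName)` in place of `this.GetFrameCount(fileName)`, and every `FrameIndex` it requests would then stay within bounds.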
Can you share an image that demonstrates this issue? Is this an image with Photoshop layers, or TIFF pages? And have you tried using the Ping method of MagickImageCollection to get the number of images?
It would also be nice if you could demonstrate the issue where requesting a frame index that is too high doesn't return an empty collection, or where requesting frames outside the boundary returns frames that fall outside the requested range. I cannot reproduce that issue with the latest version of Magick.NET.
Cache files are generated when the index is out of bounds; however, no images are loaded into the collection when this occurs. I would expect either an exception in that scenario, or an empty collection returned without generating any cache files. Also, some cache files are intermittently left behind in this scenario (possibly due to contention from the file watcher I use to count the cache files; I haven't investigated that further).
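On the file-watcher question: a probe that only enumerates file names never opens the cache files, so it cannot hold a handle that would stop ImageMagick from deleting them (on Windows, an open handle without delete sharing does block deletion). The following is a minimal sketch of such a probe as a hypothetical `CacheFileProbe` helper (not a Magick.NET API). It assumes ImageMagick names its temporary/pixel-cache files with a `magick-` prefix and that they are written to a directory you control (Magick.NET's `MagickNET.SetTempDirectory` should let you point them at a dedicated folder); verify both assumptions against your own setup.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper (not a Magick.NET API).
// ASSUMPTION: ImageMagick's temporary/pixel-cache files use a "magick-" prefix.
public static class CacheFileProbe
{
    // Counts matching file names without opening any of the files, so the
    // probe itself holds no handles that could interfere with deletion.
    public static int CountCacheFiles(string directory, string prefix = "magick-")
    {
        if (!Directory.Exists(directory)) return 0;
        try
        {
            return Directory.EnumerateFiles(directory, prefix + "*").Count();
        }
        catch (DirectoryNotFoundException)
        {
            // The directory was removed while it was being scanned.
            return 0;
        }
    }
}
```

Taking a count before a batch read and again after the collection is disposed would show whether an out-of-range `FrameIndex` really leaves files behind, independent of any contention from a watcher that opens the files.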