stream -extract without reading entire file from disk?


stream -extract without reading entire file from disk?

Post by killmoms »

I myself am not a developer (far from it!), but I was hoping a developer in this forum might be able to answer my question.

tl;dr version: Is there any way to stream -extract a row of pixels from an input image without having to read the entire input image off the disk? I'm currently trying to do this with IM's stream -extract and PPM input images. Should I be looking into IM's memory-mapped MPC format instead? (Trying that with stream so far has produced no errors, but zero-byte output files.) If this can't be done with ImageMagick, is there something else that can do it, e.g. NetPBM's "pamcut"? The documentation on its site seems unclear to me on this point.

The long version: This thread: viewtopic.php?f=1&t=25870

The medium version: I'm basically trying to take an arbitrarily large set of input images (all with the same pixel dimensions and color depth) and iteratively assemble a set of output images such that output image [n] is composed of row [n] from every input image. So output image 1 will contain the first row of pixels from every input image, one after the other; output image 2 will contain the second row of pixels from every input image, one after the other; and so on. At the end of this process I'll have as many output images as the input images have rows of pixels (and, it follows, the number of rows of pixels in each output image will be equal to the number of input files). I'm doing this with a bash script on Mac OS X because I am not a programmer (and barely a scripter). The full script has a lot of other hoo-hah that isn't relevant to this question, so I've made a (more hard-coded, less finesse-y) testing script, which currently looks like this:

Code:

#!/bin/bash
# ImageMagick Tests for mbarcode.sh

mkdir -p tempcrops

echo -e "Dumping PPMs..."
ffmpeg -an -sn -i ${1} -r 0.25 -vf "transpose=1" -pix_fmt rgb24 -f image2 -vcodec ppm tempcrops/f_%05d.ppm &> /dev/null

mkdir -p anim
FILES=($(find "tempcrops" -type f -name "*.ppm" | sort))

echo -e "Assembling barcodes..."
for (( c=0; c<1920; c++ )); do
	echo "$(printf %04d $((${c}+1))) of 1920..."
	# Pull row c from every frame and pipe the raw RGB bytes straight into convert.
	for f in "${FILES[@]}"; do
		stream -map rgb -storage-type char -extract 1080x1+0+${c} "${f}" -
	done |\
	convert -depth 8 -size 1080x${#FILES[@]} rgb:- -rotate -90 anim/barcode_$(printf %05d ${c}).png
done

rm -rf tempcrops
Currently, every method I've attempted has seemingly required reading the complete set of input images for every single output image assembled. With an SSD this is faster than it would be with a spinning disk, but not nearly as fast as I'd like. Confusingly (as noted in this post at the end of the above-referenced thread), when I tried this using "stream -extract" last night with roughly SD-sized, 1280-wide images (~1MB each), Activity Monitor in OS X seemed to indicate that I wasn't totally thrashing the SSD (each assembly took roughly 11 seconds). However, when I used the exact same code this morning on 1920-wide HD-sized images (~6.2MB each), suddenly I was pulling 200MB/s for every single assembly process (each taking roughly 53 seconds). Maybe this is down to the vagaries of how Activity Monitor samples disk activity, and I really was pulling full images with the SD-sized stuff and just didn't notice because they're smaller by a factor of six.

If that's the case, though, it means BOTH could be faster if I could find a way to avoid reading the entire input file off the disk when I know exactly which part (row) I want.

It was suggested in that thread that I might try outputting each row to its own file first, and then appending the rows with a clever loop (sketched below). However, that requires generating something on the order of 3.6 million 2-3 KB temp files for an HD input (1920 images with 1920 rows each), and since I'm on a Mac (with fseventsd running and relying on the increasingly creaky HFS+ file system), that's something I'd like to avoid if at all possible.
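For reference, I think the per-row temp-file approach would look something like the sketch below. This is untested and assumes that convert's plain -crop 1080x1 (no offset) tiles each frame into numbered one-row files the way I expect:

Code:

#!/bin/bash
# Untested sketch of the suggested temp-file approach: split every frame into
# 1080x1 strips once, then assemble each barcode from the matching strips.
for f in tempcrops/f_*.ppm; do
	# -crop 1080x1 with no offset tiles the 1080x1920 frame into one file per row.
	convert "$f" -crop 1080x1 +repage "${f%.ppm}_row_%04d.ppm"
done

for (( c=0; c<1920; c++ )); do
	# Stack row c from every frame, then rotate as in the script above.
	convert tempcrops/f_*_row_$(printf %04d ${c}).ppm -append \
		-rotate -90 anim/barcode_$(printf %05d ${c}).png
done
That's only one splitting call per frame plus 1920 assembly calls, but roughly 3.6 million files sitting on disk in between, which is exactly what I'd rather not do.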

Re: stream -extract without reading entire file from disk?

Post by killmoms »

Email says someone responded to this thread, but I see no response. Perhaps forum emails are confused. In any case, I think I'm butting up against the old "P vs NP problem" here, as my programmer friend put it.

So, no worries! This is what happens when someone who hasn't studied computer science goes "but WHY?" without having the foundational understanding to know why the question he's asking doesn't make sense. :lol:

Re: stream -extract without reading entire file from disk?

Post by snibgo »

Someone did "reply" to the topic, but it was spam with no relevance to the topic or IM or images or anything else, so I deleted it (and banned the user).

Sorry, I know nothing about stream, or any other pre-built method for reading some pixels without reading the entire file. A simple program could be written that reads a binary PNM image file's header, seeks to the correct position in the file, and reads the pixels. Perhaps this could be done in bash.
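For example, here is a rough, untested sketch of that idea in bash. It assumes a simple ffmpeg-style P6 header (three lines: "P6", "WIDTH HEIGHT", "255", with no comment lines) and 8-bit samples; the file and row arguments are just placeholders:

Code:

#!/bin/bash
# Untested sketch: emit one row of raw RGB bytes from a binary PPM without
# decoding the whole image. Assumes a three-line P6 header, no comment lines.
file="$1"   # input .ppm
row="$2"    # 0-based row to extract

hdrlen=$(head -n 3 "$file" | wc -c)                        # header size in bytes
width=$(head -n 2 "$file" | tail -n 1 | awk '{print $1}')  # first number on line 2

rowbytes=$(( width * 3 ))              # 3 bytes per pixel for 8-bit RGB
offset=$(( hdrlen + row * rowbytes ))  # byte position of the row's first pixel

# tail -c +N is 1-based, so skip "offset" bytes, then keep one row's worth.
tail -c +$(( offset + 1 )) "$file" | head -c "$rowbytes"
Since that emits the same raw 1080x1 RGB bytes as the stream -extract call, it should be able to replace it in the inner loop of the script above, with the convert rgb:- assembly left unchanged.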
snibgo's IM pages: im.snibgo.com