Compositing thousands of images
Posted: 2018-06-20T05:00:57-07:00
by nick18
IM Version: 6.9.7-4 Q16 x86_64
OS: Ubuntu
Script type: Bash Script
Via a Bash script I'm reading coordinates (X:Y) line by line from a text file (which has several thousand lines), using them to position one of five small (30x30) images at those coordinates on a canvas, and then saving the result out as one single image.
I've been bashing my head against the documentation and Google, and so far I do have it working, but only with one of the small images (rather than all 5), and it takes ages to complete.
This is my current code:
Code: Select all
while IFS=':' read -r x y
do
    convert -size 5000x5000 /var/www/html/magic/images/small_image_one.png -repage +$x+$y miff:-
done < /var/www/html/magic/values.txt | convert -size 5000x5000 canvas:none miff:- -layers merge +repage /var/www/html/magic/images/output3.png
I know this isn't the right way to be doing it, but currently I'm creating a canvas at the size of the final output (5000x5000), repaging the small image to the coordinates, and then piping the MIFF data to the second convert, where the layers are merged together.
I'd also like to read the small image's filepath from this text file (which I can do no problem), but as it's the same 5 images used over and over again, I'm thinking they could be instantiated once and then reused (as per Point 8 on
https://stackoverflow.com/questions/287 ... le_rich_qa)
Can anyone point me in the right direction of a better technique? Thank you for your time.
Re: Compositing thousands of images
Posted: 2018-06-20T06:12:19-07:00
by snibgo
Your first "-size 5000x5000" does nothing, because the first convert doesn't create a canvas.
Yes, re-reading and de-compressing those five PNG files thousands of times wastes time. You could first convert them to MIFF or MPC files, and read those instead of PNG files.
To avoid re-reading the five images, you can read them in once, save them as mpr:, and use those. This means a single convert command (instead of thousands) which will also save time.
To do that, your script should write another script, then execute "convert" just once with that script.
In v6, you run such a script with a single command, something like "convert @myscript.scr output.png".
The extension ".scr" is my own convention. myscript.scr is a plain text file, something like:
Code: Select all
small_image_one.png -write mpr:p1 +delete
small_image_two.png -write mpr:p2 +delete
small_image_three.png -write mpr:p3 +delete
small_image_four.png -write mpr:p4 +delete
small_image_five.png -write mpr:p5 +delete
-size 5000x5000 canvas:none
( mpr:p1 -repage +12+23 )
( mpr:p3 -repage +12+23 )
{etc as required}
( mpr:p2 -repage +45+56 )
-layers merge +repage
(Note there are no line-continuation characters.)
All the lines like "( mpr:p1 -repage +12+23 )" would be derived from data in your text file values.txt.
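As a rough sketch, the generating script could look something like this. It assumes values.txt is written as "register:x:y" lines (e.g. "p1:12:23") so each line names one of the five mpr: registers; adjust the field layout and the paths to whatever your script really produces.
Code: Select all
#!/bin/bash
# Sketch: write myscript.scr from values.txt, then run convert once.
# Assumes lines of the form "p1:12:23" (register:x:y).

scr=/var/www/html/magic/myscript.scr

{
    # Load the five tiles once and park them in mpr: registers.
    echo "small_image_one.png   -write mpr:p1 +delete"
    echo "small_image_two.png   -write mpr:p2 +delete"
    echo "small_image_three.png -write mpr:p3 +delete"
    echo "small_image_four.png  -write mpr:p4 +delete"
    echo "small_image_five.png  -write mpr:p5 +delete"
    echo "-size 5000x5000 canvas:none"

    # One parenthesised clone per line of values.txt.
    while IFS=':' read -r reg x y; do
        echo "( mpr:$reg -repage +$x+$y )"
    done < /var/www/html/magic/values.txt

    echo "-layers merge +repage"
} > "$scr"

convert @"$scr" /var/www/html/magic/images/output3.png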
Re: Compositing thousands of images
Posted: 2018-06-20T09:05:22-07:00
by GeeMack
nick18 wrote: ↑2018-06-20T05:00:57-07:00
I'd also like to read the small image's filepath from this text file (which I can do no problem), but as it's the same 5 images used over and over again, I'm thinking they could be instantiated once and then reused (as per Point 8 on
https://stackoverflow.com/questions/287 ... le_rich_qa)
I use IM to build the output image for a maze script. It uses 16 different 32x32 images, and locates them as needed to make the walls of the maze. The locations are read from a text file, and there may be as many as 40,000 clones of the images for that many sets of coordinates for a large maze. I may have some ideas for you, but I'd like to know how your script knows which of the 5 images to place at the particular offsets.
Re: Compositing thousands of images
Posted: 2018-06-20T16:12:34-07:00
by nick18
Thank you both for your answers - much appreciated!
GeeMack - the small images are picked at random by a completely separate script that generates the coordinates txt file, and currently I have it returning the full path to the image as part of each line, e.g.:
/var/www/html/small_images/small_one.png:10:32
/var/www/html/small_images/small_three.png:40:32
/var/www/html/small_images/small_five.png:70:32
Each line gets split on the ":" character, so in the Bash script the parameters become:
Image Path (e.g. /var/www/html/small_images/small_three.png)
X coordinate (e.g. 40)
Y coordinate (e.g. 32)
I have full control over the script that creates the txt file so this can be changed in any way (e.g. if we needed to have a name for the image instead of a path etc).
Thank you again!
Re: Compositing thousands of images
Posted: 2018-06-20T22:12:52-07:00
by GeeMack
nick18 wrote: ↑2018-06-20T16:12:34-07:00
I have full control over the script that creates the txt file so this can be changed in any way (e.g. if we needed to have a name for the image instead of a path etc).
With the option to write the list of input images any way you want, you might consider doing something like
snibgo's suggestion above with a bash script like this...
Code: Select all
convert image1.png image2.png image3.png image4.png image5.png -repage 480x480 \
-write mpr:input -delete 0--1 @data.im -background none -flatten output.png
Then create your text file list of images and offsets, essentially an IM script named "data.im", with lines like this...
Code: Select all
( mpr:input[0] -repage +240+360 )
( mpr:input[1] -repage +120+240 )
( mpr:input[2] -repage +240+180 )
( mpr:input[3] -repage +360+90 )
( mpr:input[1] -repage +210+120 )
( mpr:input[4] -repage +150+240 )
( mpr:input[0] -repage +180+300 )
The five input images are read in at the beginning of the command script and held in a sort of IM memory array named "mpr:input". Then those memory images are called into the script from another text file, each with an index from 0 to 4 like "mpr:input[0]". Each line of that text file specifies one input by its index and sets the paging for it. Then it finishes with the command script flattening all those images onto the transparent canvas.
That only reads the original input images once, only runs "convert" once, and only writes one output image.
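If your placement file keeps the full paths, one way to turn it into that data.im file is a small loop like this. It's only a rough sketch: the file names follow the samples you posted, and the index chosen for each name must match the order the five images appear on the convert command line.
Code: Select all
#!/bin/bash
# Sketch: turn "path:x:y" lines into data.im lines such as
#   ( mpr:input[2] -repage +40+32 )

while IFS=':' read -r path x y; do
    case "$(basename "$path")" in
        small_one.png)   i=0 ;;
        small_two.png)   i=1 ;;
        small_three.png) i=2 ;;
        small_four.png)  i=3 ;;
        small_five.png)  i=4 ;;
        *) echo "unknown tile: $path" >&2; continue ;;
    esac
    echo "( mpr:input[$i] -repage +$x+$y )"
done < /var/www/html/magic/values.txt > data.im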
Re: Compositing thousands of images
Posted: 2018-06-21T06:05:39-07:00
by nick18
Thank you - that worked nicely and was very easy to implement! The only challenge I have is that some of the images I'll be creating involve an insane number of these smaller images (250,000 to be exact), and while this new code does work, it takes a rather long time.
I can see that the process of parsing the data.im file (which holds which image to use plus the X and Y) only uses one CPU core, so my next question: is there some way within this IM pipe to multithread it, or would I need to look at creating several of these files and then processing them as separate tasks run in parallel (e.g. via GNU Parallel or a higher-level language with pthreads)?
Thank you again!
Re: Compositing thousands of images
Posted: 2018-06-22T15:11:44-07:00
by GeeMack
nick18 wrote: ↑2018-06-21T06:05:39-07:00
Thank you - that worked nicely and was very easy to implement! The only challenge I have is that some of the images I'll be creating involve an insane number of these smaller images (250,000 to be exact), and while this new code does work, it takes a rather long time.
I don't know much about managing memory or CPU, so I can't help there, but there may be a more efficient way to place all the images. It requires you do a couple of things in the script before the IM command.
(1) Pre-sort the list of images so the +0+0 image is first, then +1n+0, +2n+0, and so on, then "sed" everything off the list except just the names of the memory registers, like "mpr:input[4]", in the order they'll be laid on the canvas, left to right, top to bottom. Then you don't need to set paging information or flatten all those images. No parentheses, no offsets, no settings, nothing but the list. (A sketch of this preparation step is shown at the end of this post.)
(2) Determine how many rows high your grid of tiles will be, and send that into the IM command as a shell variable.
It works like this: Read the ordered list of images, "+append" them into a single horizontal image, crop that into the total number of rows high, "-append" those rows vertically, and you're done. It could look like this inside the IM command...
Code: Select all
... @data.im +append +gravity -crop ${rows}x1@ -append +repage ...
I did a rough test with 3600 tiles, 48x48 pixels each, for a 60x60 grid. The slower way was to set the paging information on each tile and flatten them all into their proper locations. It was radically faster to have the system tools sort and prepare the list of input images in the proper order, then +append, crop, and -append all the images to build the output. It still took a few seconds, but the difference in speed wasn't even a contest.
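Putting (1) and (2) together, the preparation could look roughly like this. It's only a sketch: it assumes 30x30 tiles, "path:x:y" lines whose x/y values are multiples of the tile size, and that every grid position is filled exactly once; the image and file names are illustrative.
Code: Select all
#!/bin/bash
# Sketch: build the ordered, bare tile list for the +append / -crop / -append method.

tile=30
list=/var/www/html/magic/values.txt

# Map the five file names to mpr:input indexes 0..4 (must match the order
# the images are read on the convert command line).
declare -A idx=( [small_one.png]=0 [small_two.png]=1 [small_three.png]=2
                 [small_four.png]=3 [small_five.png]=4 )

# Sort top-to-bottom, then left-to-right, and keep only the register names.
sort -t: -k3,3n -k2,2n "$list" | while IFS=':' read -r path x y; do
    echo "mpr:input[${idx[$(basename "$path")]}]"
done > data.im

# Rows in the grid = (largest y / tile height) + 1.
rows=$(( $(cut -d: -f3 "$list" | sort -n | tail -1) / tile + 1 ))

convert image1.png image2.png image3.png image4.png image5.png \
    -write mpr:input -delete 0--1 \
    @data.im +append +gravity -crop ${rows}x1@ -append +repage output.png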
Re: Compositing thousands of images
Posted: 2018-06-22T16:03:07-07:00
by snibgo
On performance: Your inputs are PNG, which is integer-only. You don't need floating-point pixels, so ensure your version of IM isn't HDRI. If your inputs are all 8-bit, I suggest you install and use a Q8 non-HDRI version of IM.
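You can see which build you currently have from the version banner:
Code: Select all
convert -version
# Look for "Q8" or "Q16" in the version line, and for "HDRI" in the
# Features line. A Q8 non-HDRI build uses the least memory per pixel.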
On parallel processing: the code that appends images, AppendImages() in image.c, works by calculating the required size of the output image, then building it with background-colour pixels. Then comes the slow part: for each input image, for each row, for each column, copy the pixels from the input to the output.
IM can use OpenMP, but only within each input image, by dividing the rows of an input image among threads. If your input images are small (e.g. 30x30), I doubt that this parallelism will be useful. It may not even be used.
If you can organise the work so each row of images is independent of the other rows, I expect that is a more productive use of parallelism. Then, tell each instance of IM to use only one thread.
Is this a one-off job, or something you want to run repeatedly? If repeatedly, it might be worth writing a program in C, C++ or whatever, to do the entire job. The program can sort the tiles, process each row of tiles in parallel, then append them vertically.
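If you would rather stay in Bash, a very rough sketch of that shape using GNU Parallel could look like this. The chunk_*.im file names, the 5000x500 strip size and final.png are all made up for illustration; MAGICK_THREAD_LIMIT keeps each convert on a single thread, and the parallelism comes from running many converts at once.
Code: Select all
#!/bin/bash
# Sketch: one single-threaded convert per chunk, run in parallel, then the
# finished strips are appended vertically.

render_chunk() {
    local im_file="$1"
    # Limit this convert to one thread; each chunk gets its own process.
    MAGICK_THREAD_LIMIT=1 convert \
        image1.png image2.png image3.png image4.png image5.png \
        -repage 5000x500 -write mpr:input -delete 0--1 \
        @"$im_file" -background none -flatten "${im_file%.im}.png"
}
export -f render_chunk

# One job per CPU core.
parallel -j "$(nproc)" render_chunk ::: chunk_*.im

# Stack the strips top to bottom (chunk names must sort in row order,
# e.g. chunk_00.png, chunk_01.png, ...).
convert chunk_*.png -append final.png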
Re: Compositing thousands of images
Posted: 2018-06-28T23:56:37-07:00
by nick18
Apologies for the slow update, and thank you again to everyone for your suggestions. I've got this up and running in a multi-threaded fashion that processes a part of the final image, saves it, and then (as a separate step once this phase completes) assembles all these images into one final big image.
It's almost working, but I have a problem where around half of these images are output as 1x1 pixel transparent images and the other half are rendered correctly. Testing with a smaller set (i.e. fewer images being composited in each 'chunk') onto a smaller canvas, everything completes fine, so it seems there is some resource limit I'm hitting once I go up to the larger size.
I'm using this code, with one change: the images (everything after convert, up to the -repage command) are read in from an .IM script that just contains the paths to them:
Code: Select all
convert image1.png image2.png image3.png image4.png image5.png -repage 500x15000 \
-write mpr:input -delete 0--1 @${image_placement_im_file_to_use} -background none -flatten ${output_filename}
The error returned on the ones that fail is:
convert: geometry does not contain image `input' @ warning/attribute.c/GetImageBoundingBox/247.
I've checked the resource limits in policy.xml, and changed them from the automatic values to the following:
temporary-path: /tmp
memory: 120GiB
map: 120GiB
area: 120GB
disk: 120EB
thread: 4
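For completeness, the effective limits (after policy.xml and any environment variables) can be double-checked with:
Code: Select all
identify -list resource
# Prints the Area, Disk, Map, Memory, Thread, ... limits IM will actually use.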
This is running on a server with 96 cores and 320GB of RAM, and watching the processes via htop doesn't show it even getting close to the RAM limit. I've tried changing the parallel code that calls the IM script which renders each chunk, but even reducing that to 2 threads doesn't help. Enabling debug also doesn't show anything else in the log.
Could anyone suggest why this might be occurring?
Re: Compositing thousands of images
Posted: 2018-06-29T00:49:20-07:00
by snibgo
Please show some sample lines from your script file.
I suspect that your 1x1 problem is caused by a problem in that script, rather than a resource limit.
1. I suggest you insert "+repage" immediately after reading the five inputs. This will ensure none of them has a canvas offset (see the sketch at the end of this post).
2. The error message "geometry does not contain image" typically occurs when an image is cropped to a rectangle that may be inside the image's canvas but is outside the image itself.
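For point 1, keeping everything else in your command as it is, that would look something like this (substitute your real file names and variables):
Code: Select all
convert image1.png image2.png image3.png image4.png image5.png +repage \
    -repage 500x15000 -write mpr:input -delete 0--1 \
    @${image_placement_im_file_to_use} -background none -flatten ${output_filename}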
Re: Compositing thousands of images
Posted: 2018-07-02T02:31:31-07:00
by nick18
Thank you - that was indeed the problem!
After some more testing I've found there is one aspect that is killing the performance on large images, and that is the -repage [X]x[Y] part.
In tracking this down I created a transparent image and saved it to disk using this call:
Code: Select all
time convert -size 10000x10000 xc:none /var/www/html/im_testing/10k_canvas.png
This completes in 2.546 seconds, which is great. However, if I run this same command with the canvas size I'm using in the -repage [X]x[Y] part of my script for my real-world image, via this code:
Code: Select all
time convert -size 30000x17910 xc:none /var/www/html/im_testing/30k_canvas.png
It takes around 7 minutes 23 seconds to complete.
I also tried setting -repage 0x0 and the canvas did grow according to the coordinates of where the images were placed, but with anything more than about 100 images the performance tanked.
As another test, I created a blank 30000x17910 image using VIPS, which completed in 7.477 seconds. So is there a way I can create this image first and then have my IM script position the images onto it, rather than onto a virtual canvas? Or is there another approach that would remove this bottleneck?
Thank you all again - greatly appreciated.
Re: Compositing thousands of images
Posted: 2018-07-02T13:15:55-07:00
by snibgo
"-repage" merely changes metadata. It doesn't seem to cost any time, and I wouldn't expect it to. (But then doing a "-layers" operation might take longer.)
IM stores pixels in a pixel cache, usually in memory. But if there is not enough space, they are stored on disk, with a big performance penalty.
VIPS has a different architecture that reduces the performance problem on large images.
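If you want to check which of the two is happening in your case, cache debugging will show it. A quick sketch, reusing your 30000x17910 test (the log file name is arbitrary):
Code: Select all
convert -debug cache -size 30000x17910 xc:none /tmp/30k_canvas.png 2> cache.log
# The Cache events in cache.log report, for each image, whether the pixel
# cache was allocated in memory or backed by a file on disk.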