I am testing the performance of the convert utility in ImageMagick 6.6.0-4 on Debian Squeeze on a dual quad-core Intel 5620 server (8 cores, 16 hardware threads).
I tested converting 400MB of JPEG images to a lower resolution. On this server, I got the following timings:
MAGICK_THREAD_LIMIT=1                 150 sec
normal                                 55 sec
waitloop                               28 sec
waitloop && MAGICK_THREAD_LIMIT=1      13 sec
Waitloop is a bash script I use to keep 16 copies of convert running in the background. I know convert automatically uses multiple threads to convert an image, so the slow timing of the first case (only one thread) is expected. It is also expected that waitloop would improve performance, because in the normal test the CPUs are only at 40% utilization.
What is surprising is that by disabling threading in convert and forcing 16 convert processes to run simultaneously, I get a 2x speedup over the plain waitloop case (13 sec vs. 28 sec). Perhaps your documentation should point out that setting MAGICK_THREAD_LIMIT=1 can improve performance when you are already running multiple convert processes; the architecture page does not make that clear:
http://www.imagemagick.org/script/architecture.php
If you want more details or a self-contained test case, please let me know.
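The waitloop script itself isn't shown in the post. As a rough sketch of the same idea, the Python below keeps up to 16 convert processes running at once, each with MAGICK_THREAD_LIMIT=1 in its environment, and reports the wall-clock time. It assumes a `convert` binary on the PATH; the function names, the `-resize 50%` target, and the output naming scheme are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(src, out_dir, resize="50%", convert="convert"):
    """Hypothetical job: convert SRC -resize RESIZE OUT_DIR/<same file name>."""
    src = Path(src)
    return [convert, str(src), "-resize", resize, str(Path(out_dir) / src.name)]


def job_env(thread_limit=1):
    """Copy of the environment with MAGICK_THREAD_LIMIT set (None = leave it unset)."""
    env = dict(os.environ)
    if thread_limit is None:
        env.pop("MAGICK_THREAD_LIMIT", None)  # let ImageMagick choose ("normal")
    else:
        env["MAGICK_THREAD_LIMIT"] = str(thread_limit)
    return env


def run_batch(files, out_dir, jobs=16, thread_limit=1, **kw):
    """Run one convert process per file, at most `jobs` at a time; return seconds."""
    os.makedirs(out_dir, exist_ok=True)
    env = job_env(thread_limit)

    def run_one(f):
        # Each pool thread only waits on its own convert subprocess.
        subprocess.run(build_command(f, out_dir, **kw), env=env, check=True)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(run_one, files))  # list() re-raises any failed job
    return time.perf_counter() - start
```

Comparing `run_batch(files, "out", thread_limit=1)` against `run_batch(files, "out", thread_limit=None)` on the same file list mirrors the last two rows of the timing table above. A thread pool is used only to wait on child processes; the image work itself happens in the separate convert processes.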
Threading slows down 'convert'
Re: Threading slows down 'convert'
It can be difficult to predict behavior in a parallel environment. Performance might depend on a number of factors, including the compiler, the version of the OpenMP library, the processor type, the number of cores, the amount of memory, whether hyperthreading is enabled, the mix of applications executing concurrently with ImageMagick, and the particular image-processing algorithm you use. The only way to be certain of the optimal number of threads is to benchmark. In ImageMagick 6.7.4-1 Beta, benchmarking a command runs it progressively with increasing numbers of threads and returns the elapsed time and efficiency for each thread count. This can help you identify how many threads are most efficient in your environment. Here is an example benchmark for 1 through 8 threads:
convert -bench 40 model.png -sharpen 0x1 null:
Performance[1]: 10i 0.712ips 1.000e 14.000u 0:14.040
Performance[2]: 10i 1.362ips 0.657e 14.550u 0:07.340
Performance[3]: 10i 2.033ips 0.741e 14.530u 0:04.920
Performance[4]: 10i 2.667ips 0.789e 14.590u 0:03.750
Performance[5]: 10i 3.236ips 0.820e 14.970u 0:03.090
Performance[6]: 10i 3.802ips 0.842e 15.280u 0:02.630
Performance[7]: 10i 4.274ips 0.857e 15.540u 0:02.340
Performance[8]: 10i 4.831ips 0.872e 15.680u 0:02.070
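To compare thread counts from this output programmatically, a small sketch can pull the thread count and images-per-second figure out of each `Performance[N]:` line and report throughput relative to the single-thread run. The field layout is inferred from the sample output above, not from a format specification, and the parser deliberately ignores the efficiency, user-time, and elapsed columns.

```python
import re

# Matches lines like "Performance[3]: 10i 2.033ips 0.741e 14.530u 0:04.920"
PERF_RE = re.compile(r"Performance\[(\d+)\]:\s+(\d+)i\s+([\d.]+)ips")


def parse_bench(text):
    """Return {threads: images_per_second} for every Performance line in text."""
    return {int(m.group(1)): float(m.group(3)) for m in PERF_RE.finditer(text)}


def speedups(ips_by_threads):
    """Throughput of each thread count relative to the 1-thread run."""
    base = ips_by_threads[1]
    return {n: ips / base for n, ips in sorted(ips_by_threads.items())}


sample = """\
Performance[1]: 10i 0.712ips 1.000e 14.000u 0:14.040
Performance[2]: 10i 1.362ips 0.657e 14.550u 0:07.340
Performance[8]: 10i 4.831ips 0.872e 15.680u 0:02.070
"""
ips = parse_bench(sample)
best = max(ips, key=ips.get)
print(best, round(speedups(ips)[best], 2))  # prints "8 6.79"
```

Highest throughput for a single command is not the whole story: when many such commands run at once, the replies below argue for fewer threads per process.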
Re: Threading slows down 'convert'
Rule of thumb
If the number of independent compute-intensive processes is comparable to the number of cores you have (or larger), they will run faster in "single core" mode (no single task spread across multiple processors).
(This is why embarrassingly parallel methods generally should run without any communication between parts: you're better off chopping the big tasks into very roughly equal parts than constantly rebalancing.)
Rule of thumb
If processes have high I/O or memory requirements, and it is not possible to make them run in parallel without significant I/O collisions, they should be run sequentially (one after another, instead of at once).
Example conclusion
If your server typically must handle more image processing requests than there are cores, each ImageMagick task should probably run on a single core (OpenMP disabled). The possible exception is when the input and/or output images are so large that you should run the jobs sequentially (more or less one after another), each with OpenMP enabled, in order to minimize the I/O bottleneck.
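The rules of thumb above can be condensed into a small decision function. This is a hedged sketch: `plan` is a hypothetical helper, the `io_heavy` flag stands in for the judgment that the images are "so large" that I/O dominates, and sharing the cores evenly (cores // jobs) when there are fewer jobs than cores is an assumption of this sketch rather than something the post prescribes. The second value would be what you pass as MAGICK_THREAD_LIMIT.

```python
import os


def plan(pending_jobs, cores=None, io_heavy=False):
    """Return (concurrent_processes, threads_per_process) per the rules of thumb.

    io_heavy: caller's judgment that the images are big enough for I/O to dominate.
    """
    cores = cores or os.cpu_count() or 1
    if pending_jobs <= 0:
        return (0, 0)
    if io_heavy:
        # Run jobs more or less one after another, each with OpenMP using all cores.
        return (1, cores)
    if pending_jobs >= cores:
        # At least as many jobs as cores: one single-threaded process per core.
        return (cores, 1)
    # Fewer jobs than cores (assumption): share the cores out evenly.
    return (pending_jobs, cores // pending_jobs)
```

For the original poster's batch, `plan(400, cores=16)` gives `(16, 1)`, i.e. 16 processes with MAGICK_THREAD_LIMIT=1, which matches the fastest row of the timing table; `plan(400, cores=16, io_heavy=True)` gives `(1, 16)`.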
-----
Overcoming these rules of thumb generally requires very tricky tuning or programming.
Last edited by NicolasRobidoux on 2012-03-11T18:58:35-07:00, edited 7 times in total.
- anthony
Re: Threading slows down 'convert'
Just some notes on IM parallelization...
Within IM, only individual image processing operations are parallelized. So the savings come mostly from processing large images, not from processing large numbers of images.
See Making IM Faster (in general)
http://www.imagemagick.org/Usage/api/#speed
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/