OpenCL compilation requirements

Posted: 2010-05-25T09:23:46-07:00
by kyron
The question: how does one actually build ImageMagick with working OpenCL support? OpenCL is reported as compiled in, and the proper environment (NVIDIA hardware + driver version) seems to be in place, yet it fails at runtime.

The context is the following:
ImageMagick:

Code:

twitt2 imagemagick # convert -version
Version: ImageMagick 6.6.2-0 2010-05-25 Q8 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2010 ImageMagick Studio LLC
Features: HDRI OpenCL
GCC:

Code:

twitt2 imagemagick # gcc -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.4.2/work/gcc-4.4.2/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.4.2 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.2/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.4.2 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.4.2/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.4.2/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.2/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --without-ppl --without-cloog --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.4.2/python --enable-objc-gc --enable-languages=c,c++,java,objc,obj-c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.4.2 p1.0'
Thread model: posix
gcc version 4.4.2 (Gentoo 4.4.2 p1.0)
Nvidia (card):

Code:

02:00.0 VGA compatible controller: nVidia Corporation GT200b [GeForce GTX 275] (rev a1)
Nvidia (driver):

Code:

NVRM: loading NVIDIA UNIX x86_64 Kernel Module  195.36.24  Thu Apr 22 19:10:14 PDT 2010
So, OpenCL is said to be compiled in but fails with:

Code:

eric@twitt2 ~ $ convert barbara_gold1.bmp -convolve '-1, -1, -1, -1, 9, -1, -1, -1, -1' convolve.png
convert: failed to create OpenCL context `barbara_gold1.bmp' (-32) @ warning/accelerate.c/GetConvolveInfo/499.
The error throws me back to the lines of code pertaining to the initialization of the OpenCL environment even _before_ the device handle gets queried/created (in other words, before the nVidia driver would theoretically get involved).

I would like to know which compiler (and/or optional libraries) was used to test accelerate.c. There should probably also be a warning if IM is built with a compiler that doesn't actually implement the OpenCL API (I must admit I didn't expect the code to compile under plain GCC, but it did!). As I don't think GCC has native support for OpenCL, what should be added?

Thanks!

Re: OpenCL compilation requirements

Posted: 2010-05-25T10:23:59-07:00
by magick
Notice the -32 status. That is the status returned by clCreateContextFromType() in the OpenCL API. We don't have the docs handy to translate that error to a message. Have you tried the OpenCL demo projects? They give more meaningful diagnostics. If they run, ImageMagick with OpenCL support should also run without complaint.

Re: OpenCL compilation requirements

Posted: 2010-05-25T10:51:33-07:00
by kyron
Well, from cl.h, -32 means:

Code:

#define CL_INVALID_PLATFORM                         -32
I guess my question should be: "What environment did YOU devs get it to compile and work in"? Which compiler, driver version, hardware (when applicable).

Thanks ;)
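
For reference, a minimal sketch of turning such a status code into its symbolic name without opening cl.h each time. The numeric values are the ones defined in the Khronos cl.h header (only a small subset is listed); the names cl_status_to_name, cl_name_to_status and CL_STATUS_UNKNOWN are hypothetical helpers for illustration and are not part of OpenCL or ImageMagick.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subset of the status codes defined in the Khronos cl.h header. */
struct cl_status_name {
    int code;
    const char *name;
};

static const struct cl_status_name cl_status_names[] = {
    {   0, "CL_SUCCESS" },
    {  -1, "CL_DEVICE_NOT_FOUND" },
    {  -2, "CL_DEVICE_NOT_AVAILABLE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    {  -6, "CL_OUT_OF_HOST_MEMORY" },
    { -30, "CL_INVALID_VALUE" },
    { -31, "CL_INVALID_DEVICE_TYPE" },
    { -32, "CL_INVALID_PLATFORM" },
    { -33, "CL_INVALID_DEVICE" },
    { -34, "CL_INVALID_CONTEXT" },
};

#define CL_STATUS_COUNT (sizeof cl_status_names / sizeof cl_status_names[0])

/* Hypothetical sentinel: positive, so it cannot clash with the codes above. */
#define CL_STATUS_UNKNOWN 1

/* Return the symbolic name for a status code, or a placeholder string
   when the code is not in the subset above. */
const char *cl_status_to_name(int status)
{
    size_t i;
    for (i = 0; i < CL_STATUS_COUNT; i++)
        if (cl_status_names[i].code == status)
            return cl_status_names[i].name;
    return "UNKNOWN_CL_STATUS";
}

/* Reverse lookup: return the code for a symbolic name,
   or CL_STATUS_UNKNOWN when the name is not in the subset. */
int cl_name_to_status(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < CL_STATUS_COUNT; i++)
        if (strcmp(cl_status_names[i].name, name) == 0)
            return cl_status_names[i].code;
    return CL_STATUS_UNKNOWN;
}
```

Calling cl_status_to_name(-32) yields "CL_INVALID_PLATFORM". As far as the OpenCL specification goes, clCreateContextFromType() reports this code when its properties argument is NULL and no platform could be selected, or when the platform named in properties is not valid, so the failure points at platform selection rather than at the compiler used to build ImageMagick.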

Re: OpenCL compilation requirements

Posted: 2010-05-25T11:26:36-07:00
by magick
The OpenCL kernel is compiled and executed at runtime. The ImageMagick calls to the OpenCL API are compiled and linked when ImageMagick is compiled.

Re: OpenCL compilation requirements

Posted: 2010-05-26T05:36:09-07:00
by kyron
OK, it would seem that NVIDIA's CUDA-supported cards != OpenCL-supported cards. The list is not online, only in their SDK's PDF files, once you've found the proper SDK, that is (also non-trivial to find). Once I got the 3.0 SDK and attempted to install it, I got "nvcc fatal : Unsupported gpu architecture 'compute_20'" (which is consistent with my error message above).

All in all, this is obviously an NVIDIA problem: they are not being consistent with themselves. As of writing there is no such compute platform as 2.0; they range from 1.0 to 1.3 (p. 105 [95] of the CUDA C Programming Guide).

Note that even with a said-to-be-supported card (GeForce 8800 GTS, compute 1.0) I get the same error...

So, yes, do get the vendor's OpenCL reference and examples going, then try IM's implementation ;)

Re: OpenCL compilation requirements

Posted: 2010-06-21T12:43:01-07:00
by zaratol
Out of curiosity: does anyone have any experience with how much faster IM will get?

How much does it depend on the used commands?

For example will a simple convert tif->jpeg even profit?

Re: OpenCL compilation requirements

Posted: 2010-09-29T06:54:42-07:00
by kyron
zaratol wrote:Out of curiosity: does anyone have any experience with how much faster IM will get?
It would get _much_ faster if they implemented functions other than just the convolution.
zaratol wrote:How much does it depend on the used commands?
Currently, the doc is not clear and it seems only the convolution is OpenCL-enabled, but I would love to see this extended to other functions.
zaratol wrote:For example will a simple convert tif->jpeg even profit?
It definitely should, especially since the CUDA SDK includes a DCT example in both OpenCL and CUDA that should inspire a programmer to make an equivalent for ImageMagick. I am surprised there _seems_ to be little interest in OpenCL/GPU processing from the ImageMagick devs. I can't even seem to compile the current version (6.6.4-5) with OpenCL: even if I specify --enable-opencl, `identify --version` doesn't list OpenCL among the features.
Configure line:

Code:

./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-dependency-tracking --disable-static --enable-hdri --enable-opencl --with-threads --without-included-ltdl --with-ltdl-include=/usr/include --with-ltdl-lib=/usr/lib64 --with-modules --with-quantum-depth=8 --with-magick-plus-plus --with-perl --with-perl-options=INSTALLDIRS=vendor --with-gs-font-dir=/usr/share/fonts/default/ghostscript --with-bzlib --with-x --with-zlib --without-autotrace --without-dps --without-djvu --with-dejavu-font-dir=/usr/share/fonts/dejavu --without-fftw --without-fpx --without-fontconfig --without-freetype --without-gslib --without-gvc --without-jbig --with-jpeg --with-jp2 --without-lcms --without-lcms2 --without-lqr --without-openexr --with-png --without-rsvg --with-tiff --without-corefonts --without-wmf --with-xml --disable-openmp
Output of `identify --version`:

Code:

TWITT ImageMagick-6.6.4-5 # ./utilities/identify --version
Version: ImageMagick 6.6.4-5 2010-09-29 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2010 ImageMagick Studio LLC
Features: HDRI 

Re: OpenCL compilation requirements

Posted: 2010-09-29T12:08:06-07:00
by magick
ImageMagick is open-source. The developers illustrated how to use OpenCL within ImageMagick and leave it to you to take an interest and add additional capabilities. There is of course much work to be done, including improving the autoconf m4 macro for detecting OpenCL support. We would like to do more work with OpenCL ourselves but we are currently consumed with other issues.

Re: OpenCL compilation requirements

Posted: 2010-10-01T14:37:52-07:00
by kyron
magick wrote:ImageMagick is open-source. The developers illustrated how to use OpenCL within ImageMagick and leave it to you to take an interest and add additional capabilities. There is of course much work to be done, including improving the autoconf m4 macro for detecting OpenCL support. We would like to do more work with OpenCL ourselves but we are currently consumed with other issues.
I hope my post didn't come across as dismissive, or as if I were overlooking the great efforts behind ImageMagick. I am more than conscious that modifying the code base to adapt to OpenCL is far from trivial. I myself have been playing around with adapting Matlab->Mex->C->CUDA code and it's not a "walk in the park", as one might say.

This said, are you saying that there is a known issue with the m4 macros to enable OpenCL correctly (and that this would be why it's not being compiled as it should)?

Thanks!

Re: OpenCL compilation requirements

Posted: 2010-10-01T15:44:45-07:00
by magick
kyron wrote:This said, are you saying that there is a known issue with the m4 macros to enable OpenCL correctly
We only tested under Fedora Linux and Mac OS X.

Re: OpenCL compilation requirements

Posted: 2010-10-01T19:16:50-07:00
by kyron
magick wrote:We only tested under Fedora Linux and Mac OS X.
Would it be possible to get the environment information, such as the compiler, NVIDIA toolkit/driver, and support tool versions? I'd like to replicate the environment under Gentoo and get the build system to generate an OpenCL-capable version (that would be a start ;))

Thanks!

Re: OpenCL compilation requirements

Posted: 2010-10-02T09:12:24-07:00
by magick
That information is currently not available. We wiped our Fedora environment recently and have not yet reinstalled the NVidia OpenCL toolkit. We did just build ImageMagick under Mac OS X and it found and linked the OpenCL libraries as expected.

Re: OpenCL compilation requirements

Posted: 2010-10-04T07:43:03-07:00
by dproc
zaratol wrote:Out of curiosity: does anyone have any experience with how much faster IM will get?
Below are some of the things I've found in working with OpenCL and CUDA, especially with 2D FFTs. (I lean toward CUDA info only because I have a little more experience with it, not to imply it's better. I did have trouble finding known-good, versatile FFT code for OpenCL, so I don't have reliable timing info for that.)

1. nVidia sits their OpenCL implementation on top of CUDA.
2. OpenCL on CUDA is slightly slower than raw CUDA. I'm not sure if it has anything to do with sitting OpenCL on CUDA, or that they can optimize raw CUDA code better.
3. OpenCL still isn't all that standardized, even between different releases from the same company (AMD/nVidia).
4. Data transfers between host and GPU (for both CUDA and OpenCL, whether on nVidia or AMD) are quite slow--rivalling the total FFT time! So you won't see THAT much improvement with IM's OpenCL stuff, I wager. If you want real speed, put ALL your code in the GPU or at least minimize the massive data transfers.
5. The CUDA FFT lib (CUFFT) automatically adjusts number of threads to optimize for the current hardware.
6. CUFFT is based on FFTW, which is a good lib.
7. CUFFT seems to be well-tuned for their (nVidia's) hardware.
8. CUFFT does seem to have a latency proportional to N Log(N), as it should be.
9. You can't get to CUFFT from OpenCL on CUDA.
10. OpenCL can be run-time compiled, CUDA cannot.

I'm very curious to know what OpenCL FFT code IM uses (too lazy to look).

Using CUFFT (under CUDA, of course) on a Quadro FX5800 I found a 1024x1024 FFT (each pixel = 4-byte floating point value) ran about 40 times faster (not counting data transfer between host & GPU) than quality multi-threaded host code on a 4-core 3.2GHz CPU. When the data transfer is included in the time, it was not much faster than the CPU (I forget how much at the moment).
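
The effect described above can be put into a back-of-the-envelope model: the end-to-end speedup is the CPU time divided by the GPU kernel time plus the transfer time. The sketch below assumes that simple additive model (no overlap of copies and compute); the function names buffer_bytes and effective_speedup are hypothetical, and any millisecond figures plugged into it are illustrative placeholders, not measurements from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a width x height buffer with a fixed pixel size. */
size_t buffer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* End-to-end speedup when the GPU kernel alone is kernel_speedup times
   faster than the CPU, but each run also pays transfer_ms for copying
   data between host and GPU. Assumes copies and compute do not overlap. */
double effective_speedup(double cpu_ms, double kernel_speedup,
                         double transfer_ms)
{
    double gpu_total_ms;
    assert(cpu_ms > 0.0 && kernel_speedup > 0.0 && transfer_ms >= 0.0);
    gpu_total_ms = cpu_ms / kernel_speedup + transfer_ms;
    return cpu_ms / gpu_total_ms;
}
```

buffer_bytes(1024, 1024, 4) gives 4194304 bytes, i.e. 4 MiB to move in each direction for the image size mentioned. With a made-up CPU time of 40 ms and the 40x kernel speedup, effective_speedup(40.0, 40.0, 0.0) is 40, while adding a made-up 19 ms of transfer drops effective_speedup(40.0, 40.0, 19.0) to 2, which is the kind of collapse described in point 4.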

Re: OpenCL compilation requirements

Posted: 2010-10-04T11:02:36-07:00
by fmw42
IM's FFT is based on the FFTW delegate library, but I don't know anything about it using OpenCL. I am not a coder, so I cannot inform you further about that.

See http://www.fmwconcepts.com/imagemagick/ ... urier.html regarding using FFT with IM

Re: OpenCL compilation requirements

Posted: 2010-10-04T17:02:14-07:00
by dproc
I searched the IM source for OpenCL and found OpenCL stuff in accelerate.c

I see a kernel 'Convolve' function defined there as an OpenCL kernel. It ~looks~ like a time-domain convolve, and I ~think~ it is single threaded, but I've only glanced at the code. Even if I'm misinterpreting it and it's multithreaded, it will still crawl because it's time domain instead of frequency domain. But like I said, I might be completely misinterpreting how it's used. With the added time of moving the buffer between host & GPU and back again, this might be really slow. Has anyone benchmarked IM's OpenCL convolution against the CPU convolution (known FFT/IFFT using FFTW)? Just curious.
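
To make "time-domain convolve" concrete, here is a minimal CPU sketch of a direct 3x3 convolution using the kernel from the convert command at the top of the thread. It is not the code from accelerate.c: the function names, the single-channel float image layout, and the clamp-to-edge border handling are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The 3x3 kernel passed to -convolve in the first post, row by row. */
const float SHARPEN_KERNEL[9] = {
    -1.0f, -1.0f, -1.0f,
    -1.0f,  9.0f, -1.0f,
    -1.0f, -1.0f, -1.0f,
};

/* Clamp an index into [0, limit - 1] (assumed border handling). */
static int clamp_index(int i, int limit)
{
    if (i < 0) return 0;
    if (i >= limit) return limit - 1;
    return i;
}

/* Direct (spatial-domain) 3x3 convolution of a single-channel image.
   src and dst are row-major and must not alias each other. */
void convolve3x3(const float *src, float *dst, int width, int height,
                 const float kernel[9])
{
    int x, y, dx, dy;
    assert(src != NULL && dst != NULL && kernel != NULL);
    assert(width > 0 && height > 0);
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            float acc = 0.0f;
            for (dy = -1; dy <= 1; dy++) {
                for (dx = -1; dx <= 1; dx++) {
                    int sx = clamp_index(x + dx, width);
                    int sy = clamp_index(y + dy, height);
                    /* The flipped kernel index makes this a true
                       convolution; for a symmetric kernel the flip
                       changes nothing. */
                    acc += kernel[(1 - dy) * 3 + (1 - dx)]
                         * src[(size_t)sy * width + sx];
                }
            }
            dst[(size_t)y * width + x] = acc;
        }
    }
}
```

Each output pixel depends only on its 3x3 input neighbourhood, so the two outer loops are what a GPU version would typically spread across work-items. The cost is 9 multiply-adds per pixel for a 3x3 kernel and grows with the kernel area, so the frequency-domain approach mainly pays off for large kernels.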