Using OpenCL is slower for resize
Posted: 2014-10-19T01:14:43-07:00
Hi,
In the latest version, ImageMagick-6.8.9-8, I compiled with OpenCL and tested resize (convert src.jpg -filter box -resize 248x248 -bench 30 logo.jpg),
but the performance with OpenCL is slower than with CPU-only OpenMP.
I use CUDA 6.5.
>convert src.jpg -filter box -resize 248x248 -bench 30 logo.jpg
OpenMP only:
Performance[1]: 30i 78.947ips 1.000e 0.370u 0:00.380
Performance[2]: 30i 81.081ips 0.507e 0.380u 0:00.370
Performance[3]: 30i 73.171ips 0.481e 0.390u 0:00.410
Performance[4]: 30i 78.947ips 0.500e 0.380u 0:00.380
Performance[5]: 30i 75.000ips 0.487e 0.390u 0:00.400
Performance[6]: 30i 73.171ips 0.481e 0.400u 0:00.410
Performance[7]: 30i 81.081ips 0.507e 0.380u 0:00.370
Performance[8]: 30i 75.000ips 0.487e 0.370u 0:00.400
OpenMP + OpenCL:
Performance[1]: 30i 7.143ips 1.000e 13.570u 0:04.200
Performance[2]: 30i 17.143ips 0.706e 1.750u 0:01.750
Performance[3]: 30i 17.045ips 0.705e 1.770u 0:01.760
Performance[4]: 30i 17.143ips 0.706e 1.750u 0:01.750
Performance[5]: 30i 17.045ips 0.705e 1.750u 0:01.760
Performance[6]: 30i 17.045ips 0.705e 1.760u 0:01.760
Performance[7]: 30i 17.143ips 0.706e 1.750u 0:01.750
Performance[8]: 30i 17.143ips 0.706e 1.750u 0:01.750
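To put a number on the gap, here is a small Python sketch that parses the -bench lines above and compares the mean iterations per second. It assumes only the line format shown in this post (Performance[N]: <iterations>i <ips>ips <e>e <user>u <m:ss.sss>); the helper names (parse_bench, mean_ips) are hypothetical and not part of ImageMagick. The first OpenCL line is also averaged separately because its 4.2 s elapsed time is far above the others, which looks like a one-time warm-up cost (my assumption, not verified).

```python
import re
from statistics import mean

# -bench output copied from this post: OpenMP only.
OPENMP_ONLY = """\
Performance[1]: 30i 78.947ips 1.000e 0.370u 0:00.380
Performance[2]: 30i 81.081ips 0.507e 0.380u 0:00.370
Performance[3]: 30i 73.171ips 0.481e 0.390u 0:00.410
Performance[4]: 30i 78.947ips 0.500e 0.380u 0:00.380
Performance[5]: 30i 75.000ips 0.487e 0.390u 0:00.400
Performance[6]: 30i 73.171ips 0.481e 0.400u 0:00.410
Performance[7]: 30i 81.081ips 0.507e 0.380u 0:00.370
Performance[8]: 30i 75.000ips 0.487e 0.370u 0:00.400
"""

# -bench output copied from this post: OpenMP + OpenCL.
OPENMP_OPENCL = """\
Performance[1]: 30i 7.143ips 1.000e 13.570u 0:04.200
Performance[2]: 30i 17.143ips 0.706e 1.750u 0:01.750
Performance[3]: 30i 17.045ips 0.705e 1.770u 0:01.760
Performance[4]: 30i 17.143ips 0.706e 1.750u 0:01.750
Performance[5]: 30i 17.045ips 0.705e 1.750u 0:01.760
Performance[6]: 30i 17.045ips 0.705e 1.760u 0:01.760
Performance[7]: 30i 17.143ips 0.706e 1.750u 0:01.750
Performance[8]: 30i 17.143ips 0.706e 1.750u 0:01.750
"""

# Matches e.g. "Performance[2]: 30i 81.081ips 0.507e 0.380u 0:00.370"
LINE_RE = re.compile(
    r"Performance\[(\d+)\]: (\d+)i ([\d.]+)ips ([\d.]+)e ([\d.]+)u (\d+):([\d.]+)"
)


def parse_bench(text):
    """Hypothetical helper: turn each Performance line into a dict."""
    rows = []
    for line in text.strip().splitlines():
        m = LINE_RE.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"unrecognized -bench line: {line!r}")
        index, iters, ips, _eff, user, minutes, seconds = m.groups()
        rows.append({
            "index": int(index),
            "iterations": int(iters),
            "ips": float(ips),
            "user_s": float(user),
            "elapsed_s": int(minutes) * 60 + float(seconds),
        })
    return rows


def mean_ips(rows):
    """Average iterations per second over the given rows."""
    return mean(r["ips"] for r in rows)


openmp_rows = parse_bench(OPENMP_ONLY)
opencl_rows = parse_bench(OPENMP_OPENCL)

# Sanity check: the reported ips should equal iterations / elapsed seconds.
for r in openmp_rows + opencl_rows:
    assert abs(r["ips"] * r["elapsed_s"] - r["iterations"]) < 0.01

cpu = mean_ips(openmp_rows)
gpu_all = mean_ips(opencl_rows)
gpu_warm = mean_ips(opencl_rows[1:])  # drop the slow first OpenCL line

print(f"OpenMP only:      {cpu:.2f} ips")
print(f"OpenMP + OpenCL:  {gpu_all:.2f} ips (all lines)")
print(f"OpenMP + OpenCL:  {gpu_warm:.2f} ips (first line dropped)")
print(f"slowdown:         {cpu / gpu_warm:.2f}x (first line dropped)")
```

With the numbers above, the OpenCL build does roughly 4.5x fewer iterations per second even after ignoring the slow first line, so the gap is not just a startup cost.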
Has anybody run into the same thing?