Caught exception clCreateContext failed while running multiple executions on GPU
Posted: 2018-01-16T16:33:42-07:00
Hello,
I am trying to write a script that runs many concurrent executions of a resize over a number of images on the GPU, and I want to run the same test on the CPU as well.
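I compiled ImageMagick (version 6.9.9-26 Q16) with OpenCL enabled; the configure command and the identify -version output are listed below.
I switch between CPU and GPU through the MAGICK_OCL_DEVICE environment variable.
The code I am using (resizeGPU400.cpp), the script that launches it, and the way I run that script are also listed below.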
As soon as I increase the script's argument to 43, it fails with the error shown below, and most of the processes do not start.
I have 4 such GPUs in a single machine (the lshw output for one of them is below).
What am I doing wrong? I might be hitting a race condition or something similar. My aim is to maximize throughput across all the GPU cards I have. I can already reach 100% GPU-Util by running 20-30 processes, but somehow it does not use multiple GPU cores.
Do you have sample code that I can use to orchestrate the program so that it load-balances across all the available devices?
Thanks
I have compiled my IM with the following command:
./configure --enable-shared --enable-static --with-webp --with-png --with-jpeg --with-tiff --with-pangocairo --enable-opencl LIBS="-ldl -lpng" LDFLAGS="-L/usr/local/lib -L/usr/lib/x86_64-linux-gnu/" CPPFLAGS=-I/usr/local/include
Version used:
# identify -version
Version: ImageMagick 6.9.9-26 Q16 x86_64 2017-12-14 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC OpenCL OpenMP
Delegates (built-in): jbig jng jpeg ltdl lzma png tiff webp zlib
Code I am using (resizeGPU400.cpp):
#include <dirent.h>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <ImageMagick-6/Magick++.h>

using namespace std;
using namespace Magick;

namespace {

// Returns the names of all entries in `dir` (this includes "." and "..").
std::vector<std::string> GetDirectoryFiles(const std::string& dir) {
  std::vector<std::string> files;
  // The custom deleter closes the directory handle when the pointer goes out of scope.
  std::shared_ptr<DIR> directory_ptr(opendir(dir.c_str()),
                                     [](DIR* d) { if (d) closedir(d); });
  if (!directory_ptr) {
    std::cout << "Error opening " << dir << ": " << std::strerror(errno) << std::endl;
    return files;
  }
  struct dirent* dirent_ptr;
  while ((dirent_ptr = readdir(directory_ptr.get())) != nullptr) {
    files.push_back(std::string(dirent_ptr->d_name));
  }
  return files;
}

}  // namespace

int main() {
  // The CPU/GPU choice is passed to ImageMagick through this environment variable.
  setenv("MAGICK_OCL_DEVICE", "ON", 1);
  std::cout << "MAGICK_OCL_DEVICE = " << getenv("MAGICK_OCL_DEVICE") << std::endl;
  Image image;
  std::string dir_path = "/home/gegupta/GPUvsCPU/";
  const auto& directory_path = dir_path;  // std::string("/home/gegupta/images");
  const auto& files = GetDirectoryFiles(directory_path);
  for (const auto& file : files) {
    try {
      // std::cout << file << std::endl;
      std::string full_path = dir_path + file;
      // Only process entries whose path contains an image extension.
      if (full_path.find(".jpg") != std::string::npos ||
          full_path.find(".png") != std::string::npos ||
          full_path.find(".jpeg") != std::string::npos) {
        // std::cout << full_path << std::endl;
        image.read(full_path);
        // Time only the resize call; prints "<file>,<milliseconds>".
        const std::clock_t start_s = std::clock();
        image.resize(Geometry(400, 400));
        const std::clock_t stop_s = std::clock();
        cout << file << "," << (stop_s - start_s) / double(CLOCKS_PER_SEC) * 1000 << endl;
        // std::string resized_path = "/home/gegupta/resized/CPU/400/" + file;
        // image.write( resized_path);
      }
    } catch (Exception& error_) {
      cout << "Caught exception: " << error_.what() << endl;
      return 1;
    }
  }
  return 0;
}
Script I am using (runMultiple.sh):
#!/bin/bash
echo "$1"
START=1
END=$1
for (( c=$START; c<=$END; c++ ))
do
    echo -n "$c "
    ./resizeGPU400 &
done
wait
echo All done
How I am running the script:
time ./runMultiple.sh 42
Even if I divide the execution by running 20 in one and 20 in another, the problem is the same. Also, if you monitor htop when the script starts, memory usage with an argument of 43 grows much higher than it does with 42 processes. The error is:
Caught exception: Magick: clCreateContext failed. (-5) @ warning/opencl.c/InitOpenCLEnvInternal/1439
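The (-5) in the message above is the raw OpenCL status code returned by clCreateContext. In the Khronos CL/cl.h header, -5 is CL_OUT_OF_RESOURCES. That names the code, but it does not say which resource ran out. As a small illustration, a lookup for the first few codes might look like this; `OpenClErrorName` is a hypothetical helper, not an ImageMagick or OpenCL function.

```cpp
#include <string>

// Hypothetical helper: maps a few OpenCL status codes to their symbolic
// names. The numeric values are the ones defined in the Khronos CL/cl.h header.
std::string OpenClErrorName(int code) {
  switch (code) {
    case 0:  return "CL_SUCCESS";
    case -1: return "CL_DEVICE_NOT_FOUND";
    case -2: return "CL_DEVICE_NOT_AVAILABLE";
    case -3: return "CL_COMPILER_NOT_AVAILABLE";
    case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5: return "CL_OUT_OF_RESOURCES";
    case -6: return "CL_OUT_OF_HOST_MEMORY";
    default: return "unknown (" + std::to_string(code) + ")";
  }
}
```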
I have 4 GPUs like the following one in a single machine:
lshw -C display
*-display
description: 3D controller
product: NVIDIA Corporation
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:08:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:3e80-3e7f iomemory:3ec0-3ebf irq:152 memory:98000000-98ffffff memory:3e800000000-3ebffffffff memory:3ec00000000-3ec01ffffff
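On the load-balancing question, one possible approach is to pin each worker process to a single card and assign the cards round-robin. The sketch below does that from a small C++ launcher instead of the bash loop. It rests on an assumption that should be verified on the actual machine: that the NVIDIA OpenCL driver honors the CUDA_VISIBLE_DEVICES environment variable, so a process started with it set to a single index sees only that GPU. This is not an ImageMagick feature, and `DeviceForWorker`, `LaunchOnDevice`, and `RunBalanced` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Round-robin mapping from a worker index to a GPU index.
int DeviceForWorker(int worker, int device_count) {
  assert(worker >= 0 && device_count > 0);
  return worker % device_count;
}

// Starts `program` as a child process restricted to one GPU.
// ASSUMPTION: the NVIDIA OpenCL driver honors CUDA_VISIBLE_DEVICES.
// Returns the child's pid, or -1 if fork() failed.
pid_t LaunchOnDevice(const std::string& program, int device) {
  const pid_t pid = fork();
  if (pid == 0) {
    // Child: limit the visible devices, then replace the process image.
    setenv("CUDA_VISIBLE_DEVICES", std::to_string(device).c_str(), 1);
    execl(program.c_str(), program.c_str(), static_cast<char*>(nullptr));
    std::perror("execl");  // only reached if execl failed
    _exit(127);
  }
  return pid;
}

// Waits for one child; returns true if it exited with status 0.
bool ReapOne() {
  int status = 0;
  return wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Runs `total` copies of `program`, at most `max_parallel` at a time,
// spread round-robin over `device_count` GPUs.
// Returns how many children exited with status 0.
int RunBalanced(const std::string& program, int total, int device_count,
                int max_parallel) {
  int running = 0;
  int succeeded = 0;
  for (int worker = 0; worker < total; ++worker) {
    if (running >= max_parallel) {
      // Free a slot before starting the next worker.
      if (ReapOne()) ++succeeded;
      --running;
    }
    if (LaunchOnDevice(program, DeviceForWorker(worker, device_count)) > 0) {
      ++running;
    }
  }
  while (running > 0) {
    if (ReapOne()) ++succeeded;
    --running;
  }
  return succeeded;
}
```

For example, calling `RunBalanced("./resizeGPU400", 40, 4, 40)` from main would start 40 processes at once, 10 per card. A smaller `max_parallel` caps how many OpenCL contexts exist at the same time, which may help if the failure at 43 processes is a resource limit. Because the device is chosen by worker index rather than by current load, the cards are only approximately balanced once workers start finishing at different times.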