I’ve been refactoring some code to use OpenMP to offload parts of a larger function to an NVIDIA A100. The problem is that this larger function is itself run concurrently from multiple std::thread instances in C++.
Specifically, each std::thread starts a function, and parts of that function are offloaded to the GPU via OpenMP. The OpenMP directive is a typical one, e.g. “#pragma omp target teams distribute parallel for”…
This seems to be causing the following runtime error:

> libgomp: cuLaunchKernel error: invalid resource handle

If I remove the concurrency (no std::thread at all, everything on a single host thread) and keep the OpenMP offloading, it seems to run fine.
Any ideas about what might be causing this? I’m unsure whether OpenMP GPU offloading is thread-safe when target regions are launched from several host threads at once.
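For reference, here is a stripped-down sketch of the pattern described above. The names `scale_chunk` and `run_workers`, the doubling loop, and the `map` clause are placeholders made up for illustration, not the actual code, and I have not confirmed that this exact snippet triggers the error. Each std::thread gets its own buffer and runs a function containing a target region.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical worker: every std::thread runs this and offloads its loop.
// If the compiler has no OpenMP/offload support enabled, the pragma is
// ignored (or the region falls back to the host) and the loop runs on the CPU.
void scale_chunk(double* data, std::size_t n, double factor) {
    #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Hypothetical driver: launches num_threads host threads, each with a
// private buffer initialised to 1.0, and returns the sum of all elements
// after every worker has doubled its buffer.
double run_workers(int num_threads, std::size_t n) {
    assert(num_threads > 0);
    std::vector<std::vector<double>> buffers(
        static_cast<std::size_t>(num_threads), std::vector<double>(n, 1.0));

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        // Each thread enters its own target region concurrently.
        workers.emplace_back(scale_chunk, buffers[t].data(), n, 2.0);
    }
    for (auto& w : workers) {
        w.join();
    }

    double total = 0.0;
    for (const auto& b : buffers) {
        for (double v : b) {
            total += v;
        }
    }
    return total;
}
```

Calling `run_workers(1, n)` corresponds to the single-threaded case that works for me, while `run_workers(4, n)` corresponds to the concurrent case. With GCC I would build something like this with `-fopenmp` plus an nvptx offload target (e.g. `-foffload=nvptx-none`); the exact flags depend on the toolchain.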