
I’ve been refactoring some code that uses OpenMP to offload parts of a larger function to an NVIDIA A100. The problem is that the function containing the offloaded section is itself being run concurrently via std::thread in C++.

Specifically, each std::thread runs a function, and parts of that function are offloaded to the GPU via OpenMP. The OpenMP clause is typical, e.g. “#pragma omp target teams distribute parallel for”.

This seems to cause the following runtime error:

    libgomp: cuLaunchKernel error: invalid resource handle

If I remove the concurrency (no std::thread) and keep the OpenMP offloading, it runs fine.

Any idea what might be causing this? I’m unsure about the thread safety of OpenMP GPU offloading.

  • Does this blog post help?
    – paleonix
    Commented Dec 22, 2022 at 9:53
  • Yes, thank you, that solved it! I also tried OpenMP's omp_set_default_device() function, but it doesn't seem to behave the same as the cudaSetDevice() call you linked.
    – py-aero
    Commented Dec 23, 2022 at 1:28

1 Answer


The blog post CUDA Pro Tip: Always Set the Current Device to Avoid Multithreading Bugs recommends always explicitly setting the device ID on a freshly spawned CPU thread with cudaSetDevice().

Setting the device before spawning only takes effect for the master thread that then spawns the workers. In CUDA the current device is per host thread, so each newly spawned worker thread starts out on the default device, i.e. device ID 0.
