
I’ve been refactoring some code that uses OpenMP to offload parts of a larger function to an NVIDIA A100. The problem is that the function containing the offloaded section is itself being run concurrently via std::thread in C++.

Specifically, each std::thread runs a function, and parts of that function are offloaded to the GPU via OpenMP. The OpenMP clause is typical, e.g. “#pragma omp target teams distribute parallel for”.

This seems to cause the following runtime error:

    libgomp: cuLaunchKernel error: invalid resource handle

If I remove the concurrency (no std::thread) and keep the OpenMP offloading, it runs fine.

Any idea what might be causing this? I’m unsure about the thread safety of OpenMP GPU offloading.

  • Does this blog post help?
    – paleonix
    Commented Dec 22, 2022 at 9:53
  • Yes, thank you, that solved it! I also tried OpenMP's omp_set_default_device() function, but it doesn't seem to behave the same as the cudaSetDevice() call you linked.
    – py-aero
    Commented Dec 23, 2022 at 1:28

1 Answer


The blog post CUDA Pro Tip: Always Set the Current Device to Avoid Multithreading Bugs recommends always explicitly setting the device ID on a freshly spawned CPU thread with cudaSetDevice().

Setting the device before spawning only takes effect for the master thread that then spawns the workers. In CUDA the current device is per host thread, so each newly spawned worker thread starts out on the default device, i.e. device ID 0.
