I'm working on a system with multiple NVIDIA GPUs. I would like to disable one of my GPUs (make it disappear), but not the others, without rebooting, and in such a way that I can re-enable it later.

Is this possible?

Notes:

  • Assume I have root (though a non-root solution for users which have permissions for the device files is even better).
  • In case it matters, the distribution is either SLES 12 or SLES 15, and - don't ask me why :-(
  • I guess some BIOS let you disable a hardware? Commented Jun 13, 2021 at 10:24
  • @炸鱼薯条德里克: Like I said, I mustn't reboot. So no BIOS access either. Commented Jun 13, 2021 at 10:33
  • Good luck in the wonderful world of PCIe hotplugging! It's a known bug that nvidia's Linux GPU drivers can't fully de- and re-initialize GPUs. Nvidia announced a month or so ago that they did something about it (I think there was a Phoronix article?). I don't know whether the fixed driver is available yet. Anyway, try with SLES 15, and ignore SLES 12. Much (good) has happened in the last 5 years when it comes to PCIe hotplugging. Commented Jun 13, 2021 at 11:25
  • ah no, that was AMD. phoronix.com/… However, if AMD haven't had this straight, chances are nvidia is worse (I've yet to encounter an instance where the modern kernel AMD drivers are as bad as nvidia's closed source drivers), sorry :( Commented Jun 13, 2021 at 11:27
  • @einpoklum by the way, for which purpose do you need to disable it? In case this is about it not being used to display stuff, that's a whole different, much much much much MUCH easier problem! Commented Jun 13, 2021 at 11:59

1 Answer

Disabling:

The following disables a GPU, making it invisible: it no longer appears in the list of CUDA devices, and it doesn't even take up a device index.

nvidia-smi -i 0000:xx:00.0 -pm 0
nvidia-smi drain -p 0000:xx:00.0 -m 1

where 0000:xx:00.0 is the PCI bus ID of your GPU (domain:bus:device.function, with xx being the bus number). You can determine it using lspci | grep NVIDIA or nvidia-smi.

The device will still be visible with lspci after running the commands above.

Re-enabling:

nvidia-smi drain -p 0000:xx:00.0 -m 0

The device should now be visible again.
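As a minimal sketch, the disable and re-enable sequences above could be wrapped in a small shell helper. The function names normalize_pci_id and gpu_drain_cmds are hypothetical, not part of nvidia-smi. The helper assumes PCI domain 0000 whenever the address has no domain part (plain lspci usually omits it), and it only prints the commands instead of running them, so you can review them first:

```shell
#!/bin/sh
# Hypothetical helper: turn a PCI address into domain:bus:device.function form.
# Assumption: if no domain is given (e.g. "3b:00.0" from plain lspci), it is 0000.
normalize_pci_id() {
    case "$1" in
        *:*:*) printf '%s\n' "$1" ;;        # domain already present
        *)     printf '0000:%s\n' "$1" ;;   # prepend the assumed domain
    esac
}

# Hypothetical helper: print the nvidia-smi commands from this answer
# for disabling or re-enabling the GPU at the given PCI address.
gpu_drain_cmds() {
    pci_id=$(normalize_pci_id "$1")
    case "$2" in
        disable)
            # Turn persistence mode off, then mark the device as draining.
            printf 'nvidia-smi -i %s -pm 0\n' "$pci_id"
            printf 'nvidia-smi drain -p %s -m 1\n' "$pci_id"
            ;;
        enable)
            # Clear the drain state so the device can be used again.
            printf 'nvidia-smi drain -p %s -m 0\n' "$pci_id"
            ;;
        *)
            echo "usage: gpu_drain_cmds <pci-address> disable|enable" >&2
            return 1
            ;;
    esac
}
```

For example, gpu_drain_cmds 3b:00.0 disable prints the two disabling commands for the (made-up) address 0000:3b:00.0; once they look right, run them as root, or pipe the output to sh.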

Problems with this approach

  • This may fail to work if you are not root; or in some scenarios I can't yet characterize.
  • I haven't yet checked what happens to processes which are actively using the GPU when you do this.
  • The syntax is baroque and confusing. NVIDIA - for shame, you need to make it simpler to disable GPUs.
  • How would I check if the Nvidia GPU is actually turned off? Apart from my laptop cooling down. Commented Nov 5, 2024 at 20:45
  • @damluar: Check nvidia-smi --help; and if you don't have your answer, ask a new question here on the site. Commented Nov 5, 2024 at 23:46