On a server with an NVIDIA Tesla card, we decided to restrict user access to the GPUs. The server has 2 GPUs.

# ls -las /dev/nvidia*
0 crw-rw-rw-. 1 root root 195,   0 Dec  2 22:02 /dev/nvidia0
0 crw-rw-rw-. 1 root root 195,   1 Dec  2 22:02 /dev/nvidia1

I found this solution in the article "Defining User Restrictions for GPUs".

I created a local group, gpu_cuda:

sudo groupadd gpu_cuda

Then I added the user to the gpu_cuda group.

I created a config file at /etc/modprob.d/nvidia.conf with this content:

#!/bin/bash
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=0 NVreg_DeviceFileMode=0777 NVreg_ModifyDeviceFiles=0

I created a script at /etc/init.d/gpu-restriction:

#!/bin/bash
### BEGIN INIT INFO
# Provides:          gpu-restriction
# Required-Start:    $all
# Required-Stop:
# Default-Start:     2 3 4 5
# Default-Stop:
# Short-Description: Start daemon at boot time
# Description:       Enable service provided by daemon.
#  permissions if needed.
### END INIT INFO
set -e
start() {
    /sbin/modprobe --ignore-install nvidia
    /sbin/modprobe nvidia_uvm
    test -c /dev/nvidia-uvm || mknod -m 777 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" == "nvidia-uvm" ]; then echo $major; break; fi ; done) 0 && chown :root /dev/nvidia-uvm
    test -c /dev/nvidiactl || mknod -m 777 /dev/nvidiactl c 195 255 && chown :root /dev/nvidiactl
    devid=-1
    for dev in $(ls -d /sys/bus/pci/devices/*); do
        vendorid=$(cat $dev/vendor)
        if [ "$vendorid" == "0x10de" ]; then
            class=$(cat $dev/class)
            classid=${class%%00}
            if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then
                devid=$((devid+1))
                test -c /dev/nvidia${devid} || mknod -m 750 /dev/nvidia${devid} c 195 ${devid} && chown :gpu_cuda /dev/nvidia${devid}
            fi
        fi
    done
}
stop() {
:
}
case "$1" in
    start)
       start
       ;;
    stop)
       stop
       ;;
    restart)
       stop
       start
       ;;
    status)
       # code to check status of app comes here 
       # example: status program_name
       ;;
    *)
       echo "Usage: $0 {start|stop|status|restart}"
esac
exit 0

I rebooted the server and ran:

/etc/init.d/gpu-restriction start

The first check looked good:

# ls -las /dev/nvidia*
0 crw-rw-rw-. 1 root gpu_cuda 195,   0 Dec  2 22:02 /dev/nvidia0
0 crw-rw-rw-. 1 root gpu_cuda 195,   1 Dec  2 22:02 /dev/nvidia1

But on the second check, the group ownership had reverted to root:

# ls -las /dev/nvidia*
0 crw-rw-rw-. 1 root root 195,   0 Dec  2 22:02 /dev/nvidia0
0 crw-rw-rw-. 1 root root 195,   1 Dec  2 22:02 /dev/nvidia1

Why did the ownership revert, and how can I solve this problem?

MC68020
Nikolay Baranenko
  • 1/ /etc/modprob.d/nvidia.conf? The e is missing. A typo, I presume? 2/ NVreg_DeviceFileGID=0? Why do you keep that set to the root GID? – MC68020 Dec 04 '22 at 21:50
  • I added the URL of the article Defining User Restrictions for GPUs to the post: https://towardsdatascience.com/defining-user-restrictions-for-gpus-6971a658a9ce . The config comes from that article. – Nikolay Baranenko Dec 04 '22 at 22:22
  • Hmmmm… I had a hard time understanding what the guy is doing. But as a starting point, there is indeed a typo on that page: it should definitely read **/etc/modprobe.d/nvidia.conf**, that is, the modprob**e**.d directory, in which you should already have some nvidia.conf. I'll try to figure out the rest of the mess… – MC68020 Dec 04 '22 at 22:43

1 Answer

NVIDIA provides a way to set the group ID of its device special files without resorting to any extra script. The driver FAQ says:

Whether a user-space NVIDIA driver component does so itself, or invokes nvidia-modprobe, it will default to creating the device files with the following attributes:

  UID:  0     - 'root'
  GID:  0     - 'root'
  Mode: 0666  - 'rw-rw-rw-'

Existing device files are changed if their attributes don't match these defaults. If you want the NVIDIA driver to create the device files with different attributes, you can specify them with the "NVreg_DeviceFileUID" (user), "NVreg_DeviceFileGID" (group) and "NVreg_DeviceFileMode" NVIDIA Linux kernel module parameters.

The NVIDIA Linux kernel module parameters can be set in the /etc/modprobe.d/nvidia.conf file. Mine reads:

...
options nvidia \
        NVreg_DeviceFileGID=27 \
        NVreg_DeviceFileMode=432 \
        NVreg_DeviceFileUID=0 \
        NVreg_ModifyDeviceFiles=1\
...
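
The value 432 given for NVreg_DeviceFileMode is the decimal form of the octal permission mode 0660 (rw-rw----). As a quick sanity check, assuming a POSIX shell, the conversion can be reproduced with printf. The helper names below are purely illustrative and not part of any NVIDIA tooling:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Convert a decimal mode (as written in the nvidia.conf above) to octal.
mode_to_octal() {
    printf '%o\n' "$1"
}

# Convert an octal mode such as 0660 to decimal; printf treats a
# leading 0 as an octal constant.
mode_to_decimal() {
    printf '%d\n' "$1"
}

mode_to_octal 432      # prints 660
mode_to_decimal 0660   # prints 432
```

By the same arithmetic, the default mode 0666 quoted from the FAQ corresponds to decimal 438.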

And indeed, ls -ails /dev/nvidia0 shows:

3419 0 crw-rw---- 1 root video 195, 0  4 déc.  15:01 /dev/nvidia0

which confirms that access to the root-owned special files is restricted to members of the video group (GID=27 on my system).
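
If the device files do not end up with the expected attributes, it can help to read back the values the loaded module actually picked up. The sketch below assumes the driver exposes its parameters in /proc/driver/nvidia/params as "Name: value" lines; both that path and the exact field names are assumptions to verify on your driver version, and the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print only the device-file related parameters
# from a params file (defaults to the assumed procfs location).
show_device_file_params() {
    grep -E '^(ModifyDeviceFiles|DeviceFile(UID|GID|Mode)):' \
        "${1:-/proc/driver/nvidia/params}"
}

if [ -r /proc/driver/nvidia/params ]; then
    show_device_file_params
else
    echo "/proc/driver/nvidia/params not readable (module not loaded?)" >&2
fi
```

If the values shown still match the defaults, the options line in nvidia.conf was most likely not applied when the module was loaded.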

Therefore, all you need to do is look up the group ID of your gpu_cuda group and modify (or set up) your nvidia.conf accordingly.
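
A minimal sketch of that step, assuming getent is available and the group is called gpu_cuda as in the question. The function names are hypothetical. The script only prints the line, so you can review it before putting it into /etc/modprobe.d/nvidia.conf:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the numeric GID of a group, or nothing if the group does not exist.
group_gid() {
    getent group "$1" | cut -d: -f3
}

# Render the modprobe options line for a given GID and device file mode.
render_nvidia_options() {
    printf 'options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=%s NVreg_DeviceFileMode=%s NVreg_ModifyDeviceFiles=1\n' "$1" "$2"
}

gid=$(group_gid gpu_cuda)
if [ -n "$gid" ]; then
    render_nvidia_options "$gid" 0660
else
    echo "group gpu_cuda not found" >&2
fi
```

The new attributes are only picked up when the nvidia module is loaded, so expect to reload the module or reboot after changing the file.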


Credits : /usr/share/doc/nvidia-drivers-470.141.03/html/faq.html (you'll probably need to adapt the path to your driver version).

MC68020
  • On my OS the gpu_cuda group has GID 1226: options nvidia \ NVreg_DeviceFileGID=1226 \ NVreg_DeviceFileMode=432 \ NVreg_DeviceFileUID=0 \ NVreg_ModifyDeviceFiles=1 Is that OK? – Nikolay Baranenko Dec 05 '22 at 12:02
  • @NikolayBaranenko : Precisely! Please mark this answer as accepted if it fits your needs, so that other members of the community do not waste time investigating the useless script posted in the OP. – MC68020 Dec 05 '22 at 12:20
  • Sure, I will mark it, but this variant is not working. Maybe this is important: # nvidia-smi Mon Dec 5 12:59:38 2022 | NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 | – Nikolay Baranenko Dec 05 '22 at 13:59
  • @NikolayBaranenko : Please post the contents of your /etc/modeprobe.d/nvidia.conf – MC68020 Dec 05 '22 at 15:53
  • Are you sure about /etc/modEprobe.d/nvidia.conf? # cat /etc/modprobe.d/nvidia.conf #!/bin/bash options nvidia \ NVreg_DeviceFileGID=1226 \ NVreg_DeviceFileMode=0660 \ NVreg_DeviceFileUID=0 \ NVreg_ModifyDeviceFiles=1 – Nikolay Baranenko Dec 05 '22 at 16:33
  • Only these files exist on the host: # nvidia nvidia-bug-report.sh nvidia-cuda-mps-control nvidia-cuda-mps-server nvidia-debugdump nvidia-modprobe nvidia-persistenced nvidia-smi nvidia-xconfig. A plain nvidia is absent. – Nikolay Baranenko Dec 05 '22 at 16:54
  • @NikolayBaranenko : Yes! My bad, sorry! It is modprobe.d. I managed to find the NVIDIA page relevant to your driver version: https://download.nvidia.com/XFree86/Linux-x86_64/460.27.04/README/faq.html which leads to these questions: What init system is your system running (systemd? openrc? other?) How did you install your drivers (a distro-specific package? other?) And in the end, how are your nvidia modules loaded? – MC68020 Dec 05 '22 at 16:54
  • I checked the RPMs: # yum search nvidia-modprobe nvidia-modprobe-branch-460.x86_64 : NVIDIA kernel module loader nvidia-modprobe-latest.x86_64 : NVIDIA kernel module loader nvidia-modprobe-latest-dkms.x86_64 : NVIDIA kernel module loader – Nikolay Baranenko Dec 05 '22 at 16:58