We have one beefy Ubuntu rig in our research department that can do heavy lifting with its CPUs and GPUs. All of our researchers SSH into the machine and run (machine learning) workloads on it.
The problem is that people collide when they use the system at the same time, and a simple chatbox where people call dibs hasn't sufficed. Essentially, if researcher A wants to run a time-sensitive benchmark on the GPUs, we don't want anybody else to touch the GPUs while it runs, so that the results stay valid.
I am wondering if there is a tool that can schedule and grant users exclusive access to certain commands or devices. All tasks are run via a centralised Conda (Python) installation that is accessible via a custom group, and everybody reaches the system over SSH. Perhaps it would be possible to block SSH access, make the GPUs exclusive, or block Python access?
EDIT: I should have specified earlier that while we do have an active userbase in our research group, we'd prefer not to complicate the setup with a queueing system. A less intrusive (more naive) change to our setup would be strongly preferred. I am sorry for not mentioning this earlier.
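To illustrate the kind of naive mechanism I have in mind, here is a minimal sketch of an advisory lock that a benchmark script could hold while it runs. It is only an assumption about what such a solution might look like: the lock path and the `gpu_reservation` helper are hypothetical names, and the lock is purely cooperative, so it only helps if everybody's scripts check it. It does not actually stop anyone from touching the GPUs.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location; in practice this would be a path that the
# shared Conda group can read and write.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "gpu.lock")


@contextmanager
def gpu_reservation(lock_path=LOCK_PATH, blocking=False):
    """Hold an advisory exclusive lock on lock_path for the duration of the block.

    With blocking=False, BlockingIOError is raised right away if somebody
    else already holds the lock. With blocking=True, the call waits.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o664)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        with gpu_reservation():
            print("GPU reserved, running benchmark...")
            # the benchmark itself would go here
    except BlockingIOError:
        print("GPUs are reserved by someone else, try again later.")
```

Something along these lines would be acceptable to us, but I would much rather use an existing tool that enforces the reservation instead of relying on everyone to cooperate.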