I'm working on a headless system and trying to emulate a display (specifically so I can render headless with Blender, though the problem is more fundamental).
I simulate a login to create the X session, and DISPLAY=:8 glxinfo runs fine on the host while I'm ssh'd into the remote machine. What I haven't been able to do is get it to run inside a Docker container.
The container is simple:
ARG CUDA_VERSION=11.4.2
ARG UBUNTU_VERSION=20.04
# Dev/deploy images build from the nvidia runtime
FROM nvidia/cudagl:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}
# Setup non-root user.
# Note that the UID needs to match the UID of the external user!
ARG USER_UID=<external_user_uid>
ARG USER_GID=${USER_UID}
ARG USERNAME=<external_username>
ARG HOME=/home/${USERNAME}
# Avoid warnings by switching to noninteractive
ARG DEBIAN_FRONTEND=noninteractive
# Configure apt and install packages
RUN apt-get update \
    && apt-get -y upgrade \
    && apt-get -y install \
        # for demo
        mesa-utils \
    # nonroot
    && groupadd --gid ${USER_GID} ${USERNAME} \
    && useradd --uid ${USER_UID} --gid ${USER_GID} -m ${USERNAME} \
    # Clean up
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
USER ${USERNAME}
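Since the UID baked into the image has to match the external user, the build arguments can be derived from the host account instead of being typed by hand. A minimal sketch, assuming the Dockerfile above sits in the current directory and the image is tagged reprocontainer; the helper name print_build_cmd is made up for illustration, and it only prints the command so it can be inspected before running:

```shell
#!/bin/sh
# Hypothetical helper: print a docker build command whose build args
# mirror a host account (UID, GID, user name). It does not run docker.
print_build_cmd() {
    uid="$1"; gid="$2"; name="$3"
    printf 'docker build --build-arg USER_UID=%s --build-arg USER_GID=%s --build-arg USERNAME=%s -t reprocontainer .\n' \
        "$uid" "$gid" "$name"
}

# Example: mirror the invoking user (run on the host that will start the container).
# print_build_cmd "$(id -u)" "$(id -g)" "$(id -un)"
```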
The image can be started like so:
# gpus flag isn't needed for the demo, but is part of my actual deployment
docker run --rm --gpus all --mount "source=/tmp/.X11-unix,target=/tmp/.X11-unix,type=bind,consistency=cached" -e DISPLAY=${DISPLAY} reprocontainer glxinfo
This successfully runs on my local dev box (where I'm actually logged in), but not on the remote server I'm ssh'd into, where it still reports:
No protocol specified
Error: unable to open display :8
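To narrow this down, it may help to separate "the socket is not visible in the container" from "the server refused the connection". As far as I can tell, "No protocol specified" usually points to the latter (an authorization failure), but I haven't confirmed that here. A rough sketch of the first check, assuming the conventional socket location /tmp/.X11-unix/X<n>; both function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: map a local DISPLAY value such as ":8" or ":8.0"
# to the Unix socket path an X client would connect to.
display_socket_path() {
    num="${1#*:}"      # drop everything up to and including the colon
    num="${num%%.*}"   # drop an optional ".screen" suffix
    printf '/tmp/.X11-unix/X%s\n' "$num"
}

# Hypothetical helper: report whether that socket is visible from here.
check_display_socket() {
    path="$(display_socket_path "$1")"
    if [ -S "$path" ]; then
        echo "socket present: $path"
    else
        echo "socket missing: $path"
    fi
}

# Example (inside the container): check_display_socket "$DISPLAY"
# On each host, running `xhost` with no arguments lists the X server's
# access control entries, which can then be compared between machines.
```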
I can get it to work by mounting the .Xauthority file (one of the outputs of the simulated login) into the container and setting the container hostname to match the host, like so:
docker run --rm -it --gpus=all --mount "source=/tmp/.X11-unix,target=/tmp/.X11-unix,type=bind,consistency=cached" -e DISPLAY=:8 -v "/home/<external_username>/.Xauthority:/home/<internal_username>/.Xauthority:rw" -u <internal_username> -h <external_hostname> reprocontainer glxinfo
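The hostname override seems to be needed because entries in .Xauthority are keyed by host name, so the cookie is only picked up when the container's hostname matches. A commonly used alternative is to copy the cookie into a separate file with its address family rewritten to the wildcard value ffff, so that it matches regardless of hostname. A sketch, assuming xauth is installed on the host; the function name and the /tmp/.docker.xauth path are placeholders, and I haven't verified this on the headless setup described here:

```shell
#!/bin/sh
# Hypothetical helper: rewrite the leading 4-hex-digit family field of each
# `xauth nlist` line to ffff (FamilyWild), leaving the rest untouched.
wildcard_family() {
    sed -e 's/^..../ffff/'
}

# Example (on the host; not run here):
#   touch /tmp/.docker.xauth
#   xauth nlist "$DISPLAY" | wildcard_family | xauth -f /tmp/.docker.xauth nmerge -
# The resulting file can then be bind-mounted into the container and
# pointed to with the XAUTHORITY environment variable.
```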
While this works, I don't understand why the extra parameters are required for a headless session when they aren't on my dev box. My guess is that a "real" login does something additional or different that removes the need for them, but I'm not sure what that could be.
... what gives? What am I missing?