Background information
Assume /tmp is mounted with private propagation, as per the kernel default (but not the systemd default), and as a separate filesystem. If necessary, you can ensure this by running commands inside unshare -m, and/or using mount --bind /tmp /tmp.
# findmnt -n -o propagation /tmp
private
Notice that the following commands return an error:
# touch /tmp/a
# mount --bind /proc/self/ns/mnt /tmp/a
mount: /tmp/a: wrong fs type, bad option, bad superblock on /proc/self/ns/mnt, missing codepage or helper program, or other error.
This is because the kernel code (see extracts below) prevents a simple mount namespace loop. The code comments explain why this is not allowed. The lifetime of a mount namespace is tracked by a simple reference count. If you have a loop where mount namespaces A and B both reference the other, then both A and B will always have at least one reference, and they would never be freed. The allocated memory would be lost, until you rebooted the entire system.
For comparison, the kernel allows the following, which is not a loop:
# unshare -m
# echo $$
8456
# kill -STOP $$
[1]+ Stopped unshare -m
# touch /tmp/a
# mount --bind /proc/8456/ns/mnt /tmp/a
#
# umount /tmp/a # now clean up
# kill %1; echo
#
Question
Where does the kernel code distinguish between the following two cases?
If I try to create a loop using mount propagation, it fails:
# mount --make-shared /tmp
# unshare -m --propagation shared
# echo $$
8456
# kill -STOP $$
[1]+ Stopped unshare -m
# mount --bind /proc/8456/ns/mnt /tmp/a
mount: /tmp/a: wrong fs type, bad option, bad superblock on /proc/9061/ns/mnt, missing codepage or helper program, or other error.
But if I remove the mount propagation, no loop is created, and it succeeds:
# unshare -m --propagation private
# echo $$
8456
# kill -STOP $$
[1]+ Stopped unshare -m
# mount --bind /proc/8456/ns/mnt /tmp/a
#
# umount /tmp/a # cleanup
Kernel code which handles the simpler case
https://elixir.bootlin.com/linux/v4.18/source/fs/namespace.c
static bool mnt_ns_loop(struct dentry *dentry)
{
/* Could bind mounting the mount namespace inode cause a
* mount namespace loop?
*/
struct mnt_namespace *mnt_ns;
if (!is_mnt_ns_file(dentry))
return false;
mnt_ns = to_mnt_ns(get_proc_ns(dentry->d_inode));
return current->nsproxy->mnt_ns->seq >= mnt_ns->seq;
}
...
err = -EINVAL;
if (mnt_ns_loop(old_path.dentry))
goto out;
...
* Assign a sequence number so we can detect when we attempt to bind
* mount a reference to an older mount namespace into the current
* mount namespace, preventing reference counting loops. A 64bit
* number incrementing at 10Ghz will take 12,427 years to wrap which
* is effectively never, so we can ignore the possibility.
*/
static atomic64_t mnt_ns_seq = ATOMIC64_INIT(1);
static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns)