Is it possible to disable unnecessary disk writes to mmap'd files on Linux?

Question

I would like to know if there is a way to prevent Linux from periodically syncing mmap'd files to disk, while still allowing the OS to write back when physical memory gets tight.

I am writing applications which process large images, so large that several multiples of the amount of swap space may be needed. This can result in unexpected OOM crashes as swap is exhausted.

A simple way to allocate large-memory objects without using swap is the mmap() call. This call is very simple to use, and works correctly, but has one major problem which greatly saps performance: The operating system will periodically write out dirty pages from the mmap region.

For my application, this has the effect of reducing CPU utilization from about 2900% to around 700%, making the process 4 times slower.

In the past, RedHat has allowed this behaviour to be turned off by setting the OS parameter vm.flush_mapped_pages to 0, but this setting no longer exists, and would be likely to have unexpected behaviour: I would prefer not to tune OS parameters just to make one process work properly.

FreeBSD allows this behaviour to be turned off by passing the MAP_NOSYNC flag to mmap(), but this flag is not available on Linux.

Here is how I am creating my mmap buffer, with code to handle errors and EINTR removed:

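        // Needs <fcntl.h>, <sys/mman.h>, <sys/stat.h> and <unistd.h>.
        size_t n = 1048576*(size_t)1024*16; // 16 GiB
        int fd = open("mem.bin", O_TRUNC | O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
        ftruncate(fd, n); // grow the (sparse) backing file to the full size
        // Note: MAP_HUGE_2MB is documented for use together with MAP_HUGETLB.
        void* data = mmap(nullptr, n, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_HUGE_2MB, fd, 0);
        close(fd);          // the mapping stays valid after the descriptor is closed
        unlink("mem.bin");  // the file's storage is released once the mapping is gone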

Is there a way to make this memory efficient to use?

Thanks, that is a good suggestion, but I should have clarified: I *do* want the OS to be able to page out the memory if physical RAM gets tight, but only if that is necessary. — Peter Fletcher, Aug 09 '20 at 22:08
The purpose for doing all of this is to allow processes to use more memory than is available in (physical RAM + existing swap). Sadly, many mmap options, such as MAP_PRIVATE, have the effect of moving the mapping away from the regular filesystem and back into physical RAM + swap, which I am trying to avoid. — Peter Fletcher, Aug 10 '20 at 22:57
Back-linking my Q on SO: https://stackoverflow.com/questions/69599663 . If you gained some insights, I'd love to hear them. — MWB, Oct 16 '21 at 21:38

score 4 · Answer 1 · answered Feb 16 '21 at 04:22

You can set the vm.dirty_writeback_centisecs sysctl to 0 to disable the kernel's background dirty page flusher threads. You will also want to increase vm.dirty_ratio so that your process won't be forced to do dirty writeback itself until absolutely necessary. Note that with the background flusher threads disabled, the vm.dirty_background_ratio, vm.dirty_background_bytes, and vm.dirty_expire_centisecs sysctls are inconsequential.

Unfortunately, I do not believe the writeback behavior can be tweaked on a per-page or per-VMA basis. You can only adjust it for the entire VM subsystem.
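
As a rough illustration of the settings described above, here is a minimal sketch that writes the two sysctls through their files under /proc/sys. The helper names (sysctl_path, write_sysctl, relax_writeback) are made up for this example, and the vm.dirty_ratio value of 90 is only an assumed placeholder; pick a value that suits your machine. Writing these files requires root, and the change affects the whole system, not just one process.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: turn a dotted sysctl name such as "vm.dirty_ratio"
// into the matching file path under `root` (normally "/proc/sys").
std::string sysctl_path(const std::string& name,
                        const std::string& root = "/proc/sys") {
    assert(!name.empty());
    std::string rel = name;
    std::replace(rel.begin(), rel.end(), '.', '/');
    return root + "/" + rel;
}

// Hypothetical helper: write `value` to the sysctl file. Returns false if
// the file cannot be opened or written (for example, when not root).
bool write_sysctl(const std::string& name, long value,
                  const std::string& root = "/proc/sys") {
    std::ofstream out(sysctl_path(name, root));
    if (!out) return false;
    out << value << '\n';
    out.flush();
    return static_cast<bool>(out);
}

// Apply the two settings discussed in the answer. The value 90 for
// vm.dirty_ratio is an assumption chosen for illustration only.
bool relax_writeback(const std::string& root = "/proc/sys") {
    bool ok = write_sysctl("vm.dirty_writeback_centisecs", 0, root);
    ok = write_sysctl("vm.dirty_ratio", 90, root) && ok;
    return ok;
}
```

Calling relax_writeback() once at startup (as root) would apply both settings; the same effect can be had from a shell with sysctl -w. The settings are not persistent across reboots unless they are also added to the system's sysctl configuration.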
