
The unfortunate situation we're in right now is that we have historically over-provisioned RAM to our on-prem VMs, and actual usage is now approaching the limits, with midday spikes threatening to OOM a host. At the moment we're seeing VMware's balloon driver kick in to reclaim RAM from the cache, but some of our applications, namely Elasticsearch, are sensitive to this particularly blunt instrument, and the result is that the oom-killer gets triggered.

What I've been looking for is a tunable parameter that causes older, inactive pages to be evicted from the cache after a period of time, rather than residing there until some kind of contention throws them out. It looks like RHEL 5 had `/proc/sys/vm/pagecache` to at least define a ratio for how much overall space the cache could consume, but that didn't even last until RHEL 6. I'm not terribly surprised by that, since a ratio approach quite obviously "smells bad", and there's already `vm.min_free_kbytes`, which accomplishes the same goal, but better.
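For context on how much of a host's RAM the cache is actually holding, here is a minimal Python sketch that parses `/proc/meminfo`-format text and computes the kind of cache-to-total ratio that the old RHEL 5 knob was built around. The function names are hypothetical, and treating `Cached` + `Buffers` as "the page cache" is a simplifying assumption: `Cached` also includes shmem/tmpfs pages, which dropping caches does not free.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # The first token is the number; an optional "kB" unit follows it.
            values[key.strip()] = int(parts[0])
    return values


def page_cache_ratio(info: Dict[str, int]) -> float:
    """Fraction of MemTotal held by Cached + Buffers (assumption: a rough
    stand-in for "page cache"; Cached also counts shmem/tmpfs)."""
    cache_kb = info.get("Cached", 0) + info.get("Buffers", 0)
    return cache_kb / info["MemTotal"]
```

On a live host, reading `/proc/meminfo` and passing its contents through both functions gives the current fraction.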

Is there a "cache expiry" tunable I've missed somewhere, or perhaps another approach to clear out the cache that isn't quite as aggressive as `sync; echo 1 > /proc/sys/vm/drop_caches`?
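For reference, the `sync; echo 1 > /proc/sys/vm/drop_caches` one-liner amounts to roughly the following Python sketch (the function name is hypothetical). Per the kernel's `vm` sysctl documentation, writing 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both; the `sync` matters because only clean pages are dropped. The `path` parameter exists only so the logic can be exercised without root.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_clean_caches(level: int = 1, path: str = DROP_CACHES) -> None:
    """Rough equivalent of `sync; echo <level> > /proc/sys/vm/drop_caches`.

    1 = page cache, 2 = reclaimable slab (dentries, inodes), 3 = both.
    Writing the real /proc file requires root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Flush dirty pages first: drop_caches only discards clean ones.
    os.sync()
    # echo appends a newline, so write the same thing.
    with open(path, "w") as f:
        f.write(f"{level}\n")
```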

For the record, I know that the true solution is "use less RAM" and/or "get more RAM", and I am sounding those alarms very loudly, but the business is slow to approve any course of action and I need to address this somehow in the interim.

Sammitch
  • Have you looked into revising the `swappiness` param? – steve Feb 13 '20 at 22:25
  • @steve I hadn't considered it until you mentioned it; I had assumed it only dealt with RSS vs swap. Even so, after some reading, `vm.swappiness` only seems to control how aggressively the kernel converts fscache to RSS vs paging out old RSS to swap, whereas I'm looking to convert fscache directly to "free" memory. – Sammitch Feb 13 '20 at 23:01
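To check where a host currently stands on the knobs mentioned in this thread, a small helper can read them straight from `/proc/sys`, where a dotted sysctl name maps to a path (`vm.swappiness` becomes `/proc/sys/vm/swappiness`). This is a hypothetical helper, not part of any library, and the `root` parameter is only there so it can be tested against a fake directory tree.

```python
import os


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl such as 'vm.swappiness' from its /proc/sys file."""
    # Dotted names map onto directories: vm.swappiness -> <root>/vm/swappiness
    path = os.path.join(root, *name.split("."))
    with open(path) as f:
        return f.read().strip()
```

On a live host, `read_sysctl("vm.swappiness")` and `read_sysctl("vm.min_free_kbytes")` return the current values as strings.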
