PostgreSQL and the OOM Killer: Why You Must Use Strict Memory Overcommit
Key takeaways
- Our team members built and operated five managed PostgreSQL services over the past 15 years.
- Linux allows processes to allocate more virtual memory than what is physically available.
- For most processes, handling an OOM kill is simple: the process restarts, reconnects, and picks up where it left off.
Our team members built and operated five managed PostgreSQL services over the past 15 years. Across all of them, one configuration has remained constant: strict memory overcommit. In this blog post, we will explain how strict memory overcommit protects your database from catastrophic OOM (out of memory) kills. We will also share how a one-character kernel bug forced us to temporarily disable this setting. Finally, we will explain our heuristic for determining the right memory overcommit limit. Hopefully, this will help you find the right setting for your workloads.
Contents
- Why PostgreSQL Can't Tolerate the OOM Killer
- Strict Overcommit: Fail Early, Not Catastrophically
- Discovery
- Narrowing It Down
- Fleet-Wide Analysis
- The One-Character Bug
- Setting the Commit Limit
- Why 80%
- Why +2 GB
- Implementation
- Conclusion
Linux allows processes to allocate more virtual memory than what is physically available. When a process allocates memory, for example with malloc(), the kernel reserves virtual address space for it. However, the kernel does not immediately back that space with physical memory. Physical pages are only consumed when the process actually touches the memory.
The kernel relies on the assumption that not all allocated memory will be actively used at the same time. Usually, this assumption holds. When it doesn’t, the kernel invokes the OOM killer to free memory by terminating a process.
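To make the gap between reserved and touched memory concrete, here is a minimal Python sketch. It is not taken from our tooling, and the helper names (read_kib, self_status, demo) are made up for illustration. It assumes a Linux host where /proc/self/status is readable, and it uses an anonymous private mmap as a stand-in for a large malloc(). VmSize (virtual size) should jump as soon as the mapping is created, while VmRSS (resident memory) should only grow once each page is written.

```python
import mmap


def read_kib(status_text: str, field: str) -> int:
    """Return the value of a 'Field:   123 kB' line from /proc/<pid>/status text, in KiB."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    raise KeyError(field)


def self_status() -> str:
    """Snapshot this process's status file (Linux only)."""
    with open("/proc/self/status") as f:
        return f.read()


def demo(size: int = 64 * 1024 * 1024) -> None:
    """Reserve `size` bytes, then touch every page, printing memory stats at each step."""
    snapshots = [("before", self_status())]

    # Anonymous private mapping: the kernel hands out address space only.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    snapshots.append(("reserved", self_status()))

    # Writing one byte per page forces the kernel to back it with a physical page.
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1
    snapshots.append(("touched", self_status()))

    for label, text in snapshots:
        print(f"{label:>8}: VmSize={read_kib(text, 'VmSize')} kB, "
              f"VmRSS={read_kib(text, 'VmRSS')} kB")
    region.close()


if __name__ == "__main__":
    demo()
```

The 64 MiB default is arbitrary. Any size that is large compared with the interpreter's own footprint should make the difference between the "reserved" and "touched" rows easy to see.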