Looking at the oom-kill stack trace, the order=2 line shows that a process named zabbix_agentd attempted to allocate 16 KB of memory.
Linux allocates memory in 4 KB pages, and an order-N allocation requires 2^N physically contiguous pages, so order=2 means 2^2 = 4 pages, i.e. 4 x 4 KB = 16 KB of contiguous memory:
[Sat Jun 3 23:13:15 2023] zabbix_agentd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
[Sat Jun 3 23:13:15 2023] zabbix_agentd cpuset=/ mems_allowed=0
The kernel then reported that no contiguous 16 KB block could be allocated. About 1.5 GB of memory was unused, but it was fragmented, so the requested contiguous pages could not be satisfied:
[Sat Jun 3 23:13:16 2023] Node 0 Normal: 392633*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1570532kB
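On a live system, /proc/buddyinfo shows the same breakdown: one column per allocation order, from order 0 (4 KB) up to order 10 (4 MB), so the third numeric column is the count of free order-2 (16 KB) blocks. A quick check:

cat /proc/buddyinfo    # zeros from the order-2 column onward mean no free 16 KB+ contiguous blocks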
The kernel invokes the OOM killer when free memory drops below the vm.min_free_kbytes threshold (whether the system also panics and restarts depends on vm.panic_on_oom).
In the case above, free memory (1,570,532 kB) was still above min_free_kbytes (1,153,434 kB), although close to it, so the failure points to memory fragmentation rather than a simple shortage.
The system also runs CentOS 7 with kernel 3.10, which has comparatively weak memory compaction during page allocation.
sysctl vm.min_free_kbytes
vm.min_free_kbytes = 1153434
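If the kernel exposes compaction statistics through debugfs (an assumption; debugfs must be mounted at /sys/kernel/debug and the kernel built with CONFIG_COMPACTION), the external fragmentation index can confirm whether higher-order failures stem from fragmentation: per order, -1.000 means the allocation would succeed, values toward 0 mean failure from lack of memory, and values toward 1.000 mean failure from fragmentation.

cat /sys/kernel/debug/extfrag/extfrag_index    # one line per zone, one value per order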
Check for dirty cache in the kernel log:
[Wed Mar 1 20:16:58 2023] Node 0 DMA free:15836kB min:52kB low:64kB high:76kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15920kB managed:15836kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables>
[Wed Mar 1 20:16:58 2023] lowmem_reserve[]: 0 2804 386780 386780
[Wed Mar 1 20:16:58 2023] Node 0 DMA32 free:1540444kB min:9504kB low:11880kB high:14256kB active_anon:46292kB inactive_anon:106088kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3126080kB managed:2872088kB mlocked:0kB dirty:0kB writeback:0kB mapped:106084kB shmem:106092kB slab_reclaimable:1136708kB slab_unre>
[Wed Mar 1 20:16:58 2023] lowmem_reserve[]: 0 0 383975 383975
[Wed Mar 1 20:16:58 2023] Node 0 Normal free:207126048kB min:1301160kB low:1626448kB high:1951740kB active_anon:6945872kB inactive_anon:168432228kB active_file:4296kB inactive_file:3500kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:399507456kB managed:393191048kB mlocked:0kB dirty:0kB writeback:0kB mapped:105168332kB shmem:168476144kB slab_>
[Wed Mar 1 20:16:58 2023] lowmem_reserve[]: 0 0 0 0
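The same per-zone watermarks can be inspected live via /proc/zoneinfo; note that the values there are in pages (4 KB each), while the OOM log above reports them in kB.

grep -A 4 'zone *Normal' /proc/zoneinfo    # free/min/low/high watermarks for the Normal zone, in pages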
Check free -m output for potential memory overprovisioning:
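When reviewing the output, compare the "available" column (free plus reclaimable cache) against the workload's demand; if the combined commitment of VMs or containers exceeds physical RAM, the host is overprovisioned.

free -m    # check the "available" column, not just "free"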
Workaround to drop caches and defragment memory:
echo 1 > /proc/sys/vm/shrink_memory #remove cache
echo 1 > /proc/sys/vm/compact_memory #defrag memory
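Both interfaces are optional: compact_memory requires a kernel built with CONFIG_COMPACTION, and shrink_memory is a vendor-specific sysctl that is not present in all kernels, so confirm they exist before scripting them.

ls /proc/sys/vm/shrink_memory /proc/sys/vm/compact_memory 2>/dev/null    # list whichever interfaces this kernel provides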
When memory fragmentation is caused by the page cache (on old kernels), the following command can be run periodically as a workaround. It drops all page caches and therefore eliminates the fragmentation the cache introduced:
echo 3 > /proc/sys/vm/drop_caches
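As a sketch of how this might be scheduled (an illustrative root crontab entry; the hourly interval is an assumption, tune it to the workload), running sync first flushes dirty pages so more cache can actually be dropped:

0 * * * * /bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches    # hourly: flush dirty pages, then drop all caches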
Note: older kernels are more likely to hit this issue; memory compaction, reclaim, and page-allocator logic have all changed substantially over time and are much better in newer kernel versions.