
A note before we start: I am not a system administrator. I wrote this article to troubleshoot my own problems, after reading too many misleading articles about this topic on the web, with the notable exception of https://www.tldp.org.
The word memory will refer to RAM (Random Access Memory).
The word virtual memory will refer to all of an application's available logical addresses, which may or may not have a corresponding physical address because the Linux kernel overcommits memory.
The word swap will refer to the swap space on the hard disk.
First, let's have a glimpse of a Linux server's memory output, on kernel version 3.10.0-123:
[eclipse@hkclapps15 sa]$ free -h
             total       used       free     shared    buffers     cached
Mem:          5.7G       3.7G       1.9G        23M         0B       2.1G
-/+ buffers/cache:       1.6G       4.1G
Swap:         8.0G       150M       7.9G
More columns are available using sar:
[eclipse@hkclapps15 sa]$ sar -r -f sa14 -s 12:00:00 | head -5
Linux 3.10.0-123.20.1.el7.x86_64 (hkclapps15.hk.eclipseoptions.com) 03/14/19 _x86_64_ (4 CPU)
12:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
12:10:01 151292 5796656 97.46 0 2602792 5313264 37.06 3442716 2128300 168
12:20:01 150568 5797380 97.47 0 2601060 5315152 37.07 3415056 2156808 268
The Linux kernel divides physical memory into pages (typically 4 KB), then hands them to processes requiring virtual memory for code, cache, metadata, Java heap, garbage collector, code cache, compiler, class loading, symbol tables, threads, etc. (Refer to https://stackoverflow.com/questions/53451103/java-using-much-more-memory-than-heap-size-or-size-correctly-docker-memory-limi/53624438#53624438 for JVM memory allocation.)
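As a quick sanity check, the page size and a rough per-process view can be read from standard interfaces (the <pid> below is just a placeholder):
getconf PAGESIZE                                   # page size in bytes, typically 4096
grep -E 'VmSize|VmRSS|VmSwap' /proc/<pid>/status   # total virtual size, resident RAM, and swapped-out portion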
There are two types of pages: file-backed and non-file-backed. The non-file-backed pages are called anonymous pages (not to be confused with anonymous mappings); they hold memory allocated inside code that is not backed by a file, such as the stack and the heap. (Refer to https://landley.net/writing/memory-faq.txt) The file-backed pages cover both ordinary use of RAM by applications and mmap (memory-mapped) files, including the buffer cache and binary images stored on disk. The performance difference between the two shows up during swap-in. For file-backed pages, swap-in is done by the kernel looking up the mapping from the files the process has opened. For anonymous pages, swap-in is done by the kernel remembering which region holds the non-file-backed pages. Remember that anonymous-page swap-in performs better than file-backed swap-in when swap lives on a fast disk like an SSD with enough space, compared to a spinning disk. (Reference: https://www.quora.com/How-do-anonymous-VMAs-work-in-Linux)
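To see how much RAM is currently anonymous versus file-backed on a box, a small sketch using /proc/meminfo (field names as on recent kernels):
grep -E '^(Active|Inactive)\((anon|file)\)' /proc/meminfo
# Active(anon)/Inactive(anon): anonymous pages (heap, stack); reclaiming them means writing to swap
# Active(file)/Inactive(file): file-backed pages (page cache, mmap); reclaiming them means dropping or writing back to the file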
Why do we care? Because swappiness is essentially the reclaim priority of anonymous pages when the system is under memory stress (and, correspondingly, a statement about the disk I/O cost of file pages). It is definitely not the percentage of free memory over total memory! The word "reclaim" here means that after pages are purged from physical memory, we can still get the data back from somewhere.
This is where swappiness is set:
[eclipse@hkclapps15 sa]$ cat /proc/sys/vm/swappiness
30
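If you want to change it, a typical approach is the following (assuming root privileges; the value 10 is only an example):
sysctl -w vm.swappiness=10                       # takes effect immediately, lost after reboot
echo 'vm.swappiness = 10' >> /etc/sysctl.conf    # persist across reboots
sysctl -p                                        # reload settings from /etc/sysctl.conf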
According to the Linux vmscan.c source code (refer to https://github.com/torvalds/linux/blob/master/mm/vmscan.c), the file priority is obtained by subtracting the swappiness (at most 100) from 200. The highest swappiness means we tell the Linux kernel to treat clean pages, dirty pages, and anonymous pages equally when making room in RAM, whatever the scenario.
- clean cache — pages can be dropped without losing data
- dirty cache — pages need to be written back to disk before being dropped, so no data is lost
- anonymous cache — pages cannot be dropped at all unless we have swap
Atop output:
MEM | tot 25.5G | free 4.6G | cache 18.3G | dirty 254.3M | buff 0.0M | slab 319.6M | slrec 198.6M | shmem 48.5M | shrss 0.6M | shswp 0.0M
How swap matters in different scenarios:
- Memory under stress — clean cache pages start to be dropped; file-backed paging activity starts to increase; swappiness decides the page reclaim priority for non-file-backed memory. Eventually everything tries to use swap and slows the application down if swap is on a spinning disk. (Your swap is more likely to be on a spinning disk than on an SSD, because an SSD costs more than simply adding memory.)
- Memory not under stress — the kernel scans RAM and may decide to swap anonymous pages out to improve performance
- Memory starvation — swap acts as emergency memory and prolongs the thrashing (constant page faulting the system cannot recover from)
/*
 * With swappiness at 100, anonymous and file have the same priority.
 * This scanning priority is essentially the inverse of IO cost.
 */
anon_prio = swappiness;
file_prio = 200 - anon_prio;
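Plugging in the swappiness of 30 from the box above, the priorities work out as follows (just the arithmetic from the snippet, not the full reclaim logic, which also weights recent scan history):
anon_prio = swappiness      = 30
file_prio = 200 - anon_prio = 170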
However, a big warning here: swappiness is only the tip of the iceberg in how the Linux memory management code decides to use swap, alongside CPU scheduling, how hot or cold pages are in physical memory, and so on. And in kernel version 3 there is currently no good way to tell whether a Linux system is under memory pressure or not (refer to https://chrisdown.name/2018/01/02/in-defence-of-swap.html).
In a normal setting, a swappiness of 0 avoids ever swapping pages out just to gain caching space, while 100 always favors making the disk cache bigger. In general, a high swappiness value maximizes throughput: how much work the system gets done per unit of time. A low swappiness value favors latency: getting quick response times from applications.
Some Frequently Asked Questions:
Why is swap used when plenty of free memory is left?
- https://serverfault.com/questions/420778/why-swap-is-used-when-plenty-of-free-memory-is-left
- the system tends to put unused and very infrequently accessed memory into swap so that RAM can be used for cache instead
- By default, Linux aggressively swaps processes out of physical memory onto disk in order to keep the disk cache as large as possible
Why is swap still allocated when there is free RAM?
- When a program closes a file, it is a good idea to keep the file in cache in case it is used again, rather than having to assign pages to it the next time the file is opened. When swappiness is 0, the system only reclaims inactive file-backed memory (dropping it or writing it back to its file); when swappiness is 100, the system reclaims inactive file-backed memory and anonymous memory equally, with the anonymous pages going to swap. Therefore, if free+cached memory looks fine on a box but swap seems overused: increasing swappiness will put more inactive heap and stack memory into swap, leaving more cached memory for applications to use or quickly drop; decreasing swappiness can decrease swap usage, but will increase used physical memory and thus leave less cached memory for applications to use or drop.
- When swap is allocated, it doesn't necessarily mean it is actively being used
- It can simply be an application that keeps reading an opened file from disk
- The system may swap inactive pages back in whenever it thinks reasonable. Run "vmstat 1" to see whether the system is swapping in or out right now (the si/so columns); see also the sketch after this list for finding which processes hold the swap
- Alternatively you can use swapoff -a && swapon -a to clear the used swap, but this may cause problems, especially when memory is under stress
- https://askubuntu.com/questions/1357/how-to-empty-swap-if-there-is-free-ram
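To find which processes are actually holding the allocated swap, here is a small sketch that walks /proc (reading other users' entries may require root):
for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {name=$2} /^VmSwap:/ {if ($2 > 0) print $2, "kB", name}' "$f"
done | sort -rn | head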
How to define memory stress?
- slow increase of actually used physical memory
- the system flushing cached memory and buffer memory (both decreasing) to make space
- the system paging in lots of cache as if it is hungry
- high active (recently accessed) vs inactive memory ratio
- increasing faults per second and major faults per second (see the commands after this list)
- if virtual memory usage is higher than physical memory due to Linux overcommitting, pages will be brought in on demand (demand paging) according to https://www.tldp.org/LDP/tlk/mm/memory.html
- https://www.linuxjournal.com/article/8178 — how to monitor linux memory
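A few commands that surface these signals (a sketch; the sampling interval and count are arbitrary):
vmstat 5 5        # si/so show swap in/out; watch free, buff, cache trends
sar -B 5 5        # fault/s and majflt/s: steadily rising major faults suggest stress
sar -r 5 5        # %memused plus kbcommit/%commit for overcommit pressure
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo    # how far overcommit has gone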
What is the result of memory stress?
- large paging out to disk
- increase dirty cache due to write-back caching for disk writes
- CPU spikes to keep up with paging
- thrashing (system doing more paging than processing application)
What exactly are the page-in and page-out stats from sar -B?
- Since Linux 2.6 the buffer cache has been included in the page cache (as reported in /proc/vmstat). Thus page activity can be reading and writing the executable parts of binaries, i.e. accessing memory-mapped files on the hard disk.
- Currently Linux doesn't have a counter dedicated to page cache activity alone, so only the si/so columns from vmstat can tell you about swapping; see the sketch after this list
- https://serverfault.com/questions/270283/what-does-the-fields-in-sar-b-output-mean
- https://lists.gt.net/linux/kernel/1131720?do=post_view_threaded
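To separate general paging from actual swap traffic, the raw counters can be compared directly (a sketch; counter names are from /proc/vmstat on recent kernels):
grep -E '^(pgpgin|pgpgout|pswpin|pswpout) ' /proc/vmstat
# pgpgin/pgpgout: all data paged in/out from disk, including page cache and memory-mapped files
# pswpin/pswpout: pages moved in/out of swap only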
References:
- How redhat explains swappiness and server tuning https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-tunables
- Kafka virtual memory handling https://www.cloudera.com/documentation/enterprise/6/6.0/topics/kafka_system_level_broker_tuning.html#virtual_memory_handling
- The Linux page cache and pdflush write-back behavior http://www.westnet.com/~gsmith/content/linux-pdflush.htm