Facebook introduced TMO, a mechanism that saves 20-32% of memory on servers

Engineers from Facebook (banned in the Russian Federation) have published a report on TMO (Transparent Memory Offloading), a technology deployed last year that substantially reduces RAM usage on servers by offloading secondary data that is not needed for active work to cheaper storage devices such as NVMe SSDs. According to Facebook, TMO saves 20 to 32% of RAM on each server. The solution is designed for infrastructures in which applications run in isolated containers. The kernel-side components of TMO are already included in the Linux kernel.

On the kernel side, TMO relies on the PSI (Pressure Stall Information) subsystem, available since Linux 4.20.
PSI is already used by various out-of-memory handlers. It reports how much time tasks spend waiting for resources (CPU, memory, I/O), which makes it possible to accurately assess how heavily the system is loaded and what is causing slowdowns when resources run short.
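
To make the PSI description concrete: the kernel exposes pressure figures in files such as /proc/pressure/memory (and, with cgroup2, a per-cgroup memory.pressure file). The "some" line gives the share of time in which at least one task was stalled, and the "full" line the share of time in which all non-idle tasks were stalled at once; each line carries 10-, 60- and 300-second averages in percent plus a cumulative total in microseconds. The minimal parser below is a sketch that assumes exactly this line format; the names PsiLine, parse_psi and read_memory_pressure are our own.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PsiLine:
    avg10: float   # % of wall time stalled, 10-second window
    avg60: float   # % of wall time stalled, 60-second window
    avg300: float  # % of wall time stalled, 300-second window
    total: int     # cumulative stall time in microseconds


def parse_psi(text: str) -> Dict[str, PsiLine]:
    """Parse PSI file contents into {"some": PsiLine, "full": PsiLine}."""
    result: Dict[str, PsiLine] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Each field looks like "avg10=1.25" or "total=123456".
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = PsiLine(
            avg10=float(values["avg10"]),
            avg60=float(values["avg60"]),
            avg300=float(values["avg300"]),
            total=int(values["total"]),
        )
    return result


def read_memory_pressure(path: str = "/proc/pressure/memory") -> Dict[str, PsiLine]:
    """Read system-wide PSI, or per-cgroup PSI if given a memory.pressure path."""
    with open(path) as f:
        return parse_psi(f.read())


if __name__ == "__main__":
    sample = (
        "some avg10=1.25 avg60=0.80 avg300=0.40 total=123456\n"
        "full avg10=0.50 avg60=0.30 avg300=0.10 total=65432\n"
    )
    psi = parse_psi(sample)
    print(psi["some"].avg10, psi["full"].total)
```

A controller usually prefers the cumulative total over the averages: the difference between two readings gives the exact stall time in an interval of its own choosing.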

In user space, TMO is driven by the Senpai component, which uses cgroup2 to dynamically adjust the memory limit of containers based on data obtained from PSI. Senpai watches PSI for signs of resource pressure, estimates how sensitive applications are to slowdowns, and tries to find the minimum memory size at which the data the workload actively needs stays in RAM, while auxiliary data that has settled in the cache and is not in use at the moment is pushed out to the swap partition.
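
The feedback loop described above can be sketched as follows. This is not Senpai's actual code: the proportional rule, the ControllerConfig fields (target_pressure, probe_step, backoff_step, min_limit), their default values and the 5-second interval are all illustrative assumptions. The sketch measures the memory stall time from the "some" total in the cgroup's memory.pressure file, shrinks the limit slightly while pressure stays below the target, and raises it when the workload starts to stall; it applies the result through cgroup2's memory.high knob.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ControllerConfig:
    # Illustrative names and defaults (assumptions), not Senpai's real settings.
    target_pressure: float = 0.001  # tolerated share of time stalled (0.1%)
    probe_step: float = 0.01        # max fraction to shrink the limit per interval
    backoff_step: float = 0.05      # max fraction to grow the limit per interval
    min_limit: int = 64 << 20       # never set the limit below 64 MiB


def next_limit(limit: int, stall_us: int, interval_us: int,
               max_limit: int, cfg: ControllerConfig) -> int:
    """Return the next memory limit given the memory stall time
    (delta of the PSI "some" total) accumulated over the last interval."""
    pressure = stall_us / interval_us  # fraction of the interval spent stalled
    if pressure <= cfg.target_pressure:
        # Little stalling: probe downward, harder the further below target.
        headroom = 1.0 - pressure / cfg.target_pressure
        new = limit * (1.0 - cfg.probe_step * headroom)
    else:
        # The workload is stalling on memory: back off in proportion to
        # the overshoot, but by at most backoff_step.
        overshoot = min(pressure / cfg.target_pressure - 1.0, 1.0)
        new = limit * (1.0 + cfg.backoff_step * overshoot)
    return int(min(max(new, cfg.min_limit), max_limit))


def read_some_total(cgroup: Path) -> int:
    """Cumulative "some" stall time (microseconds) from the cgroup's memory.pressure."""
    some_line = (cgroup / "memory.pressure").read_text().splitlines()[0]
    return int(some_line.rsplit("total=", 1)[1])


def run(cgroup: Path, max_limit: int, interval_s: float = 5.0,
        cfg: Optional[ControllerConfig] = None) -> None:
    """Control loop; needs root and a cgroup2 hierarchy, and runs forever."""
    cfg = cfg or ControllerConfig()
    limit = max_limit
    prev = read_some_total(cgroup)
    while True:
        time.sleep(interval_s)
        total = read_some_total(cgroup)
        limit = next_limit(limit, total - prev, int(interval_s * 1e6), max_limit, cfg)
        prev = total
        (cgroup / "memory.high").write_text(str(limit))
```

The asymmetry between the small downward probe and the larger backoff is a design choice of this sketch: shrinking the limit too quickly risks stalling the workload, so the loop retreats faster than it advances.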

In essence, TMO keeps processes on a strict memory diet, forcing out unused memory pages whose eviction does not noticeably affect performance (for example, code pages used only during initialization and single-use data in the disk cache). Unlike conventional swapping, which pushes data out in response to a memory shortage, TMO evicts data proactively, based on predictions, and takes the file cache into account.
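
The difference between reactive swapping and proactive offloading can be illustrated with a deliberately simplified toy model. It is not TMO's page-selection logic: the Page type, the idle threshold and both eviction functions are hypothetical, and real reclaim works on kernel LRU lists rather than timestamps. The reactive policy evicts nothing until the resident set exceeds capacity; the proactive one evicts pages that have been idle too long even when memory is not yet full.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    name: str
    last_access: float  # seconds since start when the page was last touched


def reactive_evict(pages: List[Page], capacity: int) -> List[str]:
    """Classic swapping: evict least-recently-used pages only once
    the resident set no longer fits into `capacity` pages."""
    overflow = len(pages) - capacity
    if overflow <= 0:
        return []
    by_age = sorted(pages, key=lambda p: p.last_access)
    return [p.name for p in by_age[:overflow]]


def proactive_evict(pages: List[Page], now: float, idle_threshold: float) -> List[str]:
    """Proactive offloading: evict every page idle longer than
    `idle_threshold`, even though memory is not yet full."""
    return [p.name for p in pages if now - p.last_access > idle_threshold]


if __name__ == "__main__":
    pages = [
        Page("init_code", 0.0),    # code used only at startup
        Page("config", 1.0),
        Page("hot_data", 99.0),    # actively used
        Page("cache_file", 50.0),  # file cache touched once
    ]
    print(reactive_evict(pages, capacity=8))
    print(proactive_evict(pages, now=100.0, idle_threshold=30.0))
```

With plenty of free capacity the reactive policy keeps every page resident, while the proactive one already frees the startup code, the configuration and the stale cache entry.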
