Optimizing Memory Consumption on Small Systems

Optimizing Memory Consumption on Small Systems – Again

Many moons ago, I wrote about optimizing memory availability on a small system with only 512MB of RAM. Systems got bigger since then, but there are many cases where such optimisations are still needed, and default configurations are not always cooperative with that goal out of the box. When working all day every day in the MySQL consulting business, this kind of thing matters to cost conscious clients with a lot of VMs.

So here are a few additional setting to consider for systems with no more than 4GB of RAM:

crashkernel=0

If your links distro configures a crashkernel reservation by default, this will save some memory.

swiotlb=0

Only use this if you have less than or equal to 4GB of memory. But if you do, it will save approximately 256MB of RAM.

Between these and some of the settings in the previous article mentioned, on something like the Oracle Cloud free-forever tier instance with only 1GB of RAM, it can make the difference between unusable and useful.

Virtualized Windows 10 Idle CPU Consumption

I recently came across a rather interesting issue that seems to be relatively unrecognised – since 18xx updates, the idling Windows guest VMs seem to be consuming about 30% of CPU on the Linux KVM host. This took me a little while to get to the bottom off, and after excluding the possibility of it being caused by any active processes from inside the VM, I eventually pinned it down to the way system timers are used.

Diagnosis

What seems to be happening is that the Windows kernel keeps polling the CPU timer all the time at a rather aggressive rate, which manifests as rather high CPU usage on the host even though the guest is not doing any productive work. On large virtualization server, this is obviously going to pointlessly burn through a huge amount of CPU for no benefit.

Solution

The solution is to expose an emulated Hyper-V clock. For all other clocks, the kernel seems to incessantly poll the timers, but for Hyper-V it recognises that this is a bad idea in a virtual machine, and starts to behave in a more sensible way.

To achieve this, add this to your libvirt XML guest definition:

<features>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <synic state='on'/>
      <stimer state='on'/>
    </hyperv>
</features>

This gets the VM’s idle CPU usage from 30%+ down to a much more reasonable 1%.

Warm Reboot on Linux with kexec (Remember QEMM?)

If you are old enough to remember QEMM from back in the ’90s, along with other tools we used to squeeze every last byte of memory under the 640KB limit, you may remember a rather cool feature it had – warm reboot.

What is a Warm Reboot?

Reboot involves the computer doing a Power-On Self Test (POST). This takes time, often as much as a few minutes on some servers and workstations. While you are setting something up and need to test frequently that things come up correctly at boot time, the POST can make progress painfully slow. If only we had something like the warm reboot feature that QEMM had back in the ’90s, which allowed us to reset the RAM and reboot DOS without rebooting the entire machine and suffer the POST time. Well, such a thing does actually exist in modern Linux.

Enter kexec

kexec allows us to do exactly this – load a new kernel, kill all processes, and hand over control to the new kernel as the bootloader does at boot time. What do we need for this magic to work? On a modern distro, not much, it is all already included. Let’s start with a script that I use and explain what each component does:

#!/bin/bash

systemctl isolate multi-user.target

rmmod nvidia_drm nvidia_modeset nvidia_uvm
rmmod nvidia

kexec --load=/boot/vmlinuz-$(uname -r) \
      --initrd=/boot/initramfs-$(uname -r).img \
      --command-line="$(cat /proc/cmdline)"

kexec --exec

Let’s look at the kexec lines first. uname -r returns the current kernel version. $(uname -r) bash syntax allows is to take the output of a command and use it as a string in the invoking command. On recent CentOS 8 here is what we get:

$ uname -r
4.18.0-193.6.3.el8_2.centos.plus.x86_64
$ echo $(uname -r)
4.18.0-193.6.3.el8_2.centos.plus.x86_64

The kernel and initial ramdisk usually have the kernel version in their names in /boot/:

$ ls /boot/
initramfs-4.18.0-193.6.3.el8_2.centos.plus.x86_64.img
vmlinuz-4.18.0-193.6.3.el8_2.centos.plus.x86_64

So in our warm reboot script, vmlinuz-$(uname -r) will expand to vmlinuz-4.18.0-193.6.3.el8_2.centos.plus.x86_64. Similar will happen with the initramfs file name.

Next, what is in /proc/cmdline ? This contains the boot parameters that our currently running kernel was booted with, as provided in our grub conifguration, for example:

$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos2)/vmlinuz-4.18.0-193.6.3.el8_2.centos.plus.x86_64 root=ZFS=tank/ROOT quiet elevator=deadline transparent_hugepage=never

This is the minimum needed to boot the kernel. Once we have supplied this information, we initiate the shutdown and process purge, and hand over to the new kernel, using:

kexec --exec.

But what are the systemctl and rmmod lines about? They are mostly to work around finnickiness of Nvidia drivers and GPUs. If you execute kexec immediately, with the Nvidia driver still running, the GPU won’t reset properly and won’t get properly re-initialised by the driver when the kernel warm-boots. So we have to rmmod the nvidia driver. Legacy nvidia driver only includes the nvidia module. Newer versions also include nvidia_drm, nvidia_modeset and nvidia_uvm which depend on the nvidia module, so we have to remove those first. But before we do that, we have to make sure that Xorg isn’t running, otherwise we won’t be able to unload the nvidia driver. To make sure graphical environment isn’t running, we switch the runlevel target to multi-user.target (on a workstation we are probably running graphical.target by default). Once Xorg is no longer running, we can proceed with unloading the nvidia driver modules. And with that done, we can proceed with the warm boot and enjoy a reboot time saving.