[Fixed by supplemental update] kernel panic from memory leak on 10.15.6

fth0
Volunteer
Posts: 5668
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Linux, Windows 10, ...
Location: Germany

Re: kernel panic from memory leak on 10.15.6

Post by fth0 »

xbuntugest wrote:there are 10.15.6 implosions also happening to vmware users
Thanks for sharing the information.

I found a relevant thread where the identification of the symptoms seems to be at least one step ahead of us, and a lead developer is taking part. Here's the posting with the relevant details: VM Ware Fusion potentially causes macOS 10.15.6 to crash.

yyc wrote:if you have a particular "hunch" you would like to test, let me know and i can probably run something overnight.
Now that we have the information from the VMware Fusion thread linked above, we can either lean back and wait or run some quicker tests of our own to investigate the issue further. First, you could check whether VirtualBox also heavily uses the kalloc.32 kernel memory zone or a different zone. Then, you could strip components from your VirtualBox VM with the Windows guest to eliminate some possible candidates. Alternatively, you could test whether an empty VM also exhibits the symptoms.

To be fair, I cannot tell you if it's worth the effort. Maybe only check which kernel memory zone is heavily used by VirtualBox.
xbuntugest
Posts: 17
Joined: 26. Jul 2020, 17:18

Re: kernel panic from memory leak on 10.15.6

Post by xbuntugest »

yyc wrote:i used Activity Monitor, observing memory usage increasing in "Memory Used" and "Wired Memory". i understand that this is normal behaviour, except for the fact that memory consumption increases steadily and relentlessly until all available memory is consumed, resulting in a kernel panic on the host.
i'm seeing something horribly wrong in activity monitor, in the memory tab: zero swap used, zero compressed, zero "vm compressed" for all running apps

this is consistent with vm_stat

Code:

Pages stored in compressor:                   0.
Pages occupied by compressor:                 0.
Decompressions:                               0.
Compressions:                                 0.
Pageins:                                 585837.
Pageouts:                                     0.
Swapins:                                      0.
Swapouts:                                     0.
i don't know if this just started but uh it's NOT how it's supposed to work
yyc
Posts: 6
Joined: 24. Jul 2020, 19:07

Re: kernel panic from memory leak on 10.15.6

Post by yyc »

fth0 wrote:
xbuntugest wrote:there are 10.15.6 implosions also happening to vmware users
Thanks for sharing the information.

I found a relevant thread where the identification of the symptoms seems to be at least one step ahead of us, and a lead developer is taking part. Here's the posting with the relevant details: VM Ware Fusion potentially causes macOS 10.15.6 to crash.

yyc wrote:if you have a particular "hunch" you would like to test, let me know and i can probably run something overnight.
Now that we have the information from the VMware Fusion thread linked above, we can either lean back and wait or run some quicker tests of our own to investigate the issue further. First, you could check whether VirtualBox also heavily uses the kalloc.32 kernel memory zone or a different zone. Then, you could strip components from your VirtualBox VM with the Windows guest to eliminate some possible candidates. Alternatively, you could test whether an empty VM also exhibits the symptoms.

To be fair, I cannot tell you if it's worth the effort. Maybe only check which kernel memory zone is heavily used by VirtualBox.
glad this is getting somewhere!

after reading the fusion forum thread kindly provided, a quick test with

Code:

sudo zprint -d
shows the same thing fusion users are seeing...

a sampling on the host with windows 10 guest gives:
* kalloc.32 increasing from 255,671,721 to 255,681,737 from one line of output to the next. the difference is 10,016
vs a sampling on the host with linux guest gives:
* kalloc.32 increasing from 227,025,390 to 227,025,965 from one line of output to the next. the difference is 575

my estimate of 15 to 1 difference between windows guest and linux guest memory consumption looks like it's in the ballpark...
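For anyone who wants to redo the arithmetic on their own readings, here is a minimal Python sketch. It only uses the four kalloc.32 values quoted above; the dictionary keys and the helper name are made up for illustration, and the post does not say which zprint column the values come from, so treat the unit as "whatever was sampled".

```python
# Two consecutive kalloc.32 readings per guest type, copied from the post above.
# The key names are illustrative labels, not anything zprint prints.
samples = {
    "windows10": (255_671_721, 255_681_737),
    "linux": (227_025_390, 227_025_965),
}


def growth(before: int, after: int) -> int:
    """Increase between two consecutive readings."""
    return after - before


deltas = {guest: growth(*pair) for guest, pair in samples.items()}
ratio = deltas["windows10"] / deltas["linux"]

for guest, delta in deltas.items():
    print(f"{guest}: +{delta:,} per sample")
print(f"windows/linux ratio: {ratio:.1f} to 1")
```

With these two pairs the ratio comes out somewhat above 15 to 1, so the estimate is in the right range. Two samples per guest is a very small basis, though, and a longer run would give a steadier figure.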
iains
Posts: 9
Joined: 23. Jul 2020, 22:31

Re: kernel panic from memory leak on 10.15.6

Post by iains »

When running a VirtualBox VM in VirtualBox 6.1.12 on a macOS Catalina 10.15.6 host, the amount of wired kernel memory grows faster than under macOS Catalina 10.15.5, e.g. 1 GB per hour, eventually leading to a memory shortage.

Q1: macOS offers several ways to obtain memory usage information from different angles, and the results are often seemingly not consistent. Which method did you use to observe the memory usage growth rate?
I see consistent reporting from the GUI (activity monitor) and zprint for the leakage of wired pages (it's also consistent for the non-wired memory but that's not so much of an issue here).
Q2: When the VirtualBox VM is shut down, is part of the memory freed again, and if so, is the remaining memory usage growth rate significantly less?
The part of the memory that's correctly associated with the vbox instance is returned, (most of) the growth in wired memory is not returned and is a cumulative loss between sessions. Note that "sync; purge" (as root) returns the memory that's used for caching files - and can be done on a live instance with no ill-effect [other than a loss of caching performance, I suppose]. However neither quitting the VB instance nor purging the memory caches frees the leaked wired memory. The only way to recover that is a reboot :roll:
Q4: @iains: Can you try a Windows and/or Linux guest to see if it makes a difference in comparison to your macOS guests?
In due course. (I'm the main maintainer for GCC on Darwin/macOS, so my machines are tied up with testing there etc. I usually use the cfarm machines for Linux, so there's not much in the way of VMs around.)
I don't know what's the connection between VirtualBox and the growing wired kernel memory usage you're experiencing. What I do know is that the allocation of physical memory pages for the VirtualBox VM depends on the way the guest OS touches its memory. I don't know if that correlates to your problems at all, therefore I just hinted at that.
Well, [for OSX guests] the way in which memory grows is two-fold
(1) associated with 'ramp up' of the guest ... I am doing toolchain builds, which are a good stress test for most machines - there's a lot of IO, memory and CPU use. My guests are mostly 4-cores, with a CPU edition compatible with the OSX era in question. Usually they have 4G RAM (<= 10.7) or 6G RAM (>= 10.8). The ramp-up happens quite quickly with load on the system - and saturates the VB wired memory at a little above the allocation.

(2) long-term losses .. these are not necessarily at a constant rate, and the 1G/hour is a higher figure (from when I had 3 VB instances running on the 16-core machine). The rate is lower on a smaller machine with one VB instance.

I'd speculate IO buffers, screen-scraper stuff, etc. but the "losses" are accounted in the anonymous 'zones' section in the zprint output so it's hard to be sure.
Q5: Does the growth rate of the wired kernel memory correlate to the amount of physical memory pages (e.g. Real Memory in Activity Monitor, rss in ps) in use by VirtualBox somehow? Or to any other configurable memory property of VirtualBox VMs?
Not that I can see directly - although as noted above I'd say it's related to the number of clients active.
iains wrote:
General question is "where do we go from here?"
.. should someone file a bug (and where ?) ?

In the VirtualBox forums, you have a broader audience to contribute observations, which is suited for discussing problems and refining the picture, but usually there are no VirtualBox developers active here. You can create a ticket in the VirtualBox Bugtracker. Provide enough information to easily reproduce the problem, and add a link to this thread. Then hope for the bug ticket to catch the interest of a VirtualBox developer ... ;)
The post mentioning VMware reinforces the probability that this is a virtualisation framework bug - but I'd say that a bug reported by the VB developers is going to be more successful than my anonymous panic reports in getting a fix ....

@fth0 : I'm doing a new run at the moment, keeping the output from zprint at various points - when that's done, I'll attach the wired portion at least (a full test cycle takes most of a day so don't hold your breath ;) )
iains

Re: kernel panic from memory leak on 10.15.6

Post by iains »

and, yes, kalloc.32 does seem to show the largest growth, although this is marked as collectable rather than wired (but then the zones are included in the wired summary total, so ...)
fth0

Re: kernel panic from memory leak on 10.15.6

Post by fth0 »

xbuntugest wrote:i'm seeing something horribly wrong in activity monitor, in the memory tab: zero swap used, zero compressed, zero "vm compressed" for all running apps
That's perfectly fine for a macOS system with plenty of RAM during the first hours of runtime. It should be this way as long as the sum of Memory Used and Cached Files is significantly lower than Physical Memory.
xbuntugest

Re: kernel panic from memory leak on 10.15.6

Post by xbuntugest »

fth0 wrote:
xbuntugest wrote:i'm seeing something horribly wrong in activity monitor, in the memory tab: zero swap used, zero compressed, zero "vm compressed" for all running apps
That's perfectly fine for a macOS system with plenty of RAM during the first hours of runtime. It should be this way as long as the sum of Memory Used and Cached Files is significantly lower than Physical Memory.
thank you for saying it. i realized it myself too, eventually. i'm jittery from these last few days. i'd been setting staggered shutdown timers on guest & host to keep from having to do recovery. it's nice to have a (possibly placebo) metric to watch instead, by keeping one eye on activity monitor 'memory used' & quitting unused apps like the olden days.

ps. menumeters uses very low resources & can be optioned down to show only memory used/free
xbuntugest

Re: kernel panic from memory leak on 10.15.6

Post by xbuntugest »

from vmware forum, which may be overloaded from being linked by macrumors story
Thanks for your patience, everyone.

We have narrowed down the problem to a regression in the com.apple.security.sandbox kext (or one of its related components) included in macOS 10.15.6 (19G73), and we have now filed a comprehensive report with Apple including a minimal reproduction case which should allow them to easily identify and address the issue.

We have not yet identified any workaround other than refraining from installing macOS 10.15.6, which is a bit painful (and the advice will come too late for anyone who finds this thread because they are already running into this problem), or shutting down your VMs whenever you aren't using them and rebooting your host every day or every few hours or every hour, which is ... ugggggh.

We'll keep investigating possible workarounds to see if we can pop out a point release with a mitigation... To Be Determined. (In all honesty, it isn't looking good, but we've come up with some mighty creative workarounds in the past, so I'll never say never.)

Thanks again to everyone here for the epic assistance and for being so utterly polite and patient despite the mess. Y'all are the best.
--
Darius
trifster
Posts: 19
Joined: 16. Oct 2019, 01:54

Re: kernel panic from memory leak on 10.15.6

Post by trifster »

Not sure if this helps but because of a different issue I've been avoiding upgrading to 6.1.12 on my iMac running 10.15.6 and staying on VB 6.1.10. My Ubuntu VM has been working just fine for several days.

Trifster
fth0

Re: kernel panic from memory leak on 10.15.6

Post by fth0 »

I've created a ticket in the VirtualBox Bugtracker: 19772.
trifster

Re: kernel panic from memory leak on 10.15.6

Post by trifster »

fth0 wrote:I've created a ticket in the VirtualBox Bugtracker: 19772.
Nice write-up!!

So I haven't rebooted in a few days but did fully close VB. Does this line in zprint -d indicate memory leaking?
kalloc.32 32 3563020K 3882952K 114016640 124254492 114016603 4K 128 C
Thx
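To read a line like that, a small Python sketch can help. It assumes the column order from the zprint header (elem size, cur size, max size, cur #elts, max #elts, cur inuse, then alloc size and alloc count); the class and function names are made up for illustration and are not part of any tool.

```python
from typing import NamedTuple


class ZoneLine(NamedTuple):
    """First seven columns of one zprint line (layout assumed from the zprint header)."""
    name: str
    elem_size: int     # bytes per element
    cur_size_kib: int  # current zone size, K suffix stripped
    max_size_kib: int  # maximum zone size, K suffix stripped
    cur_elts: int      # elements the zone currently has room for
    max_elts: int
    cur_inuse: int     # elements currently handed out


def parse_zone_line(line: str) -> ZoneLine:
    # Assumes the zone name contains no spaces, which holds for kalloc.32.
    f = line.split()
    kib = lambda s: int(s.rstrip("K"))
    return ZoneLine(f[0], int(f[1]), kib(f[2]), kib(f[3]),
                    int(f[4]), int(f[5]), int(f[6]))


line = "kalloc.32 32 3563020K 3882952K 114016640 124254492 114016603 4K 128 C"
zone = parse_zone_line(line)
in_use_bytes = zone.elem_size * zone.cur_inuse

print(f"{zone.name}: {zone.cur_size_kib / 1024**2:.2f} GiB allocated, "
      f"{in_use_bytes / 1024**3:.2f} GiB in use")
```

As a sanity check on the assumed layout, elem size times cur #elts (32 x 114016640) equals the cur size of 3563020K exactly. A single line only gives the current size, though; whether the zone is still growing needs repeated samples.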
fth0

Re: kernel panic from memory leak on 10.15.6

Post by fth0 »

trifster wrote:Does this line in zprint -d indicate memory leaking?
If you let sudo zprint -d run for a minute (or longer), and get such a line for kalloc.32 every 1 to 5 seconds, then you have the problem, and you can also calculate the growth rate.
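The growth rate calculation could look roughly like the Python sketch below. It is only an illustration: it assumes the two readings are element counts of the 32-byte kalloc.32 zone, and the 5-second interval is a placeholder, since the thread does not say how far apart yyc's samples were. The function name is made up.

```python
def growth_rate_bytes_per_hour(first_count: int, last_count: int,
                               seconds: float, elem_size: int = 32) -> float:
    """Extrapolate zone growth to bytes per hour from two element counts.

    elem_size defaults to 32 because kalloc.32 holds 32-byte elements.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (last_count - first_count) * elem_size / seconds * 3600


# yyc's Windows 10 guest readings from earlier in the thread.
# ASSUMPTION: 5 seconds between the two lines; substitute your real interval.
rate = growth_rate_bytes_per_hour(255_671_721, 255_681_737, seconds=5.0)
print(f"{rate / 1024**2:.0f} MiB/hour")
```

A longer sampling window (first and last line of a run lasting several minutes) should smooth out bursts and give a more trustworthy rate than two adjacent lines.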
trifster

Re: kernel panic from memory leak on 10.15.6

Post by trifster »

fth0 wrote:
trifster wrote:Does this line in zprint -d indicate memory leaking?
If you let sudo zprint -d run for a minute (or longer), and get such a line for kalloc.32 every 1 to 5 seconds, then you have the problem, and you can also calculate the growth rate.
yes I saw that occur. frustrating that apple is messing with kexts on both Catalina and the Big Sur beta. FWIW I have SIP disabled.
xbuntugest

Re: kernel panic from memory leak on 10.15.6

Post by xbuntugest »

from https://communities.vmware.com/message/2973899#2973899
The severity of the issue will vary with guest OS type, with the selection of virtual devices present in the VM (USB, sound, NICs), and with the guest OS workload – not just the amount of CPU load but also the way in which the workload interacts with the virtual hardware.

I will try to generalize how I would expect this to behave – with the caveat that this is largely guesswork, I have not taken any measurements to back this up, and there are simply too many factors at play (and interplay) to even hope to characterize them all:

• Fewer vCPUs will generally fare better than more vCPUs.
• Fewer peripherals will generally be better (removing unnecessary sound cards, USB controllers, NICs might help).
• In a relatively "quiet" VM, a modern guest OS will generally fare better than an older guest OS. A tickless kernel will fare much better than a "tickful" kernel.
• VMs running I/O-heavy workloads (network services, compilers, ...) will fare the worst, with most I/Os triggering the leak.
• Idle guest processor cores (0% CPU) and fully busy processor cores (a CPU-bound workload, 100% CPU) will both fare better than guest processor cores with intermediate "noisy" CPU usage. A virtual processor core's transition from idle to busy and back again will trigger the leak.

There is no VM configuration that I'm aware of which will be fully immune to this leak... it's just a matter of degree. If you are not observing problems with a VM running in Fusion on macOS 10.15.6, it will almost certainly still be leaking memory... just not at a high enough rate to cause a real-world problem.
--
Darius
xbuntugest

Re: kernel panic from memory leak on 10.15.6

Post by xbuntugest »

so many people seem not to have the worst of it, i wonder if it's related to or exacerbated by an opt-in feature, like maybe disk encryption