hang on windows update reboot; ram not being returned to host

Posted: 29. May 2020, 21:24
by alexuniqueunusedusername
I use VirtualBox 6.1:
Version 6.1.2 r135662 (Qt5.9.5)
on Kubuntu 18.04:
Linux boulez 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I have a Windows 10 guest. It works fine (100% stable) except for reboots, especially when doing a Windows update. It'll just... stop. Black screen. Eventually I close the window and get the option to terminate it. At this point the guest shows as Aborted, but the RAM used by the VM isn't returned to Linux. I have to reboot the host. If I try to run it or another VM, the OOM killer kicks in and I typically lose the desktop environment. My machine is in this state now.

htop shows 25.4G of 31.4G in use.

~$ free
              total        used        free      shared  buff/cache   available
Mem:       32896520    26533640     3028276       55216     3334604     5854604
Swap:       2097148      513280     1583868

~$ vmstat -w
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
 r  b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
 1  3       512768      2985668      1111252      2374596    0    0  1678  1678    0    3   1   2  88  10   0

~$ cat /proc/meminfo 
MemTotal:       32896520 kB
MemFree:         2950448 kB
MemAvailable:    5794756 kB
Buffers:         1086776 kB
Cached:          2095248 kB
SwapCached:        12464 kB
Active:          1124620 kB
Inactive:        2949244 kB
Active(anon):     470860 kB
Inactive(anon):   485580 kB
Active(file):     653760 kB
Inactive(file):  2463664 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        1579260 kB
Dirty:           1012708 kB
Writeback:          5124 kB
AnonPages:        886096 kB
Mapped:           550488 kB
Shmem:             64932 kB
KReclaimable:     180748 kB
Slab:             535740 kB
SReclaimable:     180748 kB
SUnreclaim:       354992 kB
KernelStack:       10928 kB
PageTables:        64928 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18545408 kB
Committed_AS:    5679928 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       45756 kB
VmallocChunk:          0 kB
Percpu:            15360 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:    30338756 kB
DirectMap2M:     3164160 kB
DirectMap1G:     1048576 kB
I can't see any option within VirtualBox to free up any memory (not that top shows much memory as belonging to any VirtualBox process).
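
A rough way to put a number on the memory that is not attributed to any process or cache is to subtract the categories /proc/meminfo does report from MemTotal. This is only a sketch: the choice of fields is an assumption, small overlaps between them are possible, and `parse_meminfo` / `unaccounted_kb` are hypothetical helper names. The sample values are copied from the output above.

```python
# Sketch: estimate memory that /proc/meminfo does not attribute to any category.
# Assumption: the fields below are treated as non-overlapping, so the result
# is an estimate rather than exact kernel accounting.
ACCOUNTED_FIELDS = (
    "MemFree", "Buffers", "Cached", "AnonPages",
    "Slab", "KernelStack", "PageTables", "Percpu",
)

# Relevant lines from the /proc/meminfo output posted above.
SAMPLE = """\
MemTotal:       32896520 kB
MemFree:         2950448 kB
Buffers:         1086776 kB
Cached:          2095248 kB
AnonPages:        886096 kB
Slab:             535740 kB
KernelStack:       10928 kB
PageTables:        64928 kB
Percpu:            15360 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def unaccounted_kb(meminfo):
    """MemTotal minus the fields in ACCOUNTED_FIELDS (missing fields count as 0)."""
    return meminfo["MemTotal"] - sum(meminfo.get(f, 0) for f in ACCOUNTED_FIELDS)


gap = unaccounted_kb(parse_meminfo(SAMPLE))
print(f"unaccounted: {gap} kB ({gap / 1024**2:.1f} GiB)")
```

With the figures above this leaves about 24 GiB that none of the listed categories explains, which is consistent with memory held inside the kernel rather than by a user process. To run it on a live system, pass the contents of /proc/meminfo to `parse_meminfo` instead of `SAMPLE`.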

Ideally I'd fix the crash in the first place, but I'd settle for not having to reboot the PC to load this or another VM!

What further information would be useful in diagnosing this problem?

Edit: I've kept digging. Could this be it:

~$ lsmod | grep -i nvidia
nvidia_uvm            942080  0
nvidia_drm             49152  6
nvidia_modeset       1114112  26 nvidia_drm
nvidia              20463616  1438 nvidia_uvm,nvidia_modeset
drm_kms_helper        180224  1 nvidia_drm
drm                   491520  9 drm_kms_helper,nvidia_drm
ipmi_msghandler       102400  2 ipmi_devintf,nvidia
i2c_nvidia_gpu         16384  0
A "used by" value of 1438, each using 20463616 bytes, gives 1438 x 20463616 = 29426679808 bytes, or about 27.4 GiB. This is close to what's missing.
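
A caveat on that arithmetic: the third column of lsmod is a use (reference) count, not a number of loaded copies, so multiplying it by the module size probably overstates what the module itself occupies. The sketch below puts the product next to the plain sum of the module sizes. `parse_lsmod` is a hypothetical helper, and the sample text is copied from the lsmod output above.

```python
# Sketch: compare "size x use count" with the plain size of the nvidia modules.
LSMOD_SAMPLE = """\
nvidia_uvm            942080  0
nvidia_drm             49152  6
nvidia_modeset       1114112  26 nvidia_drm
nvidia              20463616  1438 nvidia_uvm,nvidia_modeset
i2c_nvidia_gpu         16384  0
"""


def parse_lsmod(text):
    """Return {module: (size_bytes, use_count)} from lsmod-style lines."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skips the "Module Size Used by" header and any malformed line.
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            modules[fields[0]] = (int(fields[1]), int(fields[2]))
    return modules


modules = parse_lsmod(LSMOD_SAMPLE)
size, used_by = modules["nvidia"]
product = size * used_by                      # the figure computed in the post
total = sum(s for s, _ in modules.values())   # each module is loaded only once

print(f"size x use count:    {product} bytes ({product / 2**30:.1f} GiB)")
print(f"sum of module sizes: {total} bytes ({total / 2**20:.1f} MiB)")
```

The module sizes themselves add up to roughly 21.5 MiB. That column does not include memory a driver allocates at runtime, so this neither confirms nor rules out the driver as the holder of the missing RAM.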

But:

~$ nvidia-smi
Fri May 29 21:26:24 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:08:00.0  On |                  N/A |
|  0%   49C    P8    12W / 140W |    513MiB /  5943MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1175      G   /usr/lib/xorg/Xorg                           177MiB |
|    0      1482      G   kwin_x11                                      59MiB |
|    0      1484      G   /usr/bin/krunner                               2MiB |
|    0      1486      G   /usr/bin/plasmashell                         129MiB |
|    0      5345      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files   134MiB |
+-----------------------------------------------------------------------------+
Is VirtualBox using the nvidia driver to lock memory for its own use and not freeing it in certain crash situations?

I did:

$ sudo modprobe -r nvidia_uvm
and now I get:

~$ lsmod | grep -i nvidia
nvidia_drm             49152  6
nvidia_modeset       1114112  26 nvidia_drm
nvidia              20463616  1437 nvidia_modeset
drm_kms_helper        180224  1 nvidia_drm
drm                   491520  9 drm_kms_helper,nvidia_drm
ipmi_msghandler       102400  2 ipmi_devintf,nvidia
i2c_nvidia_gpu         16384  0
But I can't modprobe -r nvidia_modeset:

modprobe: FATAL: Module nvidia_modeset is in use.

Re: hang on windows update reboot; ram not being returned to host

Posted: 29. May 2020, 23:45
by fth0
You're encountering a variation of a known, yet unsolved bug (Ticket #19007, comment:4).

Your VM is in the saved state. When you resume your VM, and then reboot the guest OS, the crash happens. If you don't need the saved state, simply discard the saved state. If you need the saved state, resume the VM and shut down the guest OS.

Re: hang on windows update reboot; ram not being returned to host

Posted: 30. May 2020, 09:46
by alexuniqueunusedusername
fth0 wrote:You're encountering a variation of a known, yet unsolved bug (Ticket #19007, comment:4).
Thanks. Interesting. Was it something in my log which pointed you in this direction?
fth0 wrote: Your VM is in the saved state. When you resume your VM, and then reboot the guest OS, the crash happens. If you don't need the saved state, simply discard the saved state. If you need the saved state, resume the VM and shut down the guest OS.
Not sure I understand. My VM was active, and I attempted a restart. It ended up in the aborted state. There is no state, and no resume. I can only start it.

Re: hang on windows update reboot; ram not being returned to host

Posted: 30. May 2020, 13:23
by fth0
alexuniqueunusedusername wrote:Was it something in my log which pointed you in this direction?
Yes, a lot. If you follow the link inside Ticket #19007, comment:4 to my detailed explanation in another forum thread, you can maybe recognize most of it. The main difference is the consequential final error indicated by:
VBox.log file wrote:00:00:50.333223 AssertLogRel /home/vbox/vbox-6.1.2/src/VBox/Devices/Graphics/DevVGA.cpp(5645) int vgaR3PciIORegionVRamMapUnmap(PPDMDEVINS, PPDMPCIDEV, uint32_t, RTGCPHYS, RTGCPHYS, PCIADDRESSSPACE): RT_SUCCESS_NP(rc)
00:00:50.333416 VERR_PGM_HANDLER_PHYSICAL_CONFLICT (-1603) - Attempt to register an access handler for a physical range of which a part was already handled.
[...]
00:01:09.279615 AssertLogRel /home/vbox/vbox-6.1.2/src/VBox/Devices/Graphics/DevVGA.cpp(5645) int vgaR3PciIORegionVRamMapUnmap(PPDMDEVINS, PPDMPCIDEV, uint32_t, RTGCPHYS, RTGCPHYS, PCIADDRESSSPACE): RT_SUCCESS_NP(rc)
00:01:09.279647 VERR_PGM_HANDLER_PHYSICAL_CONFLICT (-1603) - Attempt to register an access handler for a physical range of which a part was already handled.
Regarding the saved state: When you started the VM at 2020-05-29T18:48:31.651322000Z, it was in a saved state (or in a combination of an aborted state and a previous saved state):
VBox.log file wrote:00:00:00.358048 Log opened 2020-05-29T18:48:31.651322000Z
00:00:00.363381 Console: Machine state changed to 'Restoring'
[...]
00:00:00.542707 SSM: Saved state info:
alexuniqueunusedusername wrote:Not sure I understand. My VM was active, and I attempted a restart. It ended up in the aborted state. There is no state, and no resume. I can only start it.
Then start the VM and shut down the guest OS. The important part is shutting down the guest OS instead of rebooting it, because thereby you'll get rid of the saved state, which is probably a precondition for the error you're experiencing. HTH
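
The two log signatures quoted in this post (the 'Restoring' state change and the VERR_PGM_HANDLER_PHYSICAL_CONFLICT error) can be searched for mechanically. This is a minimal sketch, not a VirtualBox tool: `scan_vbox_log` is a hypothetical helper, the sample lines are copied from the excerpts above, and finding both signatures only suggests this scenario rather than proving it.

```python
# Sketch: look for the two log signatures quoted above in VBox.log text.
import re

RESTORING = "Machine state changed to 'Restoring'"
CONFLICT = "VERR_PGM_HANDLER_PHYSICAL_CONFLICT"
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s")

SAMPLE = """\
00:00:00.363381 Console: Machine state changed to 'Restoring'
00:00:50.333416 VERR_PGM_HANDLER_PHYSICAL_CONFLICT (-1603) - Attempt to register an access handler for a physical range of which a part was already handled.
00:01:09.279647 VERR_PGM_HANDLER_PHYSICAL_CONFLICT (-1603) - Attempt to register an access handler for a physical range of which a part was already handled.
"""


def scan_vbox_log(text):
    """Return the timestamps of lines containing either signature."""
    found = {"restoring": [], "conflict": []}
    for line in text.splitlines():
        match = TIMESTAMP.match(line)
        stamp = match.group(1) if match else "?"
        if RESTORING in line:
            found["restoring"].append(stamp)
        if CONFLICT in line:
            found["conflict"].append(stamp)
    return found


result = scan_vbox_log(SAMPLE)
print(result)
```

To check a real log, read the VBox.log file from the VM's Logs folder and pass its contents to `scan_vbox_log`.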

Re: hang on windows update reboot; ram not being returned to host

Posted: 30. May 2020, 16:31
by alexuniqueunusedusername
alexuniqueunusedusername wrote:Not sure I understand. My VM was active, and I attempted a restart. It ended up in the aborted state. There is no state, and no resume. I can only start it.
fth0 wrote:Then start the VM and shut down the guest OS. The important part is shutting down the guest OS instead of rebooting it, because thereby you'll get rid of the saved state, which is probably a precondition for the error you're experiencing. HTH
I think I'm failing to understand the terminology here. I have a host os (linux) running virtualbox. I use virtualbox to run a guest OS (windows) which is a VM. So I'm reading "Then start the VM and shut down the guest OS" as "start the guest OS then shut down the guest OS". "The important part is shutting down the guest OS instead of rebooting it". When it's aborted, all I can do is start it again. Well, I can "discard saved state" and then start it.

Are you using VM to mean virtualbox itself, but not any of the guests it controls?

Are you suggesting that I never restart windows, but instead always shut it down and power it up?

Re: hang on windows update reboot; ram not being returned to host

Posted: 30. May 2020, 17:44
by mpack
alexuniqueunusedusername wrote:So I'm reading "Then start the VM and shut down the guest OS"
No, the advice was to RESUME the vm then SHUT IT DOWN (meaning don't suspend it again). I.e. ensure that the next boot does a full hardware check. This advice was given because the log shows you had resumed from a saved state: the relevant log passage was quoted to you.

You can also just right click the VM and discard the saved state. The VM will behave as if after a crash: any open documents will be lost.

Re: hang on windows update reboot; ram not being returned to host

Posted: 30. May 2020, 18:24
by fth0
alexuniqueunusedusername wrote:I think I'm failing to understand the terminology here.
Ok, let's get that straight first:

A VM (Virtual Machine) is a virtual computer. A virtual computer contains a virtual hard disk. On the virtual hard disk, there is a guest OS installed. So, technically speaking you start the VM (turn the virtual computer on), and the VM starts and runs the guest OS. When you stop using the guest OS, you're supposed to shut down the guest OS by using the method inside the guest OS (e.g. Windows 10 Start menu > Power > Shut down).
alexuniqueunusedusername wrote:Well, I can "discard saved state" and then start it.
If you had no important unsaved documents open in your guest OS when you last used it, then do so. Otherwise, just start the VM as you always do, and shut it down immediately afterwards.
alexuniqueunusedusername wrote:Are you suggesting that I never restart windows, but instead always shut it down and power it up?
That should prevent the current problem from occurring again, yes.

Re: hang on windows update reboot; ram not being returned to host

Posted: 30. May 2020, 22:32
by alexuniqueunusedusername
mpack wrote:
alexuniqueunusedusername wrote:So I'm reading "Then start the VM and shut down the guest OS"
No, the advice was to RESUME the vm then SHUT IT DOWN (meaning don't suspend it again). I.e. ensure that the next boot does a full hardware check. This advice was given because the log shows you had resumed from a saved state: the relevant log passage was quoted to you.

You can also just right click the VM and discard the saved state. The VM will behave as if after a crash: any open documents will be lost.
I'm not sure what the log shows but my experience was:
1) working machine
2) click restart
3) VM starts to restart but gets stuck on black screen
4) I terminate VM
5) VM in "aborted" state.

At no point was resuming the VM manually a possibility for me. Perhaps the restart invoked a resume and it was this resume which got stuck.

Still interested to know why this situation caused the host to not free up the ram necessitating a reboot. I've never had that in many years of using Linux boxes.

Re: hang on windows update reboot; ram not being returned to host

Posted: 31. May 2020, 00:47
by fth0
Your description is a little bit fuzzy. I'll go backwards from step 2) to step 1):
alexuniqueunusedusername wrote:2) click restart
I assume you mean that in Windows, you clicked on Start > Power > Restart?
alexuniqueunusedusername wrote:1) working machine
It's not clear at all what you mean by that, but: However you got the VM running before step 2), that was not a clean start from power off, but a start from a saved state, resuming the VM.
alexuniqueunusedusername wrote:Still interested to know why this situation caused the host to not free up the ram necessitating a reboot.
I'm sorry, but I don't think that I'm able to explain that to you. Just as you're struggling to comprehend the (relatively easy) topic of 'saved states' and resuming VMs (which is also described in the VirtualBox User Manual BTW), I'm struggling to explain the (relatively complex) nature of the bug to you. I've already told you about the bug entry with a link to my technically quite accurate description of the problem, and I don't want to keep repeating myself. I hope you understand that this is nothing personal.

Re: hang on windows update reboot; ram not being returned to host

Posted: 31. May 2020, 01:57
by alexuniqueunusedusername
> I assume you mean that in Windows, you clicked on Start > Power > Restart?
Yes.

> It's not clear at all what you mean by that, but: However you got the VM running before step 2), that was not a clean start from power off, but a start from a saved state, resuming the VM.
I mean I started with a perfectly functioning VM, which I've used for months and have been suspending and resuming almost daily, entirely without problem. Resuming, with state, normally. Up until the point where I did a restart as part of the Windows update installation process.

Like I said, although the logs might show that I resumed the VM, and you've said that I should have resumed and then shut down the VM, this was at no point possible in the context of this bug. I've been resuming and suspending the VM daily for a while. When I did a restart (as part of installing Windows updates) it hung (perhaps in the resume part, which might be what is confusing you), and when I then loaded VirtualBox I could see that the machine was in the aborted state. Once I had initiated the restart from within Windows, as part of completing the Windows update process, I could at no point resume the VM; I was now starting it, without saved state, after it had entered the aborted state. I thought I'd made this clear a number of times now.

Re: hang on windows update reboot; ram not being returned to host

Posted: 31. May 2020, 16:23
by fth0
Yes, you made this clear. And since you now made this even clearer by beginning earlier in the timeline, I can try and explain my reasoning better:

You have been regularly suspending and resuming the VM in the past: Every time you suspended the VM, it was in a so-called saved state, and every time you resumed the VM, the saved state was deleted.

One day, you suspended the VM as usual, but for some unknown timing-related reasons, the saved state was inconsistent this time. Then you resumed this VM from the inconsistent saved state at 2020-05-29T18:48:31.651322000Z (the beginning of the VBox.log file you provided). In this VM run, several consequential errors happened, ultimately leading to the VM abort after the next restart of the Windows guest OS.

If this VM run from the inconsistent saved state did not delete the saved state for some reason, then the next VM run from the aborted state could again try to resume the VM from the saved state, leading to the next VM abort. To get rid of a saved state that might still exist, I suggested resuming or starting the VM and then shutting down the Windows guest OS, which also powers off the VM. Alternatively, using Discard saved state... from the context menu of the VM in the VirtualBox Manager would also get rid of the saved state.

Since the inconsistent saved state is the root cause of your problem, not suspending the VM in the future is one possible way to avoid the bug.
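
The "discard saved state" step can also be done from the host command line with VBoxManage. The following is a minimal sketch under stated assumptions: `parse_vm_state` and `discard_saved_state_if_any` are hypothetical helper names, the script assumes VBoxManage is on the PATH, and it only acts when the machine-readable VM info reports `VMState="saved"`, leaving any other state alone.

```python
# Sketch: check a VM's state and discard a leftover saved state via VBoxManage.
import subprocess


def parse_vm_state(machinereadable):
    """Extract the VMState value from `VBoxManage showvminfo --machinereadable` output."""
    for line in machinereadable.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VMState":
            return value.strip().strip('"')
    return None


def discard_saved_state_if_any(vm_name):
    """Run `VBoxManage discardstate` only when the VM reports a saved state."""
    info = subprocess.run(
        ["VBoxManage", "showvminfo", vm_name, "--machinereadable"],
        check=True, capture_output=True, text=True,
    ).stdout
    if parse_vm_state(info) == "saved":
        subprocess.run(["VBoxManage", "discardstate", vm_name], check=True)
        return True
    return False
```

A call would look like `discard_saved_state_if_any("Windows 10")`, where "Windows 10" is a placeholder for the VM's name. As noted earlier in the thread, discarding the saved state makes the VM behave as if after a crash, so any unsaved guest documents are lost.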