Disconnecting ethernet on one VM causes other VMs to crash

Discussions related to using VirtualBox on Windows hosts.
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Disconnecting ethernet on one VM causes other VMs to crash

Post by rdghickman »

Well, this seems to be a little bit of a weird one. I've been struggling with it for a few weeks to no avail, so I thought I'd throw it out there to see if anyone has any wild ideas. Just to clarify I'm not 100% certain it's VirtualBox yet, but I'll take any advice at this point!

The setup is a bunch of Debian 11 x64 guests on a Windows 10 host. Hyper-V is off. The host is a 6-core/12-thread machine with 32GB of RAM. The guests are single CPU, 1GB RAM headless setups.

The "server" VMs have two ethernet adapters, an internal ethernet and a host-only ethernet. The "client" VM has a single ethernet adapter which is on the same host-only ethernet.

If I disconnect the virtual ethernet cable on the host-only adapter on the "client" VM (via the VBox GUI), there is about a 70% chance that *each* "server" VM will hang/crash. On the PIIX3 chipset, it is a hard hang (100% guest CPU usage), and the only log entry says that the guest has become unresponsive. On the ICH9 chipset, there is an attempt at writing a kernel panic, but it never gets that far and the guest resets. It happens within a few seconds of the disconnect. I am at quite a loss as to how changing the connection state of ethernet on one VM is affecting others at all.
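
The cable pull described above can also be scripted, which makes it easier to repeat the test and count how often a "server" VM dies. What follows is only a minimal sketch: it assumes `VBoxManage` is on the PATH, and the VM name and adapter number passed in are placeholders. The `controlvm <vm> setlinkstate<N> on|off` subcommand is the documented VBoxManage way to plug or unplug a virtual cable; the helper names are made up for illustration.

```python
import subprocess
import time
from typing import List


def link_state_command(vm_name: str, adapter: int, connected: bool) -> List[str]:
    """Build the VBoxManage call that plugs or unplugs one virtual cable."""
    if adapter < 1:
        raise ValueError("adapter numbers start at 1")
    state = "on" if connected else "off"
    return ["VBoxManage", "controlvm", vm_name, f"setlinkstate{adapter}", state]


def yank_cable(vm_name: str, adapter: int = 1, cycles: int = 5,
               pause_s: float = 10.0) -> None:
    """Unplug and replug the cable repeatedly.

    The pause gives a hang time to show up; the thread reports it
    within a few seconds of the disconnect.
    """
    for _ in range(cycles):
        for connected in (False, True):
            subprocess.run(link_state_command(vm_name, adapter, connected),
                           check=True)
            time.sleep(pause_s)
```

Calling `yank_cable("client")` on the host would cycle adapter 1 of a VM named "client" five times; both values are assumptions and need adjusting to the actual setup.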

I've tried VirtualBox 7.0 through 7.0.6, with Debian Linux kernels 5.10, 5.18, and 6.0. I have also tried an internal network rather than host-only, and this has the same effect. Same with swapping Debian for Ubuntu on the "server" VMs - exact same hang. Other straw-clutching attempts include changing the paravirtualisation interface (minimal/none/etc.) and also the network adapter type - I tried Intel, paravirt, and the older PCnet-FAST III one as well. It feels like something really funky is going on here, but I'm starting to reach the limits of my knowledge. If I get a chance in the near future, I'm going to try on a Linux host, and there is also a plan to make doubly sure with an equivalent physical setup when possible, but if anyone has any suggestions in the meantime I'm more than open to them.
scottgus1
Site Moderator
Posts: 20965
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Windows, Linux

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by scottgus1 »

rdghickman wrote: I've tried VirtualBox 7.0 through 7.0.6
Does this work correctly on 6.1.40 or 6.1.42? Did it ever work correctly on any other version & if so, which one?
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by rdghickman »

This was indeed on my list of testing so I've bumped up the priority. With some very quick testing with the same VMs I can verify so far that it appears:

6.1.42 does not work
6.0.24 does work

I'll see if I can bisect a little further.
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by rdghickman »

Hmm, okay, well there's bad news. The crash seems a bit less likely on older versions - as in, it will almost never happen on the first disconnect, but it still happens if you yank the cable out on the "client" a couple times. After a retest of 6.0.24 I got the same crash. Confirmed it was still an issue as far back as 5.2.44. Unfortunately 5.2.10 won't run as Windows 10 refuses.

I'm not sure where this leaves me unfortunately. I'm still struggling to determine a plausible case where this might be a Linux bug but I don't know enough. I think it's probably time to try a Linux host.
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by rdghickman »

Update: the crash only happens if there is an established TCP connection between the VMs at the time the virtual cable is yanked. I suspect which side crashes depends on the direction in which the TCP connection was created. That would explain how one VM can affect another, so in theory it could be a fault in Linux, if the sudden reset/termination of a TCP connection were triggering some bad code path.
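
To test the "established TCP connection" condition in isolation, the guest software can be swapped for a trivial connection holder. The sketch below is an assumption about how one might do that, not the software used in the thread: one side accepts a single connection, the other opens it and keeps sending small heartbeats. The port, the payload, and the function names are all made up for illustration.

```python
import socket
import time


def serve_once(listener: socket.socket, received: list) -> None:
    """Accept one connection and record every chunk until the peer goes away."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                break
            received.append(chunk)


def send_heartbeats(host: str, port: int, count: int, interval_s: float) -> int:
    """Open one TCP connection and send `count` heartbeats over it."""
    sent = 0
    with socket.create_connection((host, port), timeout=5.0) as sock:
        for _ in range(count):
            sock.sendall(b"ping\n")
            sent += 1
            time.sleep(interval_s)
    return sent
```

Running the sender on one VM and the listener on the other, then swapping the roles, would show whether the crashing side really follows the direction in which the connection was created.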
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by mpack »

Well, can you clarify if the VM is crashing like you said, or just guest code inside the VM?

These are very different scenarios - and talking about guests making network connections makes it sound like you are talking about guest problems.
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by rdghickman »

I've done some more work on this, and I can confirm that while the VM appears to simply hang/crash with no output or kernel dump file, by attaching a serial port in VirtualBox I have managed to get the panic text, so it is definitely the guest kernel dying. I will go away and study this for a while to see if I can figure anything out. While the kernel crash seems to happen only when the guest software is running and has a connection, it looks to be related to the queuing disciplines and hierarchical token bucket (HTB) processing when sending something.

Code:

[  109.810702] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  109.812987] #PF: supervisor read access in kernel mode
[  109.814030] #PF: error_code(0x0000) - not-present page
[  109.815078] PGD 3b45067 P4D 3b45067 PUD 0 
[  109.815889] Oops: 0000 [#1] SMP NOPTI
[  109.817372] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-20-amd64 #1 Debian 5.10.158-2
[  109.818715] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  109.820062] RIP: 0010:rb_next+0x0/0x50
[  109.820932] Code: 89 fe 4c 89 c7 48 89 14 24 e8 dc 66 71 00 48 8b 43 10 48 8b 14 24 49 89 d8 e9 ee fe ff ff 48 c7 07 01 00 00 00 c3 cc cc cc cc <48> 8b 17 48 39 d7 74 3d 48 8b 47 08 48 85 c0 74 20 49 89 c0 48 8b
[  109.824077] RSP: 0000:ffffb1c000003e68 EFLAGS: 00010246
[  109.825030] RAX: 0000000000000000 RBX: ffff8cdbc5628278 RCX: 0000000000000000
[  109.826125] RDX: ffff8cdbc5628140 RSI: ffff8cdbc4a00400 RDI: 0000000000000000
[  109.827176] RBP: ffff8cdbc4a00000 R08: 0000000000000000 R09: 0000000000000001
[  109.828214] R10: ffff8cdbc5628140 R11: ffffffff8ee060c0 R12: 0000000000000000
[  109.829363] R13: ffff8cdbc5628000 R14: ffff8cdbc5628280 R15: ffff8cdbc5628140
[  109.830407] FS:  0000000000000000(0000) GS:ffff8cdbfec00000(0000) knlGS:0000000000000000
[  109.831645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.832623] CR2: 0000000000000000 CR3: 0000000003b2a006 CR4: 00000000000706f0
[  109.834099] Call Trace:
[  109.834731]  <IRQ>
[  109.835306]  htb_dequeue+0x7c1/0x840 [sch_htb]
[  109.836114]  __qdisc_run+0x88/0x560
[  109.836923]  net_tx_action+0x105/0x270
[  109.837659]  __do_softirq+0xc5/0x279
[  109.838446]  asm_call_irq_on_stack+0x12/0x20
[  109.839322]  </IRQ>
[  109.839908]  do_softirq_own_stack+0x37/0x50
[  109.840776]  irq_exit_rcu+0x92/0xc0
[  109.841491]  sysvec_apic_timer_interrupt+0x36/0x80
[  109.842372]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  109.843320] RIP: 0010:mwait_idle+0x57/0x80
[  109.844118] Code: 89 d1 65 48 8b 04 25 c0 fb 01 00 0f 01 c8 48 8b 00 a8 08 75 33 0f 1f 44 00 00 0f 00 2d 0c 6d 50 00 31 c0 48 89 c1 fb 0f 01 c9 <65> 48 8b 04 25 c0 fb 01 00 3e 80 60 02 df c3 cc cc cc cc 0f ae f0
[  109.847096] RSP: 0000:ffffffff8ee03ec0 EFLAGS: 00000246
[  109.847965] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  109.848991] RDX: 0000000000000000 RSI: ffffffff8ee03e50 RDI: 00000019887e5976
[  109.850172] RBP: ffffffff8ee13940 R08: 0000000000000001 R09: 000000000007c000
[  109.851152] R10: 000000000007c000 R11: 0000000000000000 R12: 0000000000000000
[  109.852201] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  109.853216]  default_idle_call+0x3c/0xd0
[  109.853899]  do_idle+0x20c/0x2b0
[  109.854508]  cpu_startup_entry+0x19/0x20
[  109.855179]  start_kernel+0x574/0x599
[  109.855843]  secondary_startup_64_no_verify+0xb0/0xbb
[  109.856626] Modules linked in: sch_tbf sch_netem cls_u32 sch_htb intel_rapl_msr intel_rapl_common intel_pmc_core intel_powerclamp ghash_clmulni_intel aesni_intel libaes crypto_simd vmwgfx cryptd glue_helper rapl ttm drm_kms_helper pcspkr sg serio_raw joydev vboxguest evdev ac cec button drm fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sd_mod t10_pi crc_t10dif crct10dif_generic ohci_pci ehci_pci ohci_hcd ahci libahci ehci_hcd libata usbcore virtio_net net_failover failover scsi_mod crct10dif_pclmul crct10dif_common crc32_pclmul psmouse crc32c_intel virtio_pci virtio_ring i2c_piix4 usb_common virtio battery video
[  109.865010] CR2: 0000000000000000
[  109.865766] ---[ end trace 45651f0b70afd64a ]---
[  109.866674] RIP: 0010:rb_next+0x0/0x50
[  109.867388] Code: 89 fe 4c 89 c7 48 89 14 24 e8 dc 66 71 00 48 8b 43 10 48 8b 14 24 49 89 d8 e9 ee fe ff ff 48 c7 07 01 00 00 00 c3 cc cc cc cc <48> 8b 17 48 39 d7 74 3d 48 8b 47 08 48 85 c0 74 20 49 89 c0 48 8b
[  109.870318] RSP: 0000:ffffb1c000003e68 EFLAGS: 00010246
[  109.871171] RAX: 0000000000000000 RBX: ffff8cdbc5628278 RCX: 0000000000000000
[  109.872297] RDX: ffff8cdbc5628140 RSI: ffff8cdbc4a00400 RDI: 0000000000000000
[  109.873458] RBP: ffff8cdbc4a00000 R08: 0000000000000000 R09: 0000000000000001
[  109.874513] R10: ffff8cdbc5628140 R11: ffffffff8ee060c0 R12: 0000000000000000
[  109.875580] R13: ffff8cdbc5628000 R14: ffff8cdbc5628280 R15: ffff8cdbc5628140
[  109.876647] FS:  0000000000000000(0000) GS:ffff8cdbfec00000(0000) knlGS:0000000000000000
[  109.877760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.878673] CR2: 0000000000000000 CR3: 0000000003b2a006 CR4: 00000000000706f0
[  109.879745] Kernel panic - not syncing: Fatal exception in interrupt
[  109.880779] Kernel Offset: 0xc800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  109.882950] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
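
Reading the trace above: the fault is at `rb_next+0x0` with both RDI and CR2 equal to zero, which is consistent with `rb_next` being called with a NULL node from `htb_dequeue` in the `sch_htb` module. For comparing panics across runs (for example the 1-CPU and 2-CPU cases), a small parser helps. This is a minimal sketch whose regular expressions are written against the oops format shown here and are an assumption for other kernel versions; the function names are illustrative.

```python
import re

# Matches frames such as "[  109.835306]  htb_dequeue+0x7c1/0x840 [sch_htb]".
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?:\? )?(?P<func>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?\s*$"
)
RIP_RE = re.compile(r"RIP: [0-9a-f]+:(?P<func>[A-Za-z_][\w.]*)\+0x")


def faulting_function(oops_text: str):
    """Return the function named on the first RIP line, or None."""
    match = RIP_RE.search(oops_text)
    return match.group("func") if match else None


def call_trace(oops_text: str) -> list:
    """Return (function, module) pairs for the frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("module")))
    return frames
```

Two panics with the same faulting function and the same leading frames are very likely the same bug, which is a quicker check than comparing the register dumps by eye.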
scottgus1
Site Moderator
Posts: 20965
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Windows, Linux

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by scottgus1 »

Lately, a kernel panic in a guest on a Windows host has usually meant that Hyper-V is enabled in the Windows host OS and the VM does not have 2 processors.

Check to see if the VM has 2 processors, & if not set it to 2. Then try the VM.

If it goes unstable again, start the VM from a full normal shutdown, not a saved state. Run until you see the problem happen, then shut down the VM from within the VM's OS if possible. If that is not possible, close the VirtualBox window for the VM with the Power Off option set.

Right-click the VM in the main VirtualBox window's VM list, choose Show in Explorer/Finder/File Manager. In the "Logs" subfolder, zip the VM's "VBox.log", and post the zip file, using the forum's Upload Attachment tab. (Configure your host OS to show all extensions so you can find the "VBox.log", not "VBox.log.1", etc.)
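
The log-zipping step can be done from a script as well, which avoids picking up a rotated log by mistake. This is a minimal sketch assuming the VM folder layout described above (a "Logs" subfolder holding the current log plus rotated copies); the function name and the paths passed to it are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_latest_log(vm_folder: Path, archive: Path) -> Path:
    """Zip only the current log, skipping rotated copies such as VBox.log.1."""
    logs_dir = vm_folder / "Logs"
    # Compare case-insensitively so "vbox.log" and "VBox.log" both match.
    matches = [p for p in logs_dir.iterdir() if p.name.lower() == "vbox.log"]
    if not matches:
        raise FileNotFoundError(f"no VBox.log in {logs_dir}")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(matches[0], arcname=matches[0].name)
    return archive
```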
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by rdghickman »

I am relatively sure Hyper-V is off, because I had to disable it some time ago to avoid the bizarre situation where, if it was on, copying a file to the VM would corrupt it. I have double-checked in Windows features and it is unchecked. None of the Hyper-V services are running in Windows services (if that matters), but I won't claim to be an expert here.

Adding a second CPU to the VM creates an interesting variant on the problem. When the client VM is disconnected, I get the kernel panic dump but it keeps running. However, when the ethernet is then reconnected on the client, it *then* hard hangs. The kernel panic looks to be the same trace.

Attaching logs for the 2-CPU hanging guest.
Attachments
VBox_sim01_230123_2cpu.zip
(56.63 KiB)
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by mpack »

I doubt this is the solution, but don't you think it's about time the GAs were updated?
00:00:13.241881 VMMDev: Guest Log: vboxguest: host-version: 7.0.6r155176 0x8000000f
00:00:13.243515 VMMDev: Guest Additions information report: Version 6.0.0 r127566 '6.0.0'
You are correct that Hyper-V is not running.
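
The Guest Additions mismatch in the two log lines quoted above (host 7.0.6, additions 6.0.0) can be checked mechanically when several VMs need auditing. This is a minimal sketch: the regular expressions are written against those two lines only, so treating them as valid for other VirtualBox versions is an assumption, and the function names are illustrative.

```python
import re

HOST_RE = re.compile(r"host-version:\s*(\d+\.\d+\.\d+)")
GA_RE = re.compile(
    r"Guest Additions information report: Version (\d+\.\d+\.\d+)"
)


def versions_from_log(log_text: str):
    """Return (host_version, additions_version); either may be None."""
    host = HOST_RE.search(log_text)
    additions = GA_RE.search(log_text)
    return (host.group(1) if host else None,
            additions.group(1) if additions else None)


def additions_outdated(log_text: str) -> bool:
    """True when the Guest Additions are older than the host VirtualBox."""
    host, additions = versions_from_log(log_text)
    if host is None or additions is None:
        raise ValueError("version lines not found in log")

    def as_tuple(version: str):
        return tuple(int(part) for part in version.split("."))

    return as_tuple(additions) < as_tuple(host)
```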

Is 1024MB RAM really enough for a 64-bit Debian VM? I recently had to update mine to 16GB to make it stable when running Yocto. I would increase to 4096MB.

And 16MB is definitely not enough graphics RAM. I'd increase to 128MB.
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by rdghickman »

The "server" VMs are tiny headless installations with about 700MB available on boot. I did just try increasing them to 4GB with 128MB video memory to no effect.

I'll have a look at guest additions.

Update: updating the guest additions to version 7.0.6 had no effect.
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Disconnecting ethernet on one VM causes other VMs to crash

Post by mpack »

rdghickman wrote: The "server" VMs are tiny headless installations with about 700MB available on boot.
AFAIK even headless PCs maintain a virtual display which is what they paint when you connect to them. Assuming it's a GUI OS of course. If the installed OS is entirely text based - real text, not a rendered font - then yes it should be stable with less graphics RAM. But if you can afford it then I'd still provide it.