Disconnecting ethernet on one VM causes other VMs to crash
-
- Posts: 10
- Joined: 30. Nov 2020, 14:22
Disconnecting ethernet on one VM causes other VMs to crash
Well, this seems to be a little bit of a weird one. I've been struggling with it for a few weeks to no avail, so I thought I'd throw it out there to see if anyone has any wild ideas. Just to clarify I'm not 100% certain it's VirtualBox yet, but I'll take any advice at this point!
The setup is a bunch of Debian 11 x64 guests on a Windows 10 host. Hyper-V is off. The host is a 6-core/12-thread machine with 32GB of RAM. The guests are single CPU, 1GB RAM headless setups.
The "server" VMs have two ethernet adapters, an internal ethernet and a host-only ethernet. The "client" VM has a single ethernet adapter which is on the same host-only ethernet.
If I disconnect the virtual ethernet cable on the host-only adapter on the "client" VM (via the VBox GUI), there is about a 70% chance that *each* "server" VM will hang/crash. On the PIIX3 chipset, it is a hard hang (100% guest CPU usage). The only log is that the guest has become unresponsive. On the ICH9 chipset, there is an attempt at writing a kernel panic, but it never gets that far and the guest resets. It happens within a few seconds after disconnect. I am at quite a loss how changing the connection state of ethernet on one VM is affecting others at all.
I've tried VirtualBox 7.0 through 7.0.6, and using Debian Linux kernels 5.10, 5.18, and 6.0. I have also tried an internal network rather than host-only and this has the same effect. Same with swapping Debian for Ubuntu on the "server" VMs - exact same hang. Other straw-clutching attempts include changing the paravirtualisation interface (minimal/none/etc.) and also the network interface driver - I tried Intel, paravirt, and the older PCnet-FAST III one as well. It feels like something really funky is going on here, but I'm starting to reach the limits of my knowledge. If I get a chance in the near future, I'm going to try on a Linux host, and there is also a plan to make doubly sure with a physical setup equivalent to this when possible, but if anyone has any suggestions in the meantime I'm more than open to them.
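For anyone who wants to repeat the disconnect without clicking through the GUI, the cable toggle can be scripted. This is only a minimal sketch: it assumes the client VM is called "client" (a made-up name) and that the host-only adapter is adapter 1, and it relies on the `VBoxManage controlvm <vm> setlinkstate<N> on|off` subcommand being on the PATH.

```python
import subprocess


def link_state_command(vm_name, adapter, connected):
    """Build the VBoxManage call that plugs or unplugs a virtual cable."""
    if adapter < 1:
        raise ValueError("adapter numbers start at 1")
    state = "on" if connected else "off"
    return ["VBoxManage", "controlvm", vm_name, f"setlinkstate{adapter}", state]


def set_link_state(vm_name, adapter, connected):
    """Run the command; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(link_state_command(vm_name, adapter, connected), check=True)


# Hypothetical usage: pull the cable on adapter 1 of a VM named "client".
# set_link_state("client", 1, connected=False)
```

Toggling the link in a loop with a short pause between calls should approximate repeatedly yanking the cable by hand.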
-
- Site Moderator
- Posts: 20965
- Joined: 30. Dec 2009, 20:14
- Primary OS: MS Windows 10
- VBox Version: PUEL
- Guest OSses: Windows, Linux
Re: Disconnecting ethernet on one VM causes other VMs to crash
rdghickman wrote: I've tried VirtualBox 7.0 through 7.0.6
Does this work correctly on 6.1.40 or 6.1.42? Did it ever work correctly on any other version, and if so, which one?
-
- Posts: 10
- Joined: 30. Nov 2020, 14:22
Re: Disconnecting ethernet on one VM causes other VMs to crash
This was indeed on my list of testing so I've bumped up the priority. With some very quick testing with the same VMs I can verify so far that it appears:
6.1.42 does not work
6.0.24 does work
I'll see if I can bisect a little further.
-
- Posts: 10
- Joined: 30. Nov 2020, 14:22
Re: Disconnecting ethernet on one VM causes other VMs to crash
Hmm, okay, well there's bad news. The crash seems a bit less likely on older versions - as in, it will almost never happen on the first disconnect, but it still happens if you yank the cable out on the "client" a couple times. After a retest of 6.0.24 I got the same crash. Confirmed it was still an issue as far back as 5.2.44. Unfortunately 5.2.10 won't run as Windows 10 refuses.
I'm not sure where this leaves me unfortunately. I'm still struggling to determine a plausible case where this might be a Linux bug but I don't know enough. I think it's probably time to try a Linux host.
-
- Posts: 10
- Joined: 30. Nov 2020, 14:22
Re: Disconnecting ethernet on one VM causes other VMs to crash
Update: the crash only happens if there is an established TCP connection between the VMs at the time the virtual cable is yanked. I suspect which side crashes depends on the direction in which the TCP connection was created. That would explain how one VM is affecting another, so in theory it could be a fault in Linux, if the sudden reset/termination of a TCP connection is triggering some bad code path.
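To take the actual guest software out of the picture, something like the following could stand in for it. It is a rough sketch, not what was running in the VMs: `serve_once` and `hold_connection` are names made up for illustration, and the idea is simply to keep one TCP connection established with a little traffic on it while the cable is pulled.

```python
import socket
import time


def recv_exact(sock, n):
    """Read exactly n bytes, or fewer if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def serve_once(listener):
    """Accept one connection and echo back whatever arrives until it closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def hold_connection(host, port, rounds, interval=1.0):
    """Keep one TCP connection open, doing a small round trip each interval.

    Returns the number of round trips that completed.
    """
    payload = b"ping\n"
    done = 0
    with socket.create_connection((host, port), timeout=5) as sock:
        for _ in range(rounds):
            sock.sendall(payload)
            if recv_exact(sock, len(payload)) != payload:
                break
            done += 1
            time.sleep(interval)
    return done
```

Running `serve_once` on one VM and `hold_connection` on the other, then swapping the roles, would be one way to check whether the side that crashes really follows the direction the connection was opened in.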
-
- Site Moderator
- Posts: 39156
- Joined: 4. Sep 2008, 17:09
- Primary OS: MS Windows 10
- VBox Version: PUEL
- Guest OSses: Mostly XP
Re: Disconnecting ethernet on one VM causes other VMs to crash
Well, can you clarify if the VM is crashing like you said, or just guest code inside the VM?
These are very different scenarios - and talking about guests making network connections makes it sound like you are talking about guest problems.
-
- Posts: 10
- Joined: 30. Nov 2020, 14:22
Re: Disconnecting ethernet on one VM causes other VMs to crash
I've done some more work on this, and I can confirm that while the VM appears to simply hang/crash with no output or kernel dump file, by attaching a serial port in VirtualBox I have managed to get the panic text, so it is definitely the guest kernel dying. I will go away and study this for a while to see if I can figure anything out. While the kernel crash seems to happen only when the guest software is running and has a connection, it looks to be related to the queuing disciplines, specifically hierarchical token bucket (HTB) processing, when sending something.
Code:
[ 109.810702] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 109.812987] #PF: supervisor read access in kernel mode
[ 109.814030] #PF: error_code(0x0000) - not-present page
[ 109.815078] PGD 3b45067 P4D 3b45067 PUD 0
[ 109.815889] Oops: 0000 [#1] SMP NOPTI
[ 109.817372] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-20-amd64 #1 Debian 5.10.158-2
[ 109.818715] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 109.820062] RIP: 0010:rb_next+0x0/0x50
[ 109.820932] Code: 89 fe 4c 89 c7 48 89 14 24 e8 dc 66 71 00 48 8b 43 10 48 8b 14 24 49 89 d8 e9 ee fe ff ff 48 c7 07 01 00 00 00 c3 cc cc cc cc <48> 8b 17 48 39 d7 74 3d 48 8b 47 08 48 85 c0 74 20 49 89 c0 48 8b
[ 109.824077] RSP: 0000:ffffb1c000003e68 EFLAGS: 00010246
[ 109.825030] RAX: 0000000000000000 RBX: ffff8cdbc5628278 RCX: 0000000000000000
[ 109.826125] RDX: ffff8cdbc5628140 RSI: ffff8cdbc4a00400 RDI: 0000000000000000
[ 109.827176] RBP: ffff8cdbc4a00000 R08: 0000000000000000 R09: 0000000000000001
[ 109.828214] R10: ffff8cdbc5628140 R11: ffffffff8ee060c0 R12: 0000000000000000
[ 109.829363] R13: ffff8cdbc5628000 R14: ffff8cdbc5628280 R15: ffff8cdbc5628140
[ 109.830407] FS: 0000000000000000(0000) GS:ffff8cdbfec00000(0000) knlGS:0000000000000000
[ 109.831645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.832623] CR2: 0000000000000000 CR3: 0000000003b2a006 CR4: 00000000000706f0
[ 109.834099] Call Trace:
[ 109.834731] <IRQ>
[ 109.835306] htb_dequeue+0x7c1/0x840 [sch_htb]
[ 109.836114] __qdisc_run+0x88/0x560
[ 109.836923] net_tx_action+0x105/0x270
[ 109.837659] __do_softirq+0xc5/0x279
[ 109.838446] asm_call_irq_on_stack+0x12/0x20
[ 109.839322] </IRQ>
[ 109.839908] do_softirq_own_stack+0x37/0x50
[ 109.840776] irq_exit_rcu+0x92/0xc0
[ 109.841491] sysvec_apic_timer_interrupt+0x36/0x80
[ 109.842372] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 109.843320] RIP: 0010:mwait_idle+0x57/0x80
[ 109.844118] Code: 89 d1 65 48 8b 04 25 c0 fb 01 00 0f 01 c8 48 8b 00 a8 08 75 33 0f 1f 44 00 00 0f 00 2d 0c 6d 50 00 31 c0 48 89 c1 fb 0f 01 c9 <65> 48 8b 04 25 c0 fb 01 00 3e 80 60 02 df c3 cc cc cc cc 0f ae f0
[ 109.847096] RSP: 0000:ffffffff8ee03ec0 EFLAGS: 00000246
[ 109.847965] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 109.848991] RDX: 0000000000000000 RSI: ffffffff8ee03e50 RDI: 00000019887e5976
[ 109.850172] RBP: ffffffff8ee13940 R08: 0000000000000001 R09: 000000000007c000
[ 109.851152] R10: 000000000007c000 R11: 0000000000000000 R12: 0000000000000000
[ 109.852201] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 109.853216] default_idle_call+0x3c/0xd0
[ 109.853899] do_idle+0x20c/0x2b0
[ 109.854508] cpu_startup_entry+0x19/0x20
[ 109.855179] start_kernel+0x574/0x599
[ 109.855843] secondary_startup_64_no_verify+0xb0/0xbb
[ 109.856626] Modules linked in: sch_tbf sch_netem cls_u32 sch_htb intel_rapl_msr intel_rapl_common intel_pmc_core intel_powerclamp ghash_clmulni_intel aesni_intel libaes crypto_simd vmwgfx cryptd glue_helper rapl ttm drm_kms_helper pcspkr sg serio_raw joydev vboxguest evdev ac cec button drm fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sd_mod t10_pi crc_t10dif crct10dif_generic ohci_pci ehci_pci ohci_hcd ahci libahci ehci_hcd libata usbcore virtio_net net_failover failover scsi_mod crct10dif_pclmul crct10dif_common crc32_pclmul psmouse crc32c_intel virtio_pci virtio_ring i2c_piix4 usb_common virtio battery video
[ 109.865010] CR2: 0000000000000000
[ 109.865766] ---[ end trace 45651f0b70afd64a ]---
[ 109.866674] RIP: 0010:rb_next+0x0/0x50
[ 109.867388] Code: 89 fe 4c 89 c7 48 89 14 24 e8 dc 66 71 00 48 8b 43 10 48 8b 14 24 49 89 d8 e9 ee fe ff ff 48 c7 07 01 00 00 00 c3 cc cc cc cc <48> 8b 17 48 39 d7 74 3d 48 8b 47 08 48 85 c0 74 20 49 89 c0 48 8b
[ 109.870318] RSP: 0000:ffffb1c000003e68 EFLAGS: 00010246
[ 109.871171] RAX: 0000000000000000 RBX: ffff8cdbc5628278 RCX: 0000000000000000
[ 109.872297] RDX: ffff8cdbc5628140 RSI: ffff8cdbc4a00400 RDI: 0000000000000000
[ 109.873458] RBP: ffff8cdbc4a00000 R08: 0000000000000000 R09: 0000000000000001
[ 109.874513] R10: ffff8cdbc5628140 R11: ffffffff8ee060c0 R12: 0000000000000000
[ 109.875580] R13: ffff8cdbc5628000 R14: ffff8cdbc5628280 R15: ffff8cdbc5628140
[ 109.876647] FS: 0000000000000000(0000) GS:ffff8cdbfec00000(0000) knlGS:0000000000000000
[ 109.877760] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.878673] CR2: 0000000000000000 CR3: 0000000003b2a006 CR4: 00000000000706f0
[ 109.879745] Kernel panic - not syncing: Fatal exception in interrupt
[ 109.880779] Kernel Offset: 0xc800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 109.882950] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
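For comparing traces across runs (for example the 1-CPU and 2-CPU variants), a small parser that pulls just the stack frames out of the serial capture can help. This is a sketch under the assumption that the capture has the same `[ timestamp ] symbol+0xoff/0xsize [module]` layout as the dump above; `parse_call_trace` is a name invented here.

```python
import re

# One stack frame: "[ 109.835306] htb_dequeue+0x7c1/0x840 [sch_htb]"
# The leading "? " marks unreliable frames; the "[module]" suffix is optional.
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?:\?\s+)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?\s*$",
    re.MULTILINE,
)


def parse_call_trace(text):
    """Return (symbol, module_or_None) for every stack frame line in text."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(text)]


def frames_in_modules(text):
    """Return only the frames that live in a loadable module."""
    return [frame for frame in parse_call_trace(text) if frame[1] is not None]
```

On the dump above, the only frame attributed to a loadable module should be `htb_dequeue` in `sch_htb`, which is what points at the HTB qdisc rather than at the network driver.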
-
- Site Moderator
- Posts: 20965
- Joined: 30. Dec 2009, 20:14
- Primary OS: MS Windows 10
- VBox Version: PUEL
- Guest OSses: Windows, Linux
Re: Disconnecting ethernet on one VM causes other VMs to crash
Kernel panics in guests on a Windows host have lately been caused by Hyper-V being enabled in the Windows host OS while the VM does not have 2 processors.
Check to see if the VM has 2 processors, & if not set it to 2. Then try the VM.
If it goes unstable again, start the VM from full normal shutdown, not save-state. Run until you see the problem happen, then shut down the VM from within the VM's OS if possible. If not possible, close the Virtualbox window for the VM with the Power Off option set.
Right-click the VM in the main Virtualbox window's VM list, choose Show in Explorer/Finder/File Manager. In the "Logs" subfolder, zip the VM's "vbox.log", and post the zip file, using the forum's Upload Attachment tab. (Configure your host OS to show all extensions so you can find the "vbox.log", not "vbox.log.1", etc.)
-
- Posts: 10
- Joined: 30. Nov 2020, 14:22
Re: Disconnecting ethernet on one VM causes other VMs to crash
I am relatively sure Hyper-V is off. I had to disable it some time ago to avoid the bizarre situation where, if it was on, copying a file to the VM would corrupt it. I have double checked in Windows features and it is unchecked. None of the Hyper-V services are running in Windows services (if that matters), but I won't claim to be an expert here.
Adding a second CPU to the VM creates an interesting variant on the problem. When the client VM is disconnected, I get the kernel panic dump but it keeps running. However, when the ethernet is then reconnected on the client, it *then* hard hangs. The kernel panic looks to be the same trace.
Attaching logs for the 2-CPU hanging guest.
- Attachments
-
- VBox_sim01_230123_2cpu.zip
- (56.63 KiB)
-
- Site Moderator
- Posts: 39156
- Joined: 4. Sep 2008, 17:09
- Primary OS: MS Windows 10
- VBox Version: PUEL
- Guest OSses: Mostly XP
Re: Disconnecting ethernet on one VM causes other VMs to crash
You are correct that Hyper-V is not running.
00:00:13.241881 VMMDev: Guest Log: vboxguest: host-version: 7.0.6r155176 0x8000000f
00:00:13.243515 VMMDev: Guest Additions information report: Version 6.0.0 r127566 '6.0.0'
I doubt this is the solution, but don't you think it's about time the GAs were updated?
Is 1024MB RAM really enough for a Debian (64bit) VM? I recently had to update mine to 16GB to make it stable when running Yocto. I would increase to 4096MB.
And 16MB is definitely not enough graphics RAM. I'd increase to 128MB.
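The Guest Additions mismatch can be spotted mechanically in a vbox.log. A minimal sketch, assuming the log contains `host-version:` and `Guest Additions information report:` lines in the same form as the two quoted in this post; `ga_versions` is an illustrative name, not a VirtualBox tool.

```python
import re

HOST_RE = re.compile(r"vboxguest: host-version:\s*(\d+(?:\.\d+)*)r\d+")
GA_RE = re.compile(r"Guest Additions information report: Version (\d+(?:\.\d+)*)")


def ga_versions(log_text):
    """Return (host_version, guest_additions_version); None where not found."""
    host = HOST_RE.search(log_text)
    ga = GA_RE.search(log_text)
    return (host.group(1) if host else None, ga.group(1) if ga else None)


def versions_match(log_text):
    """True only when both versions were found and are identical."""
    host, ga = ga_versions(log_text)
    return host is not None and host == ga
```

Feeding it the two log lines from this post gives host 7.0.6 against Guest Additions 6.0.0, which is the gap being pointed out.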
-
- Posts: 10
- Joined: 30. Nov 2020, 14:22
Re: Disconnecting ethernet on one VM causes other VMs to crash
The "server" VMs are tiny headless installations with about 700MB available on boot. I did just try increasing them to 4GB with 128MB video memory to no effect.
I'll have a look at guest additions.
Update: updating guest additions to version 7.0.6 had no effect.
-
- Site Moderator
- Posts: 39156
- Joined: 4. Sep 2008, 17:09
- Primary OS: MS Windows 10
- VBox Version: PUEL
- Guest OSses: Mostly XP
Re: Disconnecting ethernet on one VM causes other VMs to crash
rdghickman wrote: The "server" VMs are tiny headless installations with about 700MB available on boot.
AFAIK even headless PCs maintain a virtual display which is what they paint when you connect to them. Assuming it's a GUI OS of course. If the installed OS is entirely text based - real text, not a rendered font - then yes it should be stable with less graphics RAM. But if you can afford it then I'd still provide it.