Nat Network stops working after a couple days

Discussions related to using VirtualBox on Linux hosts.
dheko
Posts: 7
Joined: 17. Jan 2021, 22:22

Re: Nat Network stops working after a couple days

Post by dheko »

Hi. I'm having the same issue, but it's not after a couple of days, it's after several hours...

I have VirtualBox on a Linux host server with a mix of Windows/Linux guests. After several hours the host network disappears.
What I mean by this is that the two IPs from the screenshot below no longer show up on an ARP scan, and web browsing from the guests doesn't work anymore.
[Screenshot: ksnip_20210117-143032.png]

The only way to bring them back and restore connectivity on the guests is by stopping and restarting the natnetwork on the host.

Code:

vboxmanage natnetwork modify --netname "ECLabNet" --dhcp off
vboxmanage natnetwork stop --netname "ECLabNet"
vboxmanage natnetwork start --netname "ECLabNet"

Relevant Host Version Information:
Kernel: 5.4.0-60-generic x86_64
Compiler: gcc v: 9.3.0
Desktop: Xfce 4.14.2
Distro: Ubuntu 20.04.1 LTS (Focal Fossa)
VirtualBox: 6.1.16 r140961
fth0
Volunteer
Posts: 5678
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Linux, Windows 10, ...
Location: Germany

Post by fth0 »

Please post a zip file containing the VBoxSVC.log files (~/.config/VirtualBox/VBoxSVC.log*).
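
One way to bundle those log files on a Linux host is sketched below. It assumes the per-user configuration directory named above; the function name collect_vbox_logs and the archive name are made up for illustration, and tar is used only because it is present on most hosts. Since a zip file was requested, the last step could be swapped for zip -j if zip is installed.

```shell
#!/usr/bin/env bash
# Sketch: pack all VBoxSVC.log* files from a directory into one archive.
# collect_vbox_logs is a hypothetical helper, not part of VirtualBox.
collect_vbox_logs() {
    local log_dir="${1:-$HOME/.config/VirtualBox}"
    local archive="${2:-$PWD/vboxlogs.tar.gz}"
    local f

    # Build the list of matching file names in the positional parameters.
    set --
    for f in "$log_dir"/VBoxSVC.log*; do
        [ -e "$f" ] && set -- "$@" "$(basename "$f")"
    done

    if [ "$#" -eq 0 ]; then
        echo "no VBoxSVC.log files found in $log_dir" >&2
        return 1
    fi

    # -C keeps the directory prefix out of the archive member names.
    tar -czf "$archive" -C "$log_dir" "$@"
}

# Example call (uses the default directory and archive name):
#   collect_vbox_logs
```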

Post by dheko »

[Attachment: vboxlogs.zip]

Post by fth0 »

dheko wrote: I'm having the same issue, but it's not after a couple of days, it's after several hours...
In the VBoxSVC.log files you provided, it was rather after nearly 3 days: The VBoxNetDHCP and VBoxNetNAT services were started around 2021-01-13T20:31:35Z (00:06:50 in VBoxSVC.log.4), and since then at least 1 of the 4 VMs was using the NatNetwork at any time, up until 2021-01-16T17:14:35Z (68:49:50 in VBoxSVC.log.2). During the last minute, an additional VBoxNetDHCP service was started, so it looks like there's something wrong there.
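
The "nearly 3 days" figure can be checked with a little arithmetic. The sketch below computes the span between the two UTC timestamps quoted above; it assumes GNU date (for the -d option), and the function name elapsed_hms is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: elapsed time between two UTC timestamps, printed as HH:MM:SS.
# Assumes GNU date, as shipped with Ubuntu; elapsed_hms is a made-up name.
elapsed_hms() {
    local start end diff
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    diff=$(( end - start ))
    printf '%02d:%02d:%02d\n' \
        $(( diff / 3600 )) $(( diff % 3600 / 60 )) $(( diff % 60 ))
}

# Wall-clock span between the two timestamps from the logs.
elapsed_hms 2021-01-13T20:31:35Z 2021-01-16T17:14:35Z   # prints 68:43:00
```

The log-relative times agree with this: 68:49:50 minus 00:06:50 is also 68:43:00, a little under 69 hours.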

This whole workflow allows many different conclusions to investigate further, too many for my taste (and time). Can you reproduce the problem with an easier workflow? If you can, I'd be interested in the log files named VBoxSVC.log* and NatNetwork*.

Post by dheko »

Thank you for your time and I'm sorry those logs were not helpful. Maybe these logs will be a little better... Not sure where to find the NatNetwork log files.

I restarted my server to start fresh. I removed the ECLabNet Nat Network and created LabNat01. I assigned it to all 4 of the VM guests that are supposed to be on this Nat Network.

After I assigned it, I changed a few settings (see below). It took me a while to figure out that I needed to restart the network for opt 15 to take effect. Everything was working perfectly last night. I came to check on things today, and 10.0.2.1 and 10.0.2.2 are missing from the ARP scan. So after several hours something happens that makes those two IPs drop out.

Code:

vboxmanage dhcpserver modify --network "LabNat01" --upper-ip=10.0.2.199
vboxmanage dhcpserver modify --network "LabNat01" --lower-ip=10.0.2.100
vboxmanage dhcpserver modify --network "LabNat01" --set-opt=15 "ECHLab.local"
vboxmanage dhcpserver restart --network "LabNat01"
vboxmanage list natnets

[Attachment: vboxsvc.zip]

Post by fth0 »

dheko wrote: I'm sorry those logs were not helpful.
I wouldn't say that. ;)

They already indicated a problem with the VBoxNetDHCP service, rather than with the VBoxNetNAT service, which both work together to provide the NatNetwork. But I don't know when those 2 VirtualBox services are supposed to be running (e.g. from host boot to shutdown, from first use to last use, or from first use to shutdown), and I don't know if the order in which your 4 VMs were started and stopped multiple times (or did they rather sleep?) played a role.
dheko wrote: Maybe these logs will be a little better...
I'll take a look and let you know ...
dheko wrote: Not sure where to find the NatNetwork log files.
Right next to the VBoxSVC.log files.
dheko wrote: Everything was working perfectly last night.
Can you provide me with some wall clock times (either in UTC or in your time zone)? ;)

Post by fth0 »

I just tried an experiment:

On a Linux host, I started VM 1, started VM 2, stopped VM 1 and stopped VM 2 (in this interleaving order). The VBoxNetDHCP service and the VBoxNetNAT service were both started together with VM 1, and stopped together with VM 2, so they are usually running from first use to last use. The standard DHCP lease time seems to be 600 seconds, BTW. I'll let VM 1 run alone overnight, to see what happens in the course of several hours ...
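
To repeat this kind of observation, one could count how many instances of each service are running at a given moment. The sketch below assumes the services show up in the process list under the names VBoxNetDHCP and VBoxNetNAT; the helper names are made up, and the input is read from stdin so that any process listing can be fed in.

```shell
#!/usr/bin/env bash
# Sketch: count NAT network helper processes from a list of command names
# (one per line on stdin). Helper names are illustrative only.
count_service() {
    # grep -c prints 0 and returns 1 when nothing matches; keep status 0.
    grep -c -x -F -- "$1" || true
}

report_natnet_services() {
    local listing svc n
    listing=$(cat)
    for svc in VBoxNetDHCP VBoxNetNAT; do
        n=$(printf '%s\n' "$listing" | count_service "$svc")
        echo "$svc: $n"
    done
}

# On a live Linux host:
#   ps -e -o comm= | report_natnet_services
```

With a single NAT network in use, more than one VBoxNetDHCP instance would match the anomaly described for the earlier logs.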

In your setup, the VBoxNetDHCP service doesn't behave like in my simple setup, at least sometimes. Did you restart the host before 2021-01-18T00:39Z, or do we have to consider processes and/or services that were already running at the beginning of your experiment?

Post by dheko »

So... I started fresh: deleted the log files, cleared out the Nat network from the VirtualBox GUI, etc. I rebooted the server so I could start clean and have a good idea of the times when I did things.

I assigned the new Nat network to the VMs, started them up, made no changes and looked at the logs. Everything was good!

I did an ARP scan from one of the VMs on the NAT network and everything checked out... all VMs, DHCP, host loopback and gateway were there.

I periodically repeated that ARP scan and everything was good. I waited overnight to see if the two IPs would drop, and to my surprise, when I checked the next day everything was still good. The host loopback 10.0.2.2 and 10.0.2.1 (QEMU) entries were still there.

I proceeded to make a few changes to the DHCP server to better fit my network needs, and that's when I noticed a few things:

1. When I issued vboxmanage list dhcpservers I noticed that all the DHCP servers from my previous Nat networks were still listed. This was odd and perhaps part of the problem, but everything was working.
2. After removing all but the DHCP server that I needed (the one matching my latest network name), I proceeded to change some settings: opt 6, opt 15, and the lower and upper IPs. But no matter what I did, I could not get the opts to save. I tried using the --global option; I tried to manually save the information in the config XML file; I tried doing things in various orders, thinking maybe the way I started or restarted services would cause the settings to be written to the config file. I could even see where the VirtualBox DHCP service would start up using the config file, but it would not take my settings and instead overwrote them every time. The only settings that would stick were the lower and upper IPs. It's important to note that vboxmanage list dhcpservers would show the settings, even the options I had just set, and everything would run fine; but if I turned off all my virtual machines and restarted them, or manually restarted the NAT network, the opt settings would revert to their defaults.
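
The reverting behaviour in item 2 can be made visible by saving the listing before and after a restart and comparing the two. The sketch below only compares saved text, so it makes no assumption about the exact format that vboxmanage list dhcpservers prints; the function name and the file names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: compare two saved listings of the DHCP server settings.
# listings_match is a made-up helper; it prints a unified diff and
# returns non-zero when the files differ.
listings_match() {
    diff -u "$1" "$2"
}

# Intended use on the host (network name taken from the post above):
#   vboxmanage list dhcpservers > before.txt
#   vboxmanage natnetwork stop  --netname "LabNat01"
#   vboxmanage natnetwork start --netname "LabNat01"
#   vboxmanage list dhcpservers > after.txt
#   listings_match before.txt after.txt && echo "settings persisted"
```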

At this point, I decided that NatNetwork had taken up enough of my time and switched my VMs to "Internal Network", with a new OPNsense VM running two virtual adapters: one in bridged mode (to my home network) and the other on the same Internal Network as the other VMs in my virtual lab. This new VM takes care of my virtual lab's NAT/routing/DHCP on the "internal network" side, allowing those machines out onto my home network and eventually the internet, etc.
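
A rough sketch of how such a layout might be configured from the command line follows. It only prints the VBoxManage modifyvm commands rather than running them, since the VMs must be powered off for modifyvm to apply. The internal network name, VM names and host interface name are all placeholders, not values from the post.

```shell
#!/usr/bin/env bash
# Sketch: print the commands for a router VM plus lab VMs on one internal
# network. All names passed in are hypothetical placeholders.
print_lab_commands() {
    local net="$1" router="$2" host_if="$3" vm
    shift 3
    # Router VM: adapter 1 bridged to the host interface, adapter 2 internal.
    printf 'VBoxManage modifyvm "%s" --nic1 bridged --bridgeadapter1 "%s" --nic2 intnet --intnet2 "%s"\n' \
        "$router" "$host_if" "$net"
    # Lab VMs: a single adapter on the same internal network.
    for vm in "$@"; do
        printf 'VBoxManage modifyvm "%s" --nic1 intnet --intnet1 "%s"\n' "$vm" "$net"
    done
}

print_lab_commands labnet opnsense eno1 win10-a ubuntu-b
```

Printing first makes it easy to review the commands before pasting them into a shell on the host.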

I think the NatNetwork in Virtualbox is perhaps buggy at this point.

With all that said, I hope this helps others coming here for the same reasons, and thank you fth0 for taking some of your time to help.

Post by fth0 »

Thank you for your detailed description. Using OPNsense is a good choice IMHO. ;)
dheko wrote: I tried to manually save the information in the config XML file; I tried doing things in various orders thinking maybe the way I started services or restarted them would cause the settings to be written to the config file.
I'll just comment on this part (taken a bit out of context, I know ;)):

The main background service VBoxSVC is automatically started together with the first VirtualBox frontend (e.g. VirtualBox Manager, VBoxManage) or the first VM, and it is automatically stopped a few seconds after all VirtualBox frontends and VMs have been stopped. When VBoxSVC is started, it reads the global and the VM-specific configuration files, and "owns" them as long as it is running. Manual edits made to those files while VBoxSVC is running are mostly ignored and are in any case overwritten, at the latest when it stops. It's easy to shoot yourself in the foot without this knowledge. ;)

Post by dheko »

Thank you! Ok... so I would need to stop VBoxSVC, make the dhcp config file changes in the Options section and then start everything?

If this is the case, I would propose (and be willing to submit it where appropriate) that a new method be added to the dhcpserver command. Something like:

Code:

vboxmanage dhcpserver save

or perhaps add it as part of the restart method.

Post by fth0 »

dheko wrote: I would need to stop VBoxSVC, make the dhcp config file changes in the Options section and then start everything?
You have perhaps misunderstood part of what I wrote (or vice versa ;)): You can safely use the VBoxManage command while VBoxSVC is running, since it is one of the VirtualBox frontends, so there should not be any need for a save command. Only manual editing can be without effect, and is therefore discouraged by a big fat warning inside configuration files. Never start or stop VBoxSVC yourself, just stop all VMs and VirtualBox frontends (VBoxManage is stopped automatically after the command has been executed).

Post by dheko »

Hmm... I think there might be some miscommunication happening here (as you said, on my end or both). Try making these changes (or similar for your setup). What's happening now, at least after my clean start in an effort to get cleaner logs, is that as long as I don't shut down all the VMs that use LabNat01, everything stays as configured; but if I restart the NAT network (or shut down all VMs and restart them), I lose the opt settings. In other words, I can't get them to persist. Nothing I tried works... I always have to reconfigure opt settings 15 and 6. That's what I was referring to in my previous post. I was hoping to manually change the DHCP config file, but as you said, that file seems to always revert (in the Options section, that is). The upper and lower IP settings are written and persist; opts 6 and 15 don't. (This may warrant a new thread...)

Code:

vboxmanage dhcpserver modify --network "LabNat01" --upper-ip=10.0.2.199
vboxmanage dhcpserver modify --network "LabNat01" --lower-ip=10.0.2.100

vboxmanage dhcpserver modify --network "LabNat01" --set-opt=15 "ECHLab.local"   # <---- does not persist
vboxmanage dhcpserver modify --network "LabNat01" --set-opt=6 10.0.2.200        # <---- does not persist

Post by fth0 »

In VirtualBox 6.1.18, the VBoxNetDHCP service has been modified. In my own test setup, both the VBoxNetDHCP service and the VBoxNetNAT service now stay alive when the host is suspended and resumed, and also keep running when all VirtualBox frontends and VMs are stopped. If you haven't given up yet, you should definitely try VirtualBox 6.1.18. ;)

Post by dheko »

Will give that a try. Thanks again for all your time and help!
CoYoTeNq
Posts: 1
Joined: 26. Mar 2022, 23:47

Post by CoYoTeNq »

Hi PPL!

In one of my recent setups, I hit this issue/bug, and as far as I can see, there is no solution to it. Linux host, 20 Windows guests behind NAT.

I found a workaround after a couple of weeks of testing and trying to fix this.

Short version:

Code:

VBoxManage bandwidthctl "machine_name" add Limit --type network --limit 20m

Long version:

In my setup, 20 VMs share a NAT network. From time to time (it can be minutes or hours) the NAT service goes down without any traceable reason. The logs show nothing, and stopping and restarting the NAT service was the only way to recover connectivity in the virtual environment without restarting the whole host.

I don't need to limit connectivity in my guests. But as an attempt to mitigate this issue, I set the limit mentioned above (note that it is not applied to any interface, so it is not really limiting anything), and that makes the network failures disappear.
A simple way to reproduce this issue is to run a speed test (e.g. using a speedtest site) in any of the VMs. The ones with this limit set work flawlessly. The ones without it end up triggering the issue (the whole NAT stops responding).
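
With 20 guests, repeating that command by hand gets tedious. The sketch below generates one bandwidthctl command per VM from the output of VBoxManage list vms, which prints each VM as a quoted name followed by a UUID in braces. It prints the commands instead of running them; the helper names are made up, and the group name and limit are simply the ones from the command above.

```shell
#!/usr/bin/env bash
# Sketch: emit one "bandwidthctl ... add" command per VM.
# Reads `VBoxManage list vms` output on stdin: "name" {uuid}
vm_names() {
    sed -n 's/^"\(.*\)" {[0-9a-fA-F-]*}$/\1/p'
}

print_limit_commands() {
    local group="$1" limit="$2" vm
    vm_names | while IFS= read -r vm; do
        printf 'VBoxManage bandwidthctl "%s" add %s --type network --limit %s\n' \
            "$vm" "$group" "$limit"
    done
}

# On the host, review the output before running it:
#   VBoxManage list vms | print_limit_commands Limit 20m
```

VM names that themselves contain double quotes are not handled by this sketch.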


I hope this info helps to really track and fix this issue.

Greetings,
Alejandro