[Solved] Bridged networking is broken, and NAT is slow, so ... any ideas?

arQon
Posts: 228
Joined: 1. Jan 2017, 09:16
Primary OS: MS Windows 7
VBox Version: PUEL
Guest OSses: Ubuntu 16.04 x64, W7

[Solved] Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

Bug #14374 makes bridged networking unusable for my workflow. When bridged DOES work, i.e. prior to suspending the host, it performs reasonably well: not great, but about 60Mb/s for writes to a NAS that the host gets ~90Mb/s to (100Mbit Ethernet). NAT works reliably, but is significantly slower, at less than 40Mb/s on average.

In the absence of a third option or a fix for 14374, are there any settings for NAT that can be tweaked to improve the performance? (It's already using the virtio driver). For example, has anyone experimented with the NAT buffer sizes (section 9.11.3 of the manual) to any noticeable effect?
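For anyone following along, the tuning knob in question is the natsettings switch on modifyvm. This is only a sketch: "MyVM" and the numbers are placeholders, not a recommendation, and the VM must be powered off first.

```shell
# Sketch of the NAT tuning described in section 9.11.3 of the manual.
# "MyVM" and the values are placeholders; the first field is the MTU and
# the remaining four are socket/TCP buffer sizes (see the manual for the
# exact order and units). Run with the VM powered off.
VBoxManage modifyvm "MyVM" --natsettings1 1500,128,128,64,64
```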
Last edited by arQon on 1. Apr 2018, 02:32, edited 1 time in total.
ChipMcK
Volunteer
Posts: 1095
Joined: 20. May 2009, 02:17
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Windows, OSX
Location: U S of A

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by ChipMcK »

Both of these
  1. Do not allow the guest to sleep
  2. Do not allow the host to sleep

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

heh. :)
(BTW, guests CAN'T sleep, so that's not a problem).

To be more specific: what I'm after is improving the performance for large-ish data files, generally between a few hundred MB and 1GB. If tuning for that means smaller files are slower, that's not really a concern to me.

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by ChipMcK »

BTW, guests CAN sleep.

If you believe otherwise, then you will suffer

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

We obviously have a different understanding of what "sleep" means. The bug with bridged networking, which is what I'm referencing, is the result of the "real" computer, i.e. the host, going into S3. This is not something a guest can trigger, for obvious reasons. Guests may hibernate or somesuch, but that's a different fish and irrelevant to this problem.
BillG
Volunteer
Posts: 5102
Joined: 19. Sep 2009, 04:44
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Windows 10,7 and earlier
Location: Sydney, Australia

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by BillG »

It may be irrelevant to your problem (why did you mention it in that case?), but your statement that a guest cannot sleep was simply wrong and needed to be corrected. The OS in the guest is exactly the same as it would be in a physical machine and has the same power options.
Bill

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

*I* *didn't* mention it. It's nice to know (even though it makes no sense in the abstract, and isn't the case for my own VMs, where pm-suspend does nothing), so thank you both, but it still has nothing to do with the bridged networking bug, as the ticket says and as I've detailed here twice already.

In an attempt to wrestle this thread back on track...
"vboxmanage modifyvm <vm> --natsettings1 1500,256,256,256,256" seems to improve the PEAK throughput to nearly 60Mb/s, but makes no meaningful difference to the average rate: that does creep up to about 44Mb/s, but frequently has troughs below 20Mb/s, so it looks like it's basically just buffering more.

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

Similar results with various other natsettings values over the past couple of days, including buffer-only and window-only changes. Essentially, you can change the height of the sine wave, but it's pretty consistently centered in the same place, give or take 10%.

It's worth experimenting with if that 10% is meaningful to you. I've abandoned it in favor of just dumping the data to a shared folder and copying the files over from the host instead.
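For completeness, the shared-folder workaround looks roughly like this. "MyVM", "outbox", and the paths are placeholders; the guest side needs the Guest Additions installed.

```shell
# Export a host directory to the guest; copy to the NAS from the host.
# "MyVM", "outbox", and /srv/outbox are placeholders.
VBoxManage sharedfolder add "MyVM" --name outbox --hostpath /srv/outbox --automount

# Inside the Linux guest (Guest Additions required):
#   sudo mount -t vboxsf outbox /mnt/outbox
```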

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

I didn't go into the details before, but the NAS is being mounted as a Windows share (ie SMB). After poking around some more, it looks like part of the problem is interaction between the NAT layer and SMB: that is, the settings/behavior of the two of them happen to be working out to produce especially-poor results.

To drift into somewhat-guest-specific aspects for a moment: a Linux host (Ubuntu 16.04) with the same configuration as the problematic guest still easily sustains line rate on transfers to the NAS. Looking just at the guest, though, the default settings for CIFS are "rsize=61440,wsize=65536", and from what I can see these are a non-trivial factor in the poor performance with NAT. Whenever you have a pipeline, it's important to keep the pipe as full as possible, but it looks like the different sets of buffers (guest, vNIC, NAT) are all draining at different times, resulting in the sine-wave transfer rates mentioned earlier.
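If you want to check what your own guest negotiated, the kernel reports the live rsize/wsize among the mount options. This assumes an existing CIFS mount; no specific mount point is assumed.

```shell
# In the guest: show the options (including rsize/wsize) that an existing
# CIFS mount actually negotiated with the server.
grep cifs /proc/mounts
```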

Forcing the guest to use SMB3 instead changes the CIFS buffers to "rsize=1048576,wsize=1048576", and things get pretty interesting then: once it's "warmed up", the transfer spends a lot of time at 60Mb/s - on par with using Bridged - and even creeps into the 70Mb/s range at times, surpassing it. Towards the end of the transfer (though well before the last few MB) the rate falls off a cliff again to less than 40Mb/s. It's hard to guess at "why" - it could be determined by the buffer sizes of either the host or the host NIC, but I don't really have any solid ideas. Regardless, the meaningful amount of time spent at the high transfer rate results in an average throughput of ~58Mb/s. So the VB networking layer is obviously much more performant if fed data in large chunks - even with the natsettings at their default small 64K buffers.
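Forcing SMB3 on the guest side is just a mount option. Server, share, mount point, and username below are placeholders.

```shell
# Force the CIFS client to negotiate SMB3 (which also raises the default
# rsize/wsize, as described above). All names here are placeholders.
sudo mount -t cifs //nas/share /mnt/nas -o vers=3.0,username=me
```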

I expect the performance of Bridged mode would also increase significantly in this scenario, since it makes no sense for it to ever be slower than NAT, but that'll have to wait until the next time it's convenient to restart the guest.

Re: Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

Well I'll be damned...

When I tried that on the "sister" VM, I saw the transfer rate peak at 96mb/s! Very desirable, but rather puzzling...
Poking around the .vbox files for the two, I discovered that the second VM still had an explicit
<NAT mtu="1500" sockrcv="256" socksnd="256" tcprcv="1024" tcpsnd="1024"/>
in its config file. I had reset it with "modifyvm --natsettings1 0,0,0,0,0" several days ago, but that clearly doesn't actually work.
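Given that resetting via modifyvm doesn't reliably clear the element, a quick sanity check is to grep the .vbox file directly. Here the NAT line from above stands in for a real config file; in practice you'd point grep at something like "$HOME/VirtualBox VMs/MyVM/MyVM.vbox" (a placeholder path).

```shell
# Check whether a .vbox config still carries an explicit <NAT .../> element
# with non-default buffers. The sample line below is the one quoted above;
# substitute a real path in practice.
nat_line='<NAT mtu="1500" sockrcv="256" socksnd="256" tcprcv="1024" tcpsnd="1024"/>'
if printf '%s\n' "$nat_line" | grep -q 'tcprcv='; then
  echo "explicit NAT settings present"
fi
```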

So, natsettings was obviously the right way to go after all, because the combination of that and the Samba change got the *average* transfer speed for a 1GB file to 10.8MB/s. That's 86.4Mb/s - certainly close enough to line rate for me to not care about the missing few percent, and more than double what I was getting when I started this effort. :)

(Now I just have to figure out why the original VM is still topping out at 7MB/s rather than 10+, but that's probably a difference in the guest's configuration rather than a VB issue).

Re: [Solved] Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

hrm - it IS a VB issue after all.

I couldn't find any relevant differences in the client configurations, so I went back to poking at the .vbox files. The "bad" VM was
<VirtualBox xmlns="http://www.virtualbox.org/" version="1.15-windows">
and the good one was
<VirtualBox xmlns="http://www.virtualbox.org/" version="1.16-windows">
and one difference that stood out between the two was
<X2APIC enabled="true"/>

Both machines have multiple cores assigned to them and thus have "Enable I/O APIC" checked, but only the more-recently-created one had that line. Rather than hack the config files by hand, I removed the "bad" VM and recreated it, reconfigured the natsettings, and started it up. The behavior then matched the "good" VM: it ran at 95Mb/s until it hit the "draining" stage, when it dropped into the 60s, then 50s, then 40s (also like the good VM), but without sinking all the way into the 20s as it had before the regeneration, finishing up with an average of 9.6MB/s. A second test hit 11.0MB/s, and you can't ask for better than that. :)

So that's pretty interesting, and has some clear implications for "old" VMs: it looks like the migration from one version of VB to the next may opt to not fully upgrade the config files. That's not an unreasonable choice for it to make, but as a result two "identical" VMs may in fact have significantly-different behavior depending solely on when they were created.

Re: [Solved] Bridged networking is broken, and NAT is slow, so ... any ideas?

Post by arQon »

After updating to 5.2.14 a few weeks ago, network throughput in those VMs suddenly fell off a cliff. After some experimentation, it turns out that the virtio driver has suffered a massive performance regression (> 50%) somewhere between 5.2.8 and 5.2.14, but only in a specific situation: when VCPU count == PCPU count. (Cue the chorus of "We keep saying not to do that!" :P).

Leave one physical CPU "spare" per the community's standard advice, and you avoid the problem. But since the problem is clearly in the virtio driver rather than some abstract layer, if you happen to NEED as many cores as possible in the VM, changing the adapter to the Intel PRO/1000 MT Desktop avoids the bug and returns network performance to what the virtio driver had in 5.2.8, i.e. sustained line rate rather than less than half that. (PCnet-FAST III probably also works fine, but Intel is the UG-recommended card for when virtio doesn't work properly, so I went with that).
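The adapter swap is a one-liner; "MyVM" is a placeholder, and the VM must be powered off.

```shell
# Switch the first adapter from virtio-net to the Intel PRO/1000 MT
# Desktop (VBoxManage calls it 82540EM). "MyVM" is a placeholder.
VBoxManage modifyvm "MyVM" --nictype1 82540EM
```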

Interesting side note, the "new" adapter actually shows line speeds of up to 150% of what they really are, which must be a side-effect of the guest data moving to buffers on the host. The virtio driver doesn't exhibit this, and caps at 99+%.

Unrelated to the regression, but on topic: tcprcv and tcpsnd values of 256 perform within 3% of the 1024 setting with this configuration. I haven't tested smaller values than that, but I probably will next time I have a few hours to spare.