
Low (bridged) network throughput on Solaris 11.4 hosts with 10 Gbit/s NICs

Posted: 24. Aug 2021, 16:27
by Steffen M.
Hi all,

we run a few Solaris 11.4 servers (fully patched to SRU 35), each with two Intel Xeon E5-2630 v3 CPUs @ 2.40 GHz (i.e. 16 cores/32 threads across two sockets), 256 GB RAM, and plenty of SSD storage. The servers are equipped with Intel 82599ES 10-Gigabit Ethernet NICs. Besides some Solaris zones, these servers run VirtualBox (6.1.26) VMs with Ubuntu 18.04, Ubuntu 20.04, or Windows Server 2016. All of our VMs are connected via bridged networking to the 10 Gbit/s NIC of their respective host.
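
For reference, the bridged setup is just the standard VBoxManage configuration, roughly like this (the VM name "ubuntu20" and the host link "net0" are only placeholders for our actual names):

  VBoxManage modifyvm "ubuntu20" --nic1 bridged --bridgeadapter1 net0 --nictype1 virtio
  VBoxManage showvminfo "ubuntu20" | grep NIC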

While the Solaris hosts themselves reach the full 10 Gbit/s both inbound and outbound, the VirtualBox VMs show very poor network performance. Both our Windows and Linux VMs are affected, but for the Linux VMs it is a much bigger problem, as they are supposed to run high-bandwidth and, in some cases, also low-latency applications (e.g. BigBlueButton). The basic problem is observable on all hosts: the VMs reach only about 1.2 Gbit/s inbound and about 2 Gbit/s outbound. Interestingly, it is hardly any better when the VMs exchange data only with the host they run on. Enabling jumbo frames (by setting an MTU of 9000 instead of 1500 on the 10 Gbit/s interface, which results in a jumbo-frame-enabled "vboxvnicX" interface) roughly doubles the throughput, but I am not sure about the consequences yet.
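
In case someone wants to reproduce this: here is roughly how I enabled jumbo frames (link and interface names are placeholders; the guest interface name differs per distro, and as far as I understand the VM has to be restarted so that VirtualBox recreates the vboxvnic with the new MTU):

  # on the Solaris host (10 Gbit/s link, here assumed to be net0)
  dladm show-linkprop -p mtu net0
  dladm set-linkprop -p mtu=9000 net0

  # inside the Linux guest (interface name is just an example)
  ip link set dev enp0s3 mtu 9000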

When generating network traffic in one of the VMs, we see a lot of "ksoftirqd" activity in the Linux VMs, but no single process in the VM reaches 100% of a core. The VMs themselves do not feel sluggish (even under heavy network load), but the virtual network link between guest and host really does seem to be congested: a "ping" running in parallel at various rates clearly shows the latencies rising. During large downloads from test servers I can see TCP-induced drops in data rate, sometimes down to about 500 Mbit/s. Most probably high delay jitter (or even packet loss?) triggers this; we are currently analyzing PCAPs. Interestingly, this also occurs when host and VM exchange data directly, so no physical NIC, driver, or switch is involved at all. We make similar observations when using "iperf3", for both TCP and UDP measurements.
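
For completeness, these are the kinds of measurements we run (addresses are placeholders):

  # on the receiving end
  iperf3 -s

  # TCP, guest -> host and reverse
  iperf3 -c 192.0.2.10 -t 30
  iperf3 -c 192.0.2.10 -t 30 -R

  # UDP at a fixed rate, to look at loss and jitter
  iperf3 -c 192.0.2.10 -u -b 2G -t 30

  # latency under load, running in parallel
  ping -i 0.2 192.0.2.10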

We use "virtio" as the virtual bridged NIC between Solaris and Linux, but the same problem occurs when using the virtual Intel PRO/1000 MT Desktop (or Server) NICs.

Disabling some of the Meltdown and Spectre mitigations in Solaris using "sxadm" improved the VMs' general performance a lot, but we did not see any positive effect on the network throughput. It also does not seem to matter how many cores I allocate: I went down from 12 to 2 cores per VM and did not see much of an improvement.
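
For reference, this is roughly what we did (the available extension names depend on the SRU, so please check "sxadm status" on your own system before disabling anything):

  # on the Solaris host: list mitigations, then disable selected ones (reboot required)
  sxadm status
  sxadm disable kpti

  # reduce the VM's core count (VM name is a placeholder)
  VBoxManage modifyvm "ubuntu20" --cpus 2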

Does anyone have an idea why the network throughput of our VMs on a Solaris 11.4 host is so low and so far from the physical interface's throughput? Does anyone run VMs on Solaris (or Solaris derivatives) with 10 Gbit/s (or faster) NICs in their servers? Is there anything I can or should do regarding the various offloading parameters (segmentation, checksumming, and so on) on host or guest? Especially the jumbo-frame behavior points in the direction that segmentation causes too much load…
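
In case someone wants to suggest concrete offload settings: inside the Linux guests we can inspect and toggle them with ethtool (interface name is just an example); I have not yet found the equivalent knobs on the Solaris host side:

  # show current offload settings in the guest
  ethtool -k enp0s3

  # example: disable segmentation/receive offloads for a test
  ethtool -K enp0s3 tso off gso off gro off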

I am really looking forward to reading any helpful hints! Thank you very much in advance!

Kind regards,
Steffen

Re: Low (bridged) network throughput on Solaris 11.4 hosts with 10 Gbit/s NICs

Posted: 28. Aug 2021, 19:50
by stes
As I wrote in a thread in the Solaris guest forum,

https://forums.virtualbox.org/viewtopi ... 20&t=95496

it could be useful to be able to present a 10G NIC to a Linux VM (as a choice other than virtio-net or the Intel PRO/1000).

In the Solaris guest forum I raised the question of a 10G NIC for guests, but I seem to recall a VirtualBox developer saying that it would be a lot of work, and I'm not sure there is an RFE for it.

David Stes