Network Performance Problems

Discussions related to using VirtualBox on Solaris hosts.
nxnlvz
Posts: 28
Joined: 16. Dec 2008, 07:45
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Windows (XP, 7, 8) / Linux (Debian, Ubuntu) / MacOS (Lion)

Network Performance Problems

Post by nxnlvz »

This seems to be a long-standing issue with VBox on Solaris, so maybe someone can help me work through it.

I am going to start out in general terms. Having any guest on the host in bridged mode seems to clobber network performance system-wide for the host. In the latest beta (4.2.0) it even seems to affect iSCSI targets attached to the guests, to the point where it causes errors inside the guest environment. I am not sure if the performance effects span multiple interfaces at this point, but there is evidence that they might: the simplest data point is that the attached iSCSI target was using the loopback interface on the host.

Now, interestingly enough, I have looked at some of the other reports of people seeing host-side performance issues with NFS and so on when a VBox guest is on the system. I have even tried a few of the suggestions concerning VNICs. In the end they don't make sense to me, since VBox actually allocates a VNIC at the kernel level; it shows up in dladm just as if I had created it myself. I am not ready to dedicate a physical NIC to VBox just yet either. I don't see that making the problem go away, just avoiding it for the rest of the services that are not on the same NIC. It is still a problem, as the guests will still suffer network performance issues, as is evident in my own testing.

Now, out of sheer dumb luck while trying to fix an iSCSI volume because of timeout issues, I might have stumbled across something. The particular guest I was working with is Debian 5 Linux, and it was the only guest active at the time. I started it in single-user mode to do the disk repairs. This of course does not initialize any of the networking; I'm not sure it even tries to enumerate the device. But what I noticed is that the iSCSI timeouts were gone. I ran a couple of quick network tests as well and things seemed to be somewhat normal. The zpool stats were also somewhat normal at 110 MB/s.

I am not really sure where to go from here, and I am on the verge of looking at the source to see where the issue rests. Maybe someone could give me a few ideas on how better to profile VBox with DTrace and understand what it is doing at a lower level before I go mucking about.

One thing I can say is that this problem only presents itself on Solaris. My Linux host operates with much better network throughput. That makes me want to switch this last server over, but I would miss all the ZFS tools.

The floor is open.
martyscholes
Posts: 202
Joined: 11. Sep 2011, 00:24
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances

Re: Network Performance Problems

Post by martyscholes »

Our Solaris 11 server runs multiple instances of VirtualBox all the time with no networking issues, but NONE of them bridge against the main adapter. Current dladm show-link follows.

Code: Select all

bash-4.1$ dladm show-link
LINK                CLASS     MTU    STATE    OVER
net0                phys      1500   up       --
net1                phys      1500   up       --
aggr0               aggr      1500   up       net0 net1
global0             vnic      1500   up       aggr0
bcus0               vnic      1500   up       aggr0
quicken0            vnic      1500   up       aggr0
jackson0            vnic      1500   up       aggr0
quinn0              vnic      1500   up       aggr0
vboxnet0            phys      1500   up       --
drake_pear0         vnic      1500   up       aggr0
w7media0            vnic      1500   up       aggr0
xpmedia0            vnic      1500   up       aggr0
quinn1              vnic      1500   up       aggr0
quinn2              vnic      1500   up       aggr0
quinn3              vnic      1500   up       aggr0
quinn4              vnic      1500   up       aggr0
quinn5              vnic      1500   up       aggr0
quinn6              vnic      1500   up       aggr0
quinn7              vnic      1500   up       aggr0
drupal0             vnic      1500   up       aggr0
joomla0             vnic      1500   up       aggr0
ubuntu0             vnic      1500   up       aggr0
squid/net0          vnic      1500   up       aggr0
otrs/net0           vnic      1500   up       aggr0
quinn/net0          vnic      1500   up       aggr0
I am not sure about a couple of things you mention.
I have even tried a few of the suggestions concerning VNICs
Emphatic YES! The dladm command can stamp out VNICs almost for free, so use and abuse them.
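For example, stamping one out (and tearing it down again) is a one-liner each; aggr0 is the link from my setup above, so substitute whatever link you have:

Code: Select all

# Create a VNIC named myguest0 over my aggregate; use your own link name.
dladm create-vnic -l aggr0 myguest0
# Remove it when the guest is retired.
dladm delete-vnic myguest0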
In the end they don't make sense to me since VBox actually allocates a VNIC at the kernel level.
I am not sure I agree. The strange vboxnet0 device is not a dladm VNIC, but a fake NIC from a kernel driver which (I think) VB uses for host-only networking and all of the other strange networking modes. My understanding is that these weird modes exist to accommodate other operating systems which do not have the flexibility and power of the Solaris Crossbow network stack. I cannot imagine a scenario under Solaris where that fake NIC is needed. As far as I can tell, all of that functionality (and more) can be had via dladm with better performance.
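For instance, I believe something like host-only networking can be approximated in pure Crossbow with an etherstub and VNICs over it; the names below are made up for illustration:

Code: Select all

# Rough Crossbow stand-in for host-only networking (names are examples).
dladm create-etherstub stub0
dladm create-vnic -l stub0 host0     # host side of the private network
dladm create-vnic -l stub0 guest0    # bridge the VM to this one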

In fact, if you do not run aggregates (which I do run), then I understand you can create a "template" VNIC which allows VirtualBox to create a dladm VNIC on the fly when a guest is started. This allows you to use VNICs everywhere without having to track which VNIC is used with what VirtualBox image.
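As I understand it (untested here, since I run an aggregate), the template is just a VNIC with a special name prefix, something like:

Code: Select all

# My understanding only, untested on my aggregate setup: a template VNIC
# over the physical link, whose properties VirtualBox is supposed to clone
# for each guest it starts.
dladm create-vnic -l net0 vboxvnic_template0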

Do some benchmarking with a dladm VNIC and let us know if you see the same issues you currently experience.

Let us know what you learn.

Thanks,
Marty
michaln
Oracle Corporation
Posts: 2973
Joined: 19. Dec 2007, 15:45
Primary OS: MS Windows 7
VBox Version: PUEL
Guest OSses: Any and all
Contact:

Re: Network Performance Problems

Post by michaln »

martyscholes wrote:My understanding is that these weird modes exist to accommodate other operating systems which do not have the flexibility and power of the Solaris Crossbow network stack.
Which includes Solaris 10.
martyscholes
Posts: 202
Joined: 11. Sep 2011, 00:24
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances

Re: Network Performance Problems

Post by martyscholes »

michaln wrote:
martyscholes wrote:My understanding is that these weird modes exist to accommodate other operating systems which do not have the flexibility and power of the Solaris Crossbow network stack.
Which includes Solaris 10.
Well said. How soon we forget.
Ramshankar
Oracle Corporation
Posts: 793
Joined: 7. Jan 2008, 16:17

Re: Network Performance Problems

Post by Ramshankar »

martyscholes wrote:
In the end they don't make sense to me since VBox actually allocates a VNIC at the kernel level.
I am not sure I agree. The strange vboxnet0 device is not a dladm VNIC, but a fake NIC from a kernel driver which (I think) VB uses for host-only networking and all of the other strange networking modes
VirtualBox -does- indeed create a VNIC in kernel on-the-fly if you use Solaris 11. Run "modinfo | grep vbox" and if it includes "vboxbow" then it is using VirtualBox's Crossbow-based bridged networking driver. If it shows "vboxflt" then it is using the STREAMS driver.
Oracle Corp.
martyscholes
Posts: 202
Joined: 11. Sep 2011, 00:24
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances

Re: Network Performance Problems

Post by martyscholes »

Ramshankar wrote:VirtualBox -does- indeed create a VNIC in kernel on-the-fly if you use Solaris 11. Run "modinfo | grep vbox" and if it includes "vboxbow" then it is using VirtualBox's Crossbow-based bridged networking driver. If it shows "vboxflt" then it is using the STREAMS driver.
You know this stuff far better than I do. You have me curious and I found this http://comments.gmane.org/gmane.comp.em ... devel/4737 where you said, "Simply bridging to a physical link is sufficient (the vnic creation, MAC address updates etc. are all done internally)."

I must now confess ignorance because I thought the whole VNIC-on-the-fly thing was handled by the templates, but I now see that it is done whenever something bridges to the main adapter. I still cannot do it because I have an aggregate link.

So that I understand this all better, is this stuff documented somewhere? All I see in the VB documentation is what is listed in chapter 9.
Starting with VirtualBox 4.1, VirtualBox ships a new network filter driver that utilizes Solaris 11's Crossbow functionality. By default, this new driver is installed for Solaris 11 hosts (builds 159 and above) that has support for it.

To force installation of the older STREAMS based network filter driver, execute as root the below command before installing the VirtualBox package:

touch /etc/vboxinst_vboxflt

To force installation of the Crossbow based network filter driver, execute as root the below command before installing the VirtualBox package:

touch /etc/vboxinst_vboxbow

To check which driver is currently being used by VirtualBox, execute:

modinfo | grep vbox

If the output contains "vboxbow", it indicates VirtualBox is using the Crossbow network filter driver, while the name "vboxflt" indicates usage of the older STREAMS network filter.
The documentation talks about Crossbow, but not the advantages of Crossbow. Where does it say that using Crossbow allows automagic VNIC creation?

Thanks,
Marty
Ramshankar
Oracle Corporation
Posts: 793
Joined: 7. Jan 2008, 16:17

Re: Network Performance Problems

Post by Ramshankar »

martyscholes wrote:
Ramshankar wrote:VirtualBox -does- indeed create a VNIC in kernel on-the-fly if you use Solaris 11. Run "modinfo | grep vbox" and if it includes "vboxbow" then it is using VirtualBox's Crossbow-based bridged networking driver. If it shows "vboxflt" then it is using the STREAMS driver.
You know this stuff far better than I do. You have me curious and I found this http://comments.gmane.org/gmane.comp.em ... devel/4737 where you said, "Simply bridging to a physical link is sufficient (the vnic creation, MAC address updates etc. are all done internally)."
Yes that is correct. If you use vboxbow and if you assign a physical link to a VM (say "rge0" or "nge0") the vboxbow kernel driver will create a VNIC (from kernel land obv.) and use it to bridge to the physical link.
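From the VBox side this just means assigning the physical link to the adapter, e.g. ("MyVM" is a placeholder name):

Code: Select all

# Bridge adapter 1 of the VM directly to the physical link; vboxbow creates
# and manages the VNIC internally. "MyVM" is a placeholder.
VBoxManage modifyvm "MyVM" --nic1 bridged --bridgeadapter1 rge0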
I must now confess ignorance because I thought the whole VNIC-on-the-fly thing was handled by the templates, but I now see that it is done whenever something bridges to the main adapter. I still cannot do it because I have an aggregate link.
See User Manual section 14.2, Known Issues, under Solaris hosts:
"Crossbow-based bridged networking on Solaris 11 hosts does not work directly with aggregate links. However, you can manually create a VNIC (using dladm) over the aggregate link and use that with a VM. This technical limitation will be addressed in a future Solaris 11 release." This is actually a bug in Solaris 11.
So that I understand this all better, is this stuff documented somewhere? All I see in the VB documentation is what is listed in chapter 9.
Starting with VirtualBox 4.1, VirtualBox ships a new network filter driver that utilizes Solaris 11's Crossbow functionality. By default, this new driver is installed for Solaris 11 hosts (builds 159 and above) that has support for it.

To force installation of the older STREAMS based network filter driver, execute as root the below command before installing the VirtualBox package:

touch /etc/vboxinst_vboxflt

To force installation of the Crossbow based network filter driver, execute as root the below command before installing the VirtualBox package:

touch /etc/vboxinst_vboxbow

To check which driver is currently being used by VirtualBox, execute:

modinfo | grep vbox

If the output contains "vboxbow", it indicates VirtualBox is using the Crossbow network filter driver, while the name "vboxflt" indicates usage of the older STREAMS network filter.
The documentation talks about Crossbow, but not the advantages of Crossbow. Where does it say that using Crossbow allows automagic VNIC creation?
The real advantages of Crossbow are off-topic for the VirtualBox manual; they belong in the Solaris documentation. As for automagic VNIC creation, it is supposed to be transparent and seamless to the user. Users can throw physical links or VNICs (or aggregates, once the bug is fixed in S11) at VBox and it will "just" work, which is why the behind-the-scenes internals of VNIC creation in kernel-land etc. are not documented.
Oracle Corp.
martyscholes
Posts: 202
Joined: 11. Sep 2011, 00:24
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances

Re: Network Performance Problems

Post by martyscholes »

So the question then is whether or not nxnlvz is using Crossbow in his virtual machines.
nxnlvz
Posts: 28
Joined: 16. Dec 2008, 07:45
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Windows (XP, 7, 8) / Linux (Debian, Ubuntu) / MacOS (Lion)

Re: Network Performance Problems

Post by nxnlvz »

To answer the question... yes I am using vboxbow.

Code: Select all

nxn@nebula:/etc$ dladm show-vnic
LINK                OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
vboxvnic5           rge0         1000   8:0:27:b7:e6:99   fixed             0


nxn@nebula:/etc$ modinfo | grep vbox
231 fffffffff8bdc000   3828 183   1  vboxbow (VirtualBox NetBow 4.2.0_BETA1r7)
232 fffffffff8be0000  34ce0 293   1  vboxdrv (VirtualBox HostDrv 4.2.0_BETA1r)
235 fffffffff8235578    cf0 294   1  vboxnet (VirtualBox NetAdp 4.2.0_BETA1r7)
237 fffffffff8c27000   4568 296   1  vboxusbmon (VirtualBox USBMon 4.2.0_BETA1r7
238 fffffffff8c2c000   7480 297   1  vboxusb (VirtualBox USB 4.2.0_BETA1r7975)
I also tried using templates to see if there was something else I could tweak. The one thing you will notice here is the Realtek rge interface. It is not the best, and my major annoyance with it is no jumbo frames. It performs well on its own, however, which ultimately means it should be okay.

There is another quick thing to report. Using a VNIC with a name prefixed "vboxvnic" (not just "vboxvnic_template") will cause a problem. First off, it will not show up in the Manager as a selectable interface. E.g. I created "vboxvnic1":
dladm create-vnic -l rge0 vboxvnic1
I then went to look for it to assign it, and it was not present in the drop-down list. Undeterred, I just used the CLI to do it. I don't remember the errors that came up because of this, and I am not going to test again, but I had to remove it. I would not call this a bug, just something to be advised about, since a sysadmin might instinctively use the same naming scheme.


In the end I am left with no way to really figure out where the problem is centered. That is what is really getting to me.
Ramshankar
Oracle Corporation
Posts: 793
Joined: 7. Jan 2008, 16:17

Re: Network Performance Problems

Post by Ramshankar »

nxnlvz wrote:To answer the question... yes I am using vboxbow.

Code: Select all

nxn@nebula:/etc$ dladm show-vnic
LINK                OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
vboxvnic5           rge0         1000   8:0:27:b7:e6:99   fixed             0


nxn@nebula:/etc$ modinfo | grep vbox
231 fffffffff8bdc000   3828 183   1  vboxbow (VirtualBox NetBow 4.2.0_BETA1r7)
232 fffffffff8be0000  34ce0 293   1  vboxdrv (VirtualBox HostDrv 4.2.0_BETA1r)
235 fffffffff8235578    cf0 294   1  vboxnet (VirtualBox NetAdp 4.2.0_BETA1r7)
237 fffffffff8c27000   4568 296   1  vboxusbmon (VirtualBox USBMon 4.2.0_BETA1r7
238 fffffffff8c2c000   7480 297   1  vboxusb (VirtualBox USB 4.2.0_BETA1r7975)
I also tried using templates to see if there was something else I could tweak. The one thing you will notice here is the Realtek rge interface. It is not the best, and my major annoyance with it is no jumbo frames. It performs well on its own, however, which ultimately means it should be okay.

There is another quick thing to report. Using a VNIC with a name prefixed "vboxvnic" (not just "vboxvnic_template") will cause a problem. First off, it will not show up in the Manager as a selectable interface. E.g. I created "vboxvnic1":
dladm create-vnic -l rge0 vboxvnic1
I then went to look for it to assign it, and it was not present in the drop-down list. Undeterred, I just used the CLI to do it. I don't remember the errors that came up because of this, and I am not going to test again, but I had to remove it. I would not call this a bug, just something to be advised about, since a sysadmin might instinctively use the same naming scheme.


In the end I am left with no way to really figure out where the problem is centered. That is what is really getting to me.
You are not supposed to create "vboxvnic"-prefixed VNICs on your own. They are created automatically by VBox when you give it "rge0"; since they are dynamically created and destroyed, they will not show up in the GUI as available NICs.

Regarding your performance issues, here is what I understood:
You have a Solaris 11 host with one physical NIC and a VM using bridged networking with iSCSI, and you get iSCSI timeout errors because bridged networking is "taking over" the NIC? Is this correct? Have you tried limiting the bandwidth resources (when you used the VNIC template)?
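For example, something along these lines on the template VNIC (the 500M cap is purely illustrative):

Code: Select all

# Illustrative only: cap the template VNIC at 500 Mbps.
dladm set-linkprop -p maxbw=500M vboxvnic_template0
# Or set the cap when creating the template:
dladm create-vnic -l rge0 -p maxbw=500M vboxvnic_template1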
Oracle Corp.
nxnlvz
Posts: 28
Joined: 16. Dec 2008, 07:45
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Windows (XP, 7, 8) / Linux (Debian, Ubuntu) / MacOS (Lion)

Re: Network Performance Problems

Post by nxnlvz »

Not yet on limiting the bandwidth but that is a good idea.

I am going to test some other things as well. On a lark, I changed out the network switch from an unmanaged Cisco Gb switch to a fully managed D-Link Gb switch so that I could look at the traffic on the wire. Something changed when I did that and produced a different result; I am not sure how yet. In theory this should not have mattered, or if it did, it should have slowed throughput a bit.

One interesting thing that happened: on a test transfer over NFS, without a guest active, I was able to get full wire speed. That surprised me, since I haven't been able to do that with this Realtek NIC before. There was still a difference when a guest was active, but I don't have a quantitative result. At this point it stirs thoughts of a conflict with the implementation of STP or 802.1p or 802.1Q or something else. I haven't run across something like that for some time now.

Two things I will immediately be looking for (rough command sketch below):
1. Any packet errors on the wire. None were present in the interface stats before the change.
2. Any packets at all for the iSCSI traffic, since it is pointed at the loopback address.
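A rough sketch of what I plan to run (rge0 is my physical link; 3260 is the default iSCSI port):

Code: Select all

# Sketch only: per-link error counters and interface stats.
dladm show-link -s rge0
netstat -i
# Watch for iSCSI traffic (default port 3260) showing up on the physical
# wire; this should stay silent if the target really stays on loopback.
snoop -d rge0 port 3260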

At least some progress has been made.
nxnlvz
Posts: 28
Joined: 16. Dec 2008, 07:45
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Windows (XP, 7, 8) / Linux (Debian, Ubuntu) / MacOS (Lion)

Re: Network Performance Problems

Post by nxnlvz »

So a little update.

Whatever is causing my performance issues _is_ some strange correlation between a non-managed Cisco gigabit switch, the Realtek NIC, Solaris 11 and VirtualBox. Take out any one of those components, especially that network switch, and all is mostly normal. Of course, testing the strange behavior with that switch inline is time-consuming and difficult, and in the end it is not worth it just to hold on to a relatively cheap piece of hardware. I suspect it has something to do with the ARP table in the switch or the host, but I have been unable to verify that.

One thing I am very happy about is that I have not seen the strange iSCSI errors since the change. I am also pleased to see that the host is sustaining wire-speed IO with multiple NFS and iSCSI clients. Solaris + ZFS (striped+mirrored) + napp-it really makes for a snappy little NAS.