Network Performance Problems
-
- Posts: 28
- Joined: 16. Dec 2008, 07:45
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Windows (XP, 7, 8) / Linux (Debian, Ubuntu) / MacOS (Lion)
Network Performance Problems
This seems to be a long-standing issue with VBox on Solaris, so maybe someone can help me work through it.
I am going to start out in general terms. Having any guest on the host in bridged mode seems to clobber network performance system-wide for the host. In the latest beta (4.2.0) it even seems to affect iSCSI targets attached to the guests, to the point where it causes errors inside the guest environment. I am not sure whether the performance effects span multiple interfaces at this point, but there is evidence that they might: notably, the affected iSCSI target was attached over the loopback interface on the host.
Interestingly enough, I have looked at some of the other reports of performance issues with NFS and so on for the host when a VBox guest is on the system. I have even tried a few of the suggestions concerning VNICs. In the end they don't make sense to me, since VBox actually allocates a VNIC at the kernel level; it shows up in dladm just as if I had created it myself. I am not ready to dedicate a physical NIC to VBox just yet either. I don't see the problem going away that way, just being avoided by the services that are not on the same NIC. That still leaves a problem, as the guests will still suffer network performance issues, as is evident in my own testing.
Now, out of sheer dumb luck, while trying to fix an iSCSI volume because of timeout issues, I might have stumbled across something. The particular guest I was working with is Debian 5 Linux, and it was the only guest active at the time. I started it in single-user mode to do the disk repairs. That of course does not initialize any of the networking; I am not sure it even tries to enumerate the device. But what I noticed is that the iSCSI timeouts were gone. I ran a couple of quick network tests as well and things seemed to be somewhat normal. The zpool stats were also roughly normal at 110 MB/s.
I am not really sure where to go from here, and I am on the verge of looking at the source to see where the issue rests. Maybe someone could give me a few ideas on how better to profile VBox with DTrace and understand what it is doing at a lower level before I go mucking about.
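For anyone wanting to suggest a starting point: a minimal DTrace sketch along these lines is what I had in mind. The module names (vboxbow, vboxdrv) come from modinfo output later in this thread; the one-liners just count which kernel functions the drivers hit hottest, nothing fancier.

```shell
# Sketch only: count entries into the VirtualBox Crossbow bridging
# driver's kernel functions for 10 seconds (run as root on the host).
dtrace -n 'fbt:vboxbow::entry { @calls[probefunc] = count(); } tick-10s { exit(0); }'

# Same idea for the main host driver:
dtrace -n 'fbt:vboxdrv::entry { @calls[probefunc] = count(); } tick-10s { exit(0); }'
```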
One thing I can say is that this problem only presents itself on Solaris; my Linux host operates with much better network throughput. That makes me want to switch this last server over, but I would miss all the ZFS tools.
The floor is open.
-
- Posts: 202
- Joined: 11. Sep 2011, 00:24
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances
Re: Network Performance Problems
Our Solaris 11 server runs multiple instances of VirtualBox all the time with no networking issues, but NONE of them bridge against the main adapter. Current dladm show-link follows.
I am not sure about a couple of things you mention.
Code: Select all
bash-4.1$ dladm show-link
LINK CLASS MTU STATE OVER
net0 phys 1500 up --
net1 phys 1500 up --
aggr0 aggr 1500 up net0 net1
global0 vnic 1500 up aggr0
bcus0 vnic 1500 up aggr0
quicken0 vnic 1500 up aggr0
jackson0 vnic 1500 up aggr0
quinn0 vnic 1500 up aggr0
vboxnet0 phys 1500 up --
drake_pear0 vnic 1500 up aggr0
w7media0 vnic 1500 up aggr0
xpmedia0 vnic 1500 up aggr0
quinn1 vnic 1500 up aggr0
quinn2 vnic 1500 up aggr0
quinn3 vnic 1500 up aggr0
quinn4 vnic 1500 up aggr0
quinn5 vnic 1500 up aggr0
quinn6 vnic 1500 up aggr0
quinn7 vnic 1500 up aggr0
drupal0 vnic 1500 up aggr0
joomla0 vnic 1500 up aggr0
ubuntu0 vnic 1500 up aggr0
squid/net0 vnic 1500 up aggr0
otrs/net0 vnic 1500 up aggr0
quinn/net0 vnic 1500 up aggr0
nxnlvz wrote: I have even tried a few of the suggestions concerning VNICs.
Emphatic YES! The dladm command can stamp out VNICs almost for free, so use and abuse them.
nxnlvz wrote: In the end they don't make sense to me since VBox actually allocates a VNIC at the kernel level.
I am not sure I agree. The strange vboxnet0 device is not a dladm VNIC, but a fake NIC from a kernel driver which (I think) VB uses for host-only networking and all of the other strange networking modes. My understanding is that these weird modes exist to accommodate other operating systems which do not have the flexibility and power of the Solaris Crossbow network stack. I cannot imagine a scenario under Solaris where that fake NIC is needed. As far as I can tell, all of that functionality (and more) can be had via dladm with better performance.
In fact, if you do not run aggregates (which I do run), then I understand you can create a "template" VNIC which allows VirtualBox to create a dladm VNIC on the fly when a guest is started. This allows you to use VNICs everywhere without having to track which VNIC is used with what VirtualBox image.
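As a concrete sketch of the template approach (the link name net0 is an assumption, substitute whatever dladm show-link reports; the vboxvnic_template naming follows what is discussed later in this thread):

```shell
# Create a template VNIC over the physical link; VirtualBox (when using
# the vboxbow driver) can then clone per-guest VNICs from it on the fly.
dladm create-vnic -l net0 vboxvnic_template0

# Verify it exists:
dladm show-vnic vboxvnic_template0
```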
Do some benchmarking with a dladm VNIC and let us know if you see the same issues you currently experience.
Let us know what you learn.
Thanks,
Marty
-
- Oracle Corporation
- Posts: 2973
- Joined: 19. Dec 2007, 15:45
- Primary OS: MS Windows 7
- VBox Version: PUEL
- Guest OSses: Any and all
- Contact:
Re: Network Performance Problems
martyscholes wrote: My understanding is that these weird modes exist to accommodate other operating systems which do not have the flexibility and power of the Solaris Crossbow network stack.
Which includes Solaris 10.
-
- Posts: 202
- Joined: 11. Sep 2011, 00:24
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances
Re: Network Performance Problems
michaln wrote: Which includes Solaris 10.
Well said. How soon we forget.
-
- Oracle Corporation
- Posts: 793
- Joined: 7. Jan 2008, 16:17
Re: Network Performance Problems
martyscholes wrote: The strange vboxnet0 device is not a dladm VNIC, but a fake NIC from a kernel driver which (I think) VB uses for host-only networking and all of the other strange networking modes.
VirtualBox -does- indeed create a VNIC in kernel on-the-fly if you use Solaris 11. Run "modinfo | grep vbox" and if it includes "vboxbow" then it is using VirtualBox's Crossbow-based bridged networking driver. If it shows "vboxflt" then it is using the STREAMS driver.
Oracle Corp.
-
- Posts: 202
- Joined: 11. Sep 2011, 00:24
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances
Re: Network Performance Problems
Ramshankar wrote: VirtualBox -does- indeed create a VNIC in kernel on-the-fly if you use Solaris 11. Run "modinfo | grep vbox" and if it includes "vboxbow" then it is using VirtualBox's Crossbow-based bridged networking driver. If it shows "vboxflt" then it is using the STREAMS driver.
You know this stuff far better than I do. You have me curious, and I found this http://comments.gmane.org/gmane.comp.em ... devel/4737 where you said, "Simply bridging to a physical link is sufficient (the vnic creation, MAC address updates etc. are all done internally)."
I must now confess ignorance because I thought the whole VNIC-on-the-fly thing was handled by the templates, but I now see that it is done whenever something bridges to the main adapter. I still cannot do it because I have an aggregate link.
So that I understand this all better, is this stuff documented somewhere? All I see in the VB documentation is what is listed in chapter 9.
From the documentation: "Starting with VirtualBox 4.1, VirtualBox ships a new network filter driver that utilizes Solaris 11's Crossbow functionality. By default, this new driver is installed for Solaris 11 hosts (builds 159 and above) that has support for it."
The documentation talks about Crossbow, but not the advantages of Crossbow. Where does it say that using Crossbow allows automagic VNIC creation?
To force installation of the older STREAMS based network filter driver, execute the below command as root before installing the VirtualBox package:
touch /etc/vboxinst_vboxflt
To force installation of the Crossbow based network filter driver, execute as root the below command before installing the VirtualBox package:
touch /etc/vboxinst_vboxbow
To check which driver is currently being used by VirtualBox, execute:
modinfo | grep vbox
If the output contains "vboxbow", it indicates VirtualBox is using the Crossbow network filter driver, while the name "vboxflt" indicates usage of the older STREAMS network filter.
Thanks,
Marty
-
- Oracle Corporation
- Posts: 793
- Joined: 7. Jan 2008, 16:17
Re: Network Performance Problems
martyscholes wrote: You have me curious and I found this http://comments.gmane.org/gmane.comp.em ... devel/4737 where you said, "Simply bridging to a physical link is sufficient (the vnic creation, MAC address updates etc. are all done internally)."
Yes, that is correct. If you use vboxbow and you assign a physical link to a VM (say "rge0" or "nge0"), the vboxbow kernel driver will create a VNIC (from kernel land, obviously) and use it to bridge to the physical link.
martyscholes wrote: I must now confess ignorance because I thought the whole VNIC-on-the-fly thing was handled by the templates, but I now see that it is done whenever something bridges to the main adapter. I still cannot do it because I have an aggregate link.
See User Manual section 14.2 Known Issues, under Solaris hosts:
"Crossbow-based bridged networking on Solaris 11 hosts does not work directly with aggregate links. However, you can manually create a VNIC (using dladm) over the aggregate link and use that with a VM. This technical limitation will be addressed in a future Solaris 11 release." This is actually a bug in Solaris 11.
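The workaround from that note, sketched with the aggregate name from Marty's dladm output earlier in the thread (aggr0; the VNIC name myguest0 is made up for illustration):

```shell
# Manually create a VNIC over the aggregate, since vboxbow cannot
# bridge to an aggregate link directly on Solaris 11 yet:
dladm create-vnic -l aggr0 myguest0

# Then point the VM's bridged adapter at myguest0 instead of aggr0.
```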
martyscholes wrote: So that I understand this all better, is this stuff documented somewhere? All I see in the VB documentation is what is listed in chapter 9. The documentation talks about Crossbow, but not the advantages of Crossbow. Where does it say that using Crossbow allows automagic VNIC creation?
The real advantages of Crossbow are off-topic in the VirtualBox manual; they belong in the Solaris user manual. As for automagic VNIC creation, it is supposed to be transparent and seamless to the user. Users can either throw physical links or VNICs (or aggregates, once the bug is fixed in S11) at VBox and it will "just work", which is why the behind-the-scenes internals of VNIC creation in kernel-land etc. are not documented.
Oracle Corp.
-
- Posts: 202
- Joined: 11. Sep 2011, 00:24
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances
Re: Network Performance Problems
So the question then is whether or not nxnlvz is using Crossbow in his virtual machines.
-
- Posts: 28
- Joined: 16. Dec 2008, 07:45
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Windows (XP, 7, 8) / Linux (Debian, Ubuntu) / MacOS (Lion)
Re: Network Performance Problems
To answer the question... yes I am using vboxbow.
I also tried using templates to see if there was something else I could tweak. The one thing you will notice here is the Realtek rge interface. It is not the best, and my major annoyance with it is no jumbo frames. It is performant on its own, however, which ultimately means that it should be okay.
There is another quick thing to report. Using a VNIC with a prefix of "vboxvnic", not just "vboxvnic_template", will cause a problem. First off, it will not show up in the Manager as a selectable interface. E.g., I created "vboxvnic1":
dladm create-vnic -l rge0 vboxvnic1
And then went to look for it to assign it. It was not present in the drop-down list. Undeterred, I just used the CLI to do it. I don't remember the errors that came up because of this, and I am not going to test again, but I had to remove it. I would not call this a bug, just something to be advised about, since a sysadmin might instinctively use the same naming scheme.
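In other words, a hand-managed VNIC should avoid the vboxvnic prefix entirely; any other dladm-legal name appears to behave normally (guest0 below is just an example name):

```shell
# Avoid the "vboxvnic" prefix for manually created VNICs; those names
# are treated as VirtualBox-managed and hidden from the GUI list:
dladm create-vnic -l rge0 guest0
```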
In the end I am left with no way to really figure out where the problem is centered. That is what is really getting to me.
Code: Select all
nxn@nebula:/etc$ dladm show-vnic
LINK OVER SPEED MACADDRESS MACADDRTYPE VID
vboxvnic5 rge0 1000 8:0:27:b7:e6:99 fixed 0
nxn@nebula:/etc$ modinfo | grep vbox
231 fffffffff8bdc000 3828 183 1 vboxbow (VirtualBox NetBow 4.2.0_BETA1r7)
232 fffffffff8be0000 34ce0 293 1 vboxdrv (VirtualBox HostDrv 4.2.0_BETA1r)
235 fffffffff8235578 cf0 294 1 vboxnet (VirtualBox NetAdp 4.2.0_BETA1r7)
237 fffffffff8c27000 4568 296 1 vboxusbmon (VirtualBox USBMon 4.2.0_BETA1r7
238 fffffffff8c2c000 7480 297 1 vboxusb (VirtualBox USB 4.2.0_BETA1r7975)
-
- Oracle Corporation
- Posts: 793
- Joined: 7. Jan 2008, 16:17
Re: Network Performance Problems
nxnlvz wrote: To answer the question... yes I am using vboxbow.
You are not supposed to create vboxvnic on your own. It is done automatically by VBox when you give it "rge0". They are dynamically created and destroyed VNICs and thus will not show up in the GUI as available NICs.
Regarding your performance issues, here is what I understood:
You have a Solaris 11 host with 1 physical NIC and a VM using bridged networking with iSCSI, and you get iSCSI timeout errors because of bridged networking "taking over" the NIC? Is this correct? Have you tried limiting the bandwidth resources (when you used the VNIC template)?
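A bandwidth cap can be set on a VNIC (or on the template, so clones inherit it) via the maxbw link property; the 500 Mbps figure and the vboxvnic_template0 name here are illustrative only:

```shell
# Cap the VNIC at roughly 500 Mbps so bridged guest traffic cannot
# starve the host's own NFS/iSCSI traffic on the shared physical link:
dladm set-linkprop -p maxbw=500 vboxvnic_template0

# Confirm the property took effect:
dladm show-linkprop -p maxbw vboxvnic_template0
```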
Oracle Corp.
-
- Posts: 28
- Joined: 16. Dec 2008, 07:45
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Windows (XP, 7, 8) / Linux (Debian, Ubuntu) / MacOS (Lion)
Re: Network Performance Problems
Not yet on limiting the bandwidth, but that is a good idea.
I am going to test some other things also. On a lark I changed out the network switch from an unmanaged Cisco Gb switch to a fully managed D-Link Gb switch so that I could look at the traffic on the wire. Something changed when I did that and produced a different result; I am not sure how yet. In theory this should not have mattered, or if it did, it should have slowed throughput a bit.
One interesting thing that happened was a test transfer over NFS: without a guest active, I was able to get full wire speed. That surprised me, since I haven't been able to do that with this Realtek NIC before. There was still a difference when a guest was active, but I don't have a quantitative result. At this point it stirs thoughts of a conflict with an implementation of STP, 802.1p, 802.1q, or something else. I haven't run across something like that for some time now.
Two things I will immediately now be looking for:
1. Any packet errors on the wire. None were present on the interface stats before the change.
2. Any packets at all for the iSCSI traffic, since it is pointed at the loopback address.
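Both checks can be done from the host with stock Solaris tooling; rge0 and the default iSCSI port 3260 are the assumed values here:

```shell
# 1. Per-link statistics, including error counters, for the physical NIC:
dladm show-link -s rge0

# 2. Watch for iSCSI traffic that should stay on loopback but is
#    leaking onto the wire (3260 is the default iSCSI target port):
snoop -d rge0 port 3260
```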
At least some progress has been made.
-
- Posts: 28
- Joined: 16. Dec 2008, 07:45
- Primary OS: Solaris
- VBox Version: PUEL
- Guest OSses: Widnows (XP,7,8) / Linux (Debian, Unbuntu) / MacOS (Lion)
Re: Network Performance Problems
So a little update.
Whatever is causing my performance issues _is_ some strange correlation between a non-managed Cisco gigabit switch, the Realtek NIC, Solaris 11, and VirtualBox. Take out any one of those components, especially that network switch, and all is mostly normal. Of course, testing the strange behavior with that switch inline is time-consuming and difficult, and in the end it is not worth it to hold on to a relatively cheap piece of hardware. I suspect it has something to do with the ARP table in the switch or the host, but I have been unable to verify that.
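For anyone chasing the same ARP suspicion, the host-side half is at least easy to inspect (rge0 is assumed; the switch-side table would need managed-switch access):

```shell
# Dump the host's current ARP cache and look for stale or duplicate
# entries for the guest's MAC address:
arp -an

# Watch ARP traffic on the wire for gratuitous or conflicting replies:
snoop -d rge0 arp
```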
One thing I am very happy about is that I have not seen the strange iSCSI errors since the change. I am also pleased to see that the host is sustaining wire-speed IO with multiple NFS and iSCSI clients. Solaris + ZFS (striped+mirrored) + napp-it really makes for a snappy little NAS.