Multi-CPU guest = 10x slower performance?
Posted: 8. Apr 2014, 12:18
Hello,
Host: VirtualBox 4.3.10 on CentOS 6.5 (2.6.32)
Guest: Ubuntu 13.10 (3.11.0)
Everything runs well except memcached (so far). When running the guest with 2 or more CPUs:
mdv1:~$ memcslap --server 127.0.0.1:11211
Threads connecting to servers 1
Took 2.997 seconds to load data
When running the guest with 1 CPU:
mdv1:~$ memcslap --server 127.0.0.1:11211
Threads connecting to servers 1
Took 0.302 seconds to load data
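To put a number on the difference between the two runs, here is a quick sketch (the helper name `slowdown` is just something made up for illustration, and it assumes `awk` is available in the guest):

```shell
# Hypothetical helper: ratio of the multi-CPU load time to the single-CPU load time.
slowdown() {
    awk -v multi="$1" -v single="$2" 'BEGIN { printf "%.1f\n", multi / single }'
}

# Timings taken from the two memcslap runs above.
slowdown 2.997 0.302
```

That works out to roughly a 10x slowdown with 2 or more CPUs.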
I've tried memcached 1.4.10 and 1.4.17, with the same results. I've also tried the same guest under the same scenarios on an OS X 10.7.5 host, again with the same results. strace on memcached shows it doing essentially the same thing in both scenarios (epoll, etc.). Running memcached with 1, 4, or 16 threads makes no difference, and neither does installing the VirtualBox Guest Additions.
Guest settings: 4096 MB RAM, PAE/NX off, VT-x/AMD-V on, Nested Paging on, Host I/O Cache on for the SATA drive.
I'm getting this from dmesg:
mdv1:~$ dmesg|grep CPU
[ 0.000000] CPU MTRRs all blank - virtualized system.
[ 0.000000] ACPI: SSDT 00000000dfff02b0 001CC (v01 VBOX VBOXCPUT 00000002 INTL 20100528)
[ 0.000000] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:4 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 29 pages/cpu @ffff88011fc00000 s86720 r8192 d23872 u524288
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-255.
[ 0.019442] CPU: Physical Processor ID: 0
[ 0.020018] CPU: Processor Core ID: 0
[ 0.020026] mce: CPU supports 0 MCE banks
[ 0.091463] smpboot: CPU0: Intel(R) Xeon(R) CPU X3450 @ 2.67GHz (fam: 06, model: 1e, stepping: 05)
[ 0.092000] Performance Events: unsupported p6 CPU model 30 no PMU driver, software events only.
[ 0.092000] mce: CPU supports 0 MCE banks
[ 0.103667] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.103674] Measured 89831 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.103795] mce: CPU supports 0 MCE banks
[ 0.115706] mce: CPU supports 0 MCE banks
[ 0.127526] Brought up 4 CPUs
[ 0.675795] ledtrig-cpu: registered to indicate activity on CPUs
[ 2.261072] microcode: CPU0 sig=0x106e5, pf=0x4, revision=0x616
[ 2.349037] microcode: CPU1 sig=0x106e5, pf=0x4, revision=0x616
[ 2.372894] microcode: CPU2 sig=0x106e5, pf=0x4, revision=0x616
[ 2.390995] microcode: CPU3 sig=0x106e5, pf=0x4, revision=0x616
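One line in that output that may or may not be relevant is the TSC warp message, after which the kernel says it is turning off the TSC clock. A rough way to see which clocksource the guest ended up on, assuming the usual Linux sysfs location (the function name `show_clocksource` is only for illustration):

```shell
# Print the active and available clocksources.
# The default sysfs directory is an assumption; pass another directory to override it.
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    if [ -r "$dir/current_clocksource" ]; then
        echo "current:   $(cat "$dir/current_clocksource")"
        echo "available: $(cat "$dir/available_clocksource")"
    else
        echo "no clocksource info at $dir" >&2
        return 1
    fi
}

# Do not abort the shell if the sysfs files are missing.
show_clocksource || true
```

Running this in both the 1-CPU and the multi-CPU guest would show whether the clocksource differs between the two configurations.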
Any ideas?
Thanks