nanosleep() works until it doesn't

Discussions about using Linux guests in VirtualBox.
xorbe
Posts: 39
Joined: 4. Apr 2013, 02:50
Primary OS: MS Windows other
VBox Version: OSE other
Guest OSses: openSUSE Tumbleweed

nanosleep() works until it doesn't

Post by xorbe »

I am developing an audio application, and one of my machines is a quad-core Trinity 4600M with 8GB dual-chan mem using vbox 4.2.10 (Host: Win7 SP1 x64, Guest: openSUSE 12.2 x64). It's plugged into the wall for power for max performance. All "power saving" (performance robbing) features are disabled. The host is not running any notable processes or background processes, except for vbox.

I have a small test program that reads the time, nanosleep()s for 10 ms { .tv_sec=0, .tv_nsec=10000000 }, and then checks the time again. Generally there is about 1 ms of delay beyond the requested 10 ms, and that can wander up to 4 ms occasionally (no big deal for devel work in a virtualized environment), but after about 30~120 seconds of these continuous 10 ms nanosleeps, it suddenly doesn't return for 200~3800 ms! This does NOT happen on bare metal (typically just 80-110 us extra delay beyond 10 ms, max observed 1.4 ms).

I tried backing up to vbox 4.2.6 since I saw a nanosleep/SIGALRM fix for 4.2.8, but that didn't help. VirtualBox otherwise does not freeze during this period -- it's doing fine. The cores are only 10~20% utilized when this happens. It doesn't freeze forever, so I can't get a kernel debugger on it.

I tried nanosleep, clock_nanosleep with CLOCK_MONOTONIC and with/without TIMER_ABSTIME, and the portable select(0, NULL, NULL, NULL, &tv) method. All produce the same issue. If I use nanosleep(80ms) then the issue doesn't seem to happen. I can also omit the nanosleep and let the thread burn in a hot loop, and all is well.
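
For anyone who wants to reproduce this, here is a minimal sketch of the kind of test loop described above. It is not the original test program: the function names (now_ns, measure_lag_ns, run_lag_test) are made up for illustration, and the output format is only modelled on the lag lines posted later in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleep for request_ns via nanosleep() and return the overshoot
 * (elapsed time minus requested time) in nanoseconds. */
int64_t measure_lag_ns(int64_t request_ns)
{
    struct timespec req = {
        .tv_sec  = (time_t)(request_ns / 1000000000LL),
        .tv_nsec = (long)(request_ns % 1000000000LL),
    };
    struct timespec rem;
    int64_t start = now_ns();

    /* If a signal interrupts the sleep, continue with the remaining time. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;

    return (now_ns() - start) - request_ns;
}

/* Repeat the sleep `iterations` times, print each lag and the running
 * maximum in milliseconds, and return the maximum lag in nanoseconds. */
int64_t run_lag_test(int iterations, int64_t request_ns)
{
    int64_t max_lag = 0;
    for (int i = 0; i < iterations; i++) {
        int64_t lag = measure_lag_ns(request_ns);
        if (lag > max_lag)
            max_lag = lag;
        printf("lag: %10.6fms (max:%10.6fms)\n", lag / 1e6, max_lag / 1e6);
    }
    return max_lag;
}
```

Calling run_lag_test(20000, 10000000) from main() would run a bit over three minutes of 10 ms sleeps, which should be long enough to cover the 30~120 second window mentioned above.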

Possible vbox bug around small timers?
noteirak
Site Moderator
Posts: 5231
Joined: 13. Jan 2012, 11:14
Primary OS: Debian other
VBox Version: OSE Debian
Guest OSses: Debian, Win 2k8, Win 7
Contact:

Re: nanosleep() works until it doesn't

Post by noteirak »

There are inherent timing issues with virtualization - anything timing-related will most likely fail, just as you pointed out.
Hyperbox - Virtual Infrastructure Manager - https://apps.kamax.lu/hyperbox/
Manage your VirtualBox infrastructure the free way!
xorbe
Posts: 39
Joined: 4. Apr 2013, 02:50
Primary OS: MS Windows other
VBox Version: OSE other
Guest OSses: openSUSE Tumbleweed

Re: nanosleep() works until it doesn't

Post by xorbe »

Well, that's unfortunate. The same thing seems to work okay on my other desktop with the same versions of vbox / Windows host / Linux guest (edit: actually saw a 90ms hitch). I guess I'll stick to Linux on metal. It was convenient to code on the go with my Windows laptop. Hard to believe that nanosleep() / usleep() blocking for extra seconds is legit.
Last edited by xorbe on 5. Apr 2013, 16:50, edited 1 time in total.
noteirak
Site Moderator
Posts: 5231
Joined: 13. Jan 2012, 11:14
Primary OS: Debian other
VBox Version: OSE Debian
Guest OSses: Debian, Win 2k8, Win 7
Contact:

Re: nanosleep() works until it doesn't

Post by noteirak »

I can't speak to delays of whole seconds, which does seem like a lot, but the timing issue is a fact.
Maybe mpack can give more insight on this.
mpack
Site Moderator
Posts: 39134
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Mostly XP

Re: nanosleep() works until it doesn't

Post by mpack »

The guest can't time things to greater accuracy than the host, and Windows hosts are not noted for their real-time responsiveness. I believe the standard tick rate of the system clock on NT hosts is still several milliseconds, and any wait would be rounded to some multiple of that. The inaccuracy has nothing to do with VirtualBox; it's the host that limits timer granularity, and IMHO VBox can't change the physical host tick rate without affecting the host.
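
To put numbers on the rounding argument, here is a small arithmetic sketch. The helper name round_up_to_tick is hypothetical, and the 15.625 ms tick (64 Hz) used in the example is an assumed value that is often quoted as a Windows default; it is not a figure from this thread.

```c
#include <stdint.h>

/* Round a requested wait up to the next whole multiple of the timer tick.
 * Both arguments are in nanoseconds; tick_ns must be greater than zero. */
int64_t round_up_to_tick(int64_t request_ns, int64_t tick_ns)
{
    return ((request_ns + tick_ns - 1) / tick_ns) * tick_ns;
}
```

Under that assumed tick, round_up_to_tick(10000000, 15625000) gives 15625000, so a 10 ms request would become a 15.625 ms wait, and a 20 ms request would become 31.25 ms. This only models the rounding idea; it says nothing about what VirtualBox actually does internally.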

As to the sudden change of behaviour, that does look like a bug. I'd bet that VirtualBox is implementing a queue of timer events, and sooner or later it falls behind and the queue overflows, and some time-to-expiry value goes negative. Total guess, but as a developer myself it has that feel, that's where I'd be looking. Bottom line: try to get fancy with VM timing and it will probably break. The devs may not give this a high priority unless their paying customers have similar problems.
xorbe
Posts: 39
Joined: 4. Apr 2013, 02:50
Primary OS: MS Windows other
VBox Version: OSE other
Guest OSses: openSUSE Tumbleweed

Re: nanosleep() works until it doesn't

Post by xorbe »

Yeah I'm not surprised about the reduction in timing accuracy within VirtualBox, and it's good enough (0.5~2 ms typical lag, occasional 3-6ms spikes) even for audio development on the go. The deal breaker is when it just doesn't come back for several seconds under minimal load -- the queue thing and negative delay is what I had in mind, after much nanosleep() bug research.

edit #1: I adjusted one thread of the program to use sem_wait / sem_post (which was the plan all along) instead of polling + nanosleep, which cured the one thread. But the other thread is truly time based (wait 15 ms, send event, wait 5 ms, send event, etc).

Code:

lag:   0.240570ms (max:  5.758074ms)
lag:   0.993890ms (max:  5.758074ms)
lag:   0.351900ms (max:  5.758074ms)
lag: 338.950916ms (max:338.950916ms)
lag:   0.342136ms (max:338.950916ms)
lag:   0.292581ms (max:338.950916ms)
lag:   0.318937ms (max:338.950916ms)
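
For the sem_wait / sem_post change mentioned in edit #1, a minimal sketch of the pattern could look like this. It is not the actual application code: work_ready, shared_value, consumer and run_handoff are illustrative names, and a real program would hand over audio data rather than a single int.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t work_ready;   /* posted by the producer when data is available */
static int shared_value;   /* stand-in for the real payload */

/* Consumer thread: blocks in sem_wait() instead of polling + nanosleep(). */
void *consumer(void *arg)
{
    int *out = arg;

    /* Retry if a signal interrupts the wait. */
    while (sem_wait(&work_ready) == -1 && errno == EINTR)
        ;
    *out = shared_value;
    return NULL;
}

/* Hand one value to the consumer thread and return what it received. */
int run_handoff(int value)
{
    pthread_t thread;
    int received = -1;

    sem_init(&work_ready, 0, 0);             /* count starts at zero */
    pthread_create(&thread, NULL, consumer, &received);

    shared_value = value;                    /* publish the data ...  */
    sem_post(&work_ready);                   /* ... then wake consumer */

    pthread_join(thread, NULL);
    sem_destroy(&work_ready);
    return received;
}
```

The waiting thread no longer depends on a short timer at all: it wakes when the other thread posts, which is why this kind of change can sidestep the hitch for event-driven work but not for a thread that has to wake at fixed times. Depending on the toolchain, this may need to be built with -pthread.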

edit #2 with work-around: it chugged for about 45 minutes with at most a 9.2ms lag spike, average lag 0.48ms, but then hitched for 345ms -- that's a lot better than hitching every 5~30 seconds. Perhaps the detail below will point developers in the direction of the bug:

Code:

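delay = when - now;     // pseudocode: "now" is a fresh clock reading, "when" is the deadline
if (delay > 0) {
  sched_yield();        // Yielding right before nanosleep appears to be a decent work-around.
  delay = when - now;   // Re-read "now" first, because it has probably changed during the yield.
  if (delay > 0) nanosleep(delay);
}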
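
A compilable version of the same idea might look like the sketch below. It assumes the deadline is an absolute CLOCK_MONOTONIC time in nanoseconds; clock_now_ns and sleep_until_with_yield are names made up for this example, not part of the original test case.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
int64_t clock_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleep until the absolute deadline when_ns, yielding the CPU once before
 * the nanosleep() call. Returns how late the wake-up was, in nanoseconds. */
int64_t sleep_until_with_yield(int64_t when_ns)
{
    int64_t delay = when_ns - clock_now_ns();

    if (delay > 0) {
        sched_yield();                        /* the work-around step */
        delay = when_ns - clock_now_ns();     /* re-read the clock after yielding */
        if (delay > 0) {
            struct timespec req = {
                .tv_sec  = (time_t)(delay / 1000000000LL),
                .tv_nsec = (long)(delay % 1000000000LL),
            };
            struct timespec rem;

            /* If a signal interrupts the sleep, continue with the remainder. */
            while (nanosleep(&req, &rem) == -1 && errno == EINTR)
                req = rem;
        }
    }
    return clock_now_ns() - when_ns;
}
```

For the time-based thread (wait 15 ms, send event, wait 5 ms, send event), the caller would keep a running deadline, add 15000000 or 5000000 to it before each call, and send the event when the function returns. Keeping the deadline absolute means one late wake-up does not push every later event back.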

mpack wrote: The devs may not give this a high priority unless their paying customers have similar problems.

Hmm, I'd not want to be a paying customer and then see reported performance bugs like this brushed aside ... but I digress, since I've enjoyed vbox for free so far. They might not realize they are suffering from unnatural delays, or know how to isolate it, or bother reporting it. I have a private 45KB test case if a dev wants to pursue this.