Code:
x = rdtsc();
y = rdtsc();
z = y - x;
print z
From the VirtualBox manual (Change TSC Mode), I understand there is an alternative virtualization technique that is supposed to simulate the TSC directly. As I understand it, the offset value only takes into account the time during which the guest OS actually uses the CPU. The advantage is that, with respect to available cycles, the TSC behaves exactly as it would on a host machine. The downside is that the TSC drifts away from wall-clock time, because there are "missing cycles" that the guest OS is not aware of.
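To make the difference between the two modes concrete, here is a toy model of the behavior described above. It is only a sketch of my understanding, not VirtualBox's actual implementation, and every name in it (`struct slice`, `tsc_wall_clock`, `tsc_tied_to_execution`) is made up for illustration. Host time is split into slices during which the guest either runs or is descheduled.

```c
#include <stddef.h>
#include <stdint.h>

/* One slice of host time: the guest is either running or descheduled. */
struct slice {
    uint64_t host_cycles;   /* length of the slice in host TSC ticks */
    int      guest_running; /* 1 if the guest had the CPU during the slice */
};

/* Guest TSC that follows the host clock: every slice counts. */
uint64_t tsc_wall_clock(const struct slice *s, size_t n)
{
    uint64_t tsc = 0;
    for (size_t i = 0; i < n; i++)
        tsc += s[i].host_cycles;
    return tsc;
}

/* Guest TSC tied to execution: only slices where the guest ran count. */
uint64_t tsc_tied_to_execution(const struct slice *s, size_t n)
{
    uint64_t tsc = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].guest_running)
            tsc += s[i].host_cycles;
    return tsc;
}
```

For a made-up schedule of 1000 cycles running, 5000 descheduled, and 2000 running, the wall-clock model ends at 8000 and the tied-to-execution model at 3000. The 5000-cycle gap is the "missing cycles" drift.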
My goal: I am trying to configure VirtualBox to use the second option. I want to emulate the short-term behavior of rdtsc as precisely as possible, as if it were running on hardware, and I don't care if it doesn't match wall-clock time. I am fully aware that this is not "reliable" on SMP; it's for experimenting, not for enterprise software.
What I did: First I wrote a simple test program that calls rdtsc repeatedly, then prints the results:
Code:
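#include <stdint.h>
#include <stdio.h>

/* Read the 64-bit time stamp counter (low half in EAX, high half in EDX). */
static __inline__ uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return (uint64_t)hi << 32 | lo;
}

int main(void)
{
    int i;
    uint64_t val[8];

    /* Eight back-to-back reads, one per array slot. */
    val[0] = rdtsc();
    val[1] = rdtsc();
    val[2] = rdtsc();
    val[3] = rdtsc();
    val[4] = rdtsc();
    val[5] = rdtsc();
    val[6] = rdtsc();
    val[7] = rdtsc();

    for (i = 0; i < 8; i++) {
        printf("rdtsc (%2d): %llX", i, (unsigned long long)val[i]);
        if (i > 0) {
            /* Delta from the previous sample, in hex. */
            printf("\t\t (+%llX)", (unsigned long long)(val[i] - val[i - 1]));
        }
        printf("\n");
    }
    return 0;
}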
Before changing any settings, the output looks like this:
rdtsc ( 0): 334F2252A1824
rdtsc ( 1): 334F2252A1836 (+12)
rdtsc ( 2): 334F2252A1853 (+1D)
rdtsc ( 3): 334F2252A1865 (+12)
rdtsc ( 4): 334F2252A1877 (+12)
rdtsc ( 5): 334F2252A1889 (+12)
rdtsc ( 6): 334F2252A18A6 (+1D)
rdtsc ( 7): 334F2252A18B8 (+12)
Then, I changed the TSCTiedToExecution flag in VirtualBox, which I understood to ignore wall-clock time in favor of more precise virtual cycle counting. I got this from the manual page I mentioned above:
Code:
./VBoxManage setextradata "HelloWorld" "VBoxInternal/TM/TSCTiedToExecution" 1
With TSCTiedToExecution on, rdtsc seems to be taking about 1100 cycles to execute:
rdtsc ( 0): F2252A1824
rdtsc ( 1): F2252A1836 (+B12)
rdtsc ( 2): F2252A1853 (+B1D)
rdtsc ( 3): F2252A1865 (+AFF)
rdtsc ( 4): F2252A1877 (+B13)
rdtsc ( 5): F2252A1889 (+AF2)
rdtsc ( 6): F2252A18A6 (+B1D)
rdtsc ( 7): F2252A18B8 (+B0C)
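Because the test program prints its deltas in hex, a small helper that summarizes a run in decimal makes runs easier to compare. The following is a minimal sketch; `struct delta_stats` and `summarize` are hypothetical names that are not part of the test program above.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of the gaps between consecutive TSC samples. */
struct delta_stats {
    uint64_t min;
    uint64_t max;
    double   mean;
};

/* Compute min, max and mean delta over n samples.
 * With fewer than two samples there is no delta, so all fields are 0. */
struct delta_stats summarize(const uint64_t *val, size_t n)
{
    struct delta_stats st = { 0, 0, 0.0 };
    uint64_t total = 0;

    if (n < 2)
        return st;

    st.min = UINT64_MAX;
    for (size_t i = 1; i < n; i++) {
        uint64_t d = val[i] - val[i - 1];
        if (d < st.min) st.min = d;
        if (d > st.max) st.max = d;
        total += d;
    }
    st.mean = (double)total / (double)(n - 1);
    return st;
}
```

Fed the eight values from the run with the +12/+1D deltas, this gives a minimum of 18, a maximum of 29, and a mean of about 21 cycles per rdtsc.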
Question: First, why did I get this behavior? It seems like almost the opposite of what I would expect, and it certainly does not match my understanding of how this is implemented.
Second, how can I accomplish my original goal of having the TSC advance for each virtual cycle as if it were on hardware?
My Setup: I am running on an 8x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz. VirtualBox has VMX and nested paging enabled. I compiled it from source, version: 4.1.2_OSE r38459.
Thanks in advance!