Typical data speed variation - expected or possible issue?

Discussions related to using VirtualBox on Windows hosts.
TDIT
Posts: 21
Joined: 21. Feb 2016, 16:45

Typical data speed variation - expected or possible issue?

Post by TDIT »

Good afternoon all,

Running 5.1.28 on a Windows 7 host: Lenovo Intel i3, 500 GB Samsung Evo SSD, 16 GB RAM.
* Host & VMs otherwise idle during tests.
** All hardware drivers up to date, including Lenovo's driver for SSDs.

Two VirtualBox guest VMs, both Windows 7 64-bit: 100 GB each, 2 CPUs each, 2 GB RAM each. Both VDI files. Antivirus disabled on host and guests.

Scenario:
A) Copying data (a single zip file, approx 440 MB) on the host system to the host system - roughly 45 MB/s
B) Copying the same data from host to VB guest (via a shared drive) - roughly 15 MB/s
C) Copying the same data from the VB guest to a new folder in the same VB guest - roughly 25 MB/s
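For anyone wanting to repeat this kind of test, here's a rough sketch of the method from a Unix-style shell (filenames are placeholders, not from the original setup; as noted later in the thread, a file that's already in the OS cache will give inflated numbers):

```shell
# Rough copy-throughput check: create a ~440 MB test file, time a copy,
# and report MB/s. Paths are placeholders.
dd if=/dev/zero of=/tmp/speedtest.bin bs=1M count=440 2>/dev/null

start=$(date +%s%N)                      # nanoseconds (GNU date)
cp /tmp/speedtest.bin /tmp/speedtest_copy.bin
end=$(date +%s%N)

elapsed_ms=$(( (end - start) / 1000000 ))
[ "$elapsed_ms" -gt 0 ] || elapsed_ms=1  # guard against a 0 ms cached copy
echo "$(( 440 * 1000 / elapsed_ms )) MB/s"

rm -f /tmp/speedtest.bin /tmp/speedtest_copy.bin
```

A file of zeros isn't a realistic payload on compressing or sparse-aware filesystems, so a real zip (as in the tests above) is the better choice; the sketch only shows the timing arithmetic.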

Question:
Given that 'A' is the best possible result, is it considered normal to get just over half the host machine's data rate when copying data within the same client VDI file ('C'), but only a third when copying between host and client machines ('B')?

If this is considered normal, then all good. If not, I'd appreciate any suggestions as to how I can improve overall efficiency.

Many thanks to all!
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Typical data speed variation - expected or possible issue?

Post by mpack »

'A' probably involved copying data that was already cached.

'B' involves a share. It's going to be slower.

The speed of 'C' depends on whether the VDI had to grow to accommodate it, the type of HDD controller being simulated, and the amount of fragmentation and free space inside the guest (people often make virtual drives too small and fixed-size, leading to poor performance). Etc. The details matter. Understanding matters a lot more than blind benchmarks.

That all said, a loss of performance is expected in a VM, as performance is not the point.
TDIT
Posts: 21
Joined: 21. Feb 2016, 16:45

Re: Typical data speed variation - expected or possible issue?

Post by TDIT »

G'Day mpack,

It was the first test I did with that file, for exactly that reason: to make sure no caching was involved. I should have mentioned it in my original post. I've also tested with various other files and found the host transfer rate is reasonably constant.

I did expect the share to be the slowest test, but I wasn't really expecting it to be quite that slow. If it's considered typical, I'll accept it for what it is.

I also should have mentioned the VDI is 100 GB with about 30 GB used. They were actually resized with your program (thank you), because I originally made them too small and fixed-size. But I wasn't aware fragmentation played a big part on SSDs - or do you mean within the VDI? If I periodically compact the VDI, does this take care of the fragmentation, or are you talking about performing a defragmentation on the Windows install inside the VDI?
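For reference, the compact operation in question is a command-line step; a sketch, assuming the VM is powered off and free space inside the guest has been zeroed first (the filename below is a placeholder):

```shell
# Zero free space inside the guest first (e.g. Sysinternals "sdelete -z C:"
# on a Windows guest), then, with the VM powered off, compact the VDI.
# "Win7-Guest.vdi" is a placeholder filename.
VBoxManage modifymedium disk "Win7-Guest.vdi" --compact
```

Compacting reclaims zeroed blocks and shrinks the file on the host; it does not defragment the guest's own filesystem, which is a separate operation performed inside the VM.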

The controller simulated is SATA AHCI. That was the default, and from what I've read SATA is considered pretty much ideal in this setup.

I get what you mean by the details - it all makes sense. I just wanted to make sure what I was describing was not completely atypical. And yes, I do get that nesting one OS inside another is never going to be done for performance :) I just wanted to make sure I hadn't created latency by incorrect configuration.

Cheers and thanks for the quick reply - it's much appreciated.
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Typical data speed variation - expected or possible issue?

Post by mpack »

If you're already using SATA then there shouldn't really be a lot of overhead, I'd have said, though the precise VM settings can make a difference (whether or not host I/O caching is enabled), and of course the exact circumstances as mentioned.
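For reference, the host I/O cache setting is per storage controller; a sketch of inspecting and toggling it from the command line, assuming a powered-off VM named "Win7-Guest" with a controller named "SATA" (both placeholder names):

```shell
# Show the VM's storage configuration; the controller listing includes
# whether the host I/O cache is enabled.
VBoxManage showvminfo "Win7-Guest" | grep -i "cache"

# Enable (or pass "off" to disable) host I/O caching on the SATA controller.
VBoxManage storagectl "Win7-Guest" --name "SATA" --hostiocache on
```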
TDIT
Posts: 21
Joined: 21. Feb 2016, 16:45

Re: Typical data speed variation - expected or possible issue?

Post by TDIT »

Question - is there a downside to enabling the I/O cache? It's not enabled by default, which usually is for a reason...
socratis
Site Moderator
Posts: 27330
Joined: 22. Oct 2010, 11:03
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Win(*>98), Linux*, OSX>10.5
Location: Greece

Re: Typical data speed variation - expected or possible issue?

Post by socratis »

TDIT wrote:which usually is for a reason
Playing it safe. If you're on a workstation without a UPS and the power goes down, you have a much higher chance of corrupting data that is still cached. So "conservative" is the key word.
Do NOT send me Personal Messages (PMs) for troubleshooting, they are simply deleted.
Do NOT reply with the "QUOTE" button, please use the "POST REPLY", at the bottom of the form.
If you obfuscate any information requested, I will obfuscate my response. These are virtual UUIDs, not real ones.
TDIT
Posts: 21
Joined: 21. Feb 2016, 16:45

Re: Typical data speed variation - expected or possible issue?

Post by TDIT »

Thanks socratis. Leaving it disabled :)
socratis
Site Moderator
Posts: 27330
Joined: 22. Oct 2010, 11:03
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Win(*>98), Linux*, OSX>10.5
Location: Greece

Re: Typical data speed variation - expected or possible issue?

Post by socratis »

I wouldn't... ;)
Your host is Windows 7, and I'll assume it's running on an NTFS-based hard drive. NTFS is a journaled file system, which is far less corruption-prone than FAT or any other non-journaled system.
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Typical data speed variation - expected or possible issue?

Post by mpack »

TDIT wrote:Thanks socratis. Leaving it disabled :)
But of course, the flip side of the same coin is that "more conservative" always means lower performance.
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Typical data speed variation - expected or possible issue?

Post by mpack »

socratis wrote:NTFS is a journaled system which is way less corruption prone compared to FAT or any other non-journaled systems.
Hmm. I'm not sure I buy that in this context. Journalling will protect host file metadata (directory structures, allocation bitmaps), not user file contents, and not if the contents were still in a cache when the power was cut (or whatever). Of course if the guest OS does journalling then that could certainly be significant.
TDIT
Posts: 21
Joined: 21. Feb 2016, 16:45

Re: Typical data speed variation - expected or possible issue?

Post by TDIT »

If the performance of the guest OS were poor, then I'd probably enable the I/O cache and run the risks. But even with a UPS I can't completely rule out an OS crash on the host, so I could (could) end up with a corrupted guest because I was chasing performance improvements. So for me the safer option is leaving the cache disabled, even if that means I'm losing out on some extra performance...
socratis
Site Moderator
Posts: 27330
Joined: 22. Oct 2010, 11:03
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Win(*>98), Linux*, OSX>10.5
Location: Greece

Re: Typical data speed variation - expected or possible issue?

Post by socratis »

mpack wrote:Journalling will protect host file metadata (directory structures, allocation bitmaps), not user file contents
I always thought of journaled corruption protection as follows: (user) data written to the file system will not be "committed" as successfully written unless the whole transaction has taken place; otherwise the "journal" of what happened can be replayed. Basically something roughly similar to the Wikipedia article I just discovered:
Journaling file system wrote:A journaling file system is a file system that keeps track of changes not yet committed to the file system's main part by recording the intentions of such changes in a data structure known as a "journal", which is usually a circular log. In the event of a system crash or power failure, such file systems can be brought back online more quickly with a lower likelihood of becoming corrupted.[1][2]

Depending on the actual implementation, a journaling file system may only keep track of stored metadata, resulting in improved performance at the expense of increased possibility for data corruption. Alternatively, a journaling file system may track both stored data and related metadata, while some implementations allow selectable behavior in this regard.[3]
You might be referring to the second paragraph of that introduction. I wasn't aware that it depended on the implementation; I always thought that NTFS guarded both the metadata and the data. But after reading that Wikipedia article I found a paper, "Analysis and Evolution of Journaling File Systems" (PDF), which had the following to say (among a lot of other things):
From our analysis, we found that NTFS does not do data journaling. This can be easily verified by the amount of data traffic observed by the SBA driver. We also found that NTFS, similar to JFS, does not do block-level journaling. It journals metadata in terms of records. We verified that whole blocks are not journaled in NTFS by matching the contents of the fixed-location traffic to the contents of the journal traffic.
Interesting stuff... ;)
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: Typical data speed variation - expected or possible issue?

Post by mpack »

socratis wrote: I always thought of the journaled corruption protection as follows: (user) data written to the file system will not be "committed" as successfully written unless the whole transaction has taken place
I don't see how that can work. There are several different data structures in different parts of the disk; these can't all be written at once, so there will always be a moment in time when a power cut would leave only some of those structures updated. FAT was notorious for this kind of corruption.

Journaling is, AFAIK, just a fancy name for version control on the data structures. You update structure X and add a version record to say what you changed; this structure is now on version N+1. Then you update the next structure and add a similar record to its version history. Since all structures should be updated for each transaction, they should all have the same version history (i.e. they should all be on version N+1): if you detect a conflict then you roll back changes to version N, or even earlier, until you get back to the last version they can all sync to.

So journalling allows the filesystem to recover from minor corruptions; it doesn't prevent loss of user data when, say, a power cut happens, unless that data was written and all the structures synced - which for performance reasons will only happen periodically.

I doubt that NTFS is special in this regard. What I outlined above is basically physical law for spinning HDDs, so it'll apply to other journalling filesystems too. There is admittedly a possibility that future incarnations of SSDs will make some of this discussion obsolete, e.g. SSDs can potentially update multiple data structures simultaneously, and/or continuously.