Size of Snapshot-VDI

Discussions about using Windows guests in VirtualBox.
mpack
Site Moderator
Posts: 39134
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Mostly XP

Re: Size of Snapshot-VDI

Post by mpack »

noteirak wrote: Finally, let's say you need to know the state of sector 118. If you actually compacted the diff disk the way you expect it to work, you would remove the zeros from the diff disk. But then, when you try to read the empty sector, it is no longer defined in the diff disk, so you get the value from the base disk... which actually contains data! So you couldn't compact it either :)
That isn't quite how it works. In VDI a 1MB block is classified in one of three ways: UNALLOCATED, ZERO-BLOCK (a block filled with zeros), and ALLOCATED (a block filled with anything - which may include all zeros). Only the last classification needs actual data storage in the image file. Currently zero blocks are not recognized dynamically, so once a block of disk storage is allocated it remains allocated forever...

Except... when you compact (using VBoxManage modifyhd --compact), or clone a disk (VBoxManage clonehd) then VBoxManage at that time checks the allocated blocks to see if they are filled with zeros. If yes then the allocated storage is discarded and the block map entry is changed to ZERO-BLOCK. Hence if you run sdelete and manually fill all unused virtual disk space with zeros then on compaction/cloning these will all be replaced with ZERO-BLOCK markers - they will never be replaced with UNALLOCATED markers. As you say, zeroed is not the same as unallocated.
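
To make the compaction pass described above concrete, here is a rough Python sketch. It is a toy model, not VirtualBox's actual code: the `State` enum, the `block_map` list and the `storage` dict are all hypothetical stand-ins for the real VDI block map and image data.

```python
from enum import Enum

BLOCK_SIZE = 1024 * 1024  # 1 MB, the block size mentioned above


class State(Enum):
    UNALLOCATED = "unallocated"
    ZERO_BLOCK = "zero-block"
    ALLOCATED = "allocated"


def compact(block_map, storage):
    """Turn allocated all-zero blocks into ZERO_BLOCK and drop their data.

    block_map: list of State, one entry per virtual block (hypothetical).
    storage:   dict mapping block index -> bytes for ALLOCATED blocks.
    Returns the number of blocks whose storage was freed.
    """
    freed = 0
    for index, state in enumerate(block_map):
        # Only allocated blocks are inspected; UNALLOCATED stays UNALLOCATED.
        if state is State.ALLOCATED and not any(storage[index]):
            del storage[index]
            block_map[index] = State.ZERO_BLOCK  # never UNALLOCATED
            freed += 1
    return freed
```

Note that nothing in this pass ever produces an UNALLOCATED entry, which is the point made above: zeroed is not the same as unallocated.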

In snapshot chains, only unallocated blocks cause the parent to be checked for data; zero blocks do not. A zero block in the current state returns a block filled with zeros, regardless of what the parent holds for that block.
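
The read rule for a snapshot chain can be sketched like this. Again this is only an illustration under assumed names: each image is modelled as a plain dict, a missing key stands for UNALLOCATED, and `ZERO_BLOCK` is a made-up sentinel. The fallback of returning zeros when no image in the chain has the block is also an assumption of the sketch.

```python
BLOCK_SIZE = 1024 * 1024  # 1 MB

ZERO_BLOCK = object()  # hypothetical marker: reads as zeros, parent not consulted


def read_block(chain, index):
    """Read one block through a snapshot chain.

    chain: list of images ordered from the current state down to the base.
           Each image is a dict mapping block index -> bytes or ZERO_BLOCK;
           a missing key means the block is UNALLOCATED in that image.
    """
    for image in chain:
        if index not in image:
            continue  # UNALLOCATED: fall through to the parent
        entry = image[index]
        if entry is ZERO_BLOCK:
            return bytes(BLOCK_SIZE)  # zeros, whatever the parent holds
        return entry  # ALLOCATED: real data
    # Assumption: a block unallocated in every image reads as zeros.
    return bytes(BLOCK_SIZE)
```

With a base that holds data in block 0 and a diff that marks block 0 as `ZERO_BLOCK`, `read_block([diff, base], 0)` returns zeros and the base data is never looked at.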

So, it is technically quite possible to compact the current state of a snapshot chain. It's just likely to be ineffective in a large number of cases, because you can only apply it to the current state, which may be a small fraction of the disk space used by the entire snapshot chain.
noteirak
Site Moderator
Posts: 5231
Joined: 13. Jan 2012, 11:14
Primary OS: Debian other
VBox Version: OSE Debian
Guest OSses: Debian, Win 2k8, Win 7

Re: Size of Snapshot-VDI

Post by noteirak »

@mpack: You're totally right. The bit you quoted was only me trying to put into words how I thought Tsso was seeing it, and in no way how I think it works. But you gave the technical terms for all my long explanation before, thank you for that.
@Tsso: All good then. I wasn't sure, so I preferred to state everything so it's all clear and precise. As for the zeros that were not zeros before: it can work, but only if both areas are recognised as "empty", whether zeroed or undefined. In your case they overlapped for 100 MB, so you gained that space :)
Hyperbox - Virtual Infrastructure Manager - https://apps.kamax.lu/hyperbox/
Manage your VirtualBox infrastructure the free way!

Re: Size of Snapshot-VDI

Post by mpack »

Incidentally, the compaction feature in CloneVDI is entirely different. Although it does recognize zero blocks, it does not rely on that. Instead it understands the guest filesystem, and hence is able to check directly whether a given block is used or not. If the block is not used then that block is discarded in the clone. There is one small way this can go wrong: if an application "hides" data on the disk outside the filesystem then the hidden data will be lost (*). When CloneVDI discards a block it gets reset to UNALLOCATED. Only if a block is allocated but filled with zeros will CloneVDI set it to ZERO-BLOCK. Note that unallocated blocks and zero blocks take the same space, so when using CloneVDI there is no need to run sdelete - it will not improve compaction results.

(*) A nasty example of this is Microsoft's so-called "Dynamic Disk" feature (similar to Linux's LVM, and not to be confused with "Dynamically Allocated" as in a VDI), which hides partition information in an unpartitioned 1MB space at the end of the disk. I can't imagine why they didn't just put this data inside a legitimate partition; I assume it was a dumb attempt to hide how the feature worked. Of course people found it and reverse engineered the data almost immediately.
Edit: Incidentally, this doesn't mean that CloneVDI will corrupt Windows "Dynamic Disks". Obviously I know about the problem, so I took steps to avoid it: specifically, I make sure that the last 1MB of disk space is never discarded.
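
For anyone who wants the idea in code form, here is a loose Python sketch of filesystem-aware compaction as described above. It is not CloneVDI's implementation. The `used` set is a hypothetical input (a real tool would derive it from the guest filesystem's allocation metadata), `ZERO_BLOCK` is a made-up marker, and the sketch assumes the disk size is a whole number of 1MB blocks so that "the last 1MB" is simply the last block.

```python
ZERO_BLOCK = object()  # hypothetical marker for a block that reads as zeros


def clone_compact(blocks, used, total_blocks):
    """Return a compacted copy of a sparse image.

    blocks:       dict block index -> bytes (allocated blocks of the source).
    used:         set of block indices the guest filesystem reports as in use.
    total_blocks: disk size in 1MB blocks.
    In the result a missing key means UNALLOCATED.
    """
    last = total_blocks - 1  # protected: data may be hidden at the end of the disk
    clone = {}
    for index, data in blocks.items():
        if index not in used and index != last:
            continue  # unused by the filesystem: discarded, i.e. UNALLOCATED
        # Kept blocks that are entirely zero become ZERO_BLOCK markers.
        clone[index] = ZERO_BLOCK if not any(data) else data
    return clone
```

Because unused blocks are dropped whatever they contain, zero-filling free space beforehand makes no difference to the result in this model.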
Last edited by mpack on 15. Nov 2012, 19:04, edited 1 time in total.

Re: Size of Snapshot-VDI

Post by noteirak »

That's good to know!
Tsso
Posts: 25
Joined: 9. Sep 2012, 19:32

Re: Size of Snapshot-VDI

Post by Tsso »

So the reason for the dramatic increase might be that the new files are distributed across a lot of different 1MB blocks, since the VDI "thinks" only in 1MB blocks. While this makes sense from a performance point of view, I didn't expect the units of the VDI to be that big. So the only thing one can do is use a defrag, which really puts all the files together (usually at the beginning of the disk). Doing so for all files of the base image (and assuming you never delete a file which was already in the base image), it should be possible to make the snapshots as small as possible, right?

Re: Size of Snapshot-VDI

Post by noteirak »

If you do a defrag that puts all the most recently modified files at the end of your disk in the base image, then yes, it would be the best approach.
There is no guarantee on how much you will get back, though. You cannot predict which disk sections will be modified, or how.

Once again, it's all a story of differences (being less, more or just different)...

Re: Size of Snapshot-VDI

Post by mpack »

Most modern filesystems keep file clusters together as much as possible anyway, though yes, it's possible that a lot of small files might cause the VDI to expand dramatically. It's often the case that if you find a free 4K cluster X, then X+1 will be free as well, so I think you are overstating the potential for fragmentation: there is no mechanism that tends to disperse 4K clusters across as many 1MB blocks as possible. IME people who spend time defragging before compacting usually benefit by no more than around 5%, and even that only if they've never defragged before (or not in the last year of constant use anyway). I personally do occasionally compact (using CloneVDI), but I never bother to defrag. Of course I never use snapshots either.
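
A quick back-of-the-envelope check of the two extremes, as a small Python sketch. It assumes 4K clusters and 1MB blocks as discussed in this thread, and it ignores the partition offset by treating cluster 0 as aligned to the start of block 0; `blocks_touched` is just an illustrative helper.

```python
CLUSTER_SIZE = 4 * 1024
BLOCK_SIZE = 1024 * 1024
CLUSTERS_PER_BLOCK = BLOCK_SIZE // CLUSTER_SIZE  # 256 clusters per 1MB block


def blocks_touched(cluster_numbers):
    """Count the distinct 1MB blocks that a set of written clusters lands in."""
    return len({c // CLUSTERS_PER_BLOCK for c in cluster_numbers})


# 1MB of new data written to 256 adjacent clusters (typical case).
contiguous = range(1000, 1256)
# The same 1MB written as one cluster in each of 256 different blocks (worst case).
scattered = range(0, 256 * 256, 256)

print(blocks_touched(contiguous))  # 2
print(blocks_touched(scattered))   # 256
```

So the same 1MB of guest writes costs 2MB of image growth when the clusters are adjacent and 256MB in the fully dispersed worst case, which a filesystem would have to go out of its way to produce.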