[RFE] Please make use of Sparse files and Hole Punching

Posted: 4. Oct 2017, 04:10
by skoehler
This is a request to enhance VirtualBox so that it makes use of sparse files on Linux hosts. Sparse files have several advantages. First of all, at the time of creation, sparse files consume less space. Also, as blocks get discarded (e.g. TRIMed) by the guest, individual blocks of the sparse file may actually be released through what is AFAIK called hole-punching.

Consider the following example:
- Create a sparse file
- Create an ext4 filesystem inside the sparse file.
- Loop-mount the sparse file
- Copy data to the mounted filesystem (observe with the du command how the disk-usage of the sparse file grows)
- Delete the data from the mounted filesystem
- Run fstrim for the mounted filesystem (observe with the du command how the disk-usage of the sparse file shrinks)
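
The first step and the du observations above can be approximated without root privileges. The following is a minimal Python sketch, assuming a Linux host and a filesystem that supports sparse files; the file name disk.img and the helper names are made up for illustration. It does not create a filesystem or run fstrim. It only shows that a freshly created sparse file occupies little or no space until data is written.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_size, allocated_bytes), similar to `du --apparent-size` vs. `du`."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


def create_sparse_file(path, size):
    """Create a file of `size` bytes without writing any data blocks."""
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")  # hypothetical file name
        create_sparse_file(image, 64 * 1024 * 1024)
        print("after create:", disk_usage(image))

        # Write 1 MiB somewhere in the middle; only that region gets allocated.
        with open(image, "r+b") as f:
            f.seek(8 * 1024 * 1024)
            f.write(os.urandom(1024 * 1024))
            f.flush()
            os.fsync(f.fileno())
        print("after write: ", disk_usage(image))
```

The exact allocated numbers depend on the host filesystem, so none are shown here.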

QEMU seems to use hole-punching for sparse files as well. That is, as the guest runs fstrim, QEMU releases blocks of the sparse file on the host. For this to work, the option discard=unmap must be enabled for the virtual drive.

Currently, when I create a fixed-size VDI of, say, 64GB, VirtualBox writes 64GB of data to my SSD and does not use a sparse file. Also, a TRIM by the guest does not release space.
I have read that VirtualBox implements a feature for variable-size VDI images where it reduces the actual file size by moving 1MB blocks from the end of the file into 1MB blocks which have been discarded by the guest. I appreciate that effort. However, using sparse files with hole-punching may allow space to be released at a much finer granularity than 1MB. Also, for each 1MB block that is moved into a TRIMed block, 1MB is actually written to the host's drive. Discarding blocks from a sparse file should reduce the amount of data written to the host's drive.
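
To illustrate the granularity argument, here is a small Python sketch. It assumes that space can only be released in whole, aligned blocks, and it compares the 1MB VDI block size with a 4 KiB block size, which is an assumed (though common) ext4 block size and not something measured here. The function name is hypothetical.

```python
def releasable_bytes(start, length, block_size):
    """Bytes of the discarded range [start, start + length) that cover whole, aligned blocks."""
    end = start + length
    first = -(-start // block_size) * block_size  # round start up to a block boundary
    last = (end // block_size) * block_size       # round end down to a block boundary
    return max(0, last - first)


KiB = 1024
MiB = 1024 * KiB

# Example: the guest discards 1.5 MiB starting 256 KiB into the image.
start, length = 256 * KiB, 1536 * KiB
print(releasable_bytes(start, length, MiB))      # 0
print(releasable_bytes(start, length, 4 * KiB))  # 1572864
```

In this example the discarded range never covers a complete aligned 1MB block, so nothing can be released at 1MB granularity, while the whole range can be released at 4 KiB granularity.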

I am aware that hole-punching would increase the fragmentation of sparse files. So it should only be used if the disk image is stored on an SSD where fragmentation is less of an issue.

Re: [RFE] Please make use of Sparse files and Hole Punching

Posted: 4. Oct 2017, 09:29
by socratis
Since this is an RFE, moving to "Suggestions" from "Linux Hosts".

Re: [RFE] Please make use of Sparse files and Hole Punching

Posted: 4. Oct 2017, 10:23
by mpack
Isn't "sparse file" just another way of saying "dynamic VDI"?

A fixed size sparse file makes no sense to me. People choose fixed size because they heard it was faster, or more robust. And as mentioned, if you wanted a sparse representation of a fixed size VDI, thats... uh, that's dynamic VDI.

"Sparse file" is a term which has been known to me for decades. Are you perhaps referring to a particular implementation of sparse file, for a particular purpose, in a particular filesystem on a particular OS? If so it would be nice if the text said so - for those of us who don't use the same OS.

Re: [RFE] Please make use of Sparse files and Hole Punching

Posted: 4. Oct 2017, 13:17
by skoehler
mpack wrote: Isn't "sparse file" just another way of saying "dynamic VDI"?
VDI files allocate 1MB blocks. The blocks allocated for sparse files may be much smaller.
mpack wrote: A fixed size sparse file makes no sense to me. People choose fixed size because they heard it was faster, or more robust. And as mentioned, if you wanted a sparse representation of a fixed size VDI, thats... uh, that's dynamic VDI.
The point of my post is more about hole-punching than sparse files. However, hole punching is only available for sparse files, AFAIK.

As I pointed out in my post, VirtualBox can currently shrink dynamic VDI files when the guest TRIMs sectors. However, this is disabled by default, and VirtualBox can only do so by moving 1MB blocks away from the end of the VDI file.
Thus, it can only reduce the size of the VDI file if the guest TRIMs a contiguous, aligned 1MB block. Using sparse files with hole-punching would allow blocks to be discarded from a VDI file without moving data inside the VDI file. Hole punching can then translate into a TRIM by the host. So hole punching allows for a much more direct way to pass TRIMs by the guest through to the host's disk drive.
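
For readers unfamiliar with hole punching, this is roughly what it looks like at the system-call level. The sketch below is in Python and calls the Linux fallocate(2) function through ctypes. It assumes a 64-bit Linux host; the helper name punch_hole and the file name are made up for illustration, and nothing here reflects how VirtualBox or QEMU implement it.

```python
import ctypes
import errno
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

try:
    _fallocate = ctypes.CDLL(None, use_errno=True).fallocate
    # Assumption: 64-bit Linux, where off_t is a 64-bit integer.
    _fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    _fallocate.restype = ctypes.c_int
except (AttributeError, OSError):
    _fallocate = None  # libc without fallocate (not Linux)


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) of fd without changing the file size.

    Returns False if the host or filesystem does not support hole punching.
    """
    if _fallocate is None:
        return False
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    MiB = 1024 * 1024
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "disk.img")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"\xff" * (4 * MiB))
            f.flush()
            os.fsync(f.fileno())
        before = os.stat(path).st_blocks * 512
        with open(path, "r+b") as f:
            supported = punch_hole(f.fileno(), 1 * MiB, 1 * MiB)
        after = os.stat(path).st_blocks * 512
        print("supported:", supported, "allocated before/after:", before, after)
```

The punched range reads back as zeros and the file size stays the same; only the allocated size reported by du or st_blocks should drop. No data is moved inside the file, which is the point of the comparison with the 1MB block-moving approach.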

As you noted, it doesn't make much sense to use sparse files for dynamic VDI files. That's why I suggested using sparse files with hole punching for fixed size VDI files. Then again, hole punching may also allow blocks of dynamic VDI files to be discarded without the additional work of moving 1MB blocks.
mpack wrote: "Sparse file" is a term which has been known to me for decades. Are you perhaps referring to a particular implementation of sparse file, for a particular purpose, in a particular filesystem on a particular OS? If so it would be nice if the text said so - for those of us who don't use the same OS.
I do not know whether hole punching is available on Windows or OS X. On Linux, however, hole punching is available if the sparse file resides in an ext4 file system. The only other filesystem I tested is NTFS, and ntfs-3g (the NTFS driver primarily used on Linux) does not seem to support hole punching.