[RFE] Please make use of Sparse files and Hole Punching
Posted: 4. Oct 2017, 04:10
This is a request to enhance VirtualBox so that it makes use of sparse files on Linux hosts. Sparse files have several advantages. First of all, a sparse file consumes less space at the time of creation, because blocks that have never been written are not allocated. Also, as blocks get discarded (e.g. TRIMmed) by the guest, individual blocks of the sparse file can actually be released on the host through what is, AFAIK, called hole-punching.
Consider the following example:
- Create a sparse file.
- Create an ext4 filesystem inside the sparse file.
- Loop-mount the sparse file.
- Copy data to the mounted filesystem (observe with the du command how the disk usage of the sparse file grows).
- Delete the data from the mounted filesystem.
- Run fstrim on the mounted filesystem (observe with the du command how the disk usage of the sparse file shrinks).
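To illustrate only the first observation in the steps above (apparent size versus allocated space, which is what du reports), here is a minimal Python sketch. It does not create a filesystem or loop-mount anything, since those steps need root. The file name disk.img and the helper names are made up for this example, and it assumes a host filesystem that supports sparse files.

```python
import os
import tempfile


def allocated_bytes(path):
    # st_blocks counts 512-byte units, which is what du reports on Linux.
    return os.stat(path).st_blocks * 512


def create_sparse_file(path, size):
    # Extending an empty file with truncate() sets its apparent size
    # without writing any data, so no data blocks are allocated yet.
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")  # hypothetical image name
        create_sparse_file(image, 64 * 1024 * 1024)
        print("apparent size:", os.stat(image).st_size)
        print("allocated after create:", allocated_bytes(image))

        # Writing 1 MiB in the middle allocates blocks only for that range.
        with open(image, "r+b") as f:
            f.seek(8 * 1024 * 1024)
            f.write(os.urandom(1024 * 1024))
            f.flush()
            os.fsync(f.fileno())
        print("allocated after write:", allocated_bytes(image))
```

The exact numbers printed depend on the host filesystem, so none are shown here.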
QEMU seems to use hole-punching for sparse files as well. That is, as the guest runs fstrim, QEMU releases blocks of the sparse file on the host. For this to work, the option discard=unmap must be enabled for the virtual drive.
Currently, when I create a fixed-size VDI of, say, 64GB, VirtualBox writes 64GB of data to my SSD and does not use a sparse file. Also, a TRIM by the guest does not release space.
I have read that VirtualBox implements a feature for variable-size VDI images, where it reduces the actual file size by moving 1MB blocks from the end of the file into 1MB blocks which have been discarded by the guest. I appreciate that effort. However, using sparse files with hole-punching may allow space to be released at a much finer granularity than 1MB. Also, for each 1MB block that is moved into a TRIMmed block, 1MB is actually written to the host's drive. Discarding blocks from a sparse file instead should reduce the amount of data written to the host's drive.
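For reference, this is roughly what hole-punching looks like from user space on Linux: a call to fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. The sketch below calls it through ctypes. It assumes a 64-bit Linux host and a filesystem that supports hole-punching. The helper name punch_hole and the file name disk.img are made up for this example, and this is not meant to describe how VirtualBox or QEMU implement it internally.

```python
import ctypes
import ctypes.util
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    # Deallocate [offset, offset + length) while keeping the file size.
    # Assumes 64-bit Linux, where off_t is a 64-bit integer.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    mib = 1024 * 1024
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")  # hypothetical image name
        with open(image, "wb") as f:
            f.write(os.urandom(4 * mib))
            f.flush()
            os.fsync(f.fileno())
        print("allocated before:", os.stat(image).st_blocks * 512)

        # Release 1 MiB in the middle; nothing is rewritten or moved.
        with open(image, "r+b") as f:
            punch_hole(f.fileno(), 1 * mib, 1 * mib)
        print("allocated after:", os.stat(image).st_blocks * 512)
        print("size unchanged:", os.stat(image).st_size == 4 * mib)
```

The punched range reads back as zeros afterwards, and the offset and length can be as small as the filesystem block size (typically 4KB) rather than 1MB.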
I am aware that hole-punching would increase the fragmentation of sparse files, so it should only be used if the disk image is stored on an SSD, where fragmentation is less of an issue.