[RFE] Please make use of Sparse files and Hole Punching

skoehler
Posts: 117
Joined: 1. Dec 2008, 12:12

[RFE] Please make use of Sparse files and Hole Punching

Post by skoehler »

This is a request to enhance VirtualBox so that it makes use of sparse files on Linux hosts. Sparse files have several advantages. First of all, at the time of creation, sparse files consume less space. Also, as blocks get discarded (e.g. TRIMed) by the guest, individual blocks of the sparse file may actually be released through what is AFAIK called hole-punching.

Consider the following example:
- Create a sparse file
- Create an ext4 filesystem inside the sparse file.
- Loop-mount the sparse file
- Copy data to the mounted filesystem (observe with the du command how the disk-usage of the sparse file grows)
- Delete the data from the mounted filesystem
- Run fstrim for the mounted filesystem (observe with the du command how the disk-usage of the sparse file shrinks)
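The first step and the "observe with du" part can be sketched in Python without root access. This is only a minimal illustration, not the full experiment: it skips mkfs, the loop mount, and fstrim. The file name disk.img and the helper names are made up for the example, and it assumes a Linux host whose filesystem supports sparse files.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, bytes actually allocated on disk) for path."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


def create_sparse_file(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "disk.img")  # hypothetical image name
        create_sparse_file(image, 64 * 1024 * 1024)
        print("after creation:", apparent_and_allocated(image))

        # Write 1 MiB somewhere in the middle; only that region gets allocated.
        with open(image, "r+b") as f:
            f.seek(10 * 1024 * 1024)
            f.write(b"x" * (1024 * 1024))
        print("after writing 1 MiB:", apparent_and_allocated(image))
```

The second value of each tuple corresponds roughly to what du reports, while the first is what ls -l reports.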

QEMU seems to use hole-punching for sparse files as well. That is, as the guest runs fstrim, qemu releases blocks of the sparse file on the host. The option discard=unmap must be enabled for the virtual drive.

Currently, when I create a fixed-size VDI of, say, 64GB, VirtualBox writes 64GB of data to my SSD and does not use a sparse file. Also, a TRIM by the guest does not release space.
I have read that VirtualBox implements a feature for variable-size VDI images, where it reduces the actual file size by moving 1MB blocks from the end of the file into 1MB blocks which have been discarded by the guest. I appreciate that effort; however, using sparse files with hole-punching may allow releasing space at a much finer granularity than 1MB. Also, for each 1MB block that is moved into a TRIMed block, 1MB is actually written to the host's drive. Discarding blocks from a sparse file should reduce the amount of data written to the host's drive.
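To illustrate the granularity argument, here is a rough sketch that counts how many bytes could be released when only whole, aligned blocks of a given size can be freed. It is a simplified model, not how VirtualBox or any filesystem is implemented. The 4 KiB block size is an assumption (a common ext4 default), and the trimmed ranges are invented for the example.

```python
def reclaimable_bytes(trimmed_ranges, block_size):
    """Bytes that can be freed if only whole aligned blocks can be released.

    trimmed_ranges: iterable of (offset, length) byte ranges. Each range is
    considered on its own, so the ranges are assumed not to touch each other.
    """
    total = 0
    for offset, length in trimmed_ranges:
        first = -(-offset // block_size)        # first block fully inside (round up)
        last = (offset + length) // block_size  # one past the last full block (round down)
        if last > first:
            total += (last - first) * block_size
    return total


KiB = 1024
MiB = 1024 * KiB

# Hypothetical TRIM requests from a guest: (offset, length) in bytes.
trims = [
    (0, 512 * KiB),              # smaller than 1 MiB
    (3 * MiB + 4 * KiB, 1 * MiB),  # 1 MiB long, but not aligned to 1 MiB
    (8 * MiB, 2 * MiB),          # two aligned 1 MiB blocks
]

print(reclaimable_bytes(trims, 1 * MiB) // KiB, "KiB with 1 MiB blocks")
print(reclaimable_bytes(trims, 4 * KiB) // KiB, "KiB with 4 KiB blocks")
```

With these made-up ranges, 1 MiB granularity frees only the last request (2048 KiB), while 4 KiB granularity frees all three (3584 KiB).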

I am aware that hole-punching would increase the fragmentation of sparse files. So it should only be used if the disk image is stored on an SSD where fragmentation is less of an issue.
Last edited by skoehler on 4. Oct 2017, 13:06, edited 1 time in total.
socratis
Site Moderator
Posts: 27330
Joined: 22. Oct 2010, 11:03
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Win(*>98), Linux*, OSX>10.5
Location: Greece

Re: [RFE] Please make use of Sparse files and Hole Punching

Post by socratis »

Since this is an RFE, moving to "Suggestions" from "Linux Hosts".
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: [RFE] Please make use of Sparse files and Hole Punching

Post by mpack »

Isn't "sparse file" just another way of saying "dynamic VDI"?

A fixed size sparse file makes no sense to me. People choose fixed size because they heard it was faster, or more robust. And as mentioned, if you wanted a sparse representation of a fixed size VDI, thats... uh, that's dynamic VDI.

"Sparse file" is a term which has been known to me for decades. Are you perhaps referring to a particular implementation of sparse file, for a particular purpose, in a particular filesystem on a particular OS? If so it would be nice if the text said so - for those of us who don't use the same OS.
skoehler
Posts: 117
Joined: 1. Dec 2008, 12:12

Re: [RFE] Please make use of Sparse files and Hole Punching

Post by skoehler »

mpack wrote: Isn't "sparse file" just another way of saying "dynamic VDI"?
VDI files allocate 1MB blocks. The blocks allocated for sparse files may be much smaller.
mpack wrote: A fixed size sparse file makes no sense to me. People choose fixed size because they heard it was faster, or more robust. And as mentioned, if you wanted a sparse representation of a fixed size VDI, thats... uh, that's dynamic VDI.
The point of my post is more about hole-punching than sparse files. However, hole punching is only available for sparse files, AFAIK.

As I pointed out in my post, VirtualBox currently supports actually shrinking dynamic VDI files if the guest TRIMs sectors. However, this is disabled by default, and VirtualBox can only do so by moving 1MB blocks away from the end of the VDI file.
Thus, it can only reduce the size of the VDI file if the guest TRIMs a contiguous, aligned 1MB block. Using sparse files with hole-punching would allow discarding blocks from a VDI file without moving data inside the VDI file. Hole punching actually translates into a TRIM by the host. So hole punching allows for a much more direct way to pass TRIMs by the guest through to the host's disk drive.
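For reference, on Linux hole punching is done with the fallocate(2) system call using the FALLOC_FL_PUNCH_HOLE flag together with FALLOC_FL_KEEP_SIZE. Below is a minimal sketch that calls it from Python through ctypes. It assumes a 64-bit Linux host with glibc; punch_hole is a name made up for this example, and nothing here is taken from the VirtualBox or QEMU sources.

```python
import ctypes
import ctypes.util
import errno
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Assumption: glibc on 64-bit Linux, where off_t is a 64-bit integer.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Deallocate the byte range [offset, offset + length) of an open file.

    The file size stays the same and the range reads back as zeros.
    Returns False if the filesystem does not support hole punching.
    """
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))
```

A disk image backend could call something like this for each range the guest discards; whether the freed blocks are then passed on to the SSD as a TRIM depends on the host filesystem and its mount options.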

As you noted, it doesn't make much sense to use sparse files for dynamic VDI files. So that's why I suggested using sparse files with hole punching for fixed size VDI files. Then again, hole punching may also allow discarding blocks of dynamic VDI files without the additional work of moving 1MB blocks.
mpack wrote: "Sparse file" is a term which has been known to me for decades. Are you perhaps referring to a particular implementation of sparse file, for a particular purpose, in a particular filesystem on a particular OS? If so it would be nice if the text said so - for those of us who don't use the same OS.
I do not know whether hole punching is available on Windows or OS X. On Linux, however, hole punching is available if the sparse file resides in an ext4 file system. I did not test other filesystems besides NTFS, and ntfs-3g (the NTFS driver primarily used on Linux) does not seem to support hole punching.