autostop savestate VBoxSVC hung on reboot of host
Posted: 5. Feb 2015, 01:22
VirtualBox-4.3.21-97963
Host: OSX 10.10.2 on Xserve late 2009. 6GB RAM
Guest: OSX 10.10.2 1proc, 2GB RAM
no console session logged-on
This is a bit of a rabbit hole. I wanted to setup the VMs to automatically start at boot, and automatically savestate on shutdown or reboot of the host.
I started here /manual/ch09.html#autostart-osx
and I got the autostart working just fine by setting up /etc/vbox/autostart.cfg and changing the vm settings:
modifyvm --autostart-enabled on
modifyvm --autostart-delay 30
modifyvm --autostop-type savestate
Autostart works fine, but I noticed the autostop was simply not happening. It finally dawned on me that the LaunchDaemon was not setup properly, or, rather the shell script it executes is not a "daemon". If you look at the file VBoxAutostartDarwin.sh that the LaunchDaemon runs, you'll notice at the end there is a (possibly vestigial) trap:
trap vboxStopAllUserVms HUP KILL TERM
So, clearly, the intent is that if the shell script receives the TERM signal it will stop all the VMs according to the autostop-type, but this can't happen since the script immediately exists after starting the VMs. I modified this script to include a sleep loop, to make it a proper daemon:
trap vboxStopAllUserVms HUP KILL TERM
while true; do
sleep 10
done
So with this change you can now launchctl stop org.virtualbox.vboxautostart and it will actually savestate your VMs. So now when I reboot, launchd will send that signal to the script and shutdown/savestate all the VMs!! Not quite. It still wasn't happening. So, I thought maybe launchd wasn't waiting long enough. I added a key to: /Library/LaunchDaemons/org.virtualbox.vboxautostart.plist
<key>ExitTimeOut</key>
<integer>600</integer>
So now launchd will wait a full 10 minutes after sending SIGTERM before forcibly killing VBoxAutostartDarwin.sh. When I do a reboot now, it takes 10 minutes to reboot and it still doesn't savestate my VMs! VirtualBox is getting stuck.
So, I start another sshd on another port so I can remain logged in during all of this badness:
/usr/sbin/sshd -p 23
and now I reboot again and watch what happens:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
501 4448 100.0 0.2 2525860 14152 ?? R 2:46PM 0:18.75 /Applications/VirtualBox.app/Contents/MacOS/VBoxSVC --auto-shutdown
501 4470 8.5 24.4 4093064 1538088 ?? S 2:47PM 0:17.87 /Applications/VirtualBox.app/Contents/MacOS/VBoxHeadless --comment test --startvm 1a806c85-cff4-46d8-aef8-0f0a090a4103 --vrde config
501 4446 0.0 0.1 2452220 7112 ?? S 2:46PM 0:00.05 /Applications/VirtualBox.app/Contents/MacOS/VBoxXPCOMIPCD
root 3781 0.0 0.0 2445660 1052 ?? Ss 2:46PM 0:00.06 /bin/sh /Applications/VirtualBox.app/Contents/MacOS/VBoxAutostartDarwin.sh /etc/vbox/autostart.cfg
root 2332 0.0 0.0 2445660 1172 s001 S 2:29PM 0:00.01 bash
root 2331 0.0 0.0 2444420 2508 s001 S 2:29PM 0:00.01 sudo bash
501 2328 0.0 0.0 2452828 1148 s001 Ss 2:29PM 0:00.01 -bash
501 2327 0.0 0.0 2482656 1112 ?? S 2:29PM 0:00.02 sshd: tss@ttys001
root 2324 0.0 0.1 2481608 3976 ?? Ss 2:29PM 0:00.05 sshd: tss [priv]
root 2323 0.0 0.0 2443888 612 ?? Ss 2:29PM 0:00.00 /usr/sbin/sshd -p 23
root 935 0.0 0.2 2558304 12960 ?? S 2:25PM 0:00.11 /System/Library/CoreServices/ManagedClient.app/Contents/MacOS/ManagedClient -s
root 4488 0.0 0.0 2432788 584 s001 R+ 2:48PM 0:00.00 ps auwxww
root 1 0.0 0.1 2535612 6152 ?? Us 2:24PM 0:01.67 /sbin/launchd
root 4486 0.0 0.0 2432752 536 ?? S 2:48PM 0:00.00 sleep 10
These are ALL the processes alive on the box after the shutdown -r command is given. This entire time the VM guest is accessible and running normally, including network access, until the 10 minutes is up and it is abruptly killed:
Nothing indicates a savestate was initiated at all:
bash-3.2# tail VBox.log
00:00:06.650788 VUSB: attached 'HidKeyboard' to port 1
00:00:06.650807 VUSB: attached 'HidMouse' to port 2
00:00:09.805507 AIOMgr: Host limits number of active IO requests to 16. Expect a performance impact.
00:00:11.402026 NAT: link up
00:00:11.402332 NAT: DNS#0: 140.198.128.198
00:00:11.402341 NAT: DNS#1: 140.198.128.199
00:00:11.423178 NAT: IPv6 not supported
00:00:11.424196 NAT: DNS#0: 140.198.128.198
00:00:11.424208 NAT: DNS#1: 140.198.128.199
00:00:11.424215 NAT: DHCP offered IP address 10.0.2.15
So, I think the problem is that VBoxSVC gets hung up at 100% due to a dependency being yanked out. Since `launchd stop org.virtualbox.vboxautostart` works perfectly (without rebooting), I have to believe it has something to do with another service that VirtualBox depends on that is being killed too early by launchd. As you can see in the table, nearly ALL the processes on the box are killed almost immediately, presumably before VBox gets to do its thing. There doesn't seem to be any easy way to create dependencies or set any kind of shutdown "order" with launchd. Ideally you could force a script to run "last" on boot, and shutdown "1st" on shutdown. . . alas, launchd... perhaps somehow has written a C application that hooks into every dependency of OSX, and wedges good old init.d scripts back into the system. . . I can dream, can't I? Any other ideas out there? Thanks!
Host: OSX 10.10.2 on Xserve late 2009. 6GB RAM
Guest: OSX 10.10.2 1proc, 2GB RAM
no console session logged-on
This is a bit of a rabbit hole. I wanted to setup the VMs to automatically start at boot, and automatically savestate on shutdown or reboot of the host.
I started here /manual/ch09.html#autostart-osx
and I got the autostart working just fine by setting up /etc/vbox/autostart.cfg and changing the vm settings:
modifyvm --autostart-enabled on
modifyvm --autostart-delay 30
modifyvm --autostop-type savestate
Autostart works fine, but I noticed the autostop was simply not happening. It finally dawned on me that the LaunchDaemon was not setup properly, or, rather the shell script it executes is not a "daemon". If you look at the file VBoxAutostartDarwin.sh that the LaunchDaemon runs, you'll notice at the end there is a (possibly vestigial) trap:
trap vboxStopAllUserVms HUP KILL TERM
So, clearly, the intent is that if the shell script receives the TERM signal it will stop all the VMs according to the autostop-type, but this can't happen since the script immediately exists after starting the VMs. I modified this script to include a sleep loop, to make it a proper daemon:
trap vboxStopAllUserVms HUP KILL TERM
while true; do
sleep 10
done
So with this change you can now launchctl stop org.virtualbox.vboxautostart and it will actually savestate your VMs. So now when I reboot, launchd will send that signal to the script and shutdown/savestate all the VMs!! Not quite. It still wasn't happening. So, I thought maybe launchd wasn't waiting long enough. I added a key to: /Library/LaunchDaemons/org.virtualbox.vboxautostart.plist
<key>ExitTimeOut</key>
<integer>600</integer>
So now launchd will wait a full 10 minutes after sending SIGTERM before forcibly killing VBoxAutostartDarwin.sh. When I do a reboot now, it takes 10 minutes to reboot and it still doesn't savestate my VMs! VirtualBox is getting stuck.
So, I start another sshd on another port so I can remain logged in during all of this badness:
/usr/sbin/sshd -p 23
and now I reboot again and watch what happens:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
501 4448 100.0 0.2 2525860 14152 ?? R 2:46PM 0:18.75 /Applications/VirtualBox.app/Contents/MacOS/VBoxSVC --auto-shutdown
501 4470 8.5 24.4 4093064 1538088 ?? S 2:47PM 0:17.87 /Applications/VirtualBox.app/Contents/MacOS/VBoxHeadless --comment test --startvm 1a806c85-cff4-46d8-aef8-0f0a090a4103 --vrde config
501 4446 0.0 0.1 2452220 7112 ?? S 2:46PM 0:00.05 /Applications/VirtualBox.app/Contents/MacOS/VBoxXPCOMIPCD
root 3781 0.0 0.0 2445660 1052 ?? Ss 2:46PM 0:00.06 /bin/sh /Applications/VirtualBox.app/Contents/MacOS/VBoxAutostartDarwin.sh /etc/vbox/autostart.cfg
root 2332 0.0 0.0 2445660 1172 s001 S 2:29PM 0:00.01 bash
root 2331 0.0 0.0 2444420 2508 s001 S 2:29PM 0:00.01 sudo bash
501 2328 0.0 0.0 2452828 1148 s001 Ss 2:29PM 0:00.01 -bash
501 2327 0.0 0.0 2482656 1112 ?? S 2:29PM 0:00.02 sshd: tss@ttys001
root 2324 0.0 0.1 2481608 3976 ?? Ss 2:29PM 0:00.05 sshd: tss [priv]
root 2323 0.0 0.0 2443888 612 ?? Ss 2:29PM 0:00.00 /usr/sbin/sshd -p 23
root 935 0.0 0.2 2558304 12960 ?? S 2:25PM 0:00.11 /System/Library/CoreServices/ManagedClient.app/Contents/MacOS/ManagedClient -s
root 4488 0.0 0.0 2432788 584 s001 R+ 2:48PM 0:00.00 ps auwxww
root 1 0.0 0.1 2535612 6152 ?? Us 2:24PM 0:01.67 /sbin/launchd
root 4486 0.0 0.0 2432752 536 ?? S 2:48PM 0:00.00 sleep 10
These are ALL the processes alive on the box after the shutdown -r command is given. This entire time the VM guest is accessible and running normally, including network access, until the 10 minutes is up and it is abruptly killed:
Nothing indicates a savestate was initiated at all:
bash-3.2# tail VBox.log
00:00:06.650788 VUSB: attached 'HidKeyboard' to port 1
00:00:06.650807 VUSB: attached 'HidMouse' to port 2
00:00:09.805507 AIOMgr: Host limits number of active IO requests to 16. Expect a performance impact.
00:00:11.402026 NAT: link up
00:00:11.402332 NAT: DNS#0: 140.198.128.198
00:00:11.402341 NAT: DNS#1: 140.198.128.199
00:00:11.423178 NAT: IPv6 not supported
00:00:11.424196 NAT: DNS#0: 140.198.128.198
00:00:11.424208 NAT: DNS#1: 140.198.128.199
00:00:11.424215 NAT: DHCP offered IP address 10.0.2.15
So, I think the problem is that VBoxSVC gets hung up at 100% due to a dependency being yanked out. Since `launchd stop org.virtualbox.vboxautostart` works perfectly (without rebooting), I have to believe it has something to do with another service that VirtualBox depends on that is being killed too early by launchd. As you can see in the table, nearly ALL the processes on the box are killed almost immediately, presumably before VBox gets to do its thing. There doesn't seem to be any easy way to create dependencies or set any kind of shutdown "order" with launchd. Ideally you could force a script to run "last" on boot, and shutdown "1st" on shutdown. . . alas, launchd... perhaps somehow has written a C application that hooks into every dependency of OSX, and wedges good old init.d scripts back into the system. . . I can dream, can't I? Any other ideas out there? Thanks!