Zombie VMs - is this possible?

Discussions related to using VirtualBox on Linux hosts.
Emma2
Posts: 51
Joined: 16. Feb 2021, 11:59

Post by Emma2 »

I am using VirtualBox on a Linux host (Ubuntu 20.04) with 64 GB of RAM, and "someone" seems to be eating up my memory. Until last night, 43% was in use, but this morning it was 74%.
Connecting to that host via SSH and checking my VMs gave the impression that only two of the usual six were running, so I "re"-started them.
Memory usage on that host increased to 85% immediately! Checking memory with Webmin, however, shows the SAME VMs running "several times" (see screenshot).

Is that at all possible? If yes, how can this happen? And what can I do against that?

Background:
1. I am shutting down the VMs by vboxmanage controlvm <vm> acpipowerbutton, wait until they're down, save them to backup, restart them. This seems to work well.
2. To start the VMs, I log on via SSH, call vboxmanage startvm <vm> --type headless and leave the host via Ctrl+D afterwards. This seems to be no problem either.
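For illustration, the "wait until they're down" step in point 1 could be sketched in shell roughly as follows. This is not the script used here: the function name wait_for_poweroff and the 300-second default timeout are assumptions, and the sketch relies on vboxmanage showvminfo --machinereadable printing a VMState="poweroff" line once the guest is down.

```shell
#!/bin/sh
# Sketch only: poll a VM's state once per second until it reports "poweroff".
# Returns 0 when the VM is powered off, 1 if the timeout (in seconds) expires.
wait_for_poweroff() {
    vm=$1
    remaining=${2:-300}   # assumed default timeout, not taken from the thread
    while [ "$remaining" -gt 0 ]; do
        if vboxmanage showvminfo "$vm" --machinereadable \
            | grep -q '^VMState="poweroff"'; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Hypothetical use inside a backup script (variable names are placeholders):
#   vboxmanage controlvm "$vm" acpipowerbutton
#   wait_for_poweroff "$vm" 300 || { echo "$vm did not power off" >&2; exit 1; }
#   cp -a "$vm_dir" "$backup_dir/"
#   vboxmanage startvm "$vm" --type headless
```

Checking the state explicitly, rather than sleeping for a fixed time, avoids copying the disk images of a guest that is still shutting down.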
Attachments
svhdev-screen-2021031901.png

Post by Emma2 »

I have now found out what the problem seems to be:
- If I leave the SSH session with Ctrl+D, the VMs keep running (which is what I want, but why is this so?).
- If I open a new SSH session, I do not see the running VMs.
But does anyone have an idea how I could sort this out?

Post by Emma2 »

So... what does vboxmanage startvm <vm> --type headless actually do? Does it start the VM and then "send it to the background"? Otherwise, I would expect the VM to be killed when I exit that shell.
mpack
Site Moderator
Posts: 39134
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Post by mpack »

Why would you expect dropping a communications connection to shut down the VM? The VM will shut down when you send it a shutdown command, internally or externally.

Post by Emma2 »

I thought sending Ctrl+D to the SSH session is more than just "dropping a communications connection": as far as I understand, it closes (logs out of) that very session.

My "problem" is that a VM started by a cronjob of my user "localadmin" is not visible afterwards when I SSH to the host as that same user. Why is that so?
And vice versa: if I SSH to the host, start a VM and log out of SSH, the nightly cronjob probably does not see this running VM. Why is that so?
"Normally" (from all I have read about Linux), I would even expect the VM to "die" when I leave/exit the shell I started it from, but it keeps running. Why is that so?
Martin
Volunteer
Posts: 2561
Joined: 30. May 2007, 18:05
Primary OS: Fedora other
VBox Version: PUEL
Guest OSses: XP, Win7, Win10, Linux, OS/2

Post by Martin »

Emma2 wrote: If I open a new SSH, I do not see the running VMs
How did you check for running VMs?

Post by Emma2 »

Martin wrote:
Emma2 wrote: If I open a new SSH, I do not see the running VMs
How did you check for running VMs?
vboxmanage list runningvms

... and even stranger: I just checked that a machine, say "vm1", is running (I see it in Webmin, and I can connect to it).
But when I tried vboxmanage controlvm vm1 acpipowerbutton, I got "VBoxManage: error: Machine 'vm1' is not currently running".

NB: This does not happen if I start a VM now, quit the SSH session, and log in again. In this case, I do see that machine running.

Curiously enough, this "not seeing a running VM" only happens on one particular host, and the only difference I see between this one and two other hosts is the CPU: the host causing these problems has an AMD Ryzen 5, whereas the others run on Intel processors.
fth0
Volunteer
Posts: 5678
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Linux, Windows 10, ...
Location: Germany

Post by fth0 »

Please read section 10.2, "Oracle VM VirtualBox Executables and Components", and pay special attention to the VBoxSVC service daemon, which is key to understanding what's happening.

Post by Emma2 »

I have read this now, but I do not fully understand it, nor do I understand how it explains the behaviour I am experiencing...

At first, I thought that after starting a VM with vboxmanage startvm <vm> --type headless, its process would be "owned" by the VBoxSVC service, but this can't be true because the actual process is still owned by the user who launched the VM. So, would you mind telling me why a VM keeps running even if I quit the shell I launched it from?

And, more importantly, I still have no idea how I can "get hold of" such a VM that I do not see in vboxmanage list runningvms. Could you possibly give me a hint there, too?

Post by fth0 »

Ok, let's start with some basics:

Linux processes have a parent-child relationship, with the init process as the common ancestor. Connect to the host via SSH, and execute ps -elfH in the secure shell. Look for the ps -elfH command itself in the output, and note the (chain of) parents in the lines above it (e.g. sshd -> bash -> ps).
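The parent chain can also be printed directly instead of being read off the full ps listing. The following is a minimal sketch that walks /proc/<pid>/status (Linux only); the function name parent_chain is made up for this example.

```shell
#!/bin/sh
# Sketch only: print "PID name" for a process and each of its ancestors,
# following the PPid field in /proc/<pid>/status until no parent is left.
parent_chain() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ {sub(/^Name:[ \t]*/, ""); print; exit}' "/proc/$pid/status")
        ppid=$(awk '/^PPid:/ {print $2; exit}' "/proc/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        pid=$ppid
    done
}

# Show the ancestry of the current shell.
parent_chain "$$"
```

Run inside an SSH session, the first line is the shell itself and the lines below it are its ancestors, which should correspond to the sshd -> bash chain described above.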

When you start a VM using the VBoxManage command, the VBoxManage process is executed like the ps command above as a child of the shell (e.g. bash). Now comes the tricky part: The VBoxSVC process is started as a daemonized process, which means that it becomes a child of the init process and gets rid of its standard input, standard output and standard error files. The VBoxHeadless processes (VMs) are then started as children of the VBoxSVC process. Verify this yourself with the ps -elfH command. In consequence, you can close the SSH connection and all VMs keep running.
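One way to do this check without scanning the whole process tree is to filter the ps output down to the VirtualBox processes. A small sketch; the helper name vbox_procs is hypothetical.

```shell
#!/bin/sh
# Sketch only: keep the ps header line plus every line that mentions
# VBoxSVC or VBoxHeadless, so PID, PPID and owner can be compared at a glance.
vbox_procs() {
    awk 'NR == 1 || /VBox(SVC|Headless)/'
}

# List all processes with parent PID and owning user, then filter.
ps -eo pid,ppid,user,args | vbox_procs
```

If the PPID column of a VBoxHeadless line matches the PID of a VBoxSVC line, that VM belongs to that VBoxSVC instance, and the USER column shows which account each instance runs under.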

In the output of the ps -elfH command, you may have noticed that there are two VBoxSVC processes running. I'll make an educated guess that the reason for that is how you make use of the cron service, which is run by the root user. VirtualBox needs at least the HOME environment variable to find its (per user) global configuration files in ~/.config/VirtualBox, and maybe other parts of the user environment. You'll probably find a /root/.config/VirtualBox directory with your unwanted "second configuration". So you have to ensure that the environment of the localadmin user gets read in by your cron jobs ...
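To see which environment each VBoxSVC instance was started with, its HOME variable can be read from /proc/<pid>/environ. A hedged sketch: the helper name home_from_environ is an assumption, and reading the environ file of another user's process requires root.

```shell
#!/bin/sh
# Sketch only: extract the HOME value from a NUL-separated environ file,
# such as /proc/<pid>/environ.
home_from_environ() {
    [ -r "$1" ] || return 1
    tr '\0' '\n' < "$1" | sed -n 's/^HOME=//p'
}

# Print PID, owner and HOME for every running VBoxSVC process.
for pid in $(pgrep -x VBoxSVC); do
    printf '%s\t%s\t%s\n' "$pid" "$(ps -o user= -p "$pid")" \
        "$(home_from_environ "/proc/$pid/environ")"
done
```

Two VBoxSVC lines with different HOME values would point to two separate VirtualBox configurations, each with its own list of registered VMs.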

Post by Emma2 »

Hi again, and thanks for these explanations, but I still do not understand this, nor can I "repair" it. :(

My cronjobs are run as the same user I am working with (localadmin), as you can see here:

localadmin@svh-dev:~$ sudo crontab -l
no crontab for root
localadmin@svh-dev:~$ crontab -l
(...)
0 4 * * 2-6 sh /home/localadmin/backup-vm.sh svr-sql /VMs/1819 /svr-bak/day >> "/home/localadmin/backuplog.txt"
localadmin@svh-dev:~$ 
(My script (appended for information) takes three parameters: VM, VM directory, backup target directory.)

And even if my cronjobs were run as root, I do not see the (re)started/duplicated VMs with sudo vboxmanage list runningvms either.

Or do you mean that my cronjobs (running as my localadmin user) don't have access to localadmin's environment?
I just saw that I can prepend an environment path before the command. Will try this and report.

Fact is, that this very script does work as expected (expected by me, at least) on two other hosts, it is only one host I am experiencing these problems on.
All three hosts were installed two weeks ago, one after the other, by copy&paste-ing all commands, i.e. I really expect them to be "identically installed".
The only real difference is that the "problematic" one has an AMD Ryzen 5 CPU, whereas the others rely on Intel CPUs.

However, I have now found out that the VMs, when I check them with vboxmanage showvminfo <vm> | grep -i state, are either paused or powered off.
And if they are powered off, they do not resume on vboxmanage controlvm <vm> resume. Above all, their processes are still "visible" (in Webmin).

So, how can I "reconnect to" or at least influence these running, paused or powered-off VMs?
How can I find out why it is just this one host that is obviously working differently from the others?

Post by Emma2 »

So...
there actually is a .config directory in /root, and it has changes from today. But how can I explain this?
Does my crontab use root's environment if started without any environment specification?

However... the VBoxHeadless processes I can see in Webmin are indeed owned by my localadmin user, not by root.

Post by Emma2 »

Just pondering... is it possible that my scripts - being started with sh instead of bash - are using a different environment than my SSH using bash?
If this is so, everything fits and makes sense...
... everything? Not really, because I still don't have that same problem on my two other hosts (using the same "wrong" crontabs with sh)... I simply don't get it.

Post by fth0 »

Emma2 wrote:My cronjobs are run as the same user I am working with (localadmin)
Emma2 wrote:Or, do you mean that my cronjobs (running as my localadmin user) don't have access to localadmin's environment?
fth0 wrote:So you have to ensure that the environment of the localadmin user gets read in by your cron jobs ...
What I previously wrote about the cron service was recalled from memory from some decades ago. In the beginning, there was only one cron service using the global /etc/crontab file, all commands were executed by the root user account, and the commands inside the cron table would begin with su - <user> if so wanted. Some admins would forget the "-" and end up switching to the correct user but keeping the root environment. Additionally, the cron service itself would only set very few environment variables.

Nowadays, there are a lot of cron services with a lot of possibilities. Also, I'm not sure if sudo crontab -l shows the content of /etc/crontab. To verify what's true, you could output $HOME, $USER and $PWD inside your script.
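A minimal way to do that, assuming the output goes to the same log the crontab line already appends to, is a helper like the one below. The function name log_env is made up for this sketch; LOGNAME and the numeric user ID are included as extras because cron may set LOGNAME rather than USER.

```shell
#!/bin/sh
# Sketch only: write one line describing the environment the script runs in.
# Unset variables are shown as "unset" so that gaps are visible in the log.
log_env() {
    printf '%s HOME=%s USER=%s LOGNAME=%s PWD=%s uid=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "${HOME:-unset}" "${USER:-unset}" "${LOGNAME:-unset}" \
        "$PWD" "$(id -u)"
}

# Call this near the top of the cron script, before any vboxmanage command.
log_env
```

Comparing the line written by the cron run with the one written when the same script is started by hand from an SSH session shows directly whether both runs use the same HOME.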

<Philosophical break begin>
Emma2 wrote:Fact is, that this very script does work as expected (expected by me, at least) on two other hosts
When you're investigating a problem that's difficult to understand, you should not take anything for granted, but verify everything. If the outcome of your setup is as expected, that doesn't necessarily mean that the way to reach that outcome is as expected. Many people have fallen into this trap before, because they usually did not examine a working setup. ;)
<Philosophical break end>
Emma2 wrote:there actually is a .config directory in /root, and it has changes from today. But how can I explain this?
Every user incl. the root user can have a .config directory. Or did you mean to say that there is a /root/.config/VirtualBox directory?
Emma2 wrote:So, how can I "reconnect to" or at least influence these running or paused or powered off VMs?
That's difficult to answer generally when the state of the VMs is not known. Shut down all running VMs and reboot the host. And keep your backups ready, in case some VMs are corrupted.
Emma2 wrote:How can i find out why it is just this one host which obviously is working different than the others?
Whatever you examine, do it on all hosts. It's getting philosophical again ... ;)

Post by Emma2 »

fth0 wrote:To verify what's true, you could output $HOME, $USER and $PWD inside your script.
For now, I changed sh in my crontab to bash and will observe the outcome. I did check it, and there actually are two .config/VirtualBox directories, one in /root and one in /home/localadmin. And the log file in root's VBox config was written to last. So, it actually looks as if having sh in my crontab uses root's environment - although I did not configure my crontab with sudo.
Anyway, I will see what happens, and if the problem persists, I will log the variables as suggested.
fth0 wrote:Or did you mean to say that there is a /root/.config/VirtualBox directory?
Yes. See above.
fth0 wrote:Whatever you examine, do it on all hosts. It's getting philosophical again ... ;)
"Of course", I do so. That's why I am so disturbed by having this "effect" on only one of my three hosts. Although this sounds (and probably is) ridiculous, the only difference between these three is their CPU. :(
Is there any tool to "compare" two different Linux installations? As you said, it could be wise to check everything (merely being confident that I installed all three identically does not guarantee it...).