BOINC LHC@Home CentOS vBox crash

Discussions about using Linux guests in VirtualBox.
skydivingnerd
Posts: 4
Joined: 16. Nov 2021, 00:47

BOINC LHC@Home CentOS vBox crash

Post by skydivingnerd »

I have a Win10 host with VirtualBox 6.1.28 and the matching 6.1.28 Extension Pack installed, which I use to run CentOS VMs from the LHC@Home project (https://lhcathome.cern.ch/lhcathome/). The BOINC client receives work units from the project servers and creates a new .vbox VM on demand for each individual work unit. I have been experiencing failures where the VM quits and the task is listed as a "Computation error" after about 20 minutes. This roughly 20-minute limit is a timeout and has been consistent. I've been discussing the issue on the LHC@Home forums, and it is not an issue with the project as a whole or with my BOINC Client.

I have several of the log files from the last two most recent failures, but I have no idea what I'm looking for and need help in troubleshooting the logs. I do know that the VBoxHardening.log does not exit with 0. Attached are the VBox, VBoxHardening, VBoxUI, and vbox_trace logs along with the LHC@Home Task log.
Attachments
VBox logs.zip
VBox, VBoxHardening, VBoxUI, and vbox_trace logs with LHC@Home Task log
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: BOINC LHC@Home CentOS vBox crash

Post by mpack »

skydivingnerd wrote:I do know that the VBoxHardening.log does not exit with 0.
That would be, it appears, because it doesn't include the exit phase at all. It looks like the VM was still running when this and all the other logs were grabbed. Perhaps it is true that the VM aborted, but that is incredibly unusual, especially in a VM that ran for 15 minutes. If it's going to abort it usually does so in the first few seconds.

Why is the VM installed in "C:\ProgramData\BOINC"? That is not a folder that VirtualBox creates.

Your graphics settings badly need attention. Did you actually reduce the graphics RAM allocation? You have plenty of RAM, I would increase this to 128MB :-
00:00:00.656363 VRamSize <integer> = 0x0000000000800000 (8 388 608, 8 MB)
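
For anyone checking the same setting in their own log, here is a minimal sketch of how the hexadecimal VRamSize value maps to megabytes. It assumes the line has the same layout as the one quoted above; the function name `vram_mib` is just an illustrative choice.

```python
import re

# Matches the hexadecimal value in a VBox.log line such as:
#   00:00:00.656363 VRamSize <integer> = 0x0000000000800000 (8 388 608, 8 MB)
VRAM_RE = re.compile(r"VRamSize\s+<integer>\s+=\s+(0x[0-9a-fA-F]+)")


def vram_mib(log_line: str) -> float:
    """Return the VRamSize value found in one VBox.log line, in MiB."""
    match = VRAM_RE.search(log_line)
    if match is None:
        raise ValueError("no VRamSize entry in this line")
    size_bytes = int(match.group(1), 16)  # the log stores the size in bytes
    return size_bytes / (1024 * 1024)


line = "00:00:00.656363 VRamSize <integer> = 0x0000000000800000 (8 388 608, 8 MB)"
print(vram_mib(line))  # 8.0
```
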
In fact all of the VM settings badly need attention. First the OS template is generic: unfortunately I believe BOINC is Berkeley Unix right, that means it isn't actually a supported guest OS.

Number of VM cores should be 2. VM RAM at 2048MB looks low for a 64bit OS (based on the fact that you chose a 64-bit Linux template), I would increase to 8192MB.

You currently have Guest Additions 5.2.6 installed. This is of course horrifically out of date. Unfortunately I don't know that any GAs will work in a Berkeley Unix guest.
skydivingnerd
Posts: 4
Joined: 16. Nov 2021, 00:47

Re: BOINC LHC@Home CentOS vBox crash

Post by skydivingnerd »

mpack,
mpack wrote: Why is the VM installed in "C:\ProgramData\BOINC"? That is not a folder that VirtualBox creates.
The VM runs within the structure of the BOINC client. Each different project attached to BOINC gets its own project folder for initial data and/or executables. All work-units being actively run are done so from their own individual folder within the "C:\ProgramData\BOINC\slots\" folder. This is by design of the BOINC client and not configurable for an end user.
mpack wrote: Your graphics settings badly need attention. Did you actually reduce the graphics RAM allocation? You have plenty of RAM, I would increase this to 128MB :-
00:00:00.656363 VRamSize <integer> = 0x0000000000800000 (8 388 608, 8 MB)
The vramsize is specified at that size by the LHC@Home project administrators. They designed the project's VM to do the calculations necessary and return the results to the project servers. Normally these VMs run in a headless configuration so there is no need for any graphics beyond the simple CLI interface output from VBox while the VM is running, if you open it at all.
mpack wrote: First the OS template is generic: unfortunately I believe BOINC is Berkeley Unix right, that means it isn't actually a supported guest OS.
BOINC is the Berkeley Open Infrastructure for Network Computing (https://boinc.berkeley.edu/). It is not an operating system but an open-source middleware client that distributes computing work-units for computation on a participant's host. I believe the guest OS for the LHC@home project is CentOS. As an end user/participant, I don't have visibility into the guest OS, its version, or installed software.

From your look over the logs, there is nothing glaringly wrong that could lead to the shutdown behavior?
mpack
Site Moderator
Posts: 39156
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Mostly XP

Re: BOINC LHC@Home CentOS vBox crash

Post by mpack »

skydivingnerd wrote: From your look over the logs, there is nothing glaringly wrong that could lead to the shutdown behavior?
Everything I noted in the logs was revealed in my previous post. I didn't keep anything in reserve.

I'm glad to hear that the OS is CentOS Linux, not Berkeley Unix. That should make it a supported guest. However in that case the OS template in the VM settings should be Red Hat (64bit), not a generic Linux.

At the very least you should increase graphics RAM. I don't care that this is how the VM was supplied to you. It's wrong, and easily the most likely cause of the crash. By all means approach BOINC support for their take on the multiple configuration errors I have raised.
scottgus1
Site Moderator
Posts: 20965
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Windows, Linux

Re: BOINC LHC@Home CentOS vBox crash

Post by scottgus1 »

The Virtualbox logs in the zip were saved while the VM was running, so may not show all the data which might show the problem.

This VM setup seems heavily scripted by a 3rd-party supervisory program, Vagrant-style (though apparently not actually Vagrant). This may explain the seemingly unusual setup shown in the log.

Since 6.1.28 is new and has shown a couple bugs, could you please roll back to 6.1.26 Virtualbox and Extension Pack?

Also, I see that the Virtualbox RDP server is enabled, on port 53721, shown in vbox_trace.txt. If you use the host Remote Desktop client app and RDP into 127.0.0.1:53721 while the VM is running, or 'Show' the VM's window in the Virtualbox Manager while the VM is running, both of which are possible despite the 'headless' VM, can you see what the VM OS is doing?
skydivingnerd
Posts: 4
Joined: 16. Nov 2021, 00:47

Re: BOINC LHC@Home CentOS vBox crash

Post by skydivingnerd »

scottgus1 wrote: This VM setup seems heavily scripted by a 3rd-party supervisory program, Vagrant-style (though apparently not actually Vagrant). This may explain the seemingly unusual setup shown in the log.
Yes, it is. This is done by design of the LHC@home project. It is not something that I, as an end user participant, can see.
scottgus1 wrote: Since 6.1.28 is new and has shown a couple bugs, could you please roll back to 6.1.26 Virtualbox and Extension Pack?
In troubleshooting this issue with the LHC@Home forums, I've duplicated this error on VirtualBox 6.1.12 and 6.1.16 with extension packs before updating to 6.1.28. The issue persists with the VM crashing.
scottgus1 wrote: The Virtualbox logs in the zip were saved while the VM was running, so may not show all the data which might show the problem.
The log files are packaged and removed from the folder structure when the VM ends. This is BOINC packing the results of the work-unit for transmission back to the LHC@home project servers. I cannot stop it, nor do I have a way to have them duplicated. The logs in the .zip I provided were copied within about 20-30 seconds of the VM stopping. Copying them is a manual effort that means waiting for the VM to end. Right now, my Win10 client only gets one work-unit task about every 24 hours because of the high failure rate of my host. I've watched the VM console display when the VM ends, and I can try to get a screenshot of it before it stops, but I don't see anything useful on the console when it ends. I'll upload a screenshot if I can catch one this weekend. I'll also try to RDP into it via port 53721 when I see a work-unit running.
fth0
Volunteer
Posts: 5668
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Linux, Windows 10, ...
Location: Germany

Re: BOINC LHC@Home CentOS vBox crash

Post by fth0 »

According to the VBox.log file, the VM needed less than 2 minutes to start the guest OS and the old VirtualBox Guest Additions. Approximately 14 minutes later, it tried to mount the shared folder. The Task Report indicates a checkpoint interval of 600 seconds (10 minutes) and stops the VM after exactly 20 minutes. My educated guess would be that the LHC@Home calculation finished within 14 minutes, the mounting or something else around that time failed, and the situation was cleared up after 20 minutes.
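
A rough sketch of how such a timeline can be read out of a log: VBox.log lines start with an elapsed-time stamp (HH:MM:SS.ffffff), so converting the stamps to seconds and looking for the longest quiet period points at where the guest went idle. Only the first sample line below comes from the posted log; the other two are hypothetical stand-ins for the events described above, and the function names are illustrative.

```python
def elapsed_seconds(stamp: str) -> float:
    """Convert a VBox.log stamp such as '00:15:50.000000' to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def largest_gap(lines):
    """Return (gap_in_seconds, line_after_gap) for the longest pause between lines."""
    best = (0.0, "")
    previous = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            now = elapsed_seconds(stamp)
        except ValueError:
            continue  # skip lines that do not start with a time stamp
        if previous is not None and now - previous > best[0]:
            best = (now - previous, line)
        previous = now
    return best


sample = [
    "00:00:00.656363 VRamSize <integer> = 0x0000000000800000 (8 388 608, 8 MB)",
    "00:01:50.000000 (hypothetical) Guest Additions started",
    "00:15:50.000000 (hypothetical) shared folder mount attempt",
]
gap, line_after = largest_gap(sample)
print(gap / 60)  # 14.0
```

With these sample stamps the longest pause is the 14 minutes before the mount attempt, and two 600-second checkpoint intervals add up to the 20-minute mark at which the task was stopped.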
scottgus1
Site Moderator
Posts: 20965
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Windows, Linux

Re: BOINC LHC@Home CentOS vBox crash

Post by scottgus1 »

A possible test: Since Virtualbox is installed on your computer, you should be able to make your own VM. Please try to make a new VM with some Linux OS in it, and see if the VM runs and remains stable. Run a long video on repeat in it to see if it locks up. This will help determine if Virtualbox in your host is having issues, or perhaps only the BOINC VM.

Also, a VM's full logs are stored in the Logs subfolder in the VM's folder. If you go to "C:\ProgramData\BOINC\slots\0\boinc_{vmcode}\" you should see a Logs subfolder, if BOINC doesn't delete it.
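
If BOINC does delete the folder as soon as the task ends, one workaround is to keep copying the logs elsewhere while the VM is running. The following is only a sketch under assumptions: the file patterns, the function names, and the polling interval are illustrative choices, not anything BOINC or VirtualBox provides, and the slot path has to be filled in by hand.

```python
import shutil
import time
from pathlib import Path


def snapshot_logs(slot_dir: Path, backup_dir: Path,
                  patterns=("*.log*", "vbox_trace.txt")) -> int:
    """Copy matching files under slot_dir into backup_dir, keeping relative paths."""
    sources = {p for pattern in patterns for p in slot_dir.rglob(pattern)}
    copied = 0
    for source in sorted(sources):
        if not source.is_file():
            continue
        target = backup_dir / source.relative_to(slot_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            shutil.copy2(source, target)
            copied += 1
        except OSError:
            # The file may be locked or already removed; try again next round.
            pass
    return copied


def watch(slot_dir: Path, backup_dir: Path, interval_s: float = 10.0) -> None:
    """Refresh the backup every interval_s seconds while slot_dir exists."""
    while slot_dir.exists():
        snapshot_logs(slot_dir, backup_dir)
        time.sleep(interval_s)
```

Calling `watch(Path(r"C:\ProgramData\BOINC\slots\0"), Path(r"C:\vbox-log-backup"))` while a task is running would leave the most recent copy in the backup folder after BOINC cleans up; the backup path here is a made-up example. Stop it with Ctrl+C if the slot folder is emptied rather than removed.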
skydivingnerd
Posts: 4
Joined: 16. Nov 2021, 00:47

Re: BOINC LHC@Home CentOS vBox crash

Post by skydivingnerd »

I've found that I had two issues. The "heartbeat file not found" error seems to have been resolved by my upgrade to VBox 6.1.28. After that, glidein, a distributed filesystem part of the LHC@Home CMS task, could not mount its folder, because I did not have TCP/1094 allowed outbound through my firewall. I'd previously had it open but was told several months ago that it was not needed. Now I have CMS work-unit tasks running on my Win10 host. I'm waiting for a few of those to complete and will discuss with the LHC@Home forum whether or not they are completing correctly. They are expected to take 11-13 hours each to complete.
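
For others hitting the same symptom, a quick way to see whether an outbound port is blocked is to attempt a plain TCP connection from the host. This is a minimal sketch; `can_connect` is an illustrative name, and the server to test against is whatever host the project tells you to use (none is named in this thread).

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `can_connect(host, 1094)` with the project's server name gives a first indication. A False result does not prove the firewall is at fault, since the server itself may be down, but a True result shows that outbound traffic on that port gets through.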
scottgus1 wrote: Also, a VM's full logs are stored in the Logs subfolder in the VM's folder. If you go to "C:\ProgramData\BOINC\slots\0\boinc_{vmcode}\" you should see a Logs subfolder, if BOINC doesn't delete it.
Yes, the BOINC Client puts the LHC@Home task data within its own folder structure under .\slots\. The log data I included in the .zip file at the start of the thread is from that running task folder. The trouble was getting hold of it: after the task failed, the BOINC Client packaged the results, deleted the files from the running task folder, i.e. ".\slots\0\...", and sent the results back to the project server.