HOWTO: Use Red Hat Cluster Suite to manage your VMs


Post by Rainmaker »

Hi all, first post for me, so please go easy on me.

I was looking for ways to have VirtualBox run in a cluster (not a cluster on guests), so guests could automatically fail over in the event a host goes down.

After searching Google and these forums for a while, I had to come to the conclusion that such a thing does not exist. So I tried to create it.

I have only 1 machine to test with, so I haven't tested the howto on a "real" cluster. For me, the real requirement was Red Hat Cluster Suite (RHCS) restarting the VM when it crashed. This works.

My host OS here is Debian, guest OS is CentOS. Any distribution capable of running RHCS and VirtualBox should be able to use the script below to set this up, though.

First of all, install RHCS with your package manager (apt-get, aptitude or yum).
On Debian, the package needed is called redhat-cluster-suite.
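
Before going on, it may help to check that the tools the script below depends on are actually on the PATH. This is only a sketch: the function name missing_commands is made up, and I am assuming clustat is installed along with RHCS on your distribution, which may not hold everywhere.

```shell
# Report which of the given commands are missing from PATH.
# Prints one missing command per line; returns 1 if any is missing.
missing_commands() {
    local cmd rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return $rc
}

# VBoxManage and VBoxHeadless are used by vbox.sh; clustat is an
# assumption about what the RHCS packages install.
if missing=$(missing_commands VBoxManage VBoxHeadless clustat); then
    echo "all required commands found"
else
    echo "missing:"
    echo "$missing"
fi
```

If anything is listed as missing, install the corresponding package before continuing.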

Multiple node only:
If you use multiple nodes, you should set up a GFS2 filesystem now, so both nodes can access the filesystem. There are multiple ways to set this up: shared disk on SAN, DRBD etc. I am unfortunately unable to test this, but there are plenty of good howtos on the net. I would recommend this one or this site.

Now, you need to create a user that is able to run VirtualBox (a member of the vboxusers group) and has its home directory on the GFS. This is because the user needs access to the ~/.VirtualBox directory on all nodes.
------------------------ (end multiple node) -----------------------
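
A small check like the following, run on every node, can confirm the group membership part. It is a sketch only: the account name vboxuser and the function name user_in_group are hypothetical, and it does not verify that the home directory really lives on the GFS.

```shell
# Succeed if user $1 is a member of group $2.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

vm_user="vboxuser"   # hypothetical account name, use your own

if user_in_group "$vm_user" vboxusers; then
    echo "$vm_user may run VirtualBox on $(uname -n)"
else
    echo "$vm_user is not in vboxusers on $(uname -n)"
fi
```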

Now, RHCS supports a number of tags in its cluster.conf file. Unfortunately, there is no <vbox> tag. So, use this script to make a <vbox> tag possible. Place this in /usr/share/cluster/vbox.sh:

Code:

#!/bin/bash

#
# Definition for <vbox> tag, for launching
# VirtualBox VM's as resources
#

LC_ALL=C
LANG=C
PATH=/bin:/sbin:/usr/bin:/usr/sbin
export LC_ALL LANG PATH

# Load the OCF helper functions (ocf_log, OCF_* exit codes)

. "$(dirname "$0")/ocf-shellfuncs"

meta_data()
{
    cat <<EOT
<?xml version="1.0" ?>
<resource-agent version="rgmanager 2.0" name="vbox">
    <version>1.0</version>

    <longdesc lang="en">
        This is an Oracle VirtualBox VM. All types of guests that
        VirtualBox supports should work.
    </longdesc>
    <shortdesc lang="en">
        Oracle VirtualBox VM
    </shortdesc>

    <parameters>
        <parameter name="vmname" unique="1" primary="1">
            <longdesc lang="en">
                The name or UUID for the VM, as passed to VBoxHeadless.
            </longdesc>

            <shortdesc lang="en">
                VM name or UUID
            </shortdesc>

            <content type="string"/>
        </parameter>

        <parameter name="vrdpport">
            <longdesc lang="en">
                The port the VRDP server should listen on.
            </longdesc>

            <shortdesc lang="en">
                VRDP port number
            </shortdesc>
        </parameter>

        <parameter name="shutdowntime" default="120">
            <longdesc lang="en">
                The time in seconds a VM gets to shutdown after an ACPI event,
                before forceful shutdown is used. The maximum is 600 seconds
                (10 minutes).
            </longdesc>

            <shortdesc lang="en">
                Maximum time for VM shutdown after ACPI (seconds).
            </shortdesc>
        </parameter>
    </parameters>

    <actions>
        <action name="start" timeout="20"/>
        <action name="stop" timeout="600"/>
        <action name="status" interval="20" timeout="10"/>
        <action name="monitor" interval="20" timeout="10"/>
        <action name="status" depth="10" interval="60" timeout="20"/>
        <action name="monitor" depth="10" interval="60" timeout="20"/>

        <action name="meta-data" timeout="5"/>
        <action name="validate-all" timeout="20"/>
    </actions>
</resource-agent>
EOT
}

#Check the state of the specified VM
function status_of_vm {
        ocf_log debug "Getting state for VM $1"
        # Quote everything: VM names may contain spaces, and STATE is
        # empty when VBoxManage does not know the VM.
        STATE=$(VBoxManage showvminfo "$1" --machinereadable | grep -E "^VMState=")

        if [ "$STATE" = 'VMState="running"' ]; then
                ocf_log debug "VM $1 is running"
                return 0
        else
                ocf_log warning "VM $1 is NOT running!"
                return 1
        fi
}

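function wait_for_start {
        # $2 is the maximum number of seconds to wait. The callers below
        # do not pass it, so default to 20, the start timeout declared
        # in the metadata.
        local timeout=${2:-20}
        I=0
        until status_of_vm "$1"; do
                ocf_log info "VM $1 is not yet started!"
                sleep 1
                I=$((I+1))
                if [ "$I" -gt "$timeout" ]; then
                        ocf_log err "VM $1 was not started after $I iterations!"
                        return 1
                fi
        done
        ocf_log info "VM $1 has been started"
        return 0
}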

#Try to get the VM to shutdown itself
function shutdown_vm {
        if ! status_of_vm $1; then
                ocf_log info "No shutdown necessary, VM $1 not running"
                return 0
        else
                ocf_log info "Sending $1 ACPI event"
                VBoxManage controlvm $1 acpipowerbutton
                return $?
        fi
}

#Poweroff the specified VM (force)
function poweroff_vm {
        ocf_log warning "Forcefully shutting down VM $1"
        VBoxManage controlvm $1 poweroff
        return $?
}

#Function which tries to shutdown the VM, or use forceful shutdown
#if the timeout ($2, in seconds, default 120) expires
function stop_vm {
        local timeout=${2:-120}
        I=0
        shutdown_vm "$1"
        until ! status_of_vm "$1"; do
                ocf_log debug "Waiting for $1 to shutdown (waited $I seconds)"
                sleep 1
                I=$((I+1))
                if [ "$I" -gt "$timeout" ]; then
                        #Shutdown timer expired.
                        ocf_log warning "Shutdown timer for VM $1 expired! Shutting down forcefully!"
                        poweroff_vm "$1"
                        return $?
                fi
        done
        return 0
}

#Start the specified VM. Mind the &, VBoxHeadless does not return.
function poweron_vm {
        ocf_log info "Starting VM $1"
        VBoxHeadless -p "$2" -s "$1" &
        # $? after a background launch is always 0, so log the PID instead;
        # wait_for_start does the real success check.
        ocf_log info "VBoxHeadless launched in background with PID $!"
}

case $1 in
start)
        poweron_vm "${OCF_RESKEY_vmname}" "${OCF_RESKEY_vrdpport}"
        wait_for_start "${OCF_RESKEY_vmname}"
        exit $?
        ;;
stop)
        stop_vm "${OCF_RESKEY_vmname}" "${OCF_RESKEY_shutdowntime}"
        exit $?
        ;;
status|monitor|validate-all|verify_all)
        status_of_vm "${OCF_RESKEY_vmname}"
        exit $?
        ;;
restart)
        stop_vm "${OCF_RESKEY_vmname}" "${OCF_RESKEY_shutdowntime}"
        poweron_vm "${OCF_RESKEY_vmname}" "${OCF_RESKEY_vrdpport}"
        wait_for_start "${OCF_RESKEY_vmname}"
        exit $?
        ;;
meta-data)
        meta_data
        exit 0
        ;;
*)
        echo "usage: $0 {start|stop|status|monitor|restart|meta-data|validate-all}"
        exit $OCF_ERR_UNIMPLEMENTED
        ;;
esac

After placing this script in /usr/share/cluster and restarting cman, rgmanager will recognize the <vbox> tag.
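
The status check in the script comes down to reading the VMState line from the output of VBoxManage showvminfo --machinereadable. Here is a minimal sketch of just that parsing step, fed with a canned sample instead of a real VBoxManage call. The function name vm_state and the sample lines are made up for illustration.

```shell
# Extract the value of VMState from machine-readable VM info on stdin.
vm_state() {
    sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# Canned sample; real output contains many more key="value" lines.
sample='name="LDAPMaster1"
VMState="running"
VMStateChangeTime="2010-10-17T00:20:00.000000000"'

state=$(printf '%s\n' "$sample" | vm_state)
if [ "$state" = "running" ]; then
    echo "VM is running"
else
    echo "VM is not running (state: ${state:-unknown})"
fi
```

On a real host the same function could be fed from VBoxManage showvminfo LDAPMaster1 --machinereadable, which is handy for seeing what state a VM is in when the agent reports it as not running.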

The <vbox> tag takes 3 arguments:
vmname (required): The VM name or UUID for the guest
vrdpport (required): The VRDP port number
shutdowntime (optional, default 120 seconds): time in seconds the script will wait for a guest to shut down before using VBoxManage's poweroff function on it.
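
The script sees each of these attributes as an environment variable named OCF_RESKEY_<attribute>, which is why it reads OCF_RESKEY_vmname and so on. The sketch below only imitates that hand-over, so you can see what the agent would receive. The function show_vbox_settings is hypothetical and not part of vbox.sh.

```shell
# Print the settings an agent would see, falling back to the documented
# default of 120 seconds when shutdowntime is not set.
show_vbox_settings() {
    echo "vmname=${OCF_RESKEY_vmname}"
    echo "vrdpport=${OCF_RESKEY_vrdpport}"
    echo "shutdowntime=${OCF_RESKEY_shutdowntime:-120}"
}

# Imitate <vbox vmname="LDAPMaster2" vrdpport="8002"/> (no shutdowntime).
# The subshell keeps the variables from leaking into the calling shell.
(
    export OCF_RESKEY_vmname="LDAPMaster2"
    export OCF_RESKEY_vrdpport="8002"
    unset OCF_RESKEY_shutdowntime
    show_vbox_settings
)
```

Setting the same variables by hand should also let you run /usr/share/cluster/vbox.sh status outside the cluster to debug a single VM, though I have only sketched that here.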

Here is the cluster.conf I use. Keep in mind this has no GFS and only a single node, so the only fencing is the manual placeholder (fence_manual). When using GFS, always configure real fencing.

My cluster.conf:

Code:

<?xml version="1.0"?>
<cluster name="vboxcluster" config_version="6">
<cman expected_votes="1">
</cman>
<fence_daemon post_fail_delay="1" post_join_delay="3" clean_start="0"/>
<fencedevices>
        <fencedevice name="human" agent="fence_manual"/>
</fencedevices>

<clusternodes>
        <clusternode name="oread" nodeid="1">
                 <fence>
                        <method name="fence1">
                                <device name="human"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>

<rm>
        <failoverdomains>
                <failoverdomain name="vboxdom">
                        <failoverdomainnode name="oread" priority="1"/>
                </failoverdomain>
        </failoverdomains>
        <service autostart="1" recovery="restart" domain="vboxdom" name="LDAP1_service">
                <vbox vmname="LDAPMaster1" vrdpport="8001" shutdowntime="60" name="VM1"/>
        </service>
        <service autostart="1" recovery="restart" domain="vboxdom" name="LDAP2_service">
                <vbox vmname="LDAPMaster2" vrdpport="8002" name="VM2"/>
        </service>
        <service autostart="1" recovery="restart" domain="vboxdom" name="IPA1">
                <vbox vmname="IPA1" vrdpport="8003" name="VM3"/>
        </service>
        <service autostart="1" recovery="restart" domain="vboxdom" name="IPAC_service">
                <vbox vmname="IPAClient" vrdpport="8004" name="VM4"/>
        </service>
</rm>

</cluster>

As said, this is pretty minimal, but it does work as expected.
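
Since every VM gets the same kind of <service> block, a small helper can print one for you to paste into the <rm> section. This is just a sketch: the function name vbox_service_xml and the example values (Web1_service, WebServer1, port 8005, VM5) are invented, and it does not edit cluster.conf itself.

```shell
# Print a <service> block for one VM, in the same shape as the entries
# in the cluster.conf above.
# Arguments: service name, VM name, VRDP port, resource name, failover domain.
vbox_service_xml() {
    cat <<EOF
        <service autostart="1" recovery="restart" domain="$5" name="$1">
                <vbox vmname="$2" vrdpport="$3" name="$4"/>
        </service>
EOF
}

vbox_service_xml "Web1_service" "WebServer1" "8005" "VM5" "vboxdom"
```

When you add a block like this, remember to raise config_version in the <cluster> tag as well, otherwise the change is normally not picked up.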

Have fun with it.

Re: HOWTO: Use redhat cluster suite to manage your VM's

Post by Technologov »

I will move this to the "HOW-TOs" section of the forum.

In addition, it would be nice to test this with several hosts and on a RHEL host.

Please discuss this topic in a separate thread.