Is VirtualBox good for my application - multiple instances...
Posted: 22. May 2009, 00:20
Hello,
Is VirtualBox good for my application - multiple instances of a large model running in parallel?
I'm enticed by VirtualBox, but unfortunately have little background on the IT side. I wonder if anyone on this forum can help me out, including by helping me work out the right questions to ask and how to ask them. My situation follows (hopefully not in too much detail).
I want to invest some grant money (unfortunately very limited, about $10,000) in hardware, and I’m trying to figure out whether I can extend my effective computational power (model runs per dollar) using VirtualBox.
I am running a hydrological model (called WEAP) that runs on Windows XP. I want to use it in a Monte Carlo analysis, which means I’ll need to run it thousands of times, collecting the data from each run. I have a program that calls the model and harvests the data. The model is quite large: it takes 1 hour to run on my current workstation, a new Dell T3400 with a dual-core 3.XX GHz processor, RAID 5, and 8 GB RAM.
I have not formally evaluated where the current computational bottleneck is; I don’t even know where to begin doing that. Below is the information I do have, some of which comes from the developer of the modeling software:
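To put "thousands of runs at 1 hour each" into calendar time, here is a rough sketch. It assumes 5,000 runs (a hypothetical figure standing in for "thousands") and, optimistically, that parallel instances don't slow each other down. The function name `campaign_days` is made up for illustration.

```python
def campaign_days(n_runs: int, hours_per_run: float, parallel_instances: int) -> float:
    """Wall-clock days to finish n_runs model runs.

    Assumes each instance runs one model at a time and that instances do not
    slow each other down (optimistic; real contention will lengthen runs).
    """
    if parallel_instances < 1:
        raise ValueError("need at least one instance")
    # Runs are processed in "waves" of parallel_instances at a time.
    waves = -(-n_runs // parallel_instances)  # ceiling division
    return waves * hours_per_run / 24.0


if __name__ == "__main__":
    # 5000 runs is a hypothetical placeholder for "thousands of runs".
    for k in (1, 2, 4, 8):
        print(f"{k} instance(s): {campaign_days(5000, 1.0, k):.1f} days")
```

With one instance, 5,000 one-hour runs come to roughly 208 days, which matches the "months of computing time" concern below. Each doubling of truly independent instances halves that.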
-The model code is not parallelized, so any instance of the model can only use a single processor core. Dual-core machines may help because the second core can handle processes other than the model. Beyond that, more cores probably don’t help.
-Only a single instance of the model can run on each instance of Windows at any one time.
-The model appears to be RAM-limited. More specifically, using a RAM disk of 3-6 GB speeds it up tremendously, and the model is effectively unrunnable without one (the RAM disk is set up as a virtual hard drive). So I guess the limiting factor is reading and writing to the model database rather than the mathematical calculations, but I’m not sure.
-I’m using Superspeed RamDisk Plus software now, which lets me create a RAM disk larger than 4 GB on a 32-bit Windows XP system.
-Using RAID also speeds the model up.
-I’ve been told that processor speed helps but is not the critical bottleneck; I have not verified this myself.
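One way to probe the bottleneck empirically, even on a single dual-core box, is a scaling test. Time one model run alone, then time two runs started at the same moment, and compare throughput. If two concurrent runs give close to twice the runs per hour, the instances aren't fighting over a shared resource. If throughput barely changes, something shared (disk, RAM bandwidth) is the limit. The sketch below is an assumption-laden illustration. The launch commands are hypothetical placeholders. Because only one WEAP instance can run per Windows installation, each concurrent command would have to drive a separate VM or machine.

```python
from __future__ import annotations

import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd: list[str]) -> float:
    """Run one command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the run fails
    return time.perf_counter() - start


def scaling_test(cmds: list[list[str]]) -> dict[str, float]:
    """Start all commands at once (one per model instance) and measure throughput."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        durations = list(pool.map(run_once, cmds))
    elapsed = time.perf_counter() - start
    return {
        "concurrency": len(cmds),
        "mean_run_s": sum(durations) / len(durations),
        "runs_per_hour": len(cmds) * 3600.0 / elapsed,
    }


if __name__ == "__main__":
    # Stand-in workload: a 2-second sleep. A real test would replace this with
    # hypothetical per-VM commands that each trigger one full model run.
    fake_run = [sys.executable, "-c", "import time; time.sleep(2)"]
    for k in (1, 2):
        print(scaling_test([fake_run] * k))
```

The sleep stand-in will scale perfectly because it uses no shared resources. The informative number is the ratio of `runs_per_hour` at concurrency 2 versus 1 for the real model.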
Given all that, how do I figure out whether I’d be better off buying fewer multi-core workstations and running several VirtualBox machines on each, or buying more (cheaper) dual-core machines and running Windows XP natively on them? I don’t have access to a multi-core machine to try this out beforehand (unless I can get useful information from experiments on one dual-core machine?).
My concern is that if I put all my eggs in the big-machine basket and it doesn’t work out, I’m stuck with one expensive machine that doesn’t run the model any faster, and my modeling work suddenly becomes intractable. I’m looking at months of computing time, so it’s a significant question for me.
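The comparison in this question boils down to runs per day per dollar. Each model instance needs its own core plus enough RAM for its Windows installation and its RAM disk. The sketch below encodes that under stated assumptions. Every price, core count, RAM figure, per-instance RAM requirement, and per-run time in the example is a hypothetical placeholder to be replaced with real quotes and measured runtimes. The names `MachineOption` and `instances_per_machine` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MachineOption:
    name: str
    price_usd: float
    cores: int
    ram_gb: float
    hours_per_run: float  # per-instance runtime on this machine (measure, don't guess)


def instances_per_machine(m: MachineOption, ram_per_instance_gb: float,
                          host_reserve_gb: float = 2.0, cores_reserved: int = 1) -> int:
    """Concurrent model instances a machine can hold: one core each, plus RAM for
    the guest OS and its RAM disk, after reserving a core and some RAM for the host."""
    by_cpu = max(m.cores - cores_reserved, 0)
    by_ram = int(max(m.ram_gb - host_reserve_gb, 0.0) // ram_per_instance_gb)
    return min(by_cpu, by_ram)


def runs_per_day_per_1000usd(m: MachineOption, ram_per_instance_gb: float) -> float:
    n = instances_per_machine(m, ram_per_instance_gb)
    runs_per_day = n * 24.0 / m.hours_per_run
    return runs_per_day / (m.price_usd / 1000.0)


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders, not real quotes or benchmarks.
    options = [
        MachineOption("dual-core, native XP", 1500.0, 2, 8.0, 1.0),
        MachineOption("8-core, VirtualBox guests", 5000.0, 8, 32.0, 1.2),
    ]
    for m in options:
        n = instances_per_machine(m, ram_per_instance_gb=5.0)
        score = runs_per_day_per_1000usd(m, ram_per_instance_gb=5.0)
        print(f"{m.name}: {n} instance(s), {score:.1f} runs/day per $1000")
```

The `hours_per_run` for the multi-core option is deliberately set higher than 1.0 to stand in for contention between VMs. A scaling test on real hardware would tell you whether that penalty is small or large, and the penalty is what decides which option wins.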
Any thoughts would be much appreciated, including suggestions of specific questions I am not asking, but should be.
Thanks in advance,
Mike