[gmx-users] mdrun_mpi stops at random

Jason O'Young joyoung at uwo.ca
Thu May 11 20:51:14 CEST 2006


Hi all,

I have an issue with parallel runs: the simulation hangs at seemingly
random intervals, anywhere from an hour to a day into the run. No error
messages are reported in the logs, and dmesg shows nothing unusual.

My setup is two dual-core Pentium D machines. I run with -np 4 to take
advantage of all four cores.

When I run top while the run is frozen, I notice that mdrun_mpi is at
0% CPU usage, and sometimes sleeping, on the "slave" node. On the
"master" node, CPU usage is close to max. There is also no network
activity, judging by the blinking lights on my switch.

When I run with -np 2, with one process on each computer, the run
seems to carry on stably.

I am using:
Gromacs 3.3.1
Lam 7.1.2
Parallel Knoppix Kernel version 2.6.12

Gromacs was compiled from source.

I understand from searching the archives that a few people have had a
problem similar to mine, but I could not find a straight answer.

Thanks!
Jason




