[gmx-users] mdrun_mpi stops at random
Jason O'Young
joyoung at uwo.ca
Thu May 11 20:51:14 CEST 2006
Hi all,
I have a problem with parallel runs: the simulation hangs at seemingly
random times, anywhere from an hour to a day after starting.
There are no error messages in the logs and nothing unusual
in dmesg.
My setup is two machines, each with a dual-core Pentium D. I run with
-np 4 to use all four cores.
When I run top while the job is frozen, mdrun_mpi shows 0% CPU usage
on the "slave" node and is sometimes in the sleeping state. On the
"master" node, CPU usage is close to 100%. According to the activity
lights on my switch, there is no network traffic either.
When I run with -np 2, with one process on each computer, the run
appears to be stable.
I am using:
Gromacs 3.3.1
LAM 7.1.2
Parallel Knoppix, kernel version 2.6.12
Gromacs was compiled from source.
From searching the archives, I understand that a few people have had a
problem similar to mine, but I could not find a straight answer.
Thanks!
Jason