[gmx-users] mpich job hangs on exit
feenstra at chem.vu.nl
Wed Aug 4 10:33:56 CEST 2004
I've encountered a problem with multi-node jobs (e.g. 6 CPUs divided over
3 nodes). The symptom is that the accounting info at the end of the log files
is only partially written. The trr, xtc and edr files are fine: they have
been closed and the final frame was written, and confout.gro is also present.
However, all mdrun processes are still using 100% CPU and fail to exit. For
example, my confout.gro was written yesterday at 10:30, but today at 10:21
the mdruns are still 'active'. The workaround is pretty simple (kill them),
but this may point to a deeper problem.
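
A minimal sketch of how this state could be detected on one node, assuming
the final coordinates are in confout.gro and the binary is literally named
"mdrun" (an MPI build may well use a different name). The function names,
the 10-minute grace period and the use of pgrep are illustrative
assumptions, not anything Gromacs or MPICH provides. It only sees local
processes, so on a multi-node job it would have to be run on every node.

```python
import os
import signal
import subprocess
import time


def seconds_since_written(confout_path, now=None):
    """Return seconds since confout_path was last modified, or None if absent."""
    if not os.path.exists(confout_path):
        return None
    if now is None:
        now = time.time()
    return now - os.path.getmtime(confout_path)


def find_pids(process_name="mdrun"):
    """Return PIDs of local processes named exactly process_name.

    Uses `pgrep -x`; the default name "mdrun" is an assumption.
    """
    try:
        result = subprocess.run(
            ["pgrep", "-x", process_name],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        # pgrep is not installed on this machine
        return []
    return [int(pid) for pid in result.stdout.split()]


def looks_hung(confout_path, pids, grace_seconds=600, now=None):
    """True if the final structure is older than the grace period
    while matching processes are still alive."""
    age = seconds_since_written(confout_path, now)
    return bool(pids) and age is not None and age > grace_seconds


def terminate(pids):
    """Send SIGTERM to each PID (the 'kill them' workaround)."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
```

Typical use would be to call find_pids(), pass the result to looks_hung()
together with the path to confout.gro, and only then call terminate(). This
treats the symptom only; it says nothing about why the processes spin.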
I've been able to find reports of similar problems in the mailing list
archives, but no follow-ups with a solution... For now, the job is still
hanging, so today I can look at some specifics of the processes if necessary.
Oh, almost forgot. I'm running on a 3 GHz dual Xeon cluster with Gigabit Ethernet
interconnect, MPICH 184.108.40.206, Gromacs 3.2.1, Intel cc/fc 7.1 and RedHat Enterprise 3.
It happens with a 6 CPU job on GroEL/ES (14+7 subunits, plus water = 75k atoms),
but also with a 450 residue protein in water (40k atoms) at 8 CPUs and above.
| | |
| _ _ ___,| K. Anton Feenstra |
| / \ / \'| | | Dept. of Pharmacochem. - Vrije Universiteit Amsterdam |
|( | )| | | De Boelelaan 1083 - 1081 HV Amsterdam - Netherlands |
| \_/ \_/ | | | Tel: +31 20 44 47608 - Fax: +31 20 44 47610 |
| | Feenstra at chem.vu.nl - www.chem.vu.nl/~feenstra/ |
| | "If You See Me Getting High, Knock Me Down" |
| | (Red Hot Chili Peppers) |