[gmx-users] mpich job hangs on exit
feenstra at chem.vu.nl
Wed Aug 4 14:21:22 CEST 2004
David van der Spoel wrote:
> See where it hangs in the mdrunXX.log files.
> The question is whether the nodes ever reach the mpi_finalize call, or
> whether they are killed by the queue system.
All md?.log files now end in 'Finished mdrun on node # ...'
On some of the nodes the load is > 0, so it seems that some of
the mdrun processes are still in limbo, but not all. The job is gone
now, so I'll rerun it to see what it all looks like...
Another test without -debug, but with LOG_BUFS=0, gives a 'normal' md0.log
('Finished...') and stderr output ('Performance:...'), but the other
log files stop partway through writing the accounting for 'Dummy4fd'.
mdrun processes 0 to 5 are hanging, while 6 and 7 (the fourth node) have
terminated. In addition, there is an error in the stdout of the job:
p6_18643: p4_error: Timeout in establishing connection to remote process: 0
p7_18693: p4_error: Timeout in establishing connection to remote process: 0
This is probably MPICH talking. Note that it mentions 'p6' and 'p7', which may
explain why mdruns #6 and #7 have terminated.
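To see at a glance which nodes got as far as the final 'Finished mdrun on node' line and where the others stopped, something like the script below could be run in the job directory. This is only a minimal sketch, assuming the per-node logs match the md?.log pattern and that a clean exit leaves the 'Finished mdrun on node' marker as the last non-empty line; the helper names are hypothetical.

```python
import glob
import sys

# Assumed marker: mdrun's last log line on a clean exit, as seen in md?.log.
FINISHED_MARKER = "Finished mdrun on node"


def last_nonempty_line(path):
    """Return the last non-empty line of a text file, or '' if there is none."""
    last = ""
    with open(path, errors="replace") as fh:
        for line in fh:
            if line.strip():
                last = line.rstrip("\n")
    return last


def unfinished_logs(pattern="md?.log"):
    """List log files matching pattern whose last line lacks the marker."""
    return [p for p in sorted(glob.glob(pattern))
            if FINISHED_MARKER not in last_nonempty_line(p)]


if __name__ == "__main__":
    pattern = sys.argv[1] if len(sys.argv) > 1 else "md?.log"
    # Print each unfinished log together with the line it stopped on.
    for path in unfinished_logs(pattern):
        print(f"{path}: {last_nonempty_line(path)}")
```

For the LOG_BUFS=0 run this would be expected to list every log except md0.log, each ending somewhere in the 'Dummy4fd' accounting.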
> Maybe you have to put in extra debug statements.
| | |
| _ _ ___,| K. Anton Feenstra |
| / \ / \'| | | Dept. of Pharmacochem. - Vrije Universiteit Amsterdam |
|( | )| | | De Boelelaan 1083 - 1081 HV Amsterdam - Netherlands |
| \_/ \_/ | | | Tel: +31 20 44 47608 - Fax: +31 20 44 47610 |
| | Feenstra at chem.vu.nl - www.chem.vu.nl/~feenstra/ |
| | "If You See Me Getting High, Knock Me Down" |
| | (Red Hot Chili Peppers) |