[gmx-users] MPI at cluster
taeho.kim at utoronto.ca
Fri May 30 15:10:01 CEST 2003
First of all, thank you for answering my questions about the strategy for clustering 10 nodes.
I have run some jobs on a 10-node cluster and noticed that the two CPUs on a node show quite different running times. I think that since each CPU has a different calculation load, and there is latency between machines, the times can differ.
If so, can I regard the time gap between nodes as an indirect indicator of scaling? (The smaller the time gap, the better the performance.) Can it cause problems, such as a job crash, in the end?
In a previous run the gap became larger and larger, one CPU became idle (no further calculation results were saved), and finally the system crashed. This is what I experienced, but I don't have that data with me. Please see the related data from the current job below.
If such differences are related to the cause of the crash, how can I avoid them?
The following job is running right now (Gmx3.1.4, lammpi5.6.8, fftw2.1.3, on an AMD cluster):
% top from 3 nodes

Node 1:
  PID USER  PRI  NI  SIZE RSS SHARE STAT %CPU %MEM  TIME COMMAND
 2864 yipg2  19  19 26660 25M  2832 R N  99.0  5.1 6282m mdrun_mpi
 2865 yipg2  19  19 22592 21M  2688 S N  78.2  4.3 5339m mdrun_mpi

Node 2:
  PID USER  PRI  NI  SIZE RSS SHARE STAT %CPU %MEM  TIME COMMAND
 2070 yipg2  19  19 19844 19M  2632 R N  98.3  3.7 6223m mdrun_mpi
 2071 yipg2  19  19 19664 18M  2620 R N  89.4  3.7 6040m mdrun_mpi

Node 3:
  PID USER  PRI  NI  SIZE RSS SHARE STAT %CPU %MEM  TIME COMMAND
14192 yipg2  19  19 19420 18M  2620 S N  80.6  3.7 4674m mdrun_mpi
14191 yipg2  19  19 19424 18M  2632 S N  36.3  3.7 3322m mdrun_mpi
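
To put a number on the "time gap" I am asking about, here is a rough Python sketch that uses only the TIME values (accumulated CPU minutes) from the top output above. The `imbalance` function and the node labels are my own illustration, not anything from GROMACS or LAM/MPI. I am assuming that accumulated CPU time is a usable stand-in for how busy each process has been, and that may not hold if a waiting MPI process spins instead of sleeping.

```python
# Accumulated CPU time in minutes, copied from the TIME column of the
# top output above (two mdrun_mpi processes per node).
cpu_minutes = {
    "node1": [6282, 5339],
    "node2": [6223, 6040],
    "node3": [4674, 3322],
}


def imbalance(times):
    """Fraction by which the least busy process lags the busiest one.

    Hypothetical metric for illustration: 0.0 means all processes have
    accumulated the same CPU time; larger values mean a larger gap.
    """
    busiest = max(times)
    return (busiest - min(times)) / busiest


# Gap between the two CPUs on each node.
for node, times in cpu_minutes.items():
    gap = max(times) - min(times)
    print(f"{node}: gap = {gap} min, imbalance = {imbalance(times):.1%}")

# Gap across all six processes.
all_times = [t for times in cpu_minutes.values() for t in times]
print(f"overall: imbalance = {imbalance(all_times):.1%}")
```

Running this on snapshots taken at different times would show whether the gap is growing, which is the behaviour I saw before the crash.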