[gmx-users] Gromacs 4.6.7 with MPI and OpenMP
Malcolm Tobias
mtobias at wustl.edu
Fri May 8 16:45:38 CEST 2015
Szilárd,
On Friday 08 May 2015 15:56:12 Szilárd Páll wrote:
> What's being utilized vs what's being started are different things. If
> you don't believe the mdrun output - which is quite likely not wrong
> about the 2 ranks x 4 threads -, use your favorite tool to check the
> number of ranks and threads started and their placement. That will
> explain what's going on...
Good point. If I use 'ps -L' I can see the OpenMP threads:
[root at gpu21 ~]# ps -Lfu mtobias
UID PID PPID LWP C NLWP STIME TTY TIME CMD
mtobias 9830 9828 9830 0 1 09:28 ? 00:00:00 sshd: mtobias at pts/0
mtobias 9831 9830 9831 0 1 09:28 pts/0 00:00:00 -bash
mtobias 9989 9831 9989 0 2 09:33 pts/0 00:00:00 mpirun -np 2 mdrun_mp
mtobias 9989 9831 9991 0 2 09:33 pts/0 00:00:00 mpirun -np 2 mdrun_mp
mtobias 9990 9831 9990 0 1 09:33 pts/0 00:00:00 tee mdrun.out
mtobias 9992 9989 9992 38 7 09:33 pts/0 00:00:02 mdrun_mpi -ntomp 4 -v
mtobias 9992 9989 9994 0 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9992 9989 9998 0 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9992 9989 10000 0 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9992 9989 10001 16 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9992 9989 10002 16 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9992 9989 10003 16 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9993 9989 9993 73 7 09:33 pts/0 00:00:05 mdrun_mpi -ntomp 4 -v
mtobias 9993 9989 9995 0 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9993 9989 9999 0 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9993 9989 10004 0 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9993 9989 10005 12 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9993 9989 10006 12 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
mtobias 9993 9989 10007 12 7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
but top only shows 2 CPUs being utilized:
top - 09:33:42 up 37 days, 19:48, 2 users, load average: 2.13, 1.05, 0.68
Tasks: 517 total, 3 running, 514 sleeping, 0 stopped, 0 zombie
Cpu0 : 98.7%us, 1.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 98.7%us, 1.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu9 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132053748k total, 23817664k used, 108236084k free, 268628k buffers
Swap: 4095996k total, 1884k used, 4094112k free, 15600572k cached
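The mismatch is easier to see if ps is also asked for the processor each thread last ran on (the `psr` column in procps); run against the two mdrun PIDs it would presumably show all the busy threads stacked on CPUs 0 and 1. Shown here against the current shell just as a placeholder:

```shell
# PSR = processor the thread last ran on; replace $$ with an mdrun_mpi
# PID (9992 or 9993 above) to see where its OpenMP threads landed.
ps -Lo pid,tid,psr,pcpu,comm -p $$
```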
> Very likely that's exactly what's screwing things up. We try to be
> nice and back off (mdrun should note that on the output) when
> affinities are set externally assuming that they are set for a good
> reason and to correct values. Sadly, that assumption often proves to
> be wrong. Try running with "-pin on" or turn off the CPUSET-ing (or
> double-check if it's right).
I wouldn't expect the CPUSETs to be problematic; I've been using them with Gromacs for over a decade now ;-)
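For completeness, the mask a process actually received can be double-checked in two independent ways, per Szilárd's suggestion (substitute the mdrun PIDs for `$$`):

```shell
# Affinity list as reported by util-linux:
taskset -cp $$
# Same information straight from the kernel:
grep Cpus_allowed_list /proc/$$/status
```

If the CPUSET is correct, both should report the full set of cores the job was granted.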
If I use '-pin on' it appears to be utilizing 8 CPU-cores as expected:
[mtobias at gpu21 Gromacs_Test]$ mpirun -np 2 mdrun_mpi -ntomp 4 -pin on -v -deffnm PolyA_Heli_J_hi_equil
top - 09:36:26 up 37 days, 19:50, 2 users, load average: 1.00, 1.14, 0.78
Tasks: 516 total, 4 running, 512 sleeping, 0 stopped, 0 zombie
Cpu0 : 78.9%us, 2.7%sy, 0.0%ni, 18.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 63.7%us, 0.3%sy, 0.0%ni, 36.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 65.6%us, 0.3%sy, 0.0%ni, 33.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 64.9%us, 0.3%sy, 0.0%ni, 34.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 80.7%us, 2.7%sy, 0.0%ni, 16.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 64.0%us, 0.3%sy, 0.0%ni, 35.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 62.0%us, 0.3%sy, 0.0%ni, 37.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 60.3%us, 0.3%sy, 0.0%ni, 39.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Weird. I wonder if anyone else has experience using pinning with CPUSETs?
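One thing worth trying, if anyone is in the same boat: mdrun in 4.6 also accepts -pinoffset (and -pinstride), which may be the cleaner fix when several pinned jobs share a node under CPUSETs. A sketch, assuming a second job whose CPUSET covers cores 8-15:

```shell
# Hypothetical second job confined to cores 8-15: pin explicitly, and
# offset so its threads do not land on the first job's cores.
mpirun -np 2 mdrun_mpi -ntomp 4 -pin on -pinoffset 8 -v -deffnm PolyA_Heli_J_hi_equil
```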
Malcolm
--
Malcolm Tobias
314.362.1594