[gmx-users] Problems with REMD in Gromacs 4.6.3
Szilárd Páll
szilard.pall at cbr.su.se
Wed Jul 31 16:52:18 CEST 2013
On Fri, Jul 19, 2013 at 6:59 PM, gigo <gigo at ibb.waw.pl> wrote:
> Hi!
>
>
> On 2013-07-17 21:08, Mark Abraham wrote:
>>
>> You tried ppn3 (with and without --loadbalance)?
>
>
> I was testing with an 8-replica simulation.
>
> 1) Without --loadbalance and without -np 8.
> Excerpts from the script:
> #PBS -l nodes=8:ppn=3
> setenv OMP_NUM_THREADS 4
> mpiexec mdrun_mpi -v -cpt 20 -multi 8 -ntomp 4 -replex 2500 -cpi -pin on
>
> Excerpts from logs:
> Using 3 MPI processes
> Using 4 OpenMP threads per MPI process
> (...)
> Overriding thread affinity set outside mdrun_mpi
>
> Pinning threads with an auto-selected logical core stride of 1
>
> WARNING: In MPI process #0: Affinity setting for 1/4 threads failed.
>          This can cause performance degradation! If you think your
>          settings are correct, contact the GROMACS developers.
>
>
> WARNING: In MPI process #2: Affinity setting for 4/4 threads failed.
>
> Load: The job was allocated 24 cores (3 cores on 8 different nodes). Each
> OpenMP thread uses ~1/3 of a CPU core on average.
> Conclusions: MPI starts as many processes as cores requested (nnodes*ppn =
> 24) and ignores the OMP_NUM_THREADS environment variable ==> this is wrong,
> and it is not a Gromacs issue. Each MPI process forks into 4 threads as
> requested. The 24-core limit granted by Torque is not violated.
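>
> (A side note: if the mpiexec here comes from OpenMPI, I believe environment
> variables have to be exported to the remote ranks explicitly with the -x
> flag - a sketch, untested on this cluster:
>
> setenv OMP_NUM_THREADS 4
> mpiexec -x OMP_NUM_THREADS mdrun_mpi -v -cpt 20 -multi 8 -ntomp 4 -replex 2500 -cpi -pin on
>
> so the "ignored" OMP_NUM_THREADS may simply never reach the remote nodes.)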
>
> 2) The same script, but with -np 8, to limit the number of MPI processes to
> the number of replicas.
>
> Logs:
> Using 1 MPI process
> Using 4 OpenMP threads
> (...)
>
> Replicas 0, 3 and 6: WARNING: Affinity setting for 1/4 threads failed.
> Replicas 1, 2, 4, 5 and 7: WARNING: Affinity setting for 4/4 threads failed.
>
>
> Load: The job was allocated 24 cores on 8 nodes. mpiexec ran only on the
> first 3 nodes. Each OpenMP thread uses ~20% of a CPU core.
>
> 3) -np 8 --loadbalance
> Excerpts from logs:
>
> Using 1 MPI process
> Using 4 OpenMP threads
> (...)
> Each replica says: WARNING: Affinity setting for 3/4 threads failed.
>
> Load: MPI processes spread evenly on all 8 nodes. Each OpenMP thread uses
> ~50% of a CPU core.
>
> 4) -np 8 --loadbalance, #PBS -l nodes=8:ppn=4 <== this worked ~OK with
> Gromacs 4.6.2
> Logs:
> WARNING: Affinity setting for 2/4 threads failed.
>
> Load: 32 cores allocated on 8 nodes. MPI processes spread evenly, each
> OpenMP thread uses ~70% of a CPU core.
> With 144 replicas the simulation did not produce any results; it just got
> stuck.
>
>
> Some thoughts: the main problem is most probably in the way MPI interprets
> the information from Torque; it is not Gromacs related. MPI ignores
> OMP_NUM_THREADS. The environment is just broken. Since Gromacs 4.6.2 behaved
> better than 4.6.3 there, I am going back to it.
FYI: unless you are setting thread affinities manually or through the job
scheduler, you are advised to use 4.6.3, because the mdrun-internal
affinity setting has a bug in 4.6.2 (and the "better" behavior may
actually be caused by that non-functional affinity setting).
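
A quick way to check whether something outside mdrun (Torque or the MPI
launcher) is already restricting the affinity mask is to launch a trivial
command the same way you launch mdrun - a sketch, assuming a Linux cluster
with taskset (util-linux) available:

mpiexec -np 8 sh -c 'echo "$(hostname): $(taskset -cp $$)"'

If the reported CPU lists are narrower than a full node, that external
affinity setting is what mdrun's internal pinning is fighting with.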
> Best,
>
> G
>
>>
>> Mark
>>
>> On Wed, Jul 17, 2013 at 6:30 PM, gigo <gigo at ibb.waw.pl> wrote:
>>>
>>> On 2013-07-13 11:10, Mark Abraham wrote:
>>>>
>>>>
>>>> On Sat, Jul 13, 2013 at 1:24 AM, gigo <gigo at ibb.waw.pl> wrote:
>>>>>
>>>>>
>>>>> On 2013-07-12 20:00, Mark Abraham wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 12, 2013 at 4:27 PM, gigo <gigo at ibb.waw.pl> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> On 2013-07-12 11:15, Mark Abraham wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What does --loadbalance do?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It balances the total number of processes across all allocated nodes.
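>>>>>>>
>>>>>>> (As far as I know, OpenMPI's mpiexec can also be told the placement
>>>>>>> explicitly instead of balancing it, e.g. 3 replicas per 12-core node:
>>>>>>>
>>>>>>> mpiexec -np 144 -npernode 3 mdrun_mpi -multi 144 -ntomp 4
>>>>>>>
>>>>>>> - but I have not tested whether that cooperates with Torque's node
>>>>>>> list, so treat it as a sketch.)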
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> OK, but using it means you are hostage to its assumptions about
>>>>>> balance.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> That's true, but as long as I do not try to use more resources than
>>>>> Torque gives me, everything is OK. The question is: what is the proper
>>>>> way of running multiple simulations in parallel with MPI, each further
>>>>> parallelized with OpenMP, when pinning fails? I could not find any
>>>>> other.
>>>>
>>>>
>>>>
>>>> I think pinning fails because you are double-crossing yourself. You do
>>>> not want 12 MPI processes per node, and that is likely what ppn is
>>>> setting. AFAIK your setup should work, but I haven't tested it.
>>>>
>>>>>>
>>>>>>> The thing is that mpiexec does not know that I want each replica to
>>>>>>> fork into 4 OpenMP threads. Thus, without this option and without
>>>>>>> affinities (more on that in a second) mpiexec starts too many replicas
>>>>>>> on some nodes - Gromacs then complains about the overload - while some
>>>>>>> cores on other nodes are not used. It is possible to run my simulation
>>>>>>> like this:
>>>>>>>
>>>>>>> mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi (without
>>>>>>> --loadbalance for mpiexec and without -ntomp for mdrun)
>>>>>>>
>>>>>>> Then each replica runs on 4 MPI processes (I allocate 4 times more
>>>>>>> cores than replicas, and mdrun sees it). The problem is that this is
>>>>>>> much slower than using OpenMP for each replica. I did not find any way
>>>>>>> other than --loadbalance in mpiexec plus -multi 144 -ntomp 4 in mdrun
>>>>>>> to use MPI and OpenMP at the same time on the Torque-controlled
>>>>>>> cluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> That seems highly surprising. I have not yet encountered a job
>>>>>> scheduler that was completely lacking a "do what I tell you" layout
>>>>>> scheme. More importantly, why are you using #PBS -l nodes=48:ppn=12?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I think that Torque is very similar to all PBS-like resource managers
>>>>> in this regard. It actually does what I tell it to do. There are
>>>>> 12-core nodes; I ask for 48 of them - I get them (a simple #PBS -l
>>>>> ncpus=576 does not work), end of story. Now, the program that I run is
>>>>> responsible for populating the resources that I got.
>>>>
>>>>
>>>>
>>>> No, that's not the end of the story. The scheduler and the MPI system
>>>> typically cooperate to populate the MPI processes on the hardware, set
>>>> OMP_NUM_THREADS, set affinities, etc. mdrun honours those if they are
>>>> set.
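>>>>
>>>> An easy way to see what the scheduler/MPI combination actually sets is
>>>> to launch a trivial command exactly the way you launch mdrun - a
>>>> sketch, untested on your cluster:
>>>>
>>>> mpiexec -np 8 sh -c 'echo "$(hostname) OMP_NUM_THREADS=$OMP_NUM_THREADS"'
>>>>
>>>> and compare the placement and environment with what you expect.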
>>>
>>>
>>>
>>> I was able to run what I wanted flawlessly on another cluster with
>>> PBS-Pro. The Torque cluster seems to work like I said ("the end of
>>> story" behaviour). REMD runs well on Torque when I give a whole physical
>>> node to one replica. Otherwise the simulation does not progress, or the
>>> pinning fails (sometimes partially). I have run out of options; I did
>>> not find any working example or documentation on running hybrid
>>> MPI/OpenMP jobs under Torque. It seems that I have stumbled upon
>>> limitations of this resource manager, and it is not really a Gromacs
>>> issue.
>>> Best Regards,
>>> Grzegorz
>>>
>>>
>>>>
>>>> You seem to be using 12 because you know there are 12 cores per node.
>>>> The scheduler should know that already. ppn should be a command about
>>>> what to do with the hardware, not a description of what it is. More to
>>>> the point, you should read the docs and be sure what it does.
>>>>
>>>>>> Surely you want 3 MPI processes per 12-core node?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Yes - I want each node to run 3 MPI processes. Preferably, I would
>>>>> like to run each MPI process on a separate node (spread over 12 cores
>>>>> with OpenMP), but I will not get that many resources. But again,
>>>>> without the --loadbalance hack I would not be able to populate the
>>>>> nodes properly...
>>>>
>>>>
>>>>
>>>> So try ppn 3!
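>>>>
>>>> Something along these lines (a sketch, not tested on your cluster) for
>>>> the full 144-replica case:
>>>>
>>>> #PBS -l nodes=48:ppn=3
>>>> setenv OMP_NUM_THREADS 4
>>>> mpiexec -np 144 mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 -replex 2000 -cpi -pin on
>>>>
>>>> i.e. one MPI rank per replica, three ranks per 12-core node, and four
>>>> OpenMP threads per rank.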
>>>>
>>>>>>
>>>>>>>> What do the .log files say about
>>>>>>>> OMP_NUM_THREADS, thread affinities, pinning, etc?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Each replica logs:
>>>>>>> "Using 1 MPI process
>>>>>>> Using 4 OpenMP threads",
>>>>>>> which is correct. As I said, the threads are forked, but 3 out of 4
>>>>>>> don't do anything, and the simulation does not progress at all.
>>>>>>>
>>>>>>> About affinities Gromacs says:
>>>>>>> "Can not set thread affinities on the current platform. On NUMA
>>>>>>> systems
>>>>>>> this
>>>>>>> can cause performance degradation. If you think your platform should
>>>>>>> support
>>>>>>> setting affinities, contact the GROMACS developers."
>>>>>>>
>>>>>>> Well, the "current platform" is a normal x86_64 cluster, but all the
>>>>>>> information about resources is passed by Torque to the OpenMPI-linked
>>>>>>> Gromacs. Can it be that mdrun sees the resources allocated by Torque
>>>>>>> as one big pool of CPUs and misses the information about node
>>>>>>> topology?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> mdrun gets its processor topology from the MPI layer, so that is where
>>>>>> you need to focus. The error message confirms that GROMACS sees things
>>>>>> that seem wrong.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thank you, I will take a look. But the first thing I want to do is to
>>>>> find the reason why Gromacs 4.6.3 is not able to run on my (slightly
>>>>> weird, I admit) setup, while 4.6.2 does it very well.
>>>>
>>>>
>>>>
>>>> 4.6.2 had a bug that inhibited any MPI-based mdrun from attempting to
>>>> set affinities. It's still not clear why ppn 12 worked at all.
>>>> Apparently mdrun was able to float some processes around to get
>>>> something that worked. The good news is that when you get it working
>>>> in 4.6.3, you will see a performance boost.
>>>>
>>>> Mark
>>>
>>>
>