[gmx-users] Problems with REMD in Gromacs 4.6.3

Mark Abraham mark.j.abraham at gmail.com
Fri Jul 12 19:00:24 CEST 2013


On Fri, Jul 12, 2013 at 4:27 PM, gigo <gigo at ibb.waw.pl> wrote:
> Hi!
>
> On 2013-07-12 11:15, Mark Abraham wrote:
>>
>> What does --loadbalance do?
>
>
> It balances the total number of processes across all allocated nodes.

OK, but using it means you are hostage to its assumptions about balance.
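
If you want to see what layout --loadbalance actually gives you, a quick
check (untested sketch; it assumes the mpiexec in your job script is the
OpenMPI one and that you run it inside the same Torque allocation) is to
launch a trivial command with the same options and count ranks per node:

mpiexec -np 144 --loadbalance hostname | sort | uniq -c

With 48 twelve-core nodes and 4 OpenMP threads per replica, you want to
see exactly 3 ranks on every node.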

> The
> thing is that mpiexec does not know that I want each replica to fork into
> 4 OpenMP threads. Thus, without this option and without affinities (more on
> that in a second) mpiexec starts too many replicas on some nodes - GROMACS
> then complains about the overload - while some cores on other nodes are not
> used. It is possible to run my simulation like that:
>
> mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi (without
> --loadbalance for mpiexec and without -ntomp for mdrun)
>
> Then each replica runs on 4 MPI processes (I allocate 4 times more cores
> than replicas and mdrun sees it). The problem is that it is much slower than
> using OpenMP for each replica. I did not find any other way than
> --loadbalance in mpiexec and then -multi 144 -ntomp 4 in mdrun to use MPI
> and OpenMP at the same time on the Torque-controlled cluster.

That seems highly surprising. I have not yet encountered a job
scheduler that was completely lacking a "do what I tell you" layout
scheme. More importantly, why are you using #PBS -l nodes=48:ppn=12?
Surely you want 3 MPI processes per 12-core node?
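
For example, with your existing #PBS -l nodes=48:ppn=12 allocation,
something along these lines (untested, and assuming your OpenMPI 1.6.4
mpiexec supports the -npernode option) should place exactly three
4-thread replicas on each 12-core node without relying on --loadbalance:

mpiexec -np 144 -npernode 3 mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 -replex 2000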

>> What do the .log files say about
>> OMP_NUM_THREADS, thread affinities, pinning, etc?
>
>
> Each replica logs:
> "Using 1 MPI process
> Using 4 OpenMP threads",
> That is correct. As I said, the threads are forked, but 3 out of 4 don't
> do anything, and the simulation does not go at all.
>
> About affinities Gromacs says:
> "Can not set thread affinities on the current platform. On NUMA systems this
> can cause performance degradation. If you think your platform should support
> setting affinities, contact the GROMACS developers."
>
> Well, the "current platform" is a normal x86_64 cluster, but all the
> information about resources is passed by Torque to the OpenMPI-linked
> GROMACS. Can it be that mdrun sees the resources allocated by Torque as a
> big pool of CPUs and misses the information about node topology?

mdrun gets its processor topology from the MPI layer, so that is where
you need to focus. That message confirms that what GROMACS sees does
not look right.
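
As a workaround you can also try being explicit about affinities on both
sides. Another untested sketch, assuming your OpenMPI 1.6.4 supports
-npernode, -cpus-per-proc and -bind-to-core, combined with mdrun's -pin
option:

mpiexec -np 144 -npernode 3 -cpus-per-proc 4 -bind-to-core mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 -pin on -replex 2000

If mdrun still cannot set affinities itself, the same command with -pin
off at least lets the MPI layer keep each replica and its 4 threads on
their own 4 cores.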

Mark

>
> If you have any suggestions on how to debug or trace this issue, I would be
> glad to participate.
> Best,
>
> G
>
>>
>> Mark
>>
>> On Fri, Jul 12, 2013 at 3:46 AM, gigo <gigo at poczta.ibb.waw.pl> wrote:
>>>
>>> Dear GMXers,
>>> With GROMACS 4.6.2 I was running REMD with 144 replicas. The replicas were
>>> separate MPI jobs, of course (OpenMPI 1.6.4). I ran each replica on 4 cores
>>> with OpenMP. Torque is installed on the cluster, which is built of 12-core
>>> nodes, so I used the following script:
>>>
>>> #!/bin/tcsh -f
>>> #PBS -S /bin/tcsh
>>> #PBS -N test
>>> #PBS -l nodes=48:ppn=12
>>> #PBS -l walltime=300:00:00
>>> #PBS -l mem=288Gb
>>> #PBS -r n
>>> cd $PBS_O_WORKDIR
>>> mpiexec -np 144 --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4
>>> -replex 2000
>>>
>>> It was working just great with 4.6.2. It does not work with 4.6.3. The new
>>> version was compiled with the same options in the same environment. Mpiexec
>>> spreads the replicas evenly over the cluster. Each replica forks 4 threads,
>>> but only one of them uses any CPU. The logs end at the citations. Some empty
>>> energy and trajectory files are created, but nothing is written to them.
>>> Please let me know if you have any immediate suggestion on how to make it
>>> work (maybe based on some differences between the versions), or if I should
>>> file a bug report with all the technical details.
>>> Best Regards,
>>>
>>> Grzegorz Wieczorek
>>>
>


