[gmx-users] Problems with REMD in Gromacs 4.6.3
gigo at ibb.waw.pl
Sat Jul 13 02:24:04 CEST 2013
On 2013-07-12 20:00, Mark Abraham wrote:
> On Fri, Jul 12, 2013 at 4:27 PM, gigo <gigo at ibb.waw.pl> wrote:
>> On 2013-07-12 11:15, Mark Abraham wrote:
>>> What does --loadbalance do?
>> It balances the total number of processes across all allocated nodes.
> OK, but using it means you are hostage to its assumptions about
That's true, but as long as I do not try to use more resources than
Torque gives me, everything is OK. The question is: what is the proper
way of running multiple simulations in parallel with MPI, each further
parallelized with OpenMP, when pinning fails? I could not find any
>> thing is that mpiexec does not know that I want each replica to fork
>> to 4 OpenMP threads. Thus, without this option and without affinities
>> (in a sec about it) mpiexec starts too many replicas on some nodes -
>> gromacs complains about the overload then - while some cores on other
>> nodes are not used. It is possible to run my simulation like that:
>> mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi (without
>> --loadbalance for mpiexec and without -ntomp for mdrun)
>> Then each replica runs on 4 MPI processes (I allocate 4 times more
>> processes than replicas and mdrun sees it). The problem is that it is
>> much slower than using OpenMP for each replica. I did not find any
>> other way than --loadbalance in mpiexec and then -multi 144 -ntomp 4
>> in mdrun to use MPI and OpenMP at the same time on the
>> torque-controlled cluster.
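The pure-MPI fallback quoted above can be sketched as a single command line. This is an untested sketch: the rank count of 576 (144 replicas x 4 ranks, matching the allocation described later in the thread) and the standard `mpiexec -n` option are my assumptions, not part of the original command:

```shell
# Untested sketch: pure-MPI REMD launch with 4 MPI ranks per replica
# (144 replicas x 4 ranks = 576 ranks). mdrun -multi splits the ranks
# evenly between the replicas, so each replica gets 4-way domain
# decomposition instead of 4 OpenMP threads.
mpiexec -n 576 mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi
```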
> That seems highly surprising. I have not yet encountered a job
> scheduler that was completely lacking a "do what I tell you" layout
> scheme. More importantly, why are you using #PBS -l nodes=48:ppn=12?
I think that Torque is very similar to all PBS-like resource managers
in this regard. It actually does what I tell it to do. There are 12-core
nodes, I ask for 48 of them - I get them (a simple #PBS -l ncpus=576
does not work), end of story. Now, the program that I run is responsible
for populating the resources that I got.
> Surely you want 3 MPI processes per 12-core node?
Yes - I want each node to run 3 MPI processes. Preferably, I would like
to run each MPI process on a separate node (spread over 12 cores with
OpenMP), but I would not get that many resources. But again, without the
--loadbalance hack I would not be able to properly populate the nodes...
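For reference, the hybrid MPI+OpenMP layout described above can be sketched as a Torque job script. This is an untested sketch assembled from the options quoted in this thread; the explicit `-n 144` is an assumption (standard mpiexec usage), not from the original command:

```shell
#!/bin/bash
# Untested sketch of the hybrid layout discussed in this thread.
#PBS -l nodes=48:ppn=12

cd $PBS_O_WORKDIR

# 144 replicas x 4 OpenMP threads = 576 cores = 48 nodes x 12 cores,
# i.e. 3 MPI ranks per 12-core node.
export OMP_NUM_THREADS=4

# --loadbalance spreads the 144 ranks evenly over the allocated nodes;
# -multi runs one replica per rank, -ntomp forks 4 threads per replica.
mpiexec --loadbalance -n 144 \
    mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -ntomp 4 -cpi
```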
>>> What do the .log files say about
>>> OMP_NUM_THREADS, thread affinities, pinning, etc?
>> Each replica logs:
>> "Using 1 MPI process
>> Using 4 OpenMP threads",
>> That is correct. As I said, the threads are forked, but 3 out of 4
>> do not do anything, and the simulation does not progress at all.
>> About affinities Gromacs says:
>> "Can not set thread affinities on the current platform. On NUMA
>> systems this can cause performance degradation. If you think your
>> platform should support setting affinities, contact the GROMACS
>> developers."
>> Well, the "current platform" is a normal x86_64 cluster, but the
>> whole information about resources is passed by Torque to
>> OpenMPI-linked Gromacs.
>> Can it be that mdrun sees the resources allocated by Torque as a big
>> pool of CPUs and misses the information about node topology?
> mdrun gets its processor topology from the MPI layer, so that is where
> you need to focus. The error message confirms that GROMACS sees things
> that seem wrong.
Thank you, I will take a look. But the first thing I want to do is to
find out why Gromacs 4.6.3 is not able to run on my (slightly weird, I
admit) setup, while 4.6.2 does it very well.
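While investigating, one hedged thing to try is bypassing the failing automatic affinity detection by asking mdrun to pin threads explicitly; mdrun 4.6 has a `-pin` option for this. Whether forcing it helps on this cluster is an open question, so this is a diagnostic sketch only:

```shell
# Diagnostic sketch (untested): force mdrun's internal thread pinning
# instead of relying on the topology reported through the MPI layer.
export OMP_NUM_THREADS=4
mpiexec --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 \
    -ntomp 4 -pin on -cpi
```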