[gmx-users] Efficiently running multiple simulations

Mark Abraham mark.j.abraham at gmail.com
Wed Sep 16 23:44:19 CEST 2015


Hi,

The log files tell you that you should compile for AVX2_256 SIMD for the
Haswell CPUs you have. Do that. Your runs are wasting a fair chunk of the
value of the CPU hardware, and your setup absolutely needs to extract every
last drop from the CPUs. That means you need to follow the instructions in
the GROMACS install guide, which suggest you use a recent compiler. Your
GROMACS was compiled with gcc 4.4.7, which was about two years old before a
Haswell was sold! Why HPC clusters buy the latest hardware and continue to
default to the "stable" 5-year-old compiler suite shipped with the
"enterprise" distribution remains a total mystery to me. :-)

The log file also says that your MPI system is starting four OpenMP threads
per rank in the multi-simulation case, so the comparison is not valid.
Starting 8*4 = 32 OpenMP threads on your node oversubscribes the actual
cores, and that is terrible for GROMACS. You need to find out how many
actual cores you have (each can run two hyperthreads, which are usually
worth using on such Haswell machines). You want either one thread per core
or two threads per core (try both). If you don't know how many actual
cores there are, consult your local docs/admins.
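
For example, on a node with 16 real cores (32 hyperthreads), the two
layouts worth timing for the 8-way multi-run would be something like this
(a sketch; substitute your actual counts):

mpirun -np 8 mdrun_mpi -multi 8 -ntomp 2 -pin on   # one thread per core
mpirun -np 8 mdrun_mpi -multi 8 -ntomp 4 -pin on   # two threads per core

Either way, ranks * threads should add up to the number of real cores or
the number of hardware threads, never more.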

"Mapping of GPUs to the 8 PP ranks in this node: #0, #1, #2, #3, #4, #5,
#6, #7" is actually correct and unambiguous. There's 8 simulations, each
with 1 domain, so 8 PP ranks, and each is mapped to one of 8 GPUs *in this
node*. You've been reading "node" and thinking "simulation."
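
If you want to make the assignment explicit anyway, you can pass one GPU
id per PP rank in the node, e.g.

mpirun -np 8 mdrun_mpi -multi 8 -ntomp 2 -pin on -gpu_id 01234567

which requests exactly the mapping the log already reports: rank 0 on GPU
0, rank 1 on GPU 1, and so on.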

Mark


On Wed, Sep 16, 2015 at 9:23 PM Zimmerman, Maxwell <mizimmer at wustl.edu>
wrote:

> Hi Mark,
>
> Here are two links to .log files for running 1 simulation on 1 GPU and 2
> CPUs and 8 simulations across all 8 GPUs and 16 CPUs respectively:
>
> https://www.dropbox.com/s/ko2l0qlr4kdpt51/md_1GPU.log?dl=0
> https://www.dropbox.com/s/chtcv4nqxof64p8/md_8GPUs.log?dl=0
>
> Regards,
> -Maxwell
>
>
> ________________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark
> Abraham <mark.j.abraham at gmail.com>
> Sent: Wednesday, September 16, 2015 1:39 PM
> To: gmx-users at gromacs.org; gromacs.org_gmx-users at maillist.sys.kth.se
> Subject: Re: [gmx-users] Efficiently running multiple simulations
>
> Hi,
>
> On Wed, Sep 16, 2015 at 5:46 PM Zimmerman, Maxwell <mizimmer at wustl.edu>
> wrote:
>
> > Hi Mark,
> >
> > Thank you for the feedback.
> >
> > To ensure that I am making a proper comparison, I tried running:
> > mpirun -np 1 mdrun_mpi -ntomp 2 -gpu_id 0 -pin on
> > and I still see the same pattern; running a single simulation with 1 GPU
> > and 2 CPUs performs nearly twice as well as running 8 simulations with
> > "-multi" using 8 GPUs and 16 CPUs.
> >
>
> OK. In that case, please share some links to .log files on a file-sharing
> service, so we might be able to see where the issue arises. The list does
> not accept attachments.
>
> > Just to clarify, when I use "-multi" all 8 of the .log files show that 8
> > GPUs are selected for the run. If a single GPU were being used, wouldn't it
> > only show mapping to one GPU ID per .log file?
> >
>
> I forget the details here, but organizing the mapping has to be done on a
> per-node basis. It would not surprise me if the reporting were not strictly
> valid on a per-simulation basis, but it ought to mention that the 8 GPUs
> are attributes of the node, and not necessarily of the simulation.
>
> There is absolutely no way that any simulation with a single domain can
> share 8 GPUs.
>
> Mark
>
>
> > Regards,
> > -Maxwell
> >
> > ________________________________________
> > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark
> > Abraham <mark.j.abraham at gmail.com>
> > Sent: Wednesday, September 16, 2015 10:08 AM
> > To: gmx-users at gromacs.org; gromacs.org_gmx-users at maillist.sys.kth.se
> > Subject: Re: [gmx-users] Efficiently running multiple simulations
> >
> > Hi,
> >
> >
> > On Wed, Sep 16, 2015 at 4:41 PM Zimmerman, Maxwell <mizimmer at wustl.edu>
> > wrote:
> >
> > > Hi Mark,
> > >
> > > Sorry for the confusion; what I meant to say was that each node on the
> > > cluster has 8 GPUs and 16 CPUs.
> > >
> >
> > OK. Please note that "CPU" is ambiguous, so you should prefer not to use
> > it without clarification.
> >
> > Unless the GPUs are weak and the CPU is strong, 2 CPU cores per GPU will
> > likely be under-powered for PME simulations in GROMACS.
> >
> > > When I attempt to specify the GPU IDs for running 8 simulations on a node
> > > using the "-multi" and "-gpu_id", each .log file has the following:
> > >
> > > "8 GPUs user-selected for this run.
> > > Mapping of GPUs to the 8 PP ranks in this node: #0, #1, #2, #3, #4, #5,
> > > #6, #7"
> > >
> > > This makes me think that each simulation is competing for each of the
> > > GPUs
> >
> >
> > You are running 8 simulations, each of which has a single domain, each of
> > which is mapped to a single PP rank, each of which is mapped to a
> > different single GPU. Perfect.
> >
> > > explaining my performance loss per simulation compared to running 1
> > > simulation on 1 GPU and 2 CPUs.
> >
> >
> > Very likely you are not comparing with what you think you are, e.g. you
> > need to compare with an otherwise empty node running something like
> >
> > mpirun -np 1 mdrun_mpi -ntomp 2 -gpu_id 0 -pin on
> >
> > so that you actually have a single process running on two pinned CPU
> > cores and a single GPU. This should be fairly comparable with the
> > mdrun -multi setup.
> >
> > A side-by-side diff of that log file and the log file of the 0th member
> > of the multi-sim should show very few differences until the simulation
> > starts, and comparable performance. If not, please share your .log files
> > on a file-sharing service.
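> >
> > For a quick look, something like the following is enough (the file names
> > here are placeholders for your actual logs):
> >
> > diff -y md_1GPU.log md0.log | less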
> >
> > > If this interpretation is correct, is there a better way to pin each
> > > simulation to a single GPU and 2 CPUs? If my interpretation is incorrect,
> > > is there a more efficient way to use the "-multi" option to match the
> > > performance I see of running a single simulation * 8?
> > >
> >
> > mdrun will handle all of that correctly if it hasn't been crippled by how
> > the MPI library has organized life. You want it to assign ranks to cores
> > that are close to each other and their matching GPU. That tends to be the
> > default behaviour, but clusters intended for node sharing can do weird
> > things. (It is not yet clear that any of this is a problem.)
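> >
> > If you suspect the MPI launcher itself is doing the pinning, you can
> > usually tell it to stand aside and let mdrun manage the affinity. With
> > OpenMPI, for example, something like (a sketch; the exact flag depends
> > on your MPI flavour and version):
> >
> > mpirun -np 8 --bind-to none mdrun_mpi -multi 8 -ntomp 2 -pin on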
> >
> > Mark
> >
> >
> > > Regards,
> > > -Maxwell
> > >
> > >
> > > ________________________________________
> > > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark
> > > Abraham <mark.j.abraham at gmail.com>
> > > Sent: Wednesday, September 16, 2015 3:52 AM
> > > To: gmx-users at gromacs.org; gromacs.org_gmx-users at maillist.sys.kth.se
> > > Subject: Re: [gmx-users] Efficiently running multiple simulations
> > >
> > > Hi,
> > >
> > > I'm confused by your description of the cluster as having 8 GPUs and 16
> > > CPUs. The relevant parameters are the number of GPUs and CPU cores per
> > > node. See the examples at
> > >
> > > http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-features.html#running-multi-simulations
> > >
> > > Mark
> > >
> > > On Tue, Sep 15, 2015 at 11:38 PM Zimmerman, Maxwell <mizimmer at wustl.edu>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > >
> > > > I am having some trouble efficiently running simulations in parallel
> > > > on a GPU cluster. The cluster has 8 GPUs and 16 CPUs. Currently, the
> > > > command that I am using is:
> > > >
> > > > mpirun -np 8 mdrun_mpi -multi 8 -nice 4 -s md -o md -c after_md -v -x
> > > > frame -pin on
> > > >
> > > >
> > > > Per-simulation, the performance I am getting with this command is
> > > > significantly lower than running 1 simulation that uses 1 GPU and 2
> > > > CPUs alone. This command seems to use all 8 GPUs and 16 CPUs on the
> > > > 8 parallel simulations, although I think this would be faster if I
> > > > could pin each simulation to a specific GPU and pair of CPUs. The
> > > > -gpu_id option does not seem to change anything when I am using
> > > > mpirun. Is there a way that I can efficiently run the 8 simulations
> > > > on the cluster by specifying the GPU and CPUs to run with each
> > > > simulation?
> > > >
> > > >
> > > > Thank you in advance!
> > > >
> > > >
> > > > Regards,
> > > >
> > > > -Maxwell