[gmx-users] Efficiently running multiple simulations
Mark Abraham
mark.j.abraham at gmail.com
Wed Sep 16 20:39:43 CEST 2015
Hi,
On Wed, Sep 16, 2015 at 5:46 PM Zimmerman, Maxwell <mizimmer at wustl.edu>
wrote:
> Hi Mark,
>
> Thank you for the feedback.
>
> To ensure that I am making a proper comparison, I tried running:
> mpirun -np 1 mdrun_mpi -ntomp 2 -gpu_id 0 -pin on
> and I still see the same pattern; running a single simulation with 1 GPU
> and 2 CPUs performs nearly twice as well as running 8 simulations with
> "-multi" using 8 GPUs and 16 CPUs.
>
OK. In that case, please share some links to .log files on a file-sharing
service, so we might be able to see where the issue arises. The list does
not accept attachments.
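For a quick first look before uploading anything, the throughput summary near
the end of each .log file can be compared directly; a minimal sketch, with the
file names standing in for wherever your logs actually are:
grep -H "Performance:" single_run/md.log multi_run/md0.log multi_run/md1.log
Each matching line reports ns/day for that run, so the per-simulation numbers
from the multi-run can be lined up against the single-GPU run.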
> Just to clarify, when I use "-multi" all 8 of the .log files show that 8
> GPUs are selected for the run. If a single GPU were being used, wouldn't it
> only show mapping to one GPU ID per .log file?
>
I forget the details here, but organizing the mapping has to be done on a
per-node basis. It would not surprise me if the reporting is not strictly
valid on a per-simulation basis, but it ought to mean that the 8 GPUs are
assets of the node, and not necessarily of the simulation.
There is absolutely no way that any simulation with a single domain can
share 8 GPUs.
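If you want to make the intended mapping explicit anyway, as far as I recall
-gpu_id takes one digit per PP rank on the node, so a sketch for your case
(the other flags copied from your own command, adjust as needed) would be:
mpirun -np 8 mdrun_mpi -multi 8 -gpu_id 01234567 -pin on -s md
which puts simulation 0 on GPU 0, simulation 1 on GPU 1, and so on. The log
lines you quoted look consistent with that mapping already being in effect.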
Mark
> Regards,
> -Maxwell
>
> ________________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark
> Abraham <mark.j.abraham at gmail.com>
> Sent: Wednesday, September 16, 2015 10:08 AM
> To: gmx-users at gromacs.org; gromacs.org_gmx-users at maillist.sys.kth.se
> Subject: Re: [gmx-users] Efficiently running multiple simulations
>
> Hi,
>
>
> On Wed, Sep 16, 2015 at 4:41 PM Zimmerman, Maxwell <mizimmer at wustl.edu>
> wrote:
>
> > Hi Mark,
> >
> > Sorry for the confusion, what I meant to say was that each node on the
> > cluster has 8 GPUs and 16 CPUs.
> >
>
> OK. Please note that "CPU" is ambiguous, so you should prefer not to use it
> without clarification.
>
> Unless the GPUs are weak and the CPU is strong, 2 CPU cores per GPU will
> likely be under-powered for PME simulations in GROMACS.
>
> > When I attempt to specify the GPU IDs for running 8 simulations on a node
> > using the "-multi" and "-gpu_id", each .log file has the following:
> >
> > "8 GPUs user-selected for this run.
> > Mapping of GPUs to the 8 PP ranks in this node: #0, #1, #2, #3, #4, #5,
> > #6, #7"
> >
> > This makes me think that each simulation is competing for each of the
> > GPUs
>
>
> You are running 8 simulations, each of which has a single domain, each of
> which is mapped to a single PP rank, each of which is mapped to a different
> single GPU. Perfect.
>
> > explaining my performance loss per simulation compared to running 1
> > simulation on 1 GPU and 2 CPUs.
>
>
> Very likely you are not comparing with what you think you are, e.g. you
> need to compare with an otherwise empty node running something like
>
> mpirun -np 1 mdrun_mpi -ntomp 2 -gpu_id 0 -pin on
>
> so that you actually have a single process running on two pinned CPU cores
> and a single GPU. This should be fairly comparable with the mdrun -multi
> setup.
>
> A side-by-side diff of that log file and the log file of the 0th member of
> the multi-sim should show very few differences until the simulation starts,
> and comparable performance. If not, please share your .log files on a
> file-sharing service.
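> As a sketch of that comparison (the paths are placeholders for wherever the
> two logs actually live):
>
> diff single_run/md.log multi_run/md0.log | head -n 100
>
> The headers should agree on the hardware detected, the number of ranks and
> threads, the pinning, and the GPU assigned to the rank; a large difference
> there would be the first thing to look at.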
>
> > If this interpretation is correct, is there a better way to pin each
> > simulation to a single GPU and 2 CPUs? If my interpretation is incorrect,
> > is there a more efficient way to use the "-multi" option to match the
> > performance I see of running a single simulation * 8?
> >
>
> mdrun will handle all of that correctly if it hasn't been crippled by how
> the MPI library has organized life. You want it to assign ranks to cores
> that are close to each other and their matching GPU. That tends to be the
> default behaviour, but clusters intended for node sharing can do weird
> things. (It is not yet clear that any of this is a problem.)
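> If you ever decide to skip -multi and launch eight independent single-rank
> runs by hand, mdrun's affinity options can do the pinning explicitly. A rough
> sketch (the offsets, GPU ids and .tpr names are illustrative, and -pinstride
> may also matter on a node with hyperthreading):
>
> mpirun -np 1 mdrun_mpi -ntomp 2 -pin on -pinoffset 0 -gpu_id 0 -s md0 &
> mpirun -np 1 mdrun_mpi -ntomp 2 -pin on -pinoffset 2 -gpu_id 1 -s md1 &
>
> and so on for the remaining six, incrementing -pinoffset by two cores each
> time. But -multi should already be doing the equivalent for you.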
>
> Mark
>
>
> > Regards,
> > -Maxwell
> >
> >
> > ________________________________________
> > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark
> > Abraham <mark.j.abraham at gmail.com>
> > Sent: Wednesday, September 16, 2015 3:52 AM
> > To: gmx-users at gromacs.org; gromacs.org_gmx-users at maillist.sys.kth.se
> > Subject: Re: [gmx-users] Efficiently running multiple simulations
> >
> > Hi,
> >
> > I'm confused by your description of the cluster as having 8 GPUs and 16
> > CPUs. The relevant parameters are the number of GPUs and CPU cores per
> > node. See the examples at
> >
> >
> > http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-features.html#running-multi-simulations
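> >
> > The basic pattern there is one (or more) MPI ranks per simulation, roughly
> > like this sketch (the directory names are placeholders, each holding its
> > own input files):
> >
> > mpirun -np 8 mdrun_mpi -multidir sim0 sim1 sim2 sim3 sim4 sim5 sim6 sim7 -s md
> >
> > or the numbered-.tpr -multi form you are already using.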
> >
> > Mark
> >
> > On Tue, Sep 15, 2015 at 11:38 PM Zimmerman, Maxwell <mizimmer at wustl.edu>
> > wrote:
> >
> > > Hello,
> > >
> > >
> > > I am having some troubles efficiently running simulations in parallel
> > > on a GPU cluster. The cluster has 8 GPUs and 16 CPUs. Currently, the
> > > command that I am using is:
> > >
> > >
> > > mpirun -np 8 mdrun_mpi -multi 8 -nice 4 -s md -o md -c after_md -v -x
> > > frame -pin on
> > >
> > >
> > > Per-simulation, the performance I am getting with this command is
> > > significantly lower than running 1 simulation that uses 1 GPU and 2
> > > CPUs alone. This command seems to use all 8 GPUs and 16 CPUs across the
> > > 8 parallel simulations, although I think this would be faster if I
> > > could pin each simulation to a specific GPU and pair of CPUs. The
> > > -gpu_id option does not seem to change anything when I am using mpirun.
> > > Is there a way that I can efficiently run the 8 simulations on the
> > > cluster by specifying the GPU and CPUs to run with each simulation?
> > >
> > >
> > > Thank you in advance!
> > >
> > >
> > > Regards,
> > >
> > > -Maxwell