[gmx-users] using dual CPU's
Mark Abraham
mark.j.abraham at gmail.com
Mon Dec 10 22:53:44 CET 2018
Hi,
One of your reported runs only used six threads, by the way.
Something sensible can be said once we can see the performance report at the
end of the log file.
Mark
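[Editor's note: the numbers worth comparing live on the "Performance:" line near the end of each mdrun .log file. A minimal sketch for pulling them out, assuming the GROMACS 2018 log layout quoted below; the function name and sample text are illustrative:]

```python
import re

# Sketch: extract the final (ns/day, hour/ns) pair from an mdrun .log file
# so different launch configurations can be compared side by side.
# Assumes the "Performance:" line format of GROMACS 2018 logs.
def parse_performance(log_text):
    m = re.search(r"Performance:\s+([\d.]+)\s+([\d.]+)", log_text)
    if m is None:
        return None  # run may not have finished writing its report
    ns_per_day, hour_per_ns = map(float, m.groups())
    return (ns_per_day, hour_per_ns)

# Sample taken from the log excerpt quoted later in this thread.
sample = """                 (ns/day)    (hour/ns)
Performance:        2.935        8.176"""
print(parse_performance(sample))  # (2.935, 8.176)
```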
On Tue., 11 Dec. 2018, 01:25 p buscemi, <pbuscemi at q.com> wrote:
> Thank you, Mark, for the prompt response. I realize the limitations of the
> system (it's over 8 years old), but I did not expect the speed to decrease by
> 50% with 12 available threads! No combination of -ntmpi and -ntomp could raise
> ns/day above 4 with two GPUs, vs. 6 with one GPU.
>
> This is actually a learning/practice run for a new build - an AMD 4.2 GHz
> 32-core Threadripper with 64 GB RAM. In this case I am trying to decide between
> an RTX 2080 Ti and two GTX 1080 Tis. I'd prefer the two 1080s for their combined
> ~7000 cores vs the ~4500 cores of the 2080. The model systems will have about a
> million particles and need the speed. But this is a major expense, so I need to
> get it right.
> I'll do as you suggest and report the results for both systems and I
> really appreciate the assist.
> Paul
> UMN, BICB
>
> On Dec 9 2018, at 4:32 pm, paul buscemi <pbuscemi at q.com> wrote:
> >
> > Dear Users,
> > I have good luck using a single GPU with the basic setup.. However in
> going from one gtx 1060 to a system with two - 50,000 atoms - the rate
> decrease from 10 ns/day to 5 or worse. The system models a ligand, solvent
> ( water ) and a lipid membrane
> > the cpu is a 6 core intel i7 970( 12 threads ) , 750W PS, 16G Ram.
> > with the basic command " mdrun I get:
> > Back Off! I just backed up sys.nvt.log to ./#.sys.nvt.log.10#
> > Reading file SR.sys.nvt.tpr, VERSION 2018.3 (single precision)
> > Changing nstlist from 10 to 100, rlist from 1 to 1
> >
> > Using 2 MPI threads
> > Using 6 OpenMP threads per tMPI thread
> >
> > On host I7 2 GPUs auto-selected for this run.
> > Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
> > PP:0,PP:1
> >
> > Back Off! I just backed up SR.sys.nvt.trr to ./#SR.sys.nvt.trr.10#
> > Back Off! I just backed up SR.sys.nvt.edr to ./#SR.sys.nvt.edr.10#
> > NOTE: DLB will not turn on during the first phase of PME tuning
> > starting mdrun 'SR-TA'
> > 100000 steps, 100.0 ps.
> > and ending with ^C
> >
> > Received the INT signal, stopping within 200 steps
> >
> > Dynamic load balancing report:
> > DLB was locked at the end of the run due to unfinished PP-PME balancing.
> > Average load imbalance: 0.7%.
> > The balanceable part of the MD step is 46%, load imbalance is computed
> from this.
> > Part of the total run time spent waiting due to load imbalance: 0.3%.
> >
> >
> > Core t (s) Wall t (s) (%)
> > Time: 543.475 45.290 1200.0
> > (ns/day) (hour/ns)
> > Performance: 1.719 13.963 (before DLB is turned on)
> >
> > Very poor performance. I have been following - or trying to follow -
> "Performance Tuning and Optimization of GROMACS", M. Abraham and R. Apostolov,
> 2016, but have not yet cracked the code.
> > ----------------
> > gmx mdrun -deffnm SR.sys.nvt -ntmpi 2 -ntomp 3 -gpu_id 01 -pin on
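[Editor's note: only a couple of rank/thread splits were tried above. On a 12-thread, 2-GPU box it is usually worth sweeping a few combinations where ntmpi x ntomp equals the hardware thread count. A hypothetical sketch that only prints candidate command lines (it does not run them); the specific flag values are illustrative assumptions, not recommendations:]

```python
# Enumerate mdrun launch configurations to benchmark on a 6-core/12-thread
# CPU with two GPUs. Each entry keeps ntmpi * ntomp == 12; gpu_id assigns
# ranks to GPUs (e.g. "0011" puts two ranks on each GPU).
base = "gmx mdrun -deffnm SR.sys.nvt -pin on"
configs = [
    {"ntmpi": 1, "ntomp": 12, "gpu_id": "0"},    # one rank, one GPU only
    {"ntmpi": 2, "ntomp": 6,  "gpu_id": "01"},   # one rank per GPU
    {"ntmpi": 4, "ntomp": 3,  "gpu_id": "0011"}, # two ranks per GPU
]
for c in configs:
    print(f"{base} -ntmpi {c['ntmpi']} -ntomp {c['ntomp']} -gpu_id {c['gpu_id']}")
```

Comparing the "Performance:" line of each resulting log shows which split wins for this system.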
> >
> >
> > Back Off! I just backed up SR.sys.nvt.log to ./#SR.sys.nvt.log.13#
> > Reading file SR.sys.nvt.tpr, VERSION 2018.3 (single precision)
> > Changing nstlist from 10 to 100, rlist from 1 to 1
> >
> > Using 2 MPI threads
> > Using 3 OpenMP threads per tMPI thread
> >
> > On host I7 2 GPUs auto-selected for this run.
> > Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
> > PP:0,PP:1
> >
> > Back Off! I just backed up SR.sys.nvt.trr to ./#SR.sys.nvt.trr.13#
> > Back Off! I just backed up SR.sys.nvt.edr to ./#SR.sys.nvt.edr.13#
> > NOTE: DLB will not turn on during the first phase of PME tuning
> > starting mdrun 'SR-TA'
> > 100000 steps, 100.0 ps.
> >
> > NOTE: DLB can now turn on, when beneficial
> > ^C
> >
> > Received the INT signal, stopping within 200 steps
> >
> > Dynamic load balancing report:
> > DLB was off during the run due to low measured imbalance.
> > Average load imbalance: 0.7%.
> > The balanceable part of the MD step is 46%, load imbalance is computed
> from this.
> > Part of the total run time spent waiting due to load imbalance: 0.3%.
> >
> >
> > Core t (s) Wall t (s) (%)
> > Time: 953.837 158.973 600.0
> > (ns/day) (hour/ns)
> > Performance: 2.935 8.176
> >
> > ====================
> > the beginning of the log file is
> > GROMACS version: 2018.3
> > Precision: single
> > Memory model: 64 bit
> > MPI library: thread_mpi
> > OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support: CUDA
> > SIMD instructions: SSE4.1
> > FFT library: fftw-3.3.8-sse2
> > RDTSCP usage: enabled
> > TNG support: enabled
> > Hwloc support: disabled
> > Tracing support: disabled
> > Built on: 2018-10-19 21:26:38
> > Built by: pb at Q4 [CMAKE]
> > Build OS/arch: Linux 4.15.0-20-generic x86_64
> > Build CPU vendor: Intel
> > Build CPU brand: Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz
> > Build CPU family: 6 Model: 44 Stepping: 2
> > Build CPU features: aes apic clfsh cmov cx8 cx16 htt intel lahf mmx msr
> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
> sse4.2 ssse3
> > C compiler: /usr/bin/gcc-6 GNU 6.4.0
> > C compiler flags: -msse4.1 -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> > C++ compiler: /usr/bin/g++-6 GNU 6.4.0
> > C++ compiler flags: -msse4.1 -std=c++11 -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> > CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
> Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
> > CUDA compiler
> flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;;
> ;-msse4.1;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver: 9.10
> > CUDA runtime: 9.10
> >
> >
> > Running on 1 node with total 12 cores, 12 logical cores, 2 compatible
> GPUs
> > Hardware detected:
> > CPU info:
> > Vendor: Intel
> > Brand: Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz
> > Family: 6 Model: 44 Stepping: 2
> > Features: aes apic clfsh cmov cx8 cx16 htt intel lahf mmx msr
> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
> sse4.2 ssse3
> > Hardware topology: Only logical processor count
> > GPU info:
> > Number of GPUs detected: 2
> > #0: NVIDIA GeForce GTX 1060 6GB, compute cap.: 6.1, ECC: no, stat:
> compatible
> > #1: NVIDIA GeForce GTX 1060 6GB, compute cap.: 6.1, ECC: no, stat:
> compatible
> >
> >
> > There were no errors during the runs. Suggestions would be
> appreciated.
> > Regards
> > Paul
> >
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>