[gmx-users] AMD 32 core TR

paul buscemi pbuscemi at q.com
Fri Jan 4 04:18:56 CET 2019


Tamas,  thanks for the response.

In previous posts I mentioned using a single GTX 1080 Ti; sorry for not making that clear in the last post.
 
On the 8-core AMD and a 6-core Intel I am running CUDA 10 with Gromacs 2018.3 with no issues.  I believe the larger factor in the slowness of the 32-core machine was having the CUDA 7 runtime installed alongside the CUDA 10 drivers.  On the 8-core, the CUDA 9.1 runtime and CUDA 10 drivers work well together, all with Gromacs 2018.3.  Now, with Gromacs 2019, CUDA 10, and the 410 NVIDIA drivers, the 8-core and 32-core systems seem quite content.

I have been tracking results from the log, and you are correct about what it can tell you.  It was the log file that actually drew my attention to the CUDA 7 runtime issue.  I also noted how the PP/PME distribution changed with the different -ntomp/-ntmpi arrangements, and I have been experimenting with those as suggested in the Gromacs acceleration hints.
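
A minimal Python sketch for pulling the ns/day figure out of an mdrun log when comparing settings. It assumes the log ends with a summary line of the form "Performance: <ns/day> <hour/ns>"; the function name and the sample text are hypothetical and only there for illustration.

```python
import re


def read_ns_per_day(log_text):
    """Return ns/day from the last 'Performance:' line, or None if absent."""
    # Assumed line format: "Performance:   <ns/day>   <hour/ns>"
    matches = re.findall(
        r"^Performance:\s+([0-9.]+)\s+([0-9.]+)",
        log_text,
        flags=re.MULTILINE,
    )
    if not matches:
        return None
    ns_per_day, _hours_per_ns = matches[-1]
    return float(ns_per_day)


# Made-up log tail for illustration only (not output from a real run).
sample_log = """\
                 (ns/day)    (hour/ns)
Performance:       57.000        0.421
"""

print(read_ns_per_day(sample_log))
```

Running this over the logs from several -ntomp/-ntmpi combinations gives a quick side-by-side comparison, provided each log really contains that summary line.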

By 10% I meant that the 32-core unit (in my hands) ran 10% faster in ns/day than the 8-core AMD using the same model system and the same 1080 Ti GPU.  Gromacs points out that 150k to 300k atom systems are on the rather small side, so one should not expect tremendous differences from the CPU.  The reason for using the 32-core is the eventual addition of a second GPU and the subsequent distribution of threads.

With a little OC and tweaking of the Fourier spacing and vdW cutoffs in the NPT run I edged the 137k atom ADH model to 57 ns/day, but this falls short of the Exxact Corp benchmarks of 80-90 ns/day, assuming they are using a 1080 Ti.  Schrodinger’s Maestro, with the 8-core AMD and 1080 Ti, runs a 300k membrane model at about 15 ns/day but a 60k atom model at 150 ns/day, which by linear scaling would imply 30 ns/day for 300k atoms.  In general, if I can indeed maintain 20-25 ns/day for 300k atoms I’d be satisfied.  The original posts were made because I was frustrated at seeing 6 to 8 ns/day with the 32-core machine while the 8-core was producing 20 ns/day.  As I mentioned, the wounds were self-inflicted, with the installation of the CUDA 7 runtime and, at one point, compilation with g++-5.  As far as I am concerned it’s imperative that the latest drivers and Gromacs versions be used, or at least that drivers and versions of the same generation be assembled.
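
The back-of-the-envelope comparisons above can be sketched in a few lines of Python. This assumes throughput scales inversely with atom count, which is only a rough approximation for real MD runs; the function names are hypothetical and the numbers are the ones quoted in this thread.

```python
def scale_ns_per_day(ns_per_day, atoms_measured, atoms_target):
    """Estimate ns/day at another system size.

    Assumption: cost per step is linear in the number of atoms,
    so ns/day is inversely proportional to atom count.
    """
    return ns_per_day * atoms_measured / atoms_target


def speedup(new_ns_per_day, old_ns_per_day):
    """Ratio of two throughputs (e.g. after vs. before a fix)."""
    return new_ns_per_day / old_ns_per_day


# 60k-atom model at 150 ns/day, scaled to 300k atoms
print(round(scale_ns_per_day(150, 60_000, 300_000), 1))   # 30.0
# 137k-atom ADH model at 57 ns/day, scaled to 300k atoms
print(round(scale_ns_per_day(57, 137_000, 300_000), 1))   # 26.0
# 137k-atom ADH model at 49.5 ns/day, scaled to 300k atoms
print(round(scale_ns_per_day(49.5, 137_000, 300_000), 1)) # 22.6
# 300k-atom NVT run: 8 ns/day before, 26 ns/day after the fix
print(round(speedup(26, 8), 2))                            # 3.25
```

Under that assumption the ADH figures extrapolate to roughly 23-26 ns/day at 300k atoms, which is in line with the 20-25 ns/day target mentioned above.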

Again, I’d like to point out that in using four different machines, 4 different Intel and AMD CPUs, 5 different motherboards, 5 different GPUs, now 4 successive versions of Gromacs, and model systems of 200-300k particles, I’ve not run across a single problem associated with the software or hardware per se; every problem was caused by my models or my compilation methods.

Hope this addresses your questions and helps any other users contemplating using a Ryzen TR.

Paul



> On Jan 3, 2019, at 2:09 PM, Tamas Hegedus <tamas at hegelab.org> wrote:
> 
> Please provide more information.
> 
> If you use gmx 2018, then I think it is gmx, not cuda 10, that limits the gcc version to 6.
> 
> You did not specify what type of and how many GPUs you use.
> 
> In addition, how gmx chose to distribute the computation could also be informative - you will find this info in the log file.
> 
> It is also not clear what you mean by a 10% improvement: 8 ns/day and 26 ns/day are the only numbers given, but they correspond to 3x faster simulations and not 1.1x.
> 
> In addition, I think if you have 49.5 ns/day for 137K atoms, then 26 ns/day seems to be ok for 300K.
> 
> Bests, Tamas
> 
> 
> On 1/3/19 6:11 PM, pbuscemi at q.com wrote:
>> Dear users,
>> 
>>  
>> I had trouble getting suitable performance from an AMD 32-core TR.  By
>> updating all the CUDA drivers and the runtime to v10, moving from gcc/g++-5
>> to gcc/g++-6 (I did try gcc-7, but CUDA 10 did not appreciate the attempt),
>> and in particular removing the CUDA v7 runtime, I was able to improve a
>> 300k-atom NVT run from 8 ns/day to 26 ns/day.  I replicated as far as
>> possible the Gromacs ADH benchmark with 137000 atoms (SPC/E).  I could
>> achieve 49.5 ns/day in the md run.  I do not have a firm grasp of whether
>> this is respectable or not (comments?), but it appears at least OK.  The
>> input command was simply mdrun ADH.md -nb gpu -pme gpu (without -ntomp or
>> -ntmpi, which in my hands degraded performance).  To run the ADH benchmark
>> I replaced the two ZN ions in the ADH file from the PDB (2ieh.pdb) with CA
>> ions, since ZN was not found in the OPLS database when using pdb2gmx.
>> 
>>  
>> The points being: (1) Gromacs appears reasonably happy with the 8-core and
>> 32-core Ryzen, although (again in my hands) for these smallish systems
>> there is only about a 10% improvement between the two, and (2), as often
>> suggested in the Gromacs literature, use the latest drivers possible.
>> 
>>  
>>  
> -- 
> Tamas Hegedus, PhD
> Senior Research Fellow
> MTA-SE Molecular Biophysics Research Group
> Hungarian Academy of Sciences  | phone: (36) 1-459 1500/60233
> Semmelweis University          | fax:   (36) 1-266 6656
> Tuzolto utca 37-47             | mailto:tamas at hegelab.org
> Budapest, 1094, Hungary        | http://www.hegelab.org
> 