[gmx-developers] gromacs 5.1rc1 OpenCL problem with Parrinello-Rahman

Carlo Camilloni carlo.camilloni at gmail.com
Thu Jul 16 18:52:08 CEST 2015


Hi, 

I tested the OpenCL kernels on my MacBook (NVIDIA GPU), and there they produce the correct forces.
So the problem could be specific to AMD + OS X, or perhaps to a particular compiler/OS version.

Carlo


> On 15 Jul 2015, at 17:42, Carlo Camilloni <carlo.camilloni at gmail.com> wrote:
> 
> Hi,
> 
> these are the tests that fail:
> 
> FAILED. Check checkpot.out (12 errors), checkforce.out (3516 errors) file(s) in dd121 for dd121
> FAILED. Check checkpot.out (10 errors), checkforce.out (4027 errors) file(s) in nbnxn-energy-groups for nbnxn-energy-groups
> FAILED. Check checkpot.out (26 errors), checkforce.out (2998 errors) file(s) in nbnxn-free-energy for nbnxn-free-energy
> FAILED. Check checkpot.out (26 errors), checkforce.out (2998 errors) file(s) in nbnxn-free-energy-vv for nbnxn-free-energy-vv
> FAILED. Check checkpot.out (11 errors), checkforce.out (4039 errors) file(s) in nbnxn-ljpme-geometric for nbnxn-ljpme-geometric
> FAILED. Check checkpot.out (14 errors), checkforce.out (52 errors) file(s) in nbnxn-ljpme-LB for nbnxn-ljpme-LB
> FAILED. Check checkpot.out (14 errors), checkforce.out (52 errors) file(s) in nbnxn-ljpme-LB-geometric for nbnxn-ljpme-LB-geometric
> FAILED. Check checkpot.out (10 errors), checkforce.out (4029 errors) file(s) in nbnxn-vdw-force-switch for nbnxn-vdw-force-switch
> FAILED. Check checkpot.out (10 errors), checkforce.out (4032 errors) file(s) in nbnxn-vdw-potential-switch for nbnxn-vdw-potential-switch
> FAILED. Check checkpot.out (4 errors), checkforce.out (250 errors) file(s) in nbnxn-vdw-potential-switch-argon for nbnxn-vdw-potential-switch-argon
> FAILED. Check checkpot.out (10 errors), checkforce.out (4027 errors) file(s) in nbnxn_pme for nbnxn_pme
> FAILED. Check checkpot.out (10 errors), checkforce.out (4027 errors) file(s) in nbnxn_pme_order5 for nbnxn_pme_order5
> FAILED. Check checkpot.out (10 errors), checkforce.out (4027 errors) file(s) in nbnxn_pme_order6 for nbnxn_pme_order6
> FAILED. Check checkpot.out (9 errors), checkforce.out (4028 errors) file(s) in nbnxn_rf for nbnxn_rf
> FAILED. Check checkpot.out (2 errors), checkforce.out (4 errors) file(s) in nbnxn_rzero for nbnxn_rzero
> FAILED. Check mdrun.out, md.log file(s) in nbnxn_vsite for nbnxn_vsite
> FAILED. Check checkpot.out (13 errors), checkforce.out (15512 errors) file(s) in octahedron for octahedron
> FAILED. Check mdrun.out, md.log file(s) in position-restraints for position-restraints
> FAILED. Check mdrun.out, md.log file(s) in pull_constraint for pull_constraint
> FAILED. Check checkpot.out (10 errors), checkforce.out (4021 errors) file(s) in pull_cylinder for pull_cylinder
> FAILED. Check checkpot.out (11 errors), checkforce.out (39054 errors) file(s) in swap_x for swap_x
> FAILED. Check checkpot.out (11 errors), checkforce.out (39053 errors) file(s) in swap_y for swap_y
> FAILED. Check checkpot.out (12 errors), checkforce.out (39054 errors) file(s) in swap_z for swap_z
> 23 out of 60 complex tests FAILED
> FAILED. Check mdrun.out, md.log file(s) in expanded for expanded
> FAILED. Check mdrun.out, md.log file(s) in transformAtoB for transformAtoB
> 2 out of 10 freeenergy tests FAILED
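
A list like this is easier to scan when sorted by the number of force errors. The following is only a minimal sketch, assuming the summary lines have exactly the layout quoted above; the names `FAILED_RE`, `parse_failures` and `SAMPLE` are made up for this illustration and are not part of the regressiontests package. Lines that only mention mdrun.out and md.log carry no error counts and are skipped.

```python
import re

# Layout assumed from the summary lines quoted above, e.g.
# "FAILED. Check checkpot.out (12 errors), checkforce.out (3516 errors) file(s) in dd121 for dd121"
FAILED_RE = re.compile(
    r"FAILED\. Check checkpot\.out \((\d+) errors\), "
    r"checkforce\.out \((\d+) errors\) file\(s\) in (\S+) for (\S+)"
)


def parse_failures(text):
    """Return a list of (test_name, pot_errors, force_errors) tuples.

    Lines without error counts (e.g. "Check mdrun.out, md.log ...") do not
    match the pattern and are ignored.
    """
    rows = []
    for line in text.splitlines():
        m = FAILED_RE.search(line)  # search() tolerates a leading "> " quote prefix
        if m:
            rows.append((m.group(3), int(m.group(1)), int(m.group(2))))
    return rows


# A few lines copied from the list above, including the mail quote prefix.
SAMPLE = """\
> FAILED. Check checkpot.out (12 errors), checkforce.out (3516 errors) file(s) in dd121 for dd121
> FAILED. Check checkpot.out (14 errors), checkforce.out (52 errors) file(s) in nbnxn-ljpme-LB for nbnxn-ljpme-LB
> FAILED. Check checkpot.out (2 errors), checkforce.out (4 errors) file(s) in nbnxn_rzero for nbnxn_rzero
> FAILED. Check mdrun.out, md.log file(s) in nbnxn_vsite for nbnxn_vsite
"""

if __name__ == "__main__":
    # Print the tests with the most force errors first.
    for name, pot, force in sorted(parse_failures(SAMPLE), key=lambda r: r[2], reverse=True):
        print(f"{name:20s} pot={pot:3d} force={force:6d}")
```

Sorting by the force-error count separates tests that are wrong almost everywhere from those with only a handful of deviating values.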
> 
> 
> Carlo
> 
> 
>> 
>> 
>> On Wed, 15 Jul 2015 15:35:13 +0000, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>> 
>> Hi,
>> 
>> Thanks. If a difference of that magnitude can be seen, then it should also
>> show up when running the regressiontests (e.g. cmake
>> -DREGRESSIONTEST_DOWNLOAD=on and then make check) as a failure
>> of complex/nbnxn-ljpme-LB (which is the only P-R test that can run on the
>> GPU). If other tests fail, then the problem is actually more widespread.
>> 
>> It may be that there is some issue with some part of the Mac+clang+OpenCL
>> stack - we didn't target it during development, and it was only at the last
>> minute that Erik was unexpectedly able to get it to compile. I don't know if
>> he got the tests to pass. Erik?
>> 
>> Mark
>> 
>> On Wed, Jul 15, 2015 at 5:22 PM Carlo Camilloni <carlo.camilloni at gmail.com>
>> wrote:
>> 
>>> 
>>> Dear Mark and Szilard,
>>> 
>>> thanks for your answer. I filed a bug in Redmine, but in the meantime I
>>> ran more tests and I am a bit scared by what I found.
>>> 
>>> I performed a single-step run with gmx51-rc1 compiled with CUDA (again
>>> with clang and so on) and compared the forces on the first step with and
>>> without -nb cpu (I am using -pforce 1). The forces are identical:
>>> 
>>> cuda-gpu
>>> 
>>> step 0  atom      1  x    3.940    5.612    2.226  force  1.90839e+03
>>> step 0  atom      2  x    3.852    5.659    2.211  force  4.24845e+02
>>> step 0  atom      3  x    3.979    5.665    2.303  force  6.89472e+02
>>> step 0  atom      4  x    3.992    5.610    2.139  force  7.42053e+02
>>> 
>>> 
>>> cpu:
>>> 
>>> step 0  atom      1  x    3.940    5.612    2.226  force  1.90839e+03
>>> step 0  atom      2  x    3.852    5.659    2.211  force  4.24845e+02
>>> step 0  atom      3  x    3.979    5.665    2.303  force  6.89472e+02
>>> step 0  atom      4  x    3.992    5.610    2.139  force  7.42053e+02
>>> 
>>> If I do the same test with the version compiled with OpenCL:
>>> 
>>> cpu:
>>> 
>>> (the earlier runs were done on my MacBook Pro (avx2_256), the ones below
>>> on a Mac Pro (avx_256); this should explain the small differences in the
>>> forces)
>>> 
>>> step 0  atom      1  x    3.940    5.612    2.226  force  1.90838e+03
>>> step 0  atom      2  x    3.852    5.659    2.211  force  4.24848e+02
>>> step 0  atom      3  x    3.979    5.665    2.303  force  6.89470e+02
>>> step 0  atom      4  x    3.992    5.610    2.139  force  7.42043e+02
>>> 
>>> opencl-gpu:
>>> step 0  atom      1  x    3.940    5.612    2.226  force  1.48597e+03
>>> step 0  atom      2  x    3.852    5.659    2.211  force  6.26942e+02
>>> step 0  atom      3  x    3.979    5.665    2.303  force  8.44032e+02
>>> step 0  atom      4  x    3.992    5.610    2.139  force  7.92786e+02
>>> 
>>> I am afraid there is something wrong in the OpenCL kernels.
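
The by-eye comparison above could also be scripted. What follows is a minimal sketch, assuming the force lines have exactly the layout quoted above (step, atom index, three coordinates, one force value). The function names and the 1e-3 relative tolerance are arbitrary choices made for this illustration, not anything defined by GROMACS.

```python
import re

# Layout assumed from the lines quoted above, e.g.
# "step 0  atom      1  x    3.940    5.612    2.226  force  1.90839e+03"
LINE_RE = re.compile(
    r"step\s+(\d+)\s+atom\s+(\d+)\s+x\s+(\S+)\s+(\S+)\s+(\S+)\s+force\s+(\S+)"
)


def parse_forces(text):
    """Return {(step, atom): force} for every matching line in text."""
    forces = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            forces[(int(m.group(1)), int(m.group(2)))] = float(m.group(6))
    return forces


def compare_forces(ref, test, rel_tol=1e-3):
    """Return [(key, ref_force, test_force, rel_diff)] for entries above rel_tol.

    rel_tol is an arbitrary illustrative threshold, chosen loose enough to
    ignore differences in the last printed digits.
    """
    mismatches = []
    for key in sorted(ref.keys() & test.keys()):
        f_ref, f_test = ref[key], test[key]
        rel = abs(f_test - f_ref) / max(abs(f_ref), 1e-30)
        if rel > rel_tol:
            mismatches.append((key, f_ref, f_test, rel))
    return mismatches


# Values copied from the OpenCL build quoted above (CPU run vs. GPU run).
CPU_OPENCL_BUILD = """\
step 0  atom      1  x    3.940    5.612    2.226  force  1.90838e+03
step 0  atom      2  x    3.852    5.659    2.211  force  4.24848e+02
step 0  atom      3  x    3.979    5.665    2.303  force  6.89470e+02
step 0  atom      4  x    3.992    5.610    2.139  force  7.42043e+02
"""
OPENCL_GPU = """\
step 0  atom      1  x    3.940    5.612    2.226  force  1.48597e+03
step 0  atom      2  x    3.852    5.659    2.211  force  6.26942e+02
step 0  atom      3  x    3.979    5.665    2.303  force  8.44032e+02
step 0  atom      4  x    3.992    5.610    2.139  force  7.92786e+02
"""

if __name__ == "__main__":
    cpu = parse_forces(CPU_OPENCL_BUILD)
    gpu = parse_forces(OPENCL_GPU)
    for (step, atom), f_ref, f_test, rel in compare_forces(cpu, gpu):
        print(f"step {step} atom {atom}: cpu={f_ref:.5e} gpu={f_test:.5e} rel={rel:.3f}")
```

With the four atoms quoted above, every OpenCL GPU force differs from its CPU counterpart by more than the tolerance, whereas the avx2_256 and avx_256 CPU values agree to within about 1e-5.
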
>>> 
>>> I am using the topol-nvt-nogen.tpr that I uploaded to Redmine.
>>> 
>>> Best,
>>> Carlo
>>> 
>>> 
>>> 
>>> --
>>> Gromacs Developers mailing list
>>> 
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
>>> posting!
>>> 
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>> 
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>>> or send a mail to gmx-developers-request at gromacs.org.
>>> 
> 


