[gmx-developers] threads are now ON by default

Sander Pronk pronk at cbr.su.se
Mon Feb 15 14:31:26 CET 2010


I've looked at it, and there were two different problems: first, my implementation of MPI_Waitall() had a potential deadlock when called with 0 MPI_Requests (I've fixed that); second, there's a deadlock in src/kernel/md.c involving the broadcast of the decision of when to checkpoint. 
I've talked to Berk, and he's working on fixing that.
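
A minimal sketch of the invariant behind the md.c deadlock, assuming POSIX threads stand in for thread-MPI ranks. Whether to checkpoint at a given step has to be decided once and shared by every rank, because writing the checkpoint goes through a collective gather (write_traj → dd_collect_state → dd_gather → tMPI_Gather in Alexey's backtrace, with bCPT=1). If ranks decided independently, for example from their own wall clocks, they could reach the collective at different steps. In this sketch rank 0 decides, and a pthread barrier plays the role of both the broadcast and the gather. All names (local_wants_checkpoint, agree_on_checkpoint, rank_main, run_ranks) are hypothetical; this is not the actual GROMACS fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define NRANKS 4
#define NSTEPS 20

static pthread_barrier_t barrier;  /* shared by all "ranks" */
static int cpt_decision;           /* written only by rank 0 */
static int ncheckpoints[NRANKS];   /* each rank writes only its own slot */

/* Hypothetical local criterion that deliberately differs between ranks,
   mimicking ranks that read slightly different wall-clock times. */
static int local_wants_checkpoint(int rank, long step)
{
    return (step + rank) % 10 == 0;
}

/* Rank 0 decides and publishes its decision; every rank returns the same value. */
static int agree_on_checkpoint(int rank, long step)
{
    if (rank == 0) {
        cpt_decision = local_wants_checkpoint(0, step);
    }
    pthread_barrier_wait(&barrier);  /* "broadcast": rank 0's write is now visible */
    int decision = cpt_decision;
    pthread_barrier_wait(&barrier);  /* stop rank 0 overwriting it before all have read */
    return decision;
}

static void *rank_main(void *arg)
{
    int rank = *(const int *)arg;
    for (long step = 0; step < NSTEPS; step++) {
        /* Calling local_wants_checkpoint(rank, step) directly here would let
           ranks enter the collective at different steps, the kind of mismatch
           that left rank 0 waiting in tMPI_Gather. */
        if (agree_on_checkpoint(rank, step)) {
            ncheckpoints[rank]++;
            pthread_barrier_wait(&barrier);  /* stands in for the collective gather */
        }
    }
    return NULL;
}

/* Runs NRANKS threads through NSTEPS steps. */
static void run_ranks(void)
{
    pthread_t threads[NRANKS];
    int ids[NRANKS];
    int rc = pthread_barrier_init(&barrier, NULL, NRANKS);
    assert(rc == 0);
    for (int i = 0; i < NRANKS; i++) {
        ids[i] = i;
        rc = pthread_create(&threads[i], NULL, rank_main, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NRANKS; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_barrier_destroy(&barrier);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is set */
}
```

Calling run_ranks() from main() (linking with -pthread) leaves ncheckpoints[r] == 2 for every rank, at steps 0 and 10, because only rank 0's criterion is ever consulted.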

Sander



On 13 Feb 2010, at 14:08, Alexey Shvetsov wrote:

> Hi,
> Yes, it's in the same place. It looks like there is some kind of race condition.
> Input files available via ftp://alexxy.gentoo.ru/pub/gmx/
> 
> speptide.tar.bz2 contains a finished run with md (leap-frog)
> speptide-vv.tar.bz2 contains a crashed run with md-vv
> I get a deadlock near step 14k of the md run without restraints
> 
> On Saturday 13 February 2010 00:23:51 Michael Shirts wrote:
>> Hi, Alexey-
>> 
>> Thanks for tracking this down.  md-vv is still getting the kinks
>> worked out.  Is this in the same place as the bug you were seeing a
>> couple of days ago, or a different place?
>> 
>> Sander, perhaps you could check for non-thread-safe code (it looks like
>> it's in write_traj), since you're a bit more familiar with it -- if you can't
>> see it quickly, please let me know, and I'll try to track it down!
>> 
>> Best,
>> Michael
>> 
>>> Date: Fri, 12 Feb 2010 23:21:03 +0300
>>> From: Alexey Shvetsov <alexxyum at gmail.com>
>>> Subject: Re: [gmx-developers] threads are now ON by default
>>> To: Discussion list for GROMACS development
>>>       <gmx-developers at gromacs.org>
>>> Message-ID: <201002122321.15014.alexxyum at gmail.com>
>>> Content-Type: text/plain; charset="utf-8"
>>> 
>>> On Friday 12 February 2010 19:32:46 Sander Pronk wrote:
>>>> Now that the last issues have been resolved with the threading code,
>>>> thread-based parallelization has been turned on by default. To disable
>>>> all the threading code, use
>>>> 
>>>> --disable-threads
>>>> 
>>>> in configure, or turn the option GMX_THREADS off with ccmake.
>>>> 
>>>> Running mdrun with just one thread (the default) is almost exactly the
>>>> same as running it without threading code: the only thing that's
>>>> different is that the few remaining global variables are protected by
>>>> mutexes.
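
A minimal sketch of what "protected by mutexes" means for one remaining global, written with POSIX threads directly. global_counter, increment_global, worker and run_workers are illustrative names, not GROMACS identifiers, and the real code may go through its thread_mpi wrappers rather than this exact pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64

/* Hypothetical global, standing in for one of the remaining globals. */
static long global_counter = 0;
static pthread_mutex_t global_counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every access to the global goes through the mutex. With a single thread
   the lock is never contended; in this sketch the lock/unlock pair is the
   only extra work compared with a build without threads. */
static void increment_global(void)
{
    pthread_mutex_lock(&global_counter_lock);
    global_counter++;
    pthread_mutex_unlock(&global_counter_lock);
}

static void *worker(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++) {
        increment_global();
    }
    return NULL;
}

/* Starts nthreads workers that each increment the global n times and
   returns the final value; without the mutex, updates could be lost. */
static long run_workers(int nthreads, int n)
{
    pthread_t threads[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    global_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &n);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], NULL);  /* join also makes the final value visible */
    }
    return global_counter;
}
```

With the mutex in place, run_workers(4, 100000) returns exactly 400000 regardless of how the threads interleave.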
>>>> 
>>>> Performance-wise, mdrun runs very slightly faster with threads than with
>>>> OpenMPI when Nthreads<=Ncores (and there are no other processes on the
>>>> computer). When Nthreads>Ncores (or other processes are running), the
>>>> thread code is much faster than OpenMPI, but the total runtime is still
>>>> longer than when Nthreads==Ncores.
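
The measurements above separate the Nthreads<=Ncores case from the oversubscribed one. A minimal sketch, assuming a Linux/glibc system, of how a launcher could keep its thread count within the number of online CPUs. online_cpus, choose_nthreads and the "0 means one thread per CPU" convention are hypothetical, not mdrun options. Note that _SC_NPROCESSORS_ONLN is a common extension rather than strict POSIX, and it counts logical CPUs, so with hyper-threading it can exceed the number of physical cores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Number of online logical CPUs; falls back to 1 if sysconf cannot tell. */
static long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Caps a requested thread count at ncpus so the run stays in the
   Nthreads <= Ncores regime; a request of 0 or less (hypothetical
   convention) means "one thread per CPU". */
static int choose_nthreads(int requested, long ncpus)
{
    assert(ncpus >= 1);
    if (requested <= 0 || requested > ncpus) {
        return (int)ncpus;
    }
    return requested;
}
```

For example, choose_nthreads(8, online_cpus()) on a four-core machine such as the Core i5 750 yields 4.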
>>>> 
>>>> If there are any problems getting things running, or with performance,
>>>> I'd very much like to hear about it.
>>>> 
>>>> Sander
>>> 
>>> Good news,
>>> but it looks like I get a deadlock when running GROMACS with 4 threads (Intel
>>> Core i5 750) with the md-vv integrator. I can share all the input files. The same
>>> system with almost the same input parameters, except for the integrator (md), runs
>>> fine with 4 threads. Backtrace:
>>> 0x00002b9c9fd9dd87 in sched_yield () at ../sysdeps/unix/syscall-template.S:82
>>> 82      ../sysdeps/unix/syscall-template.S: No such file or directory.
>>>         in ../sysdeps/unix/syscall-template.S
>>> (gdb) bt
>>> #0  0x00002b9c9fd9dd87 in sched_yield () at ../sysdeps/unix/syscall-template.S:82
>>> #1  0x00002b9c9f5aeb5d in tMPI_Gather (sendbuf=0x4, sendcount=<value optimized out>, sendtype=<value optimized out>, recvbuf=<value optimized out>, recvcount=<value optimized out>, recvtype=<value optimized out>, root=0, comm=0xee2160)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/gmxlib/thread_mpi/gather.c:98
>>> #2  0x00002b9c9f17885e in dd_gather (dd=<value optimized out>, nbytes=1607464840, src=0x1290748, dest=0xffffffffffffffff)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec_network.c:233
>>> #3  0x00002b9c9f16f841 in dd_collect_cg (dd=<value optimized out>, state_local=0x12d3600, lv=<value optimized out>, v=0x2b9cb4001010)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec.c:1263
>>> #4  dd_collect_vec (dd=<value optimized out>, state_local=0x12d3600, lv=<value optimized out>, v=0x2b9cb4001010)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec.c:1420
>>> #5  0x00002b9c9f170df9 in dd_collect_state (dd=0x1287df0, state_local=0x12d3600, state=0x2b9ca83782b0)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec.c:1474
>>> #6  0x00002b9c9f1d0b7b in write_traj (fplog=<value optimized out>, cr=0x2b9ca8377ad0, fp_trn=<value optimized out>, bX=-1, bV=8, bF=0, fp_xtc=-1, bXTC=0,
>>>   xtc_prec=1000, fn_cpt=0xee2490 "speptide.md.cpt", bCPT=1, top_global=0x2b9ca83780b0, eIntegrator=10, simulation_part=1, step=14690,
>>>   t=29.379999999999999, state_local=0x12d3600, state_global=0x2b9ca83782b0, f_local=0x12d8b00, f_global=0x2b9cb4c02010, n_xtc=0x7fff5fd02398, x_xtc=0x7fff5fd02320) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/stat.c:473
>>> #7  0x0000000000414318 in do_md (fplog=<value optimized out>, cr=<value optimized out>, nfile=<value optimized out>, fnm=<value optimized out>,
>>>   oenv=<value optimized out>, bVerbose=<value optimized out>, bCompact=1, nstglobalcomm=1, vsite=0x0, constr=0x129b5d0, stepout=100, ir=0x2b9ca8377b40, top_global=0x2b9ca83780b0, fcd=0x1287c60, state_global=0x2b9ca83782b0, mdatoms=0x1296de0, nrnb=0x1290b90, wcycle=0x1290800, ed=0x0, fr=0x1291200, repl_ex_nst=0, repl_ex_seed=-1, cpt_period=<value optimized out>, max_hours=<value optimized out>, Flags=7168, runtime=0x7fff5fd02710) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/md.c:1943
>>> #8  0x000000000040f155 in mdrunner (fplog=0xee1c50, cr=0x2b9ca8377ad0, nfile=<value optimized out>, fnm=<value optimized out>, oenv=<value optimized out>,
>>>   bVerbose=<value optimized out>, bCompact=1, nstglobalcomm=-1, ddxyz=0x7fff5fd02844, dd_node_order=1, rdd=<value optimized out>,
>>>   rconstr=<value optimized out>, dddlb_opt=0x41f64d "auto", dlb_scale=<value optimized out>, ddcsx=0x0, ddcsy=0x0, ddcsz=0x0, nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_seed=-1, pforce=<value optimized out>, cpt_period=<value optimized out>, max_hours=<value optimized out>, Flags=7168) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/runner.c:669
>>> #9  0x0000000000410168 in mdrunner_start_fn (arg=<value optimized out>)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/runner.c:170
>>> #10 0x00002b9c9f5af8d4 in tMPI_Thread_starter (arg=<value optimized out>)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/gmxlib/thread_mpi/tmpi_init.c:360
>>> #11 0x00002b9c9f5afc04 in tMPI_Init_fn (N=19466056, start_function=<value optimized out>, arg=<value optimized out>)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/gmxlib/thread_mpi/tmpi_init.c:472
>>> #12 0x000000000040ff19 in mdrunner_threads (nthreads=4, fplog=<value optimized out>, cr=<value optimized out>, nfile=<value optimized out>,
>>>   fnm=<value optimized out>, oenv=<value optimized out>, bVerbose=1, bCompact=1, nstglobalcomm=-1, ddxyz=0x7fff5fd04b10, dd_node_order=1,
>>>   rdd=<value optimized out>, rconstr=<value optimized out>, dddlb_opt=0x41f64d "auto", dlb_scale=<value optimized out>, ddcsx=0x0, ddcsy=0x0, ddcsz=0x0,
>>>   nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_seed=-1, pforce=<value optimized out>, cpt_period=<value optimized out>,
>>>   max_hours=<value optimized out>, Flags=7168) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/runner.c:238
>>> #13 0x0000000000419a1b in main (argc=6, argv=0x7fff5fd04ce8) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/mdrun.c:519
>>> Current language:  auto
>>> The current source language is "auto; currently asm".
>>> (gdb) up
>>> #1  0x00002b9c9f5aeb5d in tMPI_Gather (sendbuf=0x4, sendcount=<value optimized out>, sendtype=<value optimized out>, recvbuf=<value optimized out>, recvcount=<value optimized out>, recvtype=<value optimized out>, root=0, comm=0xee2160)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/gmxlib/thread_mpi/gather.c:98
>>> 98                      TMPI_YIELD_WAIT(cur);
>>> Current language:  auto
>>> The current source language is "auto; currently c".
>>> (gdb) up
>>> #2  0x00002b9c9f17885e in dd_gather (dd=<value optimized out>, nbytes=1607464840, src=0x1290748, dest=0xffffffffffffffff)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec_network.c:233
>>> 233         MPI_Gather(src,nbytes,MPI_BYTE,
>>> (gdb) up
>>> #3  0x00002b9c9f16f841 in dd_collect_cg (dd=<value optimized out>, state_local=0x12d3600, lv=<value optimized out>, v=0x2b9cb4001010)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec.c:1263
>>> 1263        dd_gather(dd,2*sizeof(int),buf2,ibuf);
>>> (gdb) up
>>> #4  dd_collect_vec (dd=<value optimized out>, state_local=0x12d3600, lv=<value optimized out>, v=0x2b9cb4001010)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec.c:1420
>>> 1420        dd_collect_cg(dd,state_local);
>>> (gdb) up
>>> #5  0x00002b9c9f170df9 in dd_collect_state (dd=0x1287df0, state_local=0x12d3600, state=0x2b9ca83782b0)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/domdec.c:1474
>>> 1474                  dd_collect_vec(dd,state_local,state_local->x,state->x);
>>> (gdb) up
>>> #6  0x00002b9c9f1d0b7b in write_traj (fplog=<value optimized out>, cr=0x2b9ca8377ad0, fp_trn=<value optimized out>, bX=-1, bV=8, bF=0, fp_xtc=-1, bXTC=0,
>>>   xtc_prec=1000, fn_cpt=0xee2490 "speptide.md.cpt", bCPT=1, top_global=0x2b9ca83780b0, eIntegrator=10, simulation_part=1, step=14690,
>>>   t=29.379999999999999, state_local=0x12d3600, state_global=0x2b9ca83782b0, f_local=0x12d8b00, f_global=0x2b9cb4c02010, n_xtc=0x7fff5fd02398, x_xtc=0x7fff5fd02320) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/mdlib/stat.c:473
>>> 473                 dd_collect_state(cr->dd,state_local,state_global);
>>> (gdb) up
>>> #7  0x0000000000414318 in do_md (fplog=<value optimized out>, cr=<value optimized out>, nfile=<value optimized out>, fnm=<value optimized out>,
>>>   oenv=<value optimized out>, bVerbose=<value optimized out>, bCompact=1, nstglobalcomm=1, vsite=0x0, constr=0x129b5d0, stepout=100, ir=0x2b9ca8377b40, top_global=0x2b9ca83780b0, fcd=0x1287c60, state_global=0x2b9ca83782b0, mdatoms=0x1296de0, nrnb=0x1290b90, wcycle=0x1290800, ed=0x0, fr=0x1291200, repl_ex_nst=0, repl_ex_seed=-1, cpt_period=<value optimized out>, max_hours=<value optimized out>, Flags=7168, runtime=0x7fff5fd02710) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/md.c:1943
>>> 1943                write_traj(fplog,cr,fp_trn,bX,bV,bF,fp_xtc,bXTC,ir->xtcprec,
>>> (gdb) up
>>> #8  0x000000000040f155 in mdrunner (fplog=0xee1c50, cr=0x2b9ca8377ad0, nfile=<value optimized out>, fnm=<value optimized out>, oenv=<value optimized out>,
>>>   bVerbose=<value optimized out>, bCompact=1, nstglobalcomm=-1, ddxyz=0x7fff5fd02844, dd_node_order=1, rdd=<value optimized out>,
>>>   rconstr=<value optimized out>, dddlb_opt=0x41f64d "auto", dlb_scale=<value optimized out>, ddcsx=0x0, ddcsy=0x0, ddcsz=0x0, nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_seed=-1, pforce=<value optimized out>, cpt_period=<value optimized out>, max_hours=<value optimized out>, Flags=7168) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/runner.c:669
>>> 669             integrator[inputrec->eI].func(fplog,cr,nfile,fnm,
>>> (gdb) up
>>> #9  0x0000000000410168 in mdrunner_start_fn (arg=<value optimized out>)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/runner.c:170
>>> 170         mda->ret=mdrunner(fplog, cr, mc.nfile, mc.fnm, mc.oenv, mc.bVerbose,
>>> (gdb) up
>>> #10 0x00002b9c9f5af8d4 in tMPI_Thread_starter (arg=<value optimized out>)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/gmxlib/thread_mpi/tmpi_init.c:360
>>> 360             th->start_fn(th->start_arg);
>>> (gdb) up
>>> #11 0x00002b9c9f5afc04 in tMPI_Init_fn (N=19466056, start_function=<value optimized out>, arg=<value optimized out>)
>>>   at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/gmxlib/thread_mpi/tmpi_init.c:472
>>> 472             tMPI_Start_threads(N, 0, 0, start_function, arg);
>>> (gdb) up
>>> #12 0x000000000040ff19 in mdrunner_threads (nthreads=4, fplog=<value optimized out>, cr=<value optimized out>, nfile=<value optimized out>,
>>>   fnm=<value optimized out>, oenv=<value optimized out>, bVerbose=1, bCompact=1, nstglobalcomm=-1, ddxyz=0x7fff5fd04b10, dd_node_order=1,
>>>   rdd=<value optimized out>, rconstr=<value optimized out>, dddlb_opt=0x41f64d "auto", dlb_scale=<value optimized out>, ddcsx=0x0, ddcsy=0x0, ddcsz=0x0,
>>>   nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_seed=-1, pforce=<value optimized out>, cpt_period=<value optimized out>,
>>>   max_hours=<value optimized out>, Flags=7168) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/runner.c:238
>>> 238             tMPI_Init_fn(nthreads, mdrunner_start_fn, (void*)(&mda));
>>> (gdb) up
>>> #13 0x0000000000419a1b in main (argc=6, argv=0x7fff5fd04ce8) at /var/tmp/portage/sci-chemistry/gromacs-9999/work/gromacs-9999/src/kernel/mdrun.c:519
>>> 519       rc = mdrunner_threads(nthreads,
>>> (gdb) up
>>> 
>>> 
>>> --
>>> Best Regards,
>>> Alexey 'Alexxy' Shvetsov
>>> Petersburg Nuclear Physics Institute, Russia
>>> Department of Molecular and Radiation Biophysics
>>> Gentoo Team Ru
>>> Gentoo Linux Dev
>>> mailto:alexxyum at gmail.com
>>> mailto:alexxy at gentoo.org
>>> mailto:alexxy at omrb.pnpi.spb.ru
>>> 
>>> ------------------------------
>>> 
>>> --
>>> gmx-developers mailing list
>>> gmx-developers at gromacs.org
>>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>>> 
>>> 
>>> End of gmx-developers Digest, Vol 70, Issue 8
>>> *********************************************
> 



