[gmx-users] mdrun_mpi issue with CHARMM36 FF

Mark Abraham Mark.Abraham at anu.edu.au
Mon May 14 08:21:00 CEST 2012


On 14/05/2012 4:18 PM, Anirban wrote:
>
>
> On Mon, May 14, 2012 at 11:35 AM, Mark Abraham 
> <Mark.Abraham at anu.edu.au <mailto:Mark.Abraham at anu.edu.au>> wrote:
>
>     On 14/05/2012 3:52 PM, Anirban wrote:
>>     Hi ALL,
>>
>>     I am trying to simulate a membrane protein system with the CHARMM36
>>     FF on GROMACS 4.5.5 on a parallel cluster running MPI. The
>>     system consists of around 117,000 atoms. The job runs fine on 5
>>     nodes (5X12=120 cores) using mpirun and gives proper output. But
>>     whenever I try to submit it on more than 5 nodes, the job gets
>>     killed with the following error:
>
>     That's likely to be an issue with the configuration of your
>     MPI system, or your hardware, or both. Do check your .log file for
>     evidence of an unsuitable DD partition, though the fact that
>     "turning on dynamic load balancing" was reported suggests DD
>     partitioning worked OK.
>
>     Mark
>
>
> Hello Mark,
>
> Thanks for the reply.
> The .log file reports no errors or warnings and ends abruptly with the
> following last lines:

That's most consistent with a problem external to GROMACS.
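If you want to rule GROMACS out completely, a quick check is a stand-alone 
MPI_Sendrecv test over the same set of nodes. Below is a minimal sketch (my 
own toy program, not part of GROMACS; the buffer size and iteration count 
are arbitrary choices) that does a ring-style neighbour exchange, loosely 
mimicking the kind of halo communication mdrun's domain decomposition 
performs. Build it with mpicc and launch it with the same mpirun/hostfile 
you use for mdrun_mpi; if it also dies once you go beyond 5 nodes, the 
problem is in the MPI installation or the interconnect between those extra 
nodes, and your cluster admins are the people to talk to.

/* ring_test.c - minimal MPI_Sendrecv ring exchange (no GROMACS involved).
 * Each rank sends a buffer to its right neighbour and receives one from
 * its left neighbour.
 *
 *   mpicc ring_test.c -o ring_test
 *   mpirun -np 120 ./ring_test        (same hosts as the failing job)
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, left, right, i, iter;
    const int n = 1 << 17;               /* ~1 MB of doubles per message */
    double *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = malloc(n * sizeof(double));
    recvbuf = malloc(n * sizeof(double));
    for (i = 0; i < n; i++)
        sendbuf[i] = rank;

    right = (rank + 1) % size;
    left  = (rank - 1 + size) % size;

    /* repeat the exchange so transient fabric problems get a chance to show up */
    for (iter = 0; iter < 100; iter++)
        MPI_Sendrecv(sendbuf, n, MPI_DOUBLE, right, 0,
                     recvbuf, n, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* each rank should have received its left neighbour's rank number */
    if ((int) recvbuf[0] != left)
        fprintf(stderr, "rank %d: bad data from rank %d\n", rank, left);
    if (rank == 0)
        printf("ring exchange OK on %d ranks\n", size);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}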

Mark

>
> ----------------------------------------------------------------------
> Making 3D domain decomposition grid 4 x 3 x 9, home cell index 0 0 0
>
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
>   0:  Protein_POPC
>   1:  SOL_CL
> There are: 117548 Atoms
> Charge group distribution at step 0: 358 353 443 966 1106 746 374 351 
> 352 352 358 454 975 1080 882 381 356 357 357 358 375 770 1101 882 365 
> 359 358 351 348 487 983 1051 912 377 344 361 363 352 596 1051 1036 
> 1050 553 351 349 366 352 375 912 1125 1045 478 351 344 356 362 445 971 
> 1040 959 520 405 355 357 355 639 1032 1072 1096 790 474 353 349 345 
> 449 1019 1047 971 444 354 357 355 357 391 946 1093 904 375 367 368 349 
> 349 409 934 1082 867 406 350 350 364 341 398 978 1104 937 415 341 368
> Grid: 6 x 7 x 4 cells
> Initial temperature: 300.318 K
>
> Started mdrun on node 0 Fri May 11 20:43:52 2012
>
>            Step           Time         Lambda
>               0        0.00000        0.00000
>
>    Energies (kJ/mol)
>            U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
>    8.67972e+04    6.15820e+04    1.38445e+03   -1.60452e+03    1.44395e+04
>     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
>   -5.21377e+04    4.98413e+04   -1.21372e+06   -8.94296e+04   -1.14284e+06
>    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
>    2.93549e+05   -8.49294e+05    3.00132e+02   -1.80180e+01    1.40708e-05
> ----------------------------------------------------------------------
>
> Any suggestion is welcome.
>
> Thanks,
>
> Anirban
>
>
>>
>>     ----------------------------------------------------------------------
>>
>>     starting mdrun 'Protein'
>>     50000000 steps, 100000.0 ps.
>>
>>     NOTE: Turning on dynamic load balancing
>>
>>     Fatal error in MPI_Sendrecv: Other MPI error
>>     Fatal error in MPI_Sendrecv: Other MPI error
>>     Fatal error in MPI_Sendrecv: Other MPI error
>>
>>     =====================================================================================
>>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>     =   EXIT CODE: 256
>>     =   CLEANING UP REMAINING PROCESSES
>>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>     =====================================================================================
>>     [proxy:0:0 at cn034] HYD_pmcd_pmip_control_cmd_cb
>>     (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
>>     [proxy:0:0 at cn034] HYDT_dmxu_poll_wait_for_event
>>     (./tools/demux/demux_poll.c:77): callback returned error status
>>     [proxy:0:0 at cn034] main (./pm/pmiserv/pmip.c:214): demux engine
>>     error waiting for event
>>     .
>>     .
>>     .
>>     ----------------------------------------------------------------------
>>
>>     Why is this happening? Is it related to DD and PME? How can I solve
>>     it? Any suggestion is welcome.
>>     Sorry for re-posting.
>>
>>
>>     Thanks and regards,
>>
>>     Anirban
>>
>>
>>
>>
>
>
