[gmx-users] mdrun_mpi issue with CHARMM36 FF
Mark Abraham
Mark.Abraham at anu.edu.au
Mon May 14 08:21:00 CEST 2012
On 14/05/2012 4:18 PM, Anirban wrote:
>
>
> On Mon, May 14, 2012 at 11:35 AM, Mark Abraham
> <Mark.Abraham at anu.edu.au> wrote:
>
> On 14/05/2012 3:52 PM, Anirban wrote:
>> Hi ALL,
>>
>> I am trying to simulate a membrane protein system with the CHARMM36
>> FF in GROMACS 4.5.5 on a parallel cluster running MPI. The
>> system consists of around 117,000 atoms. The job runs fine on 5
>> nodes (5x12=120 cores) using mpirun and gives proper output. But
>> whenever I try to submit it on more than 5 nodes, the job gets
>> killed with the following error:
>
> That's likely going to be an issue with the configuration of your
> MPI system, or your hardware, or both. Do check your .log file for
> evidence of an unsuitable DD partition, though the fact that it reports
> "turning on dynamic load balancing" suggests the DD partitioning worked OK.
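> A minimal sketch of that check, assuming GROMACS 4.5.x mdrun_mpi option
> names and a log file called md.log (both placeholders; confirm the
> options against mdrun_mpi -h on your build):
>
>     # Which DD grid / PME split did mdrun pick, and did it print any notes?
>     grep -i "domain decomposition grid" md.log
>     grep -i -A2 "dynamic load balancing" md.log
>     grep -E -B2 -A6 "NOTE|WARNING" md.log
>
>     # If the automatic partition looks marginal, pin it explicitly and retest;
>     # the grid and PME rank count below are placeholder numbers only:
>     mpirun -np 144 mdrun_mpi -deffnm md -dd 6 4 4 -npme 48 -dlb yes
>
> g_tune_pme from the same 4.5.x install can also scan -npme values for you.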
>
> Mark
>
>
> Hello Mark,
>
> Thanks for the reply.
> The .log file reports no errors or warnings and ends abruptly with the
> following last lines:
That's most consistent with a problem external to GROMACS.
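One way to confirm that independently of mdrun is to exercise MPI_Sendrecv
with a plain MPI benchmark on the same node counts, using exactly the mpirun
line from your job script with only the executable swapped. A minimal sketch,
assuming the Intel MPI Benchmarks (IMB-MPI1) are installed somewhere on the
cluster; the path and the NP_OK / NP_FAIL rank counts are placeholders for
your 5-node and >5-node runs:

    # Rank count that works for mdrun (5 nodes):
    mpirun -np $NP_OK   /path/to/IMB-MPI1 Sendrecv
    # Rank count that fails for mdrun (6 or more nodes):
    mpirun -np $NP_FAIL /path/to/IMB-MPI1 Sendrecv

If the benchmark also dies with "Fatal error in MPI_Sendrecv" past 5 nodes,
the problem is in the MPI stack or interconnect rather than in GROMACS, and
your cluster admins are the people to chase it with.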
Mark
>
> ------------------------------------------------------------
> Making 3D domain decomposition grid 4 x 3 x 9, home cell index 0 0 0
>
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
> 0: Protein_POPC
> 1: SOL_CL
> There are: 117548 Atoms
> Charge group distribution at step 0: 358 353 443 966 1106 746 374 351
> 352 352 358 454 975 1080 882 381 356 357 357 358 375 770 1101 882 365
> 359 358 351 348 487 983 1051 912 377 344 361 363 352 596 1051 1036
> 1050 553 351 349 366 352 375 912 1125 1045 478 351 344 356 362 445 971
> 1040 959 520 405 355 357 355 639 1032 1072 1096 790 474 353 349 345
> 449 1019 1047 971 444 354 357 355 357 391 946 1093 904 375 367 368 349
> 349 409 934 1082 867 406 350 350 364 341 398 978 1104 937 415 341 368
> Grid: 6 x 7 x 4 cells
> Initial temperature: 300.318 K
>
> Started mdrun on node 0 Fri May 11 20:43:52 2012
>
> Step Time Lambda
> 0 0.00000 0.00000
>
> Energies (kJ/mol)
> U-B            Proper Dih.    Improper Dih.  CMAP Dih.       LJ-14
> 8.67972e+04    6.15820e+04    1.38445e+03    -1.60452e+03    1.44395e+04
> Coulomb-14     LJ (SR)        Coulomb (SR)   Coul. recip.    Potential
> -5.21377e+04   4.98413e+04    -1.21372e+06   -8.94296e+04    -1.14284e+06
> Kinetic En.    Total Energy   Temperature    Pressure (bar)  Constr. rmsd
> 2.93549e+05    -8.49294e+05   3.00132e+02    -1.80180e+01    1.40708e-05
> ------------------------------------------------------------
>
> Any suggestion is welcome.
>
> Thanks,
>
> Anirban
>
>
>>
>> ------------------------------------------------------------
>>
>> starting mdrun 'Protein'
>> 50000000 steps, 100000.0 ps.
>>
>> NOTE: Turning on dynamic load balancing
>>
>> Fatal error in MPI_Sendrecv: Other MPI error
>> Fatal error in MPI_Sendrecv: Other MPI error
>> Fatal error in MPI_Sendrecv: Other MPI error
>>
>> =====================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = EXIT CODE: 256
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> =====================================================================================
>> [proxy:0:0 at cn034] HYD_pmcd_pmip_control_cmd_cb
>> (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
>> [proxy:0:0 at cn034] HYDT_dmxu_poll_wait_for_event
>> (./tools/demux/demux_poll.c:77): callback returned error status
>> [proxy:0:0 at cn034] main (./pm/pmiserv/pmip.c:214): demux engine
>> error waiting for event
>> .
>> .
>> .
>> ------------------------------------------------------------
>>
>> Why is this happening? Is it related to DD and PME? How can I solve
>> it? Any suggestion is welcome.
>> Sorry for re-posting.
>>
>>
>> Thanks and regards,
>>
>> Anirban