[gmx-users] mdrun_mpi issue with CHARMM36 FF
Anirban
reach.anirban.ghosh at gmail.com
Mon May 14 08:18:34 CEST 2012
On Mon, May 14, 2012 at 11:35 AM, Mark Abraham <Mark.Abraham at anu.edu.au> wrote:
> On 14/05/2012 3:52 PM, Anirban wrote:
>
> Hi ALL,
>
> I am trying to simulate a membrane protein system using the CHARMM36 FF with
> GROMACS 4.5.5 on a parallel cluster running MPI. The system consists of
> around 117,000 atoms. The job runs fine on 5 nodes (5X12=120 cores) using
> mpirun and gives proper output. But whenever I try to submit it on more
> than 5 nodes, the job gets killed with the following error:
>
>
> That's likely going to be an issue with the configuration of your MPI
> system, or your hardware, or both. Do check your .log file for evidence of
> an unsuitable DD partition, though the fact of "turning on dynamic load
> balancing" suggests DD partitioning worked OK.
>
> Mark
>
>
Hello Mark,
Thanks for the reply.
The .log file reports no error/warning and ends abruptly with the following
last lines:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Making 3D domain decomposition grid 4 x 3 x 9, home cell index 0 0 0
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: Protein_POPC
1: SOL_CL
There are: 117548 Atoms
Charge group distribution at step 0: 358 353 443 966 1106 746 374 351 352
352 358 454 975 1080 882 381 356 357 357 358 375 770 1101 882 365 359 358
351 348 487 983 1051 912 377 344 361 363 352 596 1051 1036 1050 553 351 349
366 352 375 912 1125 1045 478 351 344 356 362 445 971 1040 959 520 405 355
357 355 639 1032 1072 1096 790 474 353 349 345 449 1019 1047 971 444 354
357 355 357 391 946 1093 904 375 367 368 349 349 409 934 1082 867 406 350
350 364 341 398 978 1104 937 415 341 368
Grid: 6 x 7 x 4 cells
Initial temperature: 300.318 K
Started mdrun on node 0 Fri May 11 20:43:52 2012
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
            U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
    8.67972e+04    6.15820e+04    1.38445e+03   -1.60452e+03    1.44395e+04
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
   -5.21377e+04    4.98413e+04   -1.21372e+06   -8.94296e+04   -1.14284e+06
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    2.93549e+05   -8.49294e+05    3.00132e+02   -1.80180e+01    1.40708e-05
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
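One way to act on the suggestion to look for an unsuitable DD partition is to check the numbers in the excerpt above against each other. The following is only a minimal Python sketch: the helper names `parse_dd_grid` and `parse_charge_groups` are made up for illustration (they are not part of GROMACS), and the regular expressions assume the log lines are formatted exactly as quoted here.

```python
import re
from math import prod

# Excerpt copied from the .log output quoted above.
LOG_EXCERPT = """\
Making 3D domain decomposition grid 4 x 3 x 9, home cell index 0 0 0
Charge group distribution at step 0: 358 353 443 966 1106 746 374 351 352
352 358 454 975 1080 882 381 356 357 357 358 375 770 1101 882 365 359 358
351 348 487 983 1051 912 377 344 361 363 352 596 1051 1036 1050 553 351 349
366 352 375 912 1125 1045 478 351 344 356 362 445 971 1040 959 520 405 355
357 355 639 1032 1072 1096 790 474 353 349 345 449 1019 1047 971 444 354
357 355 357 391 946 1093 904 375 367 368 349 349 409 934 1082 867 406 350
350 364 341 398 978 1104 937 415 341 368
Grid: 6 x 7 x 4 cells
"""


def parse_dd_grid(text):
    """Return the DD grid as a tuple of three ints (hypothetical helper)."""
    m = re.search(r"domain decomposition grid (\d+) x (\d+) x (\d+)", text)
    if m is None:
        raise ValueError("no domain decomposition grid line found")
    return tuple(int(g) for g in m.groups())


def parse_charge_groups(text):
    """Return the per-cell charge group counts, including wrapped lines."""
    m = re.search(r"Charge group distribution at step \d+:((?:\s+\d+)+)", text)
    if m is None:
        raise ValueError("no charge group distribution line found")
    return [int(tok) for tok in m.group(1).split()]


grid = parse_dd_grid(LOG_EXCERPT)
counts = parse_charge_groups(LOG_EXCERPT)
n_cells = prod(grid)  # one DD cell per particle-particle rank
mean = sum(counts) / len(counts)

print(f"DD grid {grid}: {n_cells} cells, {len(counts)} charge-group counts")
print(f"min {min(counts)}, max {max(counts)}, max/mean {max(counts) / mean:.2f}")
```

If the number of counts did not match the number of DD cells, that would point to a truncated or garbled log. The uneven counts per cell are consistent with the "Turning on dynamic load balancing" note, but by themselves they do not explain the MPI_Sendrecv failure.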
Any suggestion is welcome.
Thanks,
Anirban
>
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> starting mdrun 'Protein'
> 50000000 steps, 100000.0 ps.
>
> NOTE: Turning on dynamic load balancing
>
> Fatal error in MPI_Sendrecv: Other MPI error
> Fatal error in MPI_Sendrecv: Other MPI error
> Fatal error in MPI_Sendrecv: Other MPI error
>
>
> =====================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = EXIT CODE: 256
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> =====================================================================================
> [proxy:0:0 at cn034] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:0 at cn034] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at cn034] main (./pm/pmiserv/pmip.c:214): demux engine error
> waiting for event
> .
> .
> .
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Why is this happening? Is it related to DD and PME? How can I solve it? Any
> suggestion is welcome.
> Sorry for re-posting.
>
>
> Thanks and regards,
>
> Anirban