[gmx-users] Problems with simulation on multi-node cluster

James Starlight jmsstarlight at gmail.com
Thu Mar 15 12:25:15 CET 2012


Mark, Peter,


I've now generated the .tpr file on my local machine and launched only

mpiexec -np 24 mdrun_mpi_d.openmpi -v -deffnm MD_100

on the cluster, this time on 2 nodes.
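For reference, the submission script for this two-node run looks roughly like the sketch below; the queue label, node count and working directory are specific to my setup, and MD_100.tpr is assumed to have been copied into the job directory beforehand:

#!/bin/sh
#PBS -N gromacs
#PBS -l nodes=2:red:ppn=12
#PBS -V
#PBS -o gromacs.out
#PBS -e gromacs.err

# MD_100.tpr was built with grompp on my local machine and copied here,
# so the scheduler only has to run the parallel mdrun step
cd /globaltmp/xz/job_name
mpiexec -np 24 mdrun_mpi_d.openmpi -v -deffnm MD_100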

The job shows up as running, but when I check the MD_100.log file (attached)
it contains no information about the simulation steps. When I run on just
one node, the same log file shows the step-by-step progression of the
simulation, like the excerpt below taken from a ONE-NODE run:

Started mdrun on node 0 Thu Mar 15 11:22:35 2012

           Step           Time         Lambda
              0        0.00000        0.00000

Grid: 12 x 9 x 12 cells
   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    1.32179e+04    3.27485e+03    2.53267e+03    4.06443e+02    6.15315e+04
        LJ (SR)        LJ (LR)  Disper. corr.   Coulomb (SR)   Coul. recip.
    4.12152e+04   -5.51788e+03   -1.70930e+03   -4.54886e+05   -1.46292e+05
     Dis. Rest. D.R.Viol. (nm)     Dih. Rest.      Potential    Kinetic En.
    2.14240e-02    3.46794e+00    1.33793e+03   -4.84889e+05    9.88771e+04
   Total Energy  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)
   -3.86012e+05   -3.86012e+05    3.11520e+02   -1.14114e+02    3.67861e+02
   Constr. rmsd
    3.75854e-05

           Step           Time         Lambda
           2000        4.00000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    1.31741e+04    3.25280e+03    2.58442e+03    3.51371e+02    6.15913e+04
        LJ (SR)        LJ (LR)  Disper. corr.   Coulomb (SR)   Coul. recip.
    4.16349e+04   -5.53474e+03   -1.70930e+03   -4.56561e+05   -1.46485e+05
     Dis. Rest. D.R.Viol. (nm)     Dih. Rest.      Potential    Kinetic En.
    4.78276e+01    3.38844e+00    9.82735e+00   -4.87644e+05    9.83280e+04
   Total Energy  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)
   -3.89316e+05   -3.87063e+05    3.09790e+02   -1.14114e+02    7.25905e+02
   Constr. rmsd
    1.88008e-05

and so on...



What could be wrong with the multi-node runs?


James


On 15 March 2012 11:25, Mark Abraham <Mark.Abraham at anu.edu.au> wrote:

> On 15/03/2012 6:13 PM, Peter C. Lai wrote:
>
>> Try separating your grompp run from your mpirun:
>> You should not really be having the scheduler execute the grompp. Run
>> your grompp step to generate a .tpr either on the head node or on your
>> local machine (then copy it over to the cluster).
>>
>
> Good advice.
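A minimal sketch of that separated workflow, assuming the .tpr is built on the local machine and scp access to the cluster is available (the user name and host below are placeholders):

# on the local machine (or head node): build the run input
grompp -f md.mdp -c nvtWprotonated.gro -p topol.top -n index.ndx -o job.tpr

# copy the .tpr into the cluster's job directory (placeholder host/path)
scp job.tpr user@cluster:/globaltmp/xz/job_name/

# the PBS script then only needs the parallel mdrun line, e.g. for 2 nodes:
mpiexec -np 24 mdrun_mpi_d.openmpi -v -deffnm job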
>
>
>> (The -p that the scheduler is complaining about only appears in the grompp
>> step, so don't have the scheduler run it).
>>
>
> grompp is running successfully, as you can see from the output
>
> I think "mpiexec -np 12" is being interpreted as "mpiexec -n 12 -p", and
> the process of separating the grompp stage from the mdrun stage would help
> make that clear - read documentation first, however.
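If the scheduler's mpiexec wrapper really is splitting -np into -n -p, one quick check (assuming the wrapper passes options through unchanged) would be to use the -n spelling, which Open MPI's mpiexec also accepts:

# same two-node launch, but with -n instead of -np
mpiexec -n 24 mdrun_mpi_d.openmpi -v -deffnm MD_100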
>
> Mark
>
>
>
>>
>> On 2012-03-15 10:04:49AM +0300, James Starlight wrote:
>>
>>> Dear Gromacs Users!
>>>
>>>
>>> I have some problems running my simulation on a multi-node cluster
>>> which uses Open MPI.
>>>
>>> I launch my jobs with the script below; this example runs the job on
>>> 1 node (12 CPUs).
>>>
>>> #!/bin/sh
>>> #PBS -N gromacs
>>> #PBS -l nodes=1:red:ppn=12
>>> #PBS -V
>>> #PBS -o gromacs.out
>>> #PBS -e gromacs.err
>>>
>>> cd /globaltmp/xz/job_name
>>> grompp -f md.mdp -c nvtWprotonated.gro -p topol.top -n index.ndx -o job.tpr
>>> mpiexec -np 12 mdrun_mpi_d.openmpi -v -deffnm job
>>>
>>> Every node of my cluster has 12 CPUs. When I use just 1 node I have no
>>> problems running my jobs, but when I try to use more than one node I get
>>> an error (an example is attached in the gromacs.err file, together with
>>> the md.mdp of that system). Another outcome of such multi-node runs is
>>> that the job starts but no calculation is done (the name_of_my_job.log
>>> file stays empty and the .trr file is never updated); this usually
>>> happens when I use many nodes (8-10). Finally, I sometimes get errors
>>> about the PME order (that time I used 3 nodes). The exact error changes
>>> when I vary the number of nodes.
>>>
>>>
>>> Could you tell me what might be wrong with my cluster setup?
>>>
>>> Thanks for help
>>>
>>> James
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MD_100.log
Type: application/octet-stream
Size: 7189 bytes
Desc: not available
URL: <http://maillist.sys.kth.se/pipermail/gromacs.org_gmx-users/attachments/20120315/e96be952/attachment.obj>

