[gmx-users] about parallel run

chris.neale at utoronto.ca
Tue Jan 15 18:24:48 CET 2008


> What I was trying to do is run a parallel simulation. I have
> successfully compiled mdrun with MPI support.
>
> This is my grompp command
>
> grompp -f md.mdp -c rec_pr.gro -p rec.top -o rec.tpr -np 12
>
> And this is the script I sent to the cluster:
>
> #!/bin/bash
>
> cd /home/yunierkis/MD
> export LAMRSH="ssh -x"
> lamboot -v .nodes
> nohup mpirun -np 12 mdrun_mpi -s rec_md.tpr -o rec_md.trr -c rec_md.gro \
>     -e rec.edr -g rec_md.log -nice 0 -np 12 &
>

Are you sure that you are running LAM correctly on your machine?
I would personally run it like this:
/tools/lam/lam-7.1.2/bin/mpirun C mdrun_mpi -np 12 -deffnm rec_md

>
> I have deleted all the #md.trr.*# files; the simulation is still running
> on all 6 nodes, and no new #md.trr.*# files have been created.
> It seems very strange to me and I can't find an explanation.

Did you have other #files# as well, e.g. for the .edr or .log?
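
One way to check for leftover backups is sketched below. It assumes a find that supports -maxdepth (GNU and BSD find both do), and list_backups is a made-up helper name, not a GROMACS tool.

```shell
# List GROMACS-style backup files (names of the form #name.N#) in a directory.
# list_backups is a hypothetical helper for illustration only.
list_backups() {
    dir="${1:-.}"
    # The pattern is quoted so the shell does not expand or misread '#'.
    find "$dir" -maxdepth 1 -type f -name '#*#' | sort
}
```

Calling list_backups /home/yunierkis/MD would then print any #md.trr.N#, #rec.edr.N# or similar files in the run directory, and print nothing if there are none.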

Do you run lamhalt properly after previous attempts? Daemons or mdrun
processes left over from an earlier attempt could still be running.
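
A minimal sketch of a job script that boots LAM, runs mdrun, and always halts LAM afterwards is given below. It assumes that .nodes is a LAM boot schema listing each of the 6 hosts with cpu=2, so that "C" starts 12 processes. The DRY_RUN variable and the run and lam_job names are illustrative additions, not part of LAM or GROMACS. By default the script only prints the commands.

```shell
#!/bin/bash
# Sketch only: boot LAM, run the parallel mdrun, then shut LAM down.
# DRY_RUN, run and lam_job are hypothetical names used for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

lam_job() {
    # Start the LAM daemons on the hosts listed in the boot schema.
    run lamboot -v .nodes || return 1
    # "C" asks LAM for one process on every CPU it was booted with.
    run mpirun C mdrun_mpi -np 12 -deffnm rec_md
    status=$?
    # Halt LAM even if mdrun failed, so nothing lingers for the next attempt.
    run lamhalt
    return $status
}
```

Running lam_job with DRY_RUN=1 shows the three commands in order; setting DRY_RUN=0 executes them. Note that mdrun's -np here has to match the -np given to grompp.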

>
> Yunierkis
>
>
> On Tue, 2008-01-15 at 11:54 +1100, Mark Abraham wrote:
>
>> Yunierkis Perez Castillo wrote:
>> > Hi all, I'm new to GROMACS. I have set up a protein MD simulation on a
>> > cluster, using 6 computers with 2 CPUs each.
>> > After GROMACS began running I had 12 trajectory files in the folder
>> > where the output is written:
>> >
>> > md.trr
>> > #md.trr.1#
>> > #md.trr.2#
>> > ................
>> > #md.trr.11#
>> >
>> > It seems like the trajectory is replicated by each CPU the simulation is
>> > running on.
>> > All the files have the same size and grow simultaneously as the
>> > simulation advances.
>> > Is that normal?
>> > Can I delete the #* files?
>>
>> I infer from your results that you've run 12 single-processor
>> simulations from the same working directory. GROMACS makes backups of
>> files when you direct it to write to an existing file, and these are
>> numbered as #filename.index#. Your 12 simulations are all there, but you
>> can't assume that those files with number 5 are all from the same
>> simulation, because of the possibility of filesystem asynchronicities in
>> creating the files.
>>
>> If you're trying to run 12 single-processor simulations in the same
>> working directory, then you need to rethink your strategy. If you're
>> trying to do something else, then you also need to rethink :-)
>>
>> Mark





