[gmx-users] gromacs, lam and condor
    Oliver Stueker 
    ostueker at gmail.com
       
    Mon Apr  5 04:19:13 CEST 2010
    
    
  
Hi Hsin-Lin,
As Mark already answered yesterday:
1) 1 Gb Ethernet is not fast enough for more than 2, or at most 4, CPUs.
(And if each mdrun process has to wait ~90% of the time for data
from the others, then you end up with just 10% load, right?)
2) Judging from the line:
>> > /stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
you are using the double-precision build of GROMACS, which slows things
down and is only useful in very special cases. Use mdrun_mpi instead.
3) Your system is small, only about 6000 atoms. I would rather run such a
small system on one dual-core machine (so max 2 CPUs).
I don't think it will scale well to more CPUs.
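For point 2, a minimal sketch of how md.sh could look with the single-precision binary. The paths are copied from your script; that mdrun_mpi is installed in the same bin directory is an assumption on my side, and the GMXBIN and RUNDIR variables and the mdrun_cmd helper are only illustrative names. If the binary is not found, the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch of md.sh using the single-precision binary (mdrun_mpi) instead of
# the double-precision one (mdrun_mpi_d).
# Assumption: mdrun_mpi is installed next to mdrun_mpi_d in GMXBIN.
# GMXBIN, RUNDIR and mdrun_cmd are illustrative names, not GROMACS settings.
GMXBIN=${GMXBIN:-/stathome/jiangsl/soft/gromacs-4.0.5/bin}
RUNDIR=${RUNDIR:-/stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md}

# Print the full mdrun command line for the binary name given as $1.
mdrun_cmd() {
    echo "$GMXBIN/$1 -s $RUNDIR/200ns.tpr -e $RUNDIR/200ns.edr -o $RUNDIR/200ns.trr -g $RUNDIR/200ns.log -c $RUNDIR/200ns.gro"
}

CMD=$(mdrun_cmd mdrun_mpi)

if [ -x "$GMXBIN/mdrun_mpi" ]; then
    # Binary found: run it (the paths contain no spaces, so word splitting is safe).
    $CMD
else
    # Binary not found: show what would have been executed.
    echo "dry run: $CMD"
fi
```

With machine_count set to 2 in the submit file, this would match the suggestion in point 3.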
And if there really is something wrong with Condor that has nothing
to do with GROMACS, then this is the wrong mailing list anyway.
Oliver
2010/4/4 Hsin-Lin Chiang <jiangsl at phys.sinica.edu.tw>:
> Hi,
>
>
>
> I tried to use 4 and 8 CPUs.
>
> There are about 6000 atoms in my system.
>
> The interconnect between our computers is a 1 Gb network, not
> optical fiber.
>
>
>
> I'm sorry for my poor English; I couldn't express my question well.
>
> Every time I submit a parallel job, the nodes assigned to me are already
> 100% loaded,
>
> and the CPU share available to me is less than 10%.
>
> I think there is something wrong with my submit script or executable script,
>
> and I posted them in my previous message.
>
> How should I correct my script?
>
>
>
> Hsin-Lin
>
>
>> Hi,
>>
>> How many CPUs are you trying to use? How big is your system? What kind of
>> interconnect? Since you use Condor, probably a pretty slow interconnect.
>> Then you can't expect it to work on many CPUs. If you want to use many CPUs
>> for MD you need a faster interconnect.
>>
>> Roland
>>
>> 2010/4/2 Hsin-Lin Chiang <jiangsl at phys.sinica.edu.tw>
>>
>> >  Hi,
>> >
>> > Does anyone here use gromacs, lam, and condor together?
>> > I use gromacs with lam/mpi on a condor system.
>> > Every time I submit a parallel job,
>> > I get nodes that are already occupied, and the performance of each CPU is
>> > below 10%.
>> > How should I change the script?
>> > Below are one submit script and two executable scripts.
>> >
>> > condor_mpi:
>> > ----
>> > #!/bin/bash
>> > Universe = parallel
>> > Executable = ./lamscript
>> > machine_count = 8
>> > output = md_$(NODE).out
>> > error = md_$(NODE).err
>> > log = md.log
>> > arguments = /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md.sh
>> > +WantIOProxy = True
>> > should_transfer_files = yes
>> > when_to_transfer_output = on_exit
>> > Queue
>> > -------
>> >
>> > lamscript:
>> > -------
>> > #!/bin/sh
>> >
>> > _CONDOR_PROCNO=$_CONDOR_PROCNO
>> > _CONDOR_NPROCS=$_CONDOR_NPROCS
>> > _CONDOR_REMOTE_SPOOL_DIR=$_CONDOR_REMOTE_SPOOL_DIR
>> >
>> > SSHD_SH=`condor_config_val libexec`
>> > SSHD_SH=$SSHD_SH/sshd.sh
>> >
>> > CONDOR_SSH=`condor_config_val libexec`
>> > CONDOR_SSH=$CONDOR_SSH/condor_ssh
>> >
>> > # Set this to the bin directory of your lam installation
>> > # This also must be in your .cshrc file, so the remote side
>> > # can find it!
>> > export LAMDIR=/stathome/jiangsl/soft/lam-7.1.4
>> > export PATH=${LAMDIR}/bin:${PATH}
>> > export LD_LIBRARY_PATH=/lib:/usr/lib:$LAMDIR/lib:.:/opt/intel/compilers/lib
>> >
>> >
>> > . $SSHD_SH $_CONDOR_PROCNO $_CONDOR_NPROCS
>> >
>> > # If not the head node, just sleep forever, to let the
>> > # sshds run
>> > if [ $_CONDOR_PROCNO -ne 0 ]
>> > then
>> >                 wait
>> >                 sshd_cleanup
>> >                 exit 0
>> > fi
>> >
>> > EXECUTABLE=$1
>> > shift
>> >
>> > # the binary is copied but the executable flag is cleared.
>> > # so the script have to take care of this
>> > chmod +x $EXECUTABLE
>> >
>> > # to allow multiple lam jobs running on a single machine,
>> > # we have to give somewhat unique value
>> > export LAM_MPI_SESSION_SUFFIX=$$
>> > export LAMRSH=$CONDOR_SSH
>> > # when a job is killed by the user, this script will get sigterm
>> > # This script have to catch it and do the cleaning for the
>> > # lam environment
>> > finalize()
>> > {
>> > sshd_cleanup
>> > lamhalt
>> > exit
>> > }
>> > trap finalize TERM
>> >
>> > CONDOR_CONTACT_FILE=$_CONDOR_SCRATCH_DIR/contact
>> > export CONDOR_CONTACT_FILE
>> > # The second field in the contact file is the machine name
>> > # that condor_ssh knows how to use. Note that this used to
>> > # say "sort -n +0 ...", but -n option is now deprecated.
>> > sort < $CONDOR_CONTACT_FILE | awk '{print $2}' > machines
>> >
>> > # start the lam environment
>> > # For older versions of lam you may need to remove the -ssi boot rsh line
>> > lamboot -ssi boot rsh -ssi rsh_agent "$LAMRSH -x" machines
>> >
>> > if [ $? -ne 0 ]
>> > then
>> >         echo "lamscript error booting lam"
>> >         exit 1
>> > fi
>> >
>> > mpirun C -ssi rpi usysv -ssi coll_smp 1 "$EXECUTABLE" "$@" &
>> >
>> > CHILD=$!
>> > TMP=130
>> > while [ $TMP -gt 128 ] ; do
>> >         wait $CHILD
>> >         TMP=$?;
>> > done
>> >
>> > # clean up files
>> > sshd_cleanup
>> > /bin/rm -f machines
>> >
>> > # clean up lam
>> > lamhalt
>> >
>> > exit $TMP
>> > ----
>> >
>> > md.sh
>> > ----
>> > #!/bin/sh
>> > #running GROMACS
>> > /stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
>> > -s /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.tpr \
>> > -e /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.edr \
>> > -o /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.trr \
>> > -g /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log \
>> > -c /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.gro
>> > -----
>> >
>> >
>> > Hsin-Lin
>
    
    