[gmx-developers] Re: gmx-developers Digest, Vol 50, Issue 5

xuji xuji at home.ipe.ac.cn
Mon Jun 23 02:34:39 CEST 2008


gmx-developers-request, hello!

The interconnect is Gigabit Ethernet. The simulation system contains 102,136 SPC waters and about 40,000 protein atoms, and I use PME for the long-range electrostatics. The command line is
"mpiexec -machinefile ./mf -n 24 mdrun -v -dd 6 3 1 -npme 6 -dlb -s md1.tpr -o md1.trr -g md1.log -e md1.edr -x md1.xtc >& md1.job &"
The machinefile "mf" lists all 24 CPUs of the three nodes. The CVS version I use is "3.3.99_development_200800503".
Carsten, could you take a look at this and help me further?
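For completeness, here is roughly what the machinefile "mf" contains (the hostnames are placeholders for the real node names, with each of the three nodes contributing its 8 CPUs; MPICH2 also accepts the hostname simply repeated eight times per node):

node1:8
node2:8
node3:8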

Thanks!

======= 2008-06-21 12:00:00 You wrote: =======

>Send gmx-developers mailing list submissions to
>	gmx-developers at gromacs.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	http://www.gromacs.org/mailman/listinfo/gmx-developers
>or, via email, send a message with subject or body 'help' to
>	gmx-developers-request at gromacs.org
>
>You can reach the person managing the list at
>	gmx-developers-owner at gromacs.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of gmx-developers digest..."
>
>
>Today's Topics:
>
>   1. Re: parallelization-dependent crashes in CVS Gromacs (Berk Hess)
>   2. problem about Gromacs CVS version
>      3.3.99_development_200800503 parallel efficiency (xuji)
>   3. Re: problem about Gromacs CVS	version
>      3.3.99_development_200800503 parallel efficiency (Carsten Kutzner)
>   4. Re: problem about Gromacs CVS	version
>      3.3.99_development_200800503 parallel efficiency (Yang Ye)
>   5. MPICH2 and parallel Gromacs errors (Casey,Richard)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 20 Jun 2008 12:32:55 +0200
>From: Berk Hess <hessb at mpip-mainz.mpg.de>
>Subject: Re: [gmx-developers] parallelization-dependent crashes in CVS
>	Gromacs
>To: Discussion list for GROMACS development
>	<gmx-developers at gromacs.org>
>Message-ID: <485B8757.9050802 at mpip-mainz.mpg.de>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Hi,
>
>I have not heard of problems from others.
>
>This sounds like a good debug system, since it is small and crashes 
>after just a few steps.
>Could you send me the input files?
>
>Berk.
>
>Peter Kasson wrote:
>> I've been encountering crashes in CVS gromacs depending on the 
>> parallelization.  If I run a CVS-up-to-date mdrun (or any of various 
>> snapshots within the past two months), I get a rapid crash as detailed 
>> below.  However, if I run with either -rdd 1.6 [somewhat arbitrarily 
>> chosen], -dd 2 2 1, or single-processor mdrun the  job completes 
>> successfully.  I have also had errors on similar small test systems 
>> where the error is a nsgrid failure (again on parallel but not 
>> single-processor).  Larger test systems have been working ok for me, 
>> although I haven't tried just replicating this box.
>>
>> Any ideas?  Has anyone else encountered something similar?
>> (If you want input files, drop me a line.)
>> Thanks,
>> --Peter
>>
>> mpiexec -np 4 /array10/software/gmx/src/kernel/mdrun -v -dlb -deffnm 
>> frame0
>>
>> [...]
>>
>> Reading file frame0.tpr, VERSION 3.3.99_development_20080208 (single 
>> precision)
>> Note: tpx file_version 54, software version 56
>> Loaded with Money
>>
>> Making 1D domain decomposition 4 x 1 x 1
>>
>> starting mdrun 'CMG in water'
>> 1000000 steps,   4000.0 ps.
>> step 0
>>
>> t = 0.144 ps: Water molecule starting at atom 28978 can not be settled.
>> Check for bad contacts and/or reduce the timestep.
>>
>> Back Off! I just backed up step36b_n1.pdb to ./#step36b_n1.pdb.1#
>>
>> Back Off! I just backed up step36c_n1.pdb to ./#step36c_n1.pdb.1#
>> Wrote pdb files with previous and current coordinates
>>
>> t = 0.148 ps: Water molecule starting at atom 28978 can not be settled.
>> Check for bad contacts and/or reduce the timestep.
>>
>> Back Off! I just backed up step37b_n1.pdb to ./#step37b_n1.pdb.1#
>>
>> Back Off! I just backed up step37c_n1.pdb to ./#step37c_n1.pdb.1#
>> Wrote pdb files with previous and current coordinates
>>
>> t = 0.152 ps: Water molecule starting at atom 22435 can not be settled.
>> Check for bad contacts and/or reduce the timestep.
>>
>> Back Off! I just backed up step38b_n1.pdb to ./#step38b_n1.pdb.1#
>>
>> Back Off! I just backed up step38c_n1.pdb to ./#step38c_n1.pdb.1#
>> Wrote pdb files with previous and current coordinates
>>
>> t = 0.156 ps: Water molecule starting at atom 9895 can not be settled.
>> Check for bad contacts and/or reduce the timestep.
>>
>> Back Off! I just backed up step39b_n0.pdb to ./#step39b_n0.pdb.1#
>>
>> t = 0.156 ps: Water molecule starting at atom 22435 can not be settled.
>> Check for bad contacts and/or reduce the timestep.
>>
>> Back Off! I just backed up step39b_n1.pdb to ./#step39b_n1.pdb.1#
>>
>> Back Off! I just backed up step39c_n1.pdb to ./#step39c_n1.pdb.1#
>>
>> Back Off! I just backed up step39c_n0.pdb to ./#step39c_n0.pdb.1#
>> Wrote pdb files with previous and current coordinates
>> Wrote pdb files with previous and current coordinates
>>
>> t = 0.160 ps: Water molecule starting at atom 9895 can not be settled.
>> Check for bad contacts and/or reduce the timestep.
>>
>> Back Off! I just backed up step40b_n0.pdb to ./#step40b_n0.pdb.1#
>>
>> t = 0.160 ps: Water molecule starting at atom 39988 can not be settled.
>> Check for bad contacts and/or reduce the timestep.
>>
>> Back Off! I just backed up step40b_n1.pdb to ./#step40b_n1.pdb.1#
>>
>> Back Off! I just backed up step40c_n1.pdb to ./#step40c_n1.pdb.1#
>>
>> Back Off! I just backed up step40c_n0.pdb to ./#step40c_n0.pdb.1#
>> Wrote pdb files with previous and current coordinates
>> Wrote pdb files with previous and current coordinates
>>
>> -------------------------------------------------------
>> Program mdrun, VERSION 3.3.99_development_200800503
>> Source code file: pme.c, line: 510
>>
>> Fatal error:
>> 1 particles communicated to PME node 1 are more than a cell length out 
>> of the domain decomposition cell of their charge group
>> -------------------------------------------------------
>>
>>
>> _______________________________________________
>> gmx-developers mailing list
>> gmx-developers at gromacs.org
>> http://www.gromacs.org/mailman/listinfo/gmx-developers
>> Please don't post (un)subscribe requests to the list. Use the www 
>> interface or send it to gmx-developers-request at gromacs.org.
>>
>> This email was Anti Virus checked by Astaro Security Gateway. 
>> http://www.astaro.com
>>
>
>
>
>------------------------------
>
>Message: 2
>Date: Fri, 20 Jun 2008 20:13:52 +0800
>From: "xuji"<xuji at home.ipe.ac.cn>
>Subject: [gmx-developers] problem about Gromacs CVS version
>	3.3.99_development_200800503 parallel efficiency
>To: "gmx-developers" <gmx-developers at gromacs.org>
>Message-ID: <20080620121421.B083D165 at colibri.its.uu.se>
>Content-Type: text/plain; charset="gb2312"
>
>Hi all
>
>I have 3 nodes, and there're 8 CPUs in each node. I run a 24-process mdrun on the three nodes, and I use the MPICH2 environment. But the efficiency of the mdrun program is very low: the occupancy of each CPU is only about 10%. I don't know why. Can someone give me some help?
>
>Appreciate any help in advance!
>
>Best wishes!
>
>Ji Xu
>xuji at home.ipe.ac.cn
>
>2008-06-20
>              
>-------------- next part --------------
>An HTML attachment was scrubbed...
>URL: http://www.gromacs.org/pipermail/gmx-developers/attachments/20080620/7d4b3afa/attachment-0001.html
>
>------------------------------
>
>Message: 3
>Date: Fri, 20 Jun 2008 14:22:11 +0200
>From: Carsten Kutzner <ckutzne at gwdg.de>
>Subject: Re: [gmx-developers] problem about Gromacs CVS	version
>	3.3.99_development_200800503 parallel efficiency
>To: Discussion list for GROMACS development
>	<gmx-developers at gromacs.org>
>Message-ID: <485BA0F3.8020800 at gwdg.de>
>Content-Type: text/plain; charset=GB2312
>
>xuji wrote:
>> 
>> Hi all
>>  
>> I have 3 nodes, and there're 8 CPUs in each node. I run a 24 processes 
>> mdrun on the three nodes. And I use MPICH2 environment. But the 
>> efficiency of the mdrun program is very low. The occupancy of each CPU 
>> is only about 10%. I don't know why. Can someone give me some help?
>Hi,
>
>what kind of interconnect do you have? It should be *at least* Gigabit
>Ethernet! How big is your system? Are you using PME? Please give more
>information, including the command line you used to start the runs and
>the CVS version (date) you are using.
>
>Carsten
>
>
>-- 
>Dr. Carsten Kutzner
>Max Planck Institute for Biophysical Chemistry
>Theoretical and Computational Biophysics Department
>Am Fassberg 11
>37077 Goettingen, Germany
>Tel. +49-551-2012313, Fax: +49-551-2012302
>http://www.mpibpc.mpg.de/research/dep/grubmueller/
>http://www.gwdg.de/~ckutzne
>
>
>------------------------------
>
>Message: 4
>Date: Fri, 20 Jun 2008 21:04:09 +0800
>From: Yang Ye <leafyoung at yahoo.com>
>Subject: Re: [gmx-developers] problem about Gromacs CVS	version
>	3.3.99_development_200800503 parallel efficiency
>To: Discussion list for GROMACS development
>	<gmx-developers at gromacs.org>
>Message-ID: <485BAAC9.9050809 at yahoo.com>
>Content-Type: text/plain; charset=GB2312
>
>Hi,
>
>This is related to the type of your network connection.
>
>If it is just Ethernet, such low CPU utilization is expected. Also, if
>your system has a small number of atoms, the speed-up from
>parallelization is also small.
>
>So: reduce the number of parallel processes, wait for Gromacs 4, or
>change to a better inter-node network (Infiniband, etc.), sorted by the
>time each option requires, IMHO.
>
>Regards,
>Yang Ye
>
>xuji wrote:
>> Hi all
>> I have 3 nodes, and there're 8 CPUs in each node. I run a 24 processes
>> mdrun on the three nodes. And I use MPICH2 environment. But the
>> efficiency of the mdrun program is very low. The occupancy of each CPU
>> is only about 10%. I don't know why. Can someone give me some help?
>> Appreciate any help in advance!
>> Best wishes!
>> Ji Xu
>> xuji at home.ipe.ac.cn <mailto:xuji at home.ipe.ac.cn>
>>
>> 2008-06-20
>>               
>>
>> 	
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> gmx-developers mailing list
>> gmx-developers at gromacs.org
>> http://www.gromacs.org/mailman/listinfo/gmx-developers
>> Please don't post (un)subscribe requests to the list. Use the 
>> www interface or send it to gmx-developers-request at gromacs.org.
>
>
>
>------------------------------
>
>Message: 5
>Date: Fri, 20 Jun 2008 10:56:15 -0600
>From: "Casey,Richard" <Richard.Casey at ColoState.EDU>
>Subject: [gmx-developers] MPICH2 and parallel Gromacs errors
>To: "gmx-developers at gromacs.org" <gmx-developers at gromacs.org>
>Message-ID:
>	<99FB24E3ED967F42A83A146157092CE1340CB34CA3 at EVS2.ColoState.EDU>
>Content-Type: text/plain; charset="us-ascii"
>
>Hello,
>
>This issue appears to have been encountered by many people.  We've searched all the discussion archives and tried every recommended solution but no luck.
>
>We have MPICH2 v.1.0.7 installed on an Apple G5 cluster (64 CPUs), and we installed Gromacs v.3.3.3 with the --enable-mpi option.
>
>Single CPU jobs run OK; parallel jobs always fail.  For parallel jobs we use:
>
>grompp -v -np 2 -p topol.top (or other values of -np for more CPUs)
>
>We launch MPD with:
>
>mpdboot -n 2 -f /common/mpich2/mpd.hosts
>
>We run jobs with:
>
>/common/mpich2/bin/mpiexec -l -n 2 \
>/common/gromacs/bin/mdrun_mpi -v -np 2 \
>  -s /Users/richardcasey/topol.tpr \
>  -g /Users/richardcasey/md.log \
>  -e /Users/richardcasey/ener.edr \
>  -o /Users/richardcasey/traj.trr \
>  -x /Users/richardcasey/traj.xtc \
>  -c /Users/richardcasey/confout.gro
>
>
>The output always says:
>
>-------------------------------------------------------
>1: Program mdrun_mpi, VERSION 3.3.3
>1: Source code file: init.c, line: 69
>1:
>1: Fatal error:
>1: run input file /Users/richardcasey/topol.tpr was made for 2 nodes,
>1: p0_29762:  p4_error: : -1
>1:              while mdrun_mpi expected it to be for 1 nodes.
>1: -------------------------------------------------------
>
>
>We've tried everything (many variations on the above and recommendations from the discussion list), but for some reason mdrun_mpi insists on using a single-CPU version of the topology file. We've checked the environment variables and they appear to point to the right directories. /common is NFS-mounted on all nodes.
>
>Completely stumped - no idea what is wrong here.  Any suggestions?
>
>
>
>--------------------------------------------
>Richard Casey
>
>
>------------------------------
>
>_______________________________________________
>gmx-developers mailing list
>gmx-developers at gromacs.org
>http://www.gromacs.org/mailman/listinfo/gmx-developers
>
>
>End of gmx-developers Digest, Vol 50, Issue 5
>*********************************************
>

= = = = = = = = = = = = = = = = = = = =

        Best regards!

        xuji
        xuji at home.ipe.ac.cn
        2008-06-23


