[gmx-users] MD PME in parallel

e.akhmatskaya at fle.fujitsu.com e.akhmatskaya at fle.fujitsu.com
Mon Mar 10 13:03:04 CET 2003


Hi David, 

Thanks for your reply.

>Could you give some more detail: the size of the simulation system, and
>how the problem comes about?
I've tried two systems of very different sizes: 23558 and 141154.
The smaller system seems to be more stable; I've managed to finish the
calculations several times.
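
(For reference, these runs use PME electrostatics; the PME-related part of a
GROMACS 3.x .mdp file looks something like the following. The values are an
illustrative sketch only; the actual parameters are not given in this thread.

; typical PME-related .mdp settings (illustrative values only)
coulombtype     = PME
rcoulomb        = 0.9
fourierspacing  = 0.12
pme_order       = 4
ewald_rtol      = 1e-5
)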

The most typical outcome is:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Getting Loaded...
Reading file topol.tpr, VERSION 3.1.4 (double precision)
Reading file topol.tpr, VERSION 3.1.4 (double precision)
Loaded with Money

starting mdrun 'Protein in water'
100 steps,      0.1 ps.

step 0
[PRIMEPOWER]aprun: parallel process 1 abnormally terminated.
[Blade server]: MPI process rank 0 (n0, p14255) caught a SIGSEGV.
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 14255 failed on node n0 with exit status 1.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Sometimes it performs more steps before failing.
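
(For reference, a 16-process run of this kind is prepared and launched
roughly as follows. This is a minimal sketch using the GROMACS 3.x default
file names; the MPI-enabled binary name "mdrun_mpi" and the plain mpirun
invocation are assumptions, and the actual launcher differs per machine,
e.g. aprun on PRIMEPOWER.

# preprocess the input for 16 nodes, then start one MPI process per node
grompp -np 16 -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr
mpirun -np 16 mdrun_mpi -np 16 -s topol.tpr -v
)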

>You seem to indicate that the problem is not machine specific, in which
>case it is most likely due to gromacs. However, what MPI library do you use?
I can reproduce this on 3 systems: 
Linux cluster: MPICH 1.2.0 implemented under SCore 5.0.0 (Myrinet)
Blade Server: LAM 6.5.4/MPI 2
PRIMEPOWER: Parallelnavi 2.1 MPI2 

Cheers,

Elena.
_____________________________________________________________________
Elena Akhmatskaya
Research Scientist
Physical & Life Sciences
Fujitsu Laboratories of Europe Ltd (FLE)
Hayes Park Central
Hayes End Road
Hayes, Middlesex
UB4 8FE
UK
tel: +44 (0) 2086064859
e-mail: e.akhmatskaya at fle.fujitsu.com



-----Original Message-----
From: David van der Spoel [mailto:spoel at xray.bmc.uu.se]
Sent: 10 March 2003 16:17
To: gmx-users at gromacs.org
Subject: Re: [gmx-users] MD PME in parallel


On Mon, 2003-03-10 at 11:36, e.akhmatskaya at fle.fujitsu.com wrote:
> Dear All, 
> 
> I am trying to run an MD PME job in parallel on different machines (Linux
> clusters, PRIMEPOWER) with different compilers and various MPI
> implementations. On up to 8 processors it is OK. Parallel performance was
> not great, but this was expected for a PME job. On 16 processors it
> becomes unstable on any computer. Sometimes it comes through
> (non-reproducibly), but in most cases it fails or produces incorrect
> results. My understanding is that the problem occurs when there is no
> optimized water molecule on processor 0. It looks like a synchronisation
> problem, though I might be wrong. I am wondering whether this is a known
> problem? Is there a fix for that, please?

I vaguely recall having seen that. Could you give some more detail: the size
of the simulation system, and how the problem comes about?
You seem to indicate that the problem is not machine specific, in which case
it is most likely due to gromacs. However, what MPI library do you use?


> Thanks,
> 
> Elena Akhmatskaya. 
> 
> _____________________
> Elena Akhmatskaya
> Physical & Life Sciences
> Fujitsu Laboratories of Europe Ltd (FLE)
> Hayes Park Central
> Hayes End Road
> Hayes, Middlesex
> UB4 8FE
> UK
> tel: +44 (0) 2086064859
> e-mail: e.akhmatskaya at fle.fujitsu.com
> 
-- 
Groeten, David.
________________________________________________________________________
Dr. David van der Spoel, 	Dept. of Cell & Mol. Biology
Husargatan 3, Box 596,  	75124 Uppsala, Sweden
phone:	46 18 471 4205		fax: 46 18 511 755
spoel at xray.bmc.uu.se	spoel at gromacs.org   http://xray.bmc.uu.se/~spoel
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
_______________________________________________
gmx-users mailing list
gmx-users at gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users
Please don't post (un)subscribe requests to the list. Use the 
www interface or send it to gmx-users-request at gromacs.org.