[gmx-users] Re: about replica exchange with 140 replicas

HANNIBAL LECTER hanniballecter13 at gmail.com
Thu May 22 16:58:08 CEST 2014


Did you check the .err file as well? What does it say?
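A minimal sketch of that check, assuming the LSF script and scratch directory quoted below (the job ID is a placeholder; with -multi, mdrun normally numbers its per-replica log files):

    tail -n 50 job<JOBID>.out job<JOBID>.err       # LSF stdout/stderr from the #BSUB -o/-e lines
    cd /gpfs/home/hzhang020/REMD/remdrun/scratch
    ls -l md*.log                                  # one log per replica, if mdrun got that far
    tail -n 20 md0.log                             # how far replica 0 progressed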

How many cores does each node in the HPC have? There might be an issue with
the supercomputer you are using; one of the nodes might have a problem that
is causing the failure.

I have used > 300 cores for replica exchange simulations without any issues.


On Thu, May 22, 2014 at 12:47 AM, #ZHANG HAIPING# <HZHANG020 at e.ntu.edu.sg> wrote:

> Dear Justin:
> The script I used is as follows:
> ## Set job parameters
>
> ## Job Name
> #BSUB -J OpenMPI
>
> ## Queue  Name
> #BSUB -q medium_priority
>
> ## Output and Input Errors
> #BSUB -o job%J.out
> #BSUB -e job%J.err
>
> ## Specify walltime in HH:MM
> #BSUB -W 60:00
>
> ## 16 Processors per Host
> #BSUB -R "span[ptile=16]"
>
> ## Requesting 140 cores
> #BSUB -n 140
>
> # Need to make our own machinefile
> MACHINEFILE=mymacs.$LSB_JOBID
> for i in `echo $LSB_HOSTS`
> do
> echo $i
> done > $MACHINEFILE
>
>
> ## Load module environment
> module load openmpi-1.6.5-intel-v12.1.5
> module load intel-12.1.5
>
> ## Run mpi program
> cd /gpfs/home/hzhang020/REMD/remdrun/scratch
> /usr/local/RH6_apps/openmpi-1.6.5-intel-v12.1.5/bin/mpirun --bind-to-core
> --report-bindings  -np 140 -machinefile $MACHINEFILE
> /usr/local/RH6_apps/gromacs-4.6.2-double-intel/bin/mdrun_mpi_d -s
> prefix_.tpr -multi 140   -replex 3000
>
>
> There was no obvious error information; the output files just stayed at
> size 0. After a long time, I killed the job myself.
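One thing worth ruling out for a hang with empty outputs: with -multi 140, mdrun appends the replica index to the -s name, so it looks for prefix_0.tpr through prefix_139.tpr in the working directory. A minimal sketch of that check, using the paths from the command above:

    cd /gpfs/home/hzhang020/REMD/remdrun/scratch
    for i in $(seq 0 139); do
        [ -f prefix_${i}.tpr ] || echo "missing prefix_${i}.tpr"
    done
    ls prefix_*.tpr | wc -l    # should print 140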
>
>
> NOTE: The load imbalance in PME FFT and solve is 384%.
>       For optimal PME load balancing
>       PME grid_x (1728) and grid_y (1728) should be divisible by
> #PME_nodes_x (2)
>       and PME grid_y (1728) and grid_z (1728) should be divisible by
> #PME_nodes_y (1)
>
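For reference, the divisibility rule in that NOTE can be checked directly with the numbers from the log; here both remainders are zero, so the grid already meets the stated condition:

    grid=1728; pme_nodes_x=2; pme_nodes_y=1
    echo "x remainder: $((grid % pme_nodes_x)), y remainder: $((grid % pme_nodes_y))"   # both 0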
>
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
>
> Host: comp094
> PID:  6751
>
> This process may still be running and/or consuming resources.
>
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 273 with PID 30914 on node comp122 exited
> on signal 9 (Killed).
> --------------------------------------------------------------------------
>
>
>
>
> ________________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Justin Lemkul <
> jalemkul at vt.edu>
> Sent: 22 May 2014 10:57
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] about replica exchange with 140 replicas
>
> On 5/21/14, 10:53 PM, #ZHANG HAIPING# wrote:
> > Dear gromacs user:
> >
> > I have encountered a problem when using an HPC (high-performance computer)
> > to run replica exchange. I find that when I use more than 128 replicas, the
> > job does not work, while with fewer than 128 it is fine, even when I use far
> > more than 128 cores in total (several cores per replica). The version I used
> > is gromacs-4.6.2-double-intel. Hope for your help.
> >
>
> You need to provide more information.  Specifically, what commands are you
> issuing?  What does "will not work" mean?  Is there a specific error you
> are
> getting?
>
> -Justin
>
> --
> ==================================================
>
> Justin A. Lemkul, Ph.D.
> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>
> Department of Pharmaceutical Sciences
> School of Pharmacy
> Health Sciences Facility II, Room 601
> University of Maryland, Baltimore
> 20 Penn St.
> Baltimore, MD 21201
>
> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> http://mackerell.umaryland.edu/~jalemkul
>
> ==================================================
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>

