[gmx-users] Re: about exchange replica with replicas 140

#ZHANG HAIPING# HZHANG020 at e.ntu.edu.sg
Thu May 22 07:20:07 CEST 2014


Dear Justin:
The script I used is as follows:
## Set job parameters

## Job Name
#BSUB -J OpenMPI

## Queue  Name
#BSUB -q medium_priority 

## Output and error files
#BSUB -o job%J.out
#BSUB -e job%J.err

## Specify walltime in HH:MM
#BSUB -W 60:00

## 16 Processors per Host
#BSUB -R "span[ptile=16]"

## Requesting 140 cores
#BSUB -n 140

# Need to make our own machinefile (one line per allocated slot)
MACHINEFILE=mymacs.$LSB_JOBID
for i in $LSB_HOSTS
do
    echo "$i"
done > "$MACHINEFILE"


## Load module environment
module load openmpi-1.6.5-intel-v12.1.5
module load intel-12.1.5

## Run mpi program  
cd /gpfs/home/hzhang020/REMD/remdrun/scratch
/usr/local/RH6_apps/openmpi-1.6.5-intel-v12.1.5/bin/mpirun \
    --bind-to-core --report-bindings \
    -np 140 -machinefile $MACHINEFILE \
    /usr/local/RH6_apps/gromacs-4.6.2-double-intel/bin/mdrun_mpi_d \
    -s prefix_.tpr -multi 140 -replex 3000


There was no obvious error message; the output files simply stayed at size 0. After a long time, I killed the job myself.


NOTE: The load imbalance in PME FFT and solve is 384%.
      For optimal PME load balancing
      PME grid_x (1728) and grid_y (1728) should be divisible by #PME_nodes_x (2)
      and PME grid_y (1728) and grid_z (1728) should be divisible by #PME_nodes_y (1)


NOTE: The load imbalance in PME FFT and solve is 384%.
      For optimal PME load balancing
      PME grid_x (1728) and grid_y (1728) should be divisible by #PME_nodes_x (2)
      and PME grid_y (1728) and grid_z (1728) should be divisible by #PME_nodes_y (1)

--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: comp094
PID:  6751

This process may still be running and/or consuming resources.

--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 273 with PID 30914 on node comp122 exited on signal 9 (Killed).
--------------------------------------------------------------------------
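Before resubmitting, a few pre-flight checks on the launch line might help rule out simple causes. The following is only a rough sketch, not a diagnosis of the hang. It relies on two mdrun 4.6 behaviours: the number of MPI ranks must be a whole multiple of the number of simulations given to -multi, and "-s prefix_.tpr -multi 140" reads prefix_0.tpr through prefix_139.tpr. The function names (ranks_per_replica, missing_tpr, count_slots) are made up for this illustration.

```shell
# Illustrative pre-flight helpers for a multi-replica mdrun launch.
# All function names here are hypothetical; nothing below is part of GROMACS or LSF.

# ranks_per_replica NRANKS NREPLICAS
# Prints how many MPI ranks each replica gets, or fails when NRANKS
# is not a whole multiple of NREPLICAS.
ranks_per_replica() {
    if [ "$2" -le 0 ] || [ $(( $1 % $2 )) -ne 0 ]; then
        echo "error: $1 ranks cannot be split evenly over $2 replicas" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

# missing_tpr PREFIX NREPLICAS
# Prints every expected input file PREFIX<i>.tpr (i = 0 .. NREPLICAS-1)
# that does not exist; prints nothing when all files are present.
missing_tpr() {
    _i=0
    while [ "$_i" -lt "$2" ]; do
        [ -f "${1}${_i}.tpr" ] || echo "${1}${_i}.tpr"
        _i=$(( _i + 1 ))
    done
}

# count_slots MACHINEFILE
# Prints the number of non-empty lines, i.e. the slots listed in the file.
count_slots() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}
```

With the values from the script above, "ranks_per_replica 140 140" prints 1, "missing_tpr prefix_ 140" lists any absent replica inputs, and "count_slots $MACHINEFILE" can be compared against the value passed to -np.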




________________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Justin Lemkul <jalemkul at vt.edu>
Sent: 22 May 2014 10:57
To: gmx-users at gromacs.org
Subject: Re: [gmx-users] about  exchange replica with  replicas 140

On 5/21/14, 10:53 PM, #ZHANG HAIPING# wrote:
> Dear gromacs user:
>
> I have encountered a problem when using an HPC (high-performance computing) cluster to run replica exchange. With more than 128 replicas the run does not work, while with fewer than 128 it is fine, even when I use many more than 128 cores (several cores per replica). The version I use is gromacs-4.6.2-double-intel. I hope you can help.
>

You need to provide more information.  Specifically, what commands are you
issuing?  What does "will not work" mean?  Is there a specific error you are
getting?

-Justin

--
==================================================

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 601
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==================================================

