[gmx-users] Performance of Gromacs-4.6.1 on BlueGene/Q
ljggmx at yahoo.com.sg
Tue Jun 4 16:43:35 CEST 2013
Thank you, XAvier.
The problem is that the cluster manager sets the minimum job size on BlueGene/Q to 128 cores, so I cannot use 64 cores. Judging by the performance, though, 512 cores on BlueGene/Q are roughly equivalent to 64 cores on the other cluster. Since each compute card has 16 cores, the total number of cores I used on BlueGene/Q is num_cards times 16. So in my tests I actually ran the simulations on different numbers of cards, from 8 to 256, with 32 MPI tasks per card (BlueGene/Q accepts up to 4 tasks per core). The following is the script I submitted to BlueGene/Q:
# set Use 128 Compute Cards ( 1x Compute Card = 16 cores, 128x16 = 2048 cores )
# set Job name
# set Output file
# set Job queue ( default is normal )
srun --ntasks-per-node=32 --overcommit /scratch/home/biilijg/package/gromacs-461/bin/mdrun -s box_md1.tpr -c box_md1.gro -x box_md1.xtc -g md1.log >& job_md1
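To make the core and task counts concrete, here is a small Python sketch of the arithmetic described above. It uses only the figures quoted in this message (16 cores per compute card, 32 MPI tasks per card, at most 4 tasks per core). The function name `layout` and its dictionary keys are made up for illustration; they are not part of GROMACS or of the job scheduler.

```python
# Hypothetical helper: figures come from this message, names are illustrative.
CORES_PER_CARD = 16      # cores on one BlueGene/Q compute card
MAX_TASKS_PER_CORE = 4   # upper limit of tasks per core quoted above


def layout(num_cards, tasks_per_card=32):
    """Return core count, MPI rank count and tasks per core for a job."""
    tasks_per_core = tasks_per_card / CORES_PER_CARD
    if tasks_per_core > MAX_TASKS_PER_CORE:
        raise ValueError("more tasks per core than the quoted limit of 4")
    return {
        "cores": num_cards * CORES_PER_CARD,
        "mpi_ranks": num_cards * tasks_per_card,
        "tasks_per_core": tasks_per_core,
    }


if __name__ == "__main__":
    # Card counts used in the tests: 8, 16, ..., 256
    for cards in (8, 16, 32, 64, 128, 256):
        print(cards, layout(cards))
```

With --ntasks-per-node=32, each core therefore runs 2 MPI tasks, so a 128-card job has 2048 cores but 4096 MPI ranks.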
----- Original Message -----
From: XAvier Periole <x.periole at rug.nl>
To: Jianguo Li <ljggmx at yahoo.com.sg>; Discussion list for GROMACS users <gmx-users at gromacs.org>
Sent: Tuesday, 4 June 2013, 22:20
Subject: Re: [gmx-users] Performance of Gromacs-4.6.1 on BlueGene/Q
BG CPUs are generally much slower (clock-wise) but scale better.
You should also try to run on 64 CPUs on the Blue Gene for a fair comparison.
The number of CPUs per node is also an important factor: the more CPUs per node, the more communication needs to be done. I observed a significant slowdown when going from 16-CPU to 32-CPU nodes (recent Intel) while using the same total number of CPUs.
On Jun 4, 2013, at 4:02 PM, Jianguo Li <ljggmx at yahoo.com.sg> wrote:
> Dear All,
> Does anyone have a Gromacs benchmark on BlueGene/Q?
> I recently installed gromacs-461 on BG/Q using the following command:
> cmake .. -DCMAKE_TOOLCHAIN_FILE=BlueGeneQ-static-XL-C \
> -DGMX_BUILD_OWN_FFTW=ON \
> -DBUILD_SHARED_LIBS=OFF \
> -DGMX_XML=OFF
> make install
> After that, I did a benchmark simulation using a box of pure water containing 140k atoms.
> The command I used for the above test is:
> srun --ntasks-per-node=32 --overcommit /scratch/home/biilijg/package/gromacs-461/bin/mdrun -s box_md1.tpr -c box_md1.gro -x box_md1.xtc -g md1.log >& job_md1
> And I got the following performance:
> Num. cores    hours/ns
> 128           9.860
> 256           4.984
> 512           2.706
> 1024          1.544
> 2048          0.978
> 4096          0.677
> The scaling seems OK, but the performance is far from what I expected. In terms of CPU-to-CPU performance, BlueGene/Q is about 8 times slower than other clusters. For comparison, I also ran the same simulation on 64 processors of an SGI cluster and got 2.8 hours/ns, which is roughly equivalent to using 512 cores on BlueGene/Q.
> I am wondering whether the above benchmark results are reasonable. Or am I doing something wrong in compiling?
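A rough way to put numbers on "the scaling seems OK" and on the "8 times slower" estimate is sketched below in Python. It uses only the timings quoted in this message. The last row is taken as 4096 cores (256 cards x 16 cores), which is an assumption about the table, and the names `efficiency` and `core_hours_per_ns` are illustrative only.

```python
# Timings quoted in the message: cores -> hours per ns on BlueGene/Q.
# Assumption: the last row corresponds to 256 cards x 16 cores = 4096 cores.
BGQ_HOURS_PER_NS = {
    128: 9.860,
    256: 4.984,
    512: 2.706,
    1024: 1.544,
    2048: 0.978,
    4096: 0.677,
}
SGI_CORES, SGI_HOURS_PER_NS = 64, 2.8  # SGI cluster run quoted above
BASE = 128                             # smallest job allowed on this machine


def efficiency(cores):
    """Parallel efficiency relative to the 128-core run (1.0 = ideal)."""
    speedup = BGQ_HOURS_PER_NS[BASE] / BGQ_HOURS_PER_NS[cores]
    return speedup / (cores / BASE)


def core_hours_per_ns(cores, hours_per_ns):
    """Total core-hours spent to simulate one nanosecond."""
    return cores * hours_per_ns


if __name__ == "__main__":
    for n in BGQ_HOURS_PER_NS:
        print(f"{n:5d} cores: efficiency {efficiency(n):.2f}")
    ratio = (core_hours_per_ns(512, BGQ_HOURS_PER_NS[512])
             / core_hours_per_ns(SGI_CORES, SGI_HOURS_PER_NS))
    print(f"BG/Q core-hours per ns vs SGI: {ratio:.1f}x")
```

By this measure the efficiency relative to 128 cores stays above 90% up to 512 cores and falls below 50% at the largest job, while the per-core cost comes out a little under 8 times that of the SGI run.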
> Any comments/suggestions are appreciated, thank you very much!
> Have a nice day!