[gmx-users] Gromacs 4.6 crashes in PBS queue system

Richard Broadbent richard.broadbent09 at imperial.ac.uk
Tue Feb 19 14:18:53 CET 2013


Hi Tomek,

Gromacs 4.6 uses very different accelerated kernels from those in 4.5.5. 
These are hardware specific, so you must select the acceleration 
appropriate for your hardware.

Your login node will automatically detect and select AVX-128-FMA 
acceleration. However, your compute nodes are considerably older and 
need SSE2 acceleration instead.
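
If you want to confirm what each machine supports, a one-liner along 
these lines (a quick sketch, assuming the standard Linux /proc/cpuinfo) 
shows the relevant SIMD flags:

grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse2|sse4a|sse4_1|avx)$'

Run on the login node it should list avx; on the compute nodes it will 
stop at sse2/sse4a, which is exactly why an AVX build dies there with an 
illegal-instruction signal.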
Adding:

-DGMX_CPU_ACCELERATION=SSE2

to your cmake line should fix this.
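
For reference, the full configure I have in mind looks something like 
this (a sketch only; the source path, install prefix, and the MPI option 
are assumptions based on a typical MPI build, so adjust them to whatever 
you actually use):

mkdir build-sse2 && cd build-sse2
cmake /path/to/gromacs-4.6 \
      -DGMX_CPU_ACCELERATION=SSE2 \
      -DGMX_MPI=ON \
      -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-4.6-sse2
make && make install

An SSE2 build will also run (more slowly) on the login node, so a single 
installation is enough.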

The Open MPI warning is an issue with how your jobs are set up to run on 
your system. I would suggest discussing it with your system 
administrator, as it can cause a significant slowdown for your 
simulation as well as for other cluster users.
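
If you need a stopgap in the meantime, one option (a sketch only; it 
assumes your compute nodes have a node-local /tmp, which your 
administrator can confirm) is to point Open MPI's session directory at 
local storage from your PBS script:

export TMPDIR=/tmp
mpirun --mca orte_tmpdir_base /tmp -np 32 mdrun_mpi -v -deffnm protein-EM-solvated

Setting shmem_mmap_enable_nfs_warning to 0, as the message suggests, 
only hides the warning; it does not remove the NFS traffic.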

Richard



On 19/02/13 12:32, Tomek Wlodarski wrote:
> Hi All,
>
> The problem is that this is the only message I got...
> I also get this warning:
> --------------------------------------------------------------------------
> WARNING: Open MPI will create a shared memory backing file in a
> directory that appears to be mounted on a network filesystem.
> Creating the shared memory backup file on a network file system, such
> as NFS or Lustre is not recommended -- it may cause excessive network
> traffic to your file servers and/or cause shared memory traffic in
> Open MPI to be much slower than expected.
>
> You may want to check what the typical temporary directory is on your
> node.  Possible sources of the location of this temporary directory
> include the $TEMPDIR, $TEMP, and $TMP environment variables.
>
> Note, too, that system administrators can set a list of filesystems
> where Open MPI is disallowed from creating temporary files by settings
> the MCA parameter "orte_no_session_dir".
>
>    Local host: n344
>    Fileame:    /tmp/openmpi-sessions-didymos at n344_0
> /19430/1/shared_mem_pool.n344
>
> You can set the MCA paramter shmem_mmap_enable_nfs_warning to 0 to
> disable this message.
> --------------------------------------------------------------------------
>
> but I also got this with gromacs 4.5.5, which runs fine, so I do not think
> it is the problem in my case.
>
> As Alexey noticed, the problem is that my nodes have different
> architectures, but this was not a problem with gromacs 4.5.5.
>
> My access node:
>
> processor    : 0
> vendor_id    : AuthenticAMD
> cpu family    : 21
> model        : 1
> model name    : AMD Opteron(TM) Processor 6272
> stepping    : 2
> cpu MHz        : 2400.003
> cache size    : 2048 KB
> physical id    : 0
> siblings    : 16
> core id        : 0
> cpu cores    : 16
> apicid        : 32
> initial apicid    : 0
> fpu        : yes
> fpu_exception    : yes
> cpuid level    : 13
> wp        : yes
> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid amd_dcm pni
> pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm
> cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
> osvw ibs xop skinit wdt nodeid_msr arat
> bogomips    : 4199.99
> TLB size    : 1536 4K pages
> clflush size    : 64
> cache_alignment    : 64
> address sizes    : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate [9]
>
> My computational node:
>
> processor    : 0
> vendor_id    : AuthenticAMD
> cpu family    : 16
> model        : 2
> model name    : Quad-Core AMD Opteron(tm) Processor 8354
> stepping    : 3
> cpu MHz        : 2200.001
> cache size    : 512 KB
> physical id    : 0
> siblings    : 4
> core id        : 0
> cpu cores    : 4
> apicid        : 0
> initial apicid    : 0
> fpu        : yes
> fpu_exception    : yes
> cpuid level    : 5
> wp        : yes
> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc
> extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic
> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
> bogomips    : 4399.99
> TLB size    : 1024 4K pages
> clflush size    : 64
> cache_alignment    : 64
> address sizes    : 48 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate
>
> Thanks a lot!
>
> Best!
>
> tomek
>
>
>
>
> On Sun, Feb 17, 2013 at 2:37 PM, Alexey Shvetsov <alexxy at omrb.pnpi.spb.ru> wrote:
>
>> Hi!
>>
>> In a message dated 16 February 2013 23:27:45, Tomek Wlodarski wrote:
>>> Hi!
>>>
>>> I have problem in running gromacs 4.6 in PBS queue...
>>> I end up with error:
>>>
>>>
>>> [n370:03036] [[19430,0],0]-[[19430,1],8] mca_oob_tcp_msg_recv: readv
>>> failed: Connection reset by peer (104)
>>>
>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 18 with PID 616 on node n344 exited on
>>> signal 4 (Illegal instruction).
>>
>> Aha. Your mdrun process got SIGILL. This means that your nodes have a
>> different instruction set than the head node, so try a different
>> acceleration level. Can you share details about your hardware?
>>
>>>
>> --------------------------------------------------------------------------
>>> [n370:03036] 3 more processes have sent help message
>>> help-opal-shmem-mmap.txt / mmap on nfs
>>> [n370:03036] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
>>> help / error messages
>>> 3 total processes killed (some possibly by mpirun during cleanup)
>>>
>>> I run the same PBS files with the older gromacs 4.5.5 (installed with
>>> the same openmpi, gcc and fftw) and everything works.
>>>
>>> Also, when I am running gromacs directly on the access node:
>>>
>>> mpirun -np 32 /home/users/didymos/gromacs/bin/mdrun_mpi -v -deffnm
>>> protein-EM-solvated -c protein-EM-solvated.gro
>>>
>>> it is running OK.
>>> Any ideas?
>>> Thank you!
>>> Best!
>>>
>>> tomek
>> --
>> Best Regards,
>> Alexey 'Alexxy' Shvetsov
>> Petersburg Nuclear Physics Institute, NRC Kurchatov Institute,
>> Gatchina, Russia
>> Department of Molecular and Radiation Biophysics
>> Gentoo Team Ru
>> Gentoo Linux Dev
>> mailto:alexxyum at gmail.com
>> mailto:alexxy at gentoo.org
>> mailto:alexxy at omrb.pnpi.spb.ru