[gmx-users] GPU job often stopped
Albert
mailmd2011 at gmail.com
Sun Apr 28 17:27:42 CEST 2013
Dear all,
I am running MD jobs on a workstation with 4 K20 GPUs, and I found that
the jobs fail from time to time with the following messages:
[tesla:03432] *** Process received signal ***
[tesla:03432] Signal: Segmentation fault (11)
[tesla:03432] Signal code: Address not mapped (1)
[tesla:03432] Failing at address: 0xfffffffe02de67e0
[tesla:03432] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f4666da1cb0]
[tesla:03432] [ 1] mdrun_mpi() [0x47dd61]
[tesla:03432] [ 2] mdrun_mpi() [0x47d8ae]
[tesla:03432] [ 3] /opt/intel/lib/intel64/libiomp5.so(__kmp_invoke_microtask+0x93) [0x7f46667904f3]
[tesla:03432] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3432 on node tesla exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I can continue the jobs with the mdrun options "-append -cpi", but they still
stop from time to time. What could be causing this?
Thank you very much,
Albert