[gmx-users] Windows/x64, VCVS2012 compiled, crash on checkpoint writing

Mark Abraham mark.j.abraham at gmail.com
Mon Nov 18 16:05:12 CET 2013


On Fri, Nov 15, 2013 at 6:43 PM, Mirco Wahab <
mirco.wahab at chemie.tu-freiberg.de> wrote:

> Gromacs 4.6.4 compiles (and links) perfectly w/VS2012
> and nvcc from CUDA 5.5 on windows/x64
> (MSVC 2012 Version 11.0.60610.01 Update 3).
>
> But when compiled with VS2012 (necessary because linking
> against CUDA 5.5 is only possible with it, in contrast
> to VS2010), mdrun crashes when writing the checkpoint file.
>
> This does not happen when compiling with VS2010
> (but that excludes the use of CUDA 5.5; only 5.0 is
> supported).
>
> I did set up a debugging session in VS2012 in order
> to determine the exception location (see below, on
> entry into the named function). mdrun has been compiled
> as "Release w/DebugInfo".
>
> (This was possibly also a problem in 4.6.3, iirc.)
>

Almost certainly a problem for the whole 4.6 series, if my theory below is
correct.


>
>
> ----- d:\libsrc\gromacs\gromacs-4.6.4\src\gmxlib\gmxfio.c ----
>
>
> /* internal variant of get_file_md5 that operates on a locked file */
> static int gmx_fio_int_get_file_md5(t_fileio *fio, gmx_off_t offset,
>                                     unsigned char digest[])
> {
> 00007FF7B25B74C0  mov         qword ptr [state],rbx
> 00007FF7B25B74C5  push        rbp
> 00007FF7B25B74C6  push        rsi
> 00007FF7B25B74C7  push        rdi
> 00007FF7B25B74C8  mov         eax,100090h
> ********************************************************
> 00007FF7B25B74CD  call        __chkstk (07FF7B28B4160h)  <== exception
>

This is supposed to automagically deal with the large allocation on the
stack (http://support.microsoft.com/kb/100775,
http://stackoverflow.com/questions/8400118/what-is-the-purpose-of-the-chkstk-function).
It seems there are compiler options that might disable or cripple the
__chkstk call, so if CUDA or GROMACS is being evil there, we'd want to look
at (and perhaps change) the command-line flags that were actually used. Can
you look those up for us, please, Mirco?
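
The requested frame size can be read straight off the disassembly: the value
loaded into eax just before the __chkstk call, 100090h, is the number of
bytes the function wants on the stack. Here is a minimal sketch of that
arithmetic. It assumes the thread runs on the MSVC linker's default 1 MB
stack reserve, which is only an assumption, since the real reserve depends on
the link flags and on how the thread was created. frame_exceeds_reserve is a
hypothetical helper, not anything in GROMACS.

```c
#include <stddef.h>

/* Value loaded into eax before the __chkstk call in the listing. */
#define FRAME_BYTES      0x100090u
/* ASSUMPTION: stack reserve left at the MSVC linker default of 1 MB. */
#define ASSUMED_RESERVE  (1024u * 1024u)

/* Hypothetical helper: nonzero if a single frame cannot fit in the reserve. */
static int frame_exceeds_reserve(size_t frame_bytes, size_t reserve_bytes)
{
    return frame_bytes > reserve_bytes;
}
```

Calling frame_exceeds_reserve(FRAME_BYTES, ASSUMED_RESERVE) returns nonzero:
the frame is 90h (144) bytes larger than 1 MB. If the assumption about the
reserve holds, that alone would be one possible explanation for the exception.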


> ********************************************************
> 00007FF7B25B74D2  sub         rsp,rax
> 00007FF7B25B74D5  mov         rax,qword ptr [__security_cookie (07FF7B29963E0h)]
> 00007FF7B25B74DC  xor         rax,rsp
> 00007FF7B25B74DF  mov         qword ptr [rsp+100080h],rax
> 00007FF7B25B74E7  mov         rsi,rdx
>     /*1MB: large size important to catch almost identical files */
> #define CPT_CHK_LEN  1048576
>     md5_state_t   state;
>     unsigned char buf[CPT_CHK_LEN];
>

I can see no reason why this should be done on the stack, but neither can I
see a good reason why it should fail.
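
One way to move the buffer off the stack is to allocate it with malloc. The
following is only a sketch of that idea, not the GROMACS code: tail_checksum
and CHK_LEN are hypothetical names, plain stdio with long offsets stands in
for t_fileio and gmx_off_t, and a simple byte sum stands in for the MD5
digest.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same 1 MB window as CPT_CHK_LEN in the quoted source. */
#define CHK_LEN 1048576

/* Hypothetical stand-in for the MD5 step: sums the bytes in the window of
 * at most CHK_LEN bytes that ends at `offset`.
 * Returns 0 on success, -1 on error. */
static int tail_checksum(FILE *fp, long offset, unsigned long *sum)
{
    unsigned char *buf;
    long           seek_offset;
    size_t         want, got, i;

    assert(fp != NULL && sum != NULL);
    if (offset < 0)
    {
        return -1;
    }

    /* Clamp the start of the window to the beginning of the file. */
    seek_offset = offset - CHK_LEN;
    if (seek_offset < 0)
    {
        seek_offset = 0;
    }
    want = (size_t)(offset - seek_offset);

    /* The large buffer lives on the heap, so the stack frame stays small. */
    buf = (unsigned char *)malloc(CHK_LEN);
    if (buf == NULL)
    {
        return -1;
    }

    if (fseek(fp, seek_offset, SEEK_SET) != 0)
    {
        free(buf);
        return -1;
    }
    got = fread(buf, 1, want, fp);
    if (got != want)
    {
        free(buf);
        return -1;
    }

    *sum = 0;
    for (i = 0; i < got; i++)
    {
        *sum += buf[i];
    }

    free(buf);
    return 0;
}
```

The cost is one malloc/free pair per call and an extra error path for a
failed allocation, which is negligible next to reading 1 MB from disk.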

Supporting machinery for an automated "CanWriteCheckpoint" test is on the
table for 5.0; such a test might have prevented this from occurring.

Mark


>     gmx_off_t     read_len;
>     gmx_off_t     seek_offset;
>     int           ret = -1;
>
>     seek_offset = offset - CPT_CHK_LEN;
>     if (seek_offset < 0)
> 00007FF7B25B74EA  xor         eax,eax
> 00007FF7B25B74EC  add         rdx,0FFFFFFFFFFF00000h
> 00007FF7B25B74F3  cmovs       rdx,rax
> 00007FF7B25B74F7  mov         rdi,rcx
>     {
> ------------------------------------------------------------------
>

