[gmx-users] Windows/x64, VCVS2012 compiled, crash on checkpoint writing
Mark Abraham
mark.j.abraham at gmail.com
Mon Nov 18 16:05:12 CET 2013
On Fri, Nov 15, 2013 at 6:43 PM, Mirco Wahab <mirco.wahab at chemie.tu-freiberg.de> wrote:
> Gromacs 4.6.4 compiles (and links) perfectly w/VS2012
> and nvcc from CUDA 5.5 on windows/x64
> (MSVC 2012 Version 11.0.60610.01 Update 3).
>
> But when compiled with VS2012 (which is necessary because
> linking against CUDA 5.5 is only possible with it, unlike
> VS2010), mdrun crashes when writing the checkpoint file.
>
> This does not happen when compiling with VS2010
> (but that rules out CUDA 5.5; only CUDA 5.0 is
> supported there).
>
> I set up a debugging session in VS2012 in order
> to find the exception location (see below, on
> entry into the named function). mdrun was compiled
> as "Release w/DebugInfo".
>
> (This was possibly also a problem in 4.6.3, IIRC.)
>
Almost certainly a problem for the whole 4.6 series, if my theory below is
correct.
>
>
> ----- d:\libsrc\gromacs\gromacs-4.6.4\src\gmxlib\gmxfio.c ----
>
>
> /* internal variant of get_file_md5 that operates on a locked file */
> static int gmx_fio_int_get_file_md5(t_fileio *fio, gmx_off_t offset,
> unsigned char digest[])
> {
> 00007FF7B25B74C0 mov qword ptr [state],rbx
> 00007FF7B25B74C5 push rbp
> 00007FF7B25B74C6 push rsi
> 00007FF7B25B74C7 push rdi
> 00007FF7B25B74C8 mov eax,100090h
> ********************************************************
> 00007FF7B25B74CD call __chkstk (07FF7B28B4160h) <== exception
>
This is supposed to automagically deal with the large allocation on the
stack (http://support.microsoft.com/kb/100775,
http://stackoverflow.com/questions/8400118/what-is-the-purpose-of-the-chkstk-function).
There seem to be compiler options that can disable or cripple the __chkstk
call, so if CUDA or GROMACS is doing something evil there, we'd want to look
at (and possibly change) the command-line flags that were actually used.
Mirco, can you look those up for us, please?
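A quick back-of-the-envelope check of the frame size may help here. The sketch below takes the 0x100090 from the `mov eax,100090h` just before `call __chkstk` and compares it with two assumptions that are not established in this thread: 4 KiB pages on x64 Windows, and the MSVC linker's default 1 MiB stack reserve (/STACK), i.e. that the crashing thread was not created with a larger stack. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* From "mov eax,100090h" right before "call __chkstk": the number of
 * bytes the function then subtracts from rsp ("sub rsp,rax") for its
 * locals (buf[CPT_CHK_LEN], state, ...). */
#define FRAME_BYTES     ((size_t)0x100090)

/* Assumptions, not taken from the thread: 4 KiB pages on x64 Windows and
 * the MSVC linker's default 1 MiB stack reserve (/STACK). */
#define PAGE_BYTES      ((size_t)4096)
#define DEFAULT_RESERVE ((size_t)1 << 20)

/* Pages a __chkstk-style probe touches, one per page, to grow the stack
 * by `frame` bytes (rounded up to whole pages). */
size_t probe_pages(size_t frame, size_t page)
{
    assert(page > 0);
    return (frame + page - 1) / page;
}

/* Nonzero if `frame` more bytes do not fit in a stack of `reserve` bytes
 * of which `used` bytes are already in use. */
int frame_overflows(size_t frame, size_t used, size_t reserve)
{
    return used >= reserve || frame > reserve - used;
}
```

Under those assumptions the frame alone (1,048,720 bytes, 257 pages) is 144 bytes larger than the whole 1 MiB reserve, so the page-by-page probe would run past the reserved region before the function body executes, which would be consistent with the exception being reported at `call __chkstk`. This alone does not explain why the VS2010 build works; the actual build and link flags (in particular any /STACK setting) would help confirm or rule it out.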
> ********************************************************
> 00007FF7B25B74D2 sub rsp,rax
> 00007FF7B25B74D5 mov rax,qword ptr [__security_cookie (07FF7B29963E0h)]
> 00007FF7B25B74DC xor rax,rsp
> 00007FF7B25B74DF mov qword ptr [rsp+100080h],rax
> 00007FF7B25B74E7 mov rsi,rdx
> /*1MB: large size important to catch almost identical files */
> #define CPT_CHK_LEN 1048576
> md5_state_t state;
> unsigned char buf[CPT_CHK_LEN];
>
I can see no reason why this should be done on the stack, but neither can I
see a good reason why it should fail.
The supporting machinery for an automated "CanWriteCheckpoint" test is on
the table for 5.0; such a test might have prevented this from occurring.
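If the 1 MiB stack buffer does turn out to be the problem, the obvious remedy is to move it off the stack. Below is a minimal sketch of that idea, not the GROMACS code: it reads the (at most) CPT_CHK_LEN bytes ending at `offset` into a `malloc`ed buffer, clamping `seek_offset` the same way `gmx_fio_int_get_file_md5` does. The function names are hypothetical, `long`/`fseek` stand in for `gmx_off_t` and the GROMACS file I/O layer, and a small FNV-1a hash stands in for the MD5 calls so that the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CPT_CHK_LEN 1048576               /* same 1 MiB window as gmxfio.c */
#define FNV_OFFSET  14695981039346656037ULL
#define FNV_PRIME   1099511628211ULL

/* Hypothetical stand-in for the MD5 update: 64-bit FNV-1a. */
uint64_t fnv1a(const unsigned char *p, size_t n, uint64_t h)
{
    for (size_t i = 0; i < n; i++)
    {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hash the (at most) CPT_CHK_LEN bytes of fp that end at `offset`, using
 * a heap buffer instead of a 1 MiB array in the stack frame.
 * Returns 0 on success, -1 on error (like `int ret = -1` above). */
int hash_tail(FILE *fp, long offset, uint64_t *out)
{
    long           seek_offset = offset - CPT_CHK_LEN;
    size_t         read_len;
    unsigned char *buf;
    int            ret = -1;

    assert(fp != NULL && out != NULL && offset >= 0);
    if (seek_offset < 0)
    {
        seek_offset = 0;
    }
    read_len = (size_t)(offset - seek_offset);

    buf = malloc(CPT_CHK_LEN);            /* heap, so the frame stays small */
    if (buf == NULL)
    {
        return -1;
    }
    if (fseek(fp, seek_offset, SEEK_SET) == 0
        && fread(buf, 1, read_len, fp) == read_len)
    {
        *out = fnv1a(buf, read_len, FNV_OFFSET);
        ret  = 0;
    }
    free(buf);
    return ret;
}
```

Note that `long` is 32 bits on Windows, so a real version would need 64-bit file offsets (as `gmx_off_t` provides) for checkpoint files larger than 2 GiB.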
Mark
> gmx_off_t read_len;
> gmx_off_t seek_offset;
> int ret = -1;
>
> seek_offset = offset - CPT_CHK_LEN;
> if (seek_offset < 0)
> 00007FF7B25B74EA xor eax,eax
> 00007FF7B25B74EC add rdx,0FFFFFFFFFFF00000h
> 00007FF7B25B74F3 cmovs rdx,rax
> 00007FF7B25B74F7 mov rdi,rcx
> {
> ------------------------------------------------------------------
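For reference, the instructions under `if (seek_offset < 0)` are the compiler's branchless form of that clamp. Under the x64 Windows calling convention `offset` arrives in rdx; `add rdx,0FFFFFFFFFFF00000h` adds -CPT_CHK_LEN in two's complement, and `cmovs rdx,rax` replaces the result with rax (just zeroed by `xor eax,eax`) whenever it is negative. The C sketch below (function name hypothetical) spells this out; it suggests the offset arithmetic was compiled as written, and in any case the exception is raised earlier, at `call __chkstk`.

```c
#include <assert.h>
#include <stdint.h>

#define CPT_CHK_LEN 1048576   /* 1 MiB, as in gmxfio.c */

/* The immediate in "add rdx,0FFFFFFFFFFF00000h" is -CPT_CHK_LEN. */
static_assert((uint64_t)(-(int64_t)CPT_CHK_LEN) == UINT64_C(0xFFFFFFFFFFF00000),
              "immediate equals -CPT_CHK_LEN");

/* seek_offset = offset - CPT_CHK_LEN, clamped at 0.
 * In the quoted disassembly this became:
 *   xor   eax,eax                  ; rax = 0
 *   add   rdx,0FFFFFFFFFFF00000h   ; rdx = offset - CPT_CHK_LEN
 *   cmovs rdx,rax                  ; if the result is negative, rdx = 0 */
int64_t clamped_seek_offset(int64_t offset)
{
    int64_t seek_offset = offset - CPT_CHK_LEN;
    return seek_offset < 0 ? 0 : seek_offset;
}
```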