-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
The bug is very critical for us. Our customers are getting a lot crashes under profiler.
.NET 6.0.4 x64, Windows 10.0.19044
Our profiler crashed in case when Enter/Leave/Tailcall was called on the thread where ICorProfilerCallback::ThreadCreated wasn't called before. I used the Time Travel Debugger (TTD) to make the situation clear. The investigating thread is 0x7dec. The beginning state show us that the thread is native:
0:028> ~.
. 28 Id: 664.7dec Suspend: 4096 Teb: 00000040`af9a0000 Unfrozen
0:028> !threads
ThreadCount: 18
UnstartedThread: 0
BackgroundThread: 17
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
Lock
DBG ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
0 1 40c8 00000210E28F4000 26020 Preemptive 0000021080098D80:0000021080099FD0 00000210e284b430 -00001 STA
10 2 7130 00000210E43DB2A0 2b220 Preemptive 0000000000000000:0000000000000000 00000210e284b430 -00001 MTA (Finalizer)
11 3 4250 00000210E85A3F60 2b220 Preemptive 000002108003D940:000002108003DFD0 00000210e284b430 -00001 MTA
15 4 31fc 00000210E8E8DAA0 2b220 Preemptive 000002108001D440:000002108001DFD0 00000210e284b430 -00001 MTA
19 5 b224 00000210E9221650 102a220 Preemptive 0000000000000000:0000000000000000 00000210e284b430 -00001 MTA (Threadpool Worker)
21 6 7498 00000210E92A7380 102b220 Preemptive 0000021080078248:00000210800791E8 00000210e284b430 -00001 MTA (Threadpool Worker)
22 7 865c 00000210E94744F0 302b220 Preemptive 0000021080064DF8:0000021080065FD0 00000210e284b430 -00001 MTA (Threadpool Worker)
23 8 9fc0 00000210E94A6BF0 102b220 Preemptive 0000021080070FC8:0000021080071FD0 00000210e284b430 -00001 MTA (Threadpool Worker)
24 9 6590 00000210E94CB850 102b220 Preemptive 0000021080086FE8:0000021080087FD0 00000210e284b430 -00001 MTA (Threadpool Worker)
25 10 a750 00000210EEC72F20 20220 Preemptive 0000021080079300:000002108007B1E8 00000210e284b430 -00001 Ukn
29 11 c7fc 00000210EEC7C8C0 20220 Preemptive 0000021080072358:0000021080073FD0 00000210e284b430 -00001 Ukn
32 12 c1d0 00000210EECC26B0 102b220 Preemptive 000002108007D768:000002108007F1E8 00000210e284b430 -00001 MTA (Threadpool Worker)
31 13 c3c4 00000210EECC2D40 102b220 Preemptive 000002108007C088:000002108007D1E8 00000210e284b430 -00001 MTA (Threadpool Worker)
33 14 8174 00000210EECC33D0 102b220 Preemptive 000002108007FE18:00000210800811E8 00000210e284b430 -00001 MTA (Threadpool Worker)
34 15 b320 00000210EECC2020 8029220 Preemptive 00000210800916F8:0000021080091FD0 00000210e284b430 -00001 MTA (Threadpool Completion Port)
35 16 93bc 00000210EED4C210 8029220 Preemptive 0000021080094360:0000021080095FD0 00000210e284b430 -00001 MTA (Threadpool Completion Port)
36 17 2af8 00000210EED4BB80 8029220 Preemptive 0000021080092A30:0000021080093FD0 00000210e284b430 -00001 MTA (Threadpool Completion Port)
37 18 b140 00000210EED4D5C0 8029220 Preemptive 0000000000000000:0000000000000000 00000210e284b430 -00001 MTA (Threadpool Completion Port)
We are in ntdll!LdrpDispatchUserCallTarget:
ntdll!LdrpDispatchUserCallTarget:
00007ff9`bb3fc620 4c8b1d812d0f00 mov r11, qword ptr [ntdll!LdrSystemDllInitBlock+0xb8 (00007ff9`bb4ef3a8)]
00007ff9`bb3fc627 4c8bd0 mov r10, rax
00007ff9`bb3fc62a 49c1ea09 shr r10, 9
00007ff9`bb3fc62e 4f8b1cd3 mov r11, qword ptr [r11+r10*8]
00007ff9`bb3fc632 4c8bd0 mov r10, rax
00007ff9`bb3fc635 49c1ea03 shr r10, 3
00007ff9`bb3fc639 a80f test al, 0Fh
00007ff9`bb3fc63b 7509 jne ntdll!LdrpDispatchUserCallTarget+0x26 (00007ff9`bb3fc646)
00007ff9`bb3fc63d 4d0fa3d3 bt r11, r10
00007ff9`bb3fc641 730e jae ntdll!LdrpDispatchUserCallTarget+0x31 (00007ff9`bb3fc651)
00007ff9`bb3fc643 48ffe0 jmp rax {00007ff9`04485ab0} ; <===================== RIP
00007ff9`bb3fc646 490fbaf200 btr r10, 0
00007ff9`bb3fc64b 4d0fa3d3 bt r11, r10
00007ff9`bb3fc64f 730d jae ntdll!LdrpDispatchUserCallTarget+0x3e (00007ff9`bb3fc65e)
00007ff9`bb3fc651 4983ca01 or r10, 1
00007ff9`bb3fc655 4d0fa3d3 bt r11, r10
00007ff9`bb3fc659 7303 jae ntdll!LdrpDispatchUserCallTarget+0x3e (00007ff9`bb3fc65e)
00007ff9`bb3fc65b 48ffe0 jmp rax
00007ff9`bb3fc65e 41ba01000000 mov r10d, 1
00007ff9`bb3fc664 e957feffff jmp ntdll!LdrpHandleInvalidUserCallTarget (00007ff9`bb3fc4c0)
With stack:
0:028> k
# Child-SP RetAddr Call Site
00 00000040`b417f178 00007ff9`b9e7688a ntdll!LdrpDispatchUserCallTarget+0x23
01 00000040`b417f180 00007ff9`b9e78ad3 WS2_32!NSMASTERJOB::SignalCompleted+0x282
02 00000040`b417f200 00007ff9`b9e78713 WS2_32!NSJOB::SignalCompleted+0xa3
03 00000040`b417f240 00007ff9`b9e78559 WS2_32!NSSUBJOB::SignalCompletedQDNS+0x143
04 00000040`b417f330 00007ff9`b9e77e1f WS2_32!NSSUBJOB::SignalCompletedQDNS4F+0xf9
05 00000040`b417f380 00007ff9`b9e7a78c WS2_32!NSSUBJOB::SignalCompletedQDNS4F+0x3f
06 00000040`b417f3c0 00007ff9`b9e7a6e5 WS2_32!WSALookupServiceNextAsyncWCallback+0x2c
07 00000040`b417f400 00007ff9`b9e79f49 WS2_32!NSQUERY::LookupServiceNextAsyncCoalesce+0x141
08 00000040`b417f480 00007ff9`bb3e0bf9 WS2_32!NSQUERY::LookupServiceNextAsyncCallback+0x229
09 00000040`b417f590 00007ff9`bb3c315a ntdll!TppSimplepExecuteCallback+0x99
0a 00000040`b417f5e0 00007ff9`b9447034 ntdll!TppWorkerThread+0x68a
0b 00000040`b417f8e0 00007ff9`bb3c2651 KERNEL32!BaseThreadInitThunk+0x14
0c 00000040`b417f910 00000000`00000000 ntdll!RtlUserThreadStart+0x21
After executing jumps we are in coreclr.dll:
00007ff9`04488c90 55 push rbp
00007ff9`04488c91 57 push rdi
00007ff9`04488c92 56 push rsi
00007ff9`04488c93 4883ec30 sub rsp, 30h
00007ff9`04488c97 488d6c2440 lea rbp, [rsp+40h]
00007ff9`04488c9c 894d10 mov dword ptr [rbp+10h], ecx
00007ff9`04488c9f 895518 mov dword ptr [rbp+18h], edx
00007ff9`04488ca2 4c894520 mov qword ptr [rbp+20h], r8
00007ff9`04488ca6 b986005002 mov ecx, 2500086h
00007ff9`04488cab 488d5510 lea rdx, [rbp+10h]
00007ff9`04488caf ff1523d7945f call qword ptr [coreclr!hlpDynamicFuncTable+0xe8 (00007ff9`63dd63d8)] ; <================================= RPI
00007ff9`04488cb5 8b4d10 mov ecx, dword ptr [rbp+10h]
00007ff9`04488cb8 8b5518 mov edx, dword ptr [rbp+18h]
00007ff9`04488cbb 4c8b4520 mov r8, qword ptr [rbp+20h]
00007ff9`04488cbf 8bf1 mov esi, ecx
00007ff9`04488cc1 498bf8 mov rdi, r8
00007ff9`04488cc4 488d4de0 lea rcx, [rbp-20h]
00007ff9`04488cc8 48baf0415804f97f0000 mov rdx, 7FF9045841F0h
00007ff9`04488cd2 4533c0 xor r8d, r8d
00007ff9`04488cd5 e8f6ca605f call coreclr!JIT_ReversePInvokeEnterTrackTransitions (00007ff9`63a957d0)
00007ff9`04488cda 8bce mov ecx, esi
00007ff9`04488cdc 488bd7 mov rdx, rdi
00007ff9`04488cdf e8d4cdffff call 00007ff9`04485ab8
00007ff9`04488ce4 488d4de0 lea rcx, [rbp-20h]
00007ff9`04488ce8 e8f3e9605f call coreclr!JIT_ReversePInvokeExitTrackTransitions (00007ff9`63a976e0)
00007ff9`04488ced b986005002 mov ecx, 2500086h
00007ff9`04488cf2 488d5510 lea rdx, [rbp+10h]
00007ff9`04488cf6 ff15e4d6945f call qword ptr [coreclr!hlpDynamicFuncTable+0xf0 (00007ff9`63dd63e0)]
00007ff9`04488cfc 90 nop
00007ff9`04488cfd 4883c430 add rsp, 30h
00007ff9`04488d01 5e pop rsi
00007ff9`04488d02 5f pop rdi
00007ff9`04488d03 5d pop rbp
00007ff9`04488d04 c3 ret
The call qword ptr [coreclr!hlpDynamicFuncTable+0xe8] to our profiler code:
00007df4`71270000 50 push rax
00007df4`71270001 51 push rcx
00007df4`71270002 52 push rdx
00007df4`71270003 4150 push r8
00007df4`71270005 4151 push r9
00007df4`71270007 4152 push r10
00007df4`71270009 4153 push r11
00007df4`7127000b 4883ec30 sub rsp, 30h
00007df4`7127000f 660f7f442420 movdqa xmmword ptr [rsp+20h], xmm0
00007df4`71270015 488bd1 mov rdx, rcx
00007df4`71270018 48b980fe39e410020000 mov rcx, 210E439FE80h
00007df4`71270022 48b810474b71f97f0000 mov rax, offset JetBrains_Profiler_Core!jbprof::performance_tracing_profiler::Rdtsc_fast_enter (00007ff9`714b4710)
00007df4`7127002c ffd0 call rax
00007df4`7127002e 660f6f442420 movdqa xmm0, xmmword ptr [rsp+20h]
00007df4`71270034 4883c430 add rsp, 30h
00007df4`71270038 415b pop r11
00007df4`7127003a 415a pop r10
00007df4`7127003c 4159 pop r9
00007df4`7127003e 4158 pop r8
00007df4`71270040 5a pop rdx
00007df4`71270041 59 pop rcx
00007df4`71270042 58 pop rax
00007df4`71270043 c3 ret
Diagnosis: No any "unmanaged to manage transition" / "thread creating" calls was made before the profiler get the "Enter" callback.
We have 100% repro in attachment WinFormsApp1.zip (run application under profiler with ELT callbacks, press the button till crash, at least 2 times)
TTD recording: https://drive.google.com/file/d/1MoxiP4Y28N7vHiR_33_3HWS5daGuMhy4/view?usp=sharing (276Mb, SHA-256: fa7b670bcf279f79c0759450138bcd08eee831041d625452641f17fccf519b24)
P.S. Connected issue in our tracker https://youtrack.jetbrains.com/issue/DTRC-28226