Skip to content

Capture AVX context when redirecting threads for suspension#47212

Merged
VSadov merged 10 commits intodotnet:masterfrom
VSadov:avx
Jan 22, 2021
Merged

Capture AVX context when redirecting threads for suspension#47212
VSadov merged 10 commits intodotnet:masterfrom
VSadov:avx

Conversation

@VSadov
Copy link
Member

@VSadov VSadov commented Jan 20, 2021

Resolves:#47016

_ASSERTE(!success);

// So now allocate a buffer of that size and call IntitializeContext again
LPVOID buffer = malloc(contextSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: In VM, the typically pattern is to new (nothrow) BYTE[....] instead of calling malloc directly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had in fact tried this, but I observed an assert saying new [] isnt allowed here. Wonder what was different then.

@VSadov
Copy link
Member Author

VSadov commented Jan 20, 2021

About the context size savings due to compaction:

On my machine:

  • sizeof(CONTEXT) -> 1232 bytes
  • InitializeContext with XSTATE_MASK_AVX -> 1775 bytes
  • InitializeContext2 with xStateCompactionMask = XSTATE_MASK_LEGACY | XSTATE_MASK_AVX -> 1647 bytes

I see 128 bytes savings from compaction, which is ~10%. Not a lot.
However my machine does not have AVX512 and perhaps on a machine with AVX512 the savings could be larger. Could be 800-900 bytes difference if my math is right.
It would be interesting to see what happens on actual AVX512 hardware though.

Also - somehow we do not have InitializeContext2 statically available fir our build config. The documentation specifies the same requirements for both InitializeContext and InitializeContext2 - Win7SP1, so I am not sure why we have one and not another.
As a workaround I check for InitializeContext2 dynamically.

@VSadov VSadov marked this pull request as ready for review January 21, 2021 03:47
@VSadov VSadov requested review from janvorli and mangod9 January 21, 2021 03:47
@VSadov
Copy link
Member Author

VSadov commented Jan 21, 2021

I think this is ready to be reviewed.

{
success = pfnInitializeContext2 ?
pfnInitializeContext2(buffer, context, &pOSContext, &contextSize, xStateCompactionMask):
InitializeContext(buffer, context, &pOSContext, &contextSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does our CI validate on non-AVX machine? Would be good to find one to ensure this change is not breaking on those.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to try on AVX512 machine as well. I doubt the CI machines/VMs have that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly everything has AVX now and AVX512 is fairly rare.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VMs in Azure should support some 512 variant:

GenuineIntel
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
3DNOW not supported
3DNOWEXT not supported
ABM not supported
ADX supported
AES supported
AVX supported
AVX2 supported
AVX512CD supported
AVX512ER not supported
AVX512F supported
AVX512PF not supported


if (!success)
{
free(buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be delete[] buffer.

*contextBuffer = buffer;

#else
pOSContext = new (nothrow) CONTEXT();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pOSContext = new (nothrow) CONTEXT();
pOSContext = new (nothrow) CONTEXT;

I believe that new (nothrow) CONTEXT() will zero-initialize the context that is not necessary. This change should omit the zero-initialization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true - verified with the compiler explorer


_ASSERTE(!success);

// So now allocate a buffer of that size and call IntitializeContext again
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// So now allocate a buffer of that size and call IntitializeContext again
// So now allocate a buffer of that size and call InitializeContext again

pfnInitializeContext2(NULL, context, NULL, &contextSize, xStateCompactionMask) :
InitializeContext(NULL, context, NULL, &contextSize);

_ASSERTE(!success);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be _ASSERTE(success), right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, InitializeContext fails with ERROR_INSUFFICIENT_BUFFER but returns contextSize. Different than other win32 apis I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. Then it would be nice to add checking the GetLastError()==ERROR_INSUFFICIENT_BUFFER to the _ASSERTE condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

InitializeContext(buffer, context, &pOSContext, &contextSize);

// if AVX is supported set the appropriate features mask in the context
if (success == TRUE && supportsAVX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - why success==TRUE and not just success?

*contextBuffer = buffer;

#else
pOSContext = new (nothrow) CONTEXT();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true - verified with the compiler explorer

PINITIALIZECONTEXT2 pfnInitializeContext2 = NULL;

#ifdef TARGET_X86
#define CONTEXT_COMPLETE (CONTEXT_FULL | CONTEXT_FLOATING_POINT | \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - you can use CONTEXT_ALL | CONTEXT_EXCEPTION_REQUEST for all targets

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this is not a case on ARM64.
On ARM64 CONTEXT_ALL also includes x18, which always contains TEB pointer in user mode. We are not getting it in RedirectCurrentThreadAtHandledJITCase case and that causes assertions, since we expect more.

Ill make this as it was originally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting. On Unix, x18 is not handled in a special way and is always part of the context.

// This should not normally fail.
// The system silently ignores any feature specified in the FeatureMask
// which is not enabled on the processor.
success = SetXStateFeaturesMask(pOSContext, XSTATE_MASK_AVX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't the context initialized by the InitializeContext already have this set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samples in the docs make this call. (https://docs.microsoft.com/en-us/windows/win32/debug/working-with-xstate-context)
I think InitializeContext only knows that we want XSTATE, but more exact feature mask must be set separately.


if (!pCtx)
{
// Even when our caller is YieldTask, we can find a NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this case possible anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YieldTask scenario no longer exists. Looks like it has been mostly removed, but some pieces remained.

It is tempting to do more clean up, but this change might need to be back-ported, so I tried to limit the size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thank you for the explanation.

// Get and save the thread's context

#if defined(TARGET_X86) || defined(TARGET_AMD64)
_ASSERTE((pCtx->ContextFlags & CONTEXT_XSTATE) ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot we use the CopyContext API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know if it would be always safe to use CopyContext. CopyContext doc mentions that the context must be initialized with InitializeContext. We fall back to regular CONTEXT when we do not have AVX.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I noticed that we still use InitializeContext even if AVX is disabled and it does not cost us much extra. Therefore I will use CopyContext. It will make this part a simpler.

// If the thread is at a GC safe point, push a RedirectedThreadFrame with
// the interrupted context and pulse the GC mode so that GC can proceed.
FrameWithCookie<RedirectedThreadFrame> frame(interruptedContext);
pThread->SetSavedRedirectContext(NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to mark the redirect context as in use here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in this case interruptedContext is stack-allocated in the signal handler. It is not the redirect instance, which we are guarding for under/over use. Marking "in use" here will detect a different instance and trigger an assert.

The original SetSavedRedirectContext(NULL) was a bit suspicious since it was not balanced with a "returning" set. If we ever had an actual redirect instance allocated, it might have leaked here.

Copy link
Member

@mangod9 mangod9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@VSadov
Copy link
Member Author

VSadov commented Jan 22, 2021

it looks like AVX in windows can be disabled via bcdedit /set xsavedisable 1 , so I tested the changes with that.

  • everything seems to work fine. The original repro for this bug does not repro, regardless of having a fix or not.
  • we still use InitializeContext in no-AVX mode and it gives us context buffer size 39 bytes bigger than sizeof(CONTEXT). I think that is acceptable.
  • since we use InitializeContext anyways, we can use CopyContext as @janvorli suggested, which makes RedirectCurrentThreadAtHandledJITCase a bit simpler.
    I will make such change.
  • libs tests are all passing with AVX disabled
  • clr vector tests had some failures, but it is very likely a feature detection problem in the test harness. I will enter a separate issue on that - if we care to fix.

@VSadov
Copy link
Member Author

VSadov commented Jan 22, 2021

I think I have addressed all the concerns on this. Please let me know if I missed something.

@mangod9
Copy link
Member

mangod9 commented Jan 22, 2021

Should we validate with avx512? I have an Azure VM so can do a sanity check if you have a self-contained repro available.

@VSadov
Copy link
Member Author

VSadov commented Jan 22, 2021

@mangod9 - Thanks! I wanted to run tests and do some debugging, so I just acquired a similar VM

Here are my observations:

  • regular tests all passed
  • the repro sample failed without a fix and passed with the fix - as expected
  • There is a noticeable size benefit from context compaction on this machine.
    the size of context buffer:
    InitializeContext -> 3375 bytes
    InitializeContext2 with compaction flags -> 1647 bytes, same as on machine with no AVX512

Hardware:

	Name			Intel Xeon Platinum
	Codename		Skylake-SP
	Specification		Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
	Package (platform ID)	Socket 3647 LGA (0x0)
	CPUID			6.5.4
	Extended CPUID		6.55
	Core Stepping		H0/U0
	Technology		14 nm
	Core Speed		2693.0 MHz                                                                  
	Stock frequency		2700 MHz                                                                    V
	Instructions sets	MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX, AVX2, AVX512F, FMA3, TSX

@mangod9
Copy link
Member

mangod9 commented Jan 22, 2021

Ok, nice. Good to go then :). Thx!

@VSadov
Copy link
Member Author

VSadov commented Jan 22, 2021

Thanks!!!

@VSadov
Copy link
Member Author

VSadov commented Jan 22, 2021

/backport to release/5.0

@github-actions
Copy link
Contributor

Started backporting to release/5.0: https://github.com/dotnet/runtime/actions/runs/504558594

@github-actions
Copy link
Contributor

@VSadov backporting to release/5.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Save avx regs during SuspendThread
error: sha1 information is lacking or useless (src/coreclr/vm/threadsuspend.cpp).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Save avx regs during SuspendThread
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

VSadov added a commit to VSadov/runtime that referenced this pull request Jan 26, 2021
VSadov added a commit to VSadov/coreclr that referenced this pull request Jan 28, 2021
Anipik pushed a commit that referenced this pull request Feb 10, 2021
aik-jahoda pushed a commit to dotnet/coreclr that referenced this pull request Feb 10, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants