Skip to content

Conversation

@MichalStrehovsky
Copy link
Member

There are two issues with this:

1. The merged tests are actually broken, but the failures are only observable without Helix

Helix does not report the failures (this is the reason I’m doing a rollback right away instead of waiting around for a fix). Example:

if (!stackTrace.Contains("BringUpTest.Main"))
{
Console.WriteLine("Unexpected stack trace: " + stackTrace);
return Fail;
}

The above should obviously fail because there is no BringUpTest.Main (it got renamed to BringUpTest.TestEntryPoint as part of merging). The CI is all green, despite of this. The test fails if I run it locally through:

build clr.aot+libs -rc Checked -lc Release
src\tests\build.cmd nativeaot checked tree nativeaot\SmokeTests
src\tests\run.cmd runnativeaottests checked

2. There are several test failures, some of which look like they’ll need fixes in the merged wrapper infrastructure

  • The Exceptions test wants to be able to test AppDomain.UnhandledException:

    AppDomain.CurrentDomain.UnhandledException += UnhandledExceptionEventHandler;

    The wrapper however generates a Try/Catch that doesn’t let UnhandledException happen. The test fails.

  • The TrimmingBehaviors test is sensitive to ripple effects in whole program view. The wrapper’s use of Xunit Assert.Equal brings in a massive amount of reflection that causes ripple effects all over the place. The test fails in a rather obscure way and it took a while to root cause.

I’m also concerned about whether we have any tests that may now be succeeding because of the ripple effects from the wrapper (and we would not notice if the thing they are testing gets broken because they get saved by the ripple effects.

To fix this one, can we make it so RequiresProcessIsolation really just generates a Main that calls the other Main, without checking exit codes, wrapping anything, Assert.Equaling etc?

Cc @dotnet/ilc-contrib

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Mar 26, 2024

Helix does not report the failures

Does this affect all merged tests or just the nativeaot ones that you are reverting?

@MichalStrehovsky
Copy link
Member Author

Helix does not report the failures

Does this affect all merged tests or just the nativeaot ones that you are reverting?

I'm afraid of the answer to this. I didn't check. Smoke tests are absolutely critical.

@jkotas
Copy link
Member

jkotas commented Mar 27, 2024

I'm afraid of the answer to this. I didn't check.

Checking: #100319

@agocke
Copy link
Member

agocke commented Mar 27, 2024

The wrapper’s use of Xunit Assert.Equal brings in a massive amount of reflection that causes ripple effects all over the place.

This shouldn't be true anymore. The Microsoft.Dotnet.XUnitAssert package should sub in and it should be trim safe.

@jkotas
Copy link
Member

jkotas commented Mar 27, 2024

I'm afraid of the answer to this. I didn't check.

The other runtime tests are failing as expected https://dev.azure.com/dnceng-public/public/_build/results?buildId=620112&view=ms.vss-test-web.build-test-results-tab .

@MichalStrehovsky
Copy link
Member Author

The wrapper’s use of Xunit Assert.Equal brings in a massive amount of reflection that causes ripple effects all over the place.

This shouldn't be true anymore. The Microsoft.Dotnet.XUnitAssert package should sub in and it should be trim safe.

Trim safety is not the problem here, it's the use of reflection on random stuff in the BCL.

In the case I saw, somewhere after Assert.Equal<T> we marked methods on IEquatable<T> as reflected on and that was breaking what a test was trying to exercise because we were no longer able to remove something. (It took extra long to troubleshoot because I thought it's something I broke, only to find out the test is broken in main.)

The tests under nativeaot need to have very precise control over what's in the dependency graph to be able to test the things they are testing. The implementation of Assert.Equal that still depends on reflection doesn't feel like what we'd want in the runtime test tree (it's fine for libs testing if it's trim safe, but runtime tests should have as little code running as part of the test infrastructure as possible).

@MichalStrehovsky
Copy link
Member Author

I'm afraid of the answer to this. I didn't check.

The other runtime tests are failing as expected https://dev.azure.com/dnceng-public/public/_build/results?buildId=620112&view=ms.vss-test-web.build-test-results-tab .

Thanks for re-running the check - I also suspect this might be related to RequiresProcessIsolation; there's not much else that is different in these tests. We'll see if that's the case soon enough...

@MichalStrehovsky
Copy link
Member Author

Can anyone sign off? It's not acceptable to have main in a state where we basically don't run nativeaot testing for a week.

This needs to be merged before #100331 can fix testing RequiresProcessIsolation tests.

The test failure is an unrelated GC issue.

@MichalStrehovsky MichalStrehovsky merged commit 782c2ee into dotnet:main Mar 27, 2024
@MichalStrehovsky
Copy link
Member Author

Thanks!

@MichalStrehovsky MichalStrehovsky deleted the rollb branch March 27, 2024 11:32
MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this pull request Apr 3, 2024
This is unfortunate because it was added for native AOT sake in dotnet#91415.

Then Tomas rewrote the test to execute corerun.exe in dotnet#91560 making it 100% unsupportable with native AOT.

What the test is doing doesn't look supportable with the merged runner infra and runs into the same issues I saw in dotnet#100269.

We never noticed this test is broken because tests requiring process isolation are completely busted with native AOT. I did a local run and started re-baselining again because this is not the only regression.
MichalStrehovsky added a commit that referenced this pull request Apr 8, 2024
This is unfortunate because it was added for native AOT sake in #91415.

Then Tomas rewrote the test to execute corerun.exe in #91560 making it 100% unsupportable with native AOT.

What the test is doing doesn't look supportable with the merged runner infra and runs into the same issues I saw in #100269.

We never noticed this test is broken because tests requiring process isolation are completely busted with native AOT. I did a local run and started re-baselining again because this is not the only regression.
@github-actions github-actions bot locked and limited conversation to collaborators Apr 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants