Skip to content

Log assembly loads during task run#8316

Merged
JaynieBai merged 23 commits intodotnet:mainfrom
JanKrivanek:proto/assembly-loads
Feb 24, 2023
Merged

Log assembly loads during task run#8316
JaynieBai merged 23 commits intodotnet:mainfrom
JanKrivanek:proto/assembly-loads

Conversation

@JanKrivanek
Copy link
Copy Markdown
Member

@JanKrivanek JanKrivanek commented Jan 18, 2023

Fixes #6303
Related: KirillOsenkov/MSBuildStructuredLog#653

Context

  • Assembly loading logging for:
    • Task runs - including loads in separate app domain for AppDomainIsolatedTask
    • Except for tasks runs in OutOfProcessRaskHostNode - as neither LoggingService nor LoggingContext are available there - not sure if not supporting those is a problem.
    • Sdk resolving (explicitly skipping resolvers defined in Microsoft.Build assembly)
    • Loggers initialization (explicitly skipping resolvers defined in Microsoft.Build assembly orwithin Microsoft.Build.Logging namespace) - Note - loggers initialiation reordered - Binlog is the first one - as assembly load events are emited at the time of each logger initialization, currently LoggerService dispatches events immediately after construction, so as a result order of logger definitions matter (only the earlier initialized gets messages about latter ones initialization assembly loads). Alternative would be to change the LoggerService contract to start dispatching on explicit request (easy to implement and preserving current MSBuild behavior - however might be potentially breaking for API users?) or to cache those events and log them later (doable, but convoluted - probably not worth the efforts now)
    • Evaluation

Samples

  • task run:

    Assembly loaded during TaskRun: System.Diagnostics.Debug, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a (location: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Diagnostics.Debug\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Diagnostics.Debug.dll, MVID: bc6b825d-5f99-464e-a06d-e3ae4b860a34, AppDomain: [Default])
    
  • evaluation:

    Assembly loaded during Evaluation: System.Collections.Immutable, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a (location: C:\Program Files\dotnet\shared\Microsoft.NETCore.App\7.0.2\System.Collections.Immutable.dll, MVID: 5a4c54a3-2092-428e-89cc-a391fd9d398a, AppDomain: [Default])
    
  • logger initialization:

    Assembly loaded during LoggerInitialization (MyLoggers.BasicFileLogger): System.Reflection.Metadata, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a (location: C:\Program Files\dotnet\shared\Microsoft.NETCore.App\7.0.2\System.Reflection.Metadata.dll, MVID: c602a8ce-3890-40fc-aa9b-fa3c23f81aab, AppDomain: [Default])
    
  • custom sdk resolver (no sample - similar to logger initialization)

Perf impact

No observable impact.

  • MSBUILDLOGALLASSEMBLYLOADS is used to turn on loggin of all assembly loads
  • build command: msbuild.exe <proj> /nodeReuse:false /bl
Scenario Mean Duration
Orchard - MSBuild main 00:00:59.25
Orchard - curr branch 00:00:58.30
Orchard - curr branch, all loads 00:00:58.30
Console - MSBuild main 00:00:01.08
Console - curr branch 00:00:01.09
Console - curr branch, all loads 00:00:01.10

@JanKrivanek JanKrivanek changed the title Proto/assembly loads Log assembly loads during task run Jan 18, 2023
@JanKrivanek
Copy link
Copy Markdown
Member Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@JanKrivanek JanKrivanek marked this pull request as ready for review January 25, 2023 09:55
@KirillOsenkov
Copy link
Copy Markdown
Member

Don't forget to increment the file format version:
https://source.dot.net/#Microsoft.Build/Logging/BinaryLogger/BinaryLogger.cs,7791c3b6dbbeedbf,references

I'd say don't worry about the out of proc task node - it's a black box to us anyway, let's consider it a "tool" that we don't control.

Make sure AppDomain stuff works on Core (Core doesn't have AppDomains) or gracefully degrades.

I'm torn on whether we should include the built-in assembly loads. On one hand, we want to know where our own stuff is loaded from. On the other hand, we can already infer since we know the path to MSBuild.exe, and maybe not worth having the same redundant messages at the beginning of every binlog.

I'd say let's record a couple of medium-sized real-world binlogs when the PR is ready, add viewer support, look at the logs in the viewer and then tweak to achieve the desired behavior.

Also remember that we should submit the binlog viewer support for the new file format version first, test it with the test binlogs, I will publish the new viewer version, then we merge this PR. So that by the time early adopters start recording these binlogs the viewer already supports them.

See a sample PR by @edvilme that incremented the binlog version:
#8146

and the corresponding viewer PR:
KirillOsenkov/MSBuildStructuredLog#644

@KirillOsenkov
Copy link
Copy Markdown
Member

Forgot to say, the general direction looks good to me!

Copy link
Copy Markdown
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my big question is whether there's any noticeable perf cost. If there is, is it worth the additional information in general, or should we build an opt-in?

We shouldn't be loading assemblies all that much, so I'm hopeful we can just have it on . . .


if (e.BuildEventContext == null && e is AssemblyLoadBuildEventArgs)
{
return;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not log this in the text log, too? It's generally less helpful than binlogs but still has some place, and is (sadly) still popular.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr;: I believe it's out of scope currently.

It logs with exception of assembly loading for custom loggers. Those do not have any BuildContext that ParallelConsoleLogger relies upon quite a lot. It would take some refactoring of ParallelConsoleLogger, which is definitely doable but feels beyond scope of this change.
If it would be missed - it can definitely be added later on.
Thoughts?

namespace Microsoft.Build.BackEnd.Components.RequestBuilder
{
internal class AssemblyLoadsTracker : MarshalByRefObject, IDisposable
{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this type run on both core and framework? Heavy use of AppDomain makse me think framework but not seeing #if to remove it from core builds.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's core and framework tested.
That being said - the dictionary with app domains is really needed just on FW - so putting this behing #ifdef
Rest I believe needs to stay as is - but it might be my lack of knowledge - for tracking assembly loads I need AppDomain (even if it's just the default one on core) to subscribe - correct?

{
s_instances.AddOrUpdate(appDomain, tracker, (_, loadsTracker) => loadsTracker);
}
tracker.StartTracking();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: consider the case where the same AppDomain is registered twice (the crux of my comment above). This code actually produces a subtle race condition. There is a period of time inside the call to StartTracking that assembly loads are not monitored and that means we could end up missing an entry in the log.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This convo was marked as resolved but I can't see how. The race condition still exists.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reiterating this one!
Since the StartTracking instance method was private and was invoked only after construction - there was no chance for double subscription case - so the unsubscribe-subscribe was superfluous and I removed it.
But it's a good point for future cases as well

/// </summary>
private async Task<WorkUnitResult> InitializeAndExecuteTask(TaskLoggingContext taskLoggingContext, ItemBucket bucket, IDictionary<string, string> taskIdentityParameters, TaskHost taskHost, TaskExecutionMode howToExecuteTask)
{
using var assemblyLoadsTracker = AssemblyLoadsTracker.StartTracking(taskLoggingContext, AssemblyLoadingContext.TaskRun);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not be specific about AppDomain.CurrentDomain here to make it explicit it's part of the tracking?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have adjusted the code not to store and transfer AppDomain name and id for default app domain (so never on Core and only for explict app domains on Framework).
So I'd probably still prefer to keep the majority use case simpler (being able to specify the AppDomain explicitly but not being forced to).
Thoughts?

@JanKrivanek
Copy link
Copy Markdown
Member Author

Coresponding Binlog viewer PR: KirillOsenkov/MSBuildStructuredLog#653
@KirillOsenkov FYI

var loggers = new List<ILogger>();

var outVerbosity = verbosity;
ProcessBinaryLogger(binaryLoggerParameters, loggers, ref outVerbosity);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reordered loggers initialization to have binlog on top - to ensure it receives all the events from initialiation of custom loggers.

@rainersigwald - is there any hidden risk associated to reordering the creation and initialiation of the loggers

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing springs to mind.

@Forgind
Copy link
Copy Markdown
Contributor

Forgind commented Feb 8, 2023

I'm wondering if we could simplify this. When do we expect to load assemblies? This PR seems to just say that when we load assemblies, we should log it but does that with broad strokes. Could we have something that targets the places where we load an assembly and log that an assembly was loaded at those places? So for tasks, for instance, you could put things here and here.

@rainersigwald
Copy link
Copy Markdown
Member

@Forgind the .NET assembly loader(s) load assemblies just-in-time to provide definitions for methods that can be called in a method as it is JITted before its first run. Because of that, we cannot control when assemblies will be loaded beyond the plugin entrypoints we explicitly control.

I originally was thinking of logging only the resolved paths of plugin entrypoints, but this approach is more comprehensive and enables debugging collisions between transitive dependencies of plugins, so it seems superior to me.

@JanKrivanek
Copy link
Copy Markdown
Member Author

This change is mostly about runtime induced (implicit) loading.
The explicit type loading is actualy not loaded.

For this reason (closer explained by @rainersigwald) we unfortunately need to mount observing for the entire execution time of the component in question.
For tasks and for evaluation this is fortunately possible to perform in properly constrained code section (and execution time).

For loggers and sdkresolvers, the typeloader should actually be the additional point in code hooked (I choosed not to hook, as for those two components we hopefully should know where we are loading from and this way I do not need to complicately distinguish which type loading is of interest and which to fileter out).

Additionaly for the loggers the code obvserves only the initialization section - in hopes all the assemblies will be touched there. But I can imagine esoteric loggers that will use additional assemblies only during handling of specific events or even shutdown - this is intentionally not supported (as it opens a can of worm of distniguishing what caused assembly load at specific point of time - e.g. when custom task is logging to custom logger).

tl;dr;: please let me know if you think the assembly loads caused by loading the actual types implementing the custom loggers or sdkresolvers should be logged as well (in addition to the current hooked usages of the components.)

@rainersigwald
Copy link
Copy Markdown
Member

tl;dr;: please let me know if you think the assembly loads caused by loading the actual types implementing the custom loggers or sdkresolvers should be logged as well (in addition to the current hooked usages of the components.)

I think this is important. One way things can go wrong is a build that has references to two versions of a build task with the same identity. If they're loaded in separate worker nodes they can both work, but when loaded in the same process the first one "wins" the load and might cause behavior differences.

@JanKrivanek
Copy link
Copy Markdown
Member Author

I think this is important. One way things can go wrong is a build that has references to two versions of a build task with the same identity. If they're loaded in separate worker nodes they can both work, but when loaded in the same process the first one "wins" the load and might cause behavior differences.

The task case is hooked withing InitializeAndExecuteTask - which is eventually calling into the TypeLoader - so the assembly information should be properly loged in this case. I'll verify with explicit custom task.

The TypeLoader loading is not hooked for logger and sdk resolver (the logging service is not yet available there) - though this can definitely be solved somehow (deffered log message?) if it feels as importnat usecase.

One another question that pops-up during testing this: can we somehow distinguish builtin tasks? (while possibly special case keep handling some that are used as surrogate for custom tasks - like RoslynCodeTaskFactory) Is there anything that can be used to filter those out?

image

@Forgind
Copy link
Copy Markdown
Contributor

Forgind commented Feb 8, 2023

Can we just check which assembly it's from? If it's in a fixed list, then it's us. I think we have a "fixed" list somewhere in MSBuildLocator

@rainersigwald
Copy link
Copy Markdown
Member

For tasks I think it's literally just Microsoft.Build.Tasks.Core.dll. Loggers we have in Microsoft.Build.dll though.

@JanKrivanek
Copy link
Copy Markdown
Member Author

I'm slowly peeling the onion here :-)
Some of the requests I can filter out by the initiating type being defined in the same assembly as is the executing one (Microsoft.Build.dl), others by being defined within namespace or an assembly starting with Microsoft.Build.
This is still not perfect - considerable amount of tasks are defined in SDK - so I'm experimenting with excluding them via string prefix (Microsoft.NET.Build) or probably better, by trying to type relate them to the loaded sdks.
Lastly some requests for monitoring we want to likely accept even if comming from internal types (e.g. the RoslynCodeTaskFactory).

Currently I'm waist down in the code and experimentation - trying to have it ready for PR review meeting

{
s_instances.AddOrUpdate(appDomain, tracker, (_, loadsTracker) => loadsTracker);
}
tracker.StartTracking();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This convo was marked as resolved but I can't see how. The race condition still exists.

@JanKrivanek JanKrivanek requested a review from jaredpar February 12, 2023 17:25
// Combine SDK path with the "project" relative path
try
{
// Is the sdk resolver a custom type?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't SDK resolvers in different assemblies? We have our built-in one, but NuGet owns theirs, C++ owns theirs, etc.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point - I believe that if it's not built-in, than the default tracking has a sense, as it might not be obvious from where it was loaded. (similarly to this comment about roslyn tasks: #8316 (comment)).
What do you think?

Btw. this code was accidentaly performin the filtering twice - I removed the superfluous call wrapping the StartTracking, as unified (and bit more comprehensive) filtering is performed within StartTracking.
It filters as well based on the typename - https://github.com/dotnet/msbuild/pull/8316/files/f6c696399c13667bc7742a8e94c0cfc3c79e46aa#diff-7f217b4ccb76592a2ab78dd7f8332509dfa2e2216d8a6032f956cbdf48df8d03R86-R96

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d rather err on the side of logging known assemblies than filtering something out too aggressively.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a opt-in trait (via MSBUILDLOGALLASSEMBLYLOADS env var) to stop any filtering.
So if for any reason any component of interest would not be hooked (e.g. due to being placed to 'Microsoft.Build' namespace), it can allways be opted-in.

@JanKrivanek JanKrivanek requested a review from Forgind February 14, 2023 09:26
@JanKrivanek JanKrivanek added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Feb 23, 2023
@JaynieBai JaynieBai merged commit 6ce195f into dotnet:main Feb 24, 2023
@KirillOsenkov
Copy link
Copy Markdown
Member

I've published the viewer version that supports this, would be nice to test that it's able to open binlogs produced with this change.

@JanKrivanek JanKrivanek deleted the proto/assembly-loads branch February 27, 2023 12:49
@JanKrivanek
Copy link
Copy Markdown
Member Author

JanKrivanek commented Feb 27, 2023

@KirillOsenkov I confirm I can open my test logs (new console, orchard, etc.) with the latest official version 👍

I have an issue with some logs from previous version - I'm on it now.
All my old (without the assembly load logging feature) and new (with the assembly load logging feature) binlogs can be opened and reviewed without issues with the newest official version (2.1.784) of binlog viewer. Issue I was seeing was only for binlog produced by the in-progress version of msbuild. Appology for confusion here

Btw. a dumb question - Were there any thoughts to make the auto-update experience more friendly or understandable? I observed that the first open attempt led to error due to an old version, without simple info about update option/process. The Help->About menu presented a clear message about a new version and need to restart the app. Can this be perceived as confusing?

@KirillOsenkov
Copy link
Copy Markdown
Member

Yeah we can certainly improve this. One of these days!

@JanKrivanek
Copy link
Copy Markdown
Member Author

Yeah we can certainly improve this. One of these days!

Since I complained :-) - let me try to propose a fix ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider logging assembly loads into the binlog

6 participants