Description
Request
Add diagnostic IPC command that allows prepending a path to the startup hooks (e.g. what's parsed from the DOTNET_STARTUP_HOOKS environment variable by the host) in the host during diagnostic startup suspension. This would allow dotnet-monitor to better participate in startup diagnostics scenarios via managed code without requiring our customers to meticulously specify the DOTNET_STARTUP_HOOKS environment variable.
Ideally, the tool is able to invoke something to specify the path to the startup hook by some mechanism and not leave it up to the user to figure out what that path is, set it correctly, and make sure that the library is available at the correct time.
Suggested commands would be:
AddStartupHook: Adds a startup hook path (preferably prepends) to the existing list of startup hooks in the host.
OR:
GetStartupHooks: Gets the current list of startup hooks from the host.
SetStartupHooks: Sets the list of startup hooks in the host.
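To make the intended "prepend" semantics concrete, here is a minimal C++ sketch. It assumes the host keeps the hook list in the same separator-delimited form as the DOTNET_STARTUP_HOOKS environment variable (';' on Windows, ':' elsewhere). The function name prepend_startup_hook is hypothetical and is not part of any existing host or diagnostic IPC API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the startup hook list with `hook` placed first.
// `separator` is the platform path-list separator (';' on Windows, ':' elsewhere),
// which is how DOTNET_STARTUP_HOOKS delimits its entries.
std::string prepend_startup_hook(const std::string& existing,
                                 const std::string& hook,
                                 char separator) {
    assert(!hook.empty() && "a startup hook path must not be empty");
    if (existing.empty()) {
        return hook;  // No hooks configured yet: the new hook is the whole list.
    }
    // Existing hooks keep their relative order after the new one.
    return hook + separator + existing;
}
```

Prepending rather than appending would let the diagnostic hook run before any hooks the customer has already configured, which matters when the goal is to observe the process from its earliest managed code.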
Background
The .NET Monitor team is looking for a mechanism by which the dotnet-monitor tool can dynamically specify a startup hook so that the tool can participate in startup diagnostics for aspects of the runtime that do not have native diagnostic scenarios. For example, the tool wants to collect exceptions from the beginning of the process to aid in diagnosing startup failures.
The startup hook notion is well-positioned to allow integration with applications before their entrypoints are executed. This feature uses the DOTNET_STARTUP_HOOKS environment variable, set at process start, to determine the list of startup hook assemblies that should be loaded and executed. The host loads the environment variable value and makes it available as part of the AppContext data. When the main thread is initialized, StartupHookProvider.ProcessStartupHooks consumes the value and executes each of the specified startup hooks.
This mechanism works well if the environment in which the app is executing is managed and prepopulated with the startup hook assemblies in well-known locations. The DOTNET_STARTUP_HOOKS environment variable must contain assembly paths that exist on disk at the time of execution, or assembly names that the probing algorithm for the default load context can locate. If one of the paths does not exist, it will take down the process before it starts executing the application. This largely requires that either the managed environment knows exactly where the assemblies are so it can set the environment variable, or a person manually configures their deployment to ensure that the assembly paths are correct; both need to ensure that the files are available at execution time.
Details
Here's where it gets tricky for .NET Monitor:
- The dotnet-monitor tool bundles its startup hook assembly with itself for easy acquisition.
- The dotnet-monitor tool is largely acquirable through two means: .NET Tool install or a Docker image
  - .NET Tool Installation
    - If a customer installs it via dotnet tool install, then the path to the startup hook looks something like C:\Users\<user>\.dotnet\tools\.store\dotnet-monitor\8.0.0-preview.2.23155.4\dotnet-monitor\8.0.0-preview.2.23155.4\tools\net6.0\any\shared\win-x64\any\Microsoft.Diagnostics.Monitoring.StartupHook.dll.
    - This path will change between updates because the version number of dotnet-monitor is included (twice). Anyone who installs the latest version of the tool will likely not be able to accurately and repeatedly predict what this path is, nor would automation know what version of dotnet-monitor is installed (without executing it) in order to construct the path. This represents an easy failure mode if this path is used for a startup hook specification.
  - Docker image
    - The image only contains the dotnet-monitor tool. It is completely separated from any other application, as those will be running in their own containers. Thus, any file that must be loaded into the target application cannot be loaded directly from the dotnet-monitor container.
- To mitigate both the inaccessibility of the assembly and the lack of predictability of the assembly path, we've added a configuration section to dotnet-monitor that allows copying the startup hook (among other assemblies for other features) to a well-known root location.
  - The user sets the configuration property to some well-known path, e.g. /diag
    - This path needs to be accessible by both dotnet-monitor and the target application.
    - In Docker and Kubernetes, this is done by mounting an empty volume at this location in both containers.
  - The tool will then copy its startup hook and other libraries under a version folder under this path, e.g. /diag/8.0.0-preview.2.23155.4/
    - This version folder guards against two cases: more than one dotnet-monitor instance running at different versions, and older copies of the shared location not having been removed before new instances of dotnet-monitor or the target application start. Basically, it avoids file locking in these scenarios.
  - The startup hook would be located at /diag/8.0.0-preview.2.23155.4/shared/any/Microsoft.Diagnostics.Monitoring.StartupHook.dll. The full path under the specified configuration property value (e.g. /diag) is an implementation detail that the user shouldn't have to worry about, and we'd like to reserve the ability to change it (e.g. reorganizing the file structure under the path).
    - This path is more (but not wholly) predictable and much more accessible, since the root of it is specified by the user via configuration and it should be on a shared mounted volume.
    - The path is still prone to mutation due to updates of the dotnet-monitor tool and the sub path being an implementation detail, thus setting this statically in a deployment will be error prone.
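As a rough illustration of why a statically configured path is fragile, the following C++ sketch composes the shared-location path from the layout described above. The function name shared_startup_hook_path is hypothetical, and the "<root>/<version>/shared/any/" layout is taken from the example in this issue; it is an implementation detail that may change.

```cpp
#include <string>

// Hypothetical helper mirroring the layout described above:
//   <shared root>/<dotnet-monitor version>/shared/any/<startup hook assembly>
// The layout is an implementation detail of dotnet-monitor and may change.
std::string shared_startup_hook_path(const std::string& shared_root,
                                     const std::string& monitor_version) {
    const std::string hook_file =
        "Microsoft.Diagnostics.Monitoring.StartupHook.dll";
    std::string path = shared_root;
    // Ensure exactly one separator between the root and the version folder.
    if (!path.empty() && path.back() != '/') {
        path += '/';
    }
    return path + monitor_version + "/shared/any/" + hook_file;
}
```

Any change to the tool version (or to the sub-path scheme) yields a different result, so a value computed once and baked into a deployment's DOTNET_STARTUP_HOOKS can silently go stale.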
Here's where it gets tricky for our customers:
- To reiterate, customers have to specify the path to dotnet-monitor's startup hook (which may be different depending on the installation and deployment methodologies) in the DOTNET_STARTUP_HOOKS environment variable, which can be prone to error and cause their applications to not start if configured incorrectly.
- Some customers have long deployment cycles (multi-month infrastructure rollouts) in which their managed offerings are not allowed to update environment variables except at those rollout times. For those customers, trying dotnet-monitor features that require configuring environment variables from the managed offering is largely a no-go.
- Some customers are not comfortable with doing a diagnostic runtime suspension at the beginning of their application launches, fearing that if dotnet-monitor doesn't start up or doesn't respond appropriately, then their applications are forever waiting before executing any application code. This won't be solved with startup hooks, so we'll have to use hosting APIs to load the startup hook assembly.
Alternate Solutions
We could add a new mode to dotnet-monitor that allows copying of the shared assemblies ahead of starting the target application and the normal operation of dotnet-monitor. In Kubernetes, this would execute as an init container (which runs before the application containers); this would copy the assemblies to the prescribed path (which should be a mounted volume shared by the application container and the dotnet-monitor container) without any version sub-pathing. With this solution, a user or managed environment can then prescribe the shared path to the init container, the application container, and the dotnet-monitor container. The user would configure an init container to effectively do dotnet-monitor stage-libraries --path /diag to put the libraries on the volume and set DOTNET_STARTUP_HOOKS=/diag/shared/any/Microsoft.Diagnostics.Monitoring.StartupHook.dll on the target process. This reduces the variability of the path a little and ensures that the files exist at the expected location, but (1) still puts the onus on the customer to understand the specific path scheme for dotnet-monitor and (2) causes application startup to be delayed due to the existence of the init container.
Other Investigations
- Use COM hosting APIs to manually load startup hook via ICorProfilerCallback implementation
  - The dotnet-monitor tool has a profiler implementation that could possibly be used to bootstrap the startup hook assembly.
    - Try to load and execute in ICorProfilerCallback::Initialize, but ICLRRuntimeHost::ExecuteInDefaultAppDomain will always return HOST_E_CLRNOTAVAILABLE because the EE is not running yet.
    - Spin off a thread and repeatedly try the above until we get S_OK back. This works, but it races with whatever is starting on the main thread of the application. We will likely miss some part of the managed code. As far as I can tell, there's no way for us to hold up the main thread without doing something like ReJIT + IL rewriting (at that point you may as well just insert a method call in the Main method that loads the startup hook!).
    - Attempt to block the first module load (and let all others go through) and then execute ICLRRuntimeHost::ExecuteInDefaultAppDomain; no surprise that it deadlocks because the loader lock is held.
    - Attempt to block the first thread creation (and let all others go through) and then execute ICLRRuntimeHost::ExecuteInDefaultAppDomain; no deadlock here, but it always returns HOST_E_CLRNOTAVAILABLE.
- Use C-style hosting APIs to manually load startup hook via ICorProfilerCallback
  - Trying to get any C-style hosting API via hostfxr_get_runtime_delegate will fail with InvalidArgFailure (0x80008081) without specifying a hostfxr_handle; we aren't the host and didn't start the runtime, so we don't have the only hostfxr_handle instance. This is concerning given that our ICorProfilerCallback doesn't have access to the hostfxr_handle in order to use these hosting APIs, and given the desire to remove GetCLRRuntimeHost.
  - Try to mutate (via hostfxr_get_runtime_property_value / hostfxr_set_runtime_property_value) or read (via hostfxr_get_runtime_properties) runtime properties:
    - While these take a hostfxr_handle, it seems not to be required; good thing, because we don't have access to the only one that is running.
    - This actually crashes the process if called from ICorProfilerCallback::Initialize (while debugging with Visual Studio):
KernelBase.dll!RaiseException() Unknown
hostpolicy.dll!_CxxThrowException(void * pExceptionObject=0x000000e3a1f7dc70, const _s__ThrowInfo * pThrowInfo) Line 75 C++
hostpolicy.dll!std::_Throw_Cpp_error(int code) Line 35 C++
hostpolicy.dll!std::_Throw_C_error(int code) Line 45 C++
[Inline Frame] hostpolicy.dll!std::_Check_C_return(int _Res) Line 131 C++
[Inline Frame] hostpolicy.dll!std::_Mutex_base::lock() Line 50 C++
[Inline Frame] hostpolicy.dll!std::lock_guard<std::mutex>::{ctor}(std::mutex &) Line 427 C++
hostpolicy.dll!`anonymous namespace'::get_hostpolicy_context(bool require_runtime) Line 152 C++
hostpolicy.dll!corehost_initialize(const corehost_initialize_request_t * init_request=0x0000000000000000, unsigned int options, corehost_context_contract * context_contract=0x000000e3a1f7df40) Line 756 C++
hostfxr.dll!fx_muxer_t::get_active_host_context() Line 954 C++
hostfxr.dll!hostfxr_get_runtime_properties(void * const host_context_handle, unsigned __int64 * count=0x000000e3a1f7e008, const wchar_t * * keys=0x0000000000000000, const wchar_t * * values=0x0000000000000000) Line 797 C++
- Using DOTNET_SHARED_STORE / DOTNET_ADDITIONAL_DEPS and using only the assembly name in the DOTNET_STARTUP_HOOKS
  - Startup hooks also allow just the assembly name, but it is then up to the default load context to locate it using probing and environment variables.
  - This would still require the customer to specify an assembly name (e.g. Microsoft.Diagnostics.Monitoring.StartupHook) that isn't really meaningful to them in the DOTNET_STARTUP_HOOKS environment variable.
  - These environment variables are read by the host, and the paths are typically validated to exist at that time as well. This prevents the tool from copying out the libraries during diagnostic runtime suspension and effectively requires the alternate solution above. These also come with additional complications in that they typically require the same assembly be available for every possible TFM that may load it; specific patch version folders may also be necessary depending on the <app>.runtimeconfig.json content for the application.
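For reference, the "spin off a thread and retry" approach from the COM hosting investigation above can be sketched roughly as follows. This is only an illustration under stated assumptions: try_execute is a hypothetical stand-in for a call such as ICLRRuntimeHost::ExecuteInDefaultAppDomain, hresult_t and retry_until_ok are names invented here, and the only HRESULT fact relied on is that S_OK is 0.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

using hresult_t = std::int32_t;   // Stand-in for HRESULT (assumption for this sketch).
constexpr hresult_t kOk = 0;      // S_OK is 0.

// Repeatedly invokes `try_execute` until it reports success or the attempt
// budget is exhausted. Returns the 1-based attempt number that succeeded,
// or 0 if every attempt failed.
int retry_until_ok(const std::function<hresult_t()>& try_execute,
                   int max_attempts,
                   std::chrono::milliseconds delay) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_execute() == kOk) {
            return attempt;
        }
        // The runtime is not ready yet (e.g. HOST_E_CLRNOTAVAILABLE); wait and retry.
        std::this_thread::sleep_for(delay);
    }
    return 0;
}
```

In the investigation this loop would run on a separate thread started from the profiler. Nothing in it holds up the application's main thread, so every failed attempt and every delay is time during which managed code may already be running unobserved, which is the race described above.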