.Net: Feature: Telemetry Instrumentation for Sequential Planner, Plan Execution, and SKFuntion Execution#2152
.Net: Feature: Telemetry Instrumentation for Sequential Planner, Plan Execution, and SKFuntion Execution#2152GalRabin wants to merge 23 commits intomicrosoft:mainfrom
Conversation
1. Sequential Planner. 2. Plan execution. 3. SKFuntion execution.
dmytrostruk
left a comment
There was a problem hiding this comment.
@GalRabin thank you for contribution!
Please check the comments below.
| /// <summary> | ||
| /// The plan name, Mainly used for telemetry purposes | ||
| /// </summary> | ||
| string? PlanName { get; } |
There was a problem hiding this comment.
I'm not sure this property is necessary. I've already added SequentialPlanner tag to all logs, which are related to this type of planner, please check following PR: #2156.
If consumers want to use custom name, there are a couple of ways how to achieve that (I'm pretty sure there are even more ways):
- Wrap planner functionality on consumers side and add necessary logs to specify custom planner name.
- Assign dedicated logger to planner with appropriate logger Category Name.
Maybe you can provide some use-cases where it's helpful, so I will be able to better understand the scenario. Thanks!
There was a problem hiding this comment.
In case this property is necessary for your scenarios, I would like to understand for what entity you want to set custom name. If it's a Planner, then I would rename this property to PlannerName.
If it's a Plan, then maybe it's better to add custom name as parameter of CreatePlanAsync method. Because as soon as you apply PlanName to Planner class, the same plan name will be used for all generated plans, even with different goals, so I'm not sure it makes sense. Again, your use-case would be very helpful here.
There was a problem hiding this comment.
@dmytrostruk
I commented on the PR and not here, It reference the all PR, What do you think about it ?
There was a problem hiding this comment.
Ignoring this comment, I removed PlanName usage.
|
|
||
| this._logger.LogInformation("Plan creation started."); | ||
| this.s_executionTotalCounter.Add(1); | ||
| this._logger.LogInformation("Planner (Plan: {PlanName}) creation started", this.PlanName); |
There was a problem hiding this comment.
Since PlanName is nullable, there is a chance when this message can be unfinished: Planner (Plan: ) creation started. I also added similar changes in this PR:
Maybe we can align it in some way, please let me know what you think.
Same applies to all other places where
PlanName property is used.
There was a problem hiding this comment.
Ignoring this comment, I removed PlanName usage. However, I adopted my changes to your PR, It's better that way.
|
|
||
| this.s_executionSuccessCounter.Add(1); | ||
| this.s_executionTimeHistogram.Record(stopwatch.ElapsedMilliseconds); | ||
| this._logger.LogInformation("Plan {Name} creation finished in {ExecutionTime}ms", this.PlanName, stopwatch.ElapsedMilliseconds); |
There was a problem hiding this comment.
If you used PlanName as property key in previous logs, it would be better to use the same key in all logs:
| this._logger.LogInformation("Plan {Name} creation finished in {ExecutionTime}ms", this.PlanName, stopwatch.ElapsedMilliseconds); | |
| this._logger.LogInformation("Plan {PlanName} creation finished in {ExecutionTime}ms", this.PlanName, stopwatch.ElapsedMilliseconds); |
There was a problem hiding this comment.
I removed the Plan Name usage, we will handle it outside. So, I am ignoring it.
| this._planner = planner; | ||
| this._logger = logger ?? NullLogger.Instance; | ||
|
|
||
| string telemetryPrefixName = this.PlanName ?? this.GetType().Name; |
There was a problem hiding this comment.
The format for metric name is SK.SequentialPlanner.{0}.ExecutionTime.
In case PlanName is null here, the value of telemetryPrefixName will be InstrumentedSequentialPlanner.
In this case metric name will be SK.SequentialPlanner.InstrumentedSequentialPlanner.ExecutionTime, which is not preferred naming.
Previous metric name SK.SequentialPlanner.CreatePlan.ExecutionTime should work better, as it covers the template SK.{ClassName}.{MethodName}.{MetricCategory}. This template should cover all places in the code, and same template format will allow to quickly navigate through big list of metrics.
There was a problem hiding this comment.
Done I changed it to the following pattern SK.SequentialPlanner.CreatePlan.{MetricCategory}
| private const string executionTimeMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTime"; | ||
| private const string executionTotalMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTotal"; | ||
| private const string executionCountFailureMetricFormat = "SK.SequentialPlanner.{0}.ExecutionFailure"; | ||
| private const string executionSuccessMetricFormat = "SK.SequentialPlanner.{0}.ExecutionSuccess"; |
There was a problem hiding this comment.
I think we use PascalCase for constant names:
| private const string executionTimeMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTime"; | |
| private const string executionTotalMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTotal"; | |
| private const string executionCountFailureMetricFormat = "SK.SequentialPlanner.{0}.ExecutionFailure"; | |
| private const string executionSuccessMetricFormat = "SK.SequentialPlanner.{0}.ExecutionSuccess"; | |
| private const string ExecutionTimeMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTime"; | |
| private const string ExecutionTotalMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTotal"; | |
| private const string ExecutionCountFailureMetricFormat = "SK.SequentialPlanner.{0}.ExecutionFailure"; | |
| private const string ExecutionSuccessMetricFormat = "SK.SequentialPlanner.{0}.ExecutionSuccess"; |
There was a problem hiding this comment.
Thanks, However I changed it to the static prop naming.
| string promptTemplate = prompt ?? EmbeddedResource.Read("skprompt.txt"); | ||
|
|
||
| this._functionFlowFunction = kernel.CreateSemanticFunction( | ||
| functionName: this.Config.PlanName?.Replace(" ", ""), |
There was a problem hiding this comment.
I think this whitespace replacement should be performed on consumer's side, since it will be hard to cover all cases to handle function name rules:
There was a problem hiding this comment.
I removed the plan name, So now it's not relevant anymore.
|
|
||
| #endregion ISKFunction implementation | ||
|
|
||
| #pragma warning disable CS8618 // Non-nullable field must contain a non-null value when exiting constructor. Consider declaring as nullable. |
There was a problem hiding this comment.
We should implement this code in a way which will avoid non-nullable warning suppressions.
There was a problem hiding this comment.
Done, by changing it back the meters to static properties.
| /// 1. Meters. | ||
| /// 2. Activity Source. | ||
| /// </summary> | ||
| private void ConfigureTelemetry() |
There was a problem hiding this comment.
As described above, I think all metering tools should be implemented as static and initialized in place of definition. Otherwise, this method forces us to call it every time when we define new constructor, which shouldn't be the case.
There was a problem hiding this comment.
Changed it back to be static, Without plan name. So, I removed this section.
| return stepVariables; | ||
| } | ||
|
|
||
| private async Task<SKContext> InstrumentedInvokeAsync( |
There was a problem hiding this comment.
The purpose of this method was to move all telemetry-related logic to separate place to avoid mixing telemetry and business-related logic. It looks like this improvement is reverted now. Is there any reason for that?
There was a problem hiding this comment.
This method works great when executing plan, however what telemetry I will have if I execute standalone function? we have such use case.
However, Now I also separated the BL from telemetry.
| using var activity = s_activitySource.StartActivity($"{this.SkillName}.{this.Name}"); | ||
| this.s_executionTotalCounter.Add(1); | ||
| context.Log.LogInformation("{SkillName}.{StepName}: Function execution started", this.SkillName, this.Name); | ||
| var stopwatch = new Stopwatch(); | ||
| stopwatch.Start(); |
There was a problem hiding this comment.
I think telemetry-related logic should be moved to separate place to keep InvokeAsync method clean and readable.
There was a problem hiding this comment.
Agree, I moved it both in Plan and SKFunction to different method called InvokeAsyncInstrumented as you did before.
|
I am going to address the use case that I feel that I missed. User Story:As a developer designing web services that encompass various plans, my goal is to view the metrics for each plan, function, planner integrated into the service. This will empower both the product owner and the developer to make informed decisions regarding potential enhancements and improvements to the web service. Example:Consider a scenario where we have three different planners in the same service:
Suppose these planners utilize the following skills and functions:
Moreover, each plan use all those functions in every inoke. We expect to obtain:
Concerning your comment about creating a meters on static variables, I need to know the plan's name to fulfill the requirements, which is why I am not using static properties. |
| this.Name = functionName; | ||
| this.SkillName = skillName; | ||
| this.Description = description; | ||
| this.ExecutionTimeHistogram = s_meter.CreateHistogram<double>( |
There was a problem hiding this comment.
@dmytrostruk Here I think it's important to include the skill and function, So I think we need to initialize it here, what do you think?
### Motivation and Context <!-- Thank you for your contribution to the semantic-kernel repo! Please help reviewers and future users, providing the following information: 1. Why is this change required? 2. What problem does it solve? 3. What scenario does it contribute to? 4. If it fixes an open issue, please link to the issue here. --> Based on PR: #2152. Should fix: #2151 This PR contains changes to move telemetry logic from `Plan` class to separate class-decorator and add telemetry for `SKFunction` using decorator pattern. ### Description <!-- Describe your changes, the overall approach, the underlying design. These notes will help understanding how your code works. Thanks! --> 1. Added interface `IPlan` which inherits from `ISKFunction` to decouple `Plan` and `SKFunction` instrumentation and future decorations. 2. Implemented `InstrumentedPlan` with telemetry for `InvokeAsync` methods. 3. Implemented `InstrumentedSKFunction` with telemetry for `InvokeAsync` methods. 4. Added extension methods to create instrumented instances of plans and functions. 5. Added additional counters for Plan/Function to measure number of total, successful and failed Plan/Function executions. ### Contribution Checklist <!-- Before submitting this PR, please make sure: --> - [x] The code builds clean without any errors or warnings - [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations - [x] All unit tests pass, and I have added new tests where possible - [x] I didn't break anyone 😄
Issue - #2151
FYI - @dmytrostruk (As we spoked)
Motivation and Context
This change is required to enhance the semantic-kernel repository by adding telemetry instrumentation for three critical components: Sequential Planner, Plan Execution, and SKFuntion Execution. The lack of telemetry tracking for these components hinders our ability to monitor their performance and identify potential issues effectively. This PR aims to address this limitation and contribute to a more robust and performant system.
Description
This PR introduces telemetry instrumentation for the Sequential Planner, Plan Execution, and SKFuntion Execution components within the semantic-kernel repository. The implementation follows best practices to ensure accurate data collection without compromising data privacy. The telemetry data will include essential metrics such as execution time, resource usage, errors, and usage frequency.
Contribution Checklist
Before submitting this PR, the following checklist has been completed:
Additional Notes
This PR will significantly contribute to the overall system performance, as it will provide valuable insights into the behavior of the Sequential Planner, Plan Execution, and SKFuntion Execution components. I kindly request the team's review and feedback to proceed with merging this feature into the main branch. Thank you for your attention to this improvement.