Skip to content

.Net: Feature: Telemetry Instrumentation for Sequential Planner, Plan Execution, and SKFuntion Execution#2152

Closed
GalRabin wants to merge 23 commits intomicrosoft:mainfrom
GalRabin:issue-2151
Closed

.Net: Feature: Telemetry Instrumentation for Sequential Planner, Plan Execution, and SKFuntion Execution#2152
GalRabin wants to merge 23 commits intomicrosoft:mainfrom
GalRabin:issue-2151

Conversation

@GalRabin
Copy link

@GalRabin GalRabin commented Jul 24, 2023

Issue - #2151
FYI - @dmytrostruk (As we spoked)

Motivation and Context

This change is required to enhance the semantic-kernel repository by adding telemetry instrumentation for three critical components: Sequential Planner, Plan Execution, and SKFuntion Execution. The lack of telemetry tracking for these components hinders our ability to monitor their performance and identify potential issues effectively. This PR aims to address this limitation and contribute to a more robust and performant system.

Description

This PR introduces telemetry instrumentation for the Sequential Planner, Plan Execution, and SKFuntion Execution components within the semantic-kernel repository. The implementation follows best practices to ensure accurate data collection without compromising data privacy. The telemetry data will include essential metrics such as execution time, resource usage, errors, and usage frequency.

Contribution Checklist

Before submitting this PR, the following checklist has been completed:

  • The code builds clean without any errors or warnings.
  • The PR adheres to the SK Contribution Guidelines and passes the pre-submission formatting script without violations.
  • All unit tests pass, and new tests have been added where applicable.
  • Careful consideration has been given to prevent breaking existing functionality.

Additional Notes

This PR will significantly contribute to the overall system performance, as it will provide valuable insights into the behavior of the Sequential Planner, Plan Execution, and SKFuntion Execution components. I kindly request the team's review and feedback to proceed with merging this feature into the main branch. Thank you for your attention to this improvement.

1. Sequential Planner.
2. Plan execution.
3. SKFuntion execution.
@GalRabin GalRabin requested a review from a team as a code owner July 24, 2023 11:56
@shawncal shawncal added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel kernel.core labels Jul 24, 2023
@shawncal shawncal changed the title Feature: Telemetry Instrumentation for Sequential Planner, Plan Execution, and SKFuntion Execution .Net: Feature: Telemetry Instrumentation for Sequential Planner, Plan Execution, and SKFuntion Execution Jul 24, 2023
@dmytrostruk dmytrostruk self-assigned this Jul 24, 2023
Copy link
Member

@dmytrostruk dmytrostruk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GalRabin thank you for contribution!
Please check the comments below.

/// <summary>
/// The plan name, Mainly used for telemetry purposes
/// </summary>
string? PlanName { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this property is necessary. I've already added SequentialPlanner tag to all logs, which are related to this type of planner, please check following PR: #2156.

If consumers want to use custom name, there are a couple of ways how to achieve that (I'm pretty sure there are even more ways):

  • Wrap planner functionality on consumers side and add necessary logs to specify custom planner name.
  • Assign dedicated logger to planner with appropriate logger Category Name.

Maybe you can provide some use-cases where it's helpful, so I will be able to better understand the scenario. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case this property is necessary for your scenarios, I would like to understand for what entity you want to set custom name. If it's a Planner, then I would rename this property to PlannerName.
If it's a Plan, then maybe it's better to add custom name as parameter of CreatePlanAsync method. Because as soon as you apply PlanName to Planner class, the same plan name will be used for all generated plans, even with different goals, so I'm not sure it makes sense. Again, your use-case would be very helpful here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmytrostruk
I commented on the PR and not here, It reference the all PR, What do you think about it ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring this comment, I removed PlanName usage.


this._logger.LogInformation("Plan creation started.");
this.s_executionTotalCounter.Add(1);
this._logger.LogInformation("Planner (Plan: {PlanName}) creation started", this.PlanName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since PlanName is nullable, there is a chance when this message can be unfinished: Planner (Plan: ) creation started. I also added similar changes in this PR:

this._logger.LogInformation("{PlannerType}: Plan creation started.", PlannerType);

Maybe we can align it in some way, please let me know what you think.
Same applies to all other places where PlanName property is used.

Copy link
Author

@GalRabin GalRabin Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring this comment, I removed PlanName usage. However, I adopted my changes to your PR, It's better that way.


this.s_executionSuccessCounter.Add(1);
this.s_executionTimeHistogram.Record(stopwatch.ElapsedMilliseconds);
this._logger.LogInformation("Plan {Name} creation finished in {ExecutionTime}ms", this.PlanName, stopwatch.ElapsedMilliseconds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you used PlanName as property key in previous logs, it would be better to use the same key in all logs:

Suggested change
this._logger.LogInformation("Plan {Name} creation finished in {ExecutionTime}ms", this.PlanName, stopwatch.ElapsedMilliseconds);
this._logger.LogInformation("Plan {PlanName} creation finished in {ExecutionTime}ms", this.PlanName, stopwatch.ElapsedMilliseconds);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the Plan Name usage, we will handle it outside. So, I am ignoring it.

this._planner = planner;
this._logger = logger ?? NullLogger.Instance;

string telemetryPrefixName = this.PlanName ?? this.GetType().Name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format for metric name is SK.SequentialPlanner.{0}.ExecutionTime.
In case PlanName is null here, the value of telemetryPrefixName will be InstrumentedSequentialPlanner.
In this case metric name will be SK.SequentialPlanner.InstrumentedSequentialPlanner.ExecutionTime, which is not preferred naming.

Previous metric name SK.SequentialPlanner.CreatePlan.ExecutionTime should work better, as it covers the template SK.{ClassName}.{MethodName}.{MetricCategory}. This template should cover all places in the code, and same template format will allow to quickly navigate through big list of metrics.

Copy link
Author

@GalRabin GalRabin Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done I changed it to the following pattern SK.SequentialPlanner.CreatePlan.{MetricCategory}

Comment on lines +98 to +101
private const string executionTimeMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTime";
private const string executionTotalMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTotal";
private const string executionCountFailureMetricFormat = "SK.SequentialPlanner.{0}.ExecutionFailure";
private const string executionSuccessMetricFormat = "SK.SequentialPlanner.{0}.ExecutionSuccess";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use PascalCase for constant names:

Suggested change
private const string executionTimeMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTime";
private const string executionTotalMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTotal";
private const string executionCountFailureMetricFormat = "SK.SequentialPlanner.{0}.ExecutionFailure";
private const string executionSuccessMetricFormat = "SK.SequentialPlanner.{0}.ExecutionSuccess";
private const string ExecutionTimeMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTime";
private const string ExecutionTotalMetricFormat = "SK.SequentialPlanner.{0}.ExecutionTotal";
private const string ExecutionCountFailureMetricFormat = "SK.SequentialPlanner.{0}.ExecutionFailure";
private const string ExecutionSuccessMetricFormat = "SK.SequentialPlanner.{0}.ExecutionSuccess";

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, However I changed it to the static prop naming.

string promptTemplate = prompt ?? EmbeddedResource.Read("skprompt.txt");

this._functionFlowFunction = kernel.CreateSemanticFunction(
functionName: this.Config.PlanName?.Replace(" ", ""),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whitespace replacement should be performed on consumer's side, since it will be hard to cover all cases to handle function name rules:

private static readonly Regex s_asciiLettersDigitsUnderscoresRegex = new("^[0-9A-Za-z_]*$");

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the plan name, So now it's not relevant anymore.


#endregion ISKFunction implementation

#pragma warning disable CS8618 // Non-nullable field must contain a non-null value when exiting constructor. Consider declaring as nullable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should implement this code in a way which will avoid non-nullable warning suppressions.

Copy link
Author

@GalRabin GalRabin Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, by changing it back the meters to static properties.

/// 1. Meters.
/// 2. Activity Source.
/// </summary>
private void ConfigureTelemetry()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described above, I think all metering tools should be implemented as static and initialized in place of definition. Otherwise, this method forces us to call it every time when we define new constructor, which shouldn't be the case.

Copy link
Author

@GalRabin GalRabin Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it back to be static, Without plan name. So, I removed this section.

return stepVariables;
}

private async Task<SKContext> InstrumentedInvokeAsync(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this method was to move all telemetry-related logic to separate place to avoid mixing telemetry and business-related logic. It looks like this improvement is reverted now. Is there any reason for that?

Copy link
Author

@GalRabin GalRabin Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method works great when executing plan, however what telemetry I will have if I execute standalone function? we have such use case.
However, Now I also separated the BL from telemetry.

Comment on lines +231 to +235
using var activity = s_activitySource.StartActivity($"{this.SkillName}.{this.Name}");
this.s_executionTotalCounter.Add(1);
context.Log.LogInformation("{SkillName}.{StepName}: Function execution started", this.SkillName, this.Name);
var stopwatch = new Stopwatch();
stopwatch.Start();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think telemetry-related logic should be moved to separate place to keep InvokeAsync method clean and readable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I moved it both in Plan and SKFunction to different method called InvokeAsyncInstrumented as you did before.

@dmytrostruk dmytrostruk added the PR: feedback to address Waiting for PR owner to address comments/questions label Jul 24, 2023
@GalRabin
Copy link
Author

@dmytrostruk

I am going to address the use case that I feel that I missed.

User Story:

As a developer designing web services that encompass various plans, my goal is to view the metrics for each plan, function, planner integrated into the service. This will empower both the product owner and the developer to make informed decisions regarding potential enhancements and improvements to the web service.

Example:

Consider a scenario where we have three different planners in the same service:

  1. Planner 1 - Generate a poem and translate it into French.
  2. Planner 2 - Generate a poem and translate it into German.
  3. Planner 3 - Generate a poem and translate it into Hebrew.

Suppose these planners utilize the following skills and functions:

  1. SummarizeSkill:
    1. Summarize
  2. WriterSkill:
    1. ShortPoem
    2. Translate

Moreover, each plan use all those functions in every inoke.

We expect to obtain:

  1. Metrics:

    1. Function:
    SK.Function.<skill_name>.<function_name>.ExecutionTotal <long>
    SK.Function.<skill_name>.<function_name>.ExecutionSuccess <long>
    SK.Function.<skill_name>.<function_name>.ExecutionFailure <long>
    SK.Function.<skill_name>.<function_name>.ExecutionTime <double>
    
    1. Planner:
    SK.SequentialPlanner.<PlanX>.ExecutionTotal <long>
    SK.SequentialPlanner.<PlanX>.ExecutionSuccess <long>
    SK.SequentialPlanner.<PlanX>.ExecutionFailure <long>
    SK.SequentialPlanner.<PlanX>.ExecutionTime <double>
    

    Where X=1,2,3

    1. Plan:
    SK.Plan.<PlanX>.ExecutionTotal <long>
    SK.Plan.<PlanX>.ExecutionSuccess <long>
    SK.Plan.<PlanX>.ExecutionFailure <long>
    SK.Plan.<PlanX>.ExecutionTime <double>
    

    Where X=1,2,3

  2. Logs:

    1. Function:
      1. Start (Include the skill and function name)
      2. Fail (Include the skill and function name)
      3. Success (Include the skill and function name)
      4. Time (Include the skill and function name)
    2. Planner:
      1. Start (Include the plan name and planner type)
      2. Fail (Include the plan name and planner type)
      3. Success (Include the plan name and planner type)
      4. Time (Include the plan name and planner type)
    3. Plan:
      1. Start (Include the plan name)
      2. Fail (Include the plan name)
      3. Success (Include the plan name)
      4. Time (Include the plan name)

Concerning your comment about creating a meters on static variables, I need to know the plan's name to fulfill the requirements, which is why I am not using static properties.

this.Name = functionName;
this.SkillName = skillName;
this.Description = description;
this.ExecutionTimeHistogram = s_meter.CreateHistogram<double>(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmytrostruk Here I think it's important to include the skill and function, So I think we need to initialize it here, what do you think?

@GalRabin GalRabin requested a review from dmytrostruk July 24, 2023 19:56
@dmytrostruk
Copy link
Member

As per discussion with @GalRabin , closing this PR in order to proceed with another one, which solves requested requirements: #2180

github-merge-queue bot pushed a commit that referenced this pull request Jul 26, 2023
### Motivation and Context

<!-- Thank you for your contribution to the semantic-kernel repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->

Based on PR: #2152.
Should fix: #2151

This PR contains changes to move telemetry logic from `Plan` class to
separate class-decorator and add telemetry for `SKFunction` using
decorator pattern.

### Description

<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->

1. Added interface `IPlan` which inherits from `ISKFunction` to decouple
`Plan` and `SKFunction` instrumentation and future decorations.
2. Implemented `InstrumentedPlan` with telemetry for `InvokeAsync`
methods.
3. Implemented `InstrumentedSKFunction` with telemetry for `InvokeAsync`
methods.
4. Added extension methods to create instrumented instances of plans and
functions.
5. Added additional counters for Plan/Function to measure number of
total, successful and failed Plan/Function executions.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kernel Issues or pull requests impacting the core kernel .NET Issue or Pull requests regarding .NET code PR: feedback to address Waiting for PR owner to address comments/questions

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants