Add telemetry for Lambda Managed Instances #13466
try:
    invocation_result = self.version_manager.invoke(invocation=invocation)
    function_config = self.version_manager.function_version.config
    function_counter.labels(
Given the counter in the finally clause, this counter seems to produce double counting 🤔
Great catch! I agree about your thoughts in the PR description - we should have a single point for the telemetry, instead of having it split up - this would have already prevented this double counting.
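The double counting discussed in this thread can be illustrated with a minimal sketch. `LabeledCounter`, `invoke_double_counted`, and `invoke_counted_once` are hypothetical names for illustration only and are not LocalStack's actual metrics API. The sketch assumes the counter is incremented once inside the `try` block and again in the `finally` clause.

```python
from collections import Counter


class LabeledCounter:
    """Hypothetical stand-in for a labeled metrics counter (not the real API)."""

    def __init__(self):
        self.counts = Counter()

    def increment(self, **labels):
        # Each distinct label combination gets its own count.
        self.counts[tuple(sorted(labels.items()))] += 1


def invoke_double_counted(counter, invoke):
    """Counts in the try block and again in finally: two events per success."""
    status = "error"
    try:
        result = invoke()
        counter.increment(operation="invoke", status="success")  # first count
        status = "success"
        return result
    finally:
        counter.increment(operation="invoke", status=status)  # second count


def invoke_counted_once(counter, invoke):
    """Single telemetry point: only the finally clause records the event."""
    status = "error"
    try:
        result = invoke()
        status = "success"
        return result
    finally:
        counter.increment(operation="invoke", status=status)
```

With a single recording point in `finally`, both the success and the error path produce exactly one event per invocation.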
Test Results (amd64) - Integration, Bootstrap 5 files 5 suites 2h 32m 25s ⏱️ Results for commit 44105d8. ♻️ This comment has been updated with latest results.
dfangl left a comment
LGTM in principle, but I believe there are some errors in how we construct the telemetry labels now, which we should fix before merging!
dfangl left a comment
Thanks for addressing those comments!
praise: thank you for taking the time to thoroughly consider the metric design and for suggesting initialization_type instead of the flag. It significantly improves clarity 🚀
note: the schema bump shouldn't cause any major difficulties when querying the raw data because we've only added a new attribute. For historical data (before version v4.12.0), you will simply see empty values for the initialization_type attribute.
note: the lambda.function metric counter has now hit its soft limit of 6 labels.
note: I have already updated the metrics dictionary ✅
Motivation
We're currently missing runtime telemetry for Lambda Managed Instances (LMI) to track errors and usage of the feature beyond the control plane API.
Changes
- initialization_type. This maps to an AWS-defined property and is more extensible than a simple boolean flag uses_capacity_provider.
Limitations
- provisioned-concurrency is currently not taken into account because it is not easily accessible in the context (might require enriching the invocation result)
- lambda.Invoke. However, this would require some refactoring and API enrichment (e.g., passing ESM metadata through lambda.Invoke via custom fields or headers)
Tests
The following tests can be used to validate that the metrics work as expected:
- tests.aws.services.lambda_.test_lambda_managed_instances.TestLambdaManagedInstances.test_lifecycle should generate a create and invoke event with the new initialization_type=lambda-managed-instances.
- aws.services.lambda_.test_lambda.TestLambdaFeatures.test_invocation_type_event should generate a create and invoke event with the initialization_type=on-demand and the invocation_type=event for each execution (node & python). Validate that the invoke event is not recorded twice.
Related
Part of DRG-36
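As a rough illustration of the initialization_type label described under Changes, the following sketch derives the label value from a function configuration. `FunctionConfigSketch` and its `capacity_provider` field are assumptions made for this example and do not reflect the actual LocalStack data model. Only the two values exercised by the tests above are produced; provisioned-concurrency is left out, matching the stated limitation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionConfigSketch:
    # Hypothetical field: stands in for whatever marks a function as
    # running on Lambda Managed Instances in the real configuration.
    capacity_provider: Optional[str] = None


def initialization_type(config: FunctionConfigSketch) -> str:
    """Return the label value for a function configuration.

    A string label can take further values later without changing the
    metric schema, which a boolean flag could not.
    """
    if config.capacity_provider is not None:
        return "lambda-managed-instances"
    return "on-demand"
```

A function without a capacity provider maps to on-demand, and one with a capacity provider maps to lambda-managed-instances.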