Add per-decorator sampling_ratio_override parameter to @mlflow.trace#19784
Add per-decorator sampling_ratio_override parameter to @mlflow.trace#19784harupy merged 21 commits intomlflow:masterfrom
sampling_ratio_override parameter to @mlflow.trace#19784Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a sampling_ratio parameter to the @mlflow.trace decorator, enabling per-function control over trace sampling that overrides the global MLFLOW_TRACE_SAMPLING_RATIO environment variable. The feature allows different functions to have different sampling rates (e.g., 10% for high-volume endpoints, 100% for critical transactions).
Key changes:
- Added
sampling_ratioparameter to decorator and internal wrapping functions - Implemented
_should_sample()function to determine sampling decisions - Added context variable
_TRACE_SAMPLING_SKIPPEDto propagate sampling decisions to nested calls - Created manual test script with comprehensive test scenarios
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| test_sampling_ratio.py | Manual test script demonstrating the new sampling_ratio feature with various scenarios (marked for removal before merging) |
| mlflow/tracing/fluent.py | Core implementation: adds sampling_ratio parameter to @mlflow.trace decorator, implements sampling decision logic, and propagates sampling state to nested calls |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Documentation preview for df7908a is available at: Changed Pages (1)
More info
|
e2a7ea6 to
9360b4c
Compare
e7feefe to
828728f
Compare
mlflow/tracing/provider.py
Outdated
| _FORCE_SAMPLE = contextvars.ContextVar("force_sample", default=False) | ||
|
|
||
|
|
||
| class _OverridableSampler(Sampler): |
There was a problem hiding this comment.
This is hacky, but could not find any other approach to bypass the global sampling via MLFLOW_TRACE_SAMPLING_RATIO
There was a problem hiding this comment.
Does the user request supporting both global and endpoint-level sampling at the same time? Otherwise we can consider block using both together and simplify the logic?
There was a problem hiding this comment.
Good point. Let me ask the reporter
Add `sampling_ratio` parameter to the `@mlflow.trace` decorator to allow per-function control over trace sampling, overriding the global `MLFLOW_TRACE_SAMPLING_RATIO` setting. Key behaviors: - Decorator value overrides global env var - Child spans follow parent's sampling decision - `sampling_ratio=None` (default) defers to global sampler Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Move manual tests from test_sampling_ratio.py to the proper test location and convert them to pytest format following MLflow test conventions. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Document the new sampling_ratio parameter on @mlflow.trace decorator in the production tracing docs, including use cases and key behaviors. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
|
Tagging a few more folks to have more eyes |
|
/resolve address comments from @serena-ruan ✅ Workflow completed successfully. Claude OutputPerfect! I've successfully addressed all the PR review comments from @serena-ruan: Summary of Changes
The changes have been committed with DCO sign-off and all pre-commit hooks passed successfully. The commit is ready but has not been pushed as per the instructions. |
🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
The test_trace_decorator_sampling_ratio_overrides_global test runs a subprocess that imports full mlflow, which is not available when only the lightweight mlflow-tracing SDK is installed. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
mlflow/tracing/sampling.py
Outdated
| def get_description(self) -> str: | ||
| return f"OverridableSampler(ratio={self._ratio_sampler.rate})" |
There was a problem hiding this comment.
_OverridableSampler.get_description() reaches into the underlying TraceIdRatioBased instance via self._ratio_sampler.rate. This depends on a non-public attribute/property that may not exist across opentelemetry-sdk versions and can raise at runtime when OTel calls get_description(). Prefer storing the ratio in _OverridableSampler at init time or delegating to self._ratio_sampler.get_description() to avoid relying on internal fields.
serena-ruan
left a comment
There was a problem hiding this comment.
LGTM! We can discuss the documentation expression offline :)
There was a problem hiding this comment.
LGTM, #19784 (comment) is not a blocker, we can follow-up if they told us they don't need both.
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
- Fix bug where nested @mlflow.trace could re-sample when parent was not sampled by global sampler (NonRecordingSpan case) - Store ratio in _OverridableSampler to avoid accessing internal OTel attribute (_ratio_sampler.rate) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
sampling_ratio parameter to @mlflow.tracesampling_ratio_override parameter to @mlflow.trace
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
…io_override Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Replace the previous approach (_FORCE_SAMPLE, _skip_tracing, _should_skip_sample, random.random()) with a cleaner design where the _MlflowSampler reads a _SAMPLING_RATIO_OVERRIDE context var and delegates to TraceIdRatioBased. The decorator just sets the context var and lets OTel handle sampling decisions natively. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
5548004 to
c0a407a
Compare
…ted first Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
🛠 DevTools 🛠
Install mlflow from this PR
For Databricks, use the following command:
Related Issues/PRs
#xxxWhat changes are proposed in this pull request?
Add a
sampling_ratio_overrideparameter to the@mlflow.tracedecorator, allowing per-function control over trace sampling that overrides the globalMLFLOW_TRACE_SAMPLING_RATIOsetting.How it works:
A custom OTel sampler (
_MlflowSampler) wrapsTraceIdRatioBasedand checks a_SAMPLING_RATIO_OVERRIDEcontext variable. The decorator sets this context var before span creation, and the sampler uses it instead of the default ratio. This keeps all sampling logic inside the OTel sampler and leverages OTel's native parent-child sampling inheritance.Key behaviors:
sampling_ratio_overrideoverrides the globalMLFLOW_TRACE_SAMPLING_RATIOfor that functionsampling_ratio_override=None(default) defers to the global samplerHow is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
Add
sampling_ratio_overrideparameter to@mlflow.tracedecorator for per-function sampling control.What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionalityHow should the PR be classified in the release notes? Choose one:
rn/feature- A new user-facing feature worth mentioning in the release notesShould this PR be included in the next patch release?
🤖 Generated with Claude Code