Skip to content

Add per-decorator sampling_ratio_override parameter to @mlflow.trace#19784

Merged
harupy merged 21 commits intomlflow:masterfrom
harupy:per-trace-sampling
Feb 7, 2026
Merged

Add per-decorator sampling_ratio_override parameter to @mlflow.trace#19784
harupy merged 21 commits intomlflow:masterfrom
harupy:per-trace-sampling

Conversation

@harupy
Copy link
Member

@harupy harupy commented Jan 7, 2026

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19784/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19784/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19784/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Add a sampling_ratio_override parameter to the @mlflow.trace decorator, allowing per-function control over trace sampling that overrides the global MLFLOW_TRACE_SAMPLING_RATIO setting.

@mlflow.trace(sampling_ratio_override=0.1)  # 10% sampling for high-volume
def high_volume_endpoint():
    ...

@mlflow.trace(sampling_ratio_override=1.0)  # 100% sampling for critical
def critical_transaction():
    ...

How it works:

A custom OTel sampler (_MlflowSampler) wraps TraceIdRatioBased and checks a _SAMPLING_RATIO_OVERRIDE context variable. The decorator sets this context var before span creation, and the sampler uses it instead of the default ratio. This keeps all sampling logic inside the OTel sampler and leverages OTel's native parent-child sampling inheritance.

Key behaviors:

  • sampling_ratio_override overrides the global MLFLOW_TRACE_SAMPLING_RATIO for that function
  • sampling_ratio_override=None (default) defers to the global sampler
  • Must be a float between 0.0 and 1.0

How is this PR tested?

  • Unit tests covering validation, sampling at 0%/50%/100%, nested functions, global override (subprocess), generators, and async functions

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Add sampling_ratio_override parameter to @mlflow.trace decorator for per-function sampling control.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality

How should the PR be classified in the release notes? Choose one:

  • rn/feature - A new user-facing feature worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

🤖 Generated with Claude Code

@harupy harupy marked this pull request as ready for review January 7, 2026 03:34
Copilot AI review requested due to automatic review settings January 7, 2026 03:34
@github-actions github-actions bot added area/tracing MLflow Tracing and its integrations rn/feature Mention under Features in Changelogs. labels Jan 7, 2026
@harupy harupy marked this pull request as draft January 7, 2026 03:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a sampling_ratio parameter to the @mlflow.trace decorator, enabling per-function control over trace sampling that overrides the global MLFLOW_TRACE_SAMPLING_RATIO environment variable. The feature allows different functions to have different sampling rates (e.g., 10% for high-volume endpoints, 100% for critical transactions).

Key changes:

  • Added sampling_ratio parameter to decorator and internal wrapping functions
  • Implemented _should_sample() function to determine sampling decisions
  • Added context variable _TRACE_SAMPLING_SKIPPED to propagate sampling decisions to nested calls
  • Created manual test script with comprehensive test scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
test_sampling_ratio.py Manual test script demonstrating the new sampling_ratio feature with various scenarios (marked for removal before merging)
mlflow/tracing/fluent.py Core implementation: adds sampling_ratio parameter to @mlflow.trace decorator, implements sampling decision logic, and propagates sampling state to nested calls

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

Documentation preview for df7908a is available at:

Changed Pages (1)

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@harupy harupy removed the v3.9.0 label Jan 15, 2026
@harupy harupy marked this pull request as ready for review January 15, 2026 06:16
@harupy harupy force-pushed the per-trace-sampling branch from e2a7ea6 to 9360b4c Compare January 15, 2026 06:17
@harupy harupy force-pushed the per-trace-sampling branch 2 times, most recently from e7feefe to 828728f Compare January 27, 2026 11:32
@harupy harupy requested a review from B-Step62 January 27, 2026 12:00
_FORCE_SAMPLE = contextvars.ContextVar("force_sample", default=False)


class _OverridableSampler(Sampler):
Copy link
Member Author

@harupy harupy Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hacky, but could not find any other approach to bypass the global sampling via MLFLOW_TRACE_SAMPLING_RATIO

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the user request supporting both global and endpoint-level sampling at the same time? Otherwise we can consider block using both together and simplify the logic?

Copy link
Member Author

@harupy harupy Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Let me ask the reporter

harupy and others added 2 commits February 2, 2026 16:43
Add `sampling_ratio` parameter to the `@mlflow.trace` decorator to allow
per-function control over trace sampling, overriding the global
`MLFLOW_TRACE_SAMPLING_RATIO` setting.

Key behaviors:
- Decorator value overrides global env var
- Child spans follow parent's sampling decision
- `sampling_ratio=None` (default) defers to global sampler

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy added the team-review Trigger a team review request label Feb 2, 2026
harupy and others added 2 commits February 2, 2026 16:45
Move manual tests from test_sampling_ratio.py to the proper test location
and convert them to pytest format following MLflow test conventions.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Document the new sampling_ratio parameter on @mlflow.trace decorator
in the production tracing docs, including use cases and key behaviors.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy
Copy link
Member Author

harupy commented Feb 2, 2026

Tagging a few more folks to have more eyes

@harupy
Copy link
Member Author

harupy commented Feb 2, 2026

/resolve address comments from @serena-ruan


🚀 Resolve workflow started


✅ Workflow completed successfully.

Claude Output

Perfect! I've successfully addressed all the PR review comments from @serena-ruan:

Summary of Changes

  1. Renamed _force_sampler to _force_sample (line 82)

    • Updated the function name for better consistency
    • Updated all 4 call sites throughout the file
  2. Renamed _should_sample to _should_skip_sample and inverted the logic (line 260)

    • Changed the function name to better align with _TRACE_SAMPLING_SKIPPED and _skip_tracing()
    • Inverted all boolean logic for consistency:
      • return Falsereturn True (when skipping)
      • return Truereturn False (when not skipping)
      • random.random() < sampling_ratiorandom.random() >= sampling_ratio
    • Updated all 4 call sites to remove the not operator

The changes have been committed with DCO sign-off and all pre-commit hooks passed successfully. The commit is ready but has not been pushed as per the instructions.

mlflow-app bot and others added 3 commits February 2, 2026 09:50
🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
The test_trace_decorator_sampling_ratio_overrides_global test runs a
subprocess that imports full mlflow, which is not available when only
the lightweight mlflow-tracing SDK is installed.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment on lines +49 to +50
def get_description(self) -> str:
return f"OverridableSampler(ratio={self._ratio_sampler.rate})"
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_OverridableSampler.get_description() reaches into the underlying TraceIdRatioBased instance via self._ratio_sampler.rate. This depends on a non-public attribute/property that may not exist across opentelemetry-sdk versions and can raise at runtime when OTel calls get_description(). Prefer storing the ratio in _OverridableSampler at init time or delegating to self._ratio_sampler.get_description() to avoid relying on internal fields.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@serena-ruan serena-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! We can discuss the documentation expression offline :)

Copy link
Collaborator

@B-Step62 B-Step62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, #19784 (comment) is not a blocker, we can follow-up if they told us they don't need both.

harupy and others added 5 commits February 4, 2026 19:24
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
- Fix bug where nested @mlflow.trace could re-sample when parent was
  not sampled by global sampler (NonRecordingSpan case)
- Store ratio in _OverridableSampler to avoid accessing internal OTel
  attribute (_ratio_sampler.rate)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

@harupy harupy changed the title Add per-decorator sampling_ratio parameter to @mlflow.trace Add per-decorator sampling_ratio_override parameter to @mlflow.trace Feb 6, 2026
harupy and others added 3 commits February 6, 2026 19:40
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
…io_override

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Replace the previous approach (_FORCE_SAMPLE, _skip_tracing,
_should_skip_sample, random.random()) with a cleaner design where
the _MlflowSampler reads a _SAMPLING_RATIO_OVERRIDE context var
and delegates to TraceIdRatioBased. The decorator just sets the
context var and lets OTel handle sampling decisions natively.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy force-pushed the per-trace-sampling branch from 5548004 to c0a407a Compare February 6, 2026 13:32
harupy and others added 2 commits February 6, 2026 23:15
…ted first

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy added this pull request to the merge queue Feb 7, 2026
Merged via the queue into mlflow:master with commit b3a26e3 Feb 7, 2026
55 checks passed
@harupy harupy deleted the per-trace-sampling branch February 7, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/tracing MLflow Tracing and its integrations rn/feature Mention under Features in Changelogs. team-review Trigger a team review request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants