[ML-59816][[2/n] Add new fields to the genai_evaluation event telemetry#19040

Closed
xsh310 wants to merge 4 commits into mlflow:master from xsh310:stack/ML-59816-part-2

Conversation

@xsh310 xsh310 commented Nov 25, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

#xxx

What changes are proposed in this pull request?

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

xsh310 force-pushed the stack/ML-59816-part-2 branch from 59ba0d5 to 63a5a08 on November 25, 2025 at 23:58
xsh310 changed the title from "[ML-59816] Add eval_data_size and eval_data_type fields to genai_evaluate event logging" to "[ML-59816][[2/n] Add new fields to the genai_evaluation event telemetry" on Nov 26, 2025
Signed-off-by: Xiang Shen <xshen.shc@gmail.com>
…_evaluate event logging

Signed-off-by: Xiang Shen <xshen.shc@gmail.com>
…uate event logging

Signed-off-by: Xiang Shen <xshen.shc@gmail.com>
…t logging

Signed-off-by: Xiang Shen <xshen.shc@gmail.com>
xsh310 force-pushed the stack/ML-59816-part-2 branch from 63a5a08 to 6f9390c on November 26, 2025 at 05:59
xsh310 requested a review from serena-ruan on November 26, 2025 at 07:45
This function is not thread-safe. Please do not use it in multi-threaded
environments.
"""
is_managed_dataset = isinstance(data, (EvaluationDataset, EntityEvaluationDataset))
Collaborator Author
#19018 (comment)

It's not efficient to convert data to df twice (line 246 and inside _run_harness), can you refactor?

@serena-ruan Do you think it's possible to move all the logic in the evaluate function that runs before _run_harness into _run_harness?

I want to make sure that

data = data if is_managed_dataset else df

happens inside _run_harness, so that we can preserve the original input type when logging the event.
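A minimal sketch of that ordering, assuming illustrative names throughout (the function body, eval_data_type, eval_data_size, and the plain-Python normalization are stand-ins, not MLflow internals):

```python
def run_harness(data):
    """Illustrative harness: capture telemetry fields from the raw input
    before any conversion, so the original input type is preserved for
    event logging. All names here are hypothetical."""
    # Record the caller-facing type before normalizing the input.
    eval_data_type = type(data).__name__

    # Normalize to a list of rows exactly once (stand-in for the
    # pandas-DataFrame conversion in the real code).
    rows = list(data)
    eval_data_size = len(rows)

    # ... evaluation would run on `rows` here ...
    return {"eval_data_type": eval_data_type, "eval_data_size": eval_data_size}
```

Because the type is read before the `data = data if is_managed_dataset else df` style normalization, the logged field reflects what the caller actually passed in.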

Collaborator
I move all the logic in the evaluate function before the _run_harness into _run_harness?

Yes you can do that, I don't think there's any difference

Collaborator Author
@serena-ruan I think one potential downside of moving the logic into _run_harness is that there might be a sudden increase in genai_evaluate event volume, since some of the invalid inputs that are currently early-returned would now log the genai_evaluate event.

# Check if records are loaded to avoid triggering an expensive load
if data.has_records():
    df = data.to_df()
    eval_data_size = len(df)
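A self-contained sketch of that guard, using a hypothetical stand-in class (LazyDataset is not the real MLflow EvaluationDataset; it only models the has_records check):

```python
class LazyDataset:
    """Hypothetical stand-in for a managed evaluation dataset whose
    records may not be loaded into memory yet."""

    def __init__(self, records=None):
        self._records = records

    def has_records(self):
        # True only when the records are already in memory.
        return self._records is not None

    def to_records(self):
        return list(self._records)


def eval_data_size_or_none(data):
    # Only compute the size when records are already loaded, so the
    # telemetry path never triggers an expensive remote fetch.
    if data.has_records():
        return len(data.to_records())
    return None
```

Returning None for an unloaded dataset keeps telemetry best-effort: the field is simply omitted rather than paid for.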
Collaborator Author
#19018 (comment)

Computing the dataset size by converting it to a dataframe is apparently too heavy. Reading the code, this data is converted to a pandas dataframe during evaluation anyway; can we calculate the dataset size after that?

@serena-ruan, I'm not sure whether my understanding is correct, but I don't think we will have access to the pandas df, since the dataset is converted in the middle of _run_harness and the event logging only has access to the input params of _run_harness?

I think one workaround is to add the df as another return value of _run_harness and use parse_result to populate these fields from the df.

Collaborator
You don't have to use @record_usage_event on the function; you can use _record_event to record the event with its parameters directly :) The decorator is just a simplified approach for general purposes.

We can collect all the needed parameters within _run_harness to avoid the duplicate data-size computation, then send the record once. While _record_event doesn't handle function exceptions the way record_usage_event does, feel free to wrap it in try/finally if necessary.
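A sketch of that suggestion with a hypothetical _record_event stub (the real telemetry client's signature and the harness body are assumptions; only the collect-once, try/finally pattern is the point):

```python
events = []


def _record_event(name, params):
    """Hypothetical stand-in for the telemetry client's _record_event."""
    events.append((name, dict(params)))


def run_harness(rows):
    """Collect all telemetry parameters inside the harness and send a
    single record, using try/finally so the event is still emitted
    (with a status) when evaluation raises."""
    params = {"eval_data_size": len(rows), "status": "unknown"}
    try:
        result = [r["score"] * 2 for r in rows]  # placeholder for real work
        params["status"] = "success"
        return result
    except Exception:
        params["status"] = "failure"
        raise
    finally:
        # Runs on both the success and failure paths: exactly one event.
        _record_event("genai_evaluate", params)
```

Because the event is emitted only here, the @record_usage_event decorator is dropped from the function, avoiding the duplicate-event problem discussed below.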

Collaborator Author
@serena-ruan Thanks for the explanation! A quick question though: since, per the logging spec, the goal is to log the new fields into genai_evaluate, if we add another _record_event call in the middle of _run_harness, will it create a duplicate genai_evaluate event, or is there a way to merge the params with the event created by @record_usage_event?

Collaborator
You should use either the decorator or _record_event, but not both; otherwise two events will be created :)

