
[4/4] Documentation for DeepEval scorers #19409

Merged

smoorjani merged 47 commits into mlflow:master from smoorjani:stack/deepeval-docs on Dec 18, 2025

@smoorjani (Collaborator) commented Dec 15, 2025

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19409/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19409/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19409/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Introduces documentation for the DeepEval scorers

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
@github-actions github-actions bot added the area/docs, area/evaluation, and rn/none labels Dec 15, 2025
@smoorjani smoorjani requested a review from B-Step62 December 16, 2025 22:38
"mlflow-test-plugin",
]
constraints = [{ name = "xgboost", specifier = "<3.1.0" }]
excludes = ["databricks-connect"]
Collaborator

Why is this change included in this PR? Ditto for the other databricks-connect changes in uv.lock.

Collaborator Author

Should be resolved!

@smoorjani smoorjani requested a review from harupy December 17, 2025 04:18
@B-Step62 (Collaborator) left a comment

LGTM, with one suggestion for handling multiple third-party integrations.

Comment on lines +447 to +451
{
type: 'doc',
id: 'eval-monitor/scorers/third-party',
label: 'Third-party Scorers',
},
Collaborator

Since we will introduce more integrations in the same release, does it make sense to have a tree structure?

Third-party Scorers
  |- Deepeval
  |- Ragas
  ...
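The tree structure suggested above could be expressed as a nested category in the sidebar config. This is a minimal sketch, assuming Docusaurus-style sidebar items (as the sidebar fragment in this PR suggests) and assuming the existing 'eval-monitor/scorers/third-party' page becomes the category's landing page through a link. The child doc ids for DeepEval and Ragas, the simplified local SidebarItem type, and the collectDocIds helper are hypothetical illustrations, not part of this PR.

```typescript
// Simplified local stand-in for the sidebar item shapes used below;
// this is NOT Docusaurus's own exported type.
type SidebarItem =
  | { type: 'doc'; id: string; label?: string }
  | {
      type: 'category';
      label: string;
      link?: { type: 'doc'; id: string };
      items: SidebarItem[];
    };

// 'eval-monitor/scorers/third-party' comes from this PR's sidebar entry;
// the per-integration child ids are hypothetical placeholders.
const thirdPartyScorers: SidebarItem = {
  type: 'category',
  label: 'Third-party Scorers',
  // Clicking the category label opens the existing overview page.
  link: { type: 'doc', id: 'eval-monitor/scorers/third-party' },
  items: [
    { type: 'doc', id: 'eval-monitor/scorers/third-party/deepeval', label: 'DeepEval' },
    { type: 'doc', id: 'eval-monitor/scorers/third-party/ragas', label: 'Ragas' },
  ],
};

// Hypothetical helper: list every doc id an item references,
// the category's own link first, then its children in order.
function collectDocIds(item: SidebarItem): string[] {
  if (item.type === 'doc') {
    return [item.id];
  }
  const own: string[] = item.link ? [item.link.id] : [];
  return own.concat(...item.items.map(collectDocIds));
}

console.log(collectDocIds(thirdPartyScorers));
```

Adding another integration under this layout would only mean appending one more doc entry to items, which is the main appeal of the tree over a flat list.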

Collaborator Author

I was actually thinking of using tabs (see #19451). WDYT? I don't have a strong preference.

},
{
type: 'category',
label: 'Supported Scorers',
Collaborator

[Not in scope for this PR] @AveshCSingh @smoorjani I think this is part of the documentation reorganization, so please feel free to restructure the judge documentation drastically. Admittedly, speaking as the original author, the current structure does not clearly describe which option to use when, and it draws incorrect boundaries: for example, most of the "Predefined Scorers" are actually "LLM-as-a-Judge", and there is overlap between agentic judges and template-based judges (make_judge). My sense is that we should eliminate the jargon and call everything either "LLM-as-a-Judge" or "Code scorer" 🙂

tensorflow-cpu<=2.12.0; platform_system!="Darwin" or platform_machine!="arm64"
tensorflow-macos<=2.12.0; platform_system=="Darwin" and platform_machine=="arm64"
pyspark
pyspark<4.1.0
Collaborator Author

Note to reviewer: I had to change this to get CI to pass.

pyproject.toml (outdated)
"tensorflow",
"keras",
"pyspark",
"pyspark<4.1.0",
Member

We can move this into constraint-dependencies.

@harupy (Member) left a comment

LGTM once #19409 (comment) is addressed

@smoorjani smoorjani enabled auto-merge December 18, 2025 06:27
@smoorjani smoorjani added this pull request to the merge queue Dec 18, 2025
Merged via the queue into mlflow:master with commit 454df87 Dec 18, 2025
46 checks passed
@smoorjani smoorjani deleted the stack/deepeval-docs branch December 18, 2025 06:39
WeichenXu123 pushed a commit to WeichenXu123/mlflow that referenced this pull request Dec 22, 2025
WeichenXu123 pushed a commit that referenced this pull request Dec 22, 2025

Labels

area/docs (Documentation issues), area/evaluation (MLflow Evaluation), rn/none (List under Small Changes in Changelogs), v3.8.0


4 participants