Support for conversational datasets with persona, goal, and context by SomtochiUmeh · Pull Request #19686 · mlflow/mlflow

SomtochiUmeh · 2025-12-29T23:29:10Z

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19686/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19686/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19686/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Adds persona, goal, and context fields for multiturn evaluation datasets. These can be nested directly inside inputs:

dataset.merge_records([
    {"inputs": {"persona": "Student", "goal": "Find articles"}},
])

Validations:

Custom fields must go inside context, not alongside multiturn fields

All records in a single merge_records() call must use the same schema (multiturn or regular)

New records must match the existing dataset's schema type
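The validation rules above can be sketched as a small standalone classifier. This is only an illustration, not the code from the PR: the enum `SchemaType`, the constant `MULTITURN_FIELDS`, and the function `classify_input_fields` are hypothetical stand-ins. It assumes the multiturn fields are exactly persona, goal, and context, as stated in the description, and the merged implementation may differ.

```python
from enum import Enum


class SchemaType(Enum):
    """Hypothetical stand-in for the schema-type enum used in the PR."""

    MULTITURN = "multiturn"
    REGULAR = "regular"
    UNKNOWN = "unknown"


# Assumption: the multiturn fields are exactly the three named in the description.
MULTITURN_FIELDS = frozenset({"persona", "goal", "context"})


def classify_input_fields(input_keys: set[str]) -> SchemaType:
    """Classify the keys found under one record's "inputs".

    - no keys at all             -> UNKNOWN (nothing to classify)
    - only multiturn fields      -> MULTITURN
    - no multiturn fields        -> REGULAR
    - multiturn + custom fields  -> UNKNOWN (mixed, which validation rejects)
    """
    if not input_keys:
        return SchemaType.UNKNOWN
    if input_keys <= MULTITURN_FIELDS:
        return SchemaType.MULTITURN
    if input_keys.isdisjoint(MULTITURN_FIELDS):
        return SchemaType.REGULAR
    return SchemaType.UNKNOWN


# A custom field next to "persona" is mixed; nesting it inside "context" is not.
print(classify_input_fields({"persona", "topic"}))    # SchemaType.UNKNOWN
print(classify_input_fields({"persona", "context"}))  # SchemaType.MULTITURN
```

Under this sketch, a record such as {"inputs": {"persona": "Student", "context": {"topic": "math"}}} classifies as multiturn because the custom key lives inside context.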

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?

Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
Bug fixes, doc updates and new features usually go into minor releases.
Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
Bug fixes and doc updates usually go into patch releases.

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

github-actions · 2025-12-29T23:37:40Z

Documentation preview for b91c2a8 is available at:

https://pr-19686--mlflow-docs-preview.netlify.app/docs/latest/

More info

Ignore this comment if this PR does not change the documentation.
The preview is updated when a new commit is pushed to this PR.
This comment was created by this workflow run.
The documentation was built by this workflow run.

smoorjani

mostly looks good! just a few small stylistic nits. we should get approval from an MLflow maintainer for this part of the codebase. can you also do some quick manual tests to make sure everything works fine?

smoorjani · 2025-12-30T23:07:28Z

mlflow/entities/evaluation_dataset.py

+            record_type = self._classify_input_fields(input_keys)
+
+            if record_type == DatasetSchemaType.UNKNOWN:
+                custom_fields = input_keys - MULTITURN_INPUT_FIELDS


can this be UNKNOWN for reasons other than multiturn? maybe we should make the error message more generic?

UNKNOWN happens when a record mixes schemas (both multiturn and custom fields present) or when there's nothing in the record. We continue to the next record if the record is empty:

if not input_keys:
    continue

So UNKNOWN here means mixed schema

smoorjani · 2025-12-30T23:08:53Z

mlflow/entities/evaluation_dataset.py

+
+            if batch_schema_type is None:
+                batch_schema_type = record_type
+            elif batch_schema_type != record_type:


It'd be good to compute the schema of each row and then do this comparison so the user can tell if there's a significant number of mismatches. e.g., All records must use the same schema type. Found N records for ... and M records for ....
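One way to produce the message suggested here is to classify every row first and then count the results. The sketch below is an assumption about how that could look, not the code that was merged: `check_single_schema` is a hypothetical helper, the schema types are plain strings, and `ValueError` stands in for `MlflowException.invalid_parameter_value`.

```python
from collections import Counter


def check_single_schema(record_types: list[str]) -> None:
    """Raise if the batch mixes schema types, reporting a count per type.

    `record_types` holds one already-computed schema label per row
    (hypothetical labels such as "multiturn" or "regular").
    """
    counts = Counter(record_types)
    if len(counts) > 1:
        # Sort by label so the message is stable across runs.
        summary = " and ".join(
            f"{n} record(s) for '{schema}'" for schema, n in sorted(counts.items())
        )
        # ValueError is a stand-in for the MLflow exception used in the PR.
        raise ValueError(
            f"All records must use the same schema type. Found {summary}."
        )


try:
    check_single_schema(["multiturn", "multiturn", "regular"])
except ValueError as exc:
    print(exc)
```

Counting up front lets the user see whether one stray row or half the batch is mismatched, which a first-mismatch check cannot show.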

smoorjani · 2025-12-30T23:09:17Z

mlflow/entities/evaluation_dataset.py

+        batch_schema_type = batch_schema_type or DatasetSchemaType.UNKNOWN
+        existing_schema_type = self._get_existing_schema_type()
+
+        if DatasetSchemaType.UNKNOWN in {batch_schema_type, existing_schema_type}:


shouldn't this throw an error?

Nah, this shouldn't be an error state
UNKNOWN means the records are either empty or have mixed schema types.
At this point, the validation for mixed schema types has already been done earlier in _validate_schema:

if record_type == DatasetSchemaType.UNKNOWN:
    custom_fields = input_keys - MULTITURN_INPUT_FIELDS
    raise MlflowException.invalid_parameter_value(

So the only reason to still have UNKNOWN is that either the existing or new schema is empty
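The reasoning in this reply can be sketched as a comparison that only fires when both sides have a known schema type. This is a hedged illustration rather than the PR's code: `check_against_existing` and the string labels are hypothetical, and `ValueError` stands in for the MLflow exception.

```python
from typing import Optional

UNKNOWN = "unknown"  # hypothetical label for "empty, nothing to compare"


def check_against_existing(batch_type: Optional[str], existing_type: str) -> None:
    """Compare the new batch's schema type with the existing dataset's.

    Mixed-schema records are assumed to have been rejected earlier, so
    UNKNOWN at this point only means one side is empty and the check is skipped.
    """
    batch_type = batch_type or UNKNOWN
    if UNKNOWN in {batch_type, existing_type}:
        return  # empty batch or empty dataset: not an error state
    if batch_type != existing_type:
        # ValueError is a stand-in for the MLflow exception used in the PR.
        raise ValueError(
            f"New records use schema type '{batch_type}' but the existing "
            f"dataset uses '{existing_type}'."
        )


check_against_existing(None, "multiturn")         # skipped: batch had no inputs
check_against_existing("multiturn", UNKNOWN)      # skipped: dataset is empty
check_against_existing("multiturn", "multiturn")  # passes: types match
```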

smoorjani

LGTM, but let's hold for approval from someone on the MLflow team as well.

smoorjani

left a few minor questions/comments - mostly LGTM. thanks for iterating!

smoorjani · 2026-01-08T19:30:41Z

mlflow/entities/evaluation_dataset.py

+            if record_type == DatasetGranularity.UNKNOWN:
+                session_fields = input_keys & SESSION_IDENTIFIER_FIELDS
+                other_fields = input_keys - SESSION_INPUT_FIELDS
+                raise MlflowException.invalid_parameter_value(


nit: this case can also happen if inputs has no keys, so this error message may not make sense.

the loop skips if no input keys (lines 361-362):

if not input_keys:
    continue

so it shouldn't error

ah you're right - should we error in this case? it seems unintended to have a row without inputs

Filed a ticket: https://databricks.atlassian.net/browse/ML-61094
Will check with Ben/Yuki whether it's possible for regular datasets to have empty inputs

smoorjani · 2026-01-08T19:37:10Z

mlflow/entities/evaluation_dataset.py

+            return DatasetGranularity.UNKNOWN
+        try:
+            schema = json.loads(self._schema)
+            input_keys = set(schema.get("inputs", {}).keys())


I don't fully understand this part - how is it that the schema contains input keys? wouldn't you need to get the actual records?

I set the schema on line 276, after getting an existing dataset:

try:
    existing_dataset = tracking_store.get_dataset(self.dataset_id)
    self._schema = existing_dataset.schema
except Exception as e:

The schema will look like this, for example:

{"inputs": {"goal": "string", "context": "object", "persona": "string"}, "outputs": {}, "expectations": {"expected_output": "string", "quality": "string"}, "version": "1.0"}

So we can extract the input keys
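As a standalone illustration of that extraction, the sketch below parses a schema string shaped like the example above and returns the keys under "inputs". The function name `input_keys_from_schema` is hypothetical. Catching `AttributeError` is an addition not shown in the diff; it covers the case where the parsed JSON is not an object.

```python
import json
from typing import Optional


def input_keys_from_schema(schema_json: Optional[str]) -> set[str]:
    """Return the field names listed under "inputs" in a stored schema string.

    Returns an empty set when the schema is missing or cannot be parsed.
    """
    if not schema_json:
        return set()
    try:
        schema = json.loads(schema_json)
        return set(schema.get("inputs", {}).keys())
    except (json.JSONDecodeError, TypeError, AttributeError):
        # AttributeError is an extra guard (not in the diff) for JSON that
        # parses to a non-object, e.g. a list.
        return set()


example = (
    '{"inputs": {"goal": "string", "context": "object", "persona": "string"}, '
    '"outputs": {}, '
    '"expectations": {"expected_output": "string", "quality": "string"}, '
    '"version": "1.0"}'
)
print(sorted(input_keys_from_schema(example)))  # ['context', 'goal', 'persona']
```

Because the stored schema already records the field names, the existing dataset's type can be inferred without loading its records.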

smoorjani · 2026-01-08T19:38:31Z

tests/genai/datasets/test_fluent.py

    assert isinstance(dataset, WrapperEvaluationDataset)
    assert not isinstance(dataset, EntityEvaluationDataset)
    assert isinstance(dataset, (WrapperEvaluationDataset, EntityEvaluationDataset))
+


is this in the wrong file? should this be in test_evaluation_dataset.py?

Most of the existing merge_records tests are in this file. The one merge_records test in test_evaluation_dataset.py only checks that the correct dataset instance is returned:

mlflow/tests/genai/datasets/test_evaluation_dataset.py

Line 165 in 7261486

def test_evaluation_dataset_merge_records(mock_managed_dataset):

smoorjani · 2026-01-08T19:39:09Z

mlflow/entities/evaluation_dataset.py

+        except (json.JSONDecodeError, TypeError):
+            return DatasetGranularity.UNKNOWN
+
+    def _classify_input_fields(self, input_keys: set[str]) -> DatasetGranularity:


I know this is private, but could we add some tests for this?

Good call; added

smoorjani

LGTM! Thanks for iterating! left one minor comment

smoorjani · 2026-01-09T19:11:44Z

mlflow/entities/evaluation_dataset.py

+            if record_type == DatasetGranularity.UNKNOWN:
+                session_fields = input_keys & SESSION_IDENTIFIER_FIELDS
+                other_fields = input_keys - SESSION_INPUT_FIELDS
+                raise MlflowException.invalid_parameter_value(


ah you're right - should we error in this case? it seems unintended to have a row without inputs

…lflow#19686) Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

Support for conversational datasets with persona, goal, and context

3a11ff8

Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

github-actions bot added area/evaluation MLflow Evaluation rn/feature Mention under Features in Changelogs. labels Dec 29, 2025

SomtochiUmeh requested review from B-Step62, BenWilson2 and smoorjani and removed request for B-Step62 and BenWilson2 December 30, 2025 01:05

parameterize tests

1f55d08

Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

smoorjani requested changes Dec 30, 2025

github-actions bot assigned smoorjani Dec 31, 2025

addressing comments

7d02d26

Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

SomtochiUmeh requested review from B-Step62, BenWilson2 and smoorjani January 2, 2026 20:50

smoorjani approved these changes Jan 2, 2026

SomtochiUmeh added 2 commits January 8, 2026 09:40

remove top-level session fields

26af6e4

Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

update docstring and remove normalize method

157ea3d

Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

smoorjani requested changes Jan 8, 2026

add tests for _classify_input_fields method

b91c2a8

Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

SomtochiUmeh requested a review from smoorjani January 9, 2026 18:21

smoorjani approved these changes Jan 9, 2026

SomtochiUmeh added this pull request to the merge queue Jan 9, 2026

Merged via the queue into master with commit 821f123 Jan 9, 2026
52 checks passed

SomtochiUmeh deleted the ML-59709 branch January 9, 2026 21:47

debu-sinha pushed a commit to debu-sinha/mlflow that referenced this pull request Jan 15, 2026

Support for conversational datasets with persona, goal, and context (m…

88ebfc8

…lflow#19686) Signed-off-by: SomtochiUmeh <somtochiumeh@gmail.com>

This was referenced Jan 16, 2026

[UI] Prevent adding traces to multiturn datasets #20071

Merged

Prevent empty inputs for conversational datasets #20108

Merged
