Skip to content

Handle CrewAI v1 in CI#18786

Merged
B-Step62 merged 6 commits intomlflow:masterfrom
B-Step62:fix-crewai-dev
Nov 14, 2025
Merged

Handle CrewAI v1 in CI#18786
B-Step62 merged 6 commits intomlflow:masterfrom
B-Step62:fix-crewai-dev

Conversation

@B-Step62
Copy link
Collaborator

@B-Step62 B-Step62 commented Nov 11, 2025

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18786/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18786/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/18786/merge

What changes are proposed in this pull request?

Fix https://github.com/mlflow/dev/actions/runs/19266812224/job/55084667648

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
@B-Step62 B-Step62 added the enable-dev-tests Enables cross-version tests for dev versions label Nov 11, 2025
@github-actions github-actions bot added area/tracing MLflow Tracing and its integrations rn/none List under Small Changes in Changelogs. labels Nov 11, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 11, 2025

Documentation preview for db24e91 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
"json_dict": None,
"pydantic": None,
"raw": _LLM_ANSWER,
"tasks_output": [
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crewAIInc/crewAI@6b52587 introduced new messages field 4 days ago. We can choose to wait for 1.4.2 and condition value here, but I think we don't need exact match here anyway.

assert len(traces) == 1
assert traces[0].info.status == "OK"
assert len(traces[0].data.spans) == 9
assert len(traces[0].data.spans) == 10 if _IS_CREWAI_V1 else 9
Copy link
Collaborator

@TomeHirata TomeHirata Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the additional span? Shall we add assertion for the new span?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's another LLM call. I don't think adding assertion has a value here.

Copy link
Collaborator

@TomeHirata TomeHirata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment, otherwise LGTM

_IS_CREWAI_V1 = Version(crewai.__version__) >= Version("1.0.0")


@pytest.fixture
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use autouse=True?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture needs to be applied before other fixtures (e.g. agent). It seems autouse=True does not guarantee that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Have you tried changes like this? I have and all the tests passed

diff --git a/tests/crewai/test_crewai_autolog.py b/tests/crewai/test_crewai_autolog.py
index 7ca71e7388..09e33f8423 100644
--- a/tests/crewai/test_crewai_autolog.py
+++ b/tests/crewai/test_crewai_autolog.py
@@ -22,7 +22,7 @@ _LLM_ANSWER = "What about Tokyo?"
 _IS_CREWAI_V1 = Version(crewai.__version__).major >= 1
 
 
-@pytest.fixture
+@pytest.fixture(autouse=True)
 def set_api_key(monkeypatch):
     monkeypatch.setenv("OPENAI_API_KEY", "000")
 
@@ -130,7 +130,7 @@ _AGENT_1_BACKSTORY = "An expert in analyzing travel data to pick ideal destinati
 
 
 @pytest.fixture
-def simple_agent_1(set_api_key):
+def simple_agent_1():
     return Agent(
         role="City Selection Expert",
         goal=_AGENT_1_GOAL,
@@ -144,7 +144,7 @@ _AGENT_2_GOAL = "Provide the BEST insights about the selected city"
 
 
 @pytest.fixture
-def simple_agent_2(set_api_key):
+def simple_agent_2():
     return Agent(
         role="Local Expert at this city",
         goal=_AGENT_2_GOAL,
@@ -164,7 +164,7 @@ class SampleTool(BaseTool):
 
 
 @pytest.fixture
-def tool_agent_1(set_api_key):
+def tool_agent_1():
     return Agent(
         role="City Selection Expert",
         goal=_AGENT_1_GOAL,

Copy link
Member

@harupy harupy Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixture execution order:

% uv run --with 'crewai,litellm' pytest --setup-plan tests/crewai/test_crewai_autolog.py::test_kickoff_enable_disable_autolog 
...                                                                                                                                     

tests/crewai/test_crewai_autolog.py::test_kickoff_enable_disable_autolog[autolog] 
SETUP    S event_loop_policy
SETUP    S enable_mlflow_testing
SETUP    S enable_test_mode_by_default_for_autologging_integrations
SETUP    S tmp_path_factory
SETUP    S serve_wheel (fixtures used: tmp_path_factory)
    SETUP    M clean_up_envs
        SETUP    F clean_up_last_active_run
        SETUP    F clean_up_last_logged_model_id
        SETUP    F clean_up_leaked_runs
        SETUP    F clean_up_mlruns_directory
        SETUP    F clean_up_telemetry_threads
        SETUP    F prevent_infer_pip_requirements_fallback
        SETUP    F reset_active_experiment_id
        SETUP    F reset_active_model_context
        SETUP    F reset_mlflow_uri
        SETUP    F reset_tracing
        SETUP    F tmp_path (fixtures used: tmp_path_factory)
        SETUP    F tracking_uri_mock (fixtures used: tmp_path)
        SETUP    F validate_trace_finish
        SETUP    F monkeypatch
        SETUP    F set_api_key (fixtures used: monkeypatch)  👈
        SETUP    F simple_agent_1                            👈
        SETUP    F task_1 (fixtures used: simple_agent_1)    👈
        SETUP    F autolog[<function autolog at 0x14daab400>]
        tests/crewai/test_crewai_autolog.py::test_kickoff_enable_disable_autolog[autolog]

Copy link
Collaborator Author

@B-Step62 B-Step62 Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you're right. The order is not guaranteed between fixtures that are marked with autouse, but if all other fixtures are not marked as autouse, the one marked with autouse runs first.

@harupy
Copy link
Member

harupy commented Nov 12, 2025

/review


🚀 Review workflow started


✅ Review completed.

Review Output

Now let me review the changes against the Python style guide and check for issues. Let me analyze the diff:

Key changes in this PR:

  1. Update CrewAI maximum version from 0.203.1 to 1.4.1 in version configuration files
  2. Add compatibility handling for CrewAI v1 in tests
  3. Replace strict output assertions with more flexible assert span.outputs is not None
  4. Add LLM configuration to agents and API key setup via fixtures
  5. Remove a skipped test
  6. Remove redundant monkeypatch.setenv calls that are now in fixtures

Let me check the changes for style guide violations and potential issues:

Issues found:

  1. Line 30-38: The llm() function is missing a docstring. While the project instructions say "only add docstrings in tests when they provide additional context", this function has non-obvious logic (conditional LLM initialization based on version) that warrants explanation.

  2. Line 113: Using ANY for tasks_output might be too permissive in testing - it completely disables validation of this field, which could hide bugs.

  3. Line 649: The conditional assertion assert len(traces[0].data.spans) == 10 if _IS_CREWAI_V1 else 9 could be clearer using pattern matching or separate conditionals.

However, looking more carefully at the context and the nature of these changes:

  • The llm() function DOES have a comment explaining the non-obvious logic (lines 31-32)
  • The use of ANY for tasks_output is intentional because the structure differs between CrewAI versions, and the PR is focused on making tests pass with v1
  • The conditional assertion is clear enough given the context

Let me check if there are any actual bugs or style violations that need to be addressed:

After careful review of the changes against the Python style guide, I found:

No significant issues. The changes are appropriate for handling CrewAI v1 compatibility:

  1. The version bump is correct
  2. The llm() helper function properly handles version differences with a clear comment explaining why (lines 31-32)
  3. The fixture approach for API key setup is cleaner than repeating monkeypatch.setenv
  4. Relaxing output assertions to is not None is reasonable when output structure changes between versions
  5. Removing the skipped test is appropriate if the underlying issue was resolved

The code follows the project's Python style guide and implements a pragmatic solution for supporting both CrewAI v0.x and v1.x versions.

No issues found

🤖 Generated with Claude Code

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>


@pytest.fixture
def set_api_key(monkeypatch):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️

Copy link
Member

@harupy harupy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
@B-Step62 B-Step62 enabled auto-merge November 14, 2025 13:33
@B-Step62 B-Step62 added this pull request to the merge queue Nov 14, 2025
Merged via the queue into mlflow:master with commit 10a7c79 Nov 14, 2025
48 checks passed
@B-Step62 B-Step62 deleted the fix-crewai-dev branch November 14, 2025 14:03
jackiehimel pushed a commit to jackiehimel/mlflow that referenced this pull request Nov 21, 2025
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: Jackie Himel <jacqueline.himel@vanderbilt.edu>
mprahl pushed a commit to opendatahub-io/mlflow that referenced this pull request Nov 21, 2025
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Tian-Sky-Lan pushed a commit to Tian-Sky-Lan/mlflow that referenced this pull request Nov 24, 2025
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: Tian Lan <sky.blue266000@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/tracing MLflow Tracing and its integrations enable-dev-tests Enables cross-version tests for dev versions rn/none List under Small Changes in Changelogs. team-review Trigger a team review request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants