Releases: mlflow/mlflow

v3.10.1

05 Mar 14:47
cadc323

MLflow 3.10.1 is a patch release that contains some minor feature enhancements, bug fixes, and documentation updates.

Features:

Bug fixes:

  • [UI] Fix "View full dashboard" link in gateway usage tab when workspace is enabled (#21191, @copilot-swe-agent)
  • [UI] Persist AI Gateway default passphrase security banner dismissal to localStorage (#21292, @copilot-swe-agent)
  • [Evaluation] Demote unused parameters log message from WARNING to DEBUG in instructions judge (#21294, @copilot-swe-agent)
  • [UI] Clear "All" time selector when switching to overview tab (#21371, @daniellok-db)
  • [Prompts / UI] Fix Traces view in Prompts tab not being scrollable (#21282, @TomeHirata)
  • [UI] Fix judge builder instruction textarea (#21299, @daniellok-db)
  • [UI] Fix group mode to aggregate "Additional runs" as "Unassigned" group in charts (#21155, @copilot-swe-agent)
  • [UI] Fix artifact download when workspaces are enabled (#21074, @timsolovev)
  • [Tracing] Fix NOT NULL constraint on assessments.trace_id during trace export (#21348, @dbczumar)
  • [Tracking] Fix 403 Forbidden for artifact list via query param when default_permission=NO_PERMISSIONS (#21220, @copilot-swe-agent)
  • [UI] [ML-63097] Fix broken LLM judge documentation links (#21347, @smoorjani)
  • [Tracing] Fix Run Judge failed with litellm.InternalServerError: Invalid response object. (#21262, @PattaraS)
  • [Tracing / UI] Update Action menu: indentation to avoid confusion (#21266, @PattaraS)
  • [Model Registry] Fix MlflowClient.copy_model_version when copying a UC model across workspaces (#21212, @WeichenXu123)
  • [UI] Fix empty description box rendering for sanitized-empty experiment descriptions (#21223, @copilot-swe-agent)
  • [Artifacts] Fix single artifact downloading through HttpArtifactRepository (#12955, @Koenkk)
  • [Tracing] Fix find_last_user_message_index skipping skill content injections (#21119, @alkispoly-db)
  • [Tracing] Fix retrieval context extraction when span outputs are stored as strings (#21213, @smoorjani)
  • [UI] Fix visibility toggle button in chart tooltip not working (#21071, @daniellok-db)
  • [UI] Move gateway experiment filtering to server-side query to fix inconsistent page sizes (#21138, @copilot-swe-agent)
  • [Gateway] Downgrade spurious warning to debug log for gateway endpoints with fallback_config but no FALLBACK models (#21123, @copilot-swe-agent)
  • [Tracing] Fix MCP fn_wrapper to pass None for optional params with UNSET defaults (#21051, @yangbaechu)
  • [Tracking] Add CASCADE to logged_model tables experiment_id foreign keys (#20185, @harupy)
  • [Tracing] Fix MCP fn_wrapper handling of Click UNSET defaults (#20953) (#20962, @yangbaechu)
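
For readers unfamiliar with what adding CASCADE to a foreign key changes (#20185 above), here is a minimal sketch using SQLite from the Python standard library. The table layout is a simplified, hypothetical stand-in and not MLflow's actual schema; only the `ON DELETE CASCADE` behavior is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical, simplified tables; the real MLflow tables have more columns.
conn.execute("CREATE TABLE experiments (experiment_id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE logged_models (
        model_id TEXT PRIMARY KEY,
        experiment_id INTEGER NOT NULL
            REFERENCES experiments (experiment_id) ON DELETE CASCADE
    )
    """
)

conn.execute("INSERT INTO experiments VALUES (1)")
conn.execute("INSERT INTO logged_models VALUES ('m-1', 1)")

# Deleting the parent experiment also deletes the dependent logged model row.
conn.execute("DELETE FROM experiments WHERE experiment_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM logged_models").fetchone()[0]
print(remaining)
```

Without the cascade clause, the same DELETE would be rejected by the foreign key constraint while dependent rows still exist.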

Documentation updates:

  • [Docs] Update SSO oidc plugin doc: add google identity platform / AWS cognito / Azure Entra ID configuration guide (#20591, @WeichenXu123)
  • [Docs / Tracing] Fix distributed tracing rendering and improve doc (#21070, @B-Step62)
  • [Docs] docs: Add single quotes to install commands with extras to prevent zsh errors (#21227, @mshavliuk)
  • [Docs / Model Registry] Fix outdated docstring claiming models:/ URIs are unsupported in register_model (#21197, @copilot-swe-agent)
  • [Docs] Replace MinIO with RustFS in docker-compose setup (#21099, @jmaggesi)
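
The quoting fix for install commands (#21227 above) exists because zsh treats square brackets as a glob pattern, so an unquoted requirement with extras can fail before pip ever runs. A small sketch of producing a shell-safe command from Python follows; the helper name `pip_install_command` is hypothetical and only illustrates the quoting rule.

```python
import shlex


def pip_install_command(requirement: str) -> str:
    """Build a pip command whose requirement is safe to paste into a POSIX-style shell."""
    # shlex.quote wraps the string in single quotes when it contains
    # characters the shell could interpret, such as "[" and "]".
    return f"pip install {shlex.quote(requirement)}"


command = pip_install_command("mlflow[extras]")
plain = pip_install_command("mlflow==3.10.0rc0")
print(command)
print(plain)
```

A pinned version without extras contains no special characters, so it is left unquoted.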

Small bug fixes and documentation updates:

#20740, #21148, #21149, #21096, @TomeHirata; #21368, #21118, @B-Step62; #21384, #21345, #21236, #21106, #21033, #21115, #21034, @smoorjani; #21326, #21133, #21036, @copilot-swe-agent; #21293, @daniellok-db; #21175, @caponetto; #21305, #21264, @serena-ruan; #21216, @justinwei-db; #21038, #21082, @bbqiu; #21143, #20733, @mprahl; #20488, @mdalvz0000; #21142, @EPgg92; #21094, @PattaraS

v3.10.0

20 Feb 16:05
d0b9741

We're excited to announce MLflow 3.10.0, which includes several notable updates:

Major New Features:

🏢 Organization Support in MLflow Tracking Server: MLflow now supports multi-workspace environments. Users can organize experiments, models, and prompts into workspaces, a coarser-grained unit, and logically isolate them within a single tracking server. (#20702, #20657, @mprahl, @Gkrumbach07, @B-Step62)

💬 Multi-turn Evaluation & Conversation Simulation: MLflow now supports multi-turn evaluation, including evaluating existing conversations with session-level scorers and simulating conversations to test new versions of your agent, without the toil of regenerating conversations. Use the session-level scorers introduced in MLflow 3.8.0 and the brand new session UIs to evaluate the quality of your conversational agents and enable automatic scoring to monitor quality as traces are ingested. (#20243, #20377, #20289, @smoorjani)

💰 Trace Cost Tracking: Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. (#20327, #20330, @serena-ruan)

🎯 Navigation bar redesign: We've redesigned the navigation to provide a frictionless experience. A new workflow type selector in the top-level navbar lets you quickly switch between GenAI and Classical ML contexts, with streamlined sidebars that reduce visual clutter. (#20158, #20160, #20161, #20699, @ispoljari, @daniellok-db)

🎮 MLflow Demo Experiment: New to MLflow GenAI? With one click, launch a pre-populated demo and explore tracing, evaluation, and prompt management in action. No configuration, no code required. (#19994, #19995, #20046, #20047, #20048, #20162, @BenWilson2)

📊 Gateway Usage Tracking: Monitor your AI Gateway endpoints with detailed usage analytics. A new usage page shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end observability. (#20357, #20358, #20642, @TomeHirata)

In-UI Trace Evaluation: Users can now run custom or pre-built LLM judges directly from the traces and sessions UI. This enables quick evaluation of individual traces and sessions without context switching to the Python SDK. (#20360, @hubertzub-db, @danielseong1)
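
To make the Trace Cost Tracking item above more concrete, the arithmetic behind a per-span cost can be sketched as token counts multiplied by per-token prices. This is an illustration only, not MLflow's implementation: the `Pricing` class, the `span_cost` helper, the model name, and the prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price table entry, in currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def span_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one LLM call: tokens times the per-million price, scaled down."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Made-up model name and prices, for illustration only.
HYPOTHETICAL_PRICES = {"example-model": Pricing(2.0, 8.0)}

cost = span_cost(1500, 500, HYPOTHETICAL_PRICES["example-model"])
print(f"{cost:.4f}")
```

Summing this value over every LLM span in a trace would give a per-trace total under the same assumptions.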

Features:

Bug fixes:

  • [Tracing / UI] Fix infinite fetch loop in trace detail view when num_spans metadata mismatches (#20596, @coldzero94)
  • [UI] fix:implement dark mode in experiment correctly (#20974, @intelliking)
  • [Evaluation] Fix 'Select traces' not showing new traces in Judge UI (#20991, @PattaraS)
  • [Tracing / Tracking] Fix RecursionError in strands, semantic_kernel, and haystack autologgers with shared tracer provider (#20809, @cgrierson-smartsheet)
  • [Tracking] fix(tracking): Fix IntegrityError in log_batch when duplicate metrics span multiple key batches (#20807, @aws-khatria)
  • [Tracing] Support native tool calls in CrewAI 1.9.0+ autolog tests (#20742, @TomeHirata)
  • [Evaluation] Fix retrieval_relevance assessments logged to wrong span with missing chunk index (#20998, @smoorjani)
  • [Evaluation] Fix missing session metadata on failed session-level scorer assessments (#20988, @smoorjani)
  • [Tracking] Enhance path validation in check_tarfile_security for windows (#20924, @TomeHirata)
  • [Docs] Fix admonition link underlines not rendering (#20990, @copilot-swe-agent)
  • [Tracking] Rebuild SearchTraces V2 request body on ENDPOINT_NOT_FOUND fallback (#20963, @brendanmaguire)
  • [Build] Add model version search filtering based on user permissions (#20964, @TomeHirata)
  • [Tracing] Display notebook trace viewer when workspace is on (#20947, @TomeHirata)
  • [Tracing] Add MLFLOW_GATEWAY_RESOLVE_API_KEY_FROM_FILE flag to prevent local file inclusion in API gateway (#20965, @TomeHirata)
  • [Tracking] Fix Claude Agent SDK tracing by capturing messages from receive_messages (#20778, @smoorjani)
  • [Build / Tracking] Add missing authentication for fastapi routes (#20920, @TomeHirata)
  • [Evaluation] Fix guardrails scorer compatibility with guardrails-ai 0.9.0 (#20934, @smoorjani)
  • [UI] Fix duplicated title and add icons to Experiments/Prompts page headers (#20813, @B-Step62)
  • [Tracing] Trace UI papercut: highlight searched text and change search box hint's wording. (#20841, @PattaraS)
  • [Prompts] Fix arbitrary file read via prompt tag validation bypass in Model Registry (#20833, @TomeHirata)
  • [Tracking] Fix RestException crash on null error_code and incorrect except clause (#20903, @copilot-swe-agent)
  • [UI] Fix Disable action button in Traces Tab (#20883, @joelrobin18)
  • [UI] Fix experiment rename modal not refreshing experiment details (#20882, @joelrobin18)
  • [Build] Skip workspace header when workspace is disabled (#20904, @TomeHirata)
  • [UI] Block CORS for ajax paths (#20832, @TomeHirata)
  • [UI] Improve empty states across Experiments, Models, Prompts, and Gateway pages (#20044, @ridgupta26)
  • [UI] UI: Improve empty states for Traces and Sessions tabs (#20034, @ridgupta26)
  • [Build] Validate webhook url to fix SSRF vulnerability (#20747, @TomeHirata)
  • [Scoring / Tracing] Fix TypeError in online scoring config endpoint when basic-auth is enabled (#20783, @copilot-swe-agent)
    ...

v3.10.0rc0

12 Feb 05:01
9b0f106

Pre-release

We're excited to announce MLflow 3.10.0rc0, which includes several notable updates:

Major New Features:

  • 🏢 Organization Support in MLflow Tracking Server: MLflow now supports multi-workspace environments! You can organize your experiments and resources across different workspaces with a new landing page that lets you navigate between them seamlessly. (#20702, #20657, @mprahl, @Gkrumbach07, @B-Step62)
  • 💬 Multi-turn Conversation Simulation: Building on the conversation simulator introduced in 3.9, we've made it fully public and easily subclassable. You can now create custom simulation scenarios, compare sessions with goal/persona matching, and distill conversations into reusable goal/persona pairs for comprehensive agent testing. (#20243, #20377, #20289, @smoorjani)
  • 💰 Trace Cost Tracking: Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. (#20327, #20330, @serena-ruan)
  • 🎯 Top-level GenAI/Classical ML Split: We've redesigned the navigation to provide a frictionless experience. A new workflow type selector in the top-level navbar lets you quickly switch between GenAI and Classical ML contexts, with streamlined sidebars that reduce visual clutter. (#20158, #20160, #20161, #20699, @ispoljari, @daniellok-db)
  • 🎮 MLflow Demo Experiment: Get started with MLflow faster than ever! The new mlflow demo CLI command generates a fully-populated demo environment with sample traces, prompts, and evaluation data so you can explore MLflow's features hands-on without any setup. (#19994, #19995, #20046, #20047, #20048, #20162, @BenWilson2)
  • 📊 Gateway Usage Tracking: Monitor your AI Gateway endpoints with detailed usage analytics. A new usage page shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end observability. (#20357, #20358, #20642, @TomeHirata)

Stay tuned for the full release, which will be packed with even more features and bugfixes.

To try out this release candidate, please run:

pip install mlflow==3.10.0rc0

v3.9.0

29 Jan 08:49

We're excited to announce MLflow 3.9.0, which includes several notable updates:

Major New Features:

  • 🔮 MLflow Assistant: Figuring out the next steps to debug your apps and agents can be challenging. We're excited to introduce the MLflow Assistant, an in-product chatbot that can help you identify, diagnose, and fix issues. The assistant is backed by Claude Code, and directly passes context from the MLflow UI to Claude. Click on the floating "Assistant" button in the bottom right of the MLflow UI to get started!
  • 📈 Trace Overview Dashboard: You can now get insights into your agent's performance at a glance with the new "Overview" tab in GenAI experiments. Many pre-built statistics are available out of the box, including performance metrics (e.g. latency, request count), quality metrics (based on assessments), and tool call summaries. If there are any additional charts you'd like to see, please feel free to raise an issue in the MLflow repository!
  • AI Gateway: We're revamping our AI Gateway feature! AI Gateway provides a unified interface for your API requests, allowing you to route queries to your LLM provider(s) of choice. In MLflow 3.9.0, the Gateway server is now located directly in the tracking server, so you don't need to spin up a new process. Additional features such as passthrough endpoints, traffic splits, and fallback models are also available, with more to come soon! For more detailed information, please take a look at the docs.
  • 🔎 Online Monitoring with LLM Judges: Configure LLM judges to automatically run on your traces, without having to write a line of code! You can either use one of our pre-defined judges, or provide your own prompt and instructions to create custom metrics. Head to the new "Judges" tab within the GenAI Experiment UI to get started.
  • 🤖 Judge Builder UI: Define and iterate on custom LLM judge prompts directly from the UI! Within the new "Judges" tab, you can create your own prompt for an LLM judge, and test-run it on your traces to see what the output would be. Once you're happy with it, you can either use it for online monitoring (as mentioned above), or use it via the Python SDK for your evals.
  • 🔗 Distributed Tracing: Trace context can now be propagated across different services and processes, allowing you to truly track request lifecycles from end to end. The related APIs are defined in the mlflow.tracing.distributed module (with more documentation to come soon).
  • 📚 MemAlign - a new judge optimizer algorithm: We're excited to introduce MemAlignOptimizer, a new algorithm that makes your judges smarter over time. It learns general guidelines from past feedback while dynamically retrieving relevant examples at runtime, giving you more accurate evaluations.
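
As background for the Distributed Tracing item above, the sketch below shows what propagating trace context between services means in general, using the W3C Trace Context `traceparent` header format. It deliberately does not call the mlflow.tracing.distributed module, whose function names are not given in these notes; `inject` and `extract` are hypothetical helpers.

```python
import re
import secrets

# traceparent layout: version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(trace_id: str, span_id: str, headers: dict) -> dict:
    """Caller side: copy the current trace context into the outgoing headers."""
    outgoing = dict(headers)
    outgoing["traceparent"] = f"00-{trace_id}-{span_id}-01"  # 01 = sampled
    return outgoing


def extract(headers: dict):
    """Callee side: return (trace_id, parent_span_id), or None if absent or malformed."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    return match.group(1), match.group(2)


trace_id = secrets.token_hex(16)  # 32 hex characters
span_id = secrets.token_hex(8)    # 16 hex characters
outgoing = inject(trace_id, span_id, {"content-type": "application/json"})
received = extract(outgoing)
print(received == (trace_id, span_id))
```

The receiving service starts its own spans under the extracted trace id, which is what lets a single request be followed across process boundaries.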

Features:

Bug fixes:

v3.9.0rc0

16 Jan 04:48
d8c9d8c

Pre-release

We're excited to announce MLflow 3.9.0rc0, a pre-release including several notable updates:

Major New Features:

  • 🔮 MLflow Assistant: Figuring out the next steps to debug your apps and agents can be challenging. We're excited to introduce the MLflow Assistant, an in-product chatbot that can help you identify, diagnose, and fix issues. The assistant is backed by Claude Code, and directly passes context from the MLflow UI to Claude. Click on the floating "Assistant" button in the bottom right of the MLflow UI to get started!
  • 📈 Trace Overview Dashboard: You can now get insights into your agent's performance at a glance with the new "Overview" tab in GenAI experiments. Many pre-built statistics are available out of the box, including performance metrics (e.g. latency, request count), quality metrics (based on assessments), and tool call summaries. If there are any additional charts you'd like to see, please feel free to raise an issue in the MLflow repository!
  • AI Gateway: We're revamping our AI Gateway feature! AI Gateway provides a unified interface for your API requests, allowing you to route queries to your LLM provider(s) of choice. In MLflow 3.9.0rc0, the Gateway server is now located directly in the tracking server, so you don't need to spin up a new process. Additional features such as passthrough endpoints, traffic splits, and fallback models are also available, with more to come soon! For more detailed information, please take a look at the docs.
  • 🔎 Online Monitoring with LLM Judges: Configure LLM judges to automatically run on your traces, without having to write a line of code! You can either use one of our pre-defined judges, or provide your own prompt and instructions to create custom metrics. Head to the new "Judges" tab within the GenAI Experiment UI to get started.
  • 🤖 Judge Builder UI: Define and iterate on custom LLM judge prompts directly from the UI! Within the new "Judges" tab, you can create your own prompt for an LLM judge, and test-run it on your traces to see what the output would be. Once you're happy with it, you can either use it for online monitoring (as mentioned above), or use it via the Python SDK for your evals.
  • 🔗 Distributed Tracing: Trace context can now be propagated across different services and processes, allowing you to truly track request lifecycles from end to end. The related APIs are defined in the mlflow.tracing.distributed module (with more documentation to come soon).
  • 📚 MemAlign - a new judge optimizer algorithm: We're excited to introduce MemAlignOptimizer, a new algorithm that makes your judges smarter over time. It learns general guidelines from past feedback while dynamically retrieving relevant examples at runtime, giving you more accurate evaluations.
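
The AI Gateway item above mentions traffic splits and fallback models. The general idea can be sketched as a weighted choice among primary providers followed by an ordered list of fallbacks. This is a conceptual illustration with hypothetical names (`route`, `flaky`, `backup`), not the Gateway's actual configuration format or routing code.

```python
import random
from typing import Callable, Sequence

Provider = Callable[[str], str]


def route(
    prompt: str,
    primaries: Sequence[tuple[Provider, float]],
    fallbacks: Sequence[Provider],
) -> str:
    """Pick a primary by weight (traffic split), then try fallbacks in order on failure."""
    providers = [provider for provider, _ in primaries]
    weights = [weight for _, weight in primaries]
    chosen = random.choices(providers, weights=weights, k=1)[0]

    for provider in (chosen, *fallbacks):
        try:
            return provider(prompt)
        except Exception:
            # A real router would log the error and distinguish retryable failures.
            continue
    raise RuntimeError("all providers failed")


def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def backup(prompt: str) -> str:
    return f"backup: {prompt}"


answer = route("hello", [(flaky, 1.0)], [backup])
print(answer)
```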

Stay tuned for the full release, which will be packed with even more features and bugfixes.

To try out this release candidate, please run:

pip install mlflow==3.9.0rc0

Please try it out and report any issues on the issue tracker.

v3.8.1

27 Dec 02:56
4cc9d5b

MLflow 3.8.1 includes several bug fixes and documentation updates.

Bug fixes:

  • [Tracking] Skip registering sqlalchemy store when sqlalchemy lib is not installed (#19563, @WeichenXu123)
  • [Models / Scoring] fix(security): prevent command injection via malicious model artifacts (#19583, @ColeMurray)
  • [Prompts] Fix prompt registration with model_config on Databricks (#19617, @TomeHirata)
  • [UI] Fix UI blank page on plain HTTP by replacing crypto.randomUUID with uuid library (#19644, @copilot-swe-agent)

Small bug fixes and documentation updates:

#19539, #19451, #19409, @smoorjani; #19493, @alkispoly-db

v3.8.0

22 Dec 02:37
55ef1e2

MLflow 3.8.0 includes several major features and improvements.

Major Features

  • ⚙️ Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
  • In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
  • ⚖️ DeepEval and RAGAS Judges Integration: New get_judge API enables using DeepEval and RAGAS evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani, #19345, @SomtochiUmeh)
  • 🛡️ Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
  • Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)

Important Notice

  • Collection of UI Telemetry. From MLflow 3.8.0 onwards, MLflow collects anonymized data about UI interactions, similar to the telemetry we collect for the Python SDK. If you manage your own server, UI telemetry is disabled automatically when either of the existing environment variables is set: MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. If you do not manage your own server (e.g. you use a managed service or are not the admin), you can still opt out personally via the new "Settings" tab in the MLflow UI. For more information, please read the documentation on usage tracking.
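
For server administrators, one way to apply the opt-out described above is to set the variable in the environment of the process that launches the server. The following is a minimal sketch; the helper name `server_env_without_telemetry` is hypothetical, and launching the server through `subprocess` is an assumption about your deployment, not a requirement.

```python
import os


def server_env_without_telemetry(base_env=None) -> dict:
    """Return a copy of the environment with the telemetry opt-out variable set."""
    env = dict(os.environ if base_env is None else base_env)
    # Either MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true is sufficient
    # according to the notice above; one is set here.
    env["MLFLOW_DISABLE_TELEMETRY"] = "true"
    return env


env = server_env_without_telemetry({"PATH": "/usr/bin"})
print(env["MLFLOW_DISABLE_TELEMETRY"])

# Example launch (commented out so the sketch runs without MLflow installed):
# import subprocess
# subprocess.run(["mlflow", "server"], env=server_env_without_telemetry())
```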

Features:

Bug fixes:

  • [Tracing / UI] Branch 3.8 patch: Fix GraphQL SearchRuns filter using invalid attribute key in trace comparison (#19526, @WeichenXu123)
  • [Scoring / Tracking] Fix artifact download performance regression (#19520, @copilot-swe-agent)
  • [Tracking] Fix SQLAlchemy alias conflict in _search_runs for dataset filters (#19498, @fredericosantos)
  • [Tracking] Add auth support for GraphQL routes (#19278, @BenWilson2)
  • [] Fix SQL injection vulnerability in UC function execution (#19381, @harupy)
  • [UI] Fix MultiIndex column search crash in dataset schema table (#19461, @copilot-swe-agent)
  • [Tracking] Make datasource failures fail gracefully (#19469, @BenWilson2)
  • [Tracing / Tracking] Fix litellm autolog for versions >= 1.78 (#19459, @harupy)
  • [Model Registry / Tracking] Fix SQLAlchemy engine connection pool leak in model registry and job stores (#19386, @harupy)
  • [UI] [Bug fix] Traces UI: Support filtering on assessments with multiple values (e.g. error and boolean) (#19262, @dbczumar)
  • [Evaluation / Tracing] Fix error initialization in Feedback (#19340, @alkispoly-db)
  • [Models] Switch container build to subprocess for Sagemaker (#19277, @BenWilson2)
  • [Scoring] Fix scorers issue on Strands traces (#18835, @joelrobin18)
  • [Tracking] Stop initializing backend stores in artifacts only mode (#19167, @mprahl)
  • [Evaluation] Parallelize multi-turn session evaluation (#19222, @AveshCSingh)
  • [Tracing] Add safe attribute capture for pydantic_ai (#19219, @BenWilson2)
  • [Model Registry] Fix UC to UC copying regression (#19280, @BenWilson2)
  • [Tracking] Fix artifact path traversal vector (#19260, @BenWilson2)
  • [UI] Fix issue with auth controls on system metrics (#19283, @BenWilson2)
  • [Models] Add context loading for ChatModel (#19250, @BenWilson2)
  • [Tracing] Fix trace decorators usage for LangGraph async callers (#19228, @BenWilson2)
  • [Tracking] Update docker compose to use --artifacts-destination not --default-artifact-root (#19215, @B-Step62)
  • [Build] Reduce clint error message verbosity by consolidating README instructions (#19155, @copilot-swe-agent)

Documentation updates:

Small bug fixes and documentation updates:

#19497, #19358, #19322, #19383, #19288, #19287, #19230, #19225, @xsh310; #19504, @WeichenXu123; #19499, #19465, #19241, @B-Step62; #19479, #19385, #19297, #19347, #19314, #19286, #19269, @TomeHirata; #18894, @BnnaFish; #19480, #19427, #19351, #19312, #19292, #19303, #19291, #19418, #19395, #19240, #19267, #19102, #19082, #19076, @daniellok-db; #19463, #19370, #19369, #19368, #19367, #19366, #19363, #19354, #19302, #19272, #19266, #19258, #19255, #19242, #19236, #19235, #19203, #19214, #19212, #19210, #19204, #19197, #19196, #19194, #19190, #19182, #19178, #19179, #19163, #19157, #19150, #19137, #19132, #19114, #19115, #19113, #19112, #19111, #19110, #19107, #19091, #19090, #19078, @copilot-swe-agent; #19437, @SomtochiUmeh; #19420, #19329, #19317, #19207, #19086, @kevin-lyn; #19339, #19263, #19438, #19412, #19411, #19355, #19341, #19034, #19029, #19252, @smoorjani; #19416, #19399, #19402, #19353, #19313, #19296, #19294, #19264, #19202, #19206, #19165, #19161, #19158, #19126, #19147, #19099, @harupy; #19357, #19343, #19342, #19335, #19261, #19226, #19227, @BenWilson2; #19344, #19331, #19270, #19239, #19211, @serena-ruan; #19323, @bbqiu; #19373, @alkispoly-db; #19320, #19311, @kriscon-db; #19309, @stefanwayon; #19063, @cyficowley; #19160, @Killian-fal; #19142, #19141, @dbczumar; #19089, @hubertzub-db; #19098, @achen530

v3.8.0rc0

15 Dec 08:20
23ec3fb

Pre-release

MLflow 3.8.0rc0 includes several major features and improvements. More features to come in the final 3.8.0 release!

To try out this release candidate:

pip install mlflow==3.8.0rc0

Major Features

  • ⚙️ Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
  • In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
  • ⚖️ DeepEval Judges Integration: New get_judge API enables using DeepEval's evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani)
  • 🛡️ Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
  • Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)

v3.7.0

05 Dec 17:30

MLflow 3.7.0 includes several major features and improvements for GenAI Observability, Evaluation, and Prompt Management.

Major Features

  • 📝 Experiment Prompts UI: New prompts functionality in the experiment UI allows you to manage and search prompts directly within experiments, with support for filter strings and prompt version search in traces. (#19156, #18919, #18906, @TomeHirata)
  • 💬 Multi-turn Evaluation Support: Enhanced mlflow.genai.evaluate now supports multi-turn conversations, enabling comprehensive assessment of conversational AI applications with DataFrame and list inputs. (#18971, @AveshCSingh)
  • ⚖️ Trace Comparison: New side-by-side comparison view in the Traces UI allows you to analyze and debug LLM application behavior across different runs, making it easier to identify regressions and improvements. (#17138, @joelrobin18)
  • 🌐 Gemini TypeScript SDK: Auto-tracing support for Google's Gemini in TypeScript, expanding MLflow's observability capabilities for JavaScript/TypeScript AI applications. (#18207, @joelrobin18)
  • 🎯 Structured Outputs in Judges: The make_judge API now supports structured outputs, enabling more precise and programmatically consumable evaluation results. (#18529, @TomeHirata)
  • 🔗 VoltAgent Tracing: Added auto-tracing support for VoltAgent, extending MLflow's observability to this AI agent framework. (#19041, @joelrobin18)

Breaking Changes

Features

Bug Fixes

Documentation Updates

Small bug fixes and documentation updates:

#19220, #19140, #19141, #18984, #18985, #18822, @dbczumar; #19148, @ingo-stallknecht; #19183, #19201, #19130, #19049, #19030, #18778, #18780, #18556, #18555, @serena-ruan; #19153, #19181, #18784, #18783, #18802, #18881, #18695, #18879, #18782, #18845, #18787, #18786, #18590, @B-Step62; #19208, #19021, #19023, #18723, #18622, @smoorjani; #13314, @alokshenoy; #19138, #19171, #19146, #19067, #19064, #19045, #18968, #18967, #19018, #18966, #18990, #18912, @xsh310; #19168, @mcompen; #19145, #18702, #18642, @BenWilson2; #19126, #19022, #18951, #18887, #18954, #18949, #18934, #18914, #18903, #18877, #18859, #18838, #18828, #18821, #18717, #18710, #18756, #18713, @harupy; #18890, #18862, #18836, #18792, #18818, #18579, @TomeHirata; #19084, #18886, #18911, #18904, #18885, #18837, #18795, #18646, @daniellok-db; #18992, #19025, #19020, #18950, @kevin-lyn; #19069, #19072, #19043, #19027, #19028, #19019, #18995, #18997, #18989, #18991, #18987, #18983, #18980, #18979, #18974, #18972, #18969, #18948, #18940, #18942, #18939, #18938, #18933, #18932, #18931, #18915, #18882, #18865, #18861, #18860, #18846, #18841, #18830, #18824, #18823, #18819, #18789, #18804, #18779, #18775, #18772, #18704, #18606, #18748, #18746, #18745, #18743, #18732, #18737, #18736, #18729, #18718, #18703, #18693, #18686, #18682, #18633, #18675, #18671, #18653, #18652, @copilot-swe-agent; #19001, #18945, @danielseong1; #18815, @kevin-wangg; #19039, #18898, @AveshCSingh; #18742, @Killian-fal; #18923, @HomeLH; #18922, #18920, @UnfixedMold; #18798, @WeichenXu123; #18776, @pcliupc; #18417, @shaperilio

v2.22.4

05 Dec 10:43
2b5aa12

Version 2.22.4 is a patch release to backport several important fixes to MLflow 2.

  • Fix mlflow.spark.load_model to handle Unity Catalog Volumes paths correctly (#18672)
  • Introduce MLFLOW_CREATE_MODEL_VERSION_SOURCE_REGEX to validate source parameter of /model-versions/create request (#16081)
  • Fix spark udf on Databricks multi driver clusters (#18410)
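
To illustrate the kind of check that MLFLOW_CREATE_MODEL_VERSION_SOURCE_REGEX (#16081 above) enables, here is a sketch of validating a model version source against an allow-list pattern. The pattern, the bucket name, and the `source_is_allowed` helper are hypothetical, and the notes do not say how MLflow applies the regex internally, so the pattern is anchored explicitly to avoid relying on that detail.

```python
import re

# Hypothetical allow-list: run and model URIs, plus one made-up bucket prefix.
HYPOTHETICAL_PATTERN = r"^(runs:/|models:/|s3://example-bucket/).+"


def source_is_allowed(source: str, pattern: str) -> bool:
    """Return True when the source string matches the configured pattern."""
    return re.match(pattern, source) is not None


allowed = source_is_allowed("runs:/abc123/model", HYPOTHETICAL_PATTERN)
rejected = source_is_allowed("file:///etc/passwd", HYPOTHETICAL_PATTERN)
print(allowed, rejected)
```

In a deployment, the pattern would come from the environment variable named in the release note rather than from a constant in code.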