
Agno V2 fixes #18345

Merged
B-Step62 merged 8 commits into mlflow:master from joelrobin18:fix_18335
Nov 21, 2025

Conversation

@joelrobin18
Collaborator

@joelrobin18 joelrobin18 commented Oct 16, 2025

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18345/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18345/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/18345/merge

Related Issues/PRs

Fix #18335

What changes are proposed in this pull request?

Agno v2 introduces several breaking changes, including full support for OpenTelemetry instrumentation. This PR updates the tracing implementation to be compatible with Agno v2 by using MLflow’s native integration with OTel-based tracing.

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

@github-actions github-actions bot added the area/tracing (MLflow Tracing and its integrations) and rn/bug-fix (Mention under Bug Fixes in Changelogs) labels Oct 16, 2025
@github-actions
Contributor

github-actions bot commented Oct 16, 2025

Documentation preview for f7e2f52 is available.


@BenWilson2
Member

@joelrobin18 we might want to think about creating a v2 autologging module given the scope of the breaking changes, to simplify things here. We could add version validation handling (there are other autologging integrations where this has been done) to avoid complicating maintainability by embedding large amounts of try/catch or conditional logic within a single implementation.

@joelrobin18
Collaborator Author

Hi @BenWilson2, thank you for the feedback. I'm refactoring the code to use fewer try/catch blocks and to address the above comments.

@ashdam

ashdam commented Nov 11, 2025

Very much appreciated if we could fully integrate v2.
IMHO it's not worth trying to stay compatible with v1; Agno is updating very fast :)

@joelrobin18
Collaborator Author

Hi @ashdam, could you please add the code below at the top of your agent and check?

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from openinference.instrumentation.agno import AgnoInstrumentor

# Configure OTLP to export to MLflow
exporter = OTLPSpanExporter(
    endpoint="http://localhost:5000/v1/traces",
    headers={"x-mlflow-experiment-id": "0"}
)

tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(tracer_provider)

AgnoInstrumentor().instrument()

@ashdam

ashdam commented Nov 12, 2025

@joelrobin18 I have tested it.

Not really an expert on MLflow, but currently I'm using MLflow 3.6.0 OSS with a PostgreSQL backend and Agno v2.2.6.

Agno is capturing calls, but it raises the following error.

I got this error in agno-os:

{"TimeStamp": "2025-11-12T16:55:42.7841868+00:00", "Log": "error Not Implemented encountered while exporting span batch, retrying in 0.96s."}
{"TimeStamp": "2025-11-12T16:55:43.74478+00:00", "Log": "error Not Implemented encountered while exporting span batch, retrying in 1.63s."}
{"TimeStamp": "2025-11-12T16:55:45.3754366+00:00", "Log": "error Not Implemented encountered while exporting span batch, retrying in 3.62s."}
{"TimeStamp": "2025-11-12T16:55:49.0009375+00:00", "Log": "to export span batch code: 501, reason: {\"detail\":\"REST OTLP span logging is not supported by FileStore\"}"}

@joelrobin18
Collaborator Author

It looks like we are using FileStore as the backend here. Can you share simple repro code for this?

Signed-off-by: joelrobin18 <joelrobin1818@gmail.com>
_AUTOLOGGING_CLEANUP_CALLBACKS = {}


def register_cleanup_callback(autologging_integration, callback):
Collaborator Author

This is needed to clean up the OTel instrumentation after it is disabled via mlflow.autolog(disable=True). Let me know if there is a better way to do this.

Collaborator

Does the approach we use in some flavors like DSPy work? https://github.com/mlflow/mlflow/blob/master/mlflow/dspy/autolog.py#L54-L60

Basically:

  1. Add an empty _autolog function.
  2. Decorate it with @autologging_integration, instead of the main autolog function.
  3. Call that function inside the main autolog function.

This is hacky, but this way we can let the autolog() function be called even when disable=True is specified.

@ashdam

ashdam commented Nov 13, 2025

Hi @joelrobin18,

Thanks for looking into this! Yes, I'm definitely using PostgreSQL as the backend store, not FileStore.
I would like to add that I'm not a Python expert and I am using Sonnet 4.5 to help me out :P

Here's the configuration:

MLflow Server Setup

Deployment: Azure Container Apps running MLflow v3.6.0
Start Command:

mlflow server \
  --host 0.0.0.0 \
  --port 5000 \
  --backend-store-uri postgresql://agnoadmin:****@psql-agno-storage.postgres.database.azure.com:5432/mlflow_db?sslmode=require \
  --default-artifact-root wasbs://mlflow-artifacts@saagenticaifinancedemo.blob.core.windows.net/ \
  --serve-artifacts

Verified Backend Configuration:

$ az containerapp show --name mlflow-server --query "properties.template.containers[0].env"
[
  {
    "name": "MLFLOW_BACKEND_STORE_URI",
    "secretRef": "postgres-uri"  # Points to PostgreSQL connection string
  },
  ...
]

Agno v2.2.6 Integration Code

Following your recommendation, I'm using OpenInference AgnoInstrumentor:

import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from openinference.instrumentation.agno import AgnoInstrumentor

mlflow_tracking_uri = os.getenv("MLFLOW_TRACKING_URI")
mlflow_experiment_id = os.getenv("MLFLOW_EXPERIMENT_ID", "0")

exporter = OTLPSpanExporter(
    endpoint=f"{mlflow_tracking_uri}/v1/traces",
    headers={"x-mlflow-experiment-id": mlflow_experiment_id},
)

tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(tracer_provider)

AgnoInstrumentor().instrument()

Dependencies:

  • agno==2.2.6
  • mlflow==3.6.0
  • openinference-instrumentation-agno==0.1.22
  • opentelemetry-sdk==1.38.0
  • opentelemetry-exporter-otlp-proto-http==1.38.0

The Issue

What's Working:

  • OpenInference AgnoInstrumentor successfully hooks into Agno v2.2.6 Team/Model calls
  • Spans are being generated and batched
  • OTLP exporter attempts to send to MLflow

The Error:

error Not Implemented encountered while exporting span batch, retrying in 0.96s.
error Not Implemented encountered while exporting span batch, retrying in 1.63s.
error Not Implemented encountered while exporting span batch, retrying in 3.62s.
Failed to export span batch code: 501, reason: {"detail":"REST OTLP span logging is not supported by FileStore"}

From Agent Execution Logs:
The instrumentation is definitely working - I can see the OpenInference wrappers in the stack traces:

File "/app/.venv/lib/python3.12/site-packages/openinference/instrumentation/agno/_runs_wrapper.py", line 512, in arun_stream
    async for response in wrapped(*args, **kwargs):
File "/app/.venv/lib/python3.12/site-packages/agno/team/team.py", line 2452, in _arun_stream
    async for event in self._ahandle_model_response_stream(
...
File "/app/.venv/lib/python3.12/site-packages/openinference/instrumentation/agno/_model_wrapper.py", line 493, in arun_stream
    async for chunk in wrapped(*args, **kwargs):

The Paradox

MLflow is configured with PostgreSQL backend (--backend-store-uri postgresql://...), but the 501 error indicates traces are still using FileStore. This suggests MLflow 3.6.0 has a separate trace storage layer that defaults to FileStore even when the main backend is PostgreSQL.

Is this a known limitation, or is there a missing configuration flag for enabling database trace storage?

I'm happy to provide a minimal reproducible repo if that helps debug this further!

Signed-off-by: joelrobin18 <joelrobin1818@gmail.com>
@ashdam

ashdam commented Nov 14, 2025

Thank you very much for your work @joelrobin18 . this is highly anticipated in my company :)

@ashdam

ashdam commented Nov 17, 2025

@BenWilson2 @joelrobin18 any news? :)

Collaborator

@B-Step62 B-Step62 left a comment

Overall looks good!


_logger.info("OpenTelemetry instrumentation enabled for Agno V2")

except ImportError as exc:
_logger.warning(
Collaborator

Can we raise this as an exception (with the current message)? Enabling tracing is the single purpose of calling mlflow.agno.autolog(), so it does not make much sense to pass through silently if we fail to do that.

@B-Step62
Collaborator

B-Step62 commented Nov 18, 2025

Is this a known limitation, or is there a missing configuration flag for enabling database trace storage?

@ashdam I can see Agno traces logged successfully via OTel on my local machine; I could not reproduce the error. Could you double-check that the tracking URI points to the correct MLflow instance? The error message indicates the backend is actually a file store.

You can also test it locally to see whether this is related to your Azure Container Apps settings or not.

pip install mlflow==3.6.0
mlflow ui --backend-store-uri sqlite:///mlruns.db 

@ashdam

ashdam commented Nov 18, 2025

Yes, we only have one MLflow server and it has PostgreSQL configured. It's really hard (DevOps + security) for me to replicate everything, including Agno, locally, to be honest :(

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
@ashdam

ashdam commented Nov 21, 2025

@B-Step62 @joelrobin18 Thank you guys for your work :)

@B-Step62
Collaborator

@ashdam I still believe that what happens inside the app container is that MLflow is started with a file store. The error message includes the class name of the store:

detail=f"REST OTLP span logging is not supported by {store_name}",

If the store is properly configured with a SQL backend, you should see server logs like this:

Registry store URI not provided. Using sqlite:///mlruns.db
2025/11/21 20:56:58 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/21 20:56:58 INFO mlflow.store.db.utils: Updating database tables

One common gotcha is that multi-line commands are not properly formatted in the YAML file, so only the first line (mlflow server) runs, which defaults to a file store. If that happens, you should see a server log like this instead:

Backend store URI not provided. Using ./mlruns
Registry store URI not provided. Using ./mlruns
.../server/handlers.py:258: FutureWarning: The filesystem tracking backend (e.g., './mlruns') will be deprecated in February 2026. Consider transitioning to a database backend (e.g., 'sqlite:///mlflow.db') to take advantage of the latest MLflow features. See https://github.com/mlflow/mlflow/issues/18534 for more details and migration guidance.
  return FileStore(store_uri, artifact_uri)

For example, this works

    command:
      - /bin/bash
      - -c
      - |
        mlflow server \
            --backend-store-uri postgresql://... \
            --port 5000

but this does not work (only mlflow server will be executed).

    command: >
      /bin/bash -c "
        mlflow server \
            --backend-store-uri postgresql://... \
            --port 5000
        "

@B-Step62 B-Step62 added this pull request to the merge queue Nov 21, 2025
Merged via the queue into mlflow:master with commit 43f06f9 Nov 21, 2025
50 checks passed
@ashdam

ashdam commented Nov 21, 2025

Thank you for the help :) I will test it next week :D thank you!

jimilp7 pushed a commit to backspace-org/mlflow that referenced this pull request Nov 21, 2025
Signed-off-by: joelrobin18 <joelrobin1818@gmail.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: B-Step62 <yuki.watanabe@databricks.com>
Tian-Sky-Lan pushed a commit to Tian-Sky-Lan/mlflow that referenced this pull request Nov 24, 2025
Signed-off-by: joelrobin18 <joelrobin1818@gmail.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: Tian Lan <sky.blue266000@gmail.com>


Development

Successfully merging this pull request may close these issues.

[BUG] Error “No module named ‘agno.storage’” when using tracing after upgrading to Agno v2
