
Add passthrough support for Gemini generateContent and streamGenerateContent APIs#19425

Merged
TomeHirata merged 4 commits into mlflow:master from TomeHirata:stack/gateway/model-passthrough/gemini
Dec 18, 2025

Conversation

@TomeHirata (Collaborator) commented Dec 16, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

n/a

What changes are proposed in this pull request?

Add the following Gemini model passthrough endpoints, where the request body is forwarded unchanged to the provider endpoint:

  • /gateway/gemini/v1beta/models/{endpoint_name}:generateContent
  • /gateway/gemini/v1beta/models/{endpoint_name}:streamGenerateContent

Example usage:
from google import genai
from google.genai import types
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Gateway")
from mlflow.tracking._tracking_service.utils import _get_store

store = _get_store()

secret = store.create_gateway_secret(
    secret_name="gemini_api",
    secret_value={"api_key": "<secret>"},
    provider="gemini",
)
model_def = store.create_gateway_model_definition(
    name="gemini-2.5-flash",
    secret_id=secret.secret_id,
    provider="gemini",
    model_name="gemini-2.5-flash",
)
endpoint = store.create_gateway_endpoint(
    name="gemini-v1", model_definition_ids=[model_def.model_definition_id]
)

client = genai.Client(
    api_key='dummy',
    http_options={
        'base_url': "http://localhost:5000/gateway/gemini",
    })
response = client.models.generate_content(
    model="gemini-v1",
    contents='Hello',
)
client.close()
print(response.candidates[0].content.parts[0].text)
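The streaming endpoint can be exercised through the same client. A minimal sketch, assuming the gateway and the gemini-v1 endpoint created above are running (the deferred import keeps this snippet independent of the setup script):

```python
def stream_hello(base_url: str = "http://localhost:5000/gateway/gemini") -> list[str]:
    """Collect streamed text chunks via the streamGenerateContent passthrough."""
    from google import genai  # requires the google-genai package

    client = genai.Client(api_key="dummy", http_options={"base_url": base_url})
    # generate_content_stream issues a :streamGenerateContent request,
    # which the gateway forwards unchanged to the Gemini API
    return [
        chunk.text
        for chunk in client.models.generate_content_stream(
            model="gemini-v1", contents="Hello"
        )
    ]
```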

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

github-actions bot (Contributor) commented Dec 16, 2025

Documentation preview for 880b16b is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds passthrough support for Gemini's generateContent and streamGenerateContent APIs as part of a larger effort to provide native API support for multiple LLM providers. The implementation follows the patterns established in the stack for OpenAI and Anthropic passthrough endpoints.

Key Changes

  • Added two new Gemini passthrough endpoints that accept raw Gemini API format
  • Implemented LiteLLM provider as a fallback for unsupported providers
  • Extended base provider class with passthrough methods for multiple providers

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Summary per file:

  • mlflow/server/gateway_api.py: Added Gemini passthrough endpoints (generateContent and streamGenerateContent) and a helper function for extracting endpoint names
  • mlflow/gateway/providers/gemini.py: Implemented passthrough methods for Gemini generateContent APIs
  • mlflow/gateway/providers/base.py: Added abstract passthrough method definitions for all provider types
  • mlflow/gateway/providers/anthropic.py: Implemented Anthropic Messages API passthrough endpoint
  • mlflow/gateway/providers/openai.py: Implemented OpenAI passthrough endpoints for chat, embeddings, and responses APIs
  • mlflow/gateway/providers/litellm.py: New provider using the LiteLLM library as a fallback for unsupported providers
  • mlflow/gateway/config.py: Added LiteLLMConfig class and LITELLM provider enum value
  • mlflow/gateway/provider_registry.py: Registered the LiteLLM provider in the provider registry
  • tests/server/test_gateway_api.py: Comprehensive tests for all new passthrough endpoints, including streaming
  • tests/gateway/providers/test_gemini.py: Unit tests for Gemini passthrough methods
  • tests/gateway/providers/test_anthropic.py: Unit tests for Anthropic passthrough methods
  • tests/gateway/providers/test_openai.py: Unit tests for OpenAI passthrough methods
  • tests/gateway/providers/test_litellm.py: Complete test suite for the new LiteLLM provider
  • docs/api_reference/api_inventory.txt: Updated API inventory with LiteLLMConfig references


}
response = await provider.passthrough_anthropic_messages(payload)

assert payload["model"] == "claude-2.1"
Copilot AI commented Dec 16, 2025

This assertion validates that the payload has been mutated in place by the passthrough_anthropic_messages method (the model field is added). While the test currently passes, asserting on side effects of the function under test rather than its return value makes the test more fragile. Consider testing only the return value and the mock call arguments.

Suggested change
assert payload["model"] == "claude-2.1"

Copilot uses AI. Check for mistakes.
Accepts raw Anthropic request format and returns raw Anthropic response format.
Supports streaming if the 'stream' parameter is set to True.
"""
# Add model name from config
Copilot AI commented Dec 16, 2025

Mutating the input payload dictionary in place could cause unexpected side effects for callers. The payload is modified by adding the 'model' field, which could affect code that reuses the same dictionary. Consider creating a copy of the payload before modifying it, or explicitly document this mutation as part of the function's contract.

Suggested change
# Add model name from config
payload = dict(payload)  # Avoid mutating the input dictionary

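The suggested copy-before-mutate pattern is easy to verify in isolation; a small sketch with hypothetical names (not the actual provider code):

```python
def add_model(payload: dict, model: str) -> dict:
    # Shallow copy so the caller's dictionary is never mutated
    payload = dict(payload)
    payload["model"] = model
    return payload


original = {"messages": [{"role": "user", "content": "hi"}]}
sent = add_model(original, "claude-2.1")
print("model" in original, sent["model"])  # → False claude-2.1
```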
Comment on lines +119 to +120
detail=f"The passthrough Anthropic messages route is not implemented for {self.NAME}"
"models.",
Copilot AI commented Dec 16, 2025

Missing space between string literals in error message. The error message will read "...for {self.NAME}models." instead of "...for {self.NAME} models."

Suggested change
detail=f"The passthrough Anthropic messages route is not implemented for {self.NAME}"
"models.",
detail=f"The passthrough Anthropic messages route is not implemented for {self.NAME} models.",

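The underlying pitfall is Python's implicit concatenation of adjacent string literals, which joins them with no separator; a minimal illustration (the provider name is hypothetical):

```python
name = "MyProvider"
# Adjacent literals concatenate with nothing in between,
# so the space intended before "models." is lost
msg = f"route is not implemented for {name}" "models."
fixed = f"route is not implemented for {name} models."
print(msg)    # route is not implemented for MyProvidermodels.
print(fixed)  # route is not implemented for MyProvider models.
```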
if "api_base" in auth_config:
litellm_config["litellm_api_base"] = auth_config["api_base"]
provider_config = LiteLLMConfig(**litellm_config)
model_config.provider = Provider.LITELLM
Copilot AI commented Dec 16, 2025

Mutating the model_config.provider field could have unexpected side effects. The model_config object likely comes from the database store and modifying it in-place could affect other parts of the code that reference it. Consider creating a new variable or ensuring this mutation doesn't propagate beyond this function's scope.

}
response = await provider.passthrough_anthropic_messages(payload)

assert payload["model"] == "claude-2.1"
Copilot AI commented Dec 16, 2025

This assertion validates that the payload has been mutated in place by the passthrough_anthropic_messages method (the model field is added). While the test currently passes, asserting on side effects of the function under test rather than its return value makes the test more fragile. Consider testing only the return value and the mock call arguments.

@TomeHirata force-pushed the stack/gateway/model-passthrough/gemini branch 2 times, most recently from 20b0386 to cf3d4f8 on December 17, 2025 04:07
@TomeHirata force-pushed the stack/gateway/model-passthrough/gemini branch 4 times, most recently from d991ed4 to f07f11e on December 17, 2025 08:48
@B-Step62 (Collaborator) left a comment

LGTM, left a few minor comments

@TomeHirata force-pushed the stack/gateway/model-passthrough/gemini branch 2 times, most recently from 0a9ed11 to b23b045 on December 17, 2025 13:36
…Content APIs

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata force-pushed the stack/gateway/model-passthrough/gemini branch from b23b045 to 880b16b on December 17, 2025 16:13
@TomeHirata enabled auto-merge December 17, 2025 16:14
@BenWilson2 (Member) left a comment

Looks good!

@TomeHirata (Collaborator, Author)

Bypassing unrelated test failures

@TomeHirata disabled auto-merge December 18, 2025 00:35
@TomeHirata merged commit b72d49a into mlflow:master Dec 18, 2025
46 of 50 checks passed

Labels

rn/feature Mention under Features in Changelogs.

4 participants