Add log_stream API for logging binary streams as artifacts #19104

Merged
harupy merged 14 commits into mlflow:master from harupy:add-log-stream-api
Jan 6, 2026
Conversation

@harupy
Member

@harupy harupy commented Nov 28, 2025

Related Issues/PRs

Resolves #19050

What changes are proposed in this pull request?

Add a new mlflow.log_stream API that allows logging binary file-like objects (e.g., io.BytesIO) directly as artifacts without needing to create temporary files manually.

import io
import mlflow

with mlflow.start_run():
    # Log a BytesIO stream
    bytes_stream = io.BytesIO(b"binary content")
    mlflow.log_stream(bytes_stream, "binary_file.bin")

The API:

  • Supports binary streams (io.BufferedIOBase | io.RawIOBase)
  • Writes content in chunks (8KB) to avoid excessive memory usage
  • Available in both fluent API (mlflow.log_stream) and client API (MlflowClient().log_stream)
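The chunked write described in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not MLflow's implementation: `write_stream_in_chunks` and `CHUNK_SIZE` are hypothetical names, and the sketch assumes the stream is copied to a local file before upload.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024  # 8KB, the chunk size stated in the PR description


def write_stream_in_chunks(stream: BinaryIO, dst: Path, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy a binary stream to dst one chunk at a time; return the bytes written."""
    total = 0
    with open(dst, "wb") as f:
        # read() returns b"" at end of stream, which ends the loop
        while chunk := stream.read(chunk_size):
            f.write(chunk)
            total += len(chunk)
    return total


with tempfile.TemporaryDirectory() as tmp_dir:
    dst = Path(tmp_dir) / "binary_file.bin"
    payload = b"x" * (3 * CHUNK_SIZE + 5)  # not a multiple of the chunk size
    written = write_stream_in_chunks(io.BytesIO(payload), dst)
    copied = dst.read_bytes()
```

Only one chunk is held in memory at a time, so peak memory stays near `chunk_size` regardless of the stream's total length.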

How is this PR tested?

  • New unit/integration tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.

Release Notes

Is this a user-facing change?

  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Add mlflow.log_stream API for logging binary file-like objects (e.g., io.BytesIO) as artifacts without creating temporary files.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging

How should the PR be classified in the release notes? Choose one:

  • rn/feature - A new user-facing feature worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

🤖 Generated with Claude Code

@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels Nov 28, 2025
@harupy harupy added the team-review Trigger a team review request label Nov 28, 2025
@harupy harupy force-pushed the add-log-stream-api branch 3 times, most recently from ff8a089 to e8f88a3 Compare November 28, 2025 11:34
harupy and others added 2 commits December 2, 2025 14:20
Resolves mlflow#19050

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy force-pushed the add-log-stream-api branch from e8f88a3 to ecb5ce4 Compare December 2, 2025 05:22
@github-actions
Contributor

github-actions bot commented Dec 2, 2025

Documentation preview for 52d92db is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@harupy harupy changed the title Add log_stream API for logging file-like objects as artifacts Add log_stream API for logging file-like objects as artifacts Dec 2, 2025
harupy and others added 2 commits December 2, 2025 17:30
Simplify log_stream to only support binary streams (IO[bytes]).
This removes the text stream (IO[str]) handling which adds complexity
without significant benefit since users can easily convert text to bytes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
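The commit message above notes that users can convert text to bytes themselves. A minimal sketch of that conversion follows; the UTF-8 encoding and the artifact name in the comment are assumptions chosen for illustration.

```python
import io

text = "héllo, artifacts\n"

# Encode explicitly. UTF-8 is an assumption here; use the encoding that
# readers of the artifact expect.
text_stream = io.BytesIO(text.encode("utf-8"))

# An existing io.StringIO can be converted the same way.
string_io = io.StringIO(text)
converted_stream = io.BytesIO(string_io.getvalue().encode("utf-8"))

# Either stream could then be passed to the new API, e.g.
# mlflow.log_stream(text_stream, "notes.txt") inside an active run.
round_tripped = text_stream.getvalue().decode("utf-8")
```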
@harupy harupy changed the title Add log_stream API for logging file-like objects as artifacts Add log_stream API for logging binary streams as artifacts Dec 2, 2025
Comment on lines +2771 to +2772
# TODO: The current implementation creates a temporary file. Consider adding
# a direct upload API to artifact repositories to avoid this overhead.
Member Author

@harupy harupy Dec 2, 2025

Other log-in-memory-object APIs (e.g., log_text) can benefit from this too.

@serena-ruan
Collaborator

serena-ruan commented Dec 9, 2025

/review


🚀 Review workflow started


✅ Review completed.

Review Output

Now I'll review the changes carefully against the Python style guide and general code quality standards.

Let me analyze the PR changes:

Style Guide Checklist:

  1. ✅ Avoid redundant docstrings - Docstrings provide value, not redundant
  2. ⚠️ Prefer typing.Literal for fixed-string parameters - Not applicable
  3. ✅ Minimize try-catch block scope - No try-catch blocks
  4. ✅ Use dataclasses instead of complex tuples - No complex tuples
  5. ✅ Use pathlib methods instead of os - Code uses pathlib correctly
  6. ✅ Pass pathlib.Path objects directly to subprocess - Not applicable
  7. ✅ Use next() to find first match - Not applicable
  8. ✅ Use pattern matching for string splitting - Not applicable
  9. ✅ Always verify mock calls - Not applicable (no mocks in tests)
  10. ✅ Set mock behaviors in patch declaration - Not applicable
  11. ✅ Parametrize tests with multiple input cases - Already using parametrize
  12. ✅ Avoid custom messages in test asserts - No custom messages
  13. ✅ Preserve function metadata in decorators - Not applicable

Code Quality Review:

Looking at the implementation, I found one issue:

  1. Tests: The tests don't cover the chunked reading behavior mentioned in the PR description and implementation. While test_log_stream_large_content exists, it doesn't verify that the stream position is properly handled or that partial reads work correctly.

However, after carefully reviewing all the added lines (marked with +), I don't see any style guide violations or significant code quality issues. The code:

  • Has appropriate docstrings with examples
  • Uses proper type hints
  • Follows the existing codebase patterns
  • Uses walrus operator appropriately
  • Tests cover various scenarios (subdirectories, empty content, large content)

No issues found.
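The gap noted in the review output above (partial reads and stream position) could be exercised with a test double along these lines. This is a standard-library sketch: `ShortReadStream` and `drain` are hypothetical helpers, and `drain` stands in for whatever read loop the implementation uses rather than reproducing it.

```python
import io


class ShortReadStream(io.RawIOBase):
    """Hypothetical test double that returns at most max_read bytes per read()."""

    def __init__(self, data: bytes, max_read: int):
        super().__init__()
        self._buf = io.BytesIO(data)
        self._max_read = max_read

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Deliberately hand back fewer bytes than the caller asked for
        chunk = self._buf.read(min(len(b), self._max_read))
        b[: len(chunk)] = chunk
        return len(chunk)


def drain(stream, chunk_size: int = 8 * 1024) -> list:
    """Read until read() returns an empty result, collecting every chunk."""
    chunks = []
    while chunk := stream.read(chunk_size):
        chunks.append(chunk)
    return chunks


# Partial reads: each read(8192) yields at most 100 bytes, yet nothing is lost
data = bytes(range(256)) * 10  # 2560 bytes
chunks = drain(ShortReadStream(data, max_read=100))

# Stream position: reading starts wherever the stream currently points
partial = io.BytesIO(b"headerpayload")
partial.seek(len(b"header"))
remaining = b"".join(drain(partial))
```

The second case shows why position matters: a caller who writes to a `BytesIO` and forgets `seek(0)` would hand over a stream that yields nothing.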

Copilot AI review requested due to automatic review settings December 9, 2025 08:12
@harupy
Member Author

harupy commented Dec 9, 2025

/resolve


🚀 Resolve workflow started


✅ Workflow completed successfully.

Claude Output

Perfect! I've successfully resolved all 3 unresolved PR review comments. Here's a summary of what was done:

Summary

I successfully addressed all unresolved PR review comments for PR #19104:

Changes Made:

  1. mlflow/tracking/client.py:2820 - Added @experimental(version="3.8.0") decorator to the MlflowClient.log_stream() method
  2. mlflow/tracking/fluent.py:1644 - Added @experimental(version="3.8.0") decorator to the log_stream() fluent API function

Review Comments Resolved:

  • All 3 comments from reviewer harupy requesting the experimental decorator have been addressed
  • The decorator uses version "3.8.0" as specified in the review comments

Commit Details:

  • Changes committed with DCO sign-off (required by the project)
  • All pre-commit hooks passed successfully
  • Commit message follows project conventions

The changes have been committed locally but not pushed to the remote repository, as per the instructions. The commit is ready for you to push when appropriate.
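For readers unfamiliar with version-tagged decorators such as the `@experimental(version="3.8.0")` mentioned above, the following is a purely illustrative sketch of the general pattern. `experimental_sketch` and `log_stream_stub` are hypothetical names; this is not MLflow's implementation, and whether MLflow's decorator emits a warning is an assumption made only for the example.

```python
import functools
import io
import warnings


def experimental_sketch(version: str):
    """Hypothetical stand-in for a version-tagged experimental marker."""

    def decorator(func):
        notice = f"{func.__name__} is experimental since version {version} and may change."

        @functools.wraps(func)  # keeps __name__ and __doc__ of the wrapped function
        def wrapper(*args, **kwargs):
            warnings.warn(notice, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@experimental_sketch(version="3.8.0")
def log_stream_stub(stream, artifact_file):
    """Stub that only reports how many bytes would be logged."""
    return len(stream.read())


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = log_stream_stub(io.BytesIO(b"abc"), "a.bin")
```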

Add @experimental(version="3.8.0") decorator to log_stream API in both
MlflowClient and fluent API as requested in review comments.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new mlflow.log_stream() API that allows logging binary file-like objects (e.g., io.BytesIO) directly as artifacts without needing to create temporary files manually. This addresses issue #19050 by providing a convenient method for users who work with in-memory binary streams.

Key Changes:

  • New log_stream() API in both fluent (mlflow.log_stream) and client (MlflowClient().log_stream) interfaces
  • Chunked reading (8KB) to handle large streams efficiently without excessive memory usage
  • Comprehensive test coverage including subdirectories, empty streams, and large content

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/tracking/test_tracking.py | Adds three test functions covering various scenarios: subdirectory handling, empty streams, and large content to verify chunked reading |
| mlflow/tracking/fluent.py | Implements the fluent API log_stream() function with proper documentation and type hints, following the same pattern as log_text() and log_dict() |
| mlflow/tracking/client.py | Implements the client API log_stream() method with chunked reading logic (8KB chunks) and comprehensive documentation |
| mlflow/__init__.py | Exports log_stream function in both the import statement and __all__ list for public API access |
| docs/api_reference/api_inventory.txt | Adds API documentation entries for both mlflow.log_stream and MlflowClient.log_stream |

@harupy
Member Author

harupy commented Dec 9, 2025

/autoformat

Collaborator

@serena-ruan serena-ruan left a comment

LGTM!

harupy and others added 2 commits December 11, 2025 16:19
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@Vigilans
Will this land in next release?

@harupy
Member Author

harupy commented Dec 22, 2025

/resolve


🚀 Resolve workflow started


✅ Workflow completed successfully.

Claude Output

Excellent! I've successfully addressed both PR review comments:

Summary

I resolved 2 unresolved PR review comments:

  1. mlflow/tracking/fluent.py:1643 - Added missing import for experimental decorator

    • Added from mlflow.utils.annotations import experimental at line 84
  2. mlflow/tracking/client.py:2846 - Updated TODO comment to mention that other log-in-memory-object APIs can benefit from direct upload

    • Extended the TODO comment to note that log_text and similar APIs can also benefit from the direct upload API

All changes have been committed locally with DCO sign-off. The commit passed all pre-commit checks including ruff, format, and other linters. The changes are ready but have not been pushed (as per instructions).

mlflow-app bot and others added 3 commits December 22, 2025 09:51
🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
harupy added 2 commits January 5, 2026 16:10
Signed-off-by: Harutaka Kawamura <hkawamura0130@gmail.com>
@harupy harupy added this pull request to the merge queue Jan 6, 2026
Merged via the queue into mlflow:master with commit 6966d43 Jan 6, 2026
46 checks passed
@harupy harupy deleted the add-log-stream-api branch January 6, 2026 07:10
omarfarhoud pushed a commit to omarfarhoud/mlflow that referenced this pull request Jan 20, 2026
…19104)

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
Signed-off-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
Labels

area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. team-review Trigger a team review request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FR] Support for logging file-like objects in mlflow.log_artifact and mlflow.log_artifacts

4 participants