fix(dataset): add support for file cells in datasets with upload and external URL linking capabilities #3462

galzilber · 2025-11-24T16:11:54Z

… external URL linking capabilities

I have added tests that cover my changes.
If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
(If applicable) I have updated the documentation accordingly.

Important

Adds support for file attachments in datasets, including local uploads and external URL linking, with updates to models, tests, and examples.

Behavior:
- Adds support for file attachments in datasets, including local uploads and external URL linking.
- Updates create() in datasets.py to handle Attachment and ExternalAttachment objects.
- Introduces Attachment and ExternalAttachment classes in attachment.py for handling file uploads and external URLs.
Models:
- Adds FileCellType, FileStorageType, and related models to model.py for handling file metadata.
- Updates Dataset class in dataset.py to support new attachment features.
Tests:
- Adds tests for datasets with attachments in test_dataset_with_attachments.py.
- Updates existing tests to accommodate new attachment functionality in test_create_dataset.py and test_datasets_operations.py.
Misc:
- Renames dataset to datasets in various test paths and imports.
- Adds examples for using attachments in dataset_attachments_example.py.

^{This description was created by}^{for fb871d9. You can customize this summary. It will automatically update as commits are pushed.}

Summary by CodeRabbit

New Features
- Added dataset creation API with support for multiple attachment types: local file uploads, external URL references, and in-memory data.
- Introduced comprehensive attachment handling capabilities for datasets including images, videos, audio files, and generic files with automatic metadata tracking.
Documentation
- Added sample examples demonstrating dataset creation with various attachment scenarios and usage patterns.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Fixes TLP-1196

… external URL linking capabilities

coderabbitai · 2025-11-24T16:12:07Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Reorganized dataset exports from traceloop.sdk.dataset to traceloop.sdk.datasets. Introduced attachment support via Attachment, ExternalAttachment, and AttachmentReference classes for file uploads. Enhanced Datasets.create() with attachment extraction, processing, and upload workflows. Added file-type models and expanded test coverage with cassettes and examples.

Changes

Cohort / File(s)	Summary
Module restructuring `traceloop/sdk/dataset/__init__.py`	Removed public re-exports (Dataset, Column, Row, BaseDatasetEntity, ColumnType, DatasetMetadata) that were previously re-exported from submodules.
Datasets package initialization `traceloop/sdk/datasets/__init__.py`	Added public re-exports of Dataset, Column, Row, BaseDatasetEntity, ColumnType, DatasetMetadata, FileCellType, FileStorageType, Attachment, ExternalAttachment, and AttachmentReference.
Attachment support `traceloop/sdk/datasets/attachment.py`	New module introducing `Attachment` (local/in-memory file upload), `ExternalAttachment` (external URL linking), and `AttachmentReference` (attachment metadata and download handling) with S3 upload, file validation, and download capabilities.
Dataset creation with attachments `traceloop/sdk/datasets/datasets.py`	Added `Datasets.create()` method orchestrating three-step flow: extract attachments from rows, create dataset without attachments, then upload/attach files and update row metadata. Added helper methods `_extract_attachments()`, `_prepare_request_for_creation()`, and `_process_attachments()`.
File-type models `traceloop/sdk/datasets/model.py`	Added enums `FileCellType` (IMAGE, VIDEO, AUDIO, FILE) and `FileStorageType` (INTERNAL, EXTERNAL); new models `FileCellMetadata`, `FileCellValue`, `UploadURLRequest`, `UploadURLResponse`, `UploadStatusRequest`, `ExternalURLRequest`.
Import path updates `traceloop/sdk/datasets/dataset.py`, `tests/datasets/test_*.py`, `sample_app/dataset_example.py`	Updated imports to reference new module locations (e.g., `traceloop.sdk.datasets.model` instead of `traceloop.sdk.dataset.model`).
Comprehensive test module `tests/datasets/test_dataset_with_attachments.py`	New test file with five scenarios: external attachments (video, documents), file uploads (image, PDF), in-memory attachments (bytes), mixed attachments, and baseline (no attachments). Uses VCR cassettes and mocks S3 operations.
Test fixtures (VCR cassettes) `tests/datasets/cassettes/test_dataset_with_attachments/*.yaml`	Five YAML cassettes recording API interactions for dataset creation and attachment workflows: external attachments, file uploads, in-memory uploads, mixed types, and baseline creation.
Sample code `sample_app/dataset_attachments_example.py`	New example file demonstrating four attachment patterns: external URLs (ExternalAttachment with videos/documents), file uploads (Attachment with images/PDFs), in-memory payloads (Attachment with byte data), and mixed attachment types; includes cleanup and error handling.

Sequence Diagram

sequenceDiagram
    participant User
    participant Datasets
    participant HTTPClient
    participant S3
    participant Traceloop API
    
    User->>Datasets: create(dataset_request with attachments)
    Note over Datasets: _extract_attachments()
    Datasets->>Datasets: Collect Attachment/ExternalAttachment objects per row
    
    Note over Datasets: _prepare_request_for_creation()
    Datasets->>Datasets: Replace attachments with None in rows
    
    Datasets->>Traceloop API: POST /v2/datasets (without attachments)
    Traceloop API-->>Datasets: Create dataset, return rows with IDs
    
    Note over Datasets: _process_attachments()
    
    alt Attachment (file/bytes)
        Datasets->>Traceloop API: POST upload-url endpoint
        Traceloop API-->>Datasets: Return signed S3 URL + storage key
        Datasets->>S3: PUT file to signed URL
        S3-->>Datasets: Upload success
        Datasets->>Traceloop API: PUT upload-status (success)
        Traceloop API-->>Datasets: Confirm status
    else ExternalAttachment (URL)
        Datasets->>Traceloop API: POST attach external URL endpoint
        Traceloop API-->>Datasets: Confirm attachment
    end
    
    Datasets->>Datasets: Update row metadata with attachment references
    Datasets-->>User: Return Dataset with attached files

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Areas requiring extra attention:

attachment.py — Complex S3 upload logic, file validation, and error handling; verify proper resource cleanup and exception propagation
datasets.py create() method — Multi-step attachment workflow with row/column indexing and metadata updates; ensure attachment extraction and processing order is correct
Test cassettes — Verify VCR interactions accurately reflect actual API request/response payloads and HTTP methods (PUT vs POST)
Module migration — Confirm all import paths are consistently updated across test files and samples

Possibly related PRs

feat(datasets): add dataset and datasets functionality #3247 — Related module restructuring and dataset package re-exports (Dataset/Datasets class surface changes)
fix(traceloop-sdk): add type-checking support with mypy #3463 — Related SDK dataset public API surface and import path adjustments

Suggested reviewers

doronkopit5
nirga

Poem

🐰 Hops of file and URLs so bright,
Attachments float through the SDK night,
S3 buckets, metadata gleam,
Datasets dance in a streaming dream,
Upload, attach, then download with delight! 📎✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	❓ Inconclusive	The PR includes some architectural changes beyond the core feature: removing exports from traceloop.sdk.dataset and adding them to traceloop.sdk.datasets, plus updating import paths across samples and tests.	Clarify whether the namespace reorganization (moving from dataset to datasets package) is intentional refactoring or scope creep, and document any breaking changes to the public API.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: adding file cell support to datasets with upload and external URL linking capabilities, which aligns with the core modifications.
Linked Issues check	✅ Passed	The PR implements file attachment support for datasets with upload and external URL capabilities, directly addressing the linked issue TLP-1196 for file support in the Python SDK.
Docstring Coverage	✅ Passed	Docstring coverage is 91.18% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch gz/add-files-to-dataset

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to 5c18471 in 2 minutes and 53 seconds. Click for details.

Reviewed 957 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/README.md:18

Draft comment:
Clarify error handling in file cell examples by indicating specific exceptions (e.g., ValueError, FileNotFoundError).
Reason this comment was not posted:
Comment looked like it was already resolved.

2. packages/traceloop-sdk/traceloop/sdk/dataset/row.py:247

Draft comment:
Handle potential exceptions from os.path.getsize(file_path) in case the file becomes unavailable.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This comment is speculative - it's suggesting handling a theoretical race condition where a file could be deleted between being uploaded (lines 224-226) and having its size checked (line 247). This is an extremely unlikely scenario in practice. The file existence is already validated at line 208, and the file is successfully opened and uploaded just moments before line 247. The comment says "in case the file becomes unavailable" which is a speculative "what if" scenario. According to the rules, I should NOT make speculative comments like "If X, then Y is an issue" - only comment if it's definitely an issue. This falls into that category of speculative concerns. However, it's technically possible for a file to be deleted by another process between the upload and the getsize call, especially in concurrent environments. This could cause an OSError that would crash the method. Is this common enough to warrant handling? While technically possible, this is an extremely rare edge case that would require external interference (another process deleting the file) during a very short time window. The code already validates file existence before starting the upload process. This type of defensive programming for every possible edge case is not typically required unless there's evidence it's a real problem. The comment is speculative and doesn't provide strong evidence this is a real issue. This comment should be deleted. It's a speculative suggestion about a theoretical race condition that is extremely unlikely to occur in practice. The rules explicitly state not to make speculative comments like "If X, then Y is an issue" unless it's definitely an issue.

Workflow ID: wflow_guLY3nRUyggh7XgR

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

packages/traceloop-sdk/tests/dataset/test_file_operations.py

packages/traceloop-sdk/traceloop/sdk/dataset/row.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed cc20bc1 in 47 seconds. Click for details.

Reviewed 11 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/tests/dataset/test_file_operations.py:105

Draft comment:
Good to see the trailing newline added. Consider wrapping the temporary file cleanup in a try/finally block to ensure cleanup on test failures.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

Workflow ID: wflow_r5RqHAxBTgVSyWdO

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

packages/traceloop-sdk/tests/dataset/test_file_operations.py (1)

67-105: Ensure temporary file is always cleaned up with try/finally

If set_file_cell or the assertions fail, os.unlink(test_file.name) won’t run and the temp file may be left behind. Wrapping the test body in a try/finally improves robustness.

-    # Create test file
-    test_file = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
-    test_file.write(b"test content")
-    test_file.close()
-
-    # Mock S3 upload to succeed
-    with patch.object(row, "_upload_file_to_presigned_url", return_value=True):
-        row.set_file_cell(
-            column_slug="file",
-            file_path=test_file.name,
-            file_type=FileCellType.FILE,
-        )
-
-    # Verify
-    assert row.values["file"]["storage"] == "internal"
-    assert row.values["file"]["status"] == "success"
-    assert "storage_key" in row.values["file"]
-
-    # Clean up
-    os.unlink(test_file.name)
+    # Create test file
+    test_file = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
+    try:
+        test_file.write(b"test content")
+        test_file.close()
+
+        # Mock S3 upload to succeed
+        with patch.object(row, "_upload_file_to_presigned_url", return_value=True):
+            row.set_file_cell(
+                column_slug="file",
+                file_path=test_file.name,
+                file_type=FileCellType.FILE,
+            )
+
+        # Verify
+        assert row.values["file"]["storage"] == "internal"
+        assert row.values["file"]["status"] == "success"
+        assert "storage_key" in row.values["file"]
+    finally:
+        os.unlink(test_file.name)

packages/traceloop-sdk/traceloop/sdk/dataset/row.py (2)

223-231: Act on _confirm_file_upload result when marking uploads failed/successful

In both the failure path (Line 228–230) and the final confirmation (Line 249–251), _confirm_file_upload is called but its boolean return value is ignored. If the backend returns {"success": false} without raising an HTTP error, the SDK will still proceed as if the operation succeeded.

Consider:

Checking the return value and raising an exception (or at least logging) when it is False.
Optionally returning early or not updating self.values[column_slug] if confirmation fails.

For example:

-            if not success:
-                self._confirm_file_upload(column_slug, status="failed", metadata=metadata)
-                raise Exception(f"Failed to upload file to S3: {file_name}")
+            if not success:
+                confirmed = self._confirm_file_upload(
+                    column_slug, status="failed", metadata=metadata
+                )
+                if not confirmed:
+                    raise Exception(
+                        f"Failed to upload file to S3 and backend did not confirm failure for cell {column_slug}"
+                    )
+                raise Exception(f"Failed to upload file to S3: {file_name}")
@@
-            self._confirm_file_upload(
-                column_slug=column_slug, status="success", metadata=final_metadata
-            )
+            if not self._confirm_file_upload(
+                column_slug=column_slug, status="success", metadata=final_metadata
+            ):
+                raise Exception(
+                    f"Upload confirmation failed for cell {column_slug} after successful upload"
+                )

Also applies to: 249-251

232-244: Use logging instead of print for thumbnail warnings

The warnings for missing thumbnail file and failed thumbnail upload currently use print, which makes it harder to route or filter in real environments.

Prefer the standard logging module, e.g.:

-import os
+import os
+import logging
@@
-                if not os.path.exists(thumbnail_path):
-                    print(f"Warning: Thumbnail file not found: {thumbnail_path}")
+                if not os.path.exists(thumbnail_path):
+                    logging.warning("Thumbnail file not found: %s", thumbnail_path)
@@
-                    if not thumb_success:
-                        print(f"Warning: Failed to upload thumbnail for {file_name}")
+                    if not thumb_success:
+                        logging.warning(
+                            "Failed to upload thumbnail for %s", file_name
+                        )

🧹 Nitpick comments (3)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (1)

12-72: File-cell models are consistent; consider tightening status representation

The new FileCellType, FileStorageType, and related models (FileCellMetadata, FileCellValue, UploadURLRequest/Response, UploadStatusRequest, ExternalURLRequest) line up with the row logic and test expectations, so the overall shape looks solid.

If you want stronger validation, you might later replace the free-form status: str fields with a small enum or shared constants (e.g., "in_progress", "success", "failed") to avoid accidental typos, but that’s not required for correctness.
packages/traceloop-sdk/traceloop/sdk/dataset/row.py (2)
122-149: Optional: derive local file-cell state from API response

_set_external_file_url currently ignores the server response (beyond None/error) and reconstructs the local cell dict manually. If the backend ever adds fields (e.g., normalized URL, derived metadata), this method might drift from the canonical representation.

If the API returns the full cell payload, consider updating self.values[column_slug] from that response instead of hand-building the dict, or at least centralizing the shape in a single helper to avoid duplication.

245-261: Avoid mutating caller-provided metadata when adding size_bytes

final_metadata = metadata or {} followed by final_metadata["size_bytes"] = ... will mutate the original dict when metadata is provided by the caller, which can be surprising if that dict is reused elsewhere.

Consider copying:
-            final_metadata = metadata or {}
-            final_metadata["size_bytes"] = os.path.getsize(file_path)
+            final_metadata = {**(metadata or {})}
+            final_metadata["size_bytes"] = os.path.getsize(file_path)
This keeps the API safer while preserving behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ba1aa4c and cc20bc1.

📒 Files selected for processing (6)

packages/traceloop-sdk/README.md (1 hunks)
packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_external_url_file_cell.yaml (1 hunks)
packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml (1 hunks)
packages/traceloop-sdk/tests/dataset/test_file_operations.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/dataset/model.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/dataset/row.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/tests/dataset/test_file_operations.py
packages/traceloop-sdk/traceloop/sdk/dataset/model.py
packages/traceloop-sdk/traceloop/sdk/dataset/row.py

**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml
packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_external_url_file_cell.yaml

🧠 Learnings (1)

📚 Learning: 2025-08-04T15:35:30.188Z

Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.

Applied to files:

packages/traceloop-sdk/traceloop/sdk/dataset/row.py

🧬 Code graph analysis (2)

packages/traceloop-sdk/tests/dataset/test_file_operations.py (2)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (1)

FileCellType (15-19)

packages/traceloop-sdk/traceloop/sdk/dataset/row.py (3)

set_file_cell (151-261)

Row (19-261)

delete (37-47)

packages/traceloop-sdk/traceloop/sdk/dataset/row.py (3)

packages/traceloop-sdk/traceloop/sdk/dataset/base.py (1)

BaseDatasetEntity (6-23)

packages/traceloop-sdk/traceloop/sdk/client/http.py (4)

HTTPClient (7-91)

post (23-37)

put (74-91)

get (39-58)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (6)

FileCellType (15-19)

FileStorageType (22-24)

UploadURLRequest (46-51)

UploadURLResponse (54-60)

UploadStatusRequest (63-65)

ExternalURLRequest (68-71)

🪛 Checkov (3.2.334)

packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml

[high] 83-84: AWS Access Key

(CKV_SECRET_2)

🪛 Gitleaks (8.29.0)

packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml

[high] 83-83: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/dataset/row.py

83-83: Create your own exception

(TRY002)

83-83: Avoid specifying long messages outside the exception class

(TRY003)

100-100: Probable use of requests call without timeout

(S113)

118-118: Create your own exception

(TRY002)

118-118: Avoid specifying long messages outside the exception class

(TRY003)

140-140: Create your own exception

(TRY002)

140-140: Avoid specifying long messages outside the exception class

(TRY003)

190-190: Create your own exception

(TRY002)

190-190: Avoid specifying long messages outside the exception class

(TRY003)

194-194: Avoid specifying long messages outside the exception class

(TRY003)

196-196: Avoid specifying long messages outside the exception class

(TRY003)

209-209: Avoid specifying long messages outside the exception class

(TRY003)

230-230: Create your own exception

(TRY002)

230-230: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Packages (3.12)
GitHub Check: Test Packages (3.10)
GitHub Check: Test Packages (3.11)
GitHub Check: Build Packages (3.11)
GitHub Check: Lint

🔇 Additional comments (6)

packages/traceloop-sdk/README.md (1)

18-127: File-cell usage docs are consistent with the new API

The examples for uploads, external URLs, thumbnails, and validation errors line up with Row.set_file_cell’s signature and behavior; no changes needed.

packages/traceloop-sdk/tests/dataset/test_file_operations.py (2)

10-40: External URL file-cell test matches expected behavior

This test cleanly exercises the external URL path for a FILE column and asserts the right fields (storage="external", url, type="video"), which matches the Row/set_file_cell semantics.

42-65: Validation test correctly covers key argument error paths

The two cases (both file_path and url, and neither provided) map directly to the validation logic in Row.set_file_cell, and the regexes are resilient to small wording changes.

packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_external_url_file_cell.yaml (1)

1-238: External-URL cassette looks clean and matches test flow

The recorded interactions align with test_external_url_file_cell and don’t expose secrets or PII; this fixture looks good as-is.

packages/traceloop-sdk/traceloop/sdk/dataset/row.py (2)

60-85: Upload URL helper follows established HTTP error-handling pattern

_request_file_upload checks for None from self._http.post and raises on failure, matching the existing pattern where HTTP client methods return None on error and callers translate that into exceptions. This keeps behavior consistent across dataset operations.

Based on learnings, this aligns with the Dataset HTTP error-handling convention.

151-197: Public set_file_cell API and validation look good

The set_file_cell signature and docstring are clear, and the mutual‑exclusion checks for file_path/url plus the deleted‑row guard follow existing patterns and give good feedback for misuse.

...raceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml

packages/traceloop-sdk/traceloop/sdk/dataset/row.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 70ba304 in 1 minute and 50 seconds. Click for details.

Reviewed 27 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/traceloop/sdk/dataset/row.py:1

Draft comment:
Good move importing 'requests' at the module level to avoid re-importing in the function.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. packages/traceloop-sdk/traceloop/sdk/dataset/row.py:100

Draft comment:
Consider logging the caught RequestException to aid in debugging instead of silently returning False.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This comment is suggesting a code quality improvement - adding logging to aid debugging. The comment is about the new exception handling code added in the diff. However, I need to consider: 1) Is this clearly required or is it a suggestion/opinion? 2) The caller does handle the failure case by raising an exception with context. 3) The comment uses "Consider" which is softer language, but it's still actionable. 4) Adding logging for caught exceptions is generally a good practice, especially for network operations. 5) However, the rules state "Do NOT comment unless there is clearly a code change required" - is logging clearly required here? It's more of a best practice suggestion than a clear requirement. The code works without it, and the failure is handled at the call site. The comment is a reasonable suggestion for code quality, but it's not clearly required. The code functions correctly without logging, and the failure is already handled by the caller. This might be considered an optional improvement rather than a necessary change. The use of "Consider" suggests it's not mandatory. While logging exceptions is a best practice, the rules explicitly state not to comment unless there is clearly a code change required. This is more of a "nice to have" suggestion rather than a clear requirement. The failure case is already handled appropriately by the caller, which raises an exception with context. This comment should be deleted. While adding logging is a reasonable suggestion, it's not clearly required - the code works correctly without it, and the failure is already handled at the call site. This falls into the category of optional improvements rather than necessary changes.

Workflow ID: wflow_xq5aAMGOpYlYpqHz

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

nirga

@galzilber this should match the existing APIs that we have. It can't be a separate API

packages/traceloop-sdk/README.md

…nd attachments, and clean up unused methods in Row class

ellipsis-dev

Caution

Changes requested ❌

Reviewed a8ad5fd in 1 minute and 59 seconds. Click for details.

Reviewed 2389 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:286

Draft comment:
Consider refactoring the loop in _prepare_request_for_creation into a dictionary comprehension for brevity and clarity.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

2. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:95

Draft comment:
Attachment processing is done synchronously. For scenarios with large files or many attachments, consider an asynchronous or background upload mechanism to improve responsiveness.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

Workflow ID: wflow_Vt6cCV3LoyHndPh5

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

coderabbitai

Actionable comments posted: 8

♻️ Duplicate comments (2)

packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml (1)

85-85: Verify AWS credentials in presigned URLs are scrubbed.

Similar to test_create_dataset_with_file_attachments_mocked.yaml, this cassette contains AWS S3 presigned URLs with temporary security tokens. Per coding guidelines, ensure these are properly scrubbed.

packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yaml (1)

85-85: Verify AWS credentials in presigned URLs are scrubbed.

This cassette also contains AWS S3 presigned URLs with temporary security tokens. Ensure compliance with the coding guideline to never commit secrets in VCR cassettes.

Also applies to: 261-261

🧹 Nitpick comments (3)

packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py (1)
8-20: Consider sorting __all__ for consistency.

The static analysis tool suggests applying an isort-style sorting to __all__. While not critical, alphabetical ordering improves maintainability and reduces merge conflicts.

Apply this diff to sort the exports:
 __all__ = [
-    "Dataset",
-    "Column",
-    "Row",
-    "BaseDatasetEntity",
-    "ColumnType",
-    "DatasetMetadata",
-    "FileCellType",
-    "FileStorageType",
     "Attachment",
+    "AttachmentReference",
+    "BaseDatasetEntity",
+    "Column",
+    "ColumnType",
+    "Dataset",
+    "DatasetMetadata",
     "ExternalAttachment",
-    "AttachmentReference",
+    "FileCellType",
+    "FileStorageType",
+    "Row",
 ]
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
319-319: Consider using logging instead of print statements.

Library code should use Python's logging module rather than print statements for warnings, allowing consumers to control log output and levels.

Add logging at the module level:
import logging

logger = logging.getLogger(__name__)
Then replace print statements:
-                print(f"Warning: Row index {row_idx} out of range, skipping attachments")
+                logger.warning(f"Row index {row_idx} out of range, skipping attachments")
-                    print(f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}")
+                    logger.warning(f"Failed to process attachment for row {row_idx}, column {col_slug}: {e}")
Also applies to: 346-346
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (1)

465-469: TODO noted: Internal file download URL generation.

The comment acknowledges that getting presigned download URLs for internal files requires backend API support. This is a known limitation.

Do you want me to open a tracking issue for implementing the backend endpoint and completing this functionality?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cc20bc1 and a8ad5fd.

📒 Files selected for processing (11)

packages/traceloop-sdk/README.md (1 hunks)
packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yaml (1 hunks)
packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_external_attachment.yaml (1 hunks)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml (1 hunks)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml (1 hunks)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml (1 hunks)
packages/traceloop-sdk/tests/dataset/test_attachments.py (1 hunks)
packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4 hunks)

✅ Files skipped from review due to trivial changes (1)

packages/traceloop-sdk/README.md

🧰 Additional context used

📓 Path-based instructions (2)

**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml
packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yaml
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_external_attachment.yaml

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py
packages/traceloop-sdk/tests/dataset/test_attachments.py

🧬 Code graph analysis (1)

packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py (2)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)

ColumnType (7-12)

DatasetMetadata (150-158)

FileCellType (15-19)

FileStorageType (22-24)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)

Attachment (24-274)

ExternalAttachment (277-361)

AttachmentReference (364-478)

🪛 Checkov (3.2.334)

packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml

[high] 85-86: AWS Access Key

(CKV_SECRET_2)

packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yaml

[high] 85-86: AWS Access Key

(CKV_SECRET_2)

packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml

[high] 86-87: AWS Access Key

(CKV_SECRET_2)

🪛 Flake8 (7.3.0)

packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py

[error] 6-6: 'unittest.mock.Mock' imported but unused

(F401)

[error] 7-7: 'traceloop.sdk.dataset.FileStorageType' imported but unused

(F401)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

[error] 287-287: 'copy' imported but unused

(F401)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py

[error] 11-11: 'datetime.datetime' imported but unused

(F401)

packages/traceloop-sdk/tests/dataset/test_attachments.py

[error] 6-6: 'unittest.mock.Mock' imported but unused

(F401)

🪛 Gitleaks (8.29.0)

packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml

[high] 85-85: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml

[high] 86-86: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

[high] 202-202: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

345-345: Do not catch blind exception: Exception

(BLE001)

packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py

8-20: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py

67-67: Avoid specifying long messages outside the exception class

(TRY003)

69-69: Avoid specifying long messages outside the exception class

(TRY003)

121-121: Avoid specifying long messages outside the exception class

(TRY003)

125-125: Avoid specifying long messages outside the exception class

(TRY003)

174-174: Create your own exception

(TRY002)

174-174: Avoid specifying long messages outside the exception class

(TRY003)

221-221: Create your own exception

(TRY002)

221-221: Avoid specifying long messages outside the exception class

(TRY003)

230-230: Probable use of requests call without timeout

(S113)

231-231: Consider moving this statement to an else block

(TRY300)

232-232: Do not catch blind exception: Exception

(BLE001)

249-249: Probable use of requests call without timeout

(S113)

250-250: Consider moving this statement to an else block

(TRY300)

251-251: Do not catch blind exception: Exception

(BLE001)

272-272: Create your own exception

(TRY002)

272-272: Avoid specifying long messages outside the exception class

(TRY003)

352-352: Create your own exception

(TRY002)

352-352: Avoid specifying long messages outside the exception class

(TRY003)

428-428: Create your own exception

(TRY002)

428-428: Avoid specifying long messages outside the exception class

(TRY003)

430-430: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Packages (3.10)
GitHub Check: Test Packages (3.12)
GitHub Check: Lint
GitHub Check: Test Packages (3.11)
GitHub Check: Build Packages (3.11)

🔇 Additional comments (6)

packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_external_attachment.yaml (1)

1-242: LGTM! Clean cassette with no sensitive data.

The VCR cassette correctly captures the external attachment workflow without exposing secrets or PII. The example YouTube URLs and test metadata are appropriate for test fixtures.

packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml (1)

1-245: LGTM! External attachments cassette is clean.

The cassette correctly captures external attachment workflows using only external URLs without any sensitive credentials or PII.

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)

61-103: Well-structured attachment workflow!

The implementation correctly separates concerns: extract attachments, clean the request, create the dataset, then process attachments. The docstring with example usage is helpful.

254-269: LGTM! Clean attachment extraction logic.

The nested dictionary structure correctly maps row indices to column slugs to attachment objects, making it easy to process attachments after dataset creation.

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (2)

24-99: Excellent validation and initialization logic!

The constructor properly validates mutual exclusivity of file_path and data, and intelligently determines filename, content type, and file type with sensible defaults.

277-361: Clean and straightforward external attachment implementation!

The ExternalAttachment class provides a simple, declarative API for linking external URLs. The implementation correctly delegates to the backend API for validation and storage.

...assettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml

packages/traceloop-sdk/tests/dataset/test_attachments.py

packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

…matting in attachment tests

ellipsis-dev

Important

Looks good to me! 👍

Reviewed c284acf in 1 minute and 17 seconds. Click for details.

Reviewed 191 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/README.md:17

Draft comment:
The attachments documentation was completely removed from the README. Since the feature adds file cell support, please ensure that updated documentation or a link to external docs is provided.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. packages/traceloop-sdk/README.md:185

Draft comment:
Please add a newline at the end of the file to adhere to best practices.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_sdTbQh8kYcjnKsRj

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 469700c in 1 minute and 14 seconds. Click for details.

Reviewed 100 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/tests/dataset/test_attachments.py:6

Draft comment:
Removed unused 'Mock' import; only 'patch' is used.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, as it simply states that an unused import was removed. It doesn't provide a suggestion or ask for confirmation about a specific change. According to the rules, purely informative comments should be removed.

2. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:6

Draft comment:
Removed unused 'Mock' and 'FileStorageType' imports for cleaner test code.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, stating that unused imports were removed for cleaner code. It doesn't provide any actionable feedback or suggestions for improvement.

3. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:7

Draft comment:
Removed unused 'datetime' import to reduce clutter.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:88

Draft comment:
Refactored content type assignment with explicit parenthesization for better readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

5. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:168

Draft comment:
Improved formatting of _confirm_upload() call parameters for consistency and clarity.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

6. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:182

Draft comment:
Standardized multi-line formatting in _confirm_upload() call with metadata for improved readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_eiiyOan97ZWggHaR

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 01d0f22 in 1 minute and 47 seconds. Click for details.

Reviewed 89 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 8 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:1

Draft comment:
Reordered typing imports for improved readability. This is a pure style change.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:10

Draft comment:
Grouped and re-ordered imports (HTTPClient, Attachment, ExternalAttachment, Dataset) for clarity; duplicate/remnant imports removed.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:251

Draft comment:
Refactored _extract_attachments signature to a multi-line format for enhanced readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

4. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:287

Draft comment:
Inserted a blank line after 'import copy' in _prepare_request_for_creation to improve code clarity.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

5. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:304

Draft comment:
Added a trailing comma in the CreateDatasetRequest call for consistency and easier future diffing.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

6. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:321

Draft comment:
Reformatted the warning print for out-of-range row indexes into a multi-line format. Consider using a logging framework for better control over warning outputs.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

7. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:348

Draft comment:
Reformatted the print in the exception handling block to enhance readability. Also, consider replacing print with a logging library in production.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

8. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:359

Draft comment:
Added a trailing comma in the error dictionary literal to conform with common style guidelines.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_aM3JOjEEDUDZxk7I

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed cc0c5b6 in 2 minutes and 1 seconds. Click for details.

Reviewed 126 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:101

Draft comment:
Consider using a context manager or a fixture for creating temporary files to ensure cleanup even if a test fails.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:39

Draft comment:
Consider adding assertions to verify that the metadata provided in ExternalAttachment is correctly processed.
Reason this comment was not posted:
Comment was on unchanged code.

3. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:136

Draft comment:
It might be beneficial to add a test case for S3 upload failure to ensure graceful error handling.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_TaBldfGZcToFyriU

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed fd39dfe in 46 seconds. Click for details.

Reviewed 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:286

Draft comment:
Good removal of unused 'import copy'. Consider updating the comment to indicate that a shallow copy is being built, to avoid confusion if a true deep copy is required.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_OLwGKWOfyqH6mFeY

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)

320-322: Use logging instead of print() for diagnostics.

This warning should use a proper logging mechanism (e.g., logging.warning) for better production diagnostics and log aggregation.

As per past review comments, this has already been flagged.

349-351: Use logging instead of print() for diagnostics.

This warning should use a proper logging mechanism (e.g., logging.warning) for better production diagnostics and log aggregation.

As per past review comments, this has already been flagged.

🧹 Nitpick comments (1)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)

288-297: Consider using copy.deepcopy for truly nested-safe copying.

The manual iteration creates a shallow copy for non-attachment values. If row values contain nested dicts or lists (e.g., {"col": {"nested": "value"}}), the nested objects would be shared between the original and cleaned request. This could lead to unintended mutations if the original request is modified elsewhere.

If nested values are possible, apply this diff:

+        import copy
+
         # Create a deep copy of rows to avoid modifying the original
-        clean_rows = []
-        for row in request.rows:
-            clean_row = {}
-            for col_slug, value in row.items():
-                if isinstance(value, (Attachment, ExternalAttachment)):
-                    clean_row[col_slug] = None
-                else:
-                    clean_row[col_slug] = value
-            clean_rows.append(clean_row)
+        clean_rows = copy.deepcopy(request.rows)
+        for row in clean_rows:
+            for col_slug, value in list(row.items()):
+                if isinstance(value, (Attachment, ExternalAttachment)):
+                    row[col_slug] = None

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cc0c5b6 and fd39dfe.

📒 Files selected for processing (1)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

348-348: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Packages (3.11)
GitHub Check: Build Packages (3.11)
GitHub Check: Test Packages (3.10)
GitHub Check: Lint
GitHub Check: Test Packages (3.12)

🔇 Additional comments (4)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)

3-23: LGTM!

The new imports are clean and all are used by the attachment handling functionality added in this file.

61-103: LGTM!

The three-phase workflow (extract, create, post-process) is well-structured and the comprehensive docstring with example usage is helpful.

254-271: LGTM!

The extraction logic correctly identifies and collects attachment objects from row values.

328-357: Broad exception catch is acceptable for graceful degradation.

While static analysis flags the broad Exception catch at line 348, it's appropriate here—you want to prevent one attachment failure from breaking the entire dataset creation. The error state is properly recorded in row values (lines 352-357) for debugging.

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 443c64f in 2 minutes and 38 seconds. Click for details.

Reviewed 1857 lines of code in 7 files
Skipped 0 files when reviewing.
Skipped posting 5 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:32

Draft comment:
Consider supporting file-like objects (e.g. BinaryIO) along with bytes for the 'data' parameter. Removing BinaryIO may limit use cases with streaming or large file uploads.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This comment is making a suggestion about API design - that supporting BinaryIO would be beneficial for certain use cases. However, looking at the rules, I need to consider: 1) Is this about a code change? Yes, the type was explicitly changed. 2) Is this clearly a code change required? This is more of a "consider" suggestion, not a definitive issue. 3) Is this speculative? The comment says "may limit use cases" which is somewhat speculative. 4) The PR appears to be a simplification (as noted in the docstring change "Simplified implementation"), so removing BinaryIO support might be intentional. The comment doesn't point to a clear bug or problem, just suggests reconsidering a design decision that was deliberately made. The author explicitly simplified this code and removed BinaryIO support, which suggests this was an intentional design decision. The comment is asking them to reconsider that decision without strong evidence that it's wrong. This falls into the category of "speculative" comments about potential use cases rather than identifying a concrete problem. While the removal was likely intentional, the comment does raise a valid point about API capabilities. However, without evidence that BinaryIO support is actually needed (e.g., existing code that breaks, or a clear requirement), this is more of a design preference suggestion rather than identifying a clear issue that needs to be fixed. This comment should be deleted. It's a "consider" suggestion about a design decision that was deliberately made as part of simplification. There's no strong evidence that BinaryIO support is required, and the comment is speculative about "may limit use cases" without concrete examples.

2. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:138

Draft comment:
Thumbnail upload logic now only uses 'thumbnail_path'. The 'thumbnail_data' parameter is ignored. Consider supporting in-memory thumbnail data as well.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This appears to be a valid observation - the code accepts thumbnail_data as a parameter but never uses it. However, I need to consider the rules: 1) This is about code that was changed (the thumbnail upload logic was simplified), 2) The comment is suggesting a feature addition ("Consider supporting...") rather than pointing out a bug, 3) The PR seems to be intentionally simplifying the code (removing BinaryIO support, removing examples, etc.), so ignoring thumbnail_data might be intentional, 4) The comment doesn't clearly indicate a required code change - it says "Consider supporting" which is speculative/suggestive rather than definitive. The comment might be pointing out intentional simplification rather than an oversight. The PR appears to be deliberately simplifying the implementation (removing BinaryIO support, removing helper methods, etc.). The author may have intentionally removed thumbnail_data support while keeping the parameter for backward compatibility or future use. While the simplification might be intentional, accepting a parameter that is completely ignored is poor API design and could confuse users. However, the comment uses "Consider supporting" language which makes it a suggestion rather than identifying a clear bug. According to the rules, I should not keep speculative or suggestive comments that don't clearly require a code change. The comment is suggestive ("Consider supporting...") rather than identifying a definite issue. While it's true that thumbnail_data is ignored, this appears to be part of an intentional simplification. The comment doesn't clearly indicate required action and reads more as a feature request than a bug report.

3. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:268

Draft comment:
The get_url() method for internal storage always returns None. Consider adding a TODO note or implementing presigned URL generation for internal attachments.
Reason this comment was not posted:
Comment looked like it was already resolved.

4. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:147

Draft comment:
The upload confirmation call (HTTP PUT) doesn't validate the response. It might be beneficial to check the server response or handle errors more explicitly.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment uses language like "might be beneficial" which is speculative and suggestive rather than definitive. According to the rules, I should not keep comments that are speculative or that don't clearly require a code change. The comment doesn't say there's definitely a bug or problem - it just suggests it "might be beneficial" to add validation. This is a code quality suggestion but not a clear requirement. Looking at the diff, the previous version had a _confirm_upload method that did check result.get("success", False), but the new simplified version removes this check. However, the comment doesn't point out that functionality was lost - it just makes a general suggestion about validation. The rules state "Do NOT comment unless there is clearly a code change required" and this seems more like an optional improvement suggestion. However, looking at the diff more carefully, the old code DID validate the response with return result.get("success", False) and would raise an exception if result was None. The new code removes this validation entirely. So this might actually be pointing out that error handling was removed, which could be a legitimate issue rather than just a speculative suggestion. While the old code had validation, the PR author intentionally simplified this code (as stated in the docstring "Simplified implementation"). The fact that they removed the validation suggests it was intentional. The comment doesn't definitively say this is wrong - it uses weak language like "might be beneficial". Without strong evidence that this is definitely a bug or problem, and given the speculative nature of the comment, it should be deleted per the rules. The comment should be deleted because it uses speculative language ("might be beneficial") and doesn't clearly identify a required code change. While validation was removed from the previous version, the simplified implementation appears intentional, and the comment doesn't provide strong evidence that this is definitely problematic.

5. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:166

Draft comment:
Consider logging exceptions in _upload_to_s3 instead of silently returning False. This could help diagnose issues during file uploads.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is suggesting a code quality improvement - adding logging instead of silently swallowing exceptions. However, looking at the diff, the exception handling pattern in _upload_to_s3 wasn't actually changed - it existed before and still exists after. The diff shows the method was simplified (removed file-like object handling), but the try/except pattern returning False on exception was present in both versions. Since the comment is about code that wasn't changed in this diff, it violates the rule that comments should be about changes made by the diff. While the suggestion to add logging is reasonable from a code quality perspective, I need to verify whether this exception handling pattern was actually introduced or modified in this diff. If it was pre-existing and unchanged, the comment should be deleted per the rules. Looking at the diff more carefully, the _upload_to_s3 method's exception handling appears in both the old and new versions. The diff shows the method was refactored but the except Exception: return False pattern was not introduced or modified in this change - it was already there. Therefore, this comment is about unchanged code and should be deleted. The comment should be deleted because it's about code that wasn't changed in this diff. The exception handling pattern in _upload_to_s3 existed before this PR and remains the same after.

Workflow ID: wflow_fvA1MYY4b1ZaohJC

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (2)
109-179: Add timeouts and tighten error handling for upload and thumbnail requests

The requests.put calls for the main file upload (Line 173) and thumbnail upload (Line 150) have no timeout, so a hung S3/presigned endpoint can block indefinitely. Static analysis (S113) and prior review comments already flagged this. Also, _upload_to_s3 swallows all exceptions with except Exception, losing the root cause.

I recommend:

Add explicit timeouts for both uploads (e.g., 300s for main file, 60s for thumbnail).

Narrow the exception handler in _upload_to_s3 to requests.RequestException so file/OS errors still surface.

Example diff:
-            if thumb_bytes is not None:
-                requests.put(upload_response.thumbnail_upload_url, data=thumb_bytes)
+            if thumb_bytes is not None:
+                requests.put(
+                    upload_response.thumbnail_upload_url,
+                    data=thumb_bytes,
+                    timeout=60,
+                )

@@
-        try:
-            file_data = self._get_file_data()
-            response = requests.put(
-                upload_url, data=file_data, headers={"Content-Type": self.content_type}
-            )
-            return response.status_code in [200, 201, 204]
-        except Exception:
-            return False
+        try:
+            file_data = self._get_file_data()
+            response = requests.put(
+                upload_url,
+                data=file_data,
+                headers={"Content-Type": self.content_type},
+                timeout=300,
+            )
+            return response.status_code in [200, 201, 204]
+        except requests.RequestException:
+            return False
Optionally, you might also check the result of the upload-status http_client.put and raise if it fails, so callers are aware when confirmation does not persist.

255-264: Add timeout and consider a dedicated helper/exception for downloads

The data property issues a raw requests.get(download_url) without a timeout (Line 261), so a stuck endpoint can hang indefinitely. This was previously flagged for _download_data; the same concern applies here, and Ruff reports S113.

Suggested change:
-            response = requests.get(download_url)
+            response = requests.get(download_url, timeout=300)
             response.raise_for_status()
             self._cached_data = response.content
You might also want to:

Extract this into a small _download_data() helper (as you previously had) to centralize future tweaks (retries, streaming, etc.).

Replace raise Exception("No download URL available") with a more specific custom exception (e.g., AttachmentError) to make error handling clearer.

🧹 Nitpick comments (3)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)

90-107: Consider streaming large files instead of fully loading into memory

_get_file_data and _get_file_size read the entire file into memory (f.read() / len(self.data)), which is fine for small/medium attachments but can become costly for large files. A future improvement would be to support streaming (e.g., passing a file-like object/chunk iterator to the upload path) to avoid holding the whole file in memory at once. This can be deferred if datasets are expected to carry relatively small blobs.

186-227: Clarify use of content_type/filename and server response in ExternalAttachment.attach

A couple of minor design points here:

content_type and filename are stored on the instance but never sent to the API or surfaced in the returned AttachmentReference. If those fields are meant to influence server-side behavior or downstream display, consider wiring them into metadata or the request model; otherwise they are effectively unused.

The attach method only checks that result is truthy and then ignores any details the server might return (e.g., a storage key or normalized URL). If the API ever exposes richer information, you may want to capture it in the AttachmentReference instead of just echoing self.url and self.metadata.

These are not blockers but worth aligning with how you expect external attachments to be represented long-term.

276-281: Internal attachment downloads are currently unsupported; consider making this explicit

get_url returns the external URL for FileStorageType.EXTERNAL, but always None for internal storage, which makes AttachmentReference.data unusable for internal attachments created by Attachment.upload (it will raise “No download URL available”).

If that’s intentional for now, consider:

Documenting explicitly that AttachmentReference.data/download() only work for external attachments today, or

Raising a more explicit NotImplementedError/custom exception for INTERNAL storage, or

Implementing presigned URL generation for internal storage using storage_key and http_client when available.

This will make the API behavior less surprising for callers who obtain an AttachmentReference from Attachment.upload.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3543f6a and cee7fb7.

📒 Files selected for processing (1)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py

🧬 Code graph analysis (1)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (2)

packages/traceloop-sdk/traceloop/sdk/client/http.py (2)

HTTPClient (7-91)

put (74-91)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (6)

ExternalURLRequest (69-72)

FileCellType (16-20)

FileStorageType (23-25)

UploadStatusRequest (64-66)

UploadURLRequest (47-52)

UploadURLResponse (55-61)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py

44-44: Avoid specifying long messages outside the exception class

(TRY003)

46-46: Avoid specifying long messages outside the exception class

(TRY003)

96-96: Avoid specifying long messages outside the exception class

(TRY003)

99-99: Avoid specifying long messages outside the exception class

(TRY003)

132-132: Create your own exception

(TRY002)

132-132: Avoid specifying long messages outside the exception class

(TRY003)

138-138: Create your own exception

(TRY002)

138-138: Avoid specifying long messages outside the exception class

(TRY003)

150-150: Probable use of requests call without timeout

(S113)

173-173: Probable use of requests call without timeout

(S113)

176-176: Consider moving this statement to an else block

(TRY300)

177-177: Do not catch blind exception: Exception

(BLE001)

220-220: Create your own exception

(TRY002)

220-220: Avoid specifying long messages outside the exception class

(TRY003)

260-260: Create your own exception

(TRY002)

260-260: Avoid specifying long messages outside the exception class

(TRY003)

261-261: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Build Packages (3.11)
GitHub Check: Lint

🔇 Additional comments (1)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (1)

30-78: Constructor validation and type inference look good

The mutual exclusion and required checks for file_path vs data, plus filename/content-type and FileCellType inference, are coherent and defensive. No changes needed here.

nirga

Add sample app

nirga · 2025-11-25T08:23:48Z

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

+                        "metadata": ref.metadata,
+                    }
+                except Exception as e:
+                    print(


do not print, use logger

Wait for doron

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 77b8c26 in 57 seconds. Click for details.

Reviewed 363 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/examples/dataset_attachments_example.py:32

Draft comment:
Consider centralizing the SDK initialization (Traceloop.init) in main() rather than only in one example, to ensure consistent setup across all examples.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

2. packages/traceloop-sdk/examples/dataset_attachments_example.py:109

Draft comment:
Consider using a context manager for temporary files, if possible, to manage cleanup automatically rather than relying on manual os.unlink calls.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

3. packages/traceloop-sdk/examples/dataset_attachments_example.py:180

Draft comment:
When cleaning up temporary files (using os.unlink), consider checking for file existence to gracefully handle any unexpected cleanup errors.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_ypNf8Db7jyOhg3dy

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

packages/traceloop-sdk/examples/dataset_attachments_example.py (3)
1-10: Align shebang usage with how the script is invoked

Ruff flags EXE001 because the file has a shebang but (in the repo) isn’t executable. If this script is meant to be run as python dataset_attachments_example.py, consider dropping the shebang; if you want ./dataset_attachments_example.py, make sure the file is committed with the executable bit.

Example change if you choose to remove it:
-#!/usr/bin/env python3
28-35: Consider centralizing Traceloop.init in main() instead of inside one example

Right now only example_external_attachments() calls Traceloop.init, so calling other example_* functions directly (e.g., from a REPL) will likely fail if Traceloop wasn’t initialized elsewhere. Initializing once in main() and assuming it in the examples would be more robust and clearer.

Suggested change inside example_external_attachments:
-    # Initialize Traceloop
-    Traceloop.init(app_name="attachment-demo")
-    datasets = Datasets()
+    # Datasets client (assumes Traceloop.init(...) was called by the caller)
+    datasets = Datasets()
And then in main() (outside this hunk), initialize once:
def main():
    """Run all examples."""
    print("=" * 60)
    print("Traceloop SDK Attachment Feature Examples")
    print("=" * 60)

    Traceloop.init(app_name="attachment-demo")
    ...
334-336: Document API key configuration without implying hardcoding in code

To better align with the “no hardcoded secrets” guideline, it’s clearer to show setting the API key via the shell environment rather than via os.environ in code, even in comments.
-    # Set your API key
-    # os.environ["TRACELOOP_API_KEY"] = "your-api-key-here"
+    # Ensure TRACELOOP_API_KEY is set in your environment, e.g.:
+    # export TRACELOOP_API_KEY="your-api-key-here"
As per coding guidelines, this keeps secrets configuration outside the code.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cee7fb7 and 77b8c26.

📒 Files selected for processing (1)

packages/traceloop-sdk/examples/dataset_attachments_example.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/examples/dataset_attachments_example.py

🧬 Code graph analysis (1)

packages/traceloop-sdk/examples/dataset_attachments_example.py (2)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)

Attachment (24-178)

ExternalAttachment (181-227)

data (255-264)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)

FileCellType (16-20)

ColumnDefinition (75-78)

ColumnType (8-13)

CreateDatasetRequest (84-89)

🪛 Flake8 (7.3.0)

packages/traceloop-sdk/examples/dataset_attachments_example.py

[error] 239-239: f-string is missing placeholders

(F541)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/examples/dataset_attachments_example.py

1-1: Shebang is present but file is not executable

(EXE001)

239-239: f-string without any placeholders

Remove extraneous f prefix

(F541)

351-351: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Packages (3.12)
GitHub Check: Test Packages (3.10)
GitHub Check: Test Packages (3.11)
GitHub Check: Lint
GitHub Check: Build Packages (3.11)

🔇 Additional comments (1)

packages/traceloop-sdk/examples/dataset_attachments_example.py (1)

28-310: Example workflows look consistent with the attachment API

The four example_* functions use ColumnDefinition, ColumnType.FILE, Attachment, ExternalAttachment, and FileCellType in a way that’s consistent with the dataset and attachment models, and the column slugs/row keys line up correctly. No functional issues jump out here.

packages/traceloop-sdk/examples/dataset_attachments_example.py

packages/sample-app/sample_app/dataset_attachments_example.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 65c207d in 1 minute and 11 seconds. Click for details.

Reviewed 40 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/examples/dataset_attachments_example.py:23

Draft comment:
Reordering the import of 'Datasets' is fine, but please ensure consistent grouping of related imports per project style guidelines.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

2. packages/traceloop-sdk/examples/dataset_attachments_example.py:184

Draft comment:
The header print for 'In-Memory Data Attachments' has been removed. For consistency with other examples, consider re-adding a header to clearly delineate this example.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

3. packages/traceloop-sdk/examples/dataset_attachments_example.py:236

Draft comment:
Changed from an f-string to a plain string for printing attachment status. Since no interpolation is needed, this change is purely stylistic.
Reason this comment was not posted:
Confidence changes required: 10% <= threshold 50% None

Workflow ID: wflow_9FXxNmCd5xEXc87u

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

packages/traceloop-sdk/examples/dataset_attachments_example.py (1)
339-352: Broad except Exception in main still trips BLE001 (repeat of prior feedback)

The catch‑all around the example runner is reasonable for UX, but it continues to trigger Ruff BLE001 and was already flagged in an earlier review.

To keep the friendly error while satisfying lint, you can explicitly document/suppress the broad catch:
-    except Exception as e:
-        print(f"\nError: {e}")
-        print("Make sure to set TRACELOOP_API_KEY environment variable")
+    except Exception as e:  # noqa: BLE001  # Broad catch is intentional for this example script
+        print(f"\nError: {e}")
+        print("Make sure to set TRACELOOP_API_KEY environment variable")
If the SDK exposes specific exception types for auth/config errors, narrowing the except to those would be even better.

🧹 Nitpick comments (3)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
276-309: Preserve future CreateDatasetRequest fields when cleaning rows

Right now _prepare_request_for_creation reconstructs CreateDatasetRequest by manually listing fields. If new fields are later added to CreateDatasetRequest (e.g., tags, options), they’ll be silently dropped here.

You can make this future‑proof by cloning the model and only updating rows:
-        # Create a new request with cleaned rows
-        return CreateDatasetRequest(
-            slug=request.slug,
-            name=request.name,
-            description=request.description,
-            columns=request.columns,
-            rows=clean_rows,
-        )
+        # Create a new request with cleaned rows, preserving any additional fields
+        return request.model_copy(update={"rows": clean_rows})
+        # If you are still on Pydantic < 2, use instead:
+        # return request.copy(update={"rows": clean_rows})
Please pick model_copy vs copy according to the Pydantic version actually used in this project.

311-360: Tighten or explicitly document the broad Exception catch in _process_attachments

Catching bare Exception here triggers Ruff BLE001 and can hide unexpected programmer errors, even though you likely want best‑effort behavior per cell.

A balanced approach is to (a) keep the catch‑all for UX, (b) use structured logging rather than an f-string, and (c) explicitly mark the broad catch as intentional:
-                except Exception as e:
-                    logger.warning(
-                        f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}"
-                    )
+                except Exception as e:  # noqa: BLE001  # Broad catch is intentional: attachment failures shouldn't break dataset creation
+                    logger.warning(
+                        "Failed to process attachment for row %s, column %s: %s",
+                        row_idx,
+                        col_slug,
+                        e,
+                    )
If the attachment methods expose well-defined exception types, narrowing the except clause to those would be even better; otherwise the above makes the catch‑all explicit and lint‑clean.
packages/traceloop-sdk/examples/dataset_attachments_example.py (1)
1-1: Address shebang vs executability (EXE001)

Ruff flags the shebang because this file isn’t marked executable. For a library example that’s typically run via python dataset_attachments_example.py, it’s simplest to drop the shebang:
-#!/usr/bin/env python3
Alternatively, ensure the file is installed with executable permissions so the shebang is meaningful.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 77b8c26 and 65c207d.

📒 Files selected for processing (2)

packages/traceloop-sdk/examples/dataset_attachments_example.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
packages/traceloop-sdk/examples/dataset_attachments_example.py

🧬 Code graph analysis (1)

packages/traceloop-sdk/examples/dataset_attachments_example.py (4)

packages/traceloop-sdk/traceloop/sdk/__init__.py (1)

init (49-206)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)

Attachment (24-178)

ExternalAttachment (181-227)

data (255-264)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)

FileCellType (16-20)

ColumnDefinition (75-78)

ColumnType (8-13)

CreateDatasetRequest (84-89)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)

Datasets (29-360)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

351-351: Do not catch blind exception: Exception

(BLE001)

packages/traceloop-sdk/examples/dataset_attachments_example.py

1-1: Shebang is present but file is not executable

(EXE001)

350-350: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Build Packages (3.11)
GitHub Check: Test Packages (3.10)
GitHub Check: Lint
GitHub Check: Test Packages (3.11)
GitHub Check: Test Packages (3.12)

🔇 Additional comments (3)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (3)

2-26: Imports and logger wiring look correct

Using logging plus a module-level logger and importing Attachment/ExternalAttachment and ValuesMap cleanly set up the rest of the attachment-aware workflow. No issues here.

64-106: Attachment-aware create() flow is sound

The create() method cleanly separates concerns: it extracts attachments, sends a “clean” CreateDatasetRequest without attachment objects, then post-processes the created Dataset to upload/link attachments and update row values. This preserves server-side validation and keeps attachment failures localized.

257-275: Attachment extraction logic is straightforward and correct

_extract_attachments correctly walks request.rows, capturing any Attachment/ExternalAttachment instances into a {row_idx: {col_slug: attachment}} map without mutating the original request. This matches how _process_attachments later consumes the structure.

packages/sample-app/sample_app/dataset_attachments_example.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
282-315: Consider using the request model’s copy functionality instead of rebuilding by hand.

Manually reconstructing CreateDatasetRequest with only slug, name, description, columns, and rows makes _prepare_request_for_creation brittle if the model gains new fields (e.g., tags, settings) in the future. You can future‑proof this by cloning and updating just rows via the Pydantic copy API instead of re‑passing individual fields.

For example (adapt to your Pydantic version):
# Pydantic v1 style:
clean_request = request.copy(update={"rows": clean_rows})

# or Pydantic v2 style:
clean_request = request.model_copy(update={"rows": clean_rows})
This keeps all other fields intact while still stripping attachments from rows.

317-366: Align broad except Exception with Ruff BLE001 and improve logging detail.

Catching a bare Exception here is intentional (best‑effort attachment processing), but Ruff flags this as BLE001. Making this explicit and using parameterized logging would both satisfy the linter and improve diagnostics.

One option that preserves behavior:
-                except Exception as e:
-                    logger.warning(
-                        f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}"
-                    )
+                except Exception as e:  # noqa: BLE001
+                    logger.warning(
+                        "Failed to process attachment for row %s, column %s: %s",
+                        row_idx,
+                        col_slug,
+                        e,
+                    )
This keeps the “catch‑all” semantics while making the linter suppression explicit and using logger‑style formatting. If you know the concrete exception types upload/attach can raise, narrowing the except to those would be even better.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 65c207d and cd925a4.

📒 Files selected for processing (1)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

🪛 Flake8 (7.3.0)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

[error] 5-5: redefinition of unused 'List' from line 3

(F811)

[error] 5-5: redefinition of unused 'Optional' from line 3

(F811)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

5-5: Redefinition of unused List from line 3

Remove definition: List

(F811)

5-5: Redefinition of unused Optional from line 3

Remove definition: Optional

(F811)

357-357: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Packages (3.12)
GitHub Check: Test Packages (3.10)
GitHub Check: Build Packages (3.11)
GitHub Check: Test Packages (3.11)
GitHub Check: Lint

🔇 Additional comments (2)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)

65-107: Datasets.create attachment flow looks coherent and side‑effect free.

The sequence of extracting attachments, building a cleaned request, creating the dataset, and then post‑processing attachments is logically sound and leaves the original CreateDatasetRequest unmodified. Using the new helpers keeps the method readable.

263-280: Attachment extraction helper is clear and non‑mutating.

_extract_attachments correctly walks request.rows, identifies Attachment/ExternalAttachment instances, and builds a {row_idx: {col_slug: attachment}} map without mutating the original rows. This matches how _process_attachments later consumes the structure.

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed e08cf4b in 41 seconds. Click for details.

Reviewed 14 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:3

Draft comment:
Good consolidation of duplicate 'typing' imports into a single statement. Consider ordering the imported names alphabetically for consistency.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_zRYJMHIXbQxQuJuz

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
262-279: LGTM! Attachment extraction logic is sound.

The method correctly identifies and extracts Attachment and ExternalAttachment objects from row values.

Optional: Consider a more specific return type for improved type safety:
-    ) -> Dict[int, Dict[str, Any]]:
+    ) -> Dict[int, Dict[str, Union[Attachment, ExternalAttachment]]]:
You would need to add Union to the typing imports.

316-365: Consider more specific exception handling for better error visibility.

The broad Exception catch on line 356 enables resilient bulk processing (logging failures and continuing), but it may mask unexpected errors.

Consider catching more specific exceptions or adding an exception type check:
-                except Exception as e:
+                except (IOError, OSError, RuntimeError, ValueError) as e:
                     logger.warning(
                         f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}"
                     )
If you need to catch all exceptions for resilience, document the rationale:
-                except Exception as e:
+                except Exception as e:  # Catch all to continue processing remaining attachments
                     logger.warning(
                         f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}"
                     )
Alternatively, consider collecting and returning failed attachments so callers can decide how to handle them.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cd925a4 and e08cf4b.

📒 Files selected for processing (1)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

🧬 Code graph analysis (1)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)

packages/traceloop-sdk/traceloop/sdk/client/http.py (1)

HTTPClient (7-91)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (4)

Attachment (24-178)

ExternalAttachment (181-227)

upload (109-167)

attach (200-227)

packages/traceloop-sdk/traceloop/sdk/dataset/dataset.py (2)

Dataset (20-112)

from_create_dataset_response (39-52)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)

ColumnDefinition (75-78)

ColumnType (8-13)

CreateDatasetRequest (84-89)

CreateDatasetResponse (99-108)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

356-356: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Lint
GitHub Check: Test Packages (3.11)
GitHub Check: Test Packages (3.12)
GitHub Check: Test Packages (3.10)
GitHub Check: Build Packages (3.11)

🔇 Additional comments (3)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (3)

2-26: LGTM! Import consolidation and logger initialization look good.

The duplicate typing imports have been properly fixed, and logger initialization follows Python best practices.

64-106: LGTM! Well-structured dataset creation flow with attachment support.

The orchestration logic cleanly separates concerns: extract attachments, sanitize the request, create the dataset, then process attachments. The comprehensive docstring with example aids maintainability.

281-314: LGTM! Request sanitization preserves immutability and correctly replaces attachments.

The manual deep copy approach is appropriate, and the method leaves the original request unmodified while preparing a clean version for dataset creation.

…g in attachment and datasets modules

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 5cee01d in 1 minute and 12 seconds. Click for details.

Reviewed 62 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 5 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:249

Draft comment:
Added explicit type annotation for _cached_data for clarity.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:265

Draft comment:
Renamed local variable to 'file_data' in download() for clearer intent.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:271

Draft comment:
Added explicit type annotation for 'attachments' in _extract_attachments.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

4. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:297

Draft comment:
Explicitly typed 'clean_row' in _prepare_request_for_creation.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

5. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:323

Draft comment:
Added guard in _process_attachments to immediately exit if dataset.rows is empty.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_qsHKx3HMawwyLToo

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Skipped PR review on 9b00146 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (5)

packages/sample-app/sample_app/dataset_attachments_example.py (5)
350-352: Address broad except Exception for better error handling.

The broad exception catch makes it difficult to distinguish between different failure modes (e.g., TypeError from incorrect Datasets() instantiation vs. network errors vs. authentication failures).

For a demo script, if you intentionally want a catch-all for user experience, make it explicit:
-    except Exception as e:
+    except Exception as e:  # noqa: BLE001  # Broad catch is intentional for this example script
         print(f"\nError: {e}")
-        print("Make sure to set TRACELOOP_API_KEY environment variable")
+        print("Make sure TRACELOOP_API_KEY environment variable is set")
+        print("For detailed error information, run examples individually")
Alternatively, narrow to expected exception types:
except (TypeError, RuntimeError, requests.RequestException) as e:
    print(f"\nError: {e}")
    print("Make sure TRACELOOP_API_KEY environment variable is set")
34-35: Fix Datasets() instantiation - missing required HTTPClient parameter.

The Datasets class requires an HTTPClient parameter in its __init__ method (see packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py line 36). Calling Datasets() without arguments will raise TypeError: __init__() missing 1 required positional argument: 'http'.

The correct pattern is to use the datasets attribute from the initialized Traceloop client:

Apply this diff:
-    Traceloop.init(app_name="attachment-demo")
-    datasets = Datasets()
+    client = Traceloop.init(app_name="attachment-demo")
+    datasets = client.datasets
This same issue also affects lines 119, 188, and 251.

Based on learnings, the established pattern in the Traceloop SDK is to obtain properly initialized instances from the Client class rather than constructing them directly.

119-119: Fix Datasets() instantiation - missing required HTTPClient parameter.

Same issue as line 35. Replace with:
client = Traceloop.init(app_name="attachment-demo")
datasets = client.datasets
188-188: Fix Datasets() instantiation - missing required HTTPClient parameter.

Same issue as lines 35 and 119. Replace with:
client = Traceloop.init(app_name="attachment-demo")
datasets = client.datasets
251-251: Fix Datasets() instantiation - missing required HTTPClient parameter.

Same issue as lines 35, 119, and 188. Replace with:
client = Traceloop.init(app_name="attachment-demo")
datasets = client.datasets

🧹 Nitpick comments (2)

packages/sample-app/sample_app/dataset_attachments_example.py (2)
1-1: Make the file executable to match the shebang.

The shebang indicates this file is intended to be run directly, but the file lacks executable permissions.

Apply this command to fix:
chmod +x packages/sample-app/sample_app/dataset_attachments_example.py
186-189: Add missing example header print statement for consistency.

Unlike the other three examples (lines 31, 106, 243), this function doesn't print a header banner. This makes the console output less clear when running all examples.

Add this after the function definition:
 def example_in_memory_attachments():
     """Example: Creating a dataset with in-memory data attachments."""
+    print("\n=== Example 3: In-Memory Attachments ===")
+
     datasets = Datasets()
Also, fix the Datasets() instantiation issue (see separate comment).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5cee01d and 9b00146.

📒 Files selected for processing (1)

packages/sample-app/sample_app/dataset_attachments_example.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/sample-app/sample_app/dataset_attachments_example.py

🧠 Learnings (1)

📚 Learning: 2025-08-04T15:35:30.188Z

Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.

Applied to files:

packages/sample-app/sample_app/dataset_attachments_example.py

🧬 Code graph analysis (1)

packages/sample-app/sample_app/dataset_attachments_example.py (4)

packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)

Attachment (24-178)

ExternalAttachment (181-227)

data (255-264)

packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)

FileCellType (16-20)

ColumnDefinition (75-78)

ColumnType (8-13)

CreateDatasetRequest (84-89)

packages/traceloop-sdk/tests/conftest.py (1)

datasets (227-233)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)

Datasets (29-368)

🪛 Ruff (0.14.5)

packages/sample-app/sample_app/dataset_attachments_example.py

1-1: Shebang is present but file is not executable

(EXE001)

350-350: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Packages (3.12)
GitHub Check: Test Packages (3.11)
GitHub Check: Test Packages (3.10)
GitHub Check: Lint
GitHub Check: Build Packages (3.11)

🔇 Additional comments (2)

packages/sample-app/sample_app/dataset_attachments_example.py (2)

12-26: LGTM!

Imports are well-organized and follow security best practices with no hardcoded credentials.

355-356: LGTM!

Standard Python entry point pattern is correctly implemented.

…metry into gz/add-files-to-dataset

…dule and update imports accordingly

ellipsis-dev

Important

Looks good to me! 👍

Reviewed fb871d9 in 1 minute and 26 seconds. Click for details.

Reviewed 184 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/sample-app/sample_app/dataset_attachments_example.py:16

Draft comment:
Import paths updated to use the 'traceloop.sdk.datasets' module. This change is consistent with the new package structure.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. packages/sample-app/sample_app/dataset_example.py:10

Draft comment:
Updated import to use 'traceloop.sdk.datasets' instead of the old 'dataset' module. This ensures consistency with the new API structure.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py:8

Draft comment:
Imports for Attachment, ExternalAttachment, and FileCellType have been updated to the new datasets package. The changes look correct.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

4. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:297

Draft comment:
When preparing the request for creation (_prepare_request_for_creation), consider using a deep copy (e.g. via copy.deepcopy) to handle nested structures in row values, ensuring the original request remains fully unmodified.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

5. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:360

Draft comment:
In the attachment processing section (_process_attachments), consider using logger.exception(e) within the exception block to capture the full traceback, which can aid in debugging failures during attachment uploads.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

6. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:240

Draft comment:
The _slugify method works well; however, consider adding unit tests to cover edge cases (e.g., input with only special characters or excessive whitespace) to ensure robust slug generation.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_Y3jeBiARa9zDuyem

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (4)

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml (1)
85-85: Scrub AWS credentials from upload_url in cassette.

This is the same issue flagged in the previous review. The upload_url JSON field still embeds AWS credentials (X-Amz-Credential, X-Amz-Security-Token, X-Amz-Signature), which violates the coding guideline "Never commit secrets or PII in VCR cassettes." Even if expired, these must be replaced with clearly fake placeholders to pass security scans (Checkov, Gitleaks).

Apply this diff to scrub the sensitive values while preserving the URL structure and JSON deserialization:
-      string: '{"upload_url":"https://traceloop-staging-files.s3.amazonaws.com/c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=ASIAQEMAC2MSQIN6EEGS%2F20251124%2Fus-east-1%2Fs3%2Faws4_request\u0026X-Amz-Date=20251124T212508Z\u0026X-Amz-Expires=3600\u0026X-Amz-Security-Token=IQoJb3JpZ2luX2VjEJb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJGMEQCIH3%2Bh%2BsH0KvYJqXGuQ0GF0xk8%2BkRicgYH8Ltvo08DLW%2FAiAfcHu0cnrxCIkB7hd2UBATZ1ZSbOQq8L38mh6%2BkJf18Cr5BAheEAAaDDAwOTM5NTQyNjA4NSIMCJbAo9CFNseeyr20KtYEipuqeoYbzFZDx8VMzfnyRA58M7AkLLzZ14X6E8WNz3Fo%2BRbtOZeDaIgG6Q1vvqFtg%2B3JxRcNUyU3Q372FdJsfQZnWlI%2BpmZLykoja%2FasIHEz%2B8h9eTz8tl97XYnWafZTgVHFWD2SJXmFchA0rLVVyRi0YvG6gq4X02ns0Xgn718hzuBvSqbnJ82oa0RvhrAyKeGvCPQwxhghqyNpG0ntXxJzmkgezFZSelFnz7T4FwKXr8pKgEu%2BRCPBG%2BSHGDT1DhZ3sG2F5ORq8aJyj%2FKkGhENcyogtju9Kgaq8oVMJtV%2FrQEkxuftMFrnvaO9vR4e2tgk6p%2FJLv7akwI61%2Fn89pdSBjB0pklC4gGk5zBWwnPq814OhIGcCzmlf1gdD8k4DR1AUCy9Q2wGwnWScpsSEipr3r7i0kvXZIRlx2YVGXOnk0XJHKd32RM9wrpcFb0LSxkABelmMGsYll%2BTktuBkOKJidBLBjzE2kU4oKbbmtcv8FKyXTwXQALerpO54UVzuABh6gmLAWAZspWEcJ%2B2M8tC1n5GxFICqvhgXzFHL7x%2FilwJcjhH9I%2Bt28K7RGIqQpQfTffC2gJtUAUcohLFtmdahNQ6DHLuPiBSKk9XUFd8l5URLndJVNhzMos580RO%2FrcIHQv8FU95kEyrNuaCH6nxk3xC2ef97AyikYu%2BPrdfRDn2JPENF86x4q0Qu1FzErRBQX%2Ft6osQ9TVP3wdnswk41b9EQO4nxzikMp9h1MEoZUETf1nWpelQNxgva7PvCm7ci0NVDNrm20PfbIdNSmnaoYxFsjDykJPJBjqZASCiHdsuYM4lR%2FDje4Zp0IzfS2KDr8069je%2FGD91v%2FurCR5IQMToDFUyIiHRBZ5F%2FW93oSK6z3OpqOXoxyG2iDmmmY5VxR9iRNqjVFu%2FjZLZh53RetQgplscJRgnqD3q8ICez2qwQJY%2FcPwnVddN6CngNOz%2FPbjZ33eETP8V9IizhuKqBA4ExS75x2n50DpzQZei%2ByGCuvYeZw%3D%3D\u0026X-Amz-SignedHeaders=content-type%3Bhost\u0026X-Amz-Signature=e961a868ba46c0a58b13af2d7a2f56eaeec39c6f177f20050341ec45af00f1fb","storage_key":"c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg","expires_at":"2025-11-24T22:25:08.316243221Z","method":"PUT"}'
+      string: '{"upload_url":"https://traceloop-staging-files.s3.amazonaws.com/c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=REDACTED_CREDENTIAL%2F20251124%2Fus-east-1%2Fs3%2Faws4_request\u0026X-Amz-Date=20251124T212508Z\u0026X-Amz-Expires=3600\u0026X-Amz-Security-Token=REDACTED_TOKEN\u0026X-Amz-SignedHeaders=content-type%3Bhost\u0026X-Amz-Signature=REDACTED_SIGNATURE","storage_key":"c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg","expires_at":"2025-11-24T22:25:08.316243221Z","method":"PUT"}'
The diff replaces the actual AWS access credential, session token, and signature with clearly fake placeholders while preserving the JSON structure and URL encoding so test deserialization works unchanged.
packages/sample-app/sample_app/dataset_attachments_example.py (2)
350-352: Broad exception catch acceptable for demo script.

For example code, the catch-all provides a user-friendly fallback. If you want to silence the Ruff warning, add an inline comment.
-    except Exception as e:
+    except Exception as e:  # noqa: BLE001 - Catch-all for user-friendly error in demo
34-35: Datasets() instantiation will fail - requires HTTPClient argument.

The Datasets class requires an HTTPClient parameter in __init__, but this example instantiates it without arguments. This will raise TypeError at runtime.

As noted in a previous review (marked as addressed), you should use the pre-configured datasets attribute from the Traceloop client:
     # Initialize Traceloop
-    Traceloop.init(app_name="attachment-demo")
-    datasets = Datasets()
+    client = Traceloop.init(app_name="attachment-demo")
+    datasets = client.datasets
This same issue applies to lines 119, 188, and 251.
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml (1)

86-86: AWS credentials in presigned URLs should be scrubbed.

This cassette contains AWS credentials (X-Amz-Security-Token, X-Amz-Credential) in the presigned S3 URLs. Per coding guidelines for cassettes, sensitive data should never be committed.

Although a previous review indicated this was addressed, the credentials are still present. Please scrub these by replacing the query parameters with placeholders like ?<REDACTED> while preserving the storage_key, method, and expires_at fields.

Also applies to line 202.

🧹 Nitpick comments (7)

packages/traceloop-sdk/traceloop/sdk/datasets/model.py (1)

3-72: File/attachment models are well-structured; consider tightening status typing

The new file-related enums/models (FileCellType, FileStorageType, FileCellMetadata, FileCellValue, and upload/external URL request/response models) look consistent with the attachment workflows and should integrate cleanly with the rest of the SDK.

If you want stronger guarantees, you could replace the free-form str status fields in FileCellValue and UploadStatusRequest with a dedicated enum or Literal union (e.g., "in_progress" | "success" | "failed"), which would surface invalid states earlier while keeping JSON shapes the same.
packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py (1)
100-162: Consider using try/finally for reliable temp file cleanup.

If any assertion fails before lines 161-162, the temporary files won't be cleaned up. Using a try/finally block (as done in test_create_dataset_with_mixed_attachments) would ensure cleanup even on test failures.
+    try:
+        # Create dataset request with file attachments
+        dataset_request = CreateDatasetRequest(
             # ... existing code ...
+        )
+
+        # Mock the S3 upload to succeed
+        with patch.object(Attachment, "_upload_to_s3", return_value=True):
+            dataset = datasets.create(dataset_request)
+
+        # ... assertions ...
+    finally:
+        # Clean up
+        os.unlink(test_image.name)
+        os.unlink(test_pdf.name)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
329-334: Redundant "Warning:" prefix in log message.

The logger.warning() call already indicates this is a warning. Remove the "Warning:" prefix from the message.
-                logger.warning(
-                    f"Warning: Row index {row_idx} out of range, skipping attachments"
-                )
+                logger.warning(
+                    f"Row index {row_idx} out of range, skipping attachments"
+                )
359-367: Broad exception catch is acceptable here, but consider logging at debug level for the full traceback.

The except Exception (Ruff BLE001) is intentional for resilience—one failed attachment shouldn't prevent processing others. The warning log captures the error message, which is appropriate. Consider also logging the full exception at debug level for troubleshooting.
                 except Exception as e:
                     logger.warning(
-                        f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}"
+                        f"Failed to process attachment for row {row_idx}, column {col_slug}: {e}"
                     )
+                    logger.debug("Attachment processing exception details:", exc_info=True)
                     # Mark as failed in row values
                     row.values[col_slug] = {
                         "type": "file",
                         "status": "failed",
                         "error": str(e),
                     }
packages/sample-app/sample_app/dataset_attachments_example.py (1)

1-1: Shebang present but file is not executable.

Either remove the shebang or make the file executable with chmod +x.
packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py (1)
17-29: Optional: Sort __all__ for consistency.

Ruff suggests applying isort-style sorting to __all__. This is a minor style preference.
 __all__ = [
+    "Attachment",
+    "AttachmentReference",
+    "BaseDatasetEntity",
+    "Column",
+    "ColumnType",
     "Dataset",
-    "Column",
-    "Row",
-    "BaseDatasetEntity",
-    "ColumnType",
     "DatasetMetadata",
+    "ExternalAttachment",
     "FileCellType",
     "FileStorageType",
-    "Attachment",
-    "ExternalAttachment",
-    "AttachmentReference",
+    "Row",
 ]
packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (1)
194-198: Filename extraction may include query parameters.

url.split("/")[-1] doesn't strip query strings, so a URL like https://example.com/image.png?token=abc would yield image.png?token=abc as the filename.
-        self.filename = filename or url.split("/")[-1]
+        from urllib.parse import urlparse
+        self.filename = filename or urlparse(url).path.split("/")[-1] or "attachment"

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9b00146 and fb871d9.

📒 Files selected for processing (16)

packages/sample-app/sample_app/dataset_attachments_example.py (1 hunks)
packages/sample-app/sample_app/dataset_example.py (1 hunks)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml (1 hunks)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml (1 hunks)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml (1 hunks)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml (1 hunks)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_without_attachments.yaml (1 hunks)
packages/traceloop-sdk/tests/datasets/test_create_dataset.py (1 hunks)
packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py (1 hunks)
packages/traceloop-sdk/tests/datasets/test_datasets_operations.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py (0 hunks)
packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1 hunks)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4 hunks)
packages/traceloop-sdk/traceloop/sdk/datasets/model.py (1 hunks)

💤 Files with no reviewable changes (1)

packages/traceloop-sdk/traceloop/sdk/dataset/init.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py
packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
packages/sample-app/sample_app/dataset_attachments_example.py
packages/traceloop-sdk/tests/datasets/test_create_dataset.py
packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py
packages/sample-app/sample_app/dataset_example.py
packages/traceloop-sdk/traceloop/sdk/datasets/model.py
packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py
packages/traceloop-sdk/tests/datasets/test_datasets_operations.py

**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_without_attachments.yaml
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml

🧠 Learnings (4)

📚 Learning: 2025-08-17T15:06:48.109Z

Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to **/cassettes/**/*.{yaml,yml,json} : Never commit secrets or PII in VCR cassettes; scrub sensitive data

Applied to files:

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml

📚 Learning: 2025-08-17T15:06:48.109Z

Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording

Applied to files:

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml

📚 Learning: 2025-08-04T15:35:30.188Z

Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.

Applied to files:

packages/sample-app/sample_app/dataset_attachments_example.py

📚 Learning: 2025-08-17T15:06:48.109Z

Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes

Applied to files:

packages/traceloop-sdk/tests/datasets/test_datasets_operations.py

🧬 Code graph analysis (7)

packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py (6)

packages/traceloop-sdk/tests/conftest.py (1)

datasets (227-233)

packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (3)

Attachment (24-178)

ExternalAttachment (181-227)

data (255-264)

packages/traceloop-sdk/traceloop/sdk/datasets/model.py (4)

FileCellType (16-20)

ColumnDefinition (75-78)

ColumnType (8-13)

CreateDatasetRequest (84-89)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)

create (64-106)

packages/traceloop-sdk/traceloop/sdk/datasets/column.py (1)

delete (32-51)

packages/traceloop-sdk/traceloop/sdk/datasets/row.py (1)

delete (28-38)

packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)

packages/traceloop-sdk/tests/conftest.py (1)

datasets (227-233)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)

packages/traceloop-sdk/traceloop/sdk/client/http.py (1)

HTTPClient (7-91)

packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (4)

Attachment (24-178)

ExternalAttachment (181-227)

upload (109-167)

attach (200-227)

packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (2)

Dataset (20-112)

from_create_dataset_response (39-52)

packages/traceloop-sdk/traceloop/sdk/datasets/model.py (5)

ColumnDefinition (75-78)

ColumnType (8-13)

CreateDatasetRequest (84-89)

CreateDatasetResponse (99-108)

DatasetMetadata (151-159)

packages/traceloop-sdk/tests/datasets/test_create_dataset.py (2)

packages/traceloop-sdk/tests/conftest.py (1)

datasets (227-233)

packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)

Dataset (20-112)

packages/sample-app/sample_app/dataset_example.py (5)

packages/traceloop-sdk/tests/conftest.py (1)

datasets (227-233)

packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)

Dataset (20-112)

packages/traceloop-sdk/traceloop/sdk/datasets/model.py (1)

ColumnType (8-13)

packages/traceloop-sdk/traceloop/sdk/datasets/column.py (1)

Column (11-77)

packages/traceloop-sdk/traceloop/sdk/datasets/row.py (1)

Row (10-49)

packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py (6)

packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (3)

Attachment (24-178)

AttachmentReference (230-288)

ExternalAttachment (181-227)

packages/traceloop-sdk/traceloop/sdk/datasets/base.py (1)

BaseDatasetEntity (6-23)

packages/traceloop-sdk/traceloop/sdk/datasets/column.py (1)

Column (11-77)

packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)

Dataset (20-112)

packages/traceloop-sdk/traceloop/sdk/datasets/model.py (4)

ColumnType (8-13)

DatasetMetadata (151-159)

FileCellType (16-20)

FileStorageType (23-25)

packages/traceloop-sdk/traceloop/sdk/datasets/row.py (1)

Row (10-49)

packages/traceloop-sdk/tests/datasets/test_datasets_operations.py (2)

packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)

Dataset (20-112)

packages/traceloop-sdk/traceloop/sdk/datasets/model.py (1)

DatasetMetadata (151-159)

🪛 Checkov (3.2.334)

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml

[high] 86-87: AWS Access Key

(CKV_SECRET_2)

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml

[high] 85-86: AWS Access Key

(CKV_SECRET_2)

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml

[high] 85-86: AWS Access Key

(CKV_SECRET_2)

🪛 Gitleaks (8.29.0)

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml

[high] 85-85: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Ruff (0.14.5)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

359-359: Do not catch blind exception: Exception

(BLE001)

packages/sample-app/sample_app/dataset_attachments_example.py

1-1: Shebang is present but file is not executable

(EXE001)

350-350: Do not catch blind exception: Exception

(BLE001)

packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py

44-44: Avoid specifying long messages outside the exception class

(TRY003)

46-46: Avoid specifying long messages outside the exception class

(TRY003)

96-96: Avoid specifying long messages outside the exception class

(TRY003)

99-99: Avoid specifying long messages outside the exception class

(TRY003)

132-132: Create your own exception

(TRY002)

132-132: Avoid specifying long messages outside the exception class

(TRY003)

138-138: Create your own exception

(TRY002)

138-138: Avoid specifying long messages outside the exception class

(TRY003)

150-150: Probable use of requests call without timeout

(S113)

173-173: Probable use of requests call without timeout

(S113)

176-176: Consider moving this statement to an else block

(TRY300)

177-177: Do not catch blind exception: Exception

(BLE001)

220-220: Create your own exception

(TRY002)

220-220: Avoid specifying long messages outside the exception class

(TRY003)

260-260: Create your own exception

(TRY002)

260-260: Avoid specifying long messages outside the exception class

(TRY003)

261-261: Probable use of requests call without timeout

(S113)

packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py

17-29: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Packages (3.12)
GitHub Check: Build Packages (3.11)
GitHub Check: Test Packages (3.11)
GitHub Check: Test Packages (3.10)
GitHub Check: Lint

🔇 Additional comments (16)

packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)

4-13: Import path realignment to traceloop.sdk.datasets.model looks correct

The updated import source matches the new datasets model module and preserves behavior of Dataset methods that depend on these types.

packages/traceloop-sdk/tests/datasets/test_create_dataset.py (1)

12-12: Dataset import updated to new namespace

Importing Dataset from traceloop.sdk.datasets.dataset aligns these tests with the refactored public datasets namespace; no behavioral change to the tests themselves.

packages/traceloop-sdk/tests/datasets/test_datasets_operations.py (1)

2-3: Operations tests correctly switched to new datasets module paths

Using traceloop.sdk.datasets.dataset.Dataset and traceloop.sdk.datasets.model.DatasetMetadata matches the reorganized SDK layout and keeps the tests in sync with the public API.

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_without_attachments.yaml (1)

1-62: Cassette for dataset without attachments looks clean and compliant

The interaction records the dataset-create call without exposing API keys, auth headers, or PII; header/body content is test-only and appropriate for committing.

packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml (1)

1-245: External-attachments cassette is free of secrets and looks good

The recorded interactions only contain generic external URLs and standard HTTP metadata, with no API keys or sensitive headers; this cassette is safe to keep as-is.

packages/sample-app/sample_app/dataset_example.py (1)

10-10: Sample app now targets the new traceloop.sdk.datasets namespace

Switching Dataset, ColumnType, Column, and Row to import from traceloop.sdk.datasets keeps the example aligned with the refactored public API, without changing its runtime behavior.

packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py (5)

1-18: LGTM! Clean test file structure with appropriate imports.

The test module is well-organized with clear separation of concerns. Imports are properly structured from the public API surface (traceloop.sdk.datasets) and model definitions.

20-93: LGTM! Comprehensive test for external attachments.

The test properly validates the external attachment workflow, including storage type verification, URL preservation, file type mapping, and graceful handling of null values.

165-221: LGTM! Good test structure with proper cleanup pattern.

The try/finally block ensures temp files are cleaned up even if assertions fail. The test properly validates mixed attachment types and null handling.

224-260: LGTM! Clean test for in-memory attachments.

Properly validates the in-memory data attachment workflow without needing temp file cleanup.

263-288: LGTM! Important regression test for backward compatibility.

Ensures that the enhanced create() method works correctly for datasets without attachments, maintaining backward compatibility.

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)

2-26: LGTM! Proper logging setup and import organization.

Logger initialization follows Python conventions and the imports are cleanly organized for the new attachment functionality.

64-106: LGTM! Well-designed create method with attachment support.

The three-step pattern (extract → clean → create → process) is clean and maintains backward compatibility. The docstring with example usage is helpful.

262-279: LGTM! Clean extraction logic.

The helper correctly identifies and collects attachment objects from row values.

281-314: LGTM! Proper request preparation with immutability.

The method correctly creates a deep copy to avoid mutating the original request while replacing attachment objects with None for the initial API call.

packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py (1)

1-29: ****

The review comment's premise is unfounded. Public documentation shows that users access datasets through the client instance (e.g., client.datasets.from_csv(), client.datasets.from_dataframe(), client.datasets.get_by_slug()), not by importing the Datasets class directly. The Datasets class is an internal implementation detail accessed via the client facade pattern, not intended for direct user imports. Exporting it would create an undocumented parallel API path without corresponding documentation or support.

Likely an incorrect or invalid review comment.

...sets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml

packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py

feat(dataset): add support for file cells in datasets with upload and…

5c18471

… external URL linking capabilities

galzilber changed the title ~~feat(dataset): add support for file cells in datasets with upload and…~~ feat(dataset): add support for file cells in datasets with upload and external URL linking capabilities Nov 24, 2025

nl

cc20bc1

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

packages/traceloop-sdk/tests/dataset/test_file_operations.py Outdated Show resolved Hide resolved

packages/traceloop-sdk/traceloop/sdk/dataset/row.py Outdated Show resolved Hide resolved

packages/traceloop-sdk/traceloop/sdk/dataset/row.py Outdated Show resolved Hide resolved

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

coderabbitai bot reviewed Nov 24, 2025

View reviewed changes

...raceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml Outdated Show resolved Hide resolved

packages/traceloop-sdk/traceloop/sdk/dataset/row.py Outdated Show resolved Hide resolved

fix(dataset): handle request exceptions during file upload to S3

70ba304

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

nirga reviewed Nov 24, 2025

View reviewed changes

packages/traceloop-sdk/README.md Outdated Show resolved Hide resolved

galzilber added 2 commits November 24, 2025 20:19

refactor(dataset): enhance dataset module by adding file cell types a…

f21926c

…nd attachments, and clean up unused methods in Row class

fx

a8ad5fd

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Nov 24, 2025

View reviewed changes

galzilber added 2 commits November 24, 2025 21:02

rm

c284acf

refactor(tests): remove unused Mock import and clean up assertion for…

469700c

…matting in attachment tests

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

galzilber added 2 commits November 24, 2025 21:04

format

01d0f22

fix

cc0c5b6

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

fx

fd39dfe

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

coderabbitai bot reviewed Nov 24, 2025

View reviewed changes

galzilber added 4 commits November 24, 2025 23:09

rm

443c64f

fx

94659ab

fx

bfe3250

fx

32e15b8

ellipsis-dev bot reviewed Nov 24, 2025

View reviewed changes

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

nirga previously approved these changes Nov 25, 2025

View reviewed changes

Refactor code for improved functionality and performance

77b8c26

ellipsis-dev bot reviewed Nov 25, 2025

View reviewed changes

fx

8b13f98

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

packages/traceloop-sdk/examples/dataset_attachments_example.py Outdated Show resolved Hide resolved

packages/sample-app/sample_app/dataset_attachments_example.py Show resolved Hide resolved

fx

65c207d

ellipsis-dev bot reviewed Nov 25, 2025

View reviewed changes

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

packages/sample-app/sample_app/dataset_attachments_example.py Show resolved Hide resolved

galzilber added 3 commits November 25, 2025 11:40

Merge branch 'main' into gz/add-files-to-dataset

cd925a4

fx

bc8a5ca

fx

e08cf4b

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py Outdated Show resolved Hide resolved

ellipsis-dev bot reviewed Nov 25, 2025

View reviewed changes

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

fix(traceloop-sdk): enhance type annotations and improve data handlin…

5cee01d

…g in attachment and datasets modules

ellipsis-dev bot reviewed Nov 25, 2025

View reviewed changes

fx

9b00146

ellipsis-dev bot reviewed Nov 25, 2025

View reviewed changes

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

galzilber added 4 commits November 25, 2025 13:49

Merge branch 'main' into gz/add-files-to-dataset

2d5f510

fx

75d7541

Merge branch 'gz/add-files-to-dataset' of github.com:traceloop/openll…

8157cfe

…metry into gz/add-files-to-dataset

refactor(sdk): migrate dataset-related classes to a new 'datasets' mo…

fb871d9

…dule and update imports accordingly

ellipsis-dev bot reviewed Nov 25, 2025

View reviewed changes

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

Merge branch 'main' into gz/add-files-to-dataset

992c75c

nirga approved these changes Nov 26, 2025

View reviewed changes

galzilber merged commit 6213c2d into main Nov 26, 2025
12 checks passed

fix(dataset): add support for file cells in datasets with upload and external URL linking capabilities #3462

fix(dataset): add support for file cells in datasets with upload and external URL linking capabilities #3462

Uh oh!

Conversation

galzilber commented Nov 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

nirga left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

galzilber commented Nov 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 24, 2025 •

edited

Loading