-
Notifications
You must be signed in to change notification settings - Fork 868
fix(dataset): add support for file cells in datasets with upload and external URL linking capabilities #3462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… external URL linking capabilities
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughReorganized dataset exports from Changes
Sequence DiagramsequenceDiagram
participant User
participant Datasets
participant HTTPClient
participant S3
participant Traceloop API
User->>Datasets: create(dataset_request with attachments)
Note over Datasets: _extract_attachments()
Datasets->>Datasets: Collect Attachment/ExternalAttachment objects per row
Note over Datasets: _prepare_request_for_creation()
Datasets->>Datasets: Replace attachments with None in rows
Datasets->>Traceloop API: POST /v2/datasets (without attachments)
Traceloop API-->>Datasets: Create dataset, return rows with IDs
Note over Datasets: _process_attachments()
alt Attachment (file/bytes)
Datasets->>Traceloop API: POST upload-url endpoint
Traceloop API-->>Datasets: Return signed S3 URL + storage key
Datasets->>S3: PUT file to signed URL
S3-->>Datasets: Upload success
Datasets->>Traceloop API: PUT upload-status (success)
Traceloop API-->>Datasets: Confirm status
else ExternalAttachment (URL)
Datasets->>Traceloop API: POST attach external URL endpoint
Traceloop API-->>Datasets: Confirm attachment
end
Datasets->>Datasets: Update row metadata with attachment references
Datasets-->>User: Return Dataset with attached files
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–25 minutes Areas requiring extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed everything up to 5c18471 in 2 minutes and 53 seconds. Click for details.
- Reviewed
957lines of code in6files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/README.md:18
- Draft comment:
Clarify error handling in file cell examples by indicating specific exceptions (e.g., ValueError, FileNotFoundError). - Reason this comment was not posted:
Comment looked like it was already resolved.
2. packages/traceloop-sdk/traceloop/sdk/dataset/row.py:247
- Draft comment:
Handle potential exceptions from os.path.getsize(file_path) in case the file becomes unavailable. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This comment is speculative - it's suggesting handling a theoretical race condition where a file could be deleted between being uploaded (lines 224-226) and having its size checked (line 247). This is an extremely unlikely scenario in practice. The file existence is already validated at line 208, and the file is successfully opened and uploaded just moments before line 247. The comment says "in case the file becomes unavailable" which is a speculative "what if" scenario. According to the rules, I should NOT make speculative comments like "If X, then Y is an issue" - only comment if it's definitely an issue. This falls into that category of speculative concerns. However, it's technically possible for a file to be deleted by another process between the upload and the getsize call, especially in concurrent environments. This could cause an OSError that would crash the method. Is this common enough to warrant handling? While technically possible, this is an extremely rare edge case that would require external interference (another process deleting the file) during a very short time window. The code already validates file existence before starting the upload process. This type of defensive programming for every possible edge case is not typically required unless there's evidence it's a real problem. The comment is speculative and doesn't provide strong evidence this is a real issue. This comment should be deleted. It's a speculative suggestion about a theoretical race condition that is extremely unlikely to occur in practice. The rules explicitly state not to make speculative comments like "If X, then Y is an issue" unless it's definitely an issue.
Workflow ID: wflow_guLY3nRUyggh7XgR
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed cc20bc1 in 47 seconds. Click for details.
- Reviewed
11lines of code in1files - Skipped
0files when reviewing. - Skipped posting
1draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/tests/dataset/test_file_operations.py:105
- Draft comment:
Good to see the trailing newline added. Consider wrapping the temporary file cleanup in a try/finally block to ensure cleanup on test failures. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
Workflow ID: wflow_r5RqHAxBTgVSyWdO
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
packages/traceloop-sdk/tests/dataset/test_file_operations.py (1)
67-105: Ensure temporary file is always cleaned up with try/finallyIf
set_file_cellor the assertions fail,os.unlink(test_file.name)won’t run and the temp file may be left behind. Wrapping the test body in atry/finallyimproves robustness.- # Create test file - test_file = tempfile.NamedTemporaryFile(suffix=".txt", delete=False) - test_file.write(b"test content") - test_file.close() - - # Mock S3 upload to succeed - with patch.object(row, "_upload_file_to_presigned_url", return_value=True): - row.set_file_cell( - column_slug="file", - file_path=test_file.name, - file_type=FileCellType.FILE, - ) - - # Verify - assert row.values["file"]["storage"] == "internal" - assert row.values["file"]["status"] == "success" - assert "storage_key" in row.values["file"] - - # Clean up - os.unlink(test_file.name) + # Create test file + test_file = tempfile.NamedTemporaryFile(suffix=".txt", delete=False) + try: + test_file.write(b"test content") + test_file.close() + + # Mock S3 upload to succeed + with patch.object(row, "_upload_file_to_presigned_url", return_value=True): + row.set_file_cell( + column_slug="file", + file_path=test_file.name, + file_type=FileCellType.FILE, + ) + + # Verify + assert row.values["file"]["storage"] == "internal" + assert row.values["file"]["status"] == "success" + assert "storage_key" in row.values["file"] + finally: + os.unlink(test_file.name)packages/traceloop-sdk/traceloop/sdk/dataset/row.py (2)
223-231: Act on_confirm_file_uploadresult when marking uploads failed/successfulIn both the failure path (Line 228–230) and the final confirmation (Line 249–251),
_confirm_file_uploadis called but its boolean return value is ignored. If the backend returns{"success": false}without raising an HTTP error, the SDK will still proceed as if the operation succeeded.Consider:
- Checking the return value and raising an exception (or at least logging) when it is
False.- Optionally returning early or not updating
self.values[column_slug]if confirmation fails.For example:
- if not success: - self._confirm_file_upload(column_slug, status="failed", metadata=metadata) - raise Exception(f"Failed to upload file to S3: {file_name}") + if not success: + confirmed = self._confirm_file_upload( + column_slug, status="failed", metadata=metadata + ) + if not confirmed: + raise Exception( + f"Failed to upload file to S3 and backend did not confirm failure for cell {column_slug}" + ) + raise Exception(f"Failed to upload file to S3: {file_name}") @@ - self._confirm_file_upload( - column_slug=column_slug, status="success", metadata=final_metadata - ) + if not self._confirm_file_upload( + column_slug=column_slug, status="success", metadata=final_metadata + ): + raise Exception( + f"Upload confirmation failed for cell {column_slug} after successful upload" + )Also applies to: 249-251
232-244: Use logging instead ofThe warnings for missing thumbnail file and failed thumbnail upload currently use
Prefer the standard logging module, e.g.:
-import os +import os +import logging @@ - if not os.path.exists(thumbnail_path): - print(f"Warning: Thumbnail file not found: {thumbnail_path}") + if not os.path.exists(thumbnail_path): + logging.warning("Thumbnail file not found: %s", thumbnail_path) @@ - if not thumb_success: - print(f"Warning: Failed to upload thumbnail for {file_name}") + if not thumb_success: + logging.warning( + "Failed to upload thumbnail for %s", file_name + )
🧹 Nitpick comments (3)
packages/traceloop-sdk/traceloop/sdk/dataset/model.py (1)
12-72: File-cell models are consistent; consider tightening status representationThe new
FileCellType,FileStorageType, and related models (FileCellMetadata,FileCellValue,UploadURLRequest/Response,UploadStatusRequest,ExternalURLRequest) line up with the row logic and test expectations, so the overall shape looks solid.If you want stronger validation, you might later replace the free-form
status: strfields with a small enum or shared constants (e.g.,"in_progress","success","failed") to avoid accidental typos, but that’s not required for correctness.packages/traceloop-sdk/traceloop/sdk/dataset/row.py (2)
122-149: Optional: derive local file-cell state from API response
_set_external_file_urlcurrently ignores the server response (beyondNone/error) and reconstructs the local cell dict manually. If the backend ever adds fields (e.g., normalized URL, derived metadata), this method might drift from the canonical representation.If the API returns the full cell payload, consider updating
self.values[column_slug]from that response instead of hand-building the dict, or at least centralizing the shape in a single helper to avoid duplication.
245-261: Avoid mutating caller-provided metadata when addingsize_bytes
final_metadata = metadata or {}followed byfinal_metadata["size_bytes"] = ...will mutate the original dict whenmetadatais provided by the caller, which can be surprising if that dict is reused elsewhere.Consider copying:
- final_metadata = metadata or {} - final_metadata["size_bytes"] = os.path.getsize(file_path) + final_metadata = {**(metadata or {})} + final_metadata["size_bytes"] = os.path.getsize(file_path)This keeps the API safer while preserving behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
packages/traceloop-sdk/README.md(1 hunks)packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_external_url_file_cell.yaml(1 hunks)packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml(1 hunks)packages/traceloop-sdk/tests/dataset/test_file_operations.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/dataset/model.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/dataset/row.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/tests/dataset/test_file_operations.pypackages/traceloop-sdk/traceloop/sdk/dataset/model.pypackages/traceloop-sdk/traceloop/sdk/dataset/row.py
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yamlpackages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_external_url_file_cell.yaml
🧠 Learnings (1)
📚 Learning: 2025-08-04T15:35:30.188Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.
Applied to files:
packages/traceloop-sdk/traceloop/sdk/dataset/row.py
🧬 Code graph analysis (2)
packages/traceloop-sdk/tests/dataset/test_file_operations.py (2)
packages/traceloop-sdk/traceloop/sdk/dataset/model.py (1)
FileCellType(15-19)packages/traceloop-sdk/traceloop/sdk/dataset/row.py (3)
set_file_cell(151-261)Row(19-261)delete(37-47)
packages/traceloop-sdk/traceloop/sdk/dataset/row.py (3)
packages/traceloop-sdk/traceloop/sdk/dataset/base.py (1)
BaseDatasetEntity(6-23)packages/traceloop-sdk/traceloop/sdk/client/http.py (4)
HTTPClient(7-91)post(23-37)put(74-91)get(39-58)packages/traceloop-sdk/traceloop/sdk/dataset/model.py (6)
FileCellType(15-19)FileStorageType(22-24)UploadURLRequest(46-51)UploadURLResponse(54-60)UploadStatusRequest(63-65)ExternalURLRequest(68-71)
🪛 Checkov (3.2.334)
packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml
[high] 83-84: AWS Access Key
(CKV_SECRET_2)
🪛 Gitleaks (8.29.0)
packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml
[high] 83-83: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/dataset/row.py
83-83: Create your own exception
(TRY002)
83-83: Avoid specifying long messages outside the exception class
(TRY003)
100-100: Probable use of requests call without timeout
(S113)
118-118: Create your own exception
(TRY002)
118-118: Avoid specifying long messages outside the exception class
(TRY003)
140-140: Create your own exception
(TRY002)
140-140: Avoid specifying long messages outside the exception class
(TRY003)
190-190: Create your own exception
(TRY002)
190-190: Avoid specifying long messages outside the exception class
(TRY003)
194-194: Avoid specifying long messages outside the exception class
(TRY003)
196-196: Avoid specifying long messages outside the exception class
(TRY003)
209-209: Avoid specifying long messages outside the exception class
(TRY003)
230-230: Create your own exception
(TRY002)
230-230: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (6)
packages/traceloop-sdk/README.md (1)
18-127: File-cell usage docs are consistent with the new APIThe examples for uploads, external URLs, thumbnails, and validation errors line up with
Row.set_file_cell’s signature and behavior; no changes needed.packages/traceloop-sdk/tests/dataset/test_file_operations.py (2)
10-40: External URL file-cell test matches expected behaviorThis test cleanly exercises the external URL path for a FILE column and asserts the right fields (
storage="external",url,type="video"), which matches the Row/set_file_cell semantics.
42-65: Validation test correctly covers key argument error pathsThe two cases (both
file_pathandurl, and neither provided) map directly to the validation logic inRow.set_file_cell, and the regexes are resilient to small wording changes.packages/traceloop-sdk/tests/dataset/cassettes/test_file_operations/test_external_url_file_cell.yaml (1)
1-238: External-URL cassette looks clean and matches test flowThe recorded interactions align with
test_external_url_file_celland don’t expose secrets or PII; this fixture looks good as-is.packages/traceloop-sdk/traceloop/sdk/dataset/row.py (2)
60-85: Upload URL helper follows established HTTP error-handling pattern
_request_file_uploadchecks forNonefromself._http.postand raises on failure, matching the existing pattern where HTTP client methods returnNoneon error and callers translate that into exceptions. This keeps behavior consistent across dataset operations.Based on learnings, this aligns with the Dataset HTTP error-handling convention.
151-197: Publicset_file_cellAPI and validation look goodThe
set_file_cellsignature and docstring are clear, and the mutual‑exclusion checks forfile_path/urlplus the deleted‑row guard follow existing patterns and give good feedback for misuse.
...raceloop-sdk/tests/dataset/cassettes/test_file_operations/test_file_upload_with_mock_s3.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 70ba304 in 1 minute and 50 seconds. Click for details.
- Reviewed
27lines of code in1files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/dataset/row.py:1
- Draft comment:
Good move importing 'requests' at the module level to avoid re-importing in the function. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/dataset/row.py:100
- Draft comment:
Consider logging the caught RequestException to aid in debugging instead of silently returning False. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This comment is suggesting a code quality improvement - adding logging to aid debugging. The comment is about the new exception handling code added in the diff. However, I need to consider: 1) Is this clearly required or is it a suggestion/opinion? 2) The caller does handle the failure case by raising an exception with context. 3) The comment uses "Consider" which is softer language, but it's still actionable. 4) Adding logging for caught exceptions is generally a good practice, especially for network operations. 5) However, the rules state "Do NOT comment unless there is clearly a code change required" - is logging clearly required here? It's more of a best practice suggestion than a clear requirement. The code works without it, and the failure is handled at the call site. The comment is a reasonable suggestion for code quality, but it's not clearly required. The code functions correctly without logging, and the failure is already handled by the caller. This might be considered an optional improvement rather than a necessary change. The use of "Consider" suggests it's not mandatory. While logging exceptions is a best practice, the rules explicitly state not to comment unless there is clearly a code change required. This is more of a "nice to have" suggestion rather than a clear requirement. The failure case is already handled appropriately by the caller, which raises an exception with context. This comment should be deleted. While adding logging is a reasonable suggestion, it's not clearly required - the code works correctly without it, and the failure is already handled at the call site. This falls into the category of optional improvements rather than necessary changes.
Workflow ID: wflow_xq5aAMGOpYlYpqHz
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
nirga
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@galzilber this should match the existing APIs that we have. It can't be a separate API
…nd attachments, and clean up unused methods in Row class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed a8ad5fd in 1 minute and 59 seconds. Click for details.
- Reviewed
2389lines of code in9files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:286
- Draft comment:
Consider refactoring the loop in _prepare_request_for_creation into a dictionary comprehension for brevity and clarity. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:95
- Draft comment:
Attachment processing is done synchronously. For scenarios with large files or many attachments, consider an asynchronous or background upload mechanism to improve responsiveness. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
Workflow ID: wflow_Vt6cCV3LoyHndPh5
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
♻️ Duplicate comments (2)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml (1)
85-85: Verify AWS credentials in presigned URLs are scrubbed.Similar to test_create_dataset_with_file_attachments_mocked.yaml, this cassette contains AWS S3 presigned URLs with temporary security tokens. Per coding guidelines, ensure these are properly scrubbed.
packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yaml (1)
85-85: Verify AWS credentials in presigned URLs are scrubbed.This cassette also contains AWS S3 presigned URLs with temporary security tokens. Ensure compliance with the coding guideline to never commit secrets in VCR cassettes.
Also applies to: 261-261
🧹 Nitpick comments (3)
packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py (1)
8-20: Consider sorting__all__for consistency.The static analysis tool suggests applying an isort-style sorting to
__all__. While not critical, alphabetical ordering improves maintainability and reduces merge conflicts.Apply this diff to sort the exports:
__all__ = [ - "Dataset", - "Column", - "Row", - "BaseDatasetEntity", - "ColumnType", - "DatasetMetadata", - "FileCellType", - "FileStorageType", "Attachment", + "AttachmentReference", + "BaseDatasetEntity", + "Column", + "ColumnType", + "Dataset", + "DatasetMetadata", "ExternalAttachment", - "AttachmentReference", + "FileCellType", + "FileStorageType", + "Row", ]packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
319-319: Consider using logging instead of print statements.Library code should use Python's logging module rather than print statements for warnings, allowing consumers to control log output and levels.
Add logging at the module level:
import logging logger = logging.getLogger(__name__)Then replace print statements:
- print(f"Warning: Row index {row_idx} out of range, skipping attachments") + logger.warning(f"Row index {row_idx} out of range, skipping attachments")- print(f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}") + logger.warning(f"Failed to process attachment for row {row_idx}, column {col_slug}: {e}")Also applies to: 346-346
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (1)
465-469: TODO noted: Internal file download URL generation.The comment acknowledges that getting presigned download URLs for internal files requires backend API support. This is a known limitation.
Do you want me to open a tracking issue for implementing the backend endpoint and completing this functionality?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (11)
packages/traceloop-sdk/README.md(1 hunks)packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yaml(1 hunks)packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_external_attachment.yaml(1 hunks)packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml(1 hunks)packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml(1 hunks)packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml(1 hunks)packages/traceloop-sdk/tests/dataset/test_attachments.py(1 hunks)packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py(4 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/traceloop-sdk/README.md
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yamlpackages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yamlpackages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yamlpackages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yamlpackages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_external_attachment.yaml
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.pypackages/traceloop-sdk/traceloop/sdk/datasets/datasets.pypackages/traceloop-sdk/traceloop/sdk/dataset/__init__.pypackages/traceloop-sdk/traceloop/sdk/dataset/attachment.pypackages/traceloop-sdk/tests/dataset/test_attachments.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py (2)
packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)
ColumnType(7-12)DatasetMetadata(150-158)FileCellType(15-19)FileStorageType(22-24)packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)
Attachment(24-274)ExternalAttachment(277-361)AttachmentReference(364-478)
🪛 Checkov (3.2.334)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml
[high] 85-86: AWS Access Key
(CKV_SECRET_2)
packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_attachment_upload_with_mock_s3.yaml
[high] 85-86: AWS Access Key
(CKV_SECRET_2)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
[high] 86-87: AWS Access Key
(CKV_SECRET_2)
🪛 Flake8 (7.3.0)
packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py
[error] 6-6: 'unittest.mock.Mock' imported but unused
(F401)
[error] 7-7: 'traceloop.sdk.dataset.FileStorageType' imported but unused
(F401)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
[error] 287-287: 'copy' imported but unused
(F401)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py
[error] 11-11: 'datetime.datetime' imported but unused
(F401)
packages/traceloop-sdk/tests/dataset/test_attachments.py
[error] 6-6: 'unittest.mock.Mock' imported but unused
(F401)
🪛 Gitleaks (8.29.0)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml
[high] 85-85: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
[high] 86-86: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
[high] 202-202: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
345-345: Do not catch blind exception: Exception
(BLE001)
packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py
8-20: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py
67-67: Avoid specifying long messages outside the exception class
(TRY003)
69-69: Avoid specifying long messages outside the exception class
(TRY003)
121-121: Avoid specifying long messages outside the exception class
(TRY003)
125-125: Avoid specifying long messages outside the exception class
(TRY003)
174-174: Create your own exception
(TRY002)
174-174: Avoid specifying long messages outside the exception class
(TRY003)
221-221: Create your own exception
(TRY002)
221-221: Avoid specifying long messages outside the exception class
(TRY003)
230-230: Probable use of requests call without timeout
(S113)
231-231: Consider moving this statement to an else block
(TRY300)
232-232: Do not catch blind exception: Exception
(BLE001)
249-249: Probable use of requests call without timeout
(S113)
250-250: Consider moving this statement to an else block
(TRY300)
251-251: Do not catch blind exception: Exception
(BLE001)
272-272: Create your own exception
(TRY002)
272-272: Avoid specifying long messages outside the exception class
(TRY003)
352-352: Create your own exception
(TRY002)
352-352: Avoid specifying long messages outside the exception class
(TRY003)
428-428: Create your own exception
(TRY002)
428-428: Avoid specifying long messages outside the exception class
(TRY003)
430-430: Probable use of requests call without timeout
(S113)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (6)
packages/traceloop-sdk/tests/dataset/cassettes/test_attachments/test_external_attachment.yaml (1)
1-242: LGTM! Clean cassette with no sensitive data.The VCR cassette correctly captures the external attachment workflow without exposing secrets or PII. The example YouTube URLs and test metadata are appropriate for test fixtures.
packages/traceloop-sdk/tests/dataset/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml (1)
1-245: LGTM! External attachments cassette is clean.The cassette correctly captures external attachment workflows using only external URLs without any sensitive credentials or PII.
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
61-103: Well-structured attachment workflow!The implementation correctly separates concerns: extract attachments, clean the request, create the dataset, then process attachments. The docstring with example usage is helpful.
254-269: LGTM! Clean attachment extraction logic.The nested dictionary structure correctly maps row indices to column slugs to attachment objects, making it easy to process attachments after dataset creation.
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (2)
24-99: Excellent validation and initialization logic!The constructor properly validates mutual exclusivity of
file_pathanddata, and intelligently determines filename, content type, and file type with sensible defaults.
277-361: Clean and straightforward external attachment implementation!The
ExternalAttachmentclass provides a simple, declarative API for linking external URLs. The implementation correctly delegates to the backend API for validation and storage.
...assettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
Show resolved
Hide resolved
packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py
Outdated
Show resolved
Hide resolved
…matting in attachment tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed c284acf in 1 minute and 17 seconds. Click for details.
- Reviewed
191lines of code in1files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/README.md:17
- Draft comment:
The attachments documentation was completely removed from the README. Since the feature adds file cell support, please ensure that updated documentation or a link to external docs is provided. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/traceloop-sdk/README.md:185
- Draft comment:
Please add a newline at the end of the file to adhere to best practices. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
Workflow ID: wflow_sdTbQh8kYcjnKsRj
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 469700c in 1 minute and 14 seconds. Click for details.
- Reviewed
100lines of code in3files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/tests/dataset/test_attachments.py:6
- Draft comment:
Removed unused 'Mock' import; only 'patch' is used. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, as it simply states that an unused import was removed. It doesn't provide a suggestion or ask for confirmation about a specific change. According to the rules, purely informative comments should be removed.
2. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:6
- Draft comment:
Removed unused 'Mock' and 'FileStorageType' imports for cleaner test code. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, stating that unused imports were removed for cleaner code. It doesn't provide any actionable feedback or suggestions for improvement.
3. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:7
- Draft comment:
Removed unused 'datetime' import to reduce clutter. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:88
- Draft comment:
Refactored content type assignment with explicit parenthesization for better readability. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
5. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:168
- Draft comment:
Improved formatting of _confirm_upload() call parameters for consistency and clarity. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
6. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:182
- Draft comment:
Standardized multi-line formatting in _confirm_upload() call with metadata for improved readability. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_eiiyOan97ZWggHaR
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 01d0f22 in 1 minute and 47 seconds. Click for details.
- Reviewed
89lines of code in1files - Skipped
0files when reviewing. - Skipped posting
8draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:1
- Draft comment:
Reordered typing imports for improved readability. This is a pure style change. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:10
- Draft comment:
Grouped and re-ordered imports (HTTPClient, Attachment, ExternalAttachment, Dataset) for clarity; duplicate/remnant imports removed. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:251
- Draft comment:
Refactored _extract_attachments signature to a multi-line format for enhanced readability. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:287
- Draft comment:
Inserted a blank line after 'import copy' in _prepare_request_for_creation to improve code clarity. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
5. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:304
- Draft comment:
Added a trailing comma in the CreateDatasetRequest call for consistency and easier future diffing. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
6. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:321
- Draft comment:
Reformatted the warning print for out-of-range row indexes into a multi-line format. Consider using a logging framework for better control over warning outputs. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
7. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:348
- Draft comment:
Reformatted the print in the exception handling block to enhance readability. Also, consider replacing print with a logging library in production. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
8. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:359
- Draft comment:
Added a trailing comma in the error dictionary literal to conform with common style guidelines. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_aM3JOjEEDUDZxk7I
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed cc0c5b6 in 2 minutes and 1 seconds. Click for details.
- Reviewed
126lines of code in1files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:101
- Draft comment:
Consider using a context manager or a fixture for creating temporary files to ensure cleanup even if a test fails. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:39
- Draft comment:
Consider adding assertions to verify that the metadata provided in ExternalAttachment is correctly processed. - Reason this comment was not posted:
Comment was on unchanged code.
3. packages/traceloop-sdk/tests/dataset/test_dataset_with_attachments.py:136
- Draft comment:
It might be beneficial to add a test case for S3 upload failure to ensure graceful error handling. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_TaBldfGZcToFyriU
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed fd39dfe in 46 seconds. Click for details.
- Reviewed
13lines of code in1files - Skipped
0files when reviewing. - Skipped posting
1draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:286
- Draft comment:
Good removal of unused 'import copy'. Consider updating the comment to indicate that a shallow copy is being built, to avoid confusion if a true deep copy is required. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_OLwGKWOfyqH6mFeY
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
320-322: Use logging instead of print() for diagnostics.This warning should use a proper logging mechanism (e.g.,
logging.warning) for better production diagnostics and log aggregation.As per past review comments, this has already been flagged.
349-351: Use logging instead of print() for diagnostics.This warning should use a proper logging mechanism (e.g.,
logging.warning) for better production diagnostics and log aggregation.As per past review comments, this has already been flagged.
🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
288-297: Consider usingcopy.deepcopyfor truly nested-safe copying.The manual iteration creates a shallow copy for non-attachment values. If row values contain nested dicts or lists (e.g.,
{"col": {"nested": "value"}}), the nested objects would be shared between the original and cleaned request. This could lead to unintended mutations if the original request is modified elsewhere.If nested values are possible, apply this diff:
+ import copy + # Create a deep copy of rows to avoid modifying the original - clean_rows = [] - for row in request.rows: - clean_row = {} - for col_slug, value in row.items(): - if isinstance(value, (Attachment, ExternalAttachment)): - clean_row[col_slug] = None - else: - clean_row[col_slug] = value - clean_rows.append(clean_row) + clean_rows = copy.deepcopy(request.rows) + for row in clean_rows: + for col_slug, value in list(row.items()): + if isinstance(value, (Attachment, ExternalAttachment)): + row[col_slug] = None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
348-348: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.12)
🔇 Additional comments (4)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)
3-23: LGTM!The new imports are clean and all are used by the attachment handling functionality added in this file.
61-103: LGTM!The three-phase workflow (extract, create, post-process) is well-structured and the comprehensive docstring with example usage is helpful.
254-271: LGTM!The extraction logic correctly identifies and collects attachment objects from row values.
328-357: Broad exception catch is acceptable for graceful degradation.While static analysis flags the broad
Exceptioncatch at line 348, it's appropriate here—you want to prevent one attachment failure from breaking the entire dataset creation. The error state is properly recorded in row values (lines 352-357) for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 443c64f in 2 minutes and 38 seconds. Click for details.
- Reviewed
1857lines of code in7files - Skipped
0files when reviewing. - Skipped posting
5draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:32
- Draft comment:
Consider supporting file-like objects (e.g. BinaryIO) along with bytes for the 'data' parameter. Removing BinaryIO may limit use cases with streaming or large file uploads. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This comment is making a suggestion about API design - that supporting BinaryIO would be beneficial for certain use cases. However, looking at the rules, I need to consider: 1) Is this about a code change? Yes, the type was explicitly changed. 2) Is this clearly a code change required? This is more of a "consider" suggestion, not a definitive issue. 3) Is this speculative? The comment says "may limit use cases" which is somewhat speculative. 4) The PR appears to be a simplification (as noted in the docstring change "Simplified implementation"), so removing BinaryIO support might be intentional. The comment doesn't point to a clear bug or problem, just suggests reconsidering a design decision that was deliberately made. The author explicitly simplified this code and removed BinaryIO support, which suggests this was an intentional design decision. The comment is asking them to reconsider that decision without strong evidence that it's wrong. This falls into the category of "speculative" comments about potential use cases rather than identifying a concrete problem. While the removal was likely intentional, the comment does raise a valid point about API capabilities. However, without evidence that BinaryIO support is actually needed (e.g., existing code that breaks, or a clear requirement), this is more of a design preference suggestion rather than identifying a clear issue that needs to be fixed. This comment should be deleted. It's a "consider" suggestion about a design decision that was deliberately made as part of simplification. There's no strong evidence that BinaryIO support is required, and the comment is speculative about "may limit use cases" without concrete examples.
2. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:138
- Draft comment:
Thumbnail upload logic now only uses 'thumbnail_path'. The 'thumbnail_data' parameter is ignored. Consider supporting in-memory thumbnail data as well. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This appears to be a valid observation - the code acceptsthumbnail_dataas a parameter but never uses it. However, I need to consider the rules: 1) This is about code that was changed (the thumbnail upload logic was simplified), 2) The comment is suggesting a feature addition ("Consider supporting...") rather than pointing out a bug, 3) The PR seems to be intentionally simplifying the code (removing BinaryIO support, removing examples, etc.), so ignoringthumbnail_datamight be intentional, 4) The comment doesn't clearly indicate a required code change - it says "Consider supporting" which is speculative/suggestive rather than definitive. The comment might be pointing out intentional simplification rather than an oversight. The PR appears to be deliberately simplifying the implementation (removing BinaryIO support, removing helper methods, etc.). The author may have intentionally removedthumbnail_datasupport while keeping the parameter for backward compatibility or future use. While the simplification might be intentional, accepting a parameter that is completely ignored is poor API design and could confuse users. However, the comment uses "Consider supporting" language which makes it a suggestion rather than identifying a clear bug. According to the rules, I should not keep speculative or suggestive comments that don't clearly require a code change. The comment is suggestive ("Consider supporting...") rather than identifying a definite issue. While it's true thatthumbnail_datais ignored, this appears to be part of an intentional simplification. The comment doesn't clearly indicate required action and reads more as a feature request than a bug report.
3. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:268
- Draft comment:
The get_url() method for internal storage always returns None. Consider adding a TODO note or implementing presigned URL generation for internal attachments. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:147
- Draft comment:
The upload confirmation call (HTTP PUT) doesn't validate the response. It might be beneficial to check the server response or handle errors more explicitly. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment uses language like "might be beneficial" which is speculative and suggestive rather than definitive. According to the rules, I should not keep comments that are speculative or that don't clearly require a code change. The comment doesn't say there's definitely a bug or problem - it just suggests it "might be beneficial" to add validation. This is a code quality suggestion but not a clear requirement. Looking at the diff, the previous version had a_confirm_uploadmethod that did checkresult.get("success", False), but the new simplified version removes this check. However, the comment doesn't point out that functionality was lost - it just makes a general suggestion about validation. The rules state "Do NOT comment unless there is clearly a code change required" and this seems more like an optional improvement suggestion. However, looking at the diff more carefully, the old code DID validate the response withreturn result.get("success", False)and would raise an exception if result was None. The new code removes this validation entirely. So this might actually be pointing out that error handling was removed, which could be a legitimate issue rather than just a speculative suggestion. While the old code had validation, the PR author intentionally simplified this code (as stated in the docstring "Simplified implementation"). The fact that they removed the validation suggests it was intentional. The comment doesn't definitively say this is wrong - it uses weak language like "might be beneficial". Without strong evidence that this is definitely a bug or problem, and given the speculative nature of the comment, it should be deleted per the rules. The comment should be deleted because it uses speculative language ("might be beneficial") and doesn't clearly identify a required code change. While validation was removed from the previous version, the simplified implementation appears intentional, and the comment doesn't provide strong evidence that this is definitely problematic.
5. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:166
- Draft comment:
Consider logging exceptions in _upload_to_s3 instead of silently returning False. This could help diagnose issues during file uploads. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is suggesting a code quality improvement - adding logging instead of silently swallowing exceptions. However, looking at the diff, the exception handling pattern in_upload_to_s3wasn't actually changed - it existed before and still exists after. The diff shows the method was simplified (removed file-like object handling), but the try/except pattern returning False on exception was present in both versions. Since the comment is about code that wasn't changed in this diff, it violates the rule that comments should be about changes made by the diff. While the suggestion to add logging is reasonable from a code quality perspective, I need to verify whether this exception handling pattern was actually introduced or modified in this diff. If it was pre-existing and unchanged, the comment should be deleted per the rules. Looking at the diff more carefully, the_upload_to_s3method's exception handling appears in both the old and new versions. The diff shows the method was refactored but theexcept Exception: return Falsepattern was not introduced or modified in this change - it was already there. Therefore, this comment is about unchanged code and should be deleted. The comment should be deleted because it's about code that wasn't changed in this diff. The exception handling pattern in_upload_to_s3existed before this PR and remains the same after.
Workflow ID: wflow_fvA1MYY4b1ZaohJC
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (2)
109-179: Add timeouts and tighten error handling for upload and thumbnail requestsThe
requests.putcalls for the main file upload (Line 173) and thumbnail upload (Line 150) have no timeout, so a hung S3/presigned endpoint can block indefinitely. Static analysis (S113) and prior review comments already flagged this. Also,_upload_to_s3swallows all exceptions withexcept Exception, losing the root cause.I recommend:
- Add explicit timeouts for both uploads (e.g., 300s for main file, 60s for thumbnail).
- Narrow the exception handler in
_upload_to_s3torequests.RequestExceptionso file/OS errors still surface.Example diff:
- if thumb_bytes is not None: - requests.put(upload_response.thumbnail_upload_url, data=thumb_bytes) + if thumb_bytes is not None: + requests.put( + upload_response.thumbnail_upload_url, + data=thumb_bytes, + timeout=60, + ) @@ - try: - file_data = self._get_file_data() - response = requests.put( - upload_url, data=file_data, headers={"Content-Type": self.content_type} - ) - return response.status_code in [200, 201, 204] - except Exception: - return False + try: + file_data = self._get_file_data() + response = requests.put( + upload_url, + data=file_data, + headers={"Content-Type": self.content_type}, + timeout=300, + ) + return response.status_code in [200, 201, 204] + except requests.RequestException: + return FalseOptionally, you might also check the result of the
upload-statushttp_client.putand raise if it fails, so callers are aware when confirmation does not persist.
255-264: Add timeout and consider a dedicated helper/exception for downloadsThe
dataproperty issues a rawrequests.get(download_url)without a timeout (Line 261), so a stuck endpoint can hang indefinitely. This was previously flagged for_download_data; the same concern applies here, and Ruff reportsS113.Suggested change:
- response = requests.get(download_url) + response = requests.get(download_url, timeout=300) response.raise_for_status() self._cached_data = response.contentYou might also want to:
- Extract this into a small
_download_data()helper (as you previously had) to centralize future tweaks (retries, streaming, etc.).- Replace
raise Exception("No download URL available")with a more specific custom exception (e.g.,AttachmentError) to make error handling clearer.
🧹 Nitpick comments (3)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)
90-107: Consider streaming large files instead of fully loading into memory
_get_file_dataand_get_file_sizeread the entire file into memory (f.read()/len(self.data)), which is fine for small/medium attachments but can become costly for large files. A future improvement would be to support streaming (e.g., passing a file-like object/chunk iterator to the upload path) to avoid holding the whole file in memory at once. This can be deferred if datasets are expected to carry relatively small blobs.
186-227: Clarify use ofcontent_type/filenameand server response inExternalAttachment.attachA couple of minor design points here:
content_typeandfilenameare stored on the instance but never sent to the API or surfaced in the returnedAttachmentReference. If those fields are meant to influence server-side behavior or downstream display, consider wiring them intometadataor the request model; otherwise they are effectively unused.- The
attachmethod only checks thatresultis truthy and then ignores any details the server might return (e.g., a storage key or normalized URL). If the API ever exposes richer information, you may want to capture it in theAttachmentReferenceinstead of just echoingself.urlandself.metadata.These are not blockers but worth aligning with how you expect external attachments to be represented long-term.
276-281: Internal attachment downloads are currently unsupported; consider making this explicit
get_urlreturns the external URL forFileStorageType.EXTERNAL, but alwaysNonefor internal storage, which makesAttachmentReference.dataunusable for internal attachments created byAttachment.upload(it will raise “No download URL available”).If that’s intentional for now, consider:
- Documenting explicitly that
AttachmentReference.data/download()only work for external attachments today, or- Raising a more explicit
NotImplementedError/custom exception forINTERNALstorage, or- Implementing presigned URL generation for internal storage using
storage_keyandhttp_clientwhen available.This will make the API behavior less surprising for callers who obtain an
AttachmentReferencefromAttachment.upload.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (2)
packages/traceloop-sdk/traceloop/sdk/client/http.py (2)
HTTPClient(7-91)put(74-91)packages/traceloop-sdk/traceloop/sdk/dataset/model.py (6)
ExternalURLRequest(69-72)FileCellType(16-20)FileStorageType(23-25)UploadStatusRequest(64-66)UploadURLRequest(47-52)UploadURLResponse(55-61)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py
44-44: Avoid specifying long messages outside the exception class
(TRY003)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
96-96: Avoid specifying long messages outside the exception class
(TRY003)
99-99: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Create your own exception
(TRY002)
132-132: Avoid specifying long messages outside the exception class
(TRY003)
138-138: Create your own exception
(TRY002)
138-138: Avoid specifying long messages outside the exception class
(TRY003)
150-150: Probable use of requests call without timeout
(S113)
173-173: Probable use of requests call without timeout
(S113)
176-176: Consider moving this statement to an else block
(TRY300)
177-177: Do not catch blind exception: Exception
(BLE001)
220-220: Create your own exception
(TRY002)
220-220: Avoid specifying long messages outside the exception class
(TRY003)
260-260: Create your own exception
(TRY002)
260-260: Avoid specifying long messages outside the exception class
(TRY003)
261-261: Probable use of requests call without timeout
(S113)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (1)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (1)
30-78: Constructor validation and type inference look goodThe mutual exclusion and required checks for
file_pathvsdata, plus filename/content-type andFileCellTypeinference, are coherent and defensive. No changes needed here.
nirga
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add sample app
| "metadata": ref.metadata, | ||
| } | ||
| except Exception as e: | ||
| print( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not print, use logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 77b8c26 in 57 seconds. Click for details.
- Reviewed
363lines of code in1files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/examples/dataset_attachments_example.py:32
- Draft comment:
Consider centralizing the SDK initialization (Traceloop.init) in main() rather than only in one example, to ensure consistent setup across all examples. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. packages/traceloop-sdk/examples/dataset_attachments_example.py:109
- Draft comment:
Consider using a context manager for temporary files, if possible, to manage cleanup automatically rather than relying on manual os.unlink calls. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. packages/traceloop-sdk/examples/dataset_attachments_example.py:180
- Draft comment:
When cleaning up temporary files (using os.unlink), consider checking for file existence to gracefully handle any unexpected cleanup errors. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
Workflow ID: wflow_ypNf8Db7jyOhg3dy
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
packages/traceloop-sdk/examples/dataset_attachments_example.py (3)
1-10: Align shebang usage with how the script is invokedRuff flags EXE001 because the file has a shebang but (in the repo) isn’t executable. If this script is meant to be run as
python dataset_attachments_example.py, consider dropping the shebang; if you want./dataset_attachments_example.py, make sure the file is committed with the executable bit.Example change if you choose to remove it:
-#!/usr/bin/env python3
28-35: Consider centralizingTraceloop.initinmain()instead of inside one exampleRight now only
example_external_attachments()callsTraceloop.init, so calling otherexample_*functions directly (e.g., from a REPL) will likely fail if Traceloop wasn’t initialized elsewhere. Initializing once inmain()and assuming it in the examples would be more robust and clearer.Suggested change inside
example_external_attachments:- # Initialize Traceloop - Traceloop.init(app_name="attachment-demo") - datasets = Datasets() + # Datasets client (assumes Traceloop.init(...) was called by the caller) + datasets = Datasets()And then in
main()(outside this hunk), initialize once:def main(): """Run all examples.""" print("=" * 60) print("Traceloop SDK Attachment Feature Examples") print("=" * 60) Traceloop.init(app_name="attachment-demo") ...
334-336: Document API key configuration without implying hardcoding in codeTo better align with the “no hardcoded secrets” guideline, it’s clearer to show setting the API key via the shell environment rather than via
os.environin code, even in comments.- # Set your API key - # os.environ["TRACELOOP_API_KEY"] = "your-api-key-here" + # Ensure TRACELOOP_API_KEY is set in your environment, e.g.: + # export TRACELOOP_API_KEY="your-api-key-here"As per coding guidelines, this keeps secrets configuration outside the code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/examples/dataset_attachments_example.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/examples/dataset_attachments_example.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/examples/dataset_attachments_example.py (2)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)
Attachment(24-178)ExternalAttachment(181-227)data(255-264)packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)
FileCellType(16-20)ColumnDefinition(75-78)ColumnType(8-13)CreateDatasetRequest(84-89)
🪛 Flake8 (7.3.0)
packages/traceloop-sdk/examples/dataset_attachments_example.py
[error] 239-239: f-string is missing placeholders
(F541)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/examples/dataset_attachments_example.py
1-1: Shebang is present but file is not executable
(EXE001)
239-239: f-string without any placeholders
Remove extraneous f prefix
(F541)
351-351: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (1)
packages/traceloop-sdk/examples/dataset_attachments_example.py (1)
28-310: Example workflows look consistent with the attachment APIThe four
example_*functions useColumnDefinition,ColumnType.FILE,Attachment,ExternalAttachment, andFileCellTypein a way that’s consistent with the dataset and attachment models, and the column slugs/row keys line up correctly. No functional issues jump out here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 65c207d in 1 minute and 11 seconds. Click for details.
- Reviewed
40lines of code in1files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/examples/dataset_attachments_example.py:23
- Draft comment:
Reordering the import of 'Datasets' is fine, but please ensure consistent grouping of related imports per project style guidelines. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. packages/traceloop-sdk/examples/dataset_attachments_example.py:184
- Draft comment:
The header print for 'In-Memory Data Attachments' has been removed. For consistency with other examples, consider re-adding a header to clearly delineate this example. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/traceloop-sdk/examples/dataset_attachments_example.py:236
- Draft comment:
Changed from an f-string to a plain string for printing attachment status. Since no interpolation is needed, this change is purely stylistic. - Reason this comment was not posted:
Confidence changes required:10%<= threshold50%None
Workflow ID: wflow_9FXxNmCd5xEXc87u
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/traceloop-sdk/examples/dataset_attachments_example.py (1)
339-352: Broadexcept Exceptioninmainstill trips BLE001 (repeat of prior feedback)The catch‑all around the example runner is reasonable for UX, but it continues to trigger Ruff BLE001 and was already flagged in an earlier review.
To keep the friendly error while satisfying lint, you can explicitly document/suppress the broad catch:
- except Exception as e: - print(f"\nError: {e}") - print("Make sure to set TRACELOOP_API_KEY environment variable") + except Exception as e: # noqa: BLE001 # Broad catch is intentional for this example script + print(f"\nError: {e}") + print("Make sure to set TRACELOOP_API_KEY environment variable")If the SDK exposes specific exception types for auth/config errors, narrowing the
exceptto those would be even better.
🧹 Nitpick comments (3)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
276-309: Preserve futureCreateDatasetRequestfields when cleaning rowsRight now
_prepare_request_for_creationreconstructsCreateDatasetRequestby manually listing fields. If new fields are later added toCreateDatasetRequest(e.g., tags, options), they’ll be silently dropped here.You can make this future‑proof by cloning the model and only updating
rows:- # Create a new request with cleaned rows - return CreateDatasetRequest( - slug=request.slug, - name=request.name, - description=request.description, - columns=request.columns, - rows=clean_rows, - ) + # Create a new request with cleaned rows, preserving any additional fields + return request.model_copy(update={"rows": clean_rows}) + # If you are still on Pydantic < 2, use instead: + # return request.copy(update={"rows": clean_rows})Please pick
model_copyvscopyaccording to the Pydantic version actually used in this project.
311-360: Tighten or explicitly document the broadExceptioncatch in_process_attachmentsCatching bare
Exceptionhere triggers Ruff BLE001 and can hide unexpected programmer errors, even though you likely want best‑effort behavior per cell.A balanced approach is to (a) keep the catch‑all for UX, (b) use structured logging rather than an f-string, and (c) explicitly mark the broad catch as intentional:
- except Exception as e: - logger.warning( - f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}" - ) + except Exception as e: # noqa: BLE001 # Broad catch is intentional: attachment failures shouldn't break dataset creation + logger.warning( + "Failed to process attachment for row %s, column %s: %s", + row_idx, + col_slug, + e, + )If the attachment methods expose well-defined exception types, narrowing the except clause to those would be even better; otherwise the above makes the catch‑all explicit and lint‑clean.
packages/traceloop-sdk/examples/dataset_attachments_example.py (1)
1-1: Address shebang vs executability (EXE001)Ruff flags the shebang because this file isn’t marked executable. For a library example that’s typically run via
python dataset_attachments_example.py, it’s simplest to drop the shebang:-#!/usr/bin/env python3Alternatively, ensure the file is installed with executable permissions so the shebang is meaningful.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/traceloop-sdk/examples/dataset_attachments_example.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.pypackages/traceloop-sdk/examples/dataset_attachments_example.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/examples/dataset_attachments_example.py (4)
packages/traceloop-sdk/traceloop/sdk/__init__.py (1)
init(49-206)packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)
Attachment(24-178)ExternalAttachment(181-227)data(255-264)packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)
FileCellType(16-20)ColumnDefinition(75-78)ColumnType(8-13)CreateDatasetRequest(84-89)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
Datasets(29-360)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
351-351: Do not catch blind exception: Exception
(BLE001)
packages/traceloop-sdk/examples/dataset_attachments_example.py
1-1: Shebang is present but file is not executable
(EXE001)
350-350: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
🔇 Additional comments (3)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (3)
2-26: Imports and logger wiring look correctUsing
loggingplus a module-levelloggerand importingAttachment/ExternalAttachmentandValuesMapcleanly set up the rest of the attachment-aware workflow. No issues here.
64-106: Attachment-awarecreate()flow is soundThe
create()method cleanly separates concerns: it extracts attachments, sends a “clean”CreateDatasetRequestwithout attachment objects, then post-processes the createdDatasetto upload/link attachments and update row values. This preserves server-side validation and keeps attachment failures localized.
257-275: Attachment extraction logic is straightforward and correct
_extract_attachmentscorrectly walksrequest.rows, capturing anyAttachment/ExternalAttachmentinstances into a{row_idx: {col_slug: attachment}}map without mutating the original request. This matches how_process_attachmentslater consumes the structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
282-315: Consider using the request model’s copy functionality instead of rebuilding by hand.Manually reconstructing
CreateDatasetRequestwith onlyslug,name,description,columns, androwsmakes_prepare_request_for_creationbrittle if the model gains new fields (e.g., tags, settings) in the future. You can future‑proof this by cloning and updating justrowsvia the Pydantic copy API instead of re‑passing individual fields.For example (adapt to your Pydantic version):
# Pydantic v1 style: clean_request = request.copy(update={"rows": clean_rows}) # or Pydantic v2 style: clean_request = request.model_copy(update={"rows": clean_rows})This keeps all other fields intact while still stripping attachments from rows.
317-366: Align broadexcept Exceptionwith Ruff BLE001 and improve logging detail.Catching a bare
Exceptionhere is intentional (best‑effort attachment processing), but Ruff flags this asBLE001. Making this explicit and using parameterized logging would both satisfy the linter and improve diagnostics.One option that preserves behavior:
- except Exception as e: - logger.warning( - f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}" - ) + except Exception as e: # noqa: BLE001 + logger.warning( + "Failed to process attachment for row %s, column %s: %s", + row_idx, + col_slug, + e, + )This keeps the “catch‑all” semantics while making the linter suppression explicit and using logger‑style formatting. If you know the concrete exception types
upload/attachcan raise, narrowing theexceptto those would be even better.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
🪛 Flake8 (7.3.0)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
[error] 5-5: redefinition of unused 'List' from line 3
(F811)
[error] 5-5: redefinition of unused 'Optional' from line 3
(F811)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
5-5: Redefinition of unused List from line 3
Remove definition: List
(F811)
5-5: Redefinition of unused Optional from line 3
Remove definition: Optional
(F811)
357-357: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (2)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
65-107:Datasets.createattachment flow looks coherent and side‑effect free.The sequence of extracting attachments, building a cleaned request, creating the dataset, and then post‑processing attachments is logically sound and leaves the original
CreateDatasetRequestunmodified. Using the new helpers keeps the method readable.
263-280: Attachment extraction helper is clear and non‑mutating.
_extract_attachmentscorrectly walksrequest.rows, identifiesAttachment/ExternalAttachmentinstances, and builds a{row_idx: {col_slug: attachment}}map without mutating the original rows. This matches how_process_attachmentslater consumes the structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed e08cf4b in 41 seconds. Click for details.
- Reviewed
14lines of code in1files - Skipped
0files when reviewing. - Skipped posting
1draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:3
- Draft comment:
Good consolidation of duplicate 'typing' imports into a single statement. Consider ordering the imported names alphabetically for consistency. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_zRYJMHIXbQxQuJuz
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
262-279: LGTM! Attachment extraction logic is sound.The method correctly identifies and extracts
AttachmentandExternalAttachmentobjects from row values.Optional: Consider a more specific return type for improved type safety:
- ) -> Dict[int, Dict[str, Any]]: + ) -> Dict[int, Dict[str, Union[Attachment, ExternalAttachment]]]:You would need to add
Unionto the typing imports.
316-365: Consider more specific exception handling for better error visibility.The broad
Exceptioncatch on line 356 enables resilient bulk processing (logging failures and continuing), but it may mask unexpected errors.Consider catching more specific exceptions or adding an exception type check:
- except Exception as e: + except (IOError, OSError, RuntimeError, ValueError) as e: logger.warning( f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}" )If you need to catch all exceptions for resilience, document the rationale:
- except Exception as e: + except Exception as e: # Catch all to continue processing remaining attachments logger.warning( f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}" )Alternatively, consider collecting and returning failed attachments so callers can decide how to handle them.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)
packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
HTTPClient(7-91)packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (4)
Attachment(24-178)ExternalAttachment(181-227)upload(109-167)attach(200-227)packages/traceloop-sdk/traceloop/sdk/dataset/dataset.py (2)
Dataset(20-112)from_create_dataset_response(39-52)packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)
ColumnDefinition(75-78)ColumnType(8-13)CreateDatasetRequest(84-89)CreateDatasetResponse(99-108)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
356-356: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (3)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (3)
2-26: LGTM! Import consolidation and logger initialization look good.The duplicate typing imports have been properly fixed, and logger initialization follows Python best practices.
64-106: LGTM! Well-structured dataset creation flow with attachment support.The orchestration logic cleanly separates concerns: extract attachments, sanitize the request, create the dataset, then process attachments. The comprehensive docstring with example aids maintainability.
281-314: LGTM! Request sanitization preserves immutability and correctly replaces attachments.The manual deep copy approach is appropriate, and the method leaves the original request unmodified while preparing a clean version for dataset creation.
…g in attachment and datasets modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 5cee01d in 1 minute and 12 seconds. Click for details.
- Reviewed
62lines of code in2files - Skipped
0files when reviewing. - Skipped posting
5draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:249
- Draft comment:
Added explicit type annotation for _cached_data for clarity. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py:265
- Draft comment:
Renamed local variable to 'file_data' in download() for clearer intent. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:271
- Draft comment:
Added explicit type annotation for 'attachments' in _extract_attachments. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:297
- Draft comment:
Explicitly typed 'clean_row' in _prepare_request_for_creation. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
5. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:323
- Draft comment:
Added guard in _process_attachments to immediately exit if dataset.rows is empty. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_qsHKx3HMawwyLToo
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipped PR review on 9b00146 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (5)
packages/sample-app/sample_app/dataset_attachments_example.py (5)
350-352: Address broadexcept Exceptionfor better error handling.The broad exception catch makes it difficult to distinguish between different failure modes (e.g., TypeError from incorrect
Datasets()instantiation vs. network errors vs. authentication failures).For a demo script, if you intentionally want a catch-all for user experience, make it explicit:
- except Exception as e: + except Exception as e: # noqa: BLE001 # Broad catch is intentional for this example script print(f"\nError: {e}") - print("Make sure to set TRACELOOP_API_KEY environment variable") + print("Make sure TRACELOOP_API_KEY environment variable is set") + print("For detailed error information, run examples individually")Alternatively, narrow to expected exception types:
except (TypeError, RuntimeError, requests.RequestException) as e: print(f"\nError: {e}") print("Make sure TRACELOOP_API_KEY environment variable is set")
34-35: FixDatasets()instantiation - missing requiredHTTPClientparameter.The
Datasetsclass requires anHTTPClientparameter in its__init__method (seepackages/traceloop-sdk/traceloop/sdk/datasets/datasets.pyline 36). CallingDatasets()without arguments will raiseTypeError: __init__() missing 1 required positional argument: 'http'.The correct pattern is to use the
datasetsattribute from the initialized Traceloop client:Apply this diff:
- Traceloop.init(app_name="attachment-demo") - datasets = Datasets() + client = Traceloop.init(app_name="attachment-demo") + datasets = client.datasetsThis same issue also affects lines 119, 188, and 251.
Based on learnings, the established pattern in the Traceloop SDK is to obtain properly initialized instances from the Client class rather than constructing them directly.
119-119: FixDatasets()instantiation - missing requiredHTTPClientparameter.Same issue as line 35. Replace with:
client = Traceloop.init(app_name="attachment-demo") datasets = client.datasets
188-188: FixDatasets()instantiation - missing requiredHTTPClientparameter.Same issue as lines 35 and 119. Replace with:
client = Traceloop.init(app_name="attachment-demo") datasets = client.datasets
251-251: FixDatasets()instantiation - missing requiredHTTPClientparameter.Same issue as lines 35, 119, and 188. Replace with:
client = Traceloop.init(app_name="attachment-demo") datasets = client.datasets
🧹 Nitpick comments (2)
packages/sample-app/sample_app/dataset_attachments_example.py (2)
1-1: Make the file executable to match the shebang.The shebang indicates this file is intended to be run directly, but the file lacks executable permissions.
Apply this command to fix:
chmod +x packages/sample-app/sample_app/dataset_attachments_example.py
186-189: Add missing example header print statement for consistency.Unlike the other three examples (lines 31, 106, 243), this function doesn't print a header banner. This makes the console output less clear when running all examples.
Add this after the function definition:
def example_in_memory_attachments(): """Example: Creating a dataset with in-memory data attachments.""" + print("\n=== Example 3: In-Memory Attachments ===") + datasets = Datasets()Also, fix the
Datasets()instantiation issue (see separate comment).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/sample-app/sample_app/dataset_attachments_example.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/dataset_attachments_example.py
🧠 Learnings (1)
📚 Learning: 2025-08-04T15:35:30.188Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.
Applied to files:
packages/sample-app/sample_app/dataset_attachments_example.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/dataset_attachments_example.py (4)
packages/traceloop-sdk/traceloop/sdk/dataset/attachment.py (3)
Attachment(24-178)ExternalAttachment(181-227)data(255-264)packages/traceloop-sdk/traceloop/sdk/dataset/model.py (4)
FileCellType(16-20)ColumnDefinition(75-78)ColumnType(8-13)CreateDatasetRequest(84-89)packages/traceloop-sdk/tests/conftest.py (1)
datasets(227-233)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
Datasets(29-368)
🪛 Ruff (0.14.5)
packages/sample-app/sample_app/dataset_attachments_example.py
1-1: Shebang is present but file is not executable
(EXE001)
350-350: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/sample-app/sample_app/dataset_attachments_example.py (2)
12-26: LGTM!Imports are well-organized and follow security best practices with no hardcoded credentials.
355-356: LGTM!Standard Python entry point pattern is correctly implemented.
…metry into gz/add-files-to-dataset
…dule and update imports accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed fb871d9 in 1 minute and 26 seconds. Click for details.
- Reviewed
184lines of code in9files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/dataset_attachments_example.py:16
- Draft comment:
Import paths updated to use the 'traceloop.sdk.datasets' module. This change is consistent with the new package structure. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/sample-app/sample_app/dataset_example.py:10
- Draft comment:
Updated import to use 'traceloop.sdk.datasets' instead of the old 'dataset' module. This ensures consistency with the new API structure. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py:8
- Draft comment:
Imports for Attachment, ExternalAttachment, and FileCellType have been updated to the new datasets package. The changes look correct. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:297
- Draft comment:
When preparing the request for creation (_prepare_request_for_creation), consider using a deep copy (e.g. via copy.deepcopy) to handle nested structures in row values, ensuring the original request remains fully unmodified. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
5. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:360
- Draft comment:
In the attachment processing section (_process_attachments), consider using logger.exception(e) within the exception block to capture the full traceback, which can aid in debugging failures during attachment uploads. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
6. packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py:240
- Draft comment:
The _slugify method works well; however, consider adding unit tests to cover edge cases (e.g., input with only special characters or excessive whitespace) to ensure robust slug generation. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_Y3jeBiARa9zDuyem
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (4)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml (1)
85-85: Scrub AWS credentials fromupload_urlin cassette.This is the same issue flagged in the previous review. The
upload_urlJSON field still embeds AWS credentials (X-Amz-Credential,X-Amz-Security-Token,X-Amz-Signature), which violates the coding guideline "Never commit secrets or PII in VCR cassettes." Even if expired, these must be replaced with clearly fake placeholders to pass security scans (Checkov, Gitleaks).Apply this diff to scrub the sensitive values while preserving the URL structure and JSON deserialization:
- string: '{"upload_url":"https://traceloop-staging-files.s3.amazonaws.com/c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=ASIAQEMAC2MSQIN6EEGS%2F20251124%2Fus-east-1%2Fs3%2Faws4_request\u0026X-Amz-Date=20251124T212508Z\u0026X-Amz-Expires=3600\u0026X-Amz-Security-Token=IQoJb3JpZ2luX2VjEJb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJGMEQCIH3%2Bh%2BsH0KvYJqXGuQ0GF0xk8%2BkRicgYH8Ltvo08DLW%2FAiAfcHu0cnrxCIkB7hd2UBATZ1ZSbOQq8L38mh6%2BkJf18Cr5BAheEAAaDDAwOTM5NTQyNjA4NSIMCJbAo9CFNseeyr20KtYEipuqeoYbzFZDx8VMzfnyRA58M7AkLLzZ14X6E8WNz3Fo%2BRbtOZeDaIgG6Q1vvqFtg%2B3JxRcNUyU3Q372FdJsfQZnWlI%2BpmZLykoja%2FasIHEz%2B8h9eTz8tl97XYnWafZTgVHFWD2SJXmFchA0rLVVyRi0YvG6gq4X02ns0Xgn718hzuBvSqbnJ82oa0RvhrAyKeGvCPQwxhghqyNpG0ntXxJzmkgezFZSelFnz7T4FwKXr8pKgEu%2BRCPBG%2BSHGDT1DhZ3sG2F5ORq8aJyj%2FKkGhENcyogtju9Kgaq8oVMJtV%2FrQEkxuftMFrnvaO9vR4e2tgk6p%2FJLv7akwI61%2Fn89pdSBjB0pklC4gGk5zBWwnPq814OhIGcCzmlf1gdD8k4DR1AUCy9Q2wGwnWScpsSEipr3r7i0kvXZIRlx2YVGXOnk0XJHKd32RM9wrpcFb0LSxkABelmMGsYll%2BTktuBkOKJidBLBjzE2kU4oKbbmtcv8FKyXTwXQALerpO54UVzuABh6gmLAWAZspWEcJ%2B2M8tC1n5GxFICqvhgXzFHL7x%2FilwJcjhH9I%2Bt28K7RGIqQpQfTffC2gJtUAUcohLFtmdahNQ6DHLuPiBSKk9XUFd8l5URLndJVNhzMos580RO%2FrcIHQv8FU95kEyrNuaCH6nxk3xC2ef97AyikYu%2BPrdfRDn2JPENF86x4q0Qu1FzErRBQX%2Ft6osQ9TVP3wdnswk41b9EQO4nxzikMp9h1MEoZUETf1nWpelQNxgva7PvCm7ci0NVDNrm20PfbIdNSmnaoYxFsjDykJPJBjqZASCiHdsuYM4lR%2FDje4Zp0IzfS2KDr8069je%2FGD91v%2FurCR5IQMToDFUyIiHRBZ5F%2FW93oSK6z3OpqOXoxyG2iDmmmY5VxR9iRNqjVFu%2FjZLZh53RetQgplscJRgnqD3q8ICez2qwQJY%2FcPwnVddN6CngNOz%2FPbjZ33eETP8V9IizhuKqBA4ExS75x2n50DpzQZei%2ByGCuvYeZw%3D%3D\u0026X-Amz-SignedHeaders=content-type%3Bhost\u0026X-Amz-Signature=e961a868ba46c0a58b13af2d7a2f56eaeec39c6f177f20050341ec45af00f1fb","storage_key":"c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg","expires_at":"2025-11-24T22:25:08.316243221Z","method":"PUT"}' + string: '{"upload_url":"https://traceloop-staging-files.s3.amazonaws.com/c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=REDACTED_CREDENTIAL%2F20251124%2Fus-east-1%2Fs3%2Faws4_request\u0026X-Amz-Date=20251124T212508Z\u0026X-Amz-Expires=3600\u0026X-Amz-Security-Token=REDACTED_TOKEN\u0026X-Amz-SignedHeaders=content-type%3Bhost\u0026X-Amz-Signature=REDACTED_SIGNATURE","storage_key":"c108269c-cf1e-4ac6-a7e4-5a456cc9fdb7/744992d1-7f89-4bf7-9978-38db53dd12ae.jpg","expires_at":"2025-11-24T22:25:08.316243221Z","method":"PUT"}'The diff replaces the actual AWS access credential, session token, and signature with clearly fake placeholders while preserving the JSON structure and URL encoding so test deserialization works unchanged.
packages/sample-app/sample_app/dataset_attachments_example.py (2)
350-352: Broad exception catch acceptable for demo script.For example code, the catch-all provides a user-friendly fallback. If you want to silence the Ruff warning, add an inline comment.
- except Exception as e: + except Exception as e: # noqa: BLE001 - Catch-all for user-friendly error in demo
34-35:Datasets()instantiation will fail - requires HTTPClient argument.The
Datasetsclass requires anHTTPClientparameter in__init__, but this example instantiates it without arguments. This will raiseTypeErrorat runtime.As noted in a previous review (marked as addressed), you should use the pre-configured
datasetsattribute from the Traceloop client:# Initialize Traceloop - Traceloop.init(app_name="attachment-demo") - datasets = Datasets() + client = Traceloop.init(app_name="attachment-demo") + datasets = client.datasetsThis same issue applies to lines 119, 188, and 251.
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml (1)
86-86: AWS credentials in presigned URLs should be scrubbed.This cassette contains AWS credentials (
X-Amz-Security-Token,X-Amz-Credential) in the presigned S3 URLs. Per coding guidelines for cassettes, sensitive data should never be committed.Although a previous review indicated this was addressed, the credentials are still present. Please scrub these by replacing the query parameters with placeholders like
?<REDACTED>while preserving thestorage_key,method, andexpires_atfields.Also applies to line 202.
🧹 Nitpick comments (7)
packages/traceloop-sdk/traceloop/sdk/datasets/model.py (1)
3-72: File/attachment models are well-structured; consider tighteningstatustypingThe new file-related enums/models (
FileCellType,FileStorageType,FileCellMetadata,FileCellValue, and upload/external URL request/response models) look consistent with the attachment workflows and should integrate cleanly with the rest of the SDK.If you want stronger guarantees, you could replace the free-form
strstatusfields inFileCellValueandUploadStatusRequestwith a dedicated enum orLiteralunion (e.g.,"in_progress" | "success" | "failed"), which would surface invalid states earlier while keeping JSON shapes the same.packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py (1)
100-162: Consider usingtry/finallyfor reliable temp file cleanup.If any assertion fails before lines 161-162, the temporary files won't be cleaned up. Using a
try/finallyblock (as done intest_create_dataset_with_mixed_attachments) would ensure cleanup even on test failures.+ try: + # Create dataset request with file attachments + dataset_request = CreateDatasetRequest( # ... existing code ... + ) + + # Mock the S3 upload to succeed + with patch.object(Attachment, "_upload_to_s3", return_value=True): + dataset = datasets.create(dataset_request) + + # ... assertions ... + finally: + # Clean up + os.unlink(test_image.name) + os.unlink(test_pdf.name)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)
329-334: Redundant "Warning:" prefix in log message.The
logger.warning()call already indicates this is a warning. Remove the "Warning:" prefix from the message.- logger.warning( - f"Warning: Row index {row_idx} out of range, skipping attachments" - ) + logger.warning( + f"Row index {row_idx} out of range, skipping attachments" + )
359-367: Broad exception catch is acceptable here, but consider logging at debug level for the full traceback.The
except Exception(Ruff BLE001) is intentional for resilience—one failed attachment shouldn't prevent processing others. The warning log captures the error message, which is appropriate. Consider also logging the full exception at debug level for troubleshooting.except Exception as e: logger.warning( - f"Warning: Failed to process attachment for row {row_idx}, column {col_slug}: {e}" + f"Failed to process attachment for row {row_idx}, column {col_slug}: {e}" ) + logger.debug("Attachment processing exception details:", exc_info=True) # Mark as failed in row values row.values[col_slug] = { "type": "file", "status": "failed", "error": str(e), }packages/sample-app/sample_app/dataset_attachments_example.py (1)
1-1: Shebang present but file is not executable.Either remove the shebang or make the file executable with
chmod +x.packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py (1)
17-29: Optional: Sort__all__for consistency.Ruff suggests applying isort-style sorting to
__all__. This is a minor style preference.__all__ = [ + "Attachment", + "AttachmentReference", + "BaseDatasetEntity", + "Column", + "ColumnType", "Dataset", - "Column", - "Row", - "BaseDatasetEntity", - "ColumnType", "DatasetMetadata", + "ExternalAttachment", "FileCellType", "FileStorageType", - "Attachment", - "ExternalAttachment", - "AttachmentReference", + "Row", ]packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (1)
194-198: Filename extraction may include query parameters.
url.split("/")[-1]doesn't strip query strings, so a URL likehttps://example.com/image.png?token=abcwould yieldimage.png?token=abcas the filename.- self.filename = filename or url.split("/")[-1] + from urllib.parse import urlparse + self.filename = filename or urlparse(url).path.split("/")[-1] or "attachment"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (16)
packages/sample-app/sample_app/dataset_attachments_example.py(1 hunks)packages/sample-app/sample_app/dataset_example.py(1 hunks)packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml(1 hunks)packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml(1 hunks)packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml(1 hunks)packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml(1 hunks)packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_without_attachments.yaml(1 hunks)packages/traceloop-sdk/tests/datasets/test_create_dataset.py(1 hunks)packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py(1 hunks)packages/traceloop-sdk/tests/datasets/test_datasets_operations.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/dataset/__init__.py(0 hunks)packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py(4 hunks)packages/traceloop-sdk/traceloop/sdk/datasets/model.py(1 hunks)
💤 Files with no reviewable changes (1)
- packages/traceloop-sdk/traceloop/sdk/dataset/init.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.pypackages/traceloop-sdk/traceloop/sdk/datasets/dataset.pypackages/traceloop-sdk/traceloop/sdk/datasets/datasets.pypackages/sample-app/sample_app/dataset_attachments_example.pypackages/traceloop-sdk/tests/datasets/test_create_dataset.pypackages/traceloop-sdk/traceloop/sdk/datasets/attachment.pypackages/sample-app/sample_app/dataset_example.pypackages/traceloop-sdk/traceloop/sdk/datasets/model.pypackages/traceloop-sdk/traceloop/sdk/datasets/__init__.pypackages/traceloop-sdk/tests/datasets/test_datasets_operations.py
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yamlpackages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yamlpackages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_without_attachments.yamlpackages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yamlpackages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml
🧠 Learnings (4)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to **/cassettes/**/*.{yaml,yml,json} : Never commit secrets or PII in VCR cassettes; scrub sensitive data
Applied to files:
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yamlpackages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording
Applied to files:
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
📚 Learning: 2025-08-04T15:35:30.188Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.
Applied to files:
packages/sample-app/sample_app/dataset_attachments_example.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes
Applied to files:
packages/traceloop-sdk/tests/datasets/test_datasets_operations.py
🧬 Code graph analysis (7)
packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py (6)
packages/traceloop-sdk/tests/conftest.py (1)
datasets(227-233)packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (3)
Attachment(24-178)ExternalAttachment(181-227)data(255-264)packages/traceloop-sdk/traceloop/sdk/datasets/model.py (4)
FileCellType(16-20)ColumnDefinition(75-78)ColumnType(8-13)CreateDatasetRequest(84-89)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
create(64-106)packages/traceloop-sdk/traceloop/sdk/datasets/column.py (1)
delete(32-51)packages/traceloop-sdk/traceloop/sdk/datasets/row.py (1)
delete(28-38)
packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)
packages/traceloop-sdk/tests/conftest.py (1)
datasets(227-233)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)
packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
HTTPClient(7-91)packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (4)
Attachment(24-178)ExternalAttachment(181-227)upload(109-167)attach(200-227)packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (2)
Dataset(20-112)from_create_dataset_response(39-52)packages/traceloop-sdk/traceloop/sdk/datasets/model.py (5)
ColumnDefinition(75-78)ColumnType(8-13)CreateDatasetRequest(84-89)CreateDatasetResponse(99-108)DatasetMetadata(151-159)
packages/traceloop-sdk/tests/datasets/test_create_dataset.py (2)
packages/traceloop-sdk/tests/conftest.py (1)
datasets(227-233)packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)
Dataset(20-112)
packages/sample-app/sample_app/dataset_example.py (5)
packages/traceloop-sdk/tests/conftest.py (1)
datasets(227-233)packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)
Dataset(20-112)packages/traceloop-sdk/traceloop/sdk/datasets/model.py (1)
ColumnType(8-13)packages/traceloop-sdk/traceloop/sdk/datasets/column.py (1)
Column(11-77)packages/traceloop-sdk/traceloop/sdk/datasets/row.py (1)
Row(10-49)
packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py (6)
packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py (3)
Attachment(24-178)AttachmentReference(230-288)ExternalAttachment(181-227)packages/traceloop-sdk/traceloop/sdk/datasets/base.py (1)
BaseDatasetEntity(6-23)packages/traceloop-sdk/traceloop/sdk/datasets/column.py (1)
Column(11-77)packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)
Dataset(20-112)packages/traceloop-sdk/traceloop/sdk/datasets/model.py (4)
ColumnType(8-13)DatasetMetadata(151-159)FileCellType(16-20)FileStorageType(23-25)packages/traceloop-sdk/traceloop/sdk/datasets/row.py (1)
Row(10-49)
packages/traceloop-sdk/tests/datasets/test_datasets_operations.py (2)
packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)
Dataset(20-112)packages/traceloop-sdk/traceloop/sdk/datasets/model.py (1)
DatasetMetadata(151-159)
🪛 Checkov (3.2.334)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_file_attachments_mocked.yaml
[high] 86-87: AWS Access Key
(CKV_SECRET_2)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_in_memory_attachment.yaml
[high] 85-86: AWS Access Key
(CKV_SECRET_2)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml
[high] 85-86: AWS Access Key
(CKV_SECRET_2)
🪛 Gitleaks (8.29.0)
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_mixed_attachments.yaml
[high] 85-85: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
359-359: Do not catch blind exception: Exception
(BLE001)
packages/sample-app/sample_app/dataset_attachments_example.py
1-1: Shebang is present but file is not executable
(EXE001)
350-350: Do not catch blind exception: Exception
(BLE001)
packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py
44-44: Avoid specifying long messages outside the exception class
(TRY003)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
96-96: Avoid specifying long messages outside the exception class
(TRY003)
99-99: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Create your own exception
(TRY002)
132-132: Avoid specifying long messages outside the exception class
(TRY003)
138-138: Create your own exception
(TRY002)
138-138: Avoid specifying long messages outside the exception class
(TRY003)
150-150: Probable use of requests call without timeout
(S113)
173-173: Probable use of requests call without timeout
(S113)
176-176: Consider moving this statement to an else block
(TRY300)
177-177: Do not catch blind exception: Exception
(BLE001)
220-220: Create your own exception
(TRY002)
220-220: Avoid specifying long messages outside the exception class
(TRY003)
260-260: Create your own exception
(TRY002)
260-260: Avoid specifying long messages outside the exception class
(TRY003)
261-261: Probable use of requests call without timeout
(S113)
packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py
17-29: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (16)
packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py (1)
4-13: Import path realignment totraceloop.sdk.datasets.modellooks correctThe updated import source matches the new datasets model module and preserves behavior of
Datasetmethods that depend on these types.packages/traceloop-sdk/tests/datasets/test_create_dataset.py (1)
12-12: Dataset import updated to new namespaceImporting
Datasetfromtraceloop.sdk.datasets.datasetaligns these tests with the refactored public datasets namespace; no behavioral change to the tests themselves.packages/traceloop-sdk/tests/datasets/test_datasets_operations.py (1)
2-3: Operations tests correctly switched to new datasets module pathsUsing
traceloop.sdk.datasets.dataset.Datasetandtraceloop.sdk.datasets.model.DatasetMetadatamatches the reorganized SDK layout and keeps the tests in sync with the public API.packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_without_attachments.yaml (1)
1-62: Cassette for dataset without attachments looks clean and compliantThe interaction records the dataset-create call without exposing API keys, auth headers, or PII; header/body content is test-only and appropriate for committing.
packages/traceloop-sdk/tests/datasets/cassettes/test_dataset_with_attachments/test_create_dataset_with_external_attachments.yaml (1)
1-245: External-attachments cassette is free of secrets and looks goodThe recorded interactions only contain generic external URLs and standard HTTP metadata, with no API keys or sensitive headers; this cassette is safe to keep as-is.
packages/sample-app/sample_app/dataset_example.py (1)
10-10: Sample app now targets the newtraceloop.sdk.datasetsnamespaceSwitching
Dataset,ColumnType,Column, andRowto import fromtraceloop.sdk.datasetskeeps the example aligned with the refactored public API, without changing its runtime behavior.packages/traceloop-sdk/tests/datasets/test_dataset_with_attachments.py (5)
1-18: LGTM! Clean test file structure with appropriate imports.The test module is well-organized with clear separation of concerns. Imports are properly structured from the public API surface (
traceloop.sdk.datasets) and model definitions.
20-93: LGTM! Comprehensive test for external attachments.The test properly validates the external attachment workflow, including storage type verification, URL preservation, file type mapping, and graceful handling of null values.
165-221: LGTM! Good test structure with proper cleanup pattern.The
try/finallyblock ensures temp files are cleaned up even if assertions fail. The test properly validates mixed attachment types and null handling.
224-260: LGTM! Clean test for in-memory attachments.Properly validates the in-memory data attachment workflow without needing temp file cleanup.
263-288: LGTM! Important regression test for backward compatibility.Ensures that the enhanced
create()method works correctly for datasets without attachments, maintaining backward compatibility.packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4)
2-26: LGTM! Proper logging setup and import organization.Logger initialization follows Python conventions and the imports are cleanly organized for the new attachment functionality.
64-106: LGTM! Well-designed create method with attachment support.The three-step pattern (extract → clean → create → process) is clean and maintains backward compatibility. The docstring with example usage is helpful.
262-279: LGTM! Clean extraction logic.The helper correctly identifies and collects attachment objects from row values.
281-314: LGTM! Proper request preparation with immutability.The method correctly creates a deep copy to avoid mutating the original request while replacing attachment objects with
Nonefor the initial API call.packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py (1)
1-29: ****The review comment's premise is unfounded. Public documentation shows that users access datasets through the client instance (e.g.,
client.datasets.from_csv(),client.datasets.from_dataframe(),client.datasets.get_by_slug()), not by importing theDatasetsclass directly. TheDatasetsclass is an internal implementation detail accessed via the client facade pattern, not intended for direct user imports. Exporting it would create an undocumented parallel API path without corresponding documentation or support.Likely an incorrect or invalid review comment.
… external URL linking capabilities
feat(instrumentation): ...orfix(instrumentation): ....Important
Adds support for file attachments in datasets, including local uploads and external URL linking, with updates to models, tests, and examples.
create()indatasets.pyto handleAttachmentandExternalAttachmentobjects.AttachmentandExternalAttachmentclasses inattachment.pyfor handling file uploads and external URLs.FileCellType,FileStorageType, and related models tomodel.pyfor handling file metadata.Datasetclass indataset.pyto support new attachment features.test_dataset_with_attachments.py.test_create_dataset.pyandtest_datasets_operations.py.datasettodatasetsin various test paths and imports.dataset_attachments_example.py.This description was created by
for fb871d9. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.
Fixes TLP-1196