feat: Add batch functionality#15
Conversation
…tch property, types, and documentation
|
""" WalkthroughThis update adds new batch processing operations to the Supadata Python SDK. Two endpoints are introduced for retrieving transcripts and video metadata in batch, with a polling mechanism to monitor job status until completion or failure. New data classes are added to represent batch job results, and the YouTube class is extended with new inner classes and properties to support the operations. Relevant tests have also been added to validate the functionality and error handling. Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant YT as YouTube
participant BS as Batch Service
U->>YT: Request batch operation (transcript/video)
YT->>BS: Create Batch Job
BS-->>YT: Return Job ID
loop Poll every 5 seconds while status is queued/active
YT->>BS: Get Batch Job status
BS-->>YT: Return status (queued/active)
end
YT->>BS: Get Batch Results (completed/failed)
BS-->>YT: Return BatchStats and BatchResultItems
YT-->>U: Output results or error message
Assessment against linked issues
Suggested labels
Suggested reviewers
Poem
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
supadata/youtube.py (1)
433-479:_Videoclass largely duplicates the top-levelvideomethod logic.Consider removing the older
YouTube.video()method or consolidating them to limit code duplication and ensure a single source of truth.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
README.md(1 hunks)supadata/types.py(2 hunks)supadata/youtube.py(5 hunks)tests/test_client.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/test_client.py (3)
supadata/types.py (5)
BatchJob(279-286)BatchResultItem(290-303)BatchResults(322-384)BatchStats(307-318)Transcript(26-43)supadata/youtube.py (9)
transcript(36-84)transcript(199-203)batch(206-210)batch(298-354)batch(480-532)batch(586-642)video(134-175)video(192-196)get_batch_results(694-707)supadata/errors.py (1)
SupadataError(8-31)
supadata/youtube.py (3)
supadata/types.py (6)
BatchJob(279-286)BatchResults(322-384)YoutubeVideo(167-202)Transcript(26-43)TranscriptChunk(9-22)TranslatedTranscript(47-60)supadata/errors.py (1)
SupadataError(8-31)supadata/client.py (1)
_request(85-124)
🪛 Ruff (0.8.2)
supadata/youtube.py
192-192: Redefinition of unused video from line 134
(F811)
199-199: Redefinition of unused transcript from line 36
(F811)
🔇 Additional comments (14)
supadata/types.py (5)
276-277: No changes necessary.These blank lines do not impact functionality or readability.
278-287: NewBatchJobdata class looks solid.It correctly models the batch job identifier with clear documentation.
289-304:BatchResultItemis well-structured.Optional fields for transcript, video, and error_code provide flexibility when dealing with partial results.
306-319:BatchStatsis implemented cleanly.The defaults of zero for total, succeeded, and failed are concise and logical.
321-385:BatchResultshandles parsing and object construction elegantly.Your approach to converting dictionary entries into
BatchResultIteminstances and gracefully handling parsing errors incompleted_atexemplifies robust design.README.md (4)
97-99: Documentation for batch operations added.Great introduction of a new "Batch Operations" section to guide users on the newly added functionality.
100-110: Transcript batch job usage example is clear and user-friendly.This snippet concisely demonstrates how to create and start a transcript batch job with optional parameters.
111-117: Video batch job usage example is straightforward.Showing how to retrieve video metadata in a batch is helpful for users handling large playlists.
118-151: Polling mechanism for batch results is well-explained.The example thoroughly covers status checks, error handling, and result parsing—making it easy for developers to follow.
tests/test_client.py (2)
21-26: Import statements for new batch-related classes look correct.No issues with referencing
BatchJob,BatchResultItem,BatchResults, andBatchStats.
558-644: Batch tests are comprehensive and robust.They cover transcript/video batch creation, validation scenarios, and completed/active/failed job results. This suite provides strong coverage and confidence in the new functionality.
supadata/youtube.py (3)
32-34: Lazy instantiation of specialized instances is well-structured.Deferring creation of
_video_instance,_transcript_instance, and_batch_instanceuntil needed can help maintain performance.
205-211: Newbatchproperty is a useful entry point.Exposing
_batch_instancethrough a property keeps the API consistent.
690-708:_Batchclass is a solid addition.
get_batch_resultscorrectly returnsBatchResults, enabling a clean separation of concerns for batch status retrieval.
…transcript handling Signed-off-by: David Anyatonwu <davidanyatonwu@gmail.com>
|
@onyedikachi-david sorry for taking a bit longer to review this, will try to get it done asap |
rafalzawadzki
left a comment
There was a problem hiding this comment.
@onyedikachi-david thanks for the contribution!
It mostly looks good, but I have one ask about docstrings. Looks like we may have lost the IntelliSense documentation for the following methods:
supadata.youtube.transcript()
supadata.youtube.video()
supadata.youtube.channel()
supadata.youtube.playlist()
Screenshots to illustrate what I mean:


That's likely because of namespaces (each of the above has a one level deeper function). It's a similar problem we had with JSDocs in supadata/js.
Can you look into this? At least for the new methods you've added in this PR.
|
Thanks for the review. Looking into them shortly. |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
supadata/youtube.py (3)
632-679: Well-implemented translate method.The translate method in the _Transcript class properly handles translation requests and error cases, with good defaults for missing fields.
Note that this duplicates the top-level translate method logic, which was previously flagged in reviews. Consider having one implementation call the other to avoid future maintenance issues.
203-216:⚠️ Potential issueName conflict with existing
videomethod.The property
videoconflicts with the existing method of the same name at line 127. This was previously flagged and marked as addressed in commit 07076bc, but still appears in the code.- @property - def video(self) -> "_Video": + @property + def video_ops(self) -> "_Video":🧰 Tools
🪛 Ruff (0.8.2)
203-203: Redefinition of unused
videofrom line 127(F811)
219-232:⚠️ Potential issueName conflict with existing
transcriptmethod.The property
transcriptconflicts with the existing method of the same name at line 85. This was previously flagged and marked as addressed in commit 07076bc, but still appears in the code.- @property - def transcript(self) -> "_Transcript": + @property + def transcript_ops(self) -> "_Transcript":🧰 Tools
🪛 Ruff (0.8.2)
219-219: Redefinition of unused
transcriptfrom line 85(F811)
🧹 Nitpick comments (1)
supadata/youtube.py (1)
460-544: Comprehensive _Video class implementation.The _Video class is well-implemented with:
- Proper initialization
- Duplicate functionality of the top-level video method in call
- A new batch method for batch operations
- Good docstrings and error handling
However, note that the call method duplicates logic from the top-level video method. While this works, it creates maintenance overhead if one changes and the other doesn't.
Consider having the top-level video method call this implementation to maintain a single source of truth:
def video(self, id: str) -> YoutubeVideo: - response: dict = self._request("GET", "/youtube/video", params={"id": id}) - - # Safely convert upload_date with fallback - try: - uploaded_time = datetime.fromisoformat(response.pop("upload_date", datetime.now().isoformat())) - except (ValueError, TypeError): - uploaded_time = datetime.now() - - # Ensure all required fields have defaults - defaults = { - "id": id, - "title": "", - "description": "", - "duration": 0, - "channel": {"id": "", "name": ""}, - "tags": [], - "thumbnail": "", - "view_count": 0, - "like_count": 0, - "transcript_languages": [], - } - - # Apply defaults for any missing fields - for key, default_value in defaults.items(): - if key not in response: - response[key] = default_value - elif key == "channel" and not isinstance(response[key], dict): - response[key] = defaults[key] - - return YoutubeVideo(**response, uploaded_date=uploaded_time) + # Delegate to the _Video class + if self._video_instance is None: + self._video_instance = self._Video(self) + return self._video_instance(id)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
supadata/youtube.py(9 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
supadata/youtube.py (3)
supadata/types.py (6)
BatchJob(279-286)BatchResults(322-384)YoutubeVideo(167-202)Transcript(26-43)TranscriptChunk(9-22)TranslatedTranscript(47-60)supadata/errors.py (1)
SupadataError(8-31)supadata/client.py (1)
_request(85-124)
🪛 Ruff (0.8.2)
supadata/youtube.py
203-203: Redefinition of unused video from line 127
(F811)
219-219: Redefinition of unused transcript from line 85
(F811)
🔇 Additional comments (9)
supadata/youtube.py (9)
15-16: Good addition of batch processing types.The BatchJob and BatchResults types have been correctly imported to support the new batch functionality.
32-34: LGTM! Good lazy initialization setup.The instance variables for _video_instance, _transcript_instance, and _batch_instance are properly initialized as None, following the same pattern as the existing instance variables.
36-84: Well-structured validation method.The _validate_batch_sources method provides robust validation for batch operations:
- Ensures exactly one source is specified
- Validates the limit parameter
- Returns properly formatted payload
- Includes clear error messages
Good job extracting this common logic into a shared method as recommended in the previous review.
101-104: Good method delegation pattern.The transcript method now properly delegates to the _Transcript.call method, which enhances modularity.
122-125: Good method delegation pattern.The translate method now properly delegates to the _Transcript.translate method, which enhances modularity.
234-248: Well-designed batch property.The batch property is well-implemented with good documentation and examples.
344-375: Good batch functionality implementation in _Channel class.The batch method is well-structured and utilizes the shared _validate_batch_sources method for input validation.
545-631: Comprehensive _Transcript class implementation with batch functionality.The _Transcript class is well-implemented with:
- Proper initialization
- Methods for getting transcripts and translations
- A batch method that properly utilizes the shared validation method
- Good error handling
The implementation is consistent with the previous design patterns and follows good practices.
680-700: Well-designed _Batch class.The _Batch class provides a clean interface for retrieving batch results with:
- Good documentation
- Simple and focused method design
- Proper error handling
|
Done @rafalzawadzki |
hmm I see lots of changes in the commit but when I try it in my VSCode it still doesn't work? Can you show me how it looks for you, maybe it's something with my IDE? |
|
sorry for getting back so late again! and thanks for sharing the screenshots from your IDE. They show the problem too, in fact. For example, If you're using an AI to write this code, then the LLM very likely gets this idea wrong. Technically these are namespaces, but in practice we're using them sometimes as namespace, sometimes as function. And we always want a docstring as if we're dealing with a function, never a namespace. Now in docstring: What we want in docstring: The above should apply to all functions of the API. Because this task seems to expand on functions that may have been incorrectly documented in the past, I'll add an extra $20 tip for completing this PR. |
…for channel, playlist, video, and transcript operations - Introduced private classes (_Channel, _Playlist, _Video, _Transcript) for better organization of YouTube operations. - Updated methods to handle metadata retrieval and batch processing for channels, playlists, and videos. - Enhanced error handling and default value assignments in API responses. - Improved documentation for clarity on method usage and parameters. Signed-off-by: David Anyatonwu <davidanyatonwu@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
supadata/youtube.py (3)
315-319: Fix non‑PEP‑8 indentation inside_Transcript.translateLine 317 starts with 13 spaces, breaking the 4‑space convention and causing mismatched‑indent visuals in many editors.
- processed_content = [ + processed_content = [(Only whitespace changed.)
511-561: Cache helper namespace objects to reduce repeated allocationsEach access to properties like
.channel,.video,.transcript, … returns a new helper object.
For hot paths (e.g. polling in a tight loop) this Needlessly creates dozens of identical objects.A lightweight memoisation keeps the ergonomic property API while avoiding churn:
class YouTube: @@ - @property - def channel(self) -> _Channel: + @property + def channel(self) -> _Channel: ... - return _Channel(self) # Return a new instance each time + if not hasattr(self, "_chan"): + self._chan = _Channel(self) + return self._chanApply the same pattern to the other namespace properties.
456-460: Surface API errors early by validating expected keys inBatchJobpayload
_create_batch_jobblindly forwards the entire JSON intoBatchJob(**response).
If the backend ever changes field names, you’ll get a crypticTypeError: __init__() got an unexpected keyword argument ....Quick guard:
def _create_batch_job(...): response = self._request("POST", endpoint, json=payload) - return BatchJob(**response) + try: + return BatchJob(**response) + except TypeError as exc: + raise SupadataError( + error="invalid-response", + message="Unexpected batch job schema", + details=str(exc), + ) from exc
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
supadata/youtube.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
supadata/youtube.py
387-387: Redefinition of unused YouTube from line 20
(F811)
| # Forward declare YouTube for type hints in private classes | ||
| class YouTube: | ||
| """YouTube namespace for Supadata operations.""" | ||
| pass | ||
|
|
There was a problem hiding this comment.
Remove runtime re‑definition of YouTube to silence Ruff F811 and avoid subtle type/runtime bugs
The dummy forward declaration creates a second YouTube class at runtime that shadows the real one defined later.
Besides tripping Ruff (F811 redefinition), it can also break isinstance()/issubclass() checks performed before the real class is imported.
A clean, type‑checker‑only forward declaration is enough:
-from typing import Any, Callable, Dict, List, Optional, Literal
+from typing import Any, Callable, Dict, List, Optional, Literal, TYPE_CHECKING
...
-# Forward declare YouTube for type hints in private classes
-class YouTube:
- pass
+# Forward declaration for static type checkers only
+if TYPE_CHECKING: # pragma: no cover
+ class YouTube: ...📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Forward declare YouTube for type hints in private classes | |
| class YouTube: | |
| """YouTube namespace for Supadata operations.""" | |
| pass | |
| -from typing import Any, Callable, Dict, List, Optional, Literal | |
| +from typing import Any, Callable, Dict, List, Optional, Literal, TYPE_CHECKING | |
| -# Forward declare YouTube for type hints in private classes | |
| -class YouTube: | |
| - pass | |
| +# Forward declaration for static type checkers only | |
| +if TYPE_CHECKING: # pragma: no cover | |
| + class YouTube: ... |
| if limit is not None: | ||
| self._validate_limit(limit) | ||
| # Only add limit if source is playlist or channel | ||
| if playlist_id or channel_id: | ||
| payload["limit"] = limit | ||
| elif video_ids and limit is not None: | ||
| # Optionally warn or ignore limit if only video_ids are provided? | ||
| # Current behavior: limit is ignored by API if only video_ids are sent. | ||
| pass |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Re‑evaluate silent limit drop when video_ids is supplied
_validate_batch_sources ignores limit whenever video_ids is the sole source (lines 447‑451).
That behaviour is opaque to callers and diverges from how playlist_id/channel_id are handled. Consider:
- Raising
SupadataErrorif bothvideo_idsandlimitare provided. - Or enforcing/truncating:
payload["videoIds"] = video_ids[:limit].
Explicitly documenting (or logging) the choice prevents confusing client code.
🏁 Script executed:
#!/bin/bash
set -e
# Find the definition of `_validate_batch_sources` and inspect surrounding code for context
rg -n "def _validate_batch_sources" supadata/youtube.py
# Show lines around the limit handling (lines ~443–451) to get full context
sed -n '430,500p' supadata/youtube.pyLength of output: 3046
Explicitly handle limit when video_ids is provided
Currently, _validate_batch_sources silently ignores limit whenever only video_ids are supplied (supadata/youtube.py, lines 443–451). This diverges from how limit is applied for playlist_id/channel_id and can lead to surprising client behavior.
Please update this section to make the behavior explicit. For example:
• File: supadata/youtube.py, method _validate_batch_sources (around lines 443–451)
– Option A: Raise an error when limit is provided alongside video_ids
diff - elif video_ids and limit is not None: - pass + elif video_ids and limit is not None: + raise SupadataError( + error="invalid-argument", + message="`limit` cannot be used with `video_ids` directly. " + "Either paginate via playlist/channel or omit `limit`." + )
– Option B: Enforce the limit by slicing video_ids
diff - elif video_ids and limit is not None: - pass + elif video_ids and limit is not None: + payload["videoIds"] = video_ids[:limit]
• Update the public helper docstrings (_create_video_batch, _create_transcript_batch) to document the chosen behavior so callers know whether limit will be enforced or rejected.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if limit is not None: | |
| self._validate_limit(limit) | |
| # Only add limit if source is playlist or channel | |
| if playlist_id or channel_id: | |
| payload["limit"] = limit | |
| elif video_ids and limit is not None: | |
| # Optionally warn or ignore limit if only video_ids are provided? | |
| # Current behavior: limit is ignored by API if only video_ids are sent. | |
| pass | |
| if limit is not None: | |
| self._validate_limit(limit) | |
| # Only add limit if source is playlist or channel | |
| if playlist_id or channel_id: | |
| payload["limit"] = limit | |
| elif video_ids and limit is not None: | |
| raise SupadataError( | |
| error="invalid-argument", | |
| message="`limit` cannot be used with `video_ids` directly. " | |
| "Either paginate via playlist/channel or omit `limit`." | |
| ) |
|
Thanks for the review, I have taken a different approach, please check, it should now work. Thanks for the tip. @rafalzawadzki |
|
If we want to have this as you said:
It will be in this form rather: |
No it's not a python thing, it's an oversight, my bad. I'm fixing asap |
|
Done @rafalzawadzki |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
supadata/youtube.py (2)
19-21:⚠️ Potential issueFix YouTube class forward declaration to prevent redefinition error.
The current forward declaration creates a second
YouTubeclass at runtime that is later redefined, triggering the Ruff F811 warning. This can potentially breakisinstance()/issubclass()checks.Use a proper type-checking-only forward declaration:
-from typing import Any, Callable, Dict, List, Optional, Literal +from typing import Any, Callable, Dict, List, Optional, Literal, TYPE_CHECKING -# Forward declare YouTube for type hints in private classes -class YouTube: - pass +# Forward declaration for static type checkers only +if TYPE_CHECKING: + class YouTube: ...
443-451: 🛠️ Refactor suggestionExplicitly handle
limitparameter whenvideo_idsis provided.Currently, when only
video_idsis provided with alimit, the limit is silently ignored. This diverges from howlimitis handled withplaylist_idorchannel_idand could lead to unexpected behavior.Consider either enforcing the limit or raising an error for clarity:
if playlist_id or channel_id: payload["limit"] = limit elif video_ids and limit is not None: - # Optionally warn or ignore limit if only video_ids are provided? - # Current behavior: limit is ignored by API if only video_ids are sent. - pass + # Option 1: Enforce the limit by slicing video_ids + payload["videoIds"] = video_ids[:limit] + + # Option 2: Alternatively, raise an error for clarity + # raise SupadataError( + # error="invalid-request", + # message="Limit cannot be used with video_ids directly.", + # details="Either paginate via playlist/channel or omit the limit parameter." + # )
🧹 Nitpick comments (1)
supadata/youtube.py (1)
510-596: Fix docstring examples to match actual function signatures.The docstring examples show function calls with named arguments (e.g.,
supadata.youtube.channel(id="UC_...")) while the actual functions use positional parameters (e.g.,def __call__(self, id: str)).Update the examples to match the actual function signatures:
- channel_info = supadata.youtube.channel(id="UC_...") + channel_info = supadata.youtube.channel("UC_...") - videos = supadata.youtube.channel.videos(id="UC_...") + videos = supadata.youtube.channel.videos("UC_...") - batch_job = supadata.youtube.channel.batch(channel_id="UC_...") + batch_job = supadata.youtube.channel.batch(channel_id="UC_...") - playlist_info = supadata.youtube.playlist(id="PL_...") + playlist_info = supadata.youtube.playlist("PL_...") - videos = supadata.youtube.playlist.videos(id="PL_...") + videos = supadata.youtube.playlist.videos("PL_...") - video_info = supadata.youtube.video(id="dQw4w9WgXcQ") + video_info = supadata.youtube.video("dQw4w9WgXcQ") - transcript = supadata.youtube.transcript(video_id="dQw4w9WgXcQ") + transcript = supadata.youtube.transcript("dQw4w9WgXcQ")Note that batch methods correctly use named parameters in the examples.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
supadata/youtube.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
supadata/youtube.py (3)
supadata/types.py (9)
BatchJob(279-286)BatchResults(322-384)YoutubeChannel(206-227)VideoIds(260-275)YoutubePlaylist(231-256)YoutubeVideo(167-202)Transcript(26-43)TranscriptChunk(9-22)TranslatedTranscript(47-60)supadata/client.py (1)
_request(85-124)supadata/errors.py (1)
SupadataError(8-31)
🪛 Ruff (0.8.2)
supadata/youtube.py
387-387: Redefinition of unused YouTube from line 20
(F811)
🔇 Additional comments (6)
supadata/youtube.py (6)
27-119: LGTM! Well-structured _Channel class with clean API.The implementation correctly handles channel metadata retrieval, video listing, and batch job creation. The methods properly validate inputs and provide sensible defaults for missing fields in API responses.
121-183: LGTM! Well-implemented _Playlist class with proper error handling.The _Playlist class properly handles playlist metadata retrieval, video listing, and includes robust date parsing with fallbacks. Default values are correctly applied for missing fields.
185-250: LGTM! Well-structured _Video class with robust metadata handling.The implementation correctly handles video metadata retrieval and batch job creation. The date parsing includes appropriate error handling and the default values ensure a consistent interface even with incomplete API responses.
252-359: LGTM! Comprehensive _Transcript class implementation.The _Transcript class properly handles transcript retrieval, translation, and batch operations. The implementation includes appropriate handling of response formats and ensures defaults for optional fields.
360-380: LGTM! Simple and focused _Batch class implementation.The _Batch class correctly handles batch result retrieval and properly delegates parsing to the BatchResults constructor.
387-596: Great overall architecture with modular facade pattern.The code effectively uses a facade pattern with private namespace classes to organize YouTube API operations. The public API is clean with properties returning fresh instances of private classes, maintaining statelessness while providing an intuitive interface.
🧰 Tools
🪛 Ruff (0.8.2)
387-387: Redefinition of unused
YouTubefrom line 20(F811)
|
great! looks good 🚀 |





Fixes: #14
/claim #14
Summary by CodeRabbit
New Features
Tests