Skip to content

feat: Add batch functionality#15

Merged
rafalzawadzki merged 6 commits intosupadata-ai:mainfrom
onyedikachi-david:feature/batch-functionality
Apr 24, 2025
Merged

feat: Add batch functionality#15
rafalzawadzki merged 6 commits intosupadata-ai:mainfrom
onyedikachi-david:feature/batch-functionality

Conversation

@onyedikachi-david
Copy link
Copy Markdown
Contributor

@onyedikachi-david onyedikachi-david commented Apr 8, 2025

Fixes: #14
/claim #14

Summary by CodeRabbit

  • New Features

    • Introduced batch processing capabilities to efficiently retrieve multiple video transcripts and video metadata.
    • Enabled real-time job tracking with a polling mechanism that provides continuous status updates and detailed statistics.
  • Tests

    • Added comprehensive tests to verify batch job creation, input validations, and accurate result tracking.

@algora-pbc algora-pbc bot mentioned this pull request Apr 8, 2025
6 tasks
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2025

"""

Walkthrough

This update adds new batch processing operations to the Supadata Python SDK. Two endpoints are introduced for retrieving transcripts and video metadata in batch, with a polling mechanism to monitor job status until completion or failure. New data classes are added to represent batch job results, and the YouTube class is extended with new inner classes and properties to support the operations. Relevant tests have also been added to validate the functionality and error handling.

Changes

File Change Summary
README.md Updated documentation to include details of new batch endpoints for transcript and video operations with a polling mechanism for job status.
supadata/types.py Added data classes: BatchJob, BatchResultItem, BatchStats, and BatchResults for representing and processing batch job outcomes.
supadata/youtube.py Extended the YouTube class with new inner classes (_Channel, _Playlist, _Video, _Transcript, _Batch) and properties. Added batch methods for transcript and video operations with validation and error handling.
tests/test_client.py Introduced tests for batch operations including creation, validation (missing/multiple sources), and result retrieval for transcript and video batch jobs.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant YT as YouTube
    participant BS as Batch Service

    U->>YT: Request batch operation (transcript/video)
    YT->>BS: Create Batch Job
    BS-->>YT: Return Job ID
    loop Poll every 5 seconds while status is queued/active
        YT->>BS: Get Batch Job status
        BS-->>YT: Return status (queued/active)
    end
    YT->>BS: Get Batch Results (completed/failed)
    BS-->>YT: Return BatchStats and BatchResultItems
    YT-->>U: Output results or error message
Loading

Assessment against linked issues

Objective Addressed Explanation
Implement three new endpoints (/youtube/transcript/batch, /youtube/video/batch, /youtube/batch/{jobId}) [#14]
Update README with batch operations details [#14]
Add tests validating batch operation functionalities [#14]

Suggested labels

💰 Rewarded

Suggested reviewers

  • rafalzawadzki

Poem

Hop along the code, I sing,
New batch jobs make my heart ring.
Videos and transcripts dance in line,
As tests ensure they work just fine.
With each debug hop under the moon’s glow,
I celebrate our code’s fresh flow! 🐇✨
"""

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
supadata/youtube.py (1)

433-479: _Video class largely duplicates the top-level video method logic.

Consider removing the older YouTube.video() method or consolidating them to limit code duplication and ensure a single source of truth.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c9aa8a and c7e687e.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • supadata/types.py (2 hunks)
  • supadata/youtube.py (5 hunks)
  • tests/test_client.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/test_client.py (3)
supadata/types.py (5)
  • BatchJob (279-286)
  • BatchResultItem (290-303)
  • BatchResults (322-384)
  • BatchStats (307-318)
  • Transcript (26-43)
supadata/youtube.py (9)
  • transcript (36-84)
  • transcript (199-203)
  • batch (206-210)
  • batch (298-354)
  • batch (480-532)
  • batch (586-642)
  • video (134-175)
  • video (192-196)
  • get_batch_results (694-707)
supadata/errors.py (1)
  • SupadataError (8-31)
supadata/youtube.py (3)
supadata/types.py (6)
  • BatchJob (279-286)
  • BatchResults (322-384)
  • YoutubeVideo (167-202)
  • Transcript (26-43)
  • TranscriptChunk (9-22)
  • TranslatedTranscript (47-60)
supadata/errors.py (1)
  • SupadataError (8-31)
supadata/client.py (1)
  • _request (85-124)
🪛 Ruff (0.8.2)
supadata/youtube.py

192-192: Redefinition of unused video from line 134

(F811)


199-199: Redefinition of unused transcript from line 36

(F811)

🔇 Additional comments (14)
supadata/types.py (5)

276-277: No changes necessary.

These blank lines do not impact functionality or readability.


278-287: New BatchJob data class looks solid.

It correctly models the batch job identifier with clear documentation.


289-304: BatchResultItem is well-structured.

Optional fields for transcript, video, and error_code provide flexibility when dealing with partial results.


306-319: BatchStats is implemented cleanly.

The defaults of zero for total, succeeded, and failed are concise and logical.


321-385: BatchResults handles parsing and object construction elegantly.

Your approach to converting dictionary entries into BatchResultItem instances and gracefully handling parsing errors in completed_at exemplifies robust design.

README.md (4)

97-99: Documentation for batch operations added.

Great introduction of a new "Batch Operations" section to guide users on the newly added functionality.


100-110: Transcript batch job usage example is clear and user-friendly.

This snippet concisely demonstrates how to create and start a transcript batch job with optional parameters.


111-117: Video batch job usage example is straightforward.

Showing how to retrieve video metadata in a batch is helpful for users handling large playlists.


118-151: Polling mechanism for batch results is well-explained.

The example thoroughly covers status checks, error handling, and result parsing—making it easy for developers to follow.

tests/test_client.py (2)

21-26: Import statements for new batch-related classes look correct.

No issues with referencing BatchJob, BatchResultItem, BatchResults, and BatchStats.


558-644: Batch tests are comprehensive and robust.

They cover transcript/video batch creation, validation scenarios, and completed/active/failed job results. This suite provides strong coverage and confidence in the new functionality.

supadata/youtube.py (3)

32-34: Lazy instantiation of specialized instances is well-structured.

Deferring creation of _video_instance, _transcript_instance, and _batch_instance until needed can help maintain performance.


205-211: New batch property is a useful entry point.

Exposing _batch_instance through a property keeps the API consistent.


690-708: _Batch class is a solid addition.

get_batch_results correctly returns BatchResults, enabling a clean separation of concerns for batch status retrieval.

…transcript handling

Signed-off-by: David Anyatonwu <davidanyatonwu@gmail.com>
@rafalzawadzki
Copy link
Copy Markdown
Contributor

@onyedikachi-david sorry for taking a bit longer to review this, will try to get it done asap

Copy link
Copy Markdown
Contributor

@rafalzawadzki rafalzawadzki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onyedikachi-david thanks for the contribution!

It mostly looks good, but I have one ask about docstrings. Looks like we may have lost the IntelliSense documentation for the following methods:

supadata.youtube.transcript()
supadata.youtube.video()
supadata.youtube.channel()
supadata.youtube.playlist()

Screenshots to illustrate what I mean:
CleanShot 2025-04-12 at 10  48 19@2x
CleanShot 2025-04-12 at 10  48 29@2x

That's likely because of namespaces (each of the above has a one level deeper function). It's a similar problem we had with JSDocs in supadata/js.

Can you look into this? At least for the new methods you've added in this PR.

@onyedikachi-david
Copy link
Copy Markdown
Contributor Author

Thanks for the review. Looking into them shortly.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
supadata/youtube.py (3)

632-679: Well-implemented translate method.

The translate method in the _Transcript class properly handles translation requests and error cases, with good defaults for missing fields.

Note that this duplicates the top-level translate method logic, which was previously flagged in reviews. Consider having one implementation call the other to avoid future maintenance issues.


203-216: ⚠️ Potential issue

Name conflict with existing video method.

The property video conflicts with the existing method of the same name at line 127. This was previously flagged and marked as addressed in commit 07076bc, but still appears in the code.

- @property
- def video(self) -> "_Video":
+ @property
+ def video_ops(self) -> "_Video":
🧰 Tools
🪛 Ruff (0.8.2)

203-203: Redefinition of unused video from line 127

(F811)


219-232: ⚠️ Potential issue

Name conflict with existing transcript method.

The property transcript conflicts with the existing method of the same name at line 85. This was previously flagged and marked as addressed in commit 07076bc, but still appears in the code.

- @property
- def transcript(self) -> "_Transcript":
+ @property
+ def transcript_ops(self) -> "_Transcript":
🧰 Tools
🪛 Ruff (0.8.2)

219-219: Redefinition of unused transcript from line 85

(F811)

🧹 Nitpick comments (1)
supadata/youtube.py (1)

460-544: Comprehensive _Video class implementation.

The _Video class is well-implemented with:

  • Proper initialization
  • Duplicate functionality of the top-level video method in call
  • A new batch method for batch operations
  • Good docstrings and error handling

However, note that the call method duplicates logic from the top-level video method. While this works, it creates maintenance overhead if one changes and the other doesn't.

Consider having the top-level video method call this implementation to maintain a single source of truth:

def video(self, id: str) -> YoutubeVideo:
-    response: dict = self._request("GET", "/youtube/video", params={"id": id})
-    
-    # Safely convert upload_date with fallback
-    try:
-        uploaded_time = datetime.fromisoformat(response.pop("upload_date", datetime.now().isoformat()))
-    except (ValueError, TypeError):
-        uploaded_time = datetime.now()
-    
-    # Ensure all required fields have defaults
-    defaults = {
-        "id": id,
-        "title": "",
-        "description": "",
-        "duration": 0,
-        "channel": {"id": "", "name": ""},
-        "tags": [],
-        "thumbnail": "",
-        "view_count": 0,
-        "like_count": 0,
-        "transcript_languages": [],
-    }
-    
-    # Apply defaults for any missing fields
-    for key, default_value in defaults.items():
-        if key not in response:
-            response[key] = default_value
-        elif key == "channel" and not isinstance(response[key], dict):
-            response[key] = defaults[key]
-
-    return YoutubeVideo(**response, uploaded_date=uploaded_time)
+    # Delegate to the _Video class
+    if self._video_instance is None:
+        self._video_instance = self._Video(self)
+    return self._video_instance(id)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07076bc and 478c068.

📒 Files selected for processing (1)
  • supadata/youtube.py (9 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
supadata/youtube.py (3)
supadata/types.py (6)
  • BatchJob (279-286)
  • BatchResults (322-384)
  • YoutubeVideo (167-202)
  • Transcript (26-43)
  • TranscriptChunk (9-22)
  • TranslatedTranscript (47-60)
supadata/errors.py (1)
  • SupadataError (8-31)
supadata/client.py (1)
  • _request (85-124)
🪛 Ruff (0.8.2)
supadata/youtube.py

203-203: Redefinition of unused video from line 127

(F811)


219-219: Redefinition of unused transcript from line 85

(F811)

🔇 Additional comments (9)
supadata/youtube.py (9)

15-16: Good addition of batch processing types.

The BatchJob and BatchResults types have been correctly imported to support the new batch functionality.


32-34: LGTM! Good lazy initialization setup.

The instance variables for _video_instance, _transcript_instance, and _batch_instance are properly initialized as None, following the same pattern as the existing instance variables.


36-84: Well-structured validation method.

The _validate_batch_sources method provides robust validation for batch operations:

  • Ensures exactly one source is specified
  • Validates the limit parameter
  • Returns properly formatted payload
  • Includes clear error messages

Good job extracting this common logic into a shared method as recommended in the previous review.


101-104: Good method delegation pattern.

The transcript method now properly delegates to the _Transcript.call method, which enhances modularity.


122-125: Good method delegation pattern.

The translate method now properly delegates to the _Transcript.translate method, which enhances modularity.


234-248: Well-designed batch property.

The batch property is well-implemented with good documentation and examples.


344-375: Good batch functionality implementation in _Channel class.

The batch method is well-structured and utilizes the shared _validate_batch_sources method for input validation.


545-631: Comprehensive _Transcript class implementation with batch functionality.

The _Transcript class is well-implemented with:

  • Proper initialization
  • Methods for getting transcripts and translations
  • A batch method that properly utilizes the shared validation method
  • Good error handling

The implementation is consistent with the previous design patterns and follows good practices.


680-700: Well-designed _Batch class.

The _Batch class provides a clean interface for retrieving batch results with:

  • Good documentation
  • Simple and focused method design
  • Proper error handling

@onyedikachi-david
Copy link
Copy Markdown
Contributor Author

Done @rafalzawadzki

@rafalzawadzki
Copy link
Copy Markdown
Contributor

Done @rafalzawadzki

hmm I see lots of changes in the commit but when I try it in my VSCode it still doesn't work? Can you show me how it looks for you, maybe it's something with my IDE?

@onyedikachi-david
Copy link
Copy Markdown
Contributor Author

I am using Zed editor:

Screenshot 2025-04-14 at 00 55 43 Screenshot 2025-04-14 at 00 57 00 Screenshot 2025-04-14 at 00 57 29 Screenshot 2025-04-14 at 00 58 03

@rafalzawadzki
Copy link
Copy Markdown
Contributor

sorry for getting back so late again!

and thanks for sharing the screenshots from your IDE. They show the problem too, in fact.

For example, youtube.transcript() is not a namespace, but an actual function. It does not return a transcript handler, but a transcript.

If you're using an AI to write this code, then the LLM very likely gets this idea wrong.

Technically these are namespaces, but in practice we're using them sometimes as namespace, sometimes as function. And we always want a docstring as if we're dealing with a function, never a namespace.

Now in docstring:
youtube.channel() is a namespace (wrong)
youtube.channel.videos() is a function (correct)

What we want in docstring:
youtube.channel() is a function
youtube.channel.videos() is a function

The above should apply to all functions of the API.

Because this task seems to expand on functions that may have been incorrectly documented in the past, I'll add an extra $20 tip for completing this PR.

…for channel, playlist, video, and transcript operations

- Introduced private classes (_Channel, _Playlist, _Video, _Transcript) for better organization of YouTube operations.
- Updated methods to handle metadata retrieval and batch processing for channels, playlists, and videos.
- Enhanced error handling and default value assignments in API responses.
- Improved documentation for clarity on method usage and parameters.

Signed-off-by: David Anyatonwu <davidanyatonwu@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
supadata/youtube.py (3)

315-319: Fix non‑PEP‑8 indentation inside _Transcript.translate

Line 317 starts with 13 spaces, breaking the 4‑space convention and causing mismatched‑indent visuals in many editors.

-             processed_content = [
+            processed_content = [

(Only whitespace changed.)


511-561: Cache helper namespace objects to reduce repeated allocations

Each access to properties like .channel, .video, .transcript, … returns a new helper object.
For hot paths (e.g. polling in a tight loop) this Needlessly creates dozens of identical objects.

A lightweight memoisation keeps the ergonomic property API while avoiding churn:

 class YouTube:
@@
-    @property
-    def channel(self) -> _Channel:
+    @property
+    def channel(self) -> _Channel:
         ...
-        return _Channel(self) # Return a new instance each time
+        if not hasattr(self, "_chan"):
+            self._chan = _Channel(self)
+        return self._chan

Apply the same pattern to the other namespace properties.


456-460: Surface API errors early by validating expected keys in BatchJob payload

_create_batch_job blindly forwards the entire JSON into BatchJob(**response).
If the backend ever changes field names, you’ll get a cryptic TypeError: __init__() got an unexpected keyword argument ....

Quick guard:

 def _create_batch_job(...):
     response = self._request("POST", endpoint, json=payload)
-    return BatchJob(**response)
+    try:
+        return BatchJob(**response)
+    except TypeError as exc:
+        raise SupadataError(
+            error="invalid-response",
+            message="Unexpected batch job schema",
+            details=str(exc),
+        ) from exc
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e03cc8 and 49b3516.

📒 Files selected for processing (1)
  • supadata/youtube.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
supadata/youtube.py

387-387: Redefinition of unused YouTube from line 20

(F811)

Comment on lines +19 to 22
# Forward declare YouTube for type hints in private classes
class YouTube:
"""YouTube namespace for Supadata operations."""
pass

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove runtime re‑definition of YouTube to silence Ruff F811 and avoid subtle type/runtime bugs

The dummy forward declaration creates a second YouTube class at runtime that shadows the real one defined later.
Besides tripping Ruff (F811 redefinition), it can also break isinstance()/issubclass() checks performed before the real class is imported.

A clean, type‑checker‑only forward declaration is enough:

-from typing import Any, Callable, Dict, List, Optional, Literal
+from typing import Any, Callable, Dict, List, Optional, Literal, TYPE_CHECKING
...
-# Forward declare YouTube for type hints in private classes
-class YouTube:
-    pass
+# Forward declaration for static type checkers only
+if TYPE_CHECKING:  # pragma: no cover
+    class YouTube: ...
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Forward declare YouTube for type hints in private classes
class YouTube:
"""YouTube namespace for Supadata operations."""
pass
-from typing import Any, Callable, Dict, List, Optional, Literal
+from typing import Any, Callable, Dict, List, Optional, Literal, TYPE_CHECKING
-# Forward declare YouTube for type hints in private classes
-class YouTube:
- pass
+# Forward declaration for static type checkers only
+if TYPE_CHECKING: # pragma: no cover
+ class YouTube: ...

Comment on lines +443 to +451
if limit is not None:
self._validate_limit(limit)
# Only add limit if source is playlist or channel
if playlist_id or channel_id:
payload["limit"] = limit
elif video_ids and limit is not None:
# Optionally warn or ignore limit if only video_ids are provided?
# Current behavior: limit is ignored by API if only video_ids are sent.
pass
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Re‑evaluate silent limit drop when video_ids is supplied

_validate_batch_sources ignores limit whenever video_ids is the sole source (lines 447‑451).
That behaviour is opaque to callers and diverges from how playlist_id/channel_id are handled. Consider:

  1. Raising SupadataError if both video_ids and limit are provided.
  2. Or enforcing/truncating: payload["videoIds"] = video_ids[:limit].

Explicitly documenting (or logging) the choice prevents confusing client code.


🏁 Script executed:

#!/bin/bash
set -e
# Find the definition of `_validate_batch_sources` and inspect surrounding code for context
rg -n "def _validate_batch_sources" supadata/youtube.py
# Show lines around the limit handling (lines ~443–451) to get full context
sed -n '430,500p' supadata/youtube.py

Length of output: 3046


Explicitly handle limit when video_ids is provided

Currently, _validate_batch_sources silently ignores limit whenever only video_ids are supplied (supadata/youtube.py, lines 443–451). This diverges from how limit is applied for playlist_id/channel_id and can lead to surprising client behavior.

Please update this section to make the behavior explicit. For example:

• File: supadata/youtube.py, method _validate_batch_sources (around lines 443–451)
– Option A: Raise an error when limit is provided alongside video_ids
diff - elif video_ids and limit is not None: - pass + elif video_ids and limit is not None: + raise SupadataError( + error="invalid-argument", + message="`limit` cannot be used with `video_ids` directly. " + "Either paginate via playlist/channel or omit `limit`." + )
– Option B: Enforce the limit by slicing video_ids
diff - elif video_ids and limit is not None: - pass + elif video_ids and limit is not None: + payload["videoIds"] = video_ids[:limit]

• Update the public helper docstrings (_create_video_batch, _create_transcript_batch) to document the chosen behavior so callers know whether limit will be enforced or rejected.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if limit is not None:
self._validate_limit(limit)
# Only add limit if source is playlist or channel
if playlist_id or channel_id:
payload["limit"] = limit
elif video_ids and limit is not None:
# Optionally warn or ignore limit if only video_ids are provided?
# Current behavior: limit is ignored by API if only video_ids are sent.
pass
if limit is not None:
self._validate_limit(limit)
# Only add limit if source is playlist or channel
if playlist_id or channel_id:
payload["limit"] = limit
elif video_ids and limit is not None:
raise SupadataError(
error="invalid-argument",
message="`limit` cannot be used with `video_ids` directly. "
"Either paginate via playlist/channel or omit `limit`."
)

@onyedikachi-david
Copy link
Copy Markdown
Contributor Author

Thanks for the review, I have taken a different approach, please check, it should now work. Thanks for the tip. @rafalzawadzki

@onyedikachi-david
Copy link
Copy Markdown
Contributor Author

If we want to have this as you said:

What we want in docstring:
youtube.channel() is a function
youtube.channel.videos() is a function

It will be in this form rather:
youtube.channel().videos() # This will break the api spec.

@rafalzawadzki
Copy link
Copy Markdown
Contributor

rafalzawadzki commented Apr 20, 2025

If we want to have this as you said:

What we want in docstring:
youtube.channel() is a function
youtube.channel.videos() is a function

It will be in this form rather: youtube.channel().videos() # This will break the api spec.

I know! But we just want the docstring to document it like it was a function, but it shouldn't be one technically.

I just tried your latest attempt and it seems to be working! That's what I wanted to accomplish.

However, I noticed that while the docstrings are in correct formats, all of them give wrong examples, like this one:
CleanShot 2025-04-20 at 9  38 07@2x

Notice that the function has named arguments, while the docstring example shows arguments without names? Is that a Python thing? Sorry I'm not very familiar with the language so maybe that's expected?

@onyedikachi-david
Copy link
Copy Markdown
Contributor Author

Notice that the function has named arguments, while the docstring example shows arguments without names? Is that a Python thing? Sorry I'm not very familiar with the language so maybe that's expected?

No it's not a python thing, it's an oversight, my bad. I'm fixing asap

@onyedikachi-david
Copy link
Copy Markdown
Contributor Author

Done @rafalzawadzki

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
supadata/youtube.py (2)

19-21: ⚠️ Potential issue

Fix YouTube class forward declaration to prevent redefinition error.

The current forward declaration creates a second YouTube class at runtime that is later redefined, triggering the Ruff F811 warning. This can potentially break isinstance()/issubclass() checks.

Use a proper type-checking-only forward declaration:

-from typing import Any, Callable, Dict, List, Optional, Literal
+from typing import Any, Callable, Dict, List, Optional, Literal, TYPE_CHECKING

-# Forward declare YouTube for type hints in private classes
-class YouTube:
-    pass
+# Forward declaration for static type checkers only
+if TYPE_CHECKING:
+    class YouTube: ...

443-451: 🛠️ Refactor suggestion

Explicitly handle limit parameter when video_ids is provided.

Currently, when only video_ids is provided with a limit, the limit is silently ignored. This diverges from how limit is handled with playlist_id or channel_id and could lead to unexpected behavior.

Consider either enforcing the limit or raising an error for clarity:

 if playlist_id or channel_id:
     payload["limit"] = limit
 elif video_ids and limit is not None:
-    # Optionally warn or ignore limit if only video_ids are provided?
-    # Current behavior: limit is ignored by API if only video_ids are sent.
-    pass
+    # Option 1: Enforce the limit by slicing video_ids
+    payload["videoIds"] = video_ids[:limit]
+    
+    # Option 2: Alternatively, raise an error for clarity
+    # raise SupadataError(
+    #    error="invalid-request", 
+    #    message="Limit cannot be used with video_ids directly.",
+    #    details="Either paginate via playlist/channel or omit the limit parameter."
+    # )
🧹 Nitpick comments (1)
supadata/youtube.py (1)

510-596: Fix docstring examples to match actual function signatures.

The docstring examples show function calls with named arguments (e.g., supadata.youtube.channel(id="UC_...")) while the actual functions use positional parameters (e.g., def __call__(self, id: str)).

Update the examples to match the actual function signatures:

- channel_info = supadata.youtube.channel(id="UC_...")
+ channel_info = supadata.youtube.channel("UC_...")

- videos = supadata.youtube.channel.videos(id="UC_...")
+ videos = supadata.youtube.channel.videos("UC_...")

- batch_job = supadata.youtube.channel.batch(channel_id="UC_...")
+ batch_job = supadata.youtube.channel.batch(channel_id="UC_...")

- playlist_info = supadata.youtube.playlist(id="PL_...")
+ playlist_info = supadata.youtube.playlist("PL_...")

- videos = supadata.youtube.playlist.videos(id="PL_...")
+ videos = supadata.youtube.playlist.videos("PL_...")

- video_info = supadata.youtube.video(id="dQw4w9WgXcQ")
+ video_info = supadata.youtube.video("dQw4w9WgXcQ")

- transcript = supadata.youtube.transcript(video_id="dQw4w9WgXcQ")
+ transcript = supadata.youtube.transcript("dQw4w9WgXcQ")

Note that batch methods correctly use named parameters in the examples.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49b3516 and e6ce2c5.

📒 Files selected for processing (1)
  • supadata/youtube.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
supadata/youtube.py (3)
supadata/types.py (9)
  • BatchJob (279-286)
  • BatchResults (322-384)
  • YoutubeChannel (206-227)
  • VideoIds (260-275)
  • YoutubePlaylist (231-256)
  • YoutubeVideo (167-202)
  • Transcript (26-43)
  • TranscriptChunk (9-22)
  • TranslatedTranscript (47-60)
supadata/client.py (1)
  • _request (85-124)
supadata/errors.py (1)
  • SupadataError (8-31)
🪛 Ruff (0.8.2)
supadata/youtube.py

387-387: Redefinition of unused YouTube from line 20

(F811)

🔇 Additional comments (6)
supadata/youtube.py (6)

27-119: LGTM! Well-structured _Channel class with clean API.

The implementation correctly handles channel metadata retrieval, video listing, and batch job creation. The methods properly validate inputs and provide sensible defaults for missing fields in API responses.


121-183: LGTM! Well-implemented _Playlist class with proper error handling.

The _Playlist class properly handles playlist metadata retrieval, video listing, and includes robust date parsing with fallbacks. Default values are correctly applied for missing fields.


185-250: LGTM! Well-structured _Video class with robust metadata handling.

The implementation correctly handles video metadata retrieval and batch job creation. The date parsing includes appropriate error handling and the default values ensure a consistent interface even with incomplete API responses.


252-359: LGTM! Comprehensive _Transcript class implementation.

The _Transcript class properly handles transcript retrieval, translation, and batch operations. The implementation includes appropriate handling of response formats and ensures defaults for optional fields.


360-380: LGTM! Simple and focused _Batch class implementation.

The _Batch class correctly handles batch result retrieval and properly delegates parsing to the BatchResults constructor.


387-596: Great overall architecture with modular facade pattern.

The code effectively uses a facade pattern with private namespace classes to organize YouTube API operations. The public API is clean with properties returning fresh instances of private classes, maintaining statelessness while providing an intuitive interface.

🧰 Tools
🪛 Ruff (0.8.2)

387-387: Redefinition of unused YouTube from line 20

(F811)

@rafalzawadzki
Copy link
Copy Markdown
Contributor

great! looks good 🚀

@rafalzawadzki rafalzawadzki merged commit 434d557 into supadata-ai:main Apr 24, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for batch operations

2 participants