
feat(youtube): add support for new endpoints #11

Merged
rafalzawadzki merged 2 commits into supadata-ai:main from LuizDMM:LuizDMM/issue9
Mar 19, 2025

Conversation

@LuizDMM
Contributor

@LuizDMM LuizDMM commented Mar 17, 2025

Fixes #9
/claim #9

Requirements

  • Support for the above endpoints is added according to the documentation and aligned with the patterns in this library
  • All parameters and returns are typed according to spec
  • README is updated
  • Test cases are added in tests

Implemented support for all required endpoints in a Pythonic, typed way. The tests use content from real requests I made, and all of them pass.

========================================================================================= test session starts ==========================================================================================
platform linux -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/luizdmm/Projetos/Algora Bounties/supadata-ai-py
configfile: pyproject.toml
plugins: requests-mock-1.12.1
collected 21 items                                                                                                                                                                                     

tests/test_client.py .....................                                                                                                                                                       [100%]

========================================================================================== 21 passed in 0.08s ==========================================================================================

Summary by CodeRabbit

  • New Features

    • Enhanced YouTube integration to let users retrieve comprehensive metadata for videos, channels, and playlists.
    • Introduced functionality for fetching lists of video identifiers from channels and playlists, providing a richer view of YouTube content.
    • Improved output messages to display retrieved information clearly, offering an optimized experience when accessing YouTube data.
  • Bug Fixes

    • Updated error handling for API requests to ensure accurate status codes and messages.
  • Tests

    • Expanded test coverage for YouTube functionalities, including new tests for video, channel, and playlist retrieval.

@algora-pbc algora-pbc bot mentioned this pull request Mar 17, 2025
4 tasks
@algora-pbc

algora-pbc bot commented Mar 17, 2025

💵 To receive payouts, sign up on Algora, link your Github account and connect with Stripe.

@coderabbitai

coderabbitai bot commented Mar 17, 2025

Walkthrough

The changes add support for new YouTube endpoints in the Supadata SDK. The implementation includes methods to retrieve metadata for videos, channels, and playlists, as well as lists of video IDs from channels and playlists. The SDK’s public API is updated via new and reintroduced data types and classes. Method signatures now include type hints, and the limit parameter of the video-list methods is validated. Updates in the README and test suite ensure documentation and coverage for the new endpoints.

Changes

File(s) | Change Summary
--- | ---
README.md | Added descriptions and examples for the new YouTube endpoints (video metadata, channel metadata, channel video IDs, playlist metadata, and playlist video IDs retrieval).
supadata/__init__.py, supadata/types.py | Reintroduced and added data types for YouTube integration. New classes (YoutubeChannelBaseDict, YoutubeVideo, YoutubeChannel, YoutubePlaylist) were added, and existing classes (Scrape, CrawlPage) gained additional attributes.
supadata/youtube.py | Extended the YouTube class with new methods (video, channel, channel_videos, playlist, playlist_videos) and a _validate_limit helper for input validation. The constructor now carries a type annotation for the request handler.
tests/test_client.py | Reorganized imports and updated mock responses. New test cases cover the YouTube endpoints (video, channel, and playlist functionality) as well as the modified error handling.
supadata/client.py | Reordered the typing imports and changed _handle_gateway_error to report 404 responses as "not-found" with a clearer message.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant YT as YouTube
    participant RH as RequestHandler

    C->>YT: Call video("video_id")
    YT->>RH: Send API request for video metadata
    RH-->>YT: Return video metadata dict
    YT-->>C: Return YoutubeVideo object
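The call flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the SDK's code: the handler signature `(method, endpoint, params)`, the `id` query parameter, and the names `YouTubeSketch` and `VideoSketch` are assumptions made for the example, and the dataclass keeps only three of YoutubeVideo's fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict

# Hypothetical handler signature: (method, endpoint, params) -> parsed JSON dict.
RequestHandler = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class VideoSketch:
    """Trimmed-down stand-in for YoutubeVideo (three fields only)."""

    id: str
    title: str
    uploaded_date: datetime


class YouTubeSketch:
    def __init__(self, request_handler: RequestHandler) -> None:
        self._request = request_handler

    def video(self, video_id: str) -> VideoSketch:
        # YT ->> RH: send the API request for video metadata.
        response = self._request("GET", "/youtube/video", {"id": video_id})
        # RH -->> YT returns a plain dict; YT -->> C returns a typed object.
        return VideoSketch(
            id=response["id"],
            title=response["title"],
            uploaded_date=datetime.fromisoformat(response["upload_date"]),
        )


def fake_handler(method: str, endpoint: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Stands in for the real RequestHandler so the flow can run offline."""
    assert (method, endpoint) == ("GET", "/youtube/video")
    return {
        "id": params["id"],
        "title": "God-Tier Developer Roadmap",
        "upload_date": "2022-08-24T00:00:00+00:00",
    }


yt = YouTubeSketch(fake_handler)
video = yt.video("pEfrdAtAmqk")
```

Injecting the handler is what keeps the YouTube class testable: tests can swap in a fake without touching HTTP.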

Assessment against linked issues

Objective Addressed Explanation
Add support for YouTube endpoints (/youtube/video, /youtube/channel, /youtube/playlist, /youtube/channel/videos, /youtube/playlist/videos) (#9)
All parameters and returns are typed according to spec (#9)
README is updated (#9)
Test cases are added in tests (#9)

Poem

I'm a little rabbit, hopping through the code,
Discovering new endpoints in a bright, fresh mode.
YouTube channels, playlists, and videos all align,
With typed hints and tests making the code shine.
In this burrow of updates, every line is a delight,
Hopping along happily from morning till night!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 616f41d and 28992d5.

📒 Files selected for processing (2)
  • supadata/client.py (2 hunks)
  • tests/test_client.py (12 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
tests/test_client.py (3)
supadata/types.py (6)
  • Transcript (26-37)
  • TranscriptChunk (9-22)
  • TranslatedTranscript (41-50)
  • YoutubeChannel (174-195)
  • YoutubePlaylist (199-218)
  • YoutubeVideo (145-170)
supadata/youtube.py (7)
  • transcript (30-65)
  • translate (67-101)
  • video (103-118)
  • channel (121-125)
  • playlist (128-132)
  • videos (166-189)
  • videos (214-236)
supadata/errors.py (1)
  • SupadataError (8-31)
supadata/client.py (2)
supadata/errors.py (1)
  • SupadataError (8-31)
supadata/youtube.py (1)
  • YouTube (17-236)
🔇 Additional comments (27)
supadata/client.py (2)

3-3: Insignificant import reordering.

The order of imports from the typing module has been changed from Dict, Any to Any, Dict, which is a non-functional change that doesn't affect the code behavior.


61-62: Fixed error type for 404 responses.

This change properly updates the error type from "invalid-request" to "not-found" and improves the error message to clarify that the 404 status can indicate either a missing endpoint or a resource that couldn't be found. This matches the bug fix requested in the PR feedback to correctly handle 404 error responses.
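A minimal sketch of the 404 branch described above, assuming an error object that carries the `error`, `message`, and `details` fields the tests assert on. `SupadataErrorSketch` stands in for `supadata.errors.SupadataError`, and the message and details wording is hypothetical because the review does not quote the real text.

```python
from typing import Optional


class SupadataErrorSketch(Exception):
    """Stand-in for supadata.errors.SupadataError with the fields the tests check."""

    def __init__(self, error: str, message: str, details: str = "") -> None:
        super().__init__(message)
        self.error = error
        self.message = message
        self.details = details


def handle_gateway_error(status_code: int, details: Optional[str] = None) -> None:
    """Raise a typed error for a gateway status code (only 404 is sketched)."""
    if status_code == 404:
        raise SupadataErrorSketch(
            error="not-found",  # previously reported as "invalid-request"
            # Hypothetical wording: a 404 may mean a missing endpoint or a missing resource.
            message="Endpoint or resource not found",
            details=details
            or "The endpoint does not exist or the requested resource could not be found.",
        )
    # Other status codes (for example 429) are handled elsewhere in the real client.
```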

tests/test_client.py (25)

3-18: Added YouTube classes and datetime imports.

These additions support the new YouTube-related tests. The imports are appropriately ordered and include all necessary classes for testing the new YouTube endpoints.


53-56: Simplified mock response format.

The mock response has been simplified to a more concise format while maintaining all necessary test data.


73-75: Formatting improvements and request mock simplification.

Similar to other changes, the mock responses and request mocks have been simplified for improved readability.


86-91: Simplified mock response and updated test method call.

The test now properly tests the translate functionality with the updated parameter names that match the implementation in the YouTube class.


107-109: Simplified mock responses.

Similar formatting improvements for consistency.


122-123: Simplified mock response and format.

Consistent simplification of mock responses for improved readability.


137-143: Updated error response with consistent formatting.

The error test has been updated to use more consistent formatting and to include proper content type headers.


159-160: Simplified mock response format.

Format consistency improvement.


179-183: Formatting improvements for mock responses.

Consistent formatting changes for better readability.


193-196: Simplified mock response format.

This follows the pattern of formatting improvements throughout the file.


201-202: Simplified mock response array.

Formatting improvement for the array of mock responses.


214-215: Simplified mock response format.

Consistent formatting changes for better readability.


222-235: Updated gateway error test to match real API calls.

The test has been updated to use a proper endpoint path and assertions match the expected error handling in the client.


239-252: Updated 404 error test to match new error handling.

The test assertions now correctly check for "not-found" error type and the updated error message, matching the changes made in the client.py file.


256-269: Updated 429 error test with proper endpoint and assertions.

Similar to the other gateway error tests, this has been updated to use a proper endpoint and match the expected error handling.


272-303: Added test for YouTube video endpoint.

This new test properly verifies the YouTube video endpoint functionality by:

  1. Properly mocking the API response
  2. Verifying the returned object is a YoutubeVideo instance
  3. Asserting all expected properties are correctly parsed
  4. Testing datetime conversion from ISO format

The test is thorough and well-structured.
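The four steps above can be reproduced without the network. The PR's tests use the requests-mock pytest plugin; this sketch swaps in the standard library's `unittest.mock.Mock` as the request handler instead, and `fetch_upload_date` is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime, timezone
from unittest.mock import Mock


def fetch_upload_date(request_handler, video_id: str) -> datetime:
    """Hypothetical helper: request video metadata and parse its upload date."""
    response = request_handler("GET", "/youtube/video", {"id": video_id})
    return datetime.fromisoformat(response["upload_date"])


def test_fetch_upload_date() -> None:
    # 1. Mock the API response instead of making a real request.
    handler = Mock(
        return_value={"id": "pEfrdAtAmqk", "upload_date": "2022-08-24T00:00:00+00:00"}
    )

    uploaded = fetch_upload_date(handler, "pEfrdAtAmqk")

    # 2. Check the type of the returned object.
    assert isinstance(uploaded, datetime)
    # 3. Check that the expected request was made exactly once.
    handler.assert_called_once_with("GET", "/youtube/video", {"id": "pEfrdAtAmqk"})
    # 4. Check the ISO string became a timezone-aware datetime.
    assert uploaded == datetime(2022, 8, 24, tzinfo=timezone.utc)
```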


305-324: Added test for invalid YouTube video ID.

This test properly covers the error case for an invalid video ID, ensuring the error handling works as expected with the updated "not-found" error type.


326-353: Added test for YouTube channel endpoint.

Similar to the video test, this test properly verifies the YouTube channel endpoint functionality by confirming the correct object type and properties.


355-374: Added test for invalid YouTube channel ID.

This test covers the error case for an invalid channel ID with proper assertions matching the updated error handling.


376-399: Added test for YouTube playlist endpoint.

This test properly verifies the YouTube playlist endpoint functionality including datetime conversion from ISO format.


401-422: Added test for invalid YouTube playlist ID.

This test covers the error case for an invalid playlist ID, with assertions that match the updated error handling.


424-441: Added test for channel videos endpoint.

This test verifies the new dot notation access for channel.videos functionality as requested in the PR feedback. The test properly verifies the returned list of video IDs.
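One way to support both `youtube.channel(id)` and `youtube.channel.videos(id)` is to make `channel` a callable object that also exposes a `videos` method. This is a sketch of that pattern only; the PR's actual classes may be structured differently, and the handler signature and the `video_ids` response key are assumptions.

```python
from typing import Any, Callable, Dict, List

RequestHandler = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


class _ChannelEndpoint:
    """channel(id) returns metadata; channel.videos(id) returns video IDs."""

    def __init__(self, request_handler: RequestHandler) -> None:
        self._request = request_handler

    def __call__(self, channel_id: str) -> Dict[str, Any]:
        return self._request("GET", "/youtube/channel", {"id": channel_id})

    def videos(self, channel_id: str) -> List[str]:
        response = self._request("GET", "/youtube/channel/videos", {"id": channel_id})
        return list(response["video_ids"])  # hypothetical response key


class YouTubeSketch:
    def __init__(self, request_handler: RequestHandler) -> None:
        # Attribute access gives the dot notation; __call__ keeps channel(id) working.
        self.channel = _ChannelEndpoint(request_handler)


def fake_handler(method: str, endpoint: str, params: Dict[str, Any]) -> Dict[str, Any]:
    if endpoint == "/youtube/channel/videos":
        return {"video_ids": ["abc123", "def456"]}
    return {"id": params["id"], "name": "Fireship"}


yt = YouTubeSketch(fake_handler)
```

The same shape would apply to `playlist` and `playlist.videos`.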


443-463: Added test for invalid channel ID in videos endpoint.

This test covers the error case for an invalid channel ID when fetching videos, with assertions matching the updated error handling.


465-483: Added test for playlist videos endpoint.

This test verifies the new dot notation access for playlist.videos functionality as requested in the PR feedback. The test properly verifies the returned list of video IDs.


485-505: Added test for invalid playlist ID in videos endpoint.

This test covers the error case for an invalid playlist ID when fetching videos, with assertions matching the updated error handling.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🔭 Outside diff range comments (1)
supadata/youtube.py (1)

214-223: ⚠️ Potential issue

Correct _validate_limit check.

Currently, the condition isinstance(limit, int) or ... is true for every integer, so valid limits raise SupadataError. Also fix the typo "amout" → "amount".

     if limit is None:
         return
-    elif isinstance(limit, int) or limit <= 0 or limit > 5000:
+    elif not isinstance(limit, int) or limit <= 0 or limit > 5000:
         raise SupadataError(
             error="invalid-request",
             message="Invalid limit provided",
-            details="You provided a limit in an invalid format or amout.",
+            details="You provided a limit in an invalid format or amount.",
         )
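A self-contained version of the corrected check, as a sketch. `SupadataErrorSketch` stands in for the SDK's error class. Because `or` short-circuits, placing `not isinstance(limit, int)` first means a string like "10" is rejected before the numeric comparison could raise a TypeError. Rejecting `bool` is an extra that goes beyond the suggested fix: `isinstance(True, int)` is true in Python, so without that check `True` would pass as a limit of 1.

```python
from typing import Optional


class SupadataErrorSketch(Exception):
    """Stand-in for supadata.errors.SupadataError."""

    def __init__(self, error: str, message: str, details: str) -> None:
        super().__init__(message)
        self.error = error
        self.message = message
        self.details = details


MAX_LIMIT = 5000  # upper bound from the suggested fix


def validate_limit(limit: Optional[int]) -> None:
    """Accept None or an int in 1..5000; raise otherwise."""
    if limit is None:
        return
    # Type checks come first so the comparisons only ever see integers.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0 or limit > MAX_LIMIT:
        raise SupadataErrorSketch(
            error="invalid-request",
            message="Invalid limit provided",
            details="You provided a limit in an invalid format or amount.",
        )
```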
🧹 Nitpick comments (5)
supadata/types.py (2)

183-183: Fix typo in docstring

There's a typo in the attribute name documentation.

-        subcriber_count: Number of subscribers
+        subscriber_count: Number of subscribers

202-202: Fix typos in docstring

There are a couple of typos in the class and attribute documentation.

-    """Youtube Plalist Details
+    """Youtube Playlist Details

-        description: Plalist Description
+        description: Playlist Description

Also applies to: 206-206

supadata/youtube.py (3)

114-126: Remove or convert the unassigned dictionary literal.

A literal dictionary expression is flagged by static analysis as a no-op. Either remove it or embed it in a docstring/example if intended for illustration.

-        {
-            "id": "pEfrdAtAmqk",
-            "duration": 1002,
-            "description": "The programming iceberg is complete roadmap to the loved, ...",
-            "title": "God-Tier Developer Roadmap",
-            "channel": {"id": "UCsBjURrPoezykLs9EqgamOA", "name": "Fireship"},
-            "tags": ["#iceberg", "#learntocode", "#programming"],
-            "thumbnail": "https://i.ytimg.com/vi/pEfrdAtAmqk/maxresdefault.jpg",
-            "upload_date": "2022-08-24T00:00:00.000Z",
-            "view_count": 7388353,
-            "like_count": 262086,
-            "transcript_languages": ["en"],
-        }
🧰 Tools
🪛 Ruff (0.8.2)

114-126: Found useless expression. Either assign it to a variable or remove it.

(B018)


132-142: Typographical fix in docstring.

Minor spelling fix needed: "cointaining" → "containing."

-            YoutubeChannel cointaining the metadata
+            YoutubeChannel containing the metadata

147-172: Refine docstring & remove debugging print.

  1. The docstring’s wording is slightly redundant.
  2. Consider removing or converting the print to a logger method in production.
-    """Get a list of the IDs of the list of video IDs from a YouTube channel."""
+    """Get a list of video IDs from a YouTube channel."""

-        print("Response:", response)
+        # print("Response:", response)  # or use logging.debug(...)
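If the response should stay observable while debugging, the standard logging module can replace the print. In this sketch the function name, the handler signature, and the `video_ids` key are hypothetical. Passing `response` as an argument rather than pre-formatting the string means the message is only built when DEBUG is enabled for the logger.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def channel_video_ids(
    request_handler: Callable[[str, str, Dict[str, Any]], Dict[str, Any]],
    channel_id: str,
) -> List[str]:
    """Hypothetical stand-in for channel_videos with logging instead of print."""
    response = request_handler("GET", "/youtube/channel/videos", {"id": channel_id})
    # Silent unless the application enables DEBUG for this logger.
    logger.debug("Response: %s", response)
    return list(response["video_ids"])  # hypothetical response key
```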
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d94a68 and 7f1c1c2.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • supadata/__init__.py (2 hunks)
  • supadata/types.py (7 hunks)
  • supadata/youtube.py (5 hunks)
  • tests/test_client.py (12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
supadata/__init__.py

20-20: supadata.types.YoutubeChannel imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


21-21: supadata.types.YoutubePlaylist imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


22-22: supadata.types.YoutubeVideo imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

supadata/youtube.py

114-126: Found useless expression. Either assign it to a variable or remove it.

(B018)

🔇 Additional comments (9)
README.md (1)

68-87: Good job on the documentation updates!

The new examples for YouTube functionality are clear, concise, and follow the existing documentation style. They effectively demonstrate all the new endpoints added to the SDK.

supadata/types.py (1)

131-219: Well-structured and thoroughly documented data classes

The new data classes for YouTube entities are well-designed with appropriate types and comprehensive documentation. The optional description field in YoutubePlaylist is a good approach for handling potentially missing data.

tests/test_client.py (1)

272-441: Great test coverage for new YouTube endpoints

The tests for the new YouTube functionality are well-structured and thorough. They test both successful API calls and error handling scenarios for each endpoint. The mock responses are realistic and the assertions validate all important aspects of the returned objects.

supadata/youtube.py (6)

3-14: Imports look good.

No issues identified. The necessary classes and types are properly imported.


20-20: Constructor type annotations.

Great job specifying the callable signature for the request handler, ensuring clarity about expected arguments and return types.


28-30: Transcript method updates.

All parameter types, request parameters, and chunk handling look correct and consistent with the rest of the code.

Also applies to: 44-45, 58-60


66-67: Translate method updates.

The type hints and request structure are properly aligned, and chunk transformations are handled similarly to the transcript method. No issues found.

Also applies to: 81-85, 94-96, 99-100


101-112: New video method signature and docstring.

Everything appears well structured: the parameter is typed, the docstring is clear, and failed requests raise SupadataError.


127-129: Consider handling invalid date strings.

Wrap datetime.fromisoformat(...) in a try-except block or verification logic to prevent unhandled exceptions if upload_date is malformed.
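A defensive parser along those lines might look like this sketch. Returning None is an assumed trade-off: it conflicts with a field typed as plain `datetime`, so the SDK might prefer raising SupadataError instead. One concrete risk is also worth noting: `datetime.fromisoformat` accepts a trailing "Z" (as in "2022-08-24T00:00:00.000Z") only from Python 3.11 on. The PR's test run used 3.12, so older interpreters would fail where the tests pass.

```python
from datetime import datetime
from typing import Any, Mapping, Optional


def parse_iso_datetime(value: Any) -> Optional[datetime]:
    """Parse an ISO 8601 string; return None if it is missing or malformed."""
    if not isinstance(value, str):
        return None
    # Normalize "Z" so Python < 3.11 can parse it too.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def upload_date_from(response: Mapping[str, Any]) -> Optional[datetime]:
    # .get() avoids a KeyError when "upload_date" is absent from the payload.
    return parse_iso_datetime(response.get("upload_date"))
```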


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
supadata/types.py (1)

144-172: Inconsistency between docstring and implementation in YoutubeVideo class.

The docstring mentions a comment_count attribute (line 157) that isn't implemented in the class. Additionally, the attributes uploaded_date and transcript_languages exist in the implementation but aren't documented in the docstring.

    """YouTube video details.

    Attributes:
        id: YouTube video ID
        title: Video title
        description: Video description
        duration: Duration of video in seconds
        channel: A dict containing the channel ID and channel name
        thumbnail: The URL of the video thumbnail
        view_count: Number of views
        like_count: Number of likes
-       comment_count: Number of comments
        tags: List of video tags
+       uploaded_date: The date when the video was uploaded
+       transcript_languages: List of available transcript languages
    """
supadata/youtube.py (1)

176-199: Consider clarifying the docstring for playlist_videos method.

The method's docstring description "Get a list of the IDs of the list of video IDs" is a bit redundant and could be clearer.

-    """Get a list of the IDs of the list of video IDs from a YouTube playlist.
+    """Get a list of video IDs from a YouTube playlist.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42f775e and 23e2a76.

📒 Files selected for processing (2)
  • supadata/types.py (7 hunks)
  • supadata/youtube.py (5 hunks)
🔇 Additional comments (14)
supadata/types.py (6)

4-5: Import additions look good.

The added imports for datetime and additional typing options support the new YouTube-related classes and improve type hinting.


53-73: The updated Scrape class attributes are well documented.

The addition of description, og_url, and count_characters attributes enhances the data model with proper type annotations and clear documentation.


88-101: Good addition of description attribute to CrawlPage class.

The addition of the description attribute to the CrawlPage class is consistent with the Scrape class update and properly documented.


130-142: Well-structured TypedDict for YouTube channel base information.

Using TypedDict for the channel information that will be nested in other classes is a good design pattern.
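A sketch of the nesting pattern. The field names come from the docstring suggestion and the example response quoted earlier in this review; the exact definitions in supadata/types.py may differ, and `YoutubeVideoSketch` is an illustrative name. At runtime a TypedDict is a plain dict, so the nested channel needs no conversion, while type checkers still enforce its `id` and `name` keys.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, TypedDict


class YoutubeChannelBaseDict(TypedDict):
    """Channel reference nested in other objects, e.g. {"id": ..., "name": ...}."""

    id: str
    name: str


@dataclass
class YoutubeVideoSketch:
    id: str
    title: str
    description: str
    duration: int  # seconds
    channel: YoutubeChannelBaseDict
    tags: List[str]
    thumbnail: str
    uploaded_date: datetime
    view_count: int
    like_count: int
    transcript_languages: List[str]


video = YoutubeVideoSketch(
    id="pEfrdAtAmqk",
    title="God-Tier Developer Roadmap",
    description="The programming iceberg is complete roadmap to the loved, ...",
    duration=1002,
    channel={"id": "UCsBjURrPoezykLs9EqgamOA", "name": "Fireship"},
    tags=["#iceberg", "#learntocode", "#programming"],
    thumbnail="https://i.ytimg.com/vi/pEfrdAtAmqk/maxresdefault.jpg",
    uploaded_date=datetime(2022, 8, 24),
    view_count=7388353,
    like_count=262086,
    transcript_languages=["en"],
)
```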


174-197: Well-structured YoutubeChannel class.

The YoutubeChannel class has appropriate attributes with clear type annotations and comprehensive documentation.


199-219: YoutubePlaylist class looks good.

The implementation includes all necessary attributes with appropriate types. The optional description field with a default value of None is a good design choice.

supadata/youtube.py (8)

3-14: Imports properly include the new types and required modules.

The updated imports properly include the required modules and the new YouTube-related types defined in the types.py file.


20-20: Improved type annotation for request_handler.

The explicit typing for the request_handler parameter enhances code clarity and API documentation.


28-30: Improved formatting and type hints for existing methods.

The reformatting of the transcript and translate methods improves readability while maintaining functionality. The explicit type hints provide better documentation.

Also applies to: 66-67, 81-85


101-117: Video method implementation looks good.

The method properly handles the API interaction and datetime conversion for the uploaded_date field. The method signature includes appropriate type annotations.


118-133: Channel method implementation is well structured.

The implementation is straightforward and appropriate, with clear parameter and return type annotations.


134-158: channel_videos method validates the limit correctly.

The method validates the limit parameter and constructs the query parameters appropriately. The return type annotation is accurate.
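A sketch of how such query parameters could be assembled. Sending the parameter under the name `limit` is an assumption (the review only says the limit is validated), and `build_video_list_params` is a hypothetical helper. Leaving `limit` out when it is None lets the API apply its own default.

```python
from typing import Any, Dict, Optional


def build_video_list_params(resource_id: str, limit: Optional[int] = None) -> Dict[str, Any]:
    """Query params for /youtube/channel/videos or /youtube/playlist/videos (sketch)."""
    params: Dict[str, Any] = {"id": resource_id}
    # Only send limit when the caller set it.
    if limit is not None:
        params["limit"] = limit
    return params
```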


159-175: Playlist method properly handles date conversion.

The method correctly extracts and converts the ISO format date string to a datetime object before creating the YoutubePlaylist instance.


200-208: Robust limit validation method with clear error messages.

The _validate_limit method provides comprehensive validation with appropriate error messages for invalid inputs.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/test_client.py (1)

485-505: ⚠️ Potential issue

Test for invalid playlist ID when retrieving videos

There's an inconsistency in this test: the mocked URL and the method being called don't match the endpoint under test.

The mock points to /youtube/playlist and the test calls client.youtube.playlist(), although this test is meant to cover playlist_videos. The mock should use /youtube/playlist/videos, and the test should call playlist_videos:

requests_mock.get(
-    f"{client.base_url}/youtube/playlist?id={playlist_id}",
+    f"{client.base_url}/youtube/playlist/videos?id={playlist_id}",
    status_code=404,
    json=mock_response,
)

with pytest.raises(SupadataError) as exc_info:
-    client.youtube.playlist(playlist_id)
+    client.youtube.playlist_videos(playlist_id)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23e2a76 and 277431a.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • supadata/__init__.py (2 hunks)
  • supadata/types.py (7 hunks)
  • supadata/youtube.py (5 hunks)
  • tests/test_client.py (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • supadata/__init__.py
🔇 Additional comments (39)
README.md (1)

68-87: Added documentation is clear and concise

The new examples clearly demonstrate how to use the newly added YouTube endpoint functions: video metadata, channel metadata, channel videos, playlist metadata, and playlist videos. The examples follow the existing pattern and are easy to understand.

supadata/types.py (10)

4-5: Good import organization

Correctly importing datetime and expanding the typing imports to support the new classes.


7-14: Comprehensive import list

All necessary types are imported for the new YouTube functionality while maintaining the existing imports.


56-64: Updated Scrape class docstring is complete

The docstring for the Scrape class has been properly updated to include the new fields: description, og_url, and count_characters.


70-72: New fields added to Scrape class

The new fields match what's described in the docstring and follow the existing pattern.


89-96: Updated CrawlPage class docstring

The CrawlPage docstring has been updated to include the new description attribute.


101-101: Added description field to CrawlPage

The description field has been added to the CrawlPage class as documented.


130-142: Well-structured YoutubeChannelBaseDict

The YoutubeChannelBaseDict TypedDict is well-structured and documented. It will be reused in other YouTube-related classes, which is good for consistency.


144-170: Complete YoutubeVideo class implementation

The YoutubeVideo class is well-documented with all the necessary fields for video metadata. The use of proper types for each field is excellent.


173-196: Comprehensive YoutubeChannel class

The YoutubeChannel class is well-structured with all the necessary fields for channel data, and includes proper typing.


198-218: Well-implemented YoutubePlaylist class

The YoutubePlaylist class follows the same pattern as the other YouTube classes, with proper typing and documentation. Good use of Optional for the description field that might not always be available.

tests/test_client.py (12)

3-18: Appropriate imports for new classes

The imports have been updated to include the necessary imports for the new YouTube classes: YoutubeChannel, YoutubePlaylist, and YoutubeVideo. The datetime import is added to handle date fields in these classes.


172-180: Updated test for crawl results

Added description field to the test to match updated CrawlPage class structure.


193-194: Updated test for second page response

Properly updated the test for the second page of crawl results to include the description field.


272-303: Comprehensive test for the video endpoint

This test covers all the necessary assertions for the YoutubeVideo class, including proper conversion of date strings to datetime objects. The test uses realistic mock data that closely resembles an actual API response.


305-324: Error handling test for invalid video ID

Good test for handling invalid video IDs. The test verifies that the appropriate error is raised and that the error fields are populated correctly.


326-352: Thorough test for the channel endpoint

The test covers all attributes of the YoutubeChannel class and uses realistic mock data. All assertions are properly set up to verify the correct behavior.


355-374: Test for invalid channel ID

Proper test for handling an invalid channel ID request. It verifies that the appropriate error is raised with the correct error details.


376-399: Good playlist endpoint test

The test for the playlist endpoint is comprehensive, covering all attributes of the YoutubePlaylist class including the datetime conversion.


401-421: Proper test for invalid playlist ID

The test correctly checks error handling for invalid playlist IDs.


424-441: Test for channel videos endpoint

Good test for retrieving video IDs from a channel. The test verifies that the response is a list and contains the expected video IDs.


443-463: Test for invalid channel ID when retrieving videos

Proper test for handling an invalid channel ID when retrieving videos. It validates that the appropriate error is raised with the correct error details.


465-483: Test for playlist videos endpoint

Good test for retrieving video IDs from a playlist. The test verifies that the response is a list and contains the expected video IDs.

supadata/youtube.py (16)

3-14: Appropriate imports

The import statements have been updated to include the necessary imports for the new YouTube functionality. Good use of typing for function signatures.


20-20: Improved type annotation for __init__

Added a more specific type annotation for the request_handler parameter, which improves code readability and helps with IDE autocompletion.


28-30: Better formatting for transcript method signature

Improved the formatting of the method signature for better readability.


44-45: Clean code formatting

Reorganized the params dictionary creation for better readability.


58-61: Consistent code formatting

Improved the formatting of the list comprehension for better readability.


66-66: Better type annotation for translate method

Improved the type annotation for the translate method parameters.


81-85: Improved request formatting

Reorganized the request call for better readability.


94-97: Consistent formatting

Improved the formatting of the list comprehension to match the style used earlier.


101-117: Well-implemented video method

The video method is well-implemented with proper typing, docstring, and error handling. The method correctly transforms the uploaded_date from ISO format to a datetime object.


118-133: Well-implemented channel method

The channel method is well-implemented with proper typing, docstring, and error handling.


134-158: Well-structured channel_videos method

The channel_videos method is well-implemented with proper typing, docstring, error handling, and parameter validation. Good use of the _validate_limit helper method.


159-175: Well-implemented playlist method

The playlist method is well-implemented with proper typing, docstring, and error handling. The method correctly transforms the last_updated field from ISO format to a datetime object.


176-199: Good implementation of playlist_videos method

The playlist_videos method follows the same pattern as channel_videos, with proper typing, docstring, error handling, and parameter validation.


200-208: Well-implemented _validate_limit helper

The _validate_limit helper method is properly implemented with good error handling and appropriate limit validation. The error messages are clear and informative.


114-114:


❓ Verification inconclusive

Potential key error in video method

The code assumes the response will always contain an "upload_date" key, but if this key is missing or named differently in the API response, it would cause a KeyError.

Consider using a more defensive approach:

- uploaded_time = datetime.fromisoformat(response.pop("upload_date"))
+ uploaded_time = datetime.fromisoformat(response.pop("upload_date", None) or datetime.now().isoformat())

Or handle the potential KeyError explicitly:

- uploaded_time = datetime.fromisoformat(response.pop("upload_date"))
+ try:
+     uploaded_time = datetime.fromisoformat(response.pop("upload_date"))
+ except KeyError:
+     # Log warning or use a default value
+     uploaded_time = datetime.now()

🏁 Script executed:

#!/bin/bash
# Check if the API response always includes upload_date
rg -A 5 -B 5 "upload_date|uploadDate" --include "*.py"

Length of output: 498


Action: Verify API Response Field and Implement Defensive Handling
The ripgrep search for “upload_date” failed because rg has no --include flag (that is a grep option; ripgrep filters files with --glob/-g). This means we haven’t yet confirmed whether the API response always provides the “upload_date” field. Without definitive evidence that the key is always present in all API responses, it’s prudent to adopt a more defensive coding approach.

Suggested next steps:

  • Manually verify (or re-run with a corrected command) that the API response always includes “upload_date” (or “uploadDate”) across all relevant endpoints. For example, using:
    rg -A5 -B5 --glob "*.py" "upload_date|uploadDate"
  • Update the code in supadata/youtube.py (around line 114) to either provide a default fallback value or explicitly catch a potential KeyError. For example:
- uploaded_time = datetime.fromisoformat(response.pop("upload_date"))
+ try:
+     uploaded_time = datetime.fromisoformat(response.pop("upload_date"))
+ except KeyError:
+     # Log warning or handle as needed (using current datetime as fallback)
+     uploaded_time = datetime.now()

or

- uploaded_time = datetime.fromisoformat(response.pop("upload_date"))
+ uploaded_time = datetime.fromisoformat(response.pop("upload_date", None) or datetime.now().isoformat())

Please verify these changes within the context of your API responses to ensure they handle all cases correctly.
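Both fallbacks above substitute datetime.now() when the key is missing, which silently records the request time as the upload date. One alternative, sketched below under the assumption that the dataclass field can be typed Optional[datetime], is to return None for a missing or null value. The helper name pop_iso_datetime is hypothetical and not part of the library. It also normalises a trailing "Z", because datetime.fromisoformat only accepts that suffix from Python 3.11 onward.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def pop_iso_datetime(response: Dict[str, Any], key: str) -> Optional[datetime]:
    """Remove `key` from `response` and parse it as an ISO 8601 timestamp.

    Hypothetical helper: returns None when the key is missing or null,
    instead of raising KeyError or substituting the current time.
    """
    raw = response.pop(key, None)
    if raw is None:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    return datetime.fromisoformat(raw)
```

In the video method this would read `uploaded_time = pop_iso_datetime(response, "upload_date")`; the same call with "last_updated" would cover the playlist method.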


172-172:


❓ Verification inconclusive

Potential key error in playlist method

Similar to the video method, there's a risk of KeyError if "last_updated" is missing from the response.

Consider using a more defensive approach:

- last_updated = datetime.fromisoformat(response.pop("last_updated"))
+ try:
+     last_updated = datetime.fromisoformat(response.pop("last_updated"))
+ except KeyError:
+     # Log warning or use a default value
+     last_updated = datetime.now()

🏁 Script executed:

#!/bin/bash
# Check if the API response always includes last_updated
rg -A 5 -B 5 "last_updated|lastUpdated" --include "*.py"

Length of output: 500


Action Required: Confirm and Fix Potential KeyError in Playlist Method
There’s a risk of a KeyError in the playlist method if the API response is missing the "last_updated" key. The attempt to verify this key failed because rg does not accept the --include flag, so it is still unconfirmed whether every API response provides this field. Please verify the API responses manually or re-run the search with --glob "*.py" instead of --include. In the meantime, a defensive approach is recommended; for instance, the code in supadata/youtube.py (around line 172) could be updated as follows:

-        last_updated = datetime.fromisoformat(response.pop("last_updated"))
+        try:
+            last_updated = datetime.fromisoformat(response.pop("last_updated"))
+        except KeyError:
+            # Log warning or use a default value
+            last_updated = datetime.now()

Please confirm the API behavior and adjust the handling as necessary.

Contributor

@rafalzawadzki rafalzawadzki left a comment


hi @LuizDMM thanks for contributing your solution!

mostly looks good, but I have two asks:

  1. Asserts seem to be different from mocks in the tests. Are they correct?
  2. Can you add namespaces for channel and playlist so we can accomplish dot notation like with other features? eg. channel.videos instead of channel_videos

@LuizDMM
Contributor Author

LuizDMM commented Mar 18, 2025

Hey @rafalzawadzki, thank you for your feedback! I'll check all the tests and the namespace and get back to you.

Fixes supadata-ai#9

fix(project): fix the tests and fix imports on __init__.py

fix(supadata): Fix small typos in docstrings

fix(supadata): Fix small typo in docstring

fix(tests) fix function called in test

fix(youtube): create inner classes to enable dot notation

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
supadata/youtube.py (1)

214-236: Clarify the videos method docstring.

The docstring for the videos method in the _Playlist class has redundant wording.

-        """Get a list of the IDs of the list of video IDs from a YouTube playlist.
+        """Get a list of video IDs from a YouTube playlist.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aae525e and 616f41d.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • supadata/__init__.py (2 hunks)
  • supadata/types.py (7 hunks)
  • supadata/youtube.py (5 hunks)
  • tests/test_client.py (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • supadata/__init__.py
🧰 Additional context used
🧠 Learnings (1)
supadata/youtube.py (3)
Learnt from: LuizDMM
PR: supadata-ai/py#11
File: supadata/youtube.py:157-157
Timestamp: 2025-03-17T19:09:54.600Z
Learning: In the Supadata library, API responses with camelCase keys (like "videoIds") are automatically converted to snake_case (like "video_ids") by the `Supadata._camel_to_snake` method before they're processed by class methods, so using snake_case keys in the code is the correct approach.
Learnt from: LuizDMM
PR: supadata-ai/py#11
File: supadata/youtube.py:157-157
Timestamp: 2025-03-17T19:09:54.600Z
Learning: In the Supadata library, API responses with camelCase keys (like "videoIds") are converted to snake_case (like "video_ids") by the `Supadata._camel_to_snake` method before they're processed by class methods.
Learnt from: LuizDMM
PR: supadata-ai/py#11
File: supadata/youtube.py:198-198
Timestamp: 2025-03-17T19:10:13.198Z
Learning: In the Supadata client, the `_camel_to_snake` method automatically converts API response keys from camelCase to snake_case, allowing methods to access data with snake_case keys (e.g., "video_ids") even though the API returns camelCase keys (e.g., "videoIds").
🧬 Code Definitions (3)
tests/test_client.py (4)
supadata/types.py (10)
  • CrawlJob (120-127)
  • CrawlPage (88-101)
  • Map (77-84)
  • Scrape (54-73)
  • Transcript (26-37)
  • TranscriptChunk (9-22)
  • TranslatedTranscript (41-50)
  • YoutubeChannel (174-195)
  • YoutubePlaylist (199-218)
  • YoutubeVideo (145-170)
supadata/youtube.py (7)
  • transcript (30-65)
  • translate (67-101)
  • video (103-118)
  • channel (121-125)
  • playlist (128-132)
  • videos (166-189)
  • videos (214-236)
supadata/web.py (1)
  • get_crawl_results (69-107)
supadata/errors.py (1)
  • SupadataError (8-31)
supadata/types.py (1)
supadata/youtube.py (1)
  • channel (121-125)
supadata/youtube.py (2)
supadata/types.py (6)
  • Transcript (26-37)
  • TranscriptChunk (9-22)
  • TranslatedTranscript (41-50)
  • YoutubeChannel (174-195)
  • YoutubePlaylist (199-218)
  • YoutubeVideo (145-170)
supadata/client.py (1)
  • _request (84-123)
🔇 Additional comments (18)
supadata/types.py (4)

129-141: Well-structured TypedDict for channel base information.

This is a good foundation for reusing channel information across different types like YoutubeVideo and YoutubePlaylist. The TypedDict approach is the right choice for this scenario.


144-171: Comprehensive video type with complete attributes.

The YoutubeVideo class correctly captures all necessary attributes for a YouTube video, including metadata and engagement metrics. The documentation is thorough and all fields are properly typed.


173-196: Well-defined YouTube channel dataclass.

The YoutubeChannel class has a complete set of attributes needed to represent YouTube channels. All fields have appropriate types and the documentation clearly describes each attribute.


198-219: Good implementation of playlist dataclass with optional description.

The YoutubePlaylist class is well-structured with appropriate types. Making the description optional is a good choice, as not all playlists have descriptions.
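To illustrate the composition these comments describe, here is a minimal sketch of a TypedDict for shared channel info nested inside a dataclass with an optional description. The names ChannelBase and PlaylistSketch and their fields are illustrative assumptions, not the definitions in supadata/types.py.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, TypedDict


class ChannelBase(TypedDict):
    """Hypothetical minimal channel info shared by video and playlist types."""
    id: str
    name: str


@dataclass
class PlaylistSketch:
    """Illustrative playlist type; field names are assumptions."""
    id: str
    title: str
    channel: ChannelBase          # nested channel payload reused across types
    last_updated: datetime        # already converted from an ISO string
    description: Optional[str] = None  # not every playlist has a description
```

Because a TypedDict is a plain dict at runtime, the nested channel payload can be stored as-is, while type checkers still validate its keys.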

tests/test_client.py (7)

272-303: Well-structured test for YouTube video retrieval.

The test thoroughly validates the video method by checking the return type and verifying that all properties from the mock response are correctly mapped to the corresponding fields in the YoutubeVideo object.


305-324: Error handling test is correctly implemented.

This test properly verifies that invalid video ID requests raise a SupadataError with the appropriate error details. The difference between mock response error messages and assertions is expected due to the gateway error handling in Supadata._handle_gateway_error as explained in previous comments.


326-353: Good test for YouTube channel retrieval.

The test thoroughly validates all fields of the returned YoutubeChannel object against the mock response, ensuring that the channel method works as expected.


355-374: Error handling test is correctly implemented.

This test properly verifies that invalid channel ID requests raise a SupadataError with the appropriate error details. As previously discussed, the difference in error details is due to the gateway error handling in the client.


376-399: Thorough test for YouTube playlist retrieval.

The test effectively validates the playlist method by checking all properties of the returned YoutubePlaylist object, including proper handling of the datetime conversion for the last_updated field.


424-441: Good test for channel videos retrieval using namespace approach.

The test correctly validates the namespace-based approach for retrieving channel videos (client.youtube.channel.videos), which aligns with rafalzawadzki's feedback in the PR comments about using dot notation for namespace access.


465-483: Good test for playlist videos retrieval using namespace approach.

The test correctly validates the namespace-based approach for retrieving playlist videos (client.youtube.playlist.videos), which follows the same pattern as the channel videos implementation, maintaining consistency.

supadata/youtube.py (7)

20-20: Good addition of type hints to the constructor.

Adding type hints to the request_handler parameter improves code documentation and provides better IDE support. The Callable[[str, str, Any], Dict[str, Any]] type accurately describes the expected function signature.


103-118: Well-implemented video metadata retrieval.

The video method correctly handles the YouTube video data retrieval, including the conversion of the upload date string to a datetime object. The method has good error handling, proper type hints, and comprehensive documentation.


120-132: Good implementation of namespace properties.

The channel and playlist properties implement the dot notation approach as requested in the PR feedback, allowing users to call channel.videos instead of channel_videos. With lazy initialization, each namespace object is created only on first access and then reused.


134-142: Proper limit validation.

The _validate_limit method provides a reusable validation function for limits across different endpoints, with appropriate error messages and handling of edge cases, including None values.
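A limit check of this kind might look like the sketch below. The bounds and the use of ValueError are placeholders for illustration; the library's _validate_limit may use different limits and raise SupadataError instead.

```python
from typing import Optional

# Placeholder bounds for illustration; the real endpoint limits may differ.
MIN_LIMIT = 1
MAX_LIMIT = 5000


def validate_limit(limit: Optional[int]) -> None:
    """Raise ValueError when `limit` is given but is not an int in [MIN_LIMIT, MAX_LIMIT]."""
    if limit is None:
        return  # no limit supplied: leave the default to the API
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError(f"limit must be an integer, got {type(limit).__name__}")
    if not MIN_LIMIT <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between {MIN_LIMIT} and {MAX_LIMIT}, got {limit}")
```

Validating before the request means an out-of-range value fails locally instead of costing an API round trip.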


144-190: Well-structured Channel class implementation.

The _Channel inner class properly encapsulates channel-related functionality, covering both metadata retrieval and video listing. Defining __call__ lets the namespace be invoked directly, so youtube.channel(...) returns channel metadata while youtube.channel.videos(...) lists the channel's videos.
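The namespace pattern can be sketched in a few lines. This is a simplified stand-in, not the merged code: the class name YouTubeSketch, the endpoint paths, and the (method, path, **params) calling convention of the request handler are assumptions. Only the channel namespace is shown; a _Playlist class would follow the same shape.

```python
from typing import Any, Callable, Dict, List, Optional

# Simplification: Callable cannot describe keyword arguments such as **params.
RequestHandler = Callable[..., Dict[str, Any]]


class YouTubeSketch:
    """Hypothetical stand-in for the YouTube namespace with a lazy `channel` property."""

    def __init__(self, request_handler: RequestHandler) -> None:
        self._request = request_handler
        self._channel: Optional["YouTubeSketch._Channel"] = None

    @property
    def channel(self) -> "YouTubeSketch._Channel":
        # Build the namespace object on first access, then reuse the same instance.
        if self._channel is None:
            self._channel = YouTubeSketch._Channel(self._request)
        return self._channel

    class _Channel:
        def __init__(self, request_handler: RequestHandler) -> None:
            self._request = request_handler

        def __call__(self, channel_id: str) -> Dict[str, Any]:
            # youtube.channel(channel_id): channel metadata (endpoint path is assumed).
            return self._request("GET", "/youtube/channel", id=channel_id)

        def videos(self, channel_id: str, limit: Optional[int] = None) -> List[str]:
            # youtube.channel.videos(channel_id): list of video IDs (endpoint path is assumed).
            params: Dict[str, Any] = {"id": channel_id}
            if limit is not None:
                params["limit"] = limit
            response = self._request("GET", "/youtube/channel/videos", **params)
            # snake_case key, since responses are converted from camelCase ("videoIds").
            return response["video_ids"]
```

A typing.Protocol with a `__call__` method would be one way to type the handler's keyword arguments precisely, at the cost of a little more code.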


189-189: Consistent use of snake_case for API response keys.

Using video_ids key is correct here. As noted in the retrieved learnings, the Supadata client automatically converts camelCase keys from the API response to snake_case via the _camel_to_snake method, so even though the API returns videoIds, it will be accessible as video_ids in the code.
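A recursive camelCase-to-snake_case key conversion of the kind described could look like the following. This is an illustrative sketch only; the actual Supadata._camel_to_snake may differ, for example in whether it recurses into lists.

```python
import re
from typing import Any

# Zero-width match before every uppercase letter except at the start of the string.
_BEFORE_UPPER = re.compile(r"(?<!^)(?=[A-Z])")


def _to_snake(name: str) -> str:
    return _BEFORE_UPPER.sub("_", name).lower()


def camel_to_snake(value: Any) -> Any:
    """Recursively rename dict keys from camelCase to snake_case (sketch)."""
    if isinstance(value, dict):
        return {
            (_to_snake(k) if isinstance(k, str) else k): camel_to_snake(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [camel_to_snake(item) for item in value]
    return value  # scalars, including string values, are left untouched
```

This simple rule splits before every capital letter, so "videoIds" becomes "video_ids", but an acronym key such as "videoID" would become "video_i_d".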


236-236: Consistent use of snake_case for API response keys.

Using video_ids key is correct here, following the same pattern as in the Channel class. The _camel_to_snake method handles the conversion from the API's camelCase to snake_case internally.

@LuizDMM
Contributor Author

LuizDMM commented Mar 19, 2025

Hey @rafalzawadzki, done! Just some comments about what I've done and your questions:

  1. Asserts seem to be different from mocks in the tests. Are they correct?

Just repeating myself from above: yes, the assertion is correct. The difference between the message from the API and the one in the exception comes from Supadata._handle_gateway_error. When the API returns a 404, SupadataError.message is always "Endpoint does not exist."; what changes is SupadataError.details, which is based on the message in the response, as you can see in the code here (supadata/client.py lines 58-63):

[Screenshot from 2025-03-18 20-49-39: supadata/client.py, lines 58-63]

We can take a look at this and modify Supadata._handle_gateway_error if you wish. Just let me know and I'll fix this with another commit.

  2. Can you add namespaces for channel and playlist so we can accomplish dot notation like with other features? eg. channel.videos instead of channel_videos

Sure can. I added two inner classes to YouTube, _Channel and _Playlist, and moved the logic into them, so you can now pull the data with the desired dot notation (e.g. client.youtube.channel.videos). I also updated the documentation and tests to reflect these changes.

I hope these changes fix everything, but just let me know if you want me to change the gateway error handler or anything else in the PR.

@rafalzawadzki
Contributor

oo good find! this is a bug - the gateway should return not-found for 404 errors instead of invalid-request.

Can you please fix that? I'll add a bonus to this bounty to compensate your extra effort.
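A minimal sketch of the requested behaviour, assuming the gateway handler receives the HTTP status and the message from the response body. SupadataErrorSketch and handle_gateway_error are hypothetical stand-ins for SupadataError and Supadata._handle_gateway_error; only the 404 to "not-found" mapping and the use of the response message as details come from this thread. The message strings and the fallback for other statuses are placeholders.

```python
from typing import Optional


class SupadataErrorSketch(Exception):
    """Hypothetical stand-in for SupadataError with error/message/details fields."""

    def __init__(self, error: str, message: str, details: str = "") -> None:
        super().__init__(message)
        self.error = error
        self.message = message
        self.details = details


def handle_gateway_error(status_code: int, response_message: Optional[str]) -> SupadataErrorSketch:
    """Build the error for a gateway failure; the caller would raise it."""
    details = response_message or ""  # details carry the API's own message
    if status_code == 404:
        # The fix requested here: report "not-found" rather than "invalid-request".
        return SupadataErrorSketch("not-found", "Resource not found.", details)
    # Placeholder for all other statuses; the real mapping is not shown in this thread.
    return SupadataErrorSketch("invalid-request", "Invalid request.", details)
```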

@LuizDMM
Contributor Author

LuizDMM commented Mar 19, 2025

Sure, on it.

@LuizDMM
Contributor Author

LuizDMM commented Mar 19, 2025

Done! Fixed the message and the tests to reflect the change. Just let me know if any other change is needed.

Contributor

@rafalzawadzki rafalzawadzki left a comment


thanks @LuizDMM! lgtm 🚀

@rafalzawadzki
Contributor

/tip $10

@algora-pbc

algora-pbc bot commented Mar 19, 2025

@algora-pbc

algora-pbc bot commented Mar 19, 2025

@LuizDMM: You just got a $10 tip! 👉 Complete your Algora onboarding to collect your payment.

@rafalzawadzki rafalzawadzki merged commit aa58ce4 into supadata-ai:main Mar 19, 2025
1 check passed
@algora-pbc

algora-pbc bot commented Mar 20, 2025

🎉🎈 @LuizDMM has been awarded $10! 🎈🎊

Development

Successfully merging this pull request may close these issues.

Add support for new endpoints

2 participants