feat: Add YouTube batch operations support #10
rafalzawadzki merged 3 commits into supadata-ai:main
Walkthrough
The changes introduce batch processing capabilities for handling YouTube transcripts and video metadata in the Supadata SDK. New methods are added to initiate batch jobs and poll for results until completion or error. Documentation and examples have been updated to illustrate these new flows, and tests have been extended to cover both successful and erroneous scenarios. The core service has been refactored for clarity and modularity, and corresponding type definitions have been added.
Sequence Diagram(s)
sequenceDiagram
    participant Client
    participant SDK
    participant API
    Client->>SDK: Initiate Transcript/Video Batch Job (videoIds, [lang])
    SDK->>API: POST /youtube/transcript|video/batch
    API-->>SDK: Return Job ID
    SDK->>Client: Log Job ID
    loop Polling until completed/failed
        Client->>SDK: Poll getBatchResults(jobId)
        SDK->>API: GET /youtube/batch/{jobId}
        API-->>SDK: Return batch status (queued/active/completed/failed)
        SDK->>Client: Log current status
    end
    alt Job Completed
        SDK->>Client: Log results and statistics
    else Job Failed
        SDK->>Client: Log failure status
    end
Actionable comments posted: 1
🧹 Nitpick comments (6)
src/__tests__/supadata.test.ts (1)
478-515: Consider adding a test for batch with videoIds array
While you have excellent test coverage for channel-based batch operations, consider adding a test case that uses the videoIds array parameter as shown in the README examples.
    + it('should start a video metadata batch job with video IDs array', async () => {
    +   const mockRequest = {
    +     videoIds: ['dQw4w9WgXcQ', 'xvFZjo5PgG0'],
    +   };
    +   const mockResponse: YoutubeBatchJob = {
    +     jobId: '123e4567-e89b-12d3-a456-426614174003',
    +   };
    +
    +   fetchMock.mockResponseOnce(JSON.stringify(mockResponse), {
    +     status: 200,
    +     headers: { 'content-type': 'application/json' },
    +   });
    +
    +   const result = await supadata.youtube.video.batch(mockRequest);
    +
    +   expect(result).toEqual(mockResponse);
    +   expect(fetchMock).toHaveBeenCalledWith(
    +     'https://api.supadata.ai/v1/youtube/video/batch',
    +     expect.objectContaining({
    +       method: 'POST',
    +       body: JSON.stringify(mockRequest),
    +     })
    +   );
    + });
README.md (1)
92-99: Consider adding polling example in README
While the example code in example/index.ts shows a polling mechanism for checking batch job status, it would be helpful to add a simplified version in the README to show best practices for handling long-running batch jobs.
      // Get results for a batch job (poll until status is 'completed' or 'failed')
    - const batchResults = await supadata.youtube.batch.getBatchResults(transcriptBatch.jobId); // or videoBatch.jobId
    - if (batchResults.status === 'completed') {
    -   console.log('Batch job completed:', batchResults.results);
    -   console.log('Stats:', batchResults.stats);
    - } else {
    -   console.log('Batch job status:', batchResults.status);
    - }
    + // Simple polling example
    + let batchResults;
    + do {
    +   batchResults = await supadata.youtube.batch.getBatchResults(transcriptBatch.jobId);
    +   console.log('Batch job status:', batchResults.status);
    +   if (batchResults.status !== 'completed' && batchResults.status !== 'failed') {
    +     await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5 seconds
    +   }
    + } while (batchResults.status !== 'completed' && batchResults.status !== 'failed');
    +
    + if (batchResults.status === 'completed') {
    +   console.log('Batch job completed:', batchResults.results);
    +   console.log('Stats:', batchResults.stats);
    + } else {
    +   console.log('Batch job failed');
    + }
src/types.ts (1)
141-146: Consider documenting result item structure
While the YoutubeBatchResultItem interface is well-structured, it's not immediately clear when transcript or video properties would be present. Consider adding JSDoc comments to clarify the expected content based on the type of batch operation.
    + /**
    +  * Represents an individual result item in a batch operation
    +  * - For transcript batch jobs, the transcript property will be populated
    +  * - For video batch jobs, the video property will be populated
    +  * - If an error occurred for this specific item, errorCode will be populated
    +  */
      export interface YoutubeBatchResultItem {
        videoId: string;
        transcript?: Transcript;
        video?: YoutubeVideo;
        errorCode?: string;
      }
src/services/youtube.ts (3)
86-99: Consider handling partial batch execution results.
While the batch operation is well-structured, you may want to handle partial successes/failures in future improvements.
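A minimal sketch of one way partial results could be separated, assuming that a failed item carries an errorCode as described for YoutubeBatchResultItem in src/types.ts. The BatchResultItem type and the partitionBatchResults helper are hypothetical names used only for illustration; they are not part of the SDK.

```typescript
// Hypothetical local mirror of the result item shape from src/types.ts.
// transcript/video are typed as unknown to keep the sketch self-contained.
interface BatchResultItem {
  videoId: string;
  transcript?: unknown;
  video?: unknown;
  errorCode?: string;
}

interface PartitionedResults {
  succeeded: BatchResultItem[];
  failed: BatchResultItem[];
}

// Split batch results into items without an errorCode and items with one.
function partitionBatchResults(items: BatchResultItem[]): PartitionedResults {
  const succeeded: BatchResultItem[] = [];
  const failed: BatchResultItem[] = [];
  for (const item of items) {
    if (item.errorCode !== undefined) {
      failed.push(item);
    } else {
      succeeded.push(item);
    }
  }
  return { succeeded, failed };
}
```

A caller could then log or retry only the entries in `failed` while still using everything in `succeeded`.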
117-131: Batch video fetch logic is nicely integrated.
Code duplication with transcript batch is acceptable but might be refactored to reduce repetition if it grows more complex.
230-243: Consider consolidating batch limit validation.
This logic largely duplicates validateLimit; consider merging them or extracting shared checks to a helper function for maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
README.md (1 hunks)
example/index.ts (2 hunks)
src/__tests__/supadata.test.ts (3 hunks)
src/services/youtube.ts (4 hunks)
src/types.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
example/index.ts (1)
src/types.ts (3)
YoutubeBatchJob (131-133), YoutubeBatchResults (154-159), SupadataError (53-78)
src/__tests__/supadata.test.ts (1)
src/types.ts (2)
YoutubeBatchJob (131-133), YoutubeBatchResults (154-159)
🔇 Additional comments (25)
src/__tests__/supadata.test.ts (3)
412-476: Good test coverage for transcript batch operations
The tests for transcript batch operations cover important scenarios including:
- Starting batch jobs with video IDs
- Starting batch jobs with playlist ID
- Error handling for invalid parameters
This provides a solid foundation for testing the new functionality.
517-540: Test coverage for batch results is thorough
The test for retrieving batch results verifies that the correct endpoint is called with the appropriate parameters and that the response is properly handled.
542-547: Good test for jobId validation
Testing that an error is thrown when an empty jobId is provided ensures that client-side validation is working properly before making an API call.
README.md (2)
75-82: Clear documentation for transcript batch operations
The examples clearly demonstrate how to start a YouTube transcript batch job with different input sources (videoIds, playlistId, channelId) and how to handle the response.
84-90: Consistent pattern for video batch operations
The documentation for video batch operations follows the same pattern as transcript batch operations, making it easy for users to understand and implement both features.
example/index.ts (5)
2-6: Type imports improve code readability
Importing the specific types needed for the batch operations improves code readability and makes type checking more explicit.
103-110: Good example for transcript batch operation
The example provides a clear demonstration of how to start a YouTube transcript batch job and properly handle the response.
113-120: Good example for video batch operation
The example demonstrates starting a YouTube video metadata batch job with a playlist ID and limit, which is valuable for showing real-world usage.
123-174: Robust polling implementation
The polling implementation for batch results is robust and includes:
- Maximum attempt limits
- Configurable polling interval
- Error handling
- Status checking
- Comprehensive result handling
This provides a good reference implementation for SDK users.
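The features listed above can be sketched roughly as follows. This is not the code from example/index.ts, only an illustration under stated assumptions: the status values come from the sequence diagram in this PR, while pollBatch, PollOptions, BatchResultsLike, and the default values for maxAttempts and intervalMs are hypothetical.

```typescript
// Status values as listed in the PR's sequence diagram.
type BatchStatus = 'queued' | 'active' | 'completed' | 'failed';

// Hypothetical minimal result shape; the SDK's YoutubeBatchResults has more fields.
interface BatchResultsLike {
  status: BatchStatus;
}

interface PollOptions {
  maxAttempts?: number; // give up after this many polls (assumed default: 30)
  intervalMs?: number; // delay between polls (assumed default: 5000)
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip real delays
}

// Calls getResults repeatedly until the job reaches a terminal status,
// or throws once maxAttempts polls have been made without one.
async function pollBatch<T extends BatchResultsLike>(
  getResults: () => Promise<T>,
  options: PollOptions = {}
): Promise<T> {
  const {
    maxAttempts = 30,
    intervalMs = 5000,
    sleep = (ms: number) =>
      new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const results = await getResults();
    if (results.status === 'completed' || results.status === 'failed') {
      return results;
    }
    // Only wait if another attempt will follow.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Batch job did not finish after ${maxAttempts} attempts`);
}
```

With the SDK method shown in the README, a call might look like `pollBatch(() => supadata.youtube.batch.getBatchResults(jobId))`. Passing the fetcher and `sleep` as parameters keeps the loop independent of the SDK and easy to test.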
176-180: Enhanced error handling with SupadataError
The error handling has been improved to specifically handle SupadataError instances and extract more detailed error information, which is a good practice.
src/types.ts (6)
118-123: Well-structured batch source interface
The YoutubeBatchSource interface provides a flexible structure for specifying the source of videos for batch operations, allowing for multiple input methods (videoIds, playlistId, channelId).
125-129: Specialized batch request interfaces
The YoutubeTranscriptBatchRequest and YoutubeVideoBatchRequest interfaces properly extend the base YoutubeBatchSource interface, adding specific properties where needed.
131-133: Simple and clear job identifier interface
The YoutubeBatchJob interface provides a clear contract for job identification with a single jobId property.
135-139: Clear batch job status enumeration
The YoutubeBatchJobStatus type defines a clear set of possible statuses for batch jobs, making it easy to understand the lifecycle of a batch operation.
148-152: Clear batch statistics interface
The YoutubeBatchStats interface provides a simple and clear way to track the progress and results of batch operations.
154-159: Consider validating input parameters
❓ Verification inconclusive
The YoutubeBatchResults interface is well-structured, but there's no validation to ensure that only one source (videoIds, playlistId, or channelId) is provided at a time. Consider adding validation in the implementation to prevent confusion.
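One possible shape for such a check is sketched below. It assumes the API expects exactly one of videoIds, playlistId, or channelId per request, which this PR does not confirm. BatchSource is a hypothetical local mirror of the source fields named in this review, and validateSingleSource is an illustrative helper; a plain Error stands in for the SDK's SupadataError.

```typescript
// Hypothetical local mirror of the batch source fields named in this review.
interface BatchSource {
  videoIds?: string[];
  playlistId?: string;
  channelId?: string;
  limit?: number;
}

// Throws unless exactly one of videoIds, playlistId, or channelId is provided.
// Assumption: the API wants a single source; adjust if multiple are allowed.
function validateSingleSource(source: BatchSource): void {
  const candidates: Array<string | null> = [
    source.videoIds !== undefined && source.videoIds.length > 0
      ? 'videoIds'
      : null,
    source.playlistId ? 'playlistId' : null,
    source.channelId ? 'channelId' : null,
  ];
  const provided = candidates.filter(
    (name): name is string => name !== null
  );

  if (provided.length === 0) {
    throw new Error('Provide one of videoIds, playlistId, or channelId.');
  }
  if (provided.length > 1) {
    throw new Error(`Provide only one source, got: ${provided.join(', ')}.`);
  }
}
```

Listing the conflicting field names in the message makes the misconfiguration easier to spot than a generic "invalid request" error.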
🏁 Script executed:
    #!/bin/bash
    # Check if the implementation validates that only one source is provided
    # Look for validation in the codebase
    rg -A 5 "videoIds.*playlistId|playlistId.*videoIds|channelId.*videoIds|videoIds.*channelId|channelId.*playlistId|playlistId.*channelId" --type typescript
Length of output: 190
Action: Implement explicit input validation for source selection.
Upon reviewing the code, and after re-running the regex search using a corrected file filter (e.g., using -g "*.ts" instead of the unsupported --type typescript), there's no evidence of logic that prevents multiple sources (i.e. videoIds, playlistId, or channelId) from being provided concurrently. To avoid potential confusion or misconfiguration, please verify that the relevant input handling (likely in the job processing logic) enforces that only one source is specified at a time. If validation is already implemented elsewhere, kindly document its behavior; otherwise, consider adding an explicit check to return a clear error when multiple sources are detected.
src/services/youtube.ts (9)
6-7: Newly introduced batch types are properly defined.
They align well with existing structures and naming conventions.
10-12: Batch request interfaces look solid.
These interfaces appear consistent with the rest of the codebase.
55-65: Documentation updates provide clear guidance.
The docstrings for transcript-related operations are well-structured, enhancing maintainability.
66-72: Main transcript function is clear and consistent.
The direct return of this.fetch<Transcript> seamlessly fits the pattern used throughout the service.
74-84: Translation method is straightforward and self-contained.
The method leverages this.fetch properly and reflects the necessary parameter typing.
107-110: Additional documentation for video batch method is helpful.
The instructions and returns are clearly defined.
112-115: Video fetch implementation follows established patterns.
This mirrors the transcript fetch approach and is consistent with the codebase architecture.
193-214: Generic YouTube batch operations are well-structured.
The error handling for missing jobId is appropriately implemented.
229-229: Trivial whitespace insertion.
No action needed.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/services/youtube.ts (1)
205-210: API Method Path Has Changed
This change from supadata.youtube.translate to using /youtube/transcript/translate represents a significant change in the API structure. Ensure that this change is clearly documented in the changelog to help existing users migrate their code.
🧹 Nitpick comments (1)
src/services/youtube.ts (1)
226-239: Consider consolidating validation methods
The validateBatchLimit method is very similar to the existing validateLimit method, differing only in error messages.
Consider consolidating these into a single validation method:
    - private validateLimit(params: { limit?: number }) {
    -   if (
    -     params.limit != undefined &&
    -     params.limit != null &&
    -     (params.limit < 1 || params.limit > 5000)
    -   ) {
    -     throw new SupadataError({
    -       error: 'invalid-request',
    -       message: 'Invalid limit.',
    -       details: 'The limit must be between 1 and 5000.',
    -     });
    -   }
    - }
    -
    - // Add a specific validator for batch limits as per documentation (Max: 5000, Default: 10)
    - private validateBatchLimit(params: { limit?: number }) {
    -   if (
    -     params.limit != undefined &&
    -     params.limit != null &&
    -     (params.limit < 1 || params.limit > 5000)
    -   ) {
    -     throw new SupadataError({
    -       error: 'invalid-request',
    -       message: 'Invalid limit for batch operation.',
    -       details: 'The limit must be between 1 and 5000.',
    -     });
    -   }
    - }
    + private validateLimit(
    +   params: { limit?: number },
    +   options: { isBatch?: boolean } = {}
    + ) {
    +   if (
    +     params.limit != undefined &&
    +     params.limit != null &&
    +     (params.limit < 1 || params.limit > 5000)
    +   ) {
    +     throw new SupadataError({
    +       error: 'invalid-request',
    +       message: options.isBatch ? 'Invalid limit for batch operation.' : 'Invalid limit.',
    +       details: 'The limit must be between 1 and 5000.',
    +     });
    +   }
    + }
Then update method calls:
    this.validateLimit(params); // Regular call
    this.validateLimit(params, { isBatch: true }); // For batch operations
📜 Review details
📒 Files selected for processing (2)
src/__tests__/supadata.test.ts (2 hunks)
src/services/youtube.ts (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/__tests__/supadata.test.ts (1)
src/types.ts (2)
YoutubeBatchJob (131-133), YoutubeBatchResults (154-159)
src/services/youtube.ts (1)
src/types.ts (6)
Transcript (8-12), YoutubeTranscriptBatchRequest (125-127), YoutubeBatchJob (131-133), YoutubeBatchResults (154-159), SupadataError (53-78), TranslatedTranscript (14-17)
🔇 Additional comments (8)
src/__tests__/supadata.test.ts (4)
4-14: LGTM: New imports for batch operation types
The imports match the required types for batch operations that have been added to the SDK.
412-476: Well-structured test suite for Transcript Batch operations
The new test suite for Transcript Batch thoroughly tests the batch capabilities for YouTube transcripts:
- Successfully starting a batch job with video IDs
- Successfully starting a batch job with a playlist ID and limit
- Error handling for invalid limits
All tests properly mock API responses and verify both successful responses and error cases.
478-515: Well-structured test suite for Video Batch operations
The new test suite for Video Batch properly tests:
- Successfully starting a video metadata batch job with a channel ID
- Error handling for invalid limits
The test cases follow best practices by validating both responses and error conditions before making API calls.
517-548: Good coverage of General Batch Operations
The test suite for General Batch Operations appropriately validates:
- Successfully retrieving batch results using a job ID
- Error handling when no job ID is provided
The test cases are consistent with the overall testing approach and provide good coverage of edge cases.
src/services/youtube.ts (4)
6-12: LGTM: New imports for batch operation types
The imports match the required types for the new batch operations functionality.
66-88: Well-structured refactoring of transcript functionality
The refactoring using Object.assign() creates a clean interface that groups related transcript functionality together. The batch method properly validates limits before making API requests.
101-120: Well-structured refactoring of video functionality
Similar to the transcript refactoring, the video functionality is well-organized using Object.assign(). The batch method properly validates input parameters before making API requests.
182-203: Good implementation of batch results retrieval
The batch object and its getBatchResults method are well-implemented with proper error handling for missing job IDs. This provides a centralized way to retrieve results for all batch operations.
rafalzawadzki left a comment
Overall solid PR and after testing I was this close 🤏 to merging!
However, I am not happy with the JS docs in src/services/youtube.ts. All param documentations are now assigned to top object, making sub-functions undocumented.
Eg. youtube.transcript.batch() params do not have documentation.
Can you fix?
    /**
     * Fetches a YouTube video based on the provided parameters.
     *
     * @param params - The parameters required to fetch the YouTube video.
     * @param params.id - The YouTube video ID.
     * @returns A promise that resolves to a `YoutubeVideo` object.
     *
     * @property batch - Batch fetches metadata for multiple YouTube videos.
     * @param params - Parameters for the video metadata batch job
     * @returns A promise that resolves to a `YoutubeBatchJob` object with the job ID.
     */
    async video(params: ResourceParams): Promise<YoutubeVideo> {
      return this.fetch<YoutubeVideo>('/youtube/video', params);
    }
    video = Object.assign(
      async (params: ResourceParams): Promise<YoutubeVideo> => {
        return this.fetch<YoutubeVideo>('/youtube/video', params);
      },
      {
        /**
         * Batch fetches metadata for multiple YouTube videos.
         */
        batch: async (
          params: YoutubeVideoBatchRequest
        ): Promise<YoutubeBatchJob> => {
          this.validateBatchLimit(params);
          return this.fetch<YoutubeBatchJob>(
            '/youtube/video/batch',
            params,
            'POST'
          );
        },
      }
    );
the JSDoc params are incorrectly assigned to whole object video, whereas some should be assigned to batch function
src/services/youtube.ts
Outdated
     * @property translate - Translates a YouTube video transcript.
     * @param params - Parameters for translating the transcript
     * @param params.videoId - The YouTube video ID (provide either this OR url)
     * @param params.url - The YouTube video URL (provide either this OR videoId)
     * @param params.lang - The target language code for translation
     * @param params.text - Optional flag to return plain text instead of timestamped list
     * @returns A promise that resolves to the translated transcript
     *
     * @property batch - Batch fetches transcripts for multiple YouTube videos.
     * @param params - Parameters for the transcript batch job
     */
    async translate(params: TranslateParams): Promise<TranslatedTranscript> {
      return this.fetch<TranslatedTranscript>(
        '/youtube/transcript/translate',
        params
      );
    }
    transcript = Object.assign(
      /**
       * Fetches a transcript for a YouTube video.
       */
      async (params: TranscriptParams): Promise<Transcript> => {
        return this.fetch<Transcript>('/youtube/transcript', params);
      },
      {
        /**
         * Batch fetches transcripts for multiple YouTube videos.
         */
        batch: async (
          params: YoutubeTranscriptBatchRequest
        ): Promise<YoutubeBatchJob> => {
          this.validateBatchLimit(params);
          return this.fetch<YoutubeBatchJob>(
            '/youtube/transcript/batch',
            params,
            'POST'
          );
        },
      }
    );
the JSDoc params are incorrectly assigned to whole object transcript, whereas some should be assigned to batch function
Thanks for the review, fixed the docs @rafalzawadzki
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/services/youtube.ts (3)
54-90: Consider enhancing the JSDoc for the transcript method to mention batch capabilities.
The JSDoc comment in line 55 could be more specific about the batch capabilities that this method now provides.
    - /**
    -  * Handles YouTube Transcript operations.
    -  */
    + /**
    +  * Handles YouTube Transcript operations.
    +  * Supports both single transcript retrieval and batch operations.
    +  */
The current implementation with Object.assign is a good pattern that maintains backward compatibility while adding new functionality. The documentation for individual functions is detailed and clear.
92-124: Consider enhancing the JSDoc for the video method to mention batch capabilities.
Similar to the transcript method, the JSDoc in line 93 could be more specific about the batch capabilities.
    - /**
    -  * Handles YouTube video operations.
    -  */
    + /**
    +  * Handles YouTube video operations.
    +  * Supports both single video metadata retrieval and batch operations.
    +  */
The batch implementation for video metadata is well-structured and follows the same pattern as the transcript batch implementation, which is good for consistency.
234-247: Consider refactoring validation methods to reduce duplication.
The validateBatchLimit method is very similar to the existing validateLimit method. Consider extracting the common validation logic to reduce duplication.
    - private validateLimit(params: { limit?: number }) {
    -   if (
    -     params.limit != undefined &&
    -     params.limit != null &&
    -     (params.limit < 1 || params.limit > 5000)
    -   ) {
    -     throw new SupadataError({
    -       error: 'invalid-request',
    -       message: 'Invalid limit.',
    -       details: 'The limit must be between 1 and 5000.',
    -     });
    -   }
    - }
    -
    - // Add a specific validator for batch limits as per documentation (Max: 5000, Default: 10)
    - private validateBatchLimit(params: { limit?: number }) {
    -   if (
    -     params.limit != undefined &&
    -     params.limit != null &&
    -     (params.limit < 1 || params.limit > 5000)
    -   ) {
    -     throw new SupadataError({
    -       error: 'invalid-request',
    -       message: 'Invalid limit for batch operation.',
    -       details: 'The limit must be between 1 and 5000.',
    -     });
    -   }
    - }
    + private validateLimit(params: { limit?: number }, isBatch: boolean = false) {
    +   if (
    +     params.limit != undefined &&
    +     params.limit != null &&
    +     (params.limit < 1 || params.limit > 5000)
    +   ) {
    +     throw new SupadataError({
    +       error: 'invalid-request',
    +       message: isBatch ? 'Invalid limit for batch operation.' : 'Invalid limit.',
    +       details: 'The limit must be between 1 and 5000.',
    +     });
    +   }
    + }
    +
    + // Wrapper method for batch limit validation
    + private validateBatchLimit(params: { limit?: number }) {
    +   this.validateLimit(params, true);
    + }
Despite the duplication, having a separate method for batch validation with specific error messages is fine if you prefer the clarity of purpose.
📜 Review details
📒 Files selected for processing (1)
src/services/youtube.ts (3 hunks)
🔇 Additional comments (5)
src/services/youtube.ts (5)
6-12: Good job adding the new type imports for batch operations.
These new type imports align well with the batch functionality implementation. The naming is clear and follows the existing pattern.
126-154: The improved JSDoc comments for channel methods provide better clarity.
The updated documentation for channel operations is clear and comprehensive, which improves code maintainability.
156-183: The improved JSDoc comments for playlist methods enhance code documentation.
The updated documentation for playlist operations follows the same pattern as other methods, maintaining consistency throughout the codebase.
185-205: Well-structured batch operations implementation.
The new batch object with the getBatchResults method is a logical addition for managing batch operations. The input validation and error handling are implemented properly, and the JSDoc documentation is clear and comprehensive.
207-218: Good refactoring of the translate method.
Converting the translate method to an arrow function style makes it consistent with other methods in the class. The improved JSDoc provides better clarity while the functionality remains unchanged.
Fixes: #9
/claim #9