
Add GitHub blob URL code snippet expansion #142

Merged
simonw merged 1 commit into main from claude/github-url-code-snippets-013bWK7HU7o9T8N7xhQMZEi8
Dec 5, 2025

Conversation

@simonw
Owner

@simonw simonw commented Dec 5, 2025

When converting GitHub issues to markdown, detect blob URLs with line
numbers (e.g., #L10-L20) and automatically fetch the referenced code
from raw.githubusercontent.com. The code is inserted as a fenced code
block with appropriate language syntax highlighting based on the file
extension.


Update GitHub-issue-to-markdown such that when it notices a URL like this one:

https://github.com/datasette/datasette-public/blob/5213c41521821c03688c6099581e198a831f85d5/tests/test_public.py#L293-L306

It attempts to fetch the specified lines of code from the appropriate raw.githubusercontent.com URL and adds them to the document directly below that URL, in a fenced code block using a tag derived from the file extension (or no tag).
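
As a rough illustration of the URL handling described above, here is a minimal sketch that splits a blob URL into its parts and builds the matching raw.githubusercontent.com URL. The function name `parseBlobUrlSketch`, the regex, and the returned object shape are assumptions made for this example; they are not taken from the merged code. The sketch also assumes the ref contains no slashes, which holds for commit hashes but not for every branch name.

```javascript
// Hypothetical sketch; not the parseGitHubBlobUrl from the pull request.
// Capture groups: 1 owner, 2 repo, 3 ref, 4 file path, 5 start line,
// 6 optional end line.
const BLOB_URL_RE =
  /^https:\/\/github\.com\/([^\/]+)\/([^\/]+)\/blob\/([^\/]+)\/([^#?]+)#L(\d+)(?:-L(\d+))?$/;

function parseBlobUrlSketch(url) {
  const match = BLOB_URL_RE.exec(url);
  if (!match) return null; // not a blob URL with a line anchor
  const [, owner, repo, ref, path, start, end] = match;
  const startLine = Number(start);
  // A single-line anchor such as #L10 is treated as the range 10-10.
  const endLine = end === undefined ? startLine : Number(end);
  return {
    owner,
    repo,
    ref,
    path,
    startLine,
    endLine,
    rawUrl: `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${path}`,
  };
}
```

For the example URL above, this would give a `rawUrl` pointing at `tests/test_public.py` at the same commit, with `startLine` 293 and `endLine` 306.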

@cloudflare-workers-and-pages

Deploying tools with Cloudflare Pages

Latest commit: 2a87e56
Status: ✅  Deploy successful!
Preview URL: https://c8e89aed.tools-b1q.pages.dev
Branch Preview URL: https://claude-github-url-code-snipp.tools-b1q.pages.dev


@claude
Contributor

claude bot commented Dec 5, 2025

Claude finished @simonw's task


Code Review Complete ✅

I've thoroughly reviewed the GitHub blob URL code snippet expansion feature. Here's my comprehensive feedback:

Todo List:

  • Read and analyze the changed file
  • Review code quality and best practices
  • Check for potential bugs and issues
  • Evaluate performance considerations
  • Identify security concerns
  • Assess test coverage needs
  • Provide comprehensive feedback

Overall Assessment

This is a well-implemented feature that adds valuable functionality to automatically expand GitHub blob URLs with line numbers into formatted code blocks. The implementation is clean and follows good JavaScript practices.

Code Quality & Best Practices ✅

Strengths:

  • Clean, readable code with good separation of concerns
  • Consistent naming conventions and code style
  • Good error handling with user-friendly messages
  • Proper async/await usage throughout
  • Comprehensive file extension mapping (github-issue-to-markdown.html:277-341)
  • Efficient parallel fetching of multiple code snippets (github-issue-to-markdown.html:379-389)

Minor Improvements:

  • Consider extracting the large langMap object to a separate constant for better maintainability
  • The regex pattern in parseGitHubBlobUrl (github-issue-to-markdown.html:258-260) could benefit from a comment explaining the capture groups
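
One possible shape for the extracted mapping suggested above is a pair of module-level lookup tables plus a small lookup function. This is only a sketch: the names `LANGUAGE_BY_EXTENSION`, `LANGUAGE_BY_FILENAME`, and `languageFromPath` are invented for this example, and the handful of entries stands in for the much larger mapping in the actual file.

```javascript
// Hypothetical, heavily shortened tables; the real mapping has many more entries.
const LANGUAGE_BY_EXTENSION = new Map([
  ["py", "python"],
  ["js", "javascript"],
  ["ts", "typescript"],
  ["rs", "rust"],
  ["md", "markdown"],
]);

// Files that are recognized by name rather than by extension.
const LANGUAGE_BY_FILENAME = new Map([
  ["dockerfile", "dockerfile"],
  ["makefile", "makefile"],
]);

function languageFromPath(path) {
  const filename = path.split("/").pop().toLowerCase();
  if (LANGUAGE_BY_FILENAME.has(filename)) {
    return LANGUAGE_BY_FILENAME.get(filename);
  }
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return ""; // no extension, or a dotfile such as .gitignore
  return LANGUAGE_BY_EXTENSION.get(filename.slice(dot + 1)) || "";
}
```

Using `Map` rather than a plain object avoids accidental hits on inherited keys such as `constructor`, and returning an empty string leaves the fence untagged for unknown file types.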

Potential Bugs & Issues ⚠️

  1. Line number edge case: In fetchCodeSnippet (github-issue-to-markdown.html:359-360), if startLine is greater than the file's total lines, this could cause issues. Consider adding bounds checking.

  2. Regex replacement logic: The pattern in expandCodeUrls (github-issue-to-markdown.html:398) uses a negative lookahead to avoid double-expansion, but it might miss some edge cases where there's whitespace between the URL and existing code blocks.

  3. Network error handling: While basic error handling exists, network timeouts or partial failures could leave some URLs unexpanded without clear indication to the user.
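
The bounds checking mentioned in item 1 could look something like the following sketch. `sliceLines` is a hypothetical helper, not a function from the pull request, and the policy it applies (return `null` when the start line is past the end of the file, clamp the end line otherwise) is just one reasonable choice.

```javascript
// Hypothetical helper: extract 1-based, inclusive line ranges with bounds checks.
function sliceLines(text, startLine, endLine = startLine) {
  // Drop one trailing newline so it does not count as an extra empty line.
  const lines = text.replace(/\r?\n$/, "").split(/\r?\n/);
  if (!Number.isInteger(startLine) || startLine < 1 || startLine > lines.length) {
    return null; // requested range starts outside the file
  }
  // Clamp the end so that it is never before the start or past the last line.
  const lastLine = Math.min(Math.max(endLine, startLine), lines.length);
  return lines.slice(startLine - 1, lastLine).join("\n");
}
```

Returning `null` for an out-of-range start lets the caller skip the expansion and leave the URL as it was, rather than inserting an empty code block.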

Performance Considerations ⚡

Good practices:

  • Parallel fetching of multiple snippets prevents sequential blocking
  • Deduplication of URLs using Set (github-issue-to-markdown.html:373)
  • Efficient string operations with proper regex usage

Potential optimizations:

  • For very large files, fetching the entire content and slicing might be inefficient. Consider using GitHub's API with byte ranges if available
  • The regex replacement loop (github-issue-to-markdown.html:393-401) processes each URL individually - could be optimized for many URLs
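
To make the deduplication and parallel fetching concrete, and to show one way of reporting partial failures, here is a small sketch. `fetchAllSnippets` and its injected `fetchOne` callback are assumptions made for this example; the pull request's own code may be organized differently.

```javascript
// Hypothetical helper: fetch each distinct URL once, all in parallel.
// `fetchOne` is injected so the sketch can run without network access;
// it should resolve to the snippet text, or to null when nothing was found.
async function fetchAllSnippets(urls, fetchOne) {
  const unique = [...new Set(urls)];
  // allSettled waits for every request, so one failure does not hide the rest.
  const settled = await Promise.allSettled(
    unique.map(async (url) => fetchOne(url))
  );
  const snippets = new Map();
  const failures = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled" && result.value != null) {
      snippets.set(unique[i], result.value);
    } else {
      failures.push(unique[i]);
    }
  });
  return { snippets, failures };
}
```

In a browser, `fetchOne` could be something like `(url) => fetch(url).then((r) => (r.ok ? r.text() : null))`. The `failures` list gives the caller enough information to tell the user which URLs were left unexpanded.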

Security Concerns 🔒

Well-handled:

  • Proper input validation in parseGitHubBlobUrl with strict regex matching
  • Uses raw.githubusercontent.com which is safe for content fetching
  • GitHub token handling follows good practices with localStorage

Areas of attention:

  • XSS Prevention: The code directly inserts fetched content into markdown without sanitization. While this is likely safe since it's wrapped in code blocks, consider if any edge cases with malicious content in repositories could be problematic
  • SSRF consideration: The feature fetches from raw.githubusercontent.com, which is safe, but the URL parsing should remain strict to prevent redirection attacks
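
One concrete edge case behind the first point: if the fetched file itself contains a run of three backticks (a Markdown file, for instance), a fixed three-backtick fence would be closed early and the rest of the file would be rendered as ordinary Markdown. A possible mitigation, sketched here with a hypothetical `fencedBlock` helper, is to pick a fence longer than any backtick run in the content.

```javascript
// Hypothetical helper: wrap code in a fence that its own content cannot close.
function fencedBlock(code, lang = "") {
  // Find the longest run of consecutive backticks anywhere in the code.
  const runs = code.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  // Use at least three backticks, and always one more than the longest run.
  const fence = "`".repeat(Math.max(3, longest + 1));
  const body = code.endsWith("\n") ? code : code + "\n";
  return `${fence}${lang}\n${body}${fence}`;
}
```

This relies on the CommonMark rule that a closing fence must be at least as long as the opening one, so shorter backtick runs inside the block are treated as content.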

Test Coverage Assessment 🧪

Missing test coverage areas:

  1. URL parsing edge cases: Test malformed GitHub URLs, different repository formats
  2. Line number boundary conditions: Test files shorter than requested line ranges
  3. Network failure scenarios: Test behavior when raw.githubusercontent.com is unavailable
  4. Large file handling: Test performance with very large files or many URLs
  5. Language detection accuracy: Test the extensive language mapping
  6. Regex replacement logic: Test edge cases around existing code blocks
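
For item 6, the property worth testing is that expansion is idempotent: running it twice should not insert the same snippet twice, even when whitespace separates the URL from an existing code block. The sketch below shows one way to get that behavior. `insertSnippetAfterUrl` is a hypothetical function written for this example and is not the `expandCodeUrls` implementation from the pull request.

```javascript
// Hypothetical helper: insert `block` after each occurrence of `url`, unless
// that occurrence is already followed (after optional whitespace) by a fence.
function insertSnippetAfterUrl(markdown, url, block) {
  // Escape regex metacharacters so the URL is matched literally.
  const escaped = url.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const fence = "`".repeat(3);
  const pattern = new RegExp(
    // (?![\w-]) stops #L1-L2 from matching inside a longer anchor like #L1-L20.
    // (?!\s*fence) skips URLs that already have a code block below them.
    `${escaped}(?![\\w-])(?!\\s*${fence})`,
    "g"
  );
  // A replacer function avoids special handling of "$" sequences in the block.
  return markdown.replace(pattern, () => `${url}\n\n${block}`);
}
```

A test along these lines would apply the function twice to the same input and assert that the second pass returns its input unchanged.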

Recommended test cases:

// URL parsing tests
parseGitHubBlobUrl('invalid-url') // should return null
parseGitHubBlobUrl('https://github.com/owner/repo/blob/main/file.py#L999-L1000') // bounds test

// Content fetching tests
fetchCodeSnippet(/* non-existent file */) // should return null
fetchCodeSnippet(/* line numbers beyond file length */) // edge case

// Language detection tests
getLanguageFromPath('file.unknown') // should return empty string
getLanguageFromPath('Dockerfile') // should handle special cases

Recommendations 📋

  1. Add input validation: Validate line number ranges before making API calls
  2. Improve error messaging: Distinguish between "file not found" vs "network error" vs "line range out of bounds"
  3. Consider rate limiting: For issues with many URLs, implement some form of rate limiting to avoid hitting GitHub's API limits
  4. Add loading indicators: For individual URL expansions, consider showing which URLs are being processed
  5. Unit tests: Add comprehensive test coverage for the new functions

Conclusion

This is a solid implementation that adds significant value to the tool. The code is well-structured and handles most edge cases appropriately. The main areas for improvement are around edge case handling, test coverage, and minor security considerations. The feature works as intended and follows good coding practices.


@simonw simonw merged commit 18c5ab3 into main Dec 5, 2025
4 checks passed
@simonw simonw deleted the claude/github-url-code-snippets-013bWK7HU7o9T8N7xhQMZEi8 branch December 5, 2025 22:36

2 participants