Add GitHub blob URL code snippet expansion by simonw · Pull Request #142 · simonw/tools

simonw · 2025-12-05T22:34:29Z

When converting GitHub issues to markdown, detect blob URLs with line
numbers (e.g., #L10-L20) and automatically fetch the referenced code
from raw.githubusercontent.com. The code is inserted as a fenced code
block with appropriate language syntax highlighting based on the file
extension.

Update GitHub-issue-to-markdown such that when it notices a URL like this one:

https://github.com/datasette/datasette-public/blob/5213c41521821c03688c6099581e198a831f85d5/tests/test_public.py#L293-L306

it attempts to fetch the specified lines of code from the appropriate raw.githubusercontent.com URL and add them to the document, directly below that URL, in a fenced code block using a tag derived from the file extension (or no tag).
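As a rough illustration of the URL handling described above, the sketch below parses a blob URL and derives the matching raw.githubusercontent.com URL. The function name `parseBlobUrlSketch` and the returned field names are hypothetical and are not taken from the PR. The sketch also assumes the ref (branch, tag, or SHA) contains no slashes, so a branch named like `feature/x` would be parsed incorrectly.

```javascript
// Hypothetical helper for illustration; the PR's own parseGitHubBlobUrl may differ.
function parseBlobUrlSketch(url) {
  // Capture groups: 1 owner, 2 repo, 3 ref (assumed slash-free),
  // 4 file path, 5 start line, 6 optional end line.
  const match = url.match(
    /^https:\/\/github\.com\/([^\/]+)\/([^\/]+)\/blob\/([^\/]+)\/([^#?]+)#L(\d+)(?:-L(\d+))?$/
  );
  if (!match) return null;

  const [, owner, repo, ref, path, start, end] = match;
  const startLine = Number(start);
  // A single-line anchor such as #L10 is treated as the range 10..10.
  const endLine = end ? Number(end) : startLine;

  return {
    owner,
    repo,
    ref,
    path,
    startLine,
    endLine,
    rawUrl: `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${path}`,
  };
}
```

Anchoring the pattern at both ends keeps the parser strict: anything that is not a github.com blob URL with a line fragment yields `null` rather than a partially parsed result.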


cloudflare-workers-and-pages · 2025-12-05T22:34:41Z

Deploying tools with Cloudflare Pages

Latest commit:	`2a87e56`
Status:	✅ Deploy successful!
Preview URL:	https://c8e89aed.tools-b1q.pages.dev
Branch Preview URL:	https://claude-github-url-code-snipp.tools-b1q.pages.dev


claude · 2025-12-05T22:34:42Z

Claude finished @simonw's task

Code Review Complete ✅

I've thoroughly reviewed the GitHub blob URL code snippet expansion feature. Here's my comprehensive feedback:

Todo List:

Read and analyze the changed file
Review code quality and best practices
Check for potential bugs and issues
Evaluate performance considerations
Identify security concerns
Assess test coverage needs
Provide comprehensive feedback

Overall Assessment

This is a well-implemented feature that adds valuable functionality to automatically expand GitHub blob URLs with line numbers into formatted code blocks. The implementation is clean and follows good JavaScript practices.

Code Quality & Best Practices ✅

Strengths:

Clean, readable code with good separation of concerns
Consistent naming conventions and code style
Good error handling with user-friendly messages
Proper async/await usage throughout
Comprehensive file extension mapping (github-issue-to-markdown.html:277-341)
Efficient parallel fetching of multiple code snippets (github-issue-to-markdown.html:379-389)

Minor Improvements:

Consider extracting the large langMap object to a separate constant for better maintainability
The regex pattern in parseGitHubBlobUrl (github-issue-to-markdown.html:258-260) could benefit from a comment explaining the capture groups

Potential Bugs & Issues ⚠️

Line number edge case: In fetchCodeSnippet (github-issue-to-markdown.html:359-360), if startLine is greater than the file's total lines, this could cause issues. Consider adding bounds checking.
Regex replacement logic: The pattern in expandCodeUrls (github-issue-to-markdown.html:398) uses a negative lookahead to avoid double-expansion, but it might miss some edge cases where there's whitespace between the URL and existing code blocks.
Network error handling: While basic error handling exists, network timeouts or partial failures could leave some URLs unexpanded without clear indication to the user.
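One possible shape for the bounds checking suggested above is sketched here. `sliceLinesSketch` is a hypothetical name and does not appear in the PR. The choices to return `null` for an unusable range and to clamp an over-long end line are assumptions, not the tool's documented behavior.

```javascript
// Hypothetical helper: extract 1-indexed, inclusive lines with bounds checks.
function sliceLinesSketch(text, startLine, endLine) {
  const lines = text.split('\n');
  // A trailing newline produces an empty final element; drop it so the
  // line count matches what GitHub shows.
  if (lines.length > 0 && lines[lines.length - 1] === '') lines.pop();

  if (!Number.isInteger(startLine) || !Number.isInteger(endLine)) return null;
  // Reject ranges that start before line 1, run backwards, or start past EOF.
  if (startLine < 1 || endLine < startLine || startLine > lines.length) return null;

  // Clamp the end so a range that overshoots the file still returns
  // the lines that do exist.
  const clampedEnd = Math.min(endLine, lines.length);
  return lines.slice(startLine - 1, clampedEnd).join('\n');
}
```

Returning `null` gives the caller a clear signal to leave the URL unexpanded instead of inserting an empty code block.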

Performance Considerations ⚡

Good practices:

Parallel fetching of multiple snippets prevents sequential blocking
Deduplication of URLs using Set (github-issue-to-markdown.html:373)
Efficient string operations with proper regex usage
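The deduplicate-then-fetch-in-parallel pattern noted above could look roughly like the following. This is a sketch, not the PR's code: `fetchAllSnippetsSketch` and the injected `fetchOne` callback are hypothetical names, and the use of `Promise.allSettled` (so that one failed request does not discard the others) is an assumption about a reasonable design rather than a description of the actual implementation.

```javascript
// Hypothetical helper: fetch each distinct URL once, all in parallel.
// fetchOne is any async function that resolves to the snippet for a URL.
async function fetchAllSnippetsSketch(urls, fetchOne) {
  // A Set removes repeated URLs while preserving first-seen order.
  const unique = [...new Set(urls)];

  // allSettled waits for every request and never rejects as a whole.
  const settled = await Promise.allSettled(unique.map((url) => fetchOne(url)));

  const results = new Map();
  settled.forEach((outcome, index) => {
    // Failed fetches are recorded as null so callers can skip them.
    results.set(unique[index], outcome.status === 'fulfilled' ? outcome.value : null);
  });
  return results;
}
```

Passing the fetcher in as a parameter also makes the function easy to exercise with a stub, without touching the network.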

Potential optimizations:

For very large files, fetching the entire content and slicing might be inefficient. Consider using GitHub's API with byte ranges if available
The regex replacement loop (github-issue-to-markdown.html:393-401) processes each URL individually - could be optimized for many URLs
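For the replacement loop, one alternative is a single `String.prototype.replace` pass with a callback that looks each match up in a map. The sketch below is hypothetical (`expandInOnePassSketch`, `BLOB_URL_PATTERN`, and the `snippets` map are illustrative names) and assumes each URL sits on its own line. It also skips a URL that is already followed by a fence, tolerating whitespace in between.

```javascript
// Hypothetical sketch; the PR's expandCodeUrls is not reproduced here.
// Built with repeat() so this example contains no literal fence.
const FENCE = '`'.repeat(3);

// Loose matcher for blob URLs that carry a #L<start>[-L<end>] fragment.
const BLOB_URL_PATTERN =
  /https:\/\/github\.com\/[^\s)\]]+\/blob\/[^\s)\]]+#L\d+(?:-L\d+)?/g;

// snippets: Map from blob URL to an already fenced code block string.
function expandInOnePassSketch(markdown, snippets) {
  return markdown.replace(BLOB_URL_PATTERN, (url, offset) => {
    const block = snippets.get(url);
    if (!block) return url; // nothing fetched for this URL; leave it alone

    // Skip URLs already followed by a fence, ignoring whitespace between.
    const rest = markdown.slice(offset + url.length);
    if (rest.trimStart().startsWith(FENCE)) return url;

    return `${url}\n\n${block}`;
  });
}
```

Because every decision is made against the original string, running the function twice on its own output leaves the text unchanged.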

Security Concerns 🔒

Well-handled:

Proper input validation in parseGitHubBlobUrl with strict regex matching
Uses raw.githubusercontent.com which is safe for content fetching
GitHub token handling follows good practices with localStorage

Areas of attention:

XSS Prevention: The code inserts fetched content into the markdown without sanitization. This is likely safe because the content is wrapped in a code block, but consider whether repository content could break out of that block, for example a line that itself contains a closing code fence.
SSRF consideration: The feature fetches from raw.githubusercontent.com, which is safe, but the URL parsing should remain strict to prevent redirection attacks
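One way to keep fetched content from escaping its code block is to pick a fence longer than any backtick run inside the content, since a closing fence in CommonMark must be at least as long as the opening one. The helpers below are a hypothetical sketch (`fenceForSketch` and `toFencedBlockSketch` are not names from the PR) and assume backtick fences rather than tilde fences.

```javascript
// Hypothetical helper: choose a backtick fence that the content cannot close.
function fenceForSketch(code) {
  // Find every run of consecutive backticks in the content.
  const runs = code.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  // Use at least three backticks, and always one more than the longest run.
  return '`'.repeat(Math.max(3, longest + 1));
}

// Hypothetical helper: wrap code in a fence with an optional language tag.
function toFencedBlockSketch(code, lang = '') {
  const fence = fenceForSketch(code);
  return `${fence}${lang}\n${code}\n${fence}`;
}
```

A snippet that quotes a markdown file with its own three-backtick fences would then be wrapped in four backticks, so the inner fences render as text.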

Test Coverage Assessment 🧪

Missing test coverage areas:

URL parsing edge cases: Test malformed GitHub URLs, different repository formats
Line number boundary conditions: Test files shorter than requested line ranges
Network failure scenarios: Test behavior when raw.githubusercontent.com is unavailable
Large file handling: Test performance with very large files or many URLs
Language detection accuracy: Test the extensive language mapping
Regex replacement logic: Test edge cases around existing code blocks

Recommended test cases:

// URL parsing tests
parseGitHubBlobUrl('invalid-url') // should return null
parseGitHubBlobUrl('https://github.com/owner/repo/blob/main/file.py#L999-L1000') // bounds test

// Content fetching tests
fetchCodeSnippet(/* non-existent file */) // should return null
fetchCodeSnippet(/* line numbers beyond file length */) // edge case

// Language detection tests
getLanguageFromPath('file.unknown') // should return empty string
getLanguageFromPath('Dockerfile') // should handle special cases
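
To make the language detection cases above concrete, here is a minimal sketch of a lookup that handles both extensions and extensionless special files. `languageFromPathSketch` and both maps are hypothetical stand-ins. The PR's own langMap is much larger and its exact entries and return values are not reproduced here.

```javascript
// Hypothetical, deliberately tiny tables; the real mapping is far larger.
const LANG_BY_EXTENSION = new Map([
  ['py', 'python'],
  ['js', 'javascript'],
  ['ts', 'typescript'],
  ['rs', 'rust'],
  ['go', 'go'],
]);
const LANG_BY_FILENAME = new Map([
  ['dockerfile', 'dockerfile'],
  ['makefile', 'makefile'],
]);

// Hypothetical helper: return a fence tag for a path, or '' when unknown.
function languageFromPathSketch(path) {
  const filename = path.split('/').pop().toLowerCase();

  // Whole-filename matches cover files such as Dockerfile.
  if (LANG_BY_FILENAME.has(filename)) return LANG_BY_FILENAME.get(filename);

  const dot = filename.lastIndexOf('.');
  // No dot, or a leading dot only (e.g. .gitignore): no usable extension.
  if (dot <= 0) return '';

  return LANG_BY_EXTENSION.get(filename.slice(dot + 1)) || '';
}
```

Using `Map` instead of a plain object avoids accidental hits on inherited keys such as `constructor`.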

Recommendations 📋

Add input validation: Validate line number ranges before making API calls
Improve error messaging: Distinguish between "file not found" vs "network error" vs "line range out of bounds"
Consider rate limiting: For issues with many URLs, implement some form of rate limiting to avoid hitting GitHub's API limits
Add loading indicators: For individual URL expansions, consider showing which URLs are being processed
Unit tests: Add comprehensive test coverage for the new functions
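The error-messaging recommendation could be approached by returning a reason code alongside the result, as in the sketch below. Everything here is hypothetical: the function name `fetchSnippetWithReasonSketch`, the reason strings, and the injectable `fetchImpl` parameter are illustrative assumptions rather than the tool's actual API.

```javascript
// Hypothetical helper: fetch a snippet and say why it failed, if it did.
// fetchImpl defaults to the global fetch but can be replaced by a stub.
async function fetchSnippetWithReasonSketch(rawUrl, startLine, endLine, fetchImpl = fetch) {
  let response;
  try {
    response = await fetchImpl(rawUrl);
  } catch (err) {
    // fetch rejects on network-level failures (DNS, offline, CORS, ...).
    return { ok: false, reason: 'network-error' };
  }

  if (response.status === 404) return { ok: false, reason: 'file-not-found' };
  if (!response.ok) return { ok: false, reason: `http-${response.status}` };

  const lines = (await response.text()).split('\n');
  // Ignore the empty element produced by a trailing newline.
  if (lines[lines.length - 1] === '') lines.pop();

  if (startLine < 1 || endLine < startLine || startLine > lines.length) {
    return { ok: false, reason: 'line-range-out-of-bounds' };
  }
  // slice() already stops at the end of the array if endLine overshoots.
  return { ok: true, code: lines.slice(startLine - 1, endLine).join('\n') };
}
```

The caller can then map each reason to a distinct user-facing message instead of silently leaving the URL unexpanded.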

Conclusion

This is a solid implementation that adds significant value to the tool. The code is well-structured and handles most edge cases appropriately. The main areas for improvement are around edge case handling, test coverage, and minor security considerations. The feature works as intended and follows good coding practices.

simonw merged commit 18c5ab3 into main Dec 5, 2025
4 checks passed

simonw deleted the claude/github-url-code-snippets-013bWK7HU7o9T8N7xhQMZEi8 branch December 5, 2025 22:36
