Add to_text() and to_markdown() export features to JustHTML playground #162

Merged
simonw merged 1 commit into main from claude/add-export-methods-M1XJB
Dec 14, 2025
simonw (Owner) commented Dec 14, 2025

  • Add two new playground modes: "Extract Text" and "To Markdown"
  • Both modes support optional CSS selector to target specific elements
  • Leave selector empty to convert the whole document
  • Uses justhtml's built-in to_text() and to_markdown() methods
  • Shows match count when using selectors, "Whole document" otherwise

Add to_text() and to_markdown() features to justhtml.html - for the whole document or for CSS selectors against it

Consult a fresh clone of the justhtml Python library (in /tmp) if you need to

claude bot (Contributor) commented Dec 14, 2025

Claude finished @simonw's task

Code Review Complete ✅

I've completed a thorough review of the pull request that adds text extraction and markdown conversion features to the JustHTML playground. Here are my findings:

Todo List:

  • Read and analyze the modified justhtml.html file
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage
  • Provide comprehensive feedback

Code Quality & Best Practices ✅

Strengths:

  • Consistent Architecture: New features follow existing patterns perfectly - same CSS classes, naming conventions, and JavaScript structure
  • Clean Implementation: Functions extractText() and convertToMarkdown() mirror existing functions with proper error handling
  • UI Consistency: New mode buttons integrate seamlessly with existing UI design
  • Code Organization: Python helper functions are logically organized and well-documented

Minor Suggestions:

  • Consider extracting the common string escaping logic (html.replace(/\\/g, '\\\\').replace(/"""/g, '\\"\\"\\"')) into a helper function since it's used in multiple places
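A minimal sketch of what such a helper could look like. The two replace() calls are copied from the snippet above; the function names escapeForPythonTripleQuoted and buildPythonLiteral are hypothetical and do not appear in justhtml.html.

```javascript
// Hypothetical helper name; the replace() chain mirrors the snippet quoted above.
function escapeForPythonTripleQuoted(text) {
  return text
    .replace(/\\/g, '\\\\')         // double every backslash first
    .replace(/"""/g, '\\"\\"\\"');  // then break up any embedded triple quote
}

// Hypothetical usage: wrap the escaped text in a Python triple-quoted literal.
function buildPythonLiteral(text) {
  return '"""' + escapeForPythonTripleQuoted(text) + '"""';
}
```

Backslashes are doubled before the quotes are handled so that the backslashes inserted in the second step are not doubled again. Note that this pattern only targets runs of three quotes; input ending in a single double quote would still sit directly against the closing delimiter, so escaping every double quote would be a stricter variant.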

Potential Bugs & Issues ✅

No Critical Issues Found

The implementation looks solid. It handles the following correctly:

  • Input validation (empty selector handling)
  • Error handling in try-catch blocks
  • Escaping of user input before Python execution
  • Consistent match count display logic

Performance Considerations ✅

Well Optimized:

  • Efficient DOM queries using existing cached elements (textSelectorInputEl, markdownSelectorInputEl)
  • Minimal DOM manipulation
  • Reuses existing Pyodide instance
  • Proper event listener setup without memory leaks

Security Assessment ✅

No Security Concerns:

  • Input Sanitization: User input is properly escaped before being passed to Python (justhtml.html:858, 902)
  • No XSS Risks: Text content is safely inserted using textContent, not innerHTML
  • CORS Handling: Existing URL fetching already includes appropriate CORS messaging
  • CSP Safe: No inline event handlers or unsafe evaluations

Test Coverage Assessment ⚠️

Missing Test Coverage:

  • No automated tests found for the new functionality
  • Recommendation: Consider adding tests for:
    • Text extraction with/without selectors
    • Markdown conversion with/without selectors
    • Error handling for invalid selectors
    • Match count display logic
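
As one possible starting point, the match count display logic could be tested as a pure function. This is only a sketch under assumptions: describeMatches is a hypothetical name, the "N matches" wording is assumed, and treating a whitespace-only selector as empty is also an assumption. Only the "Whole document" label comes from the pull request description.

```javascript
// Hypothetical pure function; the real playground may build this label inline.
function describeMatches(selector, matchCount) {
  // Assumption: a blank or whitespace-only selector means "convert everything".
  if (selector.trim() === '') {
    return 'Whole document';
  }
  // Assumption: the exact wording of the match count label.
  return matchCount === 1 ? '1 match' : `${matchCount} matches`;
}

// Table-driven checks covering the with-selector and without-selector paths.
const cases = [
  { selector: '', count: 0, expected: 'Whole document' },
  { selector: '   ', count: 0, expected: 'Whole document' },
  { selector: 'p', count: 1, expected: '1 match' },
  { selector: 'article p', count: 3, expected: '3 matches' },
];

for (const { selector, count, expected } of cases) {
  const actual = describeMatches(selector, count);
  if (actual !== expected) {
    throw new Error(
      `describeMatches(${JSON.stringify(selector)}, ${count}) returned ${actual}`
    );
  }
}
```

Keeping the label logic free of DOM access lets checks like these run under plain Node without loading Pyodide or a browser.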

Additional Observations

Excellent User Experience:

  • Clear placeholder text explaining optional selector usage
  • Intuitive "Whole document" vs match count display
  • Consistent keyboard shortcuts (Enter to execute)
  • Proper error messaging

Code References:

  • New mode buttons: justhtml.html:433-434
  • Text extraction: justhtml.html:645-669
  • Markdown conversion: justhtml.html:671-695
  • Event handlers: justhtml.html:980-981, 988-993

Summary

This is a high-quality implementation that adds useful functionality while staying consistent with the existing code and its security practices. The main suggestion is to add automated tests; extracting the shared escaping logic into a helper is an optional cleanup. The code itself is production-ready.

Recommendation: ✅ Approve with minor suggestion for future test additions

simonw merged commit 172cd5a into main Dec 14, 2025
4 checks passed
simonw deleted the claude/add-export-methods-M1XJB branch December 14, 2025 08:53

simonw (Owner, Author) commented Dec 14, 2025

Now live at https://tools.simonwillison.net/justhtml
