html-to-markdown
High-performance HTML to Markdown conversion powered by Rust
Convert HTML to clean, readable Markdown at 150–280 MB/s. A single Rust core powers native bindings for 12 language ecosystems, so every runtime produces identical output.
Key Features
| Feature | Description |
|---|---|
| Blazing Fast | 150–280 MB/s throughput, 10–80× faster than pure-Python alternatives |
| Polyglot | 12 native bindings: Rust, Python, TypeScript, Ruby, PHP, Go, Java, C#, Elixir, R, C, WASM |
| Smart Conversion | Nested tables, code blocks, task lists, hOCR, and complex HTML structures |
| Metadata Extraction (since v2.13.0) | Title, description, headers, links, images, Open Graph, JSON-LD, Microdata |
| Visitor Pattern (since v2.23.0) | Custom callbacks for content filtering, URL rewriting, and domain-specific dialects |
| Secure by Default | Built-in HTML sanitization, powered by ammonia, guards against malicious content |
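
To make the Visitor Pattern row above more concrete, here is a minimal sketch of the general idea in Python. It does not use html-to-markdown's real visitor API: the `Visitor` class, its `visit_link` hook, `RewriteAndFilter`, and `ToyConverter` (a deliberately tiny converter built on the standard library's `html.parser` that handles only `<p>` and `<a>`) are hypothetical names chosen for illustration. The converter walks the HTML and asks the visitor what each link should become, which is enough to show URL rewriting and content filtering.

```python
from html.parser import HTMLParser
from typing import List, Optional


class Visitor:
    """Hypothetical visitor (NOT html-to-markdown's real API).

    Subclasses override hooks to customize how nodes are rendered.
    """

    def visit_link(self, href: str, text: str) -> Optional[str]:
        # Default: render a standard Markdown link.
        # Returning None drops the link and its text entirely.
        return f"[{text}]({href})"


class ToyConverter(HTMLParser):
    """Toy HTML-to-Markdown converter: handles only <p> and <a>."""

    def __init__(self, visitor: Visitor) -> None:
        super().__init__()
        self.visitor = visitor
        self.out: List[str] = []
        self._href: Optional[str] = None  # set while inside an <a> element
        self._link_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Delegate the rendering decision for this link to the visitor.
            rendered = self.visitor.visit_link(self._href, "".join(self._link_text))
            if rendered is not None:
                self.out.append(rendered)
            self._href = None
        elif tag == "p":
            self.out.append("\n\n")  # paragraph break

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)  # collect link text until </a>
        else:
            self.out.append(data)


def convert(html: str, visitor: Visitor) -> str:
    parser = ToyConverter(visitor)
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


class RewriteAndFilter(Visitor):
    """Upgrade http:// links to https:// and unwrap links to a
    hypothetical tracking domain into plain text."""

    def visit_link(self, href: str, text: str) -> Optional[str]:
        if "tracker.example" in href:
            return text  # keep the visible text, drop the link itself
        return super().visit_link(href.replace("http://", "https://", 1), text)


if __name__ == "__main__":
    html = (
        '<p>See <a href="http://example.com/docs">docs</a> and '
        '<a href="http://tracker.example/x">this</a>.</p>'
    )
    print(convert(html, RewriteAndFilter()))
    # prints: See [docs](https://example.com/docs) and this.
```

The design point the pattern captures is that the converter owns traversal while the visitor only decides what each node becomes, so domain-specific rules stay outside the core conversion code.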
Quick Install
Quick Example
Live Demo
Try html-to-markdown directly in your browser; no installation is required. The demo runs entirely client-side using the WebAssembly build.
Part of the Kreuzberg Ecosystem
html-to-markdown powers the HTML conversion pipeline in kreuzberg, a document intelligence library for extracting text and structured data from any document format. If you need to process PDFs, DOCX, images, or other document types, check out kreuzberg, which uses html-to-markdown internally for all of its HTML-to-Markdown conversion.
Explore the Docs
- Installation: Package manager commands for all 12 language bindings
- Quick Start: Get converting in under a minute
- Features: Detailed overview of capabilities
- Configuration: Control heading styles, code fences, list formatting, and more
- Visitor Pattern: Custom callbacks for advanced conversion control
- Metadata Extraction: Extract structured document metadata alongside conversion
- API Reference: Language-specific API documentation
- Contributing: Development setup and contribution guidelines