kreuzberg-dev/html-to-markdown

html-to-markdown

High-performance HTML to Markdown conversion powered by Rust. Ships as native bindings for Rust, Python, TypeScript/Node.js, Ruby, PHP, Go, Java, C#, Elixir, R, C (FFI), and WebAssembly with identical rendering across all runtimes.

Documentation | Live Demo | API Reference

Highlights

  • 150-280 MB/s throughput (10-80x faster than pure Python alternatives)
  • 12 language bindings with consistent output across all runtimes
  • Metadata extraction — title, headers, links, images, structured data (JSON-LD, Microdata, RDFa)
  • Visitor pattern — custom callbacks for content filtering, URL rewriting, domain-specific dialects
  • Table extraction — extract structured table data (cells, headers, rendered markdown) during conversion
  • Secure by default — built-in HTML sanitization via ammonia
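A throughput figure such as the one in the first highlight can be sanity-checked locally with a small timing harness. The sketch below is illustrative only: `placeholder_convert` is a hypothetical stand-in that merely strips tags and is not the html-to-markdown API, and the harness assumes 1 MB = 1,000,000 bytes. Swap in the real converter callable for your binding to get a meaningful number.

```python
import re
import time
from typing import Callable


def placeholder_convert(html: str) -> str:
    """Hypothetical stand-in converter: strips tags only.

    This is NOT the html-to-markdown API; replace it with the real
    conversion function from whichever binding you are measuring.
    """
    return re.sub(r"<[^>]+>", "", html)


def throughput_mb_per_s(
    convert: Callable[[str], str], html: str, repeats: int = 20
) -> float:
    """Return input megabytes processed per second of wall-clock time.

    Assumes 1 MB = 1,000,000 bytes and measures the UTF-8 size of the input.
    """
    size_bytes = len(html.encode("utf-8"))
    start = time.perf_counter()
    for _ in range(repeats):
        convert(html)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (size_bytes * repeats) / elapsed / 1_000_000


# A synthetic document of roughly 86 KB; real benchmarks should use real pages.
sample = "<h1>Title</h1><p>Some <b>bold</b> text.</p>" * 2_000
rate = throughput_mb_per_s(placeholder_convert, sample)
print(f"{rate:.1f} MB/s")
```

Results depend heavily on the input document, the machine, and the number of repeats, so treat a single run as a rough indication rather than a benchmark.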

Quick Start

# Rust
cargo add html-to-markdown-rs

# Python
pip install html-to-markdown

# TypeScript / Node.js
npm install @kreuzberg/html-to-markdown-node

# Ruby
gem install html-to-markdown

# CLI
cargo install html-to-markdown-cli
# or
brew install kreuzberg-dev/tap/html-to-markdown

See the Installation Guide for all languages including PHP, Go, Java, C#, Elixir, R, and WASM.

Part of the Kreuzberg Ecosystem

html-to-markdown is developed by kreuzberg.dev and powers the HTML conversion pipeline in Kreuzberg, a document intelligence library for extracting text from PDFs, images, and office documents.

Contributing

Contributions welcome! See CONTRIBUTING.md for setup instructions and guidelines.

License

MIT License — see LICENSE for details.

About

High-performance, CommonMark-compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.
