[FEATURE][PLUGIN]: Create HTML to Markdown plugin #997
Overview
Create an HTML to Markdown Plugin that converts HTML resource content to clean Markdown format for improved readability and processing.
Plugin Requirements
Plugin Details
- Name: HtmlToMarkdownPlugin
- Type: Self-contained (native) plugin
- File Location: plugins/html_to_markdown/
- Complexity: Low-Medium
Functionality
- Convert HTML content to clean Markdown format
- Preserve semantic structure and formatting
- Handle tables, lists, links, and code blocks
- Configurable conversion options
- Support for custom HTML elements
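As a rough illustration of the conversion rules listed above, the following sketch uses only Python's standard-library `html.parser`. The issue does not name a parsing library, so that choice, the `MarkdownConverter` class and the `html_to_markdown()` helper are all hypothetical. It covers headings, bold/italic/inline code, links (resolved against an optional base URL) and nested lists, and it drops `<script>`/`<style>` content and HTML comments. `<pre>` blocks, images, blockquotes, tables and custom elements are deliberately left out to keep it short.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Defaults mirroring the proposed element_mapping (plus <b>/<i>), hard-coded for brevity.
HEADING_PREFIX = {f"h{i}": "#" * i + " " for i in range(1, 7)}
INLINE_MARKS = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}
SKIP_TAGS = {"script", "style"}  # strip_scripts / strip_styles


class MarkdownConverter(HTMLParser):
    """Hypothetical minimal HTML-to-Markdown converter (not the plugin's real code)."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()  # convert_charrefs=True decodes &amp; etc. for us
        self.base_url = base_url
        self.out: list[str] = []
        self.skip_depth = 0               # > 0 while inside <script>/<style>
        self.list_stack: list[str] = []   # open "ul"/"ol" elements
        self.ol_counters: list[int] = []  # last item number per open list
        self.item_widths: list[int] = []  # marker width of the current item per list
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADING_PREFIX:
            self.out.append("\n\n" + HEADING_PREFIX[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in INLINE_MARKS:
            self.out.append(INLINE_MARKS[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        elif tag in ("ul", "ol"):
            if not self.list_stack:
                self.out.append("\n")
            self.list_stack.append(tag)
            self.ol_counters.append(0)
            self.item_widths.append(0)
        elif tag == "li" and self.list_stack:
            if self.list_stack[-1] == "ol":
                self.ol_counters[-1] += 1
                marker = f"{self.ol_counters[-1]}. "
            else:
                marker = "- "
            self.item_widths[-1] = len(marker)
            # Nest under the parent item's content column (CommonMark list rule).
            indent = " " * sum(self.item_widths[:-1])
            self.out.append("\n" + indent + marker)
        elif tag == "br":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in HEADING_PREFIX or tag == "p":
            self.out.append("\n\n")
        elif tag in INLINE_MARKS:
            self.out.append(INLINE_MARKS[tag])
        elif tag == "a":
            # convert_relative_urls: resolve "/docs" against base_url when one is set.
            url = urljoin(self.base_url, self.href) if self.base_url else self.href
            self.out.append(f"]({url})")
        elif tag in ("ul", "ol") and self.list_stack:
            self.list_stack.pop()
            self.ol_counters.pop()
            self.item_widths.pop()
            if not self.list_stack:
                self.out.append("\n")

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # normalize_whitespace
        if not self.out or self.out[-1].endswith((" ", "\n")):
            text = text.lstrip()
        if text:
            self.out.append(text)

    def convert(self, html: str) -> str:
        self.feed(html)
        self.close()
        lines = [line.rstrip() for line in "".join(self.out).split("\n")]
        text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))  # collapse blank runs
        return text.strip() + "\n"


def html_to_markdown(html: str, base_url: str = "") -> str:
    return MarkdownConverter(base_url).convert(html)
```

Comments need no explicit handling here because `HTMLParser.handle_comment` does nothing by default. A real plugin would presumably read the prefixes from `element_mapping` instead of hard-coding them.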
Hook Integration
- Primary Hooks: resource_post_fetch
- Purpose: Transform HTML resources into Markdown for better AI processing
- Behavior: Convert HTML content to Markdown after resource fetch
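The hook behaviour described above amounts to a MIME-type check followed by a content swap. The sketch below assumes a simplified, hypothetical `FetchedResource` payload and an `on_resource_post_fetch()` function; it does not reproduce the gateway's actual plugin base class or hook signature, which the issue does not spell out. The converter is passed in as a callable so the gating logic can be tested on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable

# Mirrors `conditions: - mime_types` in the proposed configuration.
HTML_MIME_TYPES = frozenset({"text/html", "application/xhtml+xml"})


@dataclass(frozen=True)
class FetchedResource:
    """Hypothetical stand-in for the payload a resource_post_fetch hook receives."""
    uri: str
    mime_type: str
    text: str


def on_resource_post_fetch(
    resource: FetchedResource, convert: Callable[[str], str]
) -> FetchedResource:
    """Return a Markdown copy of HTML resources; pass everything else through."""
    # Drop parameters such as "; charset=utf-8" before matching the MIME type.
    base_type = resource.mime_type.split(";", 1)[0].strip().lower()
    if base_type not in HTML_MIME_TYPES:
        return resource
    try:
        markdown = convert(resource.text)
    except Exception:
        # Assumption: a failed conversion should not break the fetch,
        # so the original HTML is returned unchanged.
        return resource
    # Assumption: the converted resource is relabelled as text/markdown (RFC 7763).
    return replace(resource, mime_type="text/markdown", text=markdown)
```

Whether a conversion error should fall back to the original HTML or be reported is a design decision the issue leaves open; the sketch picks the fallback.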
Configuration Schema
plugins:
  - name: "HtmlToMarkdown"
    kind: "plugins.html_to_markdown.converter.HtmlToMarkdownPlugin"
    description: "Convert HTML resource content to Markdown"
    version: "0.1.0"
    hooks: ["resource_post_fetch"]
    mode: "permissive"
    priority: 5
    conditions:
      - mime_types: ["text/html", "application/xhtml+xml"]
    config:
      # Conversion settings
      conversion:
        preserve_whitespace: false
        convert_links: true
        convert_images: true
        convert_tables: true
        convert_lists: true
        convert_code_blocks: true
        strip_comments: true
        strip_scripts: true
        strip_styles: true
      # Element handling
      element_mapping:
        h1: "# "
        h2: "## "
        h3: "### "
        h4: "#### "
        h5: "##### "
        h6: "###### "
        strong: "**"
        em: "*"
        code: "`"
        blockquote: "> "
      # Custom element handlers
      custom_elements:
        - tag: "div"
          class: "code-block"
          convert_to: "```"
        - tag: "span"
          class: "highlight"
          convert_to: "=="
      # Link handling
      link_processing:
        convert_relative_urls: true
        base_url: ""
        preserve_anchors: true
        convert_mailto: true
      # Table conversion
      table_options:
        include_headers: true
        align_columns: true
        max_column_width: 50
        handle_colspan: true
        handle_rowspan: false
      # Output formatting
      output_format:
        line_breaks: "lf"
        max_line_length: 80
        indent_code_blocks: true
        normalize_whitespace: true
Acceptance Criteria
- Plugin implements HtmlToMarkdownPlugin class
- Converts HTML to clean Markdown format
- Preserves semantic structure and formatting
- Handles tables, lists, links, and code blocks
- Configurable conversion options
- Custom element mapping support
- Plugin manifest and documentation created
- Unit tests with >85% coverage
- Integration tests with real HTML content
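For the table requirement, the sketch below shows one way the proposed `table_options` (`max_column_width`, `align_columns`, `handle_colspan`) might be applied. The helpers `TableCollector`, `render_table()` and `html_table_to_markdown()` are hypothetical. The sketch assumes a single table with explicit `</td>`/`</th>` closing tags, treats the first row as the header (GFM pipe tables require one), and ignores `rowspan`, in line with the proposed `handle_rowspan: false` default.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects cell text row by row from one HTML table (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._cell: list[str] | None = None
        self._span = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []
            span = dict(attrs).get("colspan") or "1"
            self._span = int(span) if span.isdigit() and int(span) > 0 else 1

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self.rows:
                text = " ".join("".join(self._cell).split())
                # handle_colspan: pad with empty cells so later columns stay aligned.
                self.rows[-1].extend([text] + [""] * (self._span - 1))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def _fit(text: str, limit: int) -> str:
    # Truncate to max_column_width, then escape pipes so they don't split the cell.
    if len(text) > limit:
        text = text[: limit - 1] + "…"
    return text.replace("|", "\\|")


def render_table(rows: list[list[str]], max_column_width: int = 50,
                 align_columns: bool = True) -> str:
    rows = [r for r in rows if r]
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    cells = [[_fit(c, max_column_width) for c in r] + [""] * (ncols - len(r)) for r in rows]
    if align_columns:
        widths = [max(3, *(len(r[i]) for r in cells)) for i in range(ncols)]
    else:
        widths = [3] * ncols  # minimum delimiter width only

    def fmt(row: list[str]) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    separator = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([fmt(cells[0]), separator] + [fmt(r) for r in cells[1:]])


def html_table_to_markdown(html: str, **options) -> str:
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return render_table(collector.rows, **options)
```

Truncating with an ellipsis is only one reading of `max_column_width`; wrapping long cell text would be another.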
Priority
Medium - Content processing feature