[FEATURE][PLUGIN]: Create HTML to Markdown plugin #997

@crivetimihai

Overview

Create an HTML-to-Markdown plugin that converts HTML resource content to clean Markdown, so that fetched resources are easier to read and to process downstream.

Plugin Requirements

Plugin Details

  • Name: HtmlToMarkdownPlugin
  • Type: Self-contained (native) plugin
  • File Location: plugins/html_to_markdown/
  • Complexity: Low-Medium

Functionality

  • Convert HTML content to clean Markdown format
  • Preserve semantic structure and formatting
  • Handle tables, lists, links, and code blocks
  • Configurable conversion options
  • Support for custom HTML elements
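
To make the scope above concrete, here is a minimal sketch of the conversion step using only Python's standard-library `html.parser`. The class name `MiniMarkdownConverter` and the function `html_to_markdown` are hypothetical and are not the plugin's real API. The sketch covers headings, emphasis, inline code, links, and flat lists. It drops scripts, styles, and comments. It does not handle tables, nested lists, or custom elements.

```python
import re
from html.parser import HTMLParser

# Prefixes and wrappers mirroring the element_mapping idea in this issue.
HEADINGS = {f"h{n}": "#" * n + " " for n in range(1, 7)}
INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}
SKIPPED = {"script", "style"}  # content of these tags is dropped entirely


class MiniMarkdownConverter(HTMLParser):
    """Hypothetical, deliberately small converter; not the plugin's real class."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # output fragments, joined at the end
        self.skip_depth = 0    # > 0 while inside <script> or <style>
        self.href_stack = []   # hrefs of currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.parts.append("\n\n" + HEADINGS[tag])
        elif tag in INLINE:
            self.parts.append(INLINE[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            # Ordered lists are rendered as bullets in this sketch.
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href_stack.append(dict(attrs).get("href") or "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in INLINE:
            self.parts.append(INLINE[tag])
        elif tag == "a" and self.href_stack:
            self.parts.append(f"]({self.href_stack.pop()})")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs to a single space.
            self.parts.append(re.sub(r"\s+", " ", data))

    # handle_comment is not overridden, so HTML comments are discarded.


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdownConverter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Trim trailing spaces per line and cap blank-line runs at one.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

For example, `html_to_markdown('<h1>Title</h1><p>Hello <strong>world</strong></p>')` returns `# Title`, a blank line, then `Hello **world**`. A real implementation would more likely build on an existing conversion library, which is a design decision this issue leaves open.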

Hook Integration

  • Primary Hooks: resource_post_fetch
  • Purpose: Transform HTML resources into Markdown for better AI processing
  • Behavior: Convert HTML content to Markdown after resource fetch
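
The hook behavior could look roughly like the sketch below. The gateway's actual plugin base class and payload types are not shown in this issue, so `FetchedResource` and the `convert` callable are hypothetical stand-ins. Two further assumptions: the converted resource is relabeled as `text/markdown`, and a conversion failure passes the original resource through unchanged, which is one reading of `mode: "permissive"`.

```python
from dataclasses import dataclass, replace
from typing import Callable

# MIME types taken from the `conditions` entry in this issue's configuration.
HTML_MIME_TYPES = {"text/html", "application/xhtml+xml"}


@dataclass(frozen=True)
class FetchedResource:
    """Hypothetical stand-in for whatever payload the real hook receives."""
    uri: str
    mime_type: str
    text: str


def resource_post_fetch(
    resource: FetchedResource, convert: Callable[[str], str]
) -> FetchedResource:
    """Return a Markdown version of HTML resources; pass others through unchanged."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    base_type = resource.mime_type.split(";", 1)[0].strip().lower()
    if base_type not in HTML_MIME_TYPES:
        return resource
    try:
        markdown = convert(resource.text)
    except Exception:
        # Assumption: a failed conversion should not block the resource.
        return resource
    # Assumption: the plugin relabels the content type after conversion.
    return replace(resource, mime_type="text/markdown", text=markdown)
```

Passing the converter in as a callable keeps the MIME-type gating testable without any HTML parsing.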

Configuration Schema

plugins:
  - name: "HtmlToMarkdown"
    kind: "plugins.html_to_markdown.converter.HtmlToMarkdownPlugin"
    description: "Convert HTML resource content to Markdown"
    version: "0.1.0"
    hooks: ["resource_post_fetch"]
    mode: "permissive"
    priority: 5
    conditions:
      - mime_types: ["text/html", "application/xhtml+xml"]
    config:
      # Conversion settings
      conversion:
        preserve_whitespace: false
        convert_links: true
        convert_images: true
        convert_tables: true
        convert_lists: true
        convert_code_blocks: true
        strip_comments: true
        strip_scripts: true
        strip_styles: true
      
      # Element handling
      element_mapping:
        h1: "# "
        h2: "## "
        h3: "### "
        h4: "#### "
        h5: "##### "
        h6: "###### "
        strong: "**"
        em: "*"
        code: "`"
        blockquote: "> "
      
      # Custom element handlers
      custom_elements:
        - tag: "div"
          class: "code-block"
          convert_to: "```"
        - tag: "span"
          class: "highlight"
          convert_to: "=="
      
      # Link handling
      link_processing:
        convert_relative_urls: true
        base_url: ""
        preserve_anchors: true
        convert_mailto: true
      
      # Table conversion
      table_options:
        include_headers: true
        align_columns: true
        max_column_width: 50
        handle_colspan: true
        handle_rowspan: false
      
      # Output formatting
      output_format:
        line_breaks: "lf"
        max_line_length: 80
        indent_code_blocks: true
        normalize_whitespace: true
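
As an illustration of how the `table_options` keys might be interpreted, the sketch below renders already-extracted rows as a Markdown pipe table. `render_table` is a hypothetical helper, and its handling of `include_headers`, `align_columns`, and `max_column_width` is an assumption about the intended semantics. It does not cover `handle_colspan` or `handle_rowspan`.

```python
def render_table(rows, include_headers=True, align_columns=True, max_column_width=50):
    """Render rows (lists of cell strings) as a Markdown pipe table.

    Hypothetical helper: the first row is treated as the header when
    include_headers is true; otherwise an empty header row is emitted.
    """
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def clean(cell):
        # Collapse whitespace, truncate long text, then escape pipes.
        cell = " ".join(str(cell).split())
        if len(cell) > max_column_width:
            cell = cell[: max(0, max_column_width - 3)] + "..."
        return cell.replace("|", "\\|")

    # Pad short rows so every row has the same number of cells.
    grid = [[clean(c) for c in r] + [""] * (width - len(r)) for r in rows]
    header = grid[0] if include_headers else [""] * width
    body = grid[1:] if include_headers else grid

    if align_columns:
        # Each column is as wide as its longest cell, with a minimum of three.
        col_w = [max(3, *(len(r[i]) for r in [header] + body)) for i in range(width)]
    else:
        col_w = [3] * width

    def line(cells):
        if align_columns:
            cells = [c.ljust(w) for c, w in zip(cells, col_w)]
        return "| " + " | ".join(cells) + " |"

    out = [line(header), line(["-" * w for w in col_w])]
    out.extend(line(r) for r in body)
    return "\n".join(out)
```

Truncation is applied to the cell text before pipe escaping, so a cell containing `|` can end up slightly wider than `max_column_width`.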

Acceptance Criteria

  • Plugin implements HtmlToMarkdownPlugin class
  • Converts HTML to clean Markdown format
  • Preserves semantic structure and formatting
  • Handles tables, lists, links, and code blocks
  • Configurable conversion options
  • Custom element mapping support
  • Plugin manifest and documentation created
  • Unit tests with >85% coverage
  • Integration tests with real HTML content

Priority

Medium - Content processing feature
