Remarkpy is a Python library that provides an interface to JavaScript Markdown processing, allowing you to parse Markdown text into an Abstract Syntax Tree (AST).
It runs an existing JavaScript Markdown library (md-mdast) inside your Python process using the quickjs engine, which gives Python developers access to JavaScript-based Markdown tooling without a separate Node.js runtime.
Remarkpy takes a string of Markdown text as input and returns the AST, a structured representation of that Markdown. The AST can then be used for purposes such as:
- Analyzing Markdown content.
- Transforming Markdown into other formats (though direct HTML conversion is not yet implemented in this version).
- Building tools that work with Markdown structure.
Currently, Remarkpy focuses on the parsing aspect, utilizing the md-mdast JavaScript library to generate the AST.
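
As an illustration of the "analyzing Markdown content" use case, the sketch below walks an AST and counts node types. The sample tree is hand-written in the general mdast/unist shape (dicts with a type key, an optional children list, and a value on text nodes); it is an assumption about the structure, not captured Remarkpy output, and the helper names are made up for this example.

```python
from collections import Counter

# Hand-written sample in the general mdast/unist shape.
# This is an assumption for illustration, not real Remarkpy output.
sample_ast = {
    "type": "root",
    "children": [
        {
            "type": "heading",
            "depth": 1,
            "children": [{"type": "text", "value": "My Document"}],
        },
        {
            "type": "paragraph",
            "children": [
                {"type": "text", "value": "Some "},
                {"type": "strong", "children": [{"type": "text", "value": "bold"}]},
                {"type": "text", "value": " text."},
            ],
        },
    ],
}


def iter_nodes(node):
    """Yield the node itself and then all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def count_node_types(ast):
    """Count how often each node type occurs in the tree."""
    return Counter(node["type"] for node in iter_nodes(ast))


def plain_text(node):
    """Concatenate the values of all text nodes below (and including) node."""
    return "".join(
        n.get("value", "") for n in iter_nodes(node) if n["type"] == "text"
    )


counts = count_node_types(sample_ast)
print(dict(counts))
print(plain_text(sample_ast["children"][1]))
```

The same two helpers should work on any tree that follows this shape, so they could be pointed at the dictionary returned by the parser once its exact node layout has been checked.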
Who it is for:
- Python Developers: If you're working on a Python project and need to understand the structure of Markdown documents, Remarkpy provides a straightforward way to do so.
- Tool Builders: Anyone building tools that need to programmatically interact with Markdown content at a structural level.
- Content Analysts: Researchers or developers who need to analyze patterns or extract information from Markdown files.
Why use it:
- Leverage JavaScript Ecosystem: Access robust and well-maintained JavaScript Markdown libraries from within Python.
- Detailed Structural Information: Get more than just HTML; obtain a full AST for fine-grained control and analysis.
- Cross-Language Consistency: Potentially use the same Markdown parsing logic as a JavaScript-based frontend or other tools.
Remarkpy is not yet available on PyPI. To install it, you would typically clone the repository and integrate it into your project, or install it locally using pip once a setup.py or pyproject.toml is provided.
For now, ensure you have Python 3 and the quickjs library installed:

    pip install quickjs

You will also need the remarkpy.js file (the bundled JavaScript code) to be accessible by the remarkpy/core.py module, which it is by default if you are running from the cloned repository.
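
To confirm both prerequisites before using the parser, a small check along these lines may help. It is only a sketch: the function name check_environment is made up for this example, and the bundle location is assumed to be remarkpy/remarkpy.js relative to the repository root, as described in this README.

```python
import importlib.util
from pathlib import Path


def check_environment(repo_root="."):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []

    # The quickjs Python package must be importable.
    if importlib.util.find_spec("quickjs") is None:
        problems.append("quickjs is not installed (try: pip install quickjs)")

    # Assumed bundle location, taken from the layout described in this README.
    bundle = Path(repo_root) / "remarkpy" / "remarkpy.js"
    if not bundle.is_file():
        problems.append(f"bundled JavaScript not found at {bundle}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

Run it from the root of the cloned repository; no output means neither problem was detected.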
The primary way to use Remarkpy is through its RemarkpyParser class.

    import json

    from remarkpy.core import RemarkpyParser, RemarkpyError, JavaScriptError

    # Initialize the parser
    try:
        parser = RemarkpyParser()
    except RemarkpyError as e:
        print(f"Failed to initialize parser: {e}")
        raise SystemExit(1)

    # Markdown text to parse
    markdown_input = """
    # My Document

    This is a paragraph with **bold** and *italic* text.

    - Item 1
    - Item 2
    """

    try:
        # Parse the Markdown
        ast = parser.parse(markdown_input)

        # Print the resulting AST (as a JSON string for readability)
        print("Markdown AST:")
        print(json.dumps(ast, indent=2))
    except JavaScriptError as e:
        print(f"An error occurred in the JavaScript code: {e}")
    except RemarkpyError as e:
        print(f"A Remarkpy error occurred: {e}")
    except TypeError as e:
        print(f"Type error: {e}")

Remarkpy does not have a dedicated CLI tool in this version. However, the remarkpy/core.py script can be run directly. It parses a predefined Markdown string, or a test.md file if one is available in the expected location, and so demonstrates the basic functionality:

    python remarkpy/core.py

This will print the AST of the example Markdown snippets found within core.py.
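
Until a CLI exists, a thin wrapper script can fill the gap. The sketch below is not part of Remarkpy: dump_ast and main are hypothetical names, and the only Remarkpy calls it relies on are RemarkpyParser() and parse(), both shown above. The parser is passed in as an argument so the formatting logic can be exercised with any object that has a parse method.

```python
import json
import sys


def dump_ast(markdown_text, parser):
    """Parse Markdown with the given parser and return pretty-printed JSON."""
    ast = parser.parse(markdown_text)
    return json.dumps(ast, indent=2, ensure_ascii=False)


def main(argv):
    """Hypothetical entry point: print the AST of the file named in argv[1]."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} FILE.md", file=sys.stderr)
        return 2

    # Imported lazily so this module can be loaded without Remarkpy available.
    from remarkpy.core import RemarkpyParser

    with open(argv[1], encoding="utf-8") as handle:
        text = handle.read()
    print(dump_ast(text, RemarkpyParser()))
    return 0


# To use this file as a script, finish it with:
#     raise SystemExit(main(sys.argv))
```

Saved as, say, md2ast.py in the repository root (the file name is arbitrary), it would be invoked as python md2ast.py remarkpy/test.md.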
Remarkpy acts as a bridge between Python and JavaScript for Markdown parsing:
- Python Core (remarkpy/core.py):
  - The RemarkpyParser class is the main entry point.
  - Upon initialization, it reads the bundled JavaScript file (remarkpy/remarkpy.js).
  - It uses the quickjs library to create a JavaScript runtime environment and load the parseMd function from remarkpy.js.
  - The parse(markdown_text) method takes a Markdown string, passes it to the JavaScript parseMd function, and receives the resulting AST (as a Python dictionary).
  - It includes custom error classes (RemarkpyError, JavaScriptError) for better error handling.
- JavaScript Bridge (remarkpy/remarkpy.js):
  - This is a browserified JavaScript bundle generated by Webpack (see remarkpyjs/webpack.config.js). It contains all necessary JavaScript code to run independently.
  - The primary source for this bundle is remarkpyjs/index.js.
- JavaScript Logic (remarkpyjs/index.js):
  - This file defines the parseMd(md) function.
  - It uses the md-mdast library (a Markdown parser that generates an AST conforming to the Unist specification).
  - md-mdast's create().tokenizeBlock(md) method is called to perform the actual parsing of the Markdown string into an AST.
- QuickJS Integration:
  - The quickjs Python library allows embedding the QuickJS JavaScript engine.
  - It enables calling JavaScript functions from Python and transparently converts data types between the two languages (e.g., JavaScript objects to Python dictionaries).
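
The flow described above (accept a string, call into JavaScript, wrap failures in custom error classes, return a dictionary) can be sketched roughly as follows. This is not the code in remarkpy/core.py. SketchParser is a hypothetical stand-in, the two exception classes are local stand-ins that only share names with the real ones, and making JavaScriptError a subclass of RemarkpyError is an assumption. The JavaScript call is injected as a plain Python callable so the example runs without quickjs.

```python
class RemarkpyError(Exception):
    """Local stand-in for the library's base error (illustration only)."""


class JavaScriptError(RemarkpyError):
    """Local stand-in for errors raised while running JavaScript.

    The subclass relationship is an assumption made for this sketch.
    """


class SketchParser:
    """Hypothetical stand-in for RemarkpyParser, not the real implementation."""

    def __init__(self, js_parse):
        # js_parse: any callable that takes a Markdown string and returns a dict.
        # With quickjs installed this could plausibly be
        # quickjs.Function("parseMd", js_source); how core.py actually wires
        # this up is not shown in this README, so treat that as an assumption.
        self._js_parse = js_parse

    def parse(self, markdown_text):
        if not isinstance(markdown_text, str):
            raise TypeError("markdown_text must be a string")
        try:
            result = self._js_parse(markdown_text)
        except Exception as exc:
            # Surface any failure on the JavaScript side as one error type.
            raise JavaScriptError(f"JavaScript call failed: {exc}") from exc
        if not isinstance(result, dict):
            raise RemarkpyError("expected the JavaScript function to return an object")
        return result


def fake_parse_md(md):
    """Pure-Python placeholder for the bundled parseMd function."""
    return {"type": "root", "children": [{"type": "text", "value": md}]}


parser = SketchParser(fake_parse_md)
print(parser.parse("hello"))
```

Injecting the callable keeps the error-handling logic testable on its own, and it explains why the usage example catches JavaScriptError, RemarkpyError, and TypeError separately.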
Notes for contributors:
- Licensing: The project is licensed under the terms specified in the LICENSE file. Please review it before contributing.
- Coding Style:
  - Python: Follow PEP 8 guidelines for Python code. Use a linter like Flake8 to check your code.
  - JavaScript: Follow standard JavaScript best practices. Consider using a linter like ESLint if not already configured.
- Testing:
  - Basic tests and examples are included in remarkpy/core.py (within the if __name__ == '__main__': block) and remarkpy/test.py.
  - remarkpy/test.md provides a sample Markdown file for testing.
  - The JavaScript part (md-mdast) has its own comprehensive tests.
  - Before submitting contributions, ensure your changes work correctly and do not break existing functionality. Consider adding new test cases for new features.
- JavaScript Bundling:
  - The JavaScript code in remarkpyjs/ is bundled into remarkpy/remarkpy.js using Webpack. The configuration for this is in remarkpyjs/webpack.config.js.
  - If you make changes to the JavaScript source files in remarkpyjs/, you will need to rebuild the bundle. This requires Node.js and npm. The current package.json (in remarkpyjs/) defines build and build-dev scripts:

        cd remarkpyjs
        npm install     # if you haven't already
        npm run build   # or npm run build-dev for development
        cd ..

- Commit Messages: Write clear and descriptive commit messages.
- Changelog: For significant changes, update CHANGELOG.md.
- TODOs: Check TODO.md for planned features or improvements if you are looking for ways to contribute.
- Pull Requests: Submit changes via pull requests. Ensure your code is well-tested and adheres to the project's coding style.
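
Regarding the JavaScript Bundling step above, it is easy to edit remarkpyjs/ and forget to rebuild. The sketch below compares file modification times to flag a possibly stale bundle. The function name bundle_is_stale is made up for this example, and the check is a heuristic: modification times can change for unrelated reasons, such as a fresh checkout.

```python
from pathlib import Path


def bundle_is_stale(source_dir, bundle_path):
    """Return True if the bundle is missing or older than any .js source file."""
    bundle = Path(bundle_path)
    if not bundle.is_file():
        return True

    bundle_mtime = bundle.stat().st_mtime
    for path in Path(source_dir).rglob("*.js"):
        # Installed dependencies are not project sources.
        if "node_modules" in path.parts:
            continue
        if path.stat().st_mtime > bundle_mtime:
            return True
    return False


if __name__ == "__main__":
    # Paths follow the repository layout described in this README.
    if bundle_is_stale("remarkpyjs", "remarkpy/remarkpy.js"):
        print("remarkpy/remarkpy.js may be out of date; consider npm run build")
```

Run from the repository root, it prints a reminder only when a source file appears newer than the bundle.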
This README aims to provide a comprehensive overview. For further details, please refer to the source code and the documentation of the libraries used (QuickJS, md-mdast).