OpenDataLoader

Roadmap

Upcoming features and development priorities

Coming Soon

Q3 2026

| Feature | Description | Status |
| --- | --- | --- |
| Structure Validation | Verify and repair PDF tag trees | Planned |
| TOC Extraction | Auto-detect document navigation structure | Planned |

Recently Shipped

| Feature | Description | Version | Date |
| --- | --- | --- | --- |
| Auto-Tagging Engine | Generate accessible Tagged PDFs from untagged PDFs (--format tagged-pdf) | Latest | 2026-Q2 |
| Apache 2.0 License | License migration from MPL-2.0 to Apache-2.0 | v2.0.0 | 2026-03-11 |
| Header/Footer Control | --include-header-footer option for output generation | v1.10.0 | 2026-02-04 |
| Equation & Figure AI | LaTeX formula extraction and AI chart/image description via hybrid mode | v1.8.0 | 2026-01-13 |
| Hybrid Mode Options | --hybrid-mode full for formula/picture enrichments, --hybrid-ocr | v1.8.0 | 2026-01-13 |
| OCR for Scanned PDFs | Extract text from image-based PDFs via hybrid mode | v1.6.0 | 2026-01-05 |
| Table AI | ML-assisted detection for borderless and merged-cell tables via hybrid mode | v1.6.0 | 2026-01-05 |
| XY-Cut++ Reading Order | Improved multi-column layout detection | v1.4.0 | 2025-12-19 |
| Base64 Image Embedding | Embed images directly in JSON/HTML/Markdown output | v1.4.0 | 2025-12-19 |
| Tagged PDF Support | Native structure tag extraction | v1.3.0 | 2025-11-21 |
| Benchmarks & Datasets | Transparent evaluations using open datasets and standardized metrics | v1.3.0 | 2025-11-21 |
| AI Safety Filters | Auto-filter hidden text and prompt injection content | v1.0.0 | 2025-09-16 |

Feature Requests

Have a feature request? Open an issue on GitHub.
