OCR PDF using PDF.js and tesseract.js #1

@simonw

Drop a PDF onto a web page and have its pages converted into JPEG images (using PDF.js) and then OCR'd (using tesseract.js).

Combination of https://github.com/simonw/til/blob/main/templates/pages/tools/annotated-presentations.html and https://github.com/datasette/datasette-extract/blob/main/datasette_extract/templates/_extract_drop_handler.html
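
A minimal sketch of what that pipeline might look like, not the eventual implementation. The function name `ocrPdf`, the `*Like` interfaces, the `createCanvas` callback, the render scale of 2.0, the JPEG quality of 0.9 and the `"eng"` language are all assumptions made for illustration. The interfaces only describe the handful of PDF.js and tesseract.js calls the sketch relies on (`getDocument(...).promise`, `getPage`, `getViewport`, `render(...).promise`, and `recognize`), so the libraries are passed in rather than imported.

```typescript
// Structural types covering only the library calls this sketch uses.
interface PdfPageLike {
  getViewport(opts: { scale: number }): { width: number; height: number };
  render(opts: { canvasContext: unknown; viewport: unknown }): { promise: Promise<void> };
}
interface PdfDocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PdfPageLike>; // pages are 1-indexed
}
interface PdfJsLike {
  getDocument(src: { data: Uint8Array }): { promise: Promise<PdfDocumentLike> };
}
interface TesseractLike {
  recognize(image: string, lang: string): Promise<{ data: { text: string } }>;
}
interface CanvasLike {
  width: number;
  height: number;
  getContext(kind: "2d"): unknown;
  toDataURL(type: string, quality?: number): string;
}

interface PageText {
  page: number;
  text: string;
}

// Hypothetical helper: render every page to a JPEG data URL, then OCR it.
async function ocrPdf(
  data: Uint8Array,
  pdfjs: PdfJsLike,
  tesseract: TesseractLike,
  createCanvas: () => CanvasLike,
  scale = 2.0, // assumed; a larger scale gives the OCR engine more pixels to work with
): Promise<PageText[]> {
  const pdf = await pdfjs.getDocument({ data }).promise;
  const results: PageText[] = [];
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const viewport = page.getViewport({ scale });
    const canvas = createCanvas();
    canvas.width = Math.ceil(viewport.width);
    canvas.height = Math.ceil(viewport.height);
    // Draw the page onto the canvas, then export the canvas as a JPEG.
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
    const jpeg = canvas.toDataURL("image/jpeg", 0.9);
    const { data: ocr } = await tesseract.recognize(jpeg, "eng");
    results.push({ page: n, text: ocr.text });
  }
  return results;
}
```

In a browser, a drop handler could presumably read the file with `new Uint8Array(await file.arrayBuffer())` and call `ocrPdf` with the loaded PDF.js and tesseract.js objects plus `() => document.createElement("canvas")`. PDF.js also needs its worker script configured, which is left out here.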

Labels: enhancement (New feature or request)