OCRmyPDF fails: "Tesseract failed to report available languages" #2504

@pschichtel

I'm on version 0.41.0 and just noticed that I can't select text in an imported PDF (a scanned document).

Looking at the job log I found this:

Tue, February 20th, 2024, 21:03: Running external command: ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17124542125895878539/infile /tmp/docspell-convert/docspell-ocrmypdf17124542125895878539/out.pdf
Tue, February 20th, 2024, 21:03: Command `ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17124542125895878539/infile /tmp/docspell-convert/docspell-ocrmypdf17124542125895878539/out.pdf` finished: 3
Tue, February 20th, 2024, 21:03: ocrmypdf stdout:
Tue, February 20th, 2024, 21:03: ocrmypdf stderr: Tesseract failed to report available languages. Output from Tesseract:
    -----------
    [DS] Profile file not available (tesseract_opencl_profile_devices.dat); performing profiling.
    [DS] Device: "(null)" (Native) evaluation...
    Error in pixCloseBrick: pixs not 1 bpp
    Error in pixOpenBrick: pixs not defined
    Error in pixSubtract: pixs1 not defined
    Error in pixOpenBrick: pixs not defined
    Error in pixOpenBrick: pixs not defined
    [DS] Device: "(null)" (Native) evaluated
    [DS] composeRGBPixel: 0.017794 (w=1.2)
    [DS] HistogramRect: 0.015793 (w=2.4)
    [DS] ThresholdRectToPix: 0.025850 (w=4.5)
    [DS] getLineMasksMorph: 0.000040 (w=5.0)
    [DS] Score: 0.175782
    [DS] Scores written to file (tesseract_opencl_profile_devices.dat).
    [DS] Device[1] 0:(null) score is 0.175782
    [DS] Selected Device[1]: "(null)" (Native)
    List of available languages in "/usr/share/tessdata/" (23):
    ces dan deu est fin fra heb ita jpn jpn_vert khm lav lit nld nor pol por ron rus slk spa swe ukr
Tue, February 20th, 2024, 21:03: PDF conversion failed: Command result=3. No output file found.. Go without PDF file
Tue, February 20th, 2024, 21:03: Closing process: `ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17124542125895878539/infile /tmp/docspell-convert/docspell-ocrmypdf17124542125895878539/out.pdf`

I don't think I ever saw this error when importing my ~1000 documents.
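
For anyone trying to reproduce this: in the stderr above the language list itself looks intact (`deu` is there), but it is preceded by OpenCL profiling lines, which may be what the caller stumbles over. Below is a minimal Python sketch for checking that by hand. It is not how OCRmyPDF or Docspell parse the output; `parse_list_langs` and `installed_languages` are hypothetical helper names, and the only external call assumed is the real `tesseract --list-langs` command.

```python
import re
import subprocess

# Header printed by `tesseract --list-langs`; the quoted path is optional
# because some Tesseract versions omit it.
HEADER = re.compile(r'List of available languages(?: in "[^"]*")? \((\d+)\):')


def parse_list_langs(output: str) -> list[str]:
    """Return the language codes that follow the header, ignoring any preamble."""
    match = HEADER.search(output)
    if match is None:
        raise ValueError("no 'List of available languages' header found")
    langs = output[match.end():].split()
    announced = int(match.group(1))
    if len(langs) != announced:
        raise ValueError(f"header announces {announced} languages, found {len(langs)}")
    return langs


def installed_languages() -> list[str]:
    """Run Tesseract and parse its combined stdout/stderr (requires tesseract on PATH)."""
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=False
    )
    return parse_list_langs(result.stdout + result.stderr)


# Shortened copy of the output from the job log in this issue.
SAMPLE = (
    "[DS] Profile file not available (tesseract_opencl_profile_devices.dat); "
    "performing profiling.\n"
    "Error in pixCloseBrick: pixs not 1 bpp\n"
    '[DS] Selected Device[1]: "(null)" (Native)\n'
    'List of available languages in "/usr/share/tessdata/" (23):\n'
    "ces dan deu est fin fra heb ita jpn jpn_vert khm lav lit nld nor pol por "
    "ron rus slk spa swe ukr\n"
)

langs = parse_list_langs(SAMPLE)
print("deu" in langs, len(langs))
```

If `installed_languages()` returns the expected list inside the container while `ocrmypdf` still exits with code 3, that would point at the extra profiling output rather than at missing language data.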

Labels: docker (All things regarding docker setup)
