fix: demote misleading LibreOffice 'not found' warning to debug (closes #230)#237
Merged
LarFii merged 2 commits intoHKUDS:mainfrom Apr 7, 2026
Merged
Conversation
HKUDS#159) Reasoning models (DeepSeek-R1, Qwen2.5-think, etc.) wrap their chain-of-thought in <think>…</think> blocks before emitting the final answer. When _robust_json_parse fails to extract a valid JSON object from the response, the four modal-processor parse methods (_parse_response, _parse_table_response, _parse_equation_response, _parse_generic_response) were returning the **raw** LLM response as the fallback caption and summary. This caused internal model reasoning to be stored in the knowledge graph instead of the actual content description. Fix: add a static helper `BaseModalProcessor._strip_thinking_tags` that removes <think>/<thinking> blocks (case-insensitive, multiline) and apply it in every fallback branch so only the final-answer text is stored or returned. The helper is tested in tests/test_strip_thinking_tags.py with 13 unit tests covering: tag variants, multiline blocks, multiple blocks, case-insensitivity, and the full fallback path for all four processor classes.
HKUDS#230) On systems where only 'soffice' is on PATH (common on macOS), the existing fallback loop logged a WARNING for the 'libreoffice' candidate before successfully converting via 'soffice'. This caused users to see: WARNING: LibreOffice command 'libreoffice' not found INFO: Successfully converted file.pptx to PDF using soffice …and conclude that something was broken, even though the conversion succeeded. Fix: log FileNotFoundError at DEBUG level for any non-final candidate so that routine 'libreoffice' → 'soffice' fallback stays silent in normal logs. The WARNING is preserved only when the last candidate in the list is not found (meaning no usable LibreOffice binary exists at all and the conversion is about to fail).
Collaborator
|
Thanks for your contribution! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
On systems where only
sofficeis onPATH(the typical case on macOS afterbrew install --cask libreoffice), the converter loop logged a WARNINGfor the
libreofficecandidate before immediately succeeding via thesofficefallback. Users saw:This made the conversion look broken (#230) even though it completed
successfully. Users were confused and opened support questions.
Root Cause
FileNotFoundErrorin the candidate loop was always logged atWARNINGlevel, regardless of whether more candidates remained to try.
Fix
Introduce an
is_lastflag and only emit aWARNINGwhen theFileNotFoundErroris raised for the final candidate (i.e. all optionsare exhausted). For intermediate candidates the message is demoted to
DEBUG, keeping normal logs clean while still allowing the full trace toappear under
--debug.Behaviour After This Change
libreofficemissing,sofficeworkslibreofficeworksChecklist
ruff check+ruff formatpass