Pre-compile Regex Patterns for Markdown Stripping #4714
Pull request overview
This PR refactors markdown-stripping logic used by TTS and SMS paths to precompile regex patterns at module import time, avoiding repeated regex compilation on each call.
Changes:
- Precompiled markdown-stripping regex patterns into module-level tuples and applied them in loops (CLI voice TTS, TTS tool, SMS sending).
- Minor formatting/line-wrapping changes around imports, subprocess calls, and JSON responses.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/tts_tool.py | Precompiles markdown stripping regexes for streaming TTS and applies them via a loop. |
| tools/send_message_tool.py | Precompiles markdown stripping regexes for SMS sending and refactors formatting; adjusts media extension constants. |
| cli.py | Precompiles markdown stripping regexes for voice-mode TTS and applies them via a loop; adds re import. |
```diff
 _IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}
 _VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".3gp"}
-_AUDIO_EXTS = {".ogg", ".opus", ".mp3", ".wav", ".m4a"}
+_AUDIO_EXTS = {".ogg", ".opus", ".mp3", "wav", "m4a"}
```
_AUDIO_EXTS now contains "wav" and "m4a" without leading dots, but ext values come from os.path.splitext(...)[1] (e.g., ".wav"). This will prevent WAV/M4A media from being detected as audio. Update the set to use the dotted extensions (".wav", ".m4a").
Suggested change:
```diff
-_AUDIO_EXTS = {".ogg", ".opus", ".mp3", "wav", "m4a"}
+_AUDIO_EXTS = {".ogg", ".opus", ".mp3", ".wav", ".m4a"}
```
```diff
+_MD_STRIP_PATTERNS = (
+    (re.compile(r"\*\*(.+?)\*\*"), r"\1"),
+    (re.compile(r"\*(.+?)\*"), r"\1"),
+    (re.compile(r"__(.+?)__"), r"\1"),
+    (re.compile(r"_(.+?)_"), r"\1"),
```
The new precompiled markdown patterns for SMS stripping no longer use DOTALL for bold/italic/underscore patterns. Previously these substitutions used flags=re.DOTALL, so formatting that spans newlines (e.g., "a\nb") would be stripped; now it will not. If behavior is intended to remain unchanged, compile the relevant patterns with re.DOTALL (or adjust them to match across newlines, e.g., using [\s\S]).
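A short demonstration of the behavior change, using the four patterns visible in the diff context above. The names `PATTERNS_NO_DOTALL`, `PATTERNS_DOTALL`, and `strip` are illustrative only; the real tuple contains more entries than the review snippet shows:

```python
import re

# The four (pattern, replacement) pairs visible in the review snippet.
_SNIPPET = (
    (r"\*\*(.+?)\*\*", r"\1"),
    (r"\*(.+?)\*", r"\1"),
    (r"__(.+?)__", r"\1"),
    (r"_(.+?)_", r"\1"),
)
# Compiled without DOTALL (as in the PR) and with it (matching the old
# re.sub(..., flags=re.DOTALL) calls described in the comment).
PATTERNS_NO_DOTALL = tuple((re.compile(p), r) for p, r in _SNIPPET)
PATTERNS_DOTALL = tuple((re.compile(p, re.DOTALL), r) for p, r in _SNIPPET)


def strip(text: str, patterns) -> str:
    """Apply each (compiled_regex, replacement) pair in order."""
    for regex, repl in patterns:
        text = regex.sub(repl, text)
    return text


sample = "**bold\nacross lines** and _it\nalic_"
print(repr(strip(sample, PATTERNS_NO_DOTALL)))  # '**bold\nacross lines** and _it\nalic_'
print(repr(strip(sample, PATTERNS_DOTALL)))     # 'bold\nacross lines and it\nalic'
```

Without `re.DOTALL`, `.` stops at the newline, so spans that cross lines are silently left in the SMS text; passing the flag to `re.compile()` restores the previous output.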
Thanks for the PR @NotUnHackable! We appreciate the thought behind pre-compiling the regex patterns. After review, we're going to pass on this one for a couple of reasons:
If you'd like to contribute in the future, keeping PRs focused on a single change with minimal formatting noise makes review much smoother. Thanks again for taking the time!
…poisoning (v2, rebased) Malformed tool calls (empty function name, non-JSON arguments) can poison a session's history and cause repeated HTTP 400 errors on every subsequent turn when the broken history is replayed to strict providers. Adds `_is_valid_tool_call()` static method called in the tool-call persistence loop. Invalid tool calls are skipped with a warning log and never written to session history. Fixes NousResearch#4714
What does this PR do?
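Pre-compiles the regex patterns used for markdown stripping in the TTS and messaging functions, so each call reuses compiled pattern objects instead of passing pattern strings to `re.sub()`. Because Python's `re` module already caches compiled patterns internally, the saving is mainly the per-call cache lookup (plus a recompile whenever a pattern is evicted from that cache) rather than a full recompilation on every call. The change targets voice mode and SMS sending.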
Related Issue
N/A (performance optimization)
Type of Change
Changes Made
- `tools/send_message_tool.py`: Pre-compiled 9 markdown-stripping regex patterns into a `_MD_STRIP_PATTERNS` tuple at module load.
- `tools/tts_tool.py`: Pre-compiled 10 markdown-stripping regex patterns into a `_MD_STRIP_PATTERNS` tuple at module load.
- `cli.py`: Pre-compiled 10 markdown-stripping regex patterns into a `_TTS_MD_PATTERNS` tuple at module load.

Before: Each function called `re.sub()` 9-10 times sequentially with string patterns, so every call went through the `re` module's internal pattern cache (a pattern is compiled only on a cache miss).
After: Patterns are pre-compiled once at import time into a tuple, then applied in a loop.
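A minimal sketch of the before/after shapes, assuming a small illustrative rule set (the PR's actual 9-10 patterns per file are not reproduced; `_RULES`, `strip_before`, and `strip_after` are hypothetical names). Carrying each rule's flags alongside its pattern keeps the precompiled version behaviorally identical, which is exactly the property the DOTALL review comment is about:

```python
import re

# Hypothetical, illustrative rules as (pattern, replacement, flags).
_RULES = (
    (r"\*\*(.+?)\*\*", r"\1", re.DOTALL),  # bold, may span lines
    (r"\*(.+?)\*", r"\1", re.DOTALL),      # italic, may span lines
    (r"`(.+?)`", r"\1", 0),                # inline code
    (r"^#+\s*", "", re.MULTILINE),         # heading markers at line start
)


def strip_before(text: str) -> str:
    """Pre-PR shape: pass pattern strings to re.sub() on every call.

    re looks each string up in its internal compiled-pattern cache and
    compiles it only on a cache miss.
    """
    for pattern, repl, flags in _RULES:
        text = re.sub(pattern, repl, text, flags=flags)
    return text


# Post-PR shape: compile once at import time, keeping each rule's flags.
_COMPILED_RULES = tuple((re.compile(p, f), r) for p, r, f in _RULES)


def strip_after(text: str) -> str:
    """Apply the precompiled (regex, replacement) pairs in order."""
    for regex, repl in _COMPILED_RULES:
        text = regex.sub(repl, text)
    return text


sample = "# Title\n**bold\ntext** and *it* with `code`"
print(repr(strip_after(sample)))  # 'Title\nbold\ntext and it with code'
```

Deriving the compiled tuple from the same rule table, rather than retyping each `re.compile()` call, makes it hard to drop a flag during the refactor.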
How to Test
Run syntax checks:
Verify imports work:
Test pattern correctness:
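The commands for these steps did not survive in this copy of the PR. As a hedged sketch of the syntax-check step only, the helper below compiles each changed file's source in-process without importing or running it; `syntax_errors` and `CHANGED_FILES` are hypothetical names, and the file list is taken from "Changes Made":

```python
from pathlib import Path

# Files touched by this PR, as listed under "Changes Made".
CHANGED_FILES = ("cli.py", "tools/tts_tool.py", "tools/send_message_tool.py")


def syntax_errors(paths, root="."):
    """Compile each file's source and collect failures.

    Returns a list of (relative_path, error_message) tuples; an empty
    list means every file parsed. Nothing is imported or executed.
    """
    failures = []
    for rel in paths:
        path = Path(root) / rel
        try:
            source = path.read_text(encoding="utf-8")
            compile(source, str(path), "exec")  # parse + compile only
        except (SyntaxError, ValueError, OSError) as exc:
            # SyntaxError: bad code; ValueError: e.g. null bytes on older
            # Pythons; OSError: missing or unreadable file.
            failures.append((rel, str(exc)))
    return failures
```

Run from the repository root, `syntax_errors(CHANGED_FILES)` should return an empty list. It does not cover the import or pattern-correctness steps.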
Checklist
Code
Documentation & Housekeeping