fix(gateway): fix underscore-stripping regex to preserve snake_case identifiers #15451

Closed
Tranquil-Flow wants to merge 1 commit into NousResearch:main from Tranquil-Flow:fix/gateway-helpers-underscore-regex

Tranquil-Flow commented Apr 25, 2026

Contributor

What does this PR do?

The blanket _(.+?)_ and __(.+?)__ patterns in gateway/platforms/helpers.py incorrectly consumed snake_case identifiers like send_as_bot and user_id. This adds lookbehind/lookahead boundaries ((?<![a-zA-Z0-9]) / (?![a-zA-Z0-9])) so underscores adjacent to alphanumeric characters are not treated as markdown formatting.

The same fix was already applied and tested in the CLI renderer (cli.py); this addresses the gateway copy.
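To illustrate the failure mode described above, here is a minimal sketch comparing the blanket patterns with bounded ones. The bounded patterns are a reconstruction from the description in this PR, not a copy of the actual `_RE_BOLD_UNDER` / `_RE_ITALIC_UNDER` definitions, and `strip_underscores` is a hypothetical helper name.

```python
import re

# Blanket patterns as described in the PR text: any pair of underscores matches.
NAIVE_BOLD = re.compile(r"__(.+?)__")
NAIVE_ITALIC = re.compile(r"_(.+?)_")

# Bounded variants (assumed reconstruction): a delimiter must not touch an
# ASCII letter or digit on its outer side.
BOUNDED_BOLD = re.compile(r"(?<![a-zA-Z0-9])__(.+?)__(?![a-zA-Z0-9])")
BOUNDED_ITALIC = re.compile(r"(?<![a-zA-Z0-9])_(.+?)_(?![a-zA-Z0-9])")


def strip_underscores(text, bold, italic):
    """Hypothetical helper: remove bold first, then italic, keeping the inner text."""
    text = bold.sub(r"\1", text)
    return italic.sub(r"\1", text)


sample = "call send_as_bot with user_id"
print(strip_underscores(sample, NAIVE_BOLD, NAIVE_ITALIC))      # identifier is damaged
print(strip_underscores(sample, BOUNDED_BOLD, BOUNDED_ITALIC))  # identifier is preserved
```

With the blanket patterns, `_as_` inside `send_as_bot` is read as italic and its underscores are dropped; the bounded patterns leave the identifier alone while still stripping standalone `_italic_` and `__bold__`.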

Related Issue

Supersedes #15076.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/platforms/helpers.py — added non-alphanumeric boundary assertions to _RE_BOLD_UNDER and _RE_ITALIC_UNDER so they only match real Markdown delimiters
  • New tests covering snake_case preservation, bold/italic stripping behaviour, mixed cases

How to Test

  1. pytest -k test_snake_case_preserved: send_as_bot, user_id survive stripping
  2. pytest -k test_bold_underscore_stripped: actual __bold__ formatting is stripped
  3. pytest -k test_italic_underscore_stripped: actual _italic_ formatting is stripped
  4. pytest -k test_double_underscore_in_identifier_preserved: my_var__name survives
  5. pytest -k test_config_keys_preserved: max_tokens, api_base_url survive
  6. pytest -k test_asterisk_bold_unaffected: **bold** still works
  7. pytest -k test_mixed_formatting_and_identifiers: both formatting and identifiers handled correctly
  8. pytest -k test_multiple_snake_case_in_one_line: multiple identifiers in one line
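The cases above could be expressed as a small table-driven check along the following lines. This is only a sketch: `strip_markdown`, the three patterns, and the sample strings are hypothetical stand-ins, not the helper or the test file from this PR.

```python
import re

# Hypothetical stand-ins for the gateway helper's patterns; the real
# definitions in gateway/platforms/helpers.py may differ.
_BOLD_STAR = re.compile(r"\*\*(.+?)\*\*")
_BOLD_UNDER = re.compile(r"(?<![a-zA-Z0-9])__(.+?)__(?![a-zA-Z0-9])")
_ITALIC_UNDER = re.compile(r"(?<![a-zA-Z0-9])_(.+?)_(?![a-zA-Z0-9])")


def strip_markdown(text: str) -> str:
    """Strip bold/italic delimiters, keeping the inner text."""
    for pattern in (_BOLD_STAR, _BOLD_UNDER, _ITALIC_UNDER):
        text = pattern.sub(r"\1", text)
    return text


# Case name -> (input, expected output); names mirror the tests listed above.
CASES = {
    "snake_case_preserved": ("send_as_bot and user_id", "send_as_bot and user_id"),
    "bold_underscore_stripped": ("a __bold__ word", "a bold word"),
    "italic_underscore_stripped": ("an _italic_ word", "an italic word"),
    "double_underscore_in_identifier_preserved": ("my_var__name", "my_var__name"),
    "config_keys_preserved": ("set max_tokens and api_base_url", "set max_tokens and api_base_url"),
    "asterisk_bold_unaffected": ("a **bold** word", "a bold word"),
    "mixed_formatting_and_identifiers": ("_note_: user_id is __required__", "note: user_id is required"),
    "multiple_snake_case_in_one_line": ("user_id, chat_id, message_id", "user_id, chat_id, message_id"),
}


def run_cases() -> dict:
    """Return a mapping of case name to whether the output matched."""
    return {name: strip_markdown(src) == want for name, (src, want) in CASES.items()}


if __name__ == "__main__":
    for name, ok in run_cases().items():
        print(f"{name}: {'pass' if ok else 'FAIL'}")
```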

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15 (Darwin 24.6.0)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

8/8 tests pass

alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) on Apr 25, 2026
@alt-glitch

Collaborator

Supersedes #15076 and #11775 — same gateway underscore-stripping fix with proper word-boundary lookarounds.

@Tranquil-Flow

Contributor Author

Closing — the fix is now on main via an alternative regex.

On current origin/main:

  • gateway/platforms/telegram.py:185-186: the italic-stripping regex was rewritten as re.sub(r'(?<!\w)_([^_]+)_(?!\w)', r'\1', cleaned), with the inline comment "Use word boundary (\b) to avoid breaking snake_case like my_variable_name". Despite the comment's mention of \b, the pattern uses negative lookarounds on \w. The intent matches this PR's [a-zA-Z0-9] checks; the difference is the character class, since \w also covers the underscore and, for Python str patterns, non-ASCII word characters.
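For comparison, a short sketch of how the `\w`-based pattern quoted above behaves next to an ASCII-only variant in the style this PR proposed. `ASCII_ITALIC` is an assumed reconstruction, `strip_italic` is a hypothetical helper, and the Cyrillic identifier is an invented example.

```python
import re

# Pattern quoted in the comment above (gateway/platforms/telegram.py on origin/main).
WORD_ITALIC = re.compile(r"(?<!\w)_([^_]+)_(?!\w)")
# ASCII-only variant in the style of this PR (assumed reconstruction).
ASCII_ITALIC = re.compile(r"(?<![a-zA-Z0-9])_(.+?)_(?![a-zA-Z0-9])")


def strip_italic(pattern, text):
    """Hypothetical helper: drop italic underscores, keep the inner text."""
    return pattern.sub(r"\1", text)


# Both variants leave ASCII snake_case alone and strip real italics.
print(strip_italic(WORD_ITALIC, "my_variable_name"))
print(strip_italic(WORD_ITALIC, "an _italic_ word"))

# They can differ on non-ASCII identifiers: \w treats Cyrillic letters as
# word characters, while [a-zA-Z0-9] does not.
print(strip_italic(WORD_ITALIC, "имя_в_базе"))
print(strip_italic(ASCII_ITALIC, "имя_в_базе"))
```

For the ASCII identifiers named in this PR the two approaches agree; they diverge only when a delimiter sits next to a non-ASCII word character or another underscore.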

Same goal, same outcome (my_variable_name no longer mis-parsed as italic). No further action needed on this PR. Thanks for the original diagnosis.
