Skip to content

fix(config): sanitize_env substring false positive breaks valid API keys#17241

Closed
Feng-H wants to merge 1 commit into
NousResearch:mainfrom
Feng-H:fix/sanitize-env-key-substring-mismatch
Closed

fix(config): sanitize_env substring false positive breaks valid API keys#17241
Feng-H wants to merge 1 commit into
NousResearch:mainfrom
Feng-H:fix/sanitize-env-key-substring-mismatch

Conversation

@Feng-H

@Feng-H Feng-H commented Apr 29, 2026

Copy link
Copy Markdown

Problem

_sanitize_env_lines() in hermes_cli/config.py uses str.find() to detect concatenated KEY=VALUE pairs on a single line. This substring search causes false positives when one known env var name is a substring of another:

  • LM_API_KEY= matches at position 1 inside GLM_API_KEY=<value>, because find() does not require a word boundary.
  • The sanitizer then splits the valid entry into a bare G line and an LM_API_KEY=... line, destroying the key.
  • On the next gateway startup, sanitize_env_file() runs again and sees the two fragments as-is (neither is a valid known key), preserving the corruption permanently.

This is a general bug — any pair of known keys where one name contains another as a substring is affected (e.g., API_KEY= inside OPENAI_API_KEY=, ANTHROPIC_TOKEN= inside ANTHROPIC_API_KEY=, etc.). The practical impact is limited because most users do not encounter the initial corruption that triggers the cycle.

Trigger scenario (CJK input methods)

The bug becomes a persistent data-loss issue when combined with CJK input methods (fcitx5, ibus). These can occasionally insert a stray newline in the middle of a pasted env var name, producing:

G
LM_API_KEY=<actual_value>

Once this corruption exists:

  1. The old sanitizer has no merge logic — it preserves both fragments as-is.
  2. If the user manually fixes the split (e.g., via sed), the sanitizer re-splits it on the next startup due to the substring match described above.
  3. This creates a repair loop that makes the .env file unfixable without patching the code.

Fix

Pass 0 (new): Detect and merge split key names. If a line contains only uppercase letters/underscores (no =) and the next non-empty line, when concatenated, forms a known env var name with =, merge them.

Pass 1 (improved): Replace str.find() (arbitrary substring search) with str.startswith() at position 0 + forward offset scanning. This ensures concatenated key detection only matches actual key names at line/segment boundaries, never as substrings inside other key names.

Testing

Verified with these scenarios:

  • Split key merge: G + newline + LM_API_KEY=<value>GLM_API_KEY=<value>
  • Normal key preserved: GLM_API_KEY=abcGLM_API_KEY=abc
  • Concatenated keys still split: KEY1=val1KEY2=val2 → two lines
  • Comments/blanks preserved
  • No false merge on unrelated uppercase text

sanitize_env_lines() uses str.find() to detect concatenated KEY=VALUE
pairs, but this matches key names as substrings of other key names.
For example, LM_API_KEY= is found at position 1 inside
GLM_API_KEY=<value>, causing the sanitizer to split a valid entry into
a bare 'G' line and an 'LM_API_KEY=...' line — destroying the key.

This is especially harmful when combined with CJK input methods (fcitx5,
ibus) that can insert a stray newline in the middle of an env var name
during paste, producing a split like:
    G
    LM_API_KEY=<actual_value>

The sanitizer would then:
1. Fail to merge the split (original behavior had no merge logic)
2. When the split was manually fixed, Pass 1 would re-split it on the
   next gateway startup because find('LM_API_KEY=') matches inside
   'GLM_API_KEY=...' at position 1

Fix:
- Pass 0 (new): detect and merge split key names where a bare uppercase
  fragment on one line + a KEY=VALUE on the next line combine to form a
  known env var name.
- Pass 1 (improved): replace str.find() with str.startswith() at
  position 0 + forward scanning. This ensures we only match actual key
  names at line/segment boundaries, never as substrings inside other
  key names.
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/cli CLI entry point, hermes_cli/, setup wizard area/config Config system, migrations, profiles labels Apr 29, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Fixes #17138. More comprehensive than #17141 — adds Pass 0 merge logic for split key names (CJK input method edge case) plus Pass 1 startswith() fix.

@teknium1

Copy link
Copy Markdown
Contributor

Closing as already fixed on main.

Triage notes (medium confidence):
hermes_cli/config.py:4690-4709 on main already filters contained match_ranges so LM_API_KEY inside GLM_API_KEY no longer triggers the substring false-positive split described in the PR title.

If you still see this on the latest version, please reopen with reproduction steps.

(Bulk-closed during a CLI triage sweep.)

@teknium1 teknium1 closed this May 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Config system, migrations, profiles comp/cli CLI entry point, hermes_cli/, setup wizard P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants