Bug Description
After a Hermes CLI session is interrupted during context compaction and later resumed/continued, the effective context window shown in the status bar can change from the configured 1M tokens to 256K. In the same failure path, compression summary generation can fail with HTTP 502 and Hermes inserts a fallback context marker, which may degrade the resumed session's accuracy.
Steps to Reproduce
- Configure a custom OpenAI-compatible provider with a 1M context length:
model:
  default: gpt-5.5
  provider: custom
  base_url: https://redacted.example/openai/v1
  context_length: 1000000
custom_providers:
  - name: private_codex
    base_url: https://redacted.example/openai/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1000000
compression:
  enabled: true
  threshold: 0.8
  target_ratio: 0.2
- Start a long Hermes CLI session and let it approach/trigger context compaction.
- Have the custom endpoint return HTTP 502 during compression summary generation.
- Continue or resume the interrupted session.
- Observe the CLI status bar and compaction warnings.
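For the 502 step, a local stub can stand in for the failing endpoint. The sketch below is only a reproduction aid and is not part of Hermes: `Always502Handler` and `start_stub` are hypothetical names, and it covers only the failure half of the scenario. How to route a live session's summary request to it (for example by fronting the real endpoint with a proxy) depends on the setup.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always502Handler(BaseHTTPRequestHandler):
    """Answers every POST with HTTP 502, like a failing upstream gateway."""

    def do_POST(self):
        # Drain the request body so the client is not left mid-write.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b'{"error": {"message": "Bad Gateway (simulated)"}}'
        self.send_response(502)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the terminal quiet; remove this override to see requests.
        pass


def start_stub(port=0):
    """Start the stub on localhost in a background thread.

    Port 0 lets the OS pick a free port; read it from server.server_address.
    """
    server = HTTPServer(("127.0.0.1", port), Always502Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_stub(8502)` would serve on `http://127.0.0.1:8502`; any path, including one ending in `/openai/v1/chat/completions`, gets the same 502 response.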
Expected Behavior
- The resumed/continued session should preserve the configured context length (1000000, displayed as roughly 1M).
- If summary generation fails, Hermes should not silently degrade context by inserting a fallback marker that loses important middle turns, or it should at least make the session state and recovery options explicit.
- The configured context length should remain stable across interruption, compaction, resume, and continue flows.
Actual Behavior
- The status bar showed 97.9K/256K even though the active config has model.context_length: 1000000 and the custom provider entry also sets context_length: 1000000.
- The CLI reported:
compression summary failed: Error code: 502. Inserted a fallback context marker.
Session compressed 3 times — accuracy may degrade. Consider /new to start fresh.
API call failed (attempt 1/3): InternalServerError [HTTP 502]
Provider: custom Model: gpt-5.5
Endpoint: https://redacted.example/openai/v1
Error: HTTP 502: Error code: 502
...
Max retries (3) exhausted — trying fallback...
API failed after 3 retries — HTTP 502: Error code: 502
Final error: HTTP 502: Error code: 502
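To show why the smaller window matters, here is a rough calculation of where compaction would start under each value. It assumes that compression.threshold is the fraction of the context length at which compaction triggers, and that "256K" means 256,000 tokens (it could also be 262,144); neither assumption is confirmed against the Hermes source, and `compaction_trigger` is a hypothetical helper.

```python
def compaction_trigger(context_length, threshold=0.8):
    """Token usage at which compaction would start, under the stated assumption."""
    return round(context_length * threshold)


used = 97_900  # tokens in use, taken from the status bar reading
for label, window in (("configured", 1_000_000), ("observed", 256_000)):
    trigger = compaction_trigger(window)
    print(f"{label}: window={window:,} trigger={trigger:,} used={used / window:.1%}")
```

Under these assumptions the resumed session would begin compacting at roughly a quarter of the token count the config intends, which would make repeated compaction (and repeated summary requests) more likely.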
Additional Observations
- Running hermes config after the incident confirms the config still contains context_length: 1000000.
- Directly calling Hermes' model metadata resolver with config_context_length=1000000 returns 1000000, so a fresh initialization should resolve to 1M.
- The 256K value appears to come from the running/resumed agent's ContextCompressor.context_length, not from the current config file.
- This suggests a resume/continue or compression failure path may restore or retain a stale/default context length instead of the configured value.
Environment
- OS: Windows 10
- Shell: Git Bash/MSYS via Hermes terminal backend
- Hermes profile: default
- Provider: custom OpenAI-compatible endpoint
- Model: gpt-5.5
- Configured context length: 1000000
- Observed status bar context length after interruption/continue: 256K
Possible Fix Direction
- Ensure resumed/continued sessions rehydrate ContextCompressor.context_length from the active model/custom provider config, not from stale runtime state or fallback metadata.
- Consider making compression.abort_on_summary_failure default to safer behavior for transient provider failures, or avoid inserting a fallback marker that drops middle context when the summary request fails.
- Surface a clearer warning when the runtime context length differs from the configured model.context_length for the active provider/model.
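The first and third points could look roughly like the sketch below. It works on a plain dict shaped like the config in the reproduction steps and does not use Hermes internals: `configured_context_length` and `rehydrate_context_length` are hypothetical names, and the precedence (per-provider model entry first, then model.context_length) is an assumption rather than documented behavior.

```python
import logging

logger = logging.getLogger("context_length_check")


def configured_context_length(config, provider_name=None):
    """Return the context length the config asks for, or None if unset.

    Checks matching custom_providers entries first, then model.context_length.
    """
    model_cfg = config.get("model", {})
    model_name = model_cfg.get("default")
    for entry in config.get("custom_providers", []):
        if provider_name is not None and entry.get("name") != provider_name:
            continue
        per_model = entry.get("models", {}).get(model_name, {})
        if "context_length" in per_model:
            return int(per_model["context_length"])
    value = model_cfg.get("context_length")
    return int(value) if value is not None else None


def rehydrate_context_length(runtime_value, config, provider_name=None):
    """On resume, prefer the configured value and warn if the restored one differs."""
    configured = configured_context_length(config, provider_name)
    if configured is None:
        # Nothing configured: keep whatever the runtime restored.
        return runtime_value
    if runtime_value != configured:
        logger.warning(
            "Restored context length %s differs from configured %s; "
            "using the configured value.",
            runtime_value,
            configured,
        )
    return configured
```

With the config from this report, `rehydrate_context_length(256000, config, "private_codex")` would log the mismatch and return 1000000, which is the value the status bar was expected to show.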