Endless Terminals Environment Integration by samherring99 · Pull Request #24 · NousResearch/hermes-agent

samherring99 · 2026-02-18T20:40:35Z

This PR covers the first implementation of the endless-terminals environment (dataset, repo, paper).

To run this, point your Atropos environment at ./environments/endless_terminals/endless_terminals_env.py.

Download Tasks

huggingface-cli download obiwan96/endless-terminals-train --repo-type dataset --local-dir ~/endless-terminals-data --local-dir-use-symlinks False

Testing the environment

python environments/endless_terminals/endless_terminals_env.py process --env.tasks_base_dir /path/to/endless-terminals --env.total_steps 1 --env.group_size 1 --env.use_wandb false --openai.model_name "anthropic/claude-sonnet-4.5" --openai.base_url "https://openrouter.ai/api/v1" --openai.server_type "openai" --openai.api_key "$OPENROUTER_API_KEY"

dmahan93 · 2026-02-27T08:05:32Z

+            terminal_backend="local",
+            use_dataset=True,
+            tasks_base_dir="",
+            group_size=1,


Can you increase the default group size here?

Forgot to update this after testing. Good callout, will do.

dmahan93 · 2026-02-27T08:05:52Z

+            use_dataset=True,
+            tasks_base_dir="",
+            group_size=1,
+            total_steps=1,


And the total steps.

Same as above.
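
A minimal sketch of what the two requests above amount to, assuming a plain dataclass stands in for the environment config. The class name, the helper, and the values 8 and 1000 are illustrative placeholders, not what the PR settled on; only the four field names come from the diff.

```python
from dataclasses import dataclass, replace


@dataclass
class EndlessTerminalsConfigSketch:
    """Hypothetical stand-in for the environment config in this PR."""

    terminal_backend: str = "local"
    use_dataset: bool = True
    tasks_base_dir: str = ""
    # Placeholder defaults, larger than the 1/1 left over from testing.
    group_size: int = 8
    total_steps: int = 1000


def smoke_test_overrides(cfg: EndlessTerminalsConfigSketch) -> EndlessTerminalsConfigSketch:
    """Return a copy with the small values that suit a single-step smoke test."""
    return replace(cfg, group_size=1, total_steps=1)
```

Keeping the small values in an explicit override (or on the command line via `--env.group_size 1 --env.total_steps 1`, as in the test command in the description) means the defaults do not have to be edited for testing.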

dmahan93 · 2026-02-27T08:07:03Z

+        task_name = item.get("task_name", "unknown")
+        docker_image = item.get("docker_image", self.config.default_docker_image)
+
+        print(f"[DEBUG] collect_trajectory START for {task_name}", flush=True)


can you switch print to logging?

yep, will do this as well
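
For reference, a minimal sketch of the switch using the standard `logging` module. The helper name `log_trajectory_start` is hypothetical; the message mirrors the debug print in the diff above.

```python
import logging

# Module-level logger, the usual replacement for ad hoc print() calls.
logger = logging.getLogger(__name__)


def log_trajectory_start(task_name: str) -> None:
    """Emit the same message as the debug print, at DEBUG level."""
    # Lazy %-style formatting: the string is only built if DEBUG is enabled.
    logger.debug("collect_trajectory START for %s", task_name)
```

With this in place the message can be silenced or redirected through the logging configuration instead of editing the code.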

dmahan93 · 2026-02-27T08:07:40Z

+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log Endless Terminals specific metrics to wandb."""
+        if wandb_metrics is None:


Any way you can add some metrics?
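
One possible shape for this, as a sketch only: aggregate the rewards buffered since the last log call into a few scalar metrics and merge them into the dict that `wandb_log` receives. The function name and the metric keys (`train/mean_reward`, `train/pass_rate`, `train/num_rollouts`) are assumptions, and "pass" is taken here to mean a reward above zero.

```python
from typing import Dict, List, Optional


def build_metrics(rewards: List[float], wandb_metrics: Optional[Dict] = None) -> Dict:
    """Merge simple reward statistics into an existing metrics dict."""
    metrics = dict(wandb_metrics or {})  # copy so the caller's dict is untouched
    if rewards:
        metrics["train/mean_reward"] = sum(rewards) / len(rewards)
        # Assumption: any positive reward counts as a passed task.
        metrics["train/pass_rate"] = sum(r > 0 for r in rewards) / len(rewards)
    metrics["train/num_rollouts"] = len(rewards)
    return metrics
```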

dmahan93 · 2026-02-27T08:07:53Z

+            return 0.0
+
+    async def evaluate(self):
+        """Periodic evaluation (optional)."""


can you make an eval somehow?

i'll look into this, yeah

Added an eval set
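
As an illustration of one way to carve out an eval set, the sketch below assigns each task to train or eval by hashing its name, so the split is stable across runs and machines. This is not necessarily how the PR builds its holdout split; the function names and the 10% default are assumptions.

```python
import hashlib
from typing import Iterable, List, Tuple


def is_eval_task(task_name: str, eval_fraction: float = 0.1) -> bool:
    """Deterministically decide whether a task belongs to the holdout set."""
    digest = hashlib.sha256(task_name.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1].
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < eval_fraction


def split_tasks(names: Iterable[str], eval_fraction: float = 0.1) -> Tuple[List[str], List[str]]:
    """Partition task names into (train, eval) lists."""
    train: List[str] = []
    held_out: List[str] = []
    for name in names:
        (held_out if is_eval_task(name, eval_fraction) else train).append(name)
    return train, held_out
```

Because membership depends only on the task name, adding new tasks later does not move existing tasks between the two sets.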

dmahan93 · 2026-02-27T08:09:31Z

+                    "masks": node.masked_tokens,
+                    "scores": reward,
+                }
+                if hasattr(node, "logprobs") and node.logprobs:


You need to include logprobs in the scored data item.

Good callout, will do this
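
A rough sketch of the requested change, assuming the node exposes `tokens`, `masked_tokens`, and `logprobs` as parallel lists. The `"tokens"` and `"inference_logprobs"` key names are assumptions about the scored-item schema and should be checked against Atropos. Raising on missing logprobs is a choice of this sketch, made so the gap cannot pass silently.

```python
from typing import Any, Dict


def build_scored_item(node: Any, reward: float) -> Dict[str, Any]:
    """Build one scored item that always carries per-token logprobs."""
    logprobs = getattr(node, "logprobs", None)
    if not logprobs:
        # Fail loudly instead of emitting an item without logprobs.
        raise ValueError("node has no logprobs; cannot build scored item")
    if len(logprobs) != len(node.tokens):
        raise ValueError("logprobs and tokens must have the same length")
    return {
        "tokens": node.tokens,
        "masks": node.masked_tokens,
        "scores": reward,
        "inference_logprobs": logprobs,  # assumed key name
    }
```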

dmahan93 · 2026-02-27T08:09:51Z

+
+            if nodes:
+                # Phase 2: use actual node data
+                node = nodes[-1]


why nodes[-1] here?

I was going off this:

hermes-agent/environments/hermes_base_env.py

Line 569 in 8b54bb4

node = nodes[-1] # Final sequence node = full trajectory

I assumed nodes[-1] is the accumulation of the full trajectory. Is that not accurate here?

No, @teknium1 needs to fix that too, here's how you use it:

https://github.com/NousResearch/atropos/blob/main/environments/math_server_zero.py#L340-L382

we may have multiple trajectories in the node due to how interesting agents can be, so you may need to return multiple sequences

Fixed this up
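
To make the point about multiple sequences concrete, here is a minimal sketch that walks every node rather than only `nodes[-1]` and collects one entry per sequence into a group of parallel lists. The dict layout and the `"inference_logprobs"` key are assumptions modelled loosely on the linked math_server_zero.py example, and giving every sequence the same episode reward is a simplification of this sketch.

```python
from typing import Any, Dict, List, Sequence


def build_scored_group(nodes: Sequence[Any], reward: float) -> Dict[str, List[Any]]:
    """Collect every sequence node into one group of parallel lists."""
    group: Dict[str, List[Any]] = {
        "tokens": [],
        "masks": [],
        "scores": [],
        "inference_logprobs": [],  # assumed key name
    }
    for node in nodes:  # every node, not just the final one
        group["tokens"].append(node.tokens)
        group["masks"].append(node.masked_tokens)
        group["inference_logprobs"].append(node.logprobs)
        # Simplification: each sequence shares the episode-level reward.
        group["scores"].append(reward)
    return group
```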

Adding endless terminal environment after rebase:

9139eea

samherring99 requested a review from teknium1 February 18, 2026 20:40

samherring99 added 5 commits February 24, 2026 16:35

Updating to use hermes-agent backend and parse container definition out of provided .sif files

f1c2f8a

Updating path vars and dataset loading

b93ad43

Adding config init method

c12e46c

Updating config

0e694b9

Wandb changes

b7e713b

dmahan93 requested changes Feb 27, 2026

View reviewed changes

samherring99 added 3 commits February 27, 2026 11:20

Added task-specific metrics and evals

6fdb38e

Changing return type to be ScoredDataGroup to account for multiple trajectories

fe17b5f

Eval splits for holdout sets

dff5481

teknium1 closed this Mar 17, 2026

sudo-yf pushed a commit to sudo-yf/hermes-agent that referenced this pull request Apr 5, 2026

Merge pull request NousResearch#24 from nesquena/fix/model-discovery-logger-crash

ae1faa7

fix: NameError on logger in custom endpoint model discovery

kshitijk4poor mentioned this pull request Apr 10, 2026

refactor(browser): extract camofox dispatch pattern and reduce boilerplate #7337

Open

alexzhu0 mentioned this pull request Apr 29, 2026

fix(browser): SIGKILL Chrome descendants when reaping orphaned daemons #17547

Closed

GodsBoy mentioned this pull request May 24, 2026

feat(egress): iron-proxy credential-injection firewall for sandboxes #30179

Open

BassMantis99 mentioned this pull request Jun 1, 2026

ci: add native Windows smoke workflow #36154

Draft

acoastalfog mentioned this pull request Jun 3, 2026

Guided KB Review Sessions: Telegram queue cards #38025

Closed
