Endless Terminals Environment Integration #24
    terminal_backend="local",
    use_dataset=True,
    tasks_base_dir="",
    group_size=1,
can you increase the group size default here
forgot to update this after testing, good callout, will do
    use_dataset=True,
    tasks_base_dir="",
    group_size=1,
    total_steps=1,
same as above
    task_name = item.get("task_name", "unknown")
    docker_image = item.get("docker_image", self.config.default_docker_image)

    print(f"[DEBUG] collect_trajectory START for {task_name}", flush=True)
can you switch print to logging?
yep, will do this as well
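A minimal sketch of the print-to-logging switch requested above, assuming a module-level logger and a hypothetical helper name (`log_trajectory_start` is not from the PR). Lazy `%s` formatting keeps the message from being built when debug logging is off.

```python
import logging

# Module-level logger; handlers and levels are configured by the application.
logger = logging.getLogger(__name__)


def log_trajectory_start(task_name: str) -> None:
    """Hypothetical helper replacing the debug print in collect_trajectory."""
    # Emitted only when the logger is enabled for DEBUG.
    logger.debug("collect_trajectory START for %s", task_name)
```

With this in place, debug output can be turned on per run with `logging.basicConfig(level=logging.DEBUG)` instead of editing the environment code.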
    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
        """Log Endless Terminals specific metrics to wandb."""
        if wandb_metrics is None:
any way you can add some metrics?
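One possible shape for such metrics, as a rough sketch only: the helper name, the metric keys, and the idea of buffering per-rollout scores and turn counts are all assumptions, not something the PR defines.

```python
from statistics import mean
from typing import Dict, List, Optional


def add_rollout_metrics(
    scores: List[float],
    turns: List[int],
    wandb_metrics: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Hypothetical helper: fold buffered rollout stats into a metrics dict."""
    if wandb_metrics is None:
        wandb_metrics = {}
    if scores:
        wandb_metrics["train/mean_reward"] = mean(scores)
        # Counts a rollout as a success when its reward is positive (assumption).
        wandb_metrics["train/success_rate"] = sum(s > 0 for s in scores) / len(scores)
    if turns:
        wandb_metrics["train/mean_turns"] = mean(turns)
    return wandb_metrics
```

A `wandb_log` override could call something like this on its buffers before handing the dict to the base class.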
        return 0.0

    async def evaluate(self):
        """Periodic evaluation (optional)."""
can you make an eval somehow?
i'll look into this, yeah
Added an eval set
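How the eval set was built is not shown in this thread. As an illustration only, a deterministic held-out split over task names could look like the sketch below; `split_tasks`, `eval_fraction`, and `seed` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def split_tasks(
    tasks: Sequence[str], eval_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (train_tasks, eval_tasks), reproducibly."""
    # Sort first so the split does not depend on directory listing order.
    shuffled = sorted(tasks)
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one task whenever any tasks exist.
    n_eval = max(1, int(len(shuffled) * eval_fraction)) if shuffled else 0
    return shuffled[n_eval:], shuffled[:n_eval]
```

Fixing the seed keeps the eval tasks out of training across restarts.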
        "masks": node.masked_tokens,
        "scores": reward,
    }
    if hasattr(node, "logprobs") and node.logprobs:
you need to include logprobs into the scored data item
Good callout, will do this
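A hedged sketch of carrying logprobs into the scored item. The `Node` class is a stand-in, `tokens` is an assumed attribute, and the `inference_logprobs` key is an assumption about what the trainer expects; check it against the Atropos version in use.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for one rollout node."""
    tokens: List[int]
    masked_tokens: List[int]
    logprobs: List[float] = field(default_factory=list)


def node_to_scored_item(node: Node, reward: float) -> Dict[str, Any]:
    """Build a scored item and attach logprobs when the node has them."""
    item: Dict[str, Any] = {
        "tokens": node.tokens,
        "masks": node.masked_tokens,
        "scores": reward,
    }
    logprobs = getattr(node, "logprobs", None)
    if logprobs:
        # Logprobs are only useful if they line up one-to-one with tokens.
        if len(logprobs) != len(node.tokens):
            raise ValueError("logprobs must align one-to-one with tokens")
        item["inference_logprobs"] = list(logprobs)  # key name is an assumption
    return item
```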
    if nodes:
        # Phase 2: use actual node data
        node = nodes[-1]
I was going off this: hermes-agent/environments/hermes_base_env.py, line 569 in 8b54bb4, assuming that nodes[-1] is the accumulation of the full trajectory. Is that not accurate here?
No, @teknium1 needs to fix that too, here's how you use it:
https://github.com/NousResearch/atropos/blob/main/environments/math_server_zero.py#L340-L382
we may have multiple trajectories in the node due to how interesting agents can be, so you may need to return multiple sequences
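A rough sketch of that point: iterate over every node instead of taking only `nodes[-1]`, and return one sequence per node. The attribute `tokens`, the shared reward per sequence, and the demo values are assumptions for illustration; the linked `math_server_zero.py` is the authoritative reference.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence


def nodes_to_sequences(nodes: Sequence[Any], reward: float) -> List[Dict[str, Any]]:
    """Return one scored sequence per node rather than only the last node."""
    sequences: List[Dict[str, Any]] = []
    for node in nodes:
        sequences.append(
            {
                "tokens": list(node.tokens),
                "masks": list(node.masked_tokens),
                # Assumption: every sequence of the episode shares its reward.
                "scores": reward,
            }
        )
    return sequences


# Illustrative nodes only; real nodes come from the rollout server.
demo_nodes = [
    SimpleNamespace(tokens=[1, 2], masked_tokens=[-100, 2]),
    SimpleNamespace(tokens=[3, 4, 5], masked_tokens=[-100, -100, 5]),
]
demo = nodes_to_sequences(demo_nodes, reward=1.0)
```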
This PR covers the first implementation of endless-terminals (dataset, repo, paper).
To run this, simply point your atropos environment to the ./environments/endless_terminals/endless_terminals_env.py environment.
Download Tasks
Testing the environment