Endless Terminals Environment Integration #24

Closed
samherring99 wants to merge 9 commits into main from endless-terminal-new
samherring99 (Contributor) commented Feb 18, 2026

This PR covers the first implementation of endless-terminals (dataset, repo, paper).

To run this, point your Atropos setup at the ./environments/endless_terminals/endless_terminals_env.py environment.

Download Tasks

huggingface-cli download obiwan96/endless-terminals-train --repo-type dataset --local-dir ~/endless-terminals-data --local-dir-use-symlinks False

Testing the environment

python environments/endless_terminals/endless_terminals_env.py process --env.tasks_base_dir /path/to/endless-terminals --env.total_steps 1 --env.group_size 1 --env.use_wandb false --openai.model_name "anthropic/claude-sonnet-4.5" --openai.base_url "https://openrouter.ai/api/v1" --openai.server_type "openai" --openai.api_key "$OPENROUTER_API_KEY"

terminal_backend="local",
use_dataset=True,
tasks_base_dir="",
group_size=1,

Collaborator

Can you increase the group size default here?

Contributor Author

Forgot to update this after testing. Good callout, will do.

use_dataset=True,
tasks_base_dir="",
group_size=1,
total_steps=1,

Collaborator

and the total steps

Contributor Author

Same as above.

task_name = item.get("task_name", "unknown")
docker_image = item.get("docker_image", self.config.default_docker_image)

print(f"[DEBUG] collect_trajectory START for {task_name}", flush=True)

Collaborator

can you switch print to logging?

Contributor Author

yep, will do this as well
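
A minimal sketch of the requested change, assuming a module-level logger. The logger name and the helper `log_trajectory_start` are hypothetical and do not come from the PR; only the message text is taken from the diff above.

```python
import logging

# Module-level logger; the name is an assumption for illustration.
logger = logging.getLogger("endless_terminals_env")


def log_trajectory_start(task_name: str) -> None:
    """Emit the former print-based debug line through the logging module."""
    # %-style arguments defer string formatting until a handler accepts the record.
    logger.debug("collect_trajectory START for %s", task_name)


if __name__ == "__main__":
    # Configure output once at the entry point rather than inside the environment.
    logging.basicConfig(level=logging.DEBUG)
    log_trajectory_start("unknown")
```

With this shape the debug line can be silenced or redirected by changing the log level, with no edits to the environment code.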


async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
    """Log Endless Terminals specific metrics to wandb."""
    if wandb_metrics is None:

Collaborator

Any way you can add some metrics?
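
One possible shape for such metrics, sketched under the assumption that the environment buffers per-rollout rewards and turn counts between logging calls. The function `build_terminal_metrics`, the metric names, and the "reward > 0 counts as a pass" rule are all hypothetical; the PR does not say which metrics were eventually added.

```python
from statistics import mean
from typing import Dict, List, Optional


def build_terminal_metrics(
    rewards: List[float],
    turns_used: List[int],
    wandb_metrics: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Merge aggregate rollout statistics into a metrics dict."""
    # Copy so the caller's dict is never mutated.
    metrics = dict(wandb_metrics or {})
    if rewards:
        metrics["train/mean_reward"] = mean(rewards)
        # Treating any positive reward as a pass is an assumption.
        metrics["train/pass_rate"] = sum(r > 0 for r in rewards) / len(rewards)
    if turns_used:
        metrics["train/mean_turns"] = mean(turns_used)
    return metrics
```

The returned dict could then be handed to whatever logging call `wandb_log` already makes, and the buffers cleared afterwards.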

return 0.0

async def evaluate(self):
"""Periodic evaluation (optional)."""

Collaborator

can you make an eval somehow?

Contributor Author

i'll look into this, yeah

Contributor Author

Added an eval set
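
The PR does not describe how the eval set was built. As an illustration only, one common approach is a deterministic hold-out keyed on the task name, so the same tasks land in the eval split on every run. The function `split_tasks` and its `eval_percent` parameter are hypothetical.

```python
import hashlib
from typing import List, Tuple


def split_tasks(
    task_names: List[str], eval_percent: int = 10
) -> Tuple[List[str], List[str]]:
    """Split task names into (train, eval) using a stable hash of each name."""
    train: List[str] = []
    held_out: List[str] = []
    for name in task_names:
        # A content hash keeps the split stable across runs and machines,
        # unlike the built-in hash(), which is salted per process for strings.
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        (held_out if bucket < eval_percent else train).append(name)
    return train, held_out
```

Because membership depends only on the name, adding new tasks later never moves an existing task between splits.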

"masks": node.masked_tokens,
"scores": reward,
}
if hasattr(node, "logprobs") and node.logprobs:

Collaborator

You need to include logprobs in the scored data item.

Contributor Author

Good callout, will do this
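
A sketch of what the reviewer is asking for, assuming the scored item is a plain dict. The `Node` dataclass is a stand-in for the real node type; `masked_tokens` and `logprobs` appear in the diff above, while the `tokens` field and the `"tokens"` and `"logprobs"` dict keys are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    """Stand-in for the sequence node in the diff; `tokens` is an assumed field."""
    tokens: List[int]
    masked_tokens: List[int]
    logprobs: Optional[List[float]] = None


def to_scored_item(node: Node, reward: float) -> Dict[str, Any]:
    """Build one scored item, attaching logprobs whenever the node has them."""
    item: Dict[str, Any] = {
        "tokens": node.tokens,
        "masks": node.masked_tokens,
        "scores": reward,
    }
    # getattr also covers node types that lack the attribute entirely.
    logprobs = getattr(node, "logprobs", None)
    if logprobs:
        item["logprobs"] = logprobs  # key name is an assumption
    return item
```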


if nodes:
    # Phase 2: use actual node data
    node = nodes[-1]

Collaborator

why nodes[-1] here?

Contributor Author

I was going off this:

node = nodes[-1] # Final sequence node = full trajectory

My assumption was that nodes[-1] accumulates the full trajectory; is that not accurate here?

Collaborator

we may have multiple trajectories in the node due to how interesting agents can be, so you may need to return multiple sequences

Contributor Author

Fixed this up
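
The actual fix is not shown in the thread. As a rough sketch of the reviewer's suggestion, the code below emits one scored item per node instead of reading only `nodes[-1]`. The `Node` stand-in, its `tokens` field, the helper name, and the choice to give every sequence the same episode reward are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Node:
    """Stand-in for the real node type; `tokens` is an assumed field."""
    tokens: List[int]
    masked_tokens: List[int]


def nodes_to_scored_items(nodes: List[Node], reward: float) -> List[Dict[str, Any]]:
    """Return one scored item per node rather than only the final one."""
    return [
        # Sharing the episode reward across sequences is an assumption.
        {"tokens": n.tokens, "masks": n.masked_tokens, "scores": reward}
        for n in nodes
    ]
```

An empty node list yields an empty result, so the caller can fall back to its existing no-node path.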

teknium1 closed this Mar 17, 2026
sudo-yf pushed a commit to sudo-yf/hermes-agent that referenced this pull request Apr 5, 2026
…logger-crash

fix: NameError on logger in custom endpoint model discovery
h4x3rotab pushed a commit to Clawdi-AI/hermes-agent that referenced this pull request Apr 10, 2026
…on export, pinned sessions, context meter

Ported from ibelick/webclaw PRs NousResearch#24, #10, NousResearch#14, NousResearch#13:
- Command palette (⌘K): search and switch sessions instantly
- Conversation export: download as Markdown, JSON, or Plain Text
- Pinned sessions: pin/unpin from context menu, shown at top of sidebar
- Context meter: token usage ring in chat header with hover details
- Keyboard shortcuts: ⌘K search, ⌘⇧O new session

New UI primitives: autocomplete, command, input, preview-card
Attachment button/preview components (composer already has built-in support)
h4x3rotab pushed a commit to Clawdi-AI/hermes-agent that referenced this pull request Apr 10, 2026
…f light

Cherry-picked from PR NousResearch#24 (clawjasper56). One-liner: respects OS
dark/light preference out of the box for new users.
renerocksai added a commit to renerocksai/hermes-agent that referenced this pull request Apr 28, 2026
…agents (phase 10)

The SDK landed PRs NousResearch#24/NousResearch#25/NousResearch#26 in synadia-ai/synadia-agents:
- verb-first subjects (`agents.prompt.{a}.{o}.{s}`, `agents.hb.{a}.{o}.{s}`,
  new `agents.status.{a}.{o}.{s}`) and `metadata.protocol_version="0.3"`
- pinned `_INBOX.agents` reply-inbox prefix (caller-side; no-op for us)
- `name`+`session` collapsed into a single `session_name` (the 5th subject
  token) — `Envelope.session` and the `session=` kwarg on `AgentService` /
  `Agent.prompt` are gone. One service = one session_name.

Package + import root rename: `natsagent` → `synadia-ai-agents`,
`synadia_ai.agents`. Service-side class `Agent` → `AgentService`.

Adapter changes:
- Adopt single-service-per-session: rely on Hermes profile isolation for
  multi-session deployments instead of building an envelope.session demuxer
  on top of `AgentService`. The `_session_locks` dict collapses to a single
  `_session_lock`.
- The SDK explicitly does not own NATS connections: callers build the
  client. Adapter calls `nats.connect(servers=...)` or
  `nats.connect(**sdk.load_context_options(name))` directly.
- Config: `extra.name` + `extra.session_default` → required
  `extra.session_name`; env var `HERMES_NATS_NAME`/`HERMES_NATS_SESSION` →
  `HERMES_NATS_SESSION_NAME`. No migration shim — branch hadn't merged.
- Lock identity rebuilt as `{agent}:{owner}:{session_name}`.

Tests + docs:
- conftest mock renamed `_ensure_natsagent_mock` → `_ensure_synadia_agents_mock`,
  installs under `sys.modules["synadia_ai.agents"]`, also stubs `nats` so
  the adapter's `nats.connect(...)` resolves under test.
- New `mock_nats` fixture in test_nats_connect.py; concurrent-distinct-
  sessions test removed (v0.2-only concept); positive test added that
  chat_id is sourced from `settings.session_name` regardless of any stray
  envelope field.
- design doc §1-§6/§11/§17 updated for v0.3; progress doc gains a Phase 10
  decision-log entry; user-facing nats.md rewritten with verb-first subject
  examples, status endpoint walkthrough, and `_INBOX.agents.>` permission
  note.

Live-verified end-to-end against `nats-server -p 4223` + `hermes-local`
context + `model: anthropic/claude-haiku-4.5` over OpenRouter: real prompt
streamed a real haiku reply through `agents.prompt.hermes.rene.local`,
multi-turn session continuity intact, `/status` slash command dispatched
through the gateway's command registry. Discovery shows
`protocol_version: 0.3`. Heartbeats fire on `agents.hb.hermes.rene.local`.
Status endpoint replies on `agents.status.hermes.rene.local`.

NATS gateway tests: 190/190 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
renerocksai added a commit to synadia-ai/hermes-agent that referenced this pull request May 6, 2026
…agents (phase 10)

renerocksai added a commit to synadia-ai/hermes-agent-work that referenced this pull request May 18, 2026
…agents (phase 10)

difeizheng pushed a commit to difeizheng/zdf-hermes-agent that referenced this pull request Jun 3, 2026
Fixes 12 remaining MEDIUM issues from the deep audit (19 total, 7 fixed in Round 12):

design_agent:
- NousResearch#15: add asyncio.wait_for(300s) around LLM API call to prevent infinite hangs
- NousResearch#17: replace 2x hardcoded 'claude-opus-4-8' with shared DEFAULT_MODEL constant

qa_agent / validate_agent:
- NousResearch#20,NousResearch#22,NousResearch#23: already fixed in Round 12 (verified — dynamic timeout/threshold values used)

memory.py:
- NousResearch#24: frontmatter parser uses regex r'^---$' instead of str.split('---',2),
  preventing false splits on content containing '---' (SQL, markdown tables)
- NousResearch#25: parse and preserve 'description' field from frontmatter in metadata,
  fixing write→load roundtrip data loss

profiles.py:
- NousResearch#26: ProfileConfig now frozen=True (immutable dataclass per coding standards)

deploy_agent:
- NousResearch#31: replace 2x sync subprocess.run with asyncio.create_subprocess_exec
- fix 5x .decode() → .decode('utf-8', errors='replace') for Windows CJK safety
- remove unused import subprocess

db.py:
- NousResearch#27: add class docstring explaining RLock + _unlocked pattern
- NousResearch#28: FK constraints already in DDL (verified PRAGMA foreign_keys=ON active)
- NousResearch#29: add _ensure_connection() with PRAGMA integrity_check(1) + auto-reconnect
       on 4 critical methods (create_task, get_task, claim_task, submit_result)
- extract _create_connection() static method for reuse by reconnect

Tests: 79 passed, 0 failed

3 participants