
[Proposal] (safety): enhance /careful and /guard with structured execution judgment #1091

@Jang-woo-AnnaSoft


Summary

The current /careful and /guard skills warn before destructive commands by matching known-dangerous patterns (rm -rf, DROP TABLE, force-push, etc.).

This is necessary — but not sufficient.

Pattern matching catches known dangers. It misses the structural question:
does the AI actually have enough information to judge whether this action is safe to execute right now?

This PR proposes integrating the 9-Question Protocol from the
execution-boundaries project into /careful and /guard, alongside a companion ISE (Intent–State–Effect) model for multi-agent execution tracing.


The Root Cause: Boundaries Are Not Declared

Before an AI can judge whether an action is safe, someone must have declared what the boundaries of that action are.

This is the part that is almost always missing.

The precondition for safe execution is not AI judgment.
It is the prior declaration of boundaries by the manufacturer or developer.

Whether the actor is a physical IoT device or a software agent, the structure is identical:

  • If boundaries are declared → AI can verify against them
  • If boundaries are not declared → AI fills the gap with inference
  • Inference-filled gaps are not judgment. They are gaps in accountability.

This is not a failure of AI capability. It is a failure of design.

The execution-boundaries project makes this argument formally.

Its 9-Question Protocol is the minimal, actionable output:
a fixed set of questions that must have declared answers before any action may execute. If even one question has no answer, execution must be blocked — not warned, blocked.

"AI does not fill gaps. It reveals gaps."

This principle applies equally to:

  • IoT / physical devices: a hardware manufacturer must declare what the device does and which boundaries must never be crossed before human review.
    We are already preparing experiments to apply this boundary-declaration model to real IoT deployments.

  • Software agents: an agent developer (or the maintainer of a skill pack like gstack) must declare, per action, what the execution effect is, what the safety boundary is, and who is responsible for each answer.

gstack's SKILL.md files are already close to this idea — they define what each skill does and when it should be invoked.
The gap is that they do not yet declare execution boundaries at the action level. This PR proposes closing that gap for the safety-critical skills first.


The Problem in gstack Today

When /careful is active and Claude is about to run a command, the current behavior is:

  1. Match command against a known-dangerous pattern list
  2. If matched → warn and ask for confirmation
  3. If not matched → proceed

This works for obvious cases.

But consider:

  • git push --force on the wrong remote (not dangerous in isolation, fatal in context)
  • DELETE FROM orders WHERE status='pending' (destructive only if intent is wrong)
  • An agent in a 10-agent parallel sprint deleting a shared resource that another agent depends on
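
The three-step behavior above, and the way a context-dependent command slips past it, can be sketched roughly as follows. This is a minimal illustration, not the actual /careful implementation: the pattern list and the name `check_command` are assumptions made for the example.

```python
import re

# Hypothetical pattern list for illustration only; the real /careful
# list is not reproduced here.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*(--force|-f)\b"),
]


def check_command(command: str) -> str:
    """Return 'warn' if the command matches a known pattern, else 'proceed'."""
    for pattern in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return "warn"
    return "proceed"


# Known-dangerous commands are caught.
print(check_command("rm -rf build/"))
print(check_command("git push --force origin main"))

# A destructive statement that is not on the list passes straight through,
# regardless of whether the intent behind it was ever confirmed.
print(check_command("DELETE FROM orders WHERE status='pending'"))
```

The matcher only sees the command text. It cannot know which remote is the right one, or whether deleting pending orders is what the user meant.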

The problem is not whether the command appears on a list.

The problem is that intent, context, and effect have not been explicitly declared or confirmed — by the developer who wrote the skill, or by the user who invoked it.


The Proposal

For /careful — 9-Question Protocol at execution time

Before executing any action flagged as potentially destructive, the agent must be able to answer all nine questions. If even one is unanswered, execution is blocked.

| # | Question | Responsible Party |
|---|---|---|
| Q1 | What is the intent of this action? | User / Manufacturer |
| Q2 | What happens in reality when this executes? | Manufacturer / Agent |
| Q3 | What boundary must never be crossed? | Manufacturer / Agent |
| Q4 | In what context is this action valid? | User |
| Q5 | What event has occurred? (start / stop) | Observation Layer |
| Q6 | How far has the goal been reached? | Observation Layer |
| Q7 | For how long can responsibility be held? | Manufacturer / Agent |
| Q8 | Does starting this affect anything else? | User / Manufacturer / Agent |
| Q9 | Does stopping this cause a problem? | User / Manufacturer / Agent |

The agent resolves what it can from context, and surfaces only the
unanswered questions to the user. It does not guess. It asks.
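
As a rough sketch of the gate described above: the nine questions are fixed, any question without a declared answer blocks execution, and only the unanswered ones are surfaced. The names `QUESTIONS`, `Judgment`, and `judge` are hypothetical, and how answers are actually resolved from context is left out.

```python
from dataclasses import dataclass
from typing import Dict, List

# The nine questions from the table above, keyed by their identifiers.
QUESTIONS: Dict[str, str] = {
    "Q1": "What is the intent of this action?",
    "Q2": "What happens in reality when this executes?",
    "Q3": "What boundary must never be crossed?",
    "Q4": "In what context is this action valid?",
    "Q5": "What event has occurred? (start / stop)",
    "Q6": "How far has the goal been reached?",
    "Q7": "For how long can responsibility be held?",
    "Q8": "Does starting this affect anything else?",
    "Q9": "Does stopping this cause a problem?",
}


@dataclass
class Judgment:
    allowed: bool          # True only when all nine questions are answered
    unanswered: List[str]  # question ids to surface to the user


def judge(answers: Dict[str, str]) -> Judgment:
    """Block unless every question has a non-empty declared answer."""
    unanswered = [q for q in QUESTIONS if not answers.get(q, "").strip()]
    return Judgment(allowed=not unanswered, unanswered=unanswered)


# Usage: the agent has resolved seven answers from context; Q4 and Q9 are open.
answers = {q: "declared" for q in QUESTIONS}
del answers["Q4"]
answers["Q9"] = ""
result = judge(answers)
if not result.allowed:
    for q in result.unanswered:
        print(f"{q}: {QUESTIONS[q]}")
```

The sketch deliberately has no fallback path: a missing answer is never replaced with a guess, it is returned to the caller as a question.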

For /guard — ISE model for multi-agent execution tracing

In parallel sprint environments (10–15 agents via Conductor), /careful + /freeze prevents accidental file edits. But it does not track which agent executed what, with what declared intent, and what effect it had across the session.

The ISE (Intent–State–Effect) model adds:

  • Intent: declared before execution (what the agent is trying to do)
  • State: system state at the moment of execution (branch, env, files in scope)
  • Effect: recorded after execution (what actually changed)

This makes /guard sessions auditable and supports post-sprint /retro analysis.
More importantly, it begins building the habit of declaration-first execution — the same habit that the execution-boundaries project argues must exist at the manufacturer and developer level, before any agent runs.
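
One possible shape for an ISE record and a per-session log is sketched below. It assumes an in-memory log and free-text fields; `ISERecord`, `ISELog`, and their methods are illustrative names, not part of gstack or Conductor.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class ISERecord:
    agent_id: str
    intent: str                   # declared before execution
    state: Dict[str, str]         # snapshot at execution time (branch, env, ...)
    effect: Optional[str] = None  # filled in after execution


class ISELog:
    def __init__(self) -> None:
        self._records: List[ISERecord] = []

    def declare(self, agent_id: str, intent: str, state: Dict[str, str]) -> ISERecord:
        """Record intent and state; refuse to log an action with no declared intent."""
        if not intent.strip():
            raise ValueError("intent must be declared before execution")
        record = ISERecord(agent_id, intent, dict(state))
        self._records.append(record)
        return record

    def record_effect(self, record: ISERecord, effect: str) -> None:
        """Attach the observed effect once the action has run."""
        record.effect = effect

    def by_agent(self, agent_id: str) -> List[ISERecord]:
        return [r for r in self._records if r.agent_id == agent_id]

    def to_jsonl(self) -> str:
        """Serialize the audit trail, one JSON object per line."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self._records)


# Usage: declare first, execute, then record what actually changed.
log = ISELog()
rec = log.declare("agent-3", "delete stale build cache", {"branch": "feature/x", "env": "dev"})
log.record_effect(rec, "removed files under build/")
print(log.to_jsonl())
```

Requiring `declare` before `record_effect` mirrors the declaration-first ordering: an effect can only be attached to an action whose intent was already on record.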


The Larger Picture

This PR is small in scope. But it points toward something larger.

As AI agents gain the ability to execute actions with real-world
consequences — in codebases, in production systems, in physical devices — the industry will need a norm where:

  1. Device manufacturers declare execution boundaries in their hardware and firmware, before an AI is ever connected to it.
  2. Agent developers declare execution boundaries in their skill
    definitions, before a user ever invokes them.
  3. AI agents verify against those declared boundaries at runtime, and block — not warn — when the boundary conditions are not met.

gstack is well-positioned to model step 2 for the software agent ecosystem.
The execution-boundaries project is building toward step 1 for the physical world, and step 2 for agents in general.

These are the same problem. The execution boundary does not care whether the actuator is a robot arm or a bash command.


Implementation Path

This PR is a proposal and design note, not yet a SKILL.md diff.

Next steps if the approach is accepted:

  1. Update careful/SKILL.md.tmpl to integrate the 9-question check before destructive command execution
  2. Update guard/SKILL.md.tmpl to add ISE logging per agent action
  3. Optionally: add a new /execution-log command to surface the ISE audit trail
  4. Longer term: a boundary declaration format in SKILL.md that agent
    developers fill out per action — making gstack skills the reference
    implementation for agent-level execution boundaries
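
To make the longer-term item more concrete, here is a hedged sketch of what checking a per-action boundary declaration could look like. No declaration format exists in SKILL.md yet; the field names `effect`, `boundary`, and `responsible` are assumptions taken from the wording earlier in this proposal, and `missing_declarations` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed fields: what the action does, what must never be crossed,
# and who answers for the declaration.
REQUIRED_FIELDS = ("effect", "boundary", "responsible")


def missing_declarations(actions: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each action name to the required fields it has not declared."""
    gaps: Dict[str, List[str]] = {}
    for name, declaration in actions.items():
        missing = [f for f in REQUIRED_FIELDS if not declaration.get(f, "").strip()]
        if missing:
            gaps[name] = missing
    return gaps


# Usage: one fully declared action, one with gaps.
skill_actions = {
    "force-push": {
        "effect": "overwrites remote branch history",
        "boundary": "never on a protected branch",
        "responsible": "skill maintainer",
    },
    "bulk-delete": {
        "effect": "removes rows matching a filter",
    },
}
print(missing_declarations(skill_actions))
```

A check like this could run when a skill is authored rather than when it executes, so undeclared boundaries are found by the developer instead of being discovered at runtime.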


Labels
enhancement · safety · discussion
