[Proposal] (safety): enhance /careful and /guard with structured execution judgment


## Summary

The current `/careful` and `/guard` skills warn before destructive commands by matching known-dangerous patterns (rm -rf, DROP TABLE, force-push, etc.).

This is necessary — but not sufficient.

Pattern matching catches *known* dangers. It misses the structural question:
**does the AI actually have enough information to judge whether this action is safe to execute right now?**

This PR proposes integrating the **9-Question Protocol** from the
[execution-boundaries](https://github.com/Jang-woo-AnnaSoft/execution-boundaries) project into `/careful` and `/guard`, alongside a companion ISE (Intent–State–Effect) model for multi-agent execution tracing.

---

## The Root Cause: Boundaries Are Not Declared

Before an AI can judge whether an action is safe, someone must have declared what the boundaries of that action are.

This is the part that is almost always missing.

**The precondition for safe execution is not AI judgment.
It is the prior declaration of boundaries by the manufacturer or developer.**

Whether the actor is a physical IoT device or a software agent, the structure is identical:

- If boundaries are declared → AI can verify against them
- If boundaries are not declared → AI fills the gap with inference
- Inference-filled gaps are not judgment. They are gaps in accountability.

This is not a failure of AI capability. It is a failure of design.

The [execution-boundaries](https://github.com/Jang-woo-AnnaSoft/execution-boundaries) project makes this argument formally.

Its 9-Question Protocol is the minimal, actionable output: a fixed set of questions that must have declared answers before any action may execute. If even one question has no answer, execution must be blocked, not merely flagged with a warning.

> "AI does not fill gaps. It reveals gaps."

This principle applies equally to:

- **IoT / physical devices**: a hardware manufacturer must declare what the device does and which boundaries must never be crossed without human review. We are already preparing experiments to apply this boundary-declaration model to real IoT deployments.
- **Software agents**: an agent developer (or the maintainer of a skill pack like gstack) must declare, per action, what the execution effect is, what the safety boundary is, and who is responsible for each answer.

gstack's SKILL.md files are already close to this idea: they define what each skill does and when it should be invoked. The gap is that they do not yet declare execution boundaries at the action level. This PR proposes closing that gap for the safety-critical skills first.
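
As a rough illustration of what an action-level declaration could hold, here is a minimal sketch. The `ActionBoundary` type, its field names, and the `undeclared` helper are hypothetical and are not part of gstack or the execution-boundaries project. They only mirror the three items named above: execution effect, safety boundary, and responsible party.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ActionBoundary:
    """Hypothetical per-action declaration; field names are assumptions."""
    action: str        # the command or operation being declared
    effect: str        # what happens in reality when it executes
    boundary: str      # what must never be crossed
    responsible: str   # who answers for this declaration


def undeclared(decl: ActionBoundary) -> List[str]:
    """Return the names of fields that were left empty."""
    return [f.name for f in fields(decl) if not getattr(decl, f.name).strip()]


# Example: the safety boundary was never written down.
partial = ActionBoundary(
    action="git push --force",
    effect="overwrites remote branch history",
    boundary="",
    responsible="skill maintainer",
)
gaps = undeclared(partial)  # ["boundary"]
```

A non-empty result is the signal that the declaration is incomplete, which an agent could report instead of inferring the missing value.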

---

## The Problem in gstack Today

When `/careful` is active and Claude is about to run a command, the current behavior is:

1. Match command against a known-dangerous pattern list
2. If matched → warn and ask for confirmation
3. If not matched → proceed

This works for obvious cases. But consider:

- `git push --force` on the wrong remote (not dangerous in isolation, fatal in context)
- `DELETE FROM orders WHERE status='pending'` (destructive only if intent is wrong)
- An agent in a 10-agent parallel sprint deleting a shared resource that another agent depends on

The problem is not that the command is on a list.

The problem is that **intent, context, and effect have not been explicitly declared or confirmed** — by the developer who wrote the skill, or by the user who invoked it.
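
The three-step flow described above, and the way a context-dependent command slips through it, can be sketched as follows. This is a simplified model, not the actual `/careful` implementation: the `DANGEROUS_PATTERNS` list and the `check_command` function are hypothetical, and the sketch assumes the list does not cover `DELETE FROM`.

```python
import re

# Hypothetical pattern list; the real /careful list is not reproduced here.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*--force\b"),
]


def check_command(command: str) -> str:
    """Return "warn" if the command matches a known pattern, else "proceed"."""
    for pattern in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return "warn"
    return "proceed"


# A listed pattern is caught.
listed = check_command("rm -rf build/")

# A command whose danger depends on intent is not on the list, so it runs.
unlisted = check_command("DELETE FROM orders WHERE status='pending'")
```

The matcher only sees the command string. It has no input for the remote, the environment, or what the user meant, so it cannot tell a safe force-push from a fatal one.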

---

## The Proposal

### For `/careful` — 9-Question Protocol at execution time

Before executing any action flagged as potentially destructive, the agent must be able to answer all nine questions. If even one is unanswered, execution is blocked.

| # | Question | Responsible Party |
|---|----------|-------------------|
| Q1 | What is the intent of this action? | User / Manufacturer |
| Q2 | What happens in reality when this executes? | Manufacturer / Agent |
| Q3 | What boundary must never be crossed? | Manufacturer / Agent |
| Q4 | In what context is this action valid? | User |
| Q5 | What event has occurred? (start / stop) | Observation Layer |
| Q6 | How far has the goal been reached? | Observation Layer |
| Q7 | For how long can responsibility be held? | Manufacturer / Agent |
| Q8 | Does starting this affect anything else? | User / Manufacturer / Agent |
| Q9 | Does stopping this cause a problem? | User / Manufacturer / Agent |

The agent resolves what it can from context, and surfaces only the
unanswered questions to the user. It does not guess. It asks.
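
The blocking rule can be sketched in a few lines. This is an illustration under assumptions, not the protocol's reference implementation: the `Verdict` type and the `evaluate` function are hypothetical names, and an answer counts as declared only if it is a non-empty string.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Question text taken from the table above.
QUESTIONS: Dict[str, str] = {
    "Q1": "What is the intent of this action?",
    "Q2": "What happens in reality when this executes?",
    "Q3": "What boundary must never be crossed?",
    "Q4": "In what context is this action valid?",
    "Q5": "What event has occurred? (start / stop)",
    "Q6": "How far has the goal been reached?",
    "Q7": "For how long can responsibility be held?",
    "Q8": "Does starting this affect anything else?",
    "Q9": "Does stopping this cause a problem?",
}


@dataclass
class Verdict:
    allowed: bool
    unanswered: List[str]  # question ids to surface to the user


def evaluate(answers: Dict[str, Optional[str]]) -> Verdict:
    """Block unless every one of the nine questions has a declared answer."""
    unanswered = [
        qid for qid in QUESTIONS
        if not (answers.get(qid) or "").strip()
    ]
    return Verdict(allowed=not unanswered, unanswered=unanswered)


# Example: everything is resolved from context except Q4.
answers = {qid: "declared" for qid in QUESTIONS}
answers["Q4"] = None
verdict = evaluate(answers)
to_ask = [QUESTIONS[qid] for qid in verdict.unanswered]
```

Here `verdict.allowed` is false and `to_ask` holds only the Q4 text, so the user sees one question rather than all nine.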

### For `/guard` — ISE model for multi-agent execution tracing

In parallel sprint environments (10–15 agents via Conductor), the combination of `/careful` and `/freeze` prevents accidental file edits. But it does not track **which agent executed what, with what declared intent, and what effect it had** across the session.

The ISE (Intent–State–Effect) model adds:

- **Intent**: declared before execution (what the agent is trying to do)
- **State**: system state at the moment of execution (branch, env, files in scope)
- **Effect**: recorded after execution (what actually changed)

This makes `/guard` sessions auditable and supports post-sprint `/retro` analysis.
More importantly, it begins building the habit of declaration-first execution — the same habit that the execution-boundaries project argues must exist at the manufacturer and developer level, before any agent runs.
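
One possible shape for an ISE record and a per-session log is sketched below. The `ISERecord` and `ExecutionLog` names, their fields, and the JSON Lines output are assumptions made for illustration. The ISE model as described here does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ISERecord:
    agent_id: str
    intent: str             # declared before execution
    state: Dict[str, str]   # e.g. branch, env, files in scope
    effect: str = ""        # recorded after execution


class ExecutionLog:
    """Hypothetical in-memory audit trail for one /guard session."""

    def __init__(self) -> None:
        self._records: List[ISERecord] = []

    def declare(self, agent_id: str, intent: str, state: Dict[str, str]) -> ISERecord:
        # Declaration-first: refuse to open a record without an intent.
        if not intent.strip():
            raise ValueError("intent must be declared before execution")
        record = ISERecord(agent_id, intent, dict(state))
        self._records.append(record)
        return record

    def record_effect(self, record: ISERecord, effect: str) -> None:
        record.effect = effect

    def by_agent(self, agent_id: str) -> List[ISERecord]:
        return [r for r in self._records if r.agent_id == agent_id]

    def to_jsonl(self) -> str:
        # One JSON object per line, suitable for later review.
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self._records)


log = ExecutionLog()
rec = log.declare("agent-3", "remove stale build cache", {"branch": "feature/x", "env": "dev"})
# ... the action itself would run here ...
log.record_effect(rec, "deleted 12 files under build/")
```

Filtering with `by_agent` answers the "which agent executed what" question, and the serialized log is one candidate input for a post-sprint review.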

---

## The Larger Picture

This PR is small in scope. But it points toward something larger.

As AI agents gain the ability to execute actions with real-world
consequences — in codebases, in production systems, in physical devices — the industry will need a norm where:

1. **Device manufacturers** declare execution boundaries in their hardware and firmware, before an AI is ever connected to it.
2. **Agent developers** declare execution boundaries in their skill
   definitions, before a user ever invokes them.
3. **AI agents** verify against those declared boundaries at runtime, and block — not warn — when the boundary conditions are not met.

gstack is well-positioned to model step 2 for the software agent ecosystem.
The execution-boundaries project is building toward step 1 for the physical world, and step 2 for agents in general.

These are the same problem. The execution boundary does not care whether the actuator is a robot arm or a bash command.

---

## Implementation Path

This PR is a proposal and design note, not yet a SKILL.md diff.

Next steps if the approach is accepted:

1. Update `careful/SKILL.md.tmpl` to integrate the 9-question check before destructive command execution
2. Update `guard/SKILL.md.tmpl` to add ISE logging per agent action
3. Optionally, add a new `/execution-log` command to surface the ISE audit trail
4. Longer term: a boundary declaration format in SKILL.md that agent
   developers fill out per action — making gstack skills the reference
   implementation for agent-level execution boundaries

---

## References

- [execution-boundaries repo](https://github.com/Jang-woo-AnnaSoft/execution-boundaries)
- [9-Question Protocol](https://discuss.huggingface.co/t/the-9-question-protocol-for-responsible-ai-actions/173045)
- [ISE Model](https://discuss.huggingface.co/t/ise-intent-state-effect-model-isolating-judgment-for-cost-optimization-and-explainable-safety/172853)
- [Stop Turning Buttons into Switches](https://discuss.huggingface.co/t/stop-turning-buttons-into-switches/173264)
- [Making the Physical World Callable for AI](https://discuss.huggingface.co/t/making-the-physical-world-callable-for-ai/172627)

Labels 
enhancement · safety · discussion


[Proposal] (safety): enhance /careful and /guard with structured execution judgment #1091
