Philip

Lift declarative artifacts into the right substrate. Diagrams, playbooks, and queries already encode dataflow or control flow; Philip turns them into runnable Burr state machines and Hamilton DAGs you can audit, replay, and introspect. Drive the resulting FSMs under MCP gating via Theodosia, or run any lifted artifact locally as a regular Python application.

Sources lifting to Burr state machines:

Ansible playbooks (YAML) — philip.from_playbook(path). Tasks become actions, conditions become guards, failures become classified transitions.
Mermaid stateDiagram-v2 — philip.from_mermaid(path). Paste a diagram, get a runnable FSM. Multi-outbound branches lift to _choice guards the actor picks at runtime.

Sources lifting to Hamilton DAGs (install the hamilton extra):

Mermaid flowchart / graph — philip.from_mermaid_flow(path). Each node becomes a Hamilton function whose parameter names declare its upstream dependencies, exactly mirroring the diagram's edges.
SQL with CTEs — philip.from_sql_cte(sql). Each CTE becomes a function; external tables become Driver inputs. Read a 500-line query as a typed, visualizable dataflow.
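To make the "parameter names declare upstream dependencies" convention concrete, here is a minimal sketch in plain Python that does not need Hamilton installed. The node names (paid_orders, revenue), the raw_orders input, and the resolve helper are hypothetical illustrations. They are not Philip's generated output or Hamilton's API.

```python
import inspect

# Hypothetical nodes: each function stands for one CTE or flowchart node.
# Every parameter name refers to an upstream node or an external input.
def paid_orders(raw_orders: list) -> list:
    return [o for o in raw_orders if o["status"] == "paid"]

def revenue(paid_orders: list) -> float:
    return sum(o["amount"] for o in paid_orders)

NODES = {f.__name__: f for f in (paid_orders, revenue)}

def resolve(target, inputs, cache=None):
    """Compute `target` by recursively resolving its parameter names."""
    cache = {} if cache is None else cache
    if target in inputs:          # external table / driver input
        return inputs[target]
    if target not in cache:       # compute each node at most once
        fn = NODES[target]
        params = inspect.signature(fn).parameters
        cache[target] = fn(**{p: resolve(p, inputs, cache) for p in params})
    return cache[target]

orders = [
    {"status": "paid", "amount": 10.0},
    {"status": "refunded", "amount": 99.0},
    {"status": "paid", "amount": 5.5},
]
total = resolve("revenue", {"raw_orders": orders})
print(total)
```

The edges of the graph are never written down separately: revenue depends on paid_orders only because it names it as a parameter, which is what lets a diagram or a CTE chain map one-to-one onto functions.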

More sources land in subsequent minor releases: AWS Step Functions ASL, BPMN, dbt manifests, and additional DAG sources. Neither Hamilton nor Burr ships a from_X importer of its own. Philip is the lift layer both substrates were missing.

import philip

# Ansible YAML -> Burr Application
app = philip.from_playbook("site.yml")

# Mermaid diagram -> Burr Application
app = philip.from_mermaid("docs/incident_response.mmd")

# Run until one of the named actions has completed
last, _, state = app.run(halt_after=["done", "escalate"])

# Static introspection report for a playbook
report = philip.inspect("site.yml")
print(report.rendered_markdown())

# SQL with CTEs -> Hamilton module
from pathlib import Path
from hamilton.driver import Driver

module = philip.from_sql_cte(Path("query.sql").read_text())
Driver({}, module).visualize_execution(["query"], output_file_path="dag.png")

Why

Today there are two choices for AI-operated infrastructure: run a playbook unattended (rigid, no judgment, unable to adapt to context), or let a free-form agent run shell commands (no constraints, no audit trail). Both extremes are wrong for most real ops work.

A third option exists. The playbook author writes the contract: which steps are reachable, in what order, where verification must happen, which failures route to recovery. The model (or human) operates within that contract. It picks, interprets logs, refuses when unsure, composes with other tools. It cannot invent steps, skip verification, or escape the procedure.
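As a rough illustration of what "operating within the contract" means, the sketch below models the contract as a table of allowed transitions and rejects any choice outside it. The state names and the step and run helpers are hypothetical. This is not how Philip or Burr represent the graph internally.

```python
# Hypothetical contract: for each state, the transitions the author allows.
CONTRACT = {
    "deploy": ["verify"],                      # verification cannot be skipped
    "verify": ["done", "rollback", "escalate"],
    "rollback": ["verify", "escalate"],
}
TERMINALS = {"done", "escalate"}               # "escalate" doubles as a refusal

def step(state, choice):
    """Apply the actor's choice, rejecting anything outside the contract."""
    allowed = CONTRACT.get(state, [])
    if choice not in allowed:
        raise ValueError(f"{choice!r} is not reachable from {state!r}; allowed: {allowed}")
    return choice

def run(choices, start="deploy"):
    """Walk the contract with a sequence of actor choices; return the path taken."""
    path = [start]
    for choice in choices:
        path.append(step(path[-1], choice))
        if path[-1] in TERMINALS:
            break
    return path

print(run(["verify", "rollback", "verify", "done"]))
```

The actor, whether a model or a human, supplies only the choices. Trying to jump from deploy straight to done raises an error, because the author never made that edge reachable.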

Three independent sources validate this architecture:

STRATUS (IBM Research, NeurIPS 2025) demonstrated that FSM-organized SRE agents beat free-form ones by at least 1.5x on AIOpsLab and ITBench.
ITBench (IBM Research, Apache 2.0) is the open benchmark suite the STRATUS work was evaluated on.
Wikimedia's Spicerack is the operational existence proof. The most transparent SRE organization in the public world looked at Ansible in 2014, decided it was not controllable enough for production response, and built a Python framework with explicit phases and structural error handling from scratch.

Burr is the structured-Python substrate Spicerack would have used had it existed in 2014. Philip lifts your existing Ansible playbooks onto that substrate without rewriting anything.

Install

uv add philip-machine

This pulls in ansible-core and ansible-runner transitively. Install the Ansible collections your modules need with ansible-galaxy:

ansible-galaxy collection install community.general community.docker ansible.posix

What you get over `ansible-playbook`

Situation	`ansible-playbook`	Philip + Burr
Mid-procedure decisions on runtime context	`when:` over already-known vars	Explicit transitions; an agent or human picks based on full state
Resume from arbitrary point	`--start-at-task` hack; registers lost	Burr persister rebuilds full state including registers
Approval gates	`pause:` is stdin-bound; AWX bolt-on	First-class states; approver acts through the same surface as any actor
Auditable rationale	Task logs only	Every transition logged with actor's choice and state snapshot
Counterfactual replay	None	`fork_at(sequence_id)` walks the alternate path
Composition with non-Ansible work	Shell out and pray	Coordinate Ansible steps alongside any Python in the same audit surface
Refusal as a structural action	Run or fail	First-class refusal transitions; "I don't know, escalating" routes structurally

ansible-playbook remains correct for deterministic, unattended, batch runs. Philip is the right tool when the playbook has decision points, needs verification gates, must be auditable for postmortem, or composes with non-Ansible work.

Supported subset

Philip lifts a defined subset of Ansible playbook syntax. Supported:

One play per file (multi-play raises UnsupportedPlaybookConstruct).
Tasks with name, exactly one module reference, and a module-args dict.
when: predicates (string expressions; lists are AND-joined).
register: capturing the full module result into a state key.
become: per-task or per-play.
failed_when: and ignore_errors: as guard transitions.
gather_facts: yes lowers to a leading ansible.builtin.setup action.
Play-level vars: populate with_state(...).
block: (group-only) inlines its tasks with the block's when: AND-propagated to each inner task.
include_tasks: and import_tasks: with literal filesystem paths.
notify: and handlers: with a deferred handler gated on _last_changed.
loop: and with_items: with literal list values.
set_fact: (bare and FQCN).

The following raise UnsupportedPlaybookConstruct:

rescue: and always: (deferred to a later release).
loop_control:, with_dict:, with_fileglob:, with_subelements:.
Jinja-templated loop: values.
Jinja-templated include_tasks: paths.
import_role:, include_role:, include:.
roles: blocks, pre_tasks:, post_tasks:.
serial:, strategy:, max_fail_percentage:, any_errors_fatal:.
Multi-play files.

The supported subset covers the common single-play remediation and day-2 procedures the FSM lift adds value to.
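The when: handling in the supported list (lists are AND-joined, and a block's when: is AND-propagated to each inner task) can be sketched as follows. This is an illustration under assumptions: the inline_blocks and as_list helpers are hypothetical, tasks are plain dicts as a YAML loader would produce them, and ValueError stands in for UnsupportedPlaybookConstruct.

```python
def as_list(cond):
    """Normalize a when: value (None, a string, or a list of strings) to a list."""
    if cond is None:
        return []
    return [cond] if isinstance(cond, str) else list(cond)

def inline_blocks(tasks, inherited=()):
    """Flatten group-only block: entries, pushing each block's when: inward."""
    flat = []
    for task in tasks:
        conds = list(inherited) + as_list(task.get("when"))
        if "block" in task:
            if "rescue" in task or "always" in task:
                # Stand-in for UnsupportedPlaybookConstruct.
                raise ValueError("rescue:/always: are outside the supported subset")
            flat.extend(inline_blocks(task["block"], conds))
        else:
            lifted = {k: v for k, v in task.items() if k != "when"}
            if conds:
                # AND-join every inherited and local predicate.
                lifted["when"] = " and ".join(f"({c})" for c in conds)
            flat.append(lifted)
    return flat

tasks = [
    {"name": "ping", "ansible.builtin.ping": {}},
    {
        "when": "is_debian",
        "block": [
            {"name": "install", "ansible.builtin.apt": {"name": "nginx"},
             "when": ["not skip_install", "pkg_ok"]},
            {"name": "restart",
             "ansible.builtin.service": {"name": "nginx", "state": "restarted"}},
        ],
    },
]
flat = inline_blocks(tasks)
for t in flat:
    print(t["name"], "->", t.get("when"))
```

Each flattened task then carries a single predicate string, which is the shape a guard on a transition needs.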

Mermaid stateDiagram-v2

import philip

app = philip.from_mermaid("incident_response.mmd")

Or directly from text:

text = """
stateDiagram-v2
    [*] --> Acknowledge
    Acknowledge --> Investigate : on_alert
    Investigate --> Mitigate
    Investigate --> Escalate : severity == "critical"
    Mitigate --> Verify
    Verify --> Done
    Escalate --> Done
    Done --> [*]
"""
app = philip.from_mermaid_text(text)

Lifting rules:

[*] --> X declares the entrypoint. Exactly one is required.
X --> [*] marks X as a terminal. Every terminal routes through a synthesized done action so the Burr graph is closed.
A --> B is an unconditional transition.
A --> B : label carries an edge label. Labels that look like Python predicates (contain comparison operators, and, or, not, in, is) lift directly to burr.core.expr guards. Labels that look like event names lift to _choice == "<label>" guards when the source has multiple outbound edges; otherwise they are documentation only.
Comments (%%), classDef, class, note, direction, and state declarations without a body are ignored.
Composite states (state X { ... }) raise MermaidLiftError. Inline them in your diagram before lifting.
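The lifting rules above can be approximated in a few dozen lines of plain Python. The sketch below is not Philip's parser: the lift function, the regular expressions, and the returned dict shape are assumptions made for illustration, and ValueError stands in for MermaidLiftError.

```python
import re

# One transition per line: "A --> B" or "A --> B : label"; [*] is the pseudo-state.
EDGE = re.compile(r"^(\[\*\]|\w+)\s*-->\s*(\[\*\]|\w+)\s*(?::\s*(.+))?$")
# Labels with comparison operators or boolean keywords are treated as predicates.
PREDICATE = re.compile(r"==|!=|<=|>=|<|>|\b(?:and|or|not|in|is)\b")

def lift(text):
    """Return the entrypoint and (source, target, guard) triples for a diagram."""
    entry, terminals, edges = None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("state ") and line.endswith("{"):
            raise ValueError("composite states are not supported")
        match = EDGE.match(line)
        if match is None:
            continue  # header, comments, classDef, notes, blank lines
        src, dst, label = match.groups()
        if src == "[*]":
            if entry is not None:
                raise ValueError("exactly one entrypoint is required")
            entry = dst
        elif dst == "[*]":
            terminals.append(src)
        else:
            edges.append((src, dst, label.strip() if label else None))
    if entry is None:
        raise ValueError("exactly one entrypoint is required")

    outbound = {}
    for src, _, _ in edges:
        outbound[src] = outbound.get(src, 0) + 1

    transitions = []
    for src, dst, label in edges:
        if label is None:
            guard = None                       # unconditional
        elif PREDICATE.search(label):
            guard = label                      # predicate lifts as-is
        elif outbound[src] > 1:
            guard = f'_choice == "{label}"'    # event name at a branch point
        else:
            guard = None                       # documentation only
        transitions.append((src, dst, guard))
    # Every terminal routes through a synthesized "done" action.
    transitions.extend((t, "done", None) for t in terminals)
    return {"entry": entry, "transitions": transitions}

diagram = """
stateDiagram-v2
    [*] --> Acknowledge
    Acknowledge --> Investigate : on_alert
    Investigate --> Mitigate
    Investigate --> Escalate : severity == "critical"
    Mitigate --> Verify
    Verify --> Done
    Escalate --> Done
    Done --> [*]
"""
lifted = lift(diagram)
for transition in lifted["transitions"]:
    print(transition)
```

In this diagram on_alert ends up as documentation only, because Acknowledge has a single outbound edge, while the severity label is kept as a guard because it contains a comparison operator.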

Pair with Theodosia to mount a diagram as an MCP server:

import philip
import theodosia

theodosia.mount(
    philip.from_mermaid("incident_response.mmd"),
    name="incident",
).run()

Compose with Theodosia

Theodosia mounts a Burr Application as an MCP server. Philip lifts an Ansible playbook to a Burr Application. The composition is two lines:

import philip
import theodosia

theodosia.mount(philip.from_playbook("site.yml"), name="nginx-deploy").run()

Now any MCP client (Claude Code, Cursor, fast-agent, mcphost, a custom agent built on the Agent SDK) can drive the playbook step by step. The FSM enforces structural constraints. The model picks transitions at branch points. Every step is audit-trailed. Forking at any sequence_id gives you counterfactual replay for postmortem.

Theodosia and Philip are independent packages. Philip does not depend on Theodosia.

CLI

philip run     <playbook.yml>                 # execute end to end
philip inspect <playbook.yml>                 # static report: variables + failure topology
philip inspect <playbook.yml> --format json
philip graph   <playbook.yml> [--format mermaid|dot|text]
philip lint    <playbook.yml>                 # dry-convert with structural summary
philip emit    <playbook.yml>                 # round-trip lift -> emit canonical YAML

API

import philip

# Lift
app = philip.from_playbook("site.yml")
yaml_text = philip.to_playbook(app)

# Static introspection
report = philip.inspect("site.yml")
report.variables                 # variable provenance DAG
report.undefined_variables       # references with no defining site
report.unused_definitions        # bound but never referenced
report.failure_topology          # per-action FAILURE_KIND routing
report.actions_with_recovery     # actions with a true recovery branch
report.unhandled_failures        # actions where a failure has no transition
report.rendered_markdown()       # human-readable report

# Hand-write actions backed by Ansible modules
@philip.module_action("ansible.builtin.command", reads=["target"], writes=["uptime"])
def get_uptime(state):
    return {"cmd": "uptime"}

# Or call modules directly
result = philip.run_module("ansible.builtin.ping", {}, host="myhost")

# Connection management
host = philip.host(ansible_host="example.com", ansible_user="deploy")

# Polling sub-graphs
wait = philip.wait_until(
    "ansible.builtin.wait_for",
    args={"port": 8080, "timeout": 1},
    max_attempts=30,
)

Failure classification

Every module-backed action writes structural failure sentinels into state on each step:

_last_action
_last_failed
_last_changed
_last_unreachable
_last_msg
_last_failure_kind (one of ok, unreachable, auth_failed, timeout, module_error)

Transitions branch on these without each action having to opt in via writes=. The classification is conservative and pattern-based: the unreachable and failed result flags are trusted; the diagnostic msg is scanned for known phrases that indicate auth and timeout. The labels are loosely aligned with the MAST taxonomy.
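A minimal sketch of this kind of conservative, pattern-based classification is shown below. The phrase lists, the classify and sentinels helpers, and the precedence (message phrases are checked before falling back to the unreachable flag) are all assumptions for illustration. Philip's actual patterns and ordering are not documented here.

```python
# Hypothetical phrase lists; the real patterns may differ.
AUTH_PHRASES = ("permission denied", "authentication failed")
TIMEOUT_PHRASES = ("timed out", "timeout")

def classify(result):
    """Map an Ansible-style module result dict to a failure kind."""
    unreachable = bool(result.get("unreachable"))
    failed = bool(result.get("failed"))
    if not (unreachable or failed):
        return "ok"                      # the result flags are trusted
    msg = str(result.get("msg", "")).lower()
    if any(phrase in msg for phrase in AUTH_PHRASES):
        return "auth_failed"
    if any(phrase in msg for phrase in TIMEOUT_PHRASES):
        return "timeout"
    return "unreachable" if unreachable else "module_error"

def sentinels(action, result):
    """Build the sentinel keys written into state after a step."""
    return {
        "_last_action": action,
        "_last_failed": bool(result.get("failed")),
        "_last_changed": bool(result.get("changed")),
        "_last_unreachable": bool(result.get("unreachable")),
        "_last_msg": result.get("msg", ""),
        "_last_failure_kind": classify(result),
    }

state = sentinels("install_nginx", {"failed": True, "msg": "Connection timed out"})
print(state["_last_failure_kind"])
```

A guard such as _last_failure_kind == "timeout" can then route to a retry action, while module_error routes to escalation, without the action itself declaring those keys.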

License

Apache 2.0.

Development

LLM-assisted development was used during construction.
