Skip to content

ci: add graphify structural impact analysis to PR review and structure audit#567

Merged
andreatgretel merged 10 commits into
mainfrom
andreatgretel/feat/graphify-structural-rebase
May 5, 2026
Merged

ci: add graphify structural impact analysis to PR review and structure audit#567
andreatgretel merged 10 commits into
mainfrom
andreatgretel/feat/graphify-structural-rebase

Conversation

@andreatgretel

Copy link
Copy Markdown
Contributor

📋 Summary

Add a graphify-based AST analysis tool that builds a
directed graph of the codebase (~6s, zero LLM cost) to surface architectural risk before the
review agent starts. Integrates into both the PR review workflow (pre-computed context for the
agent) and the Wednesday structure audit (week-over-week graph diff).

Hat tip to @nabinchha for suggesting graphify as the right tool for this.

🔗 Related Issue

Closes #472 (Phase 4 — structural analysis)

🔄 Changes

✨ Added

  • structural_impact.py
    standalone analysis tool with two modes:
    • --changed-files (PR review): extracts changed AST entities against the full codebase graph,
      reports risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction violations, and
      cross-package dependencies
    • --full (structure audit): god node rankings, cross-package edge summary table, import
      violation detection, and graph diff against the previous week's cached graph
  • Step 3.5 in the
    review-code skill
    agent reads the pre-computed structural impact to calibrate review depth (HIGH risk = extra
    scrutiny on blast radius and backward compatibility)
  • graphify-out/ added to .gitignore (9MB graph cache is CI-only, persisted via actions/cache)

🔧 Changed

  • agentic-ci-pr-review.yml
    new step installs graphifyy in a lightweight temp venv and runs structural analysis before the
    agent starts. Tool is read from the base-branch checkout (base-agents/) so fork PRs cannot
    inject code that runs with GH_TOKEN in scope. Base-branch checkout expanded from
    .agents/recipes to also include .agents/tools
  • agentic-ci-daily.yml
    conditional graphifyy install for the structure suite; graphify-out/ added to the cache path
    for week-over-week graph diffs
  • pr-review recipe
    agent reads /tmp/structural-impact-{{pr_number}}.md and appends findings to the review
  • structure recipe
    new Section 4 with graphify instructions, baseline tracking, and week-over-week diff guidance

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • agentic-ci-pr-review.yml
    the security model: structural_impact.py runs from base-agents/ (base-branch sparse
    checkout), not from the PR's files, to prevent fork PRs from injecting code that executes
    with secrets in scope. The --repo-root flag ensures the tool still analyzes the PR's
    source code correctly despite living in a different checkout
  • structural_impact.py
    new 361-line tool; pure AST analysis, no DD imports at runtime. Depends on
    graphifyy (tree-sitter based) and networkx (transitive)

📊 Retroactive validation on 3 merged PRs

Ran graphify retroactively against PRs #502, #545, and #557 to validate the signal quality:

PR Graphify risk Key signal Actual critical findings
#502 (skip.when, 22 files) HIGH — 5 god nodes, 9 violations, 12/73 clusters DatasetBuilder (88 deps) + ExecutionGraph (89 deps) flagged as highest-risk allow_resize propagation bug and positional merge divergence — both in the flagged entities
#557 (jinja engine, 20 files) HIGH — config→engine violation, 52-dep env class config.__getattr__() calls into engine jsonpath filter bug at exactly that cross-package seam
#545 (async bridge, 6 files) MEDIUM — 9 clusters for 6 files Error hierarchy connectivity (52 deps) Fragile error string matching — in the flagged error class

In all 3 cases, the most critical review findings correlated with the highest-connectivity
entities in the graphify report. The risk level (LOW/MEDIUM/HIGH) matched actual review
effort needed.

🧪 Testing

  • structural_impact.py smoke-tested in both modes (--changed-files, --full)
    against the live DD codebase
  • --repo-root flag tested from external checkout directory (simulates CI base-agents/ path)
  • graphify API signatures verified against graphifyy==0.4.23
  • Pre-commit hooks pass (ruff, yaml, merge conflict check)
  • Unit tests added/updated — N/A: CI tool, no testable library code
  • E2E tests added/updated — validated retroactively against 3 merged PRs (see above)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A: .agents/ tooling, not package architecture

…e audit

Add a graphify-based AST analysis tool that builds a directed graph of the
codebase (~2s, no LLM calls) to detect architectural impact. Integrates
into both the PR review workflow (pre-computed before claude runs) and the
Wednesday structure audit (with week-over-week diff).

PR review: extracts changed files against the full codebase graph, reports
risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction
violations, and cross-package dependencies. Output saved to /tmp and read
by the review agent.

Structure audit: produces god node rankings, cross-package edge summary
table, import violation detection, and graph diff against previous week's
cached graph. Baselines saved for runner memory trend tracking.
- Fix KeyError: god_nodes() returns 'degree' not 'edges' (3 call sites)
- Fix deduped vs raw violation count inconsistency in baselines.json
- Security: run structural_impact.py from base-branch checkout so fork
  PRs cannot inject code that executes with GH_TOKEN in scope
- Add --repo-root flag so the tool resolves package paths correctly when
  invoked from a different checkout directory
- Replace make install-dev + .venv with lightweight /tmp/graphify-venv
  (only graphifyy needed, saves ~2min CI per PR review)
- Add graphify-out/ to .gitignore (9MB graph cache is CI-only)
@andreatgretel andreatgretel requested a review from a team as a code owner April 21, 2026 20:42
@github-actions

Copy link
Copy Markdown
Contributor

Summary

PR #567 adds a graphify-based AST structural-impact tool and wires it into two CI workflows:

  • agentic-ci-pr-review.yml: a pre-review step installs graphifyy into a temp venv, runs the new .agents/tools/structural_impact.py against changed Python files, and writes /tmp/structural-impact-{PR}.md for the review agent to consume.
  • agentic-ci-daily.yml (structure suite): conditionally installs graphify and caches graphify-out/ for week-over-week graph diffs.
  • New tool (.agents/tools/structural_impact.py, 361 LOC): builds a directed AST graph of the three DD packages, flags god nodes, import-direction violations (interface → engine → config), cross-package edges, and emits risk levels (LOW/MEDIUM/HIGH).
  • Recipe/skill updates: pr-review recipe reads the pre-computed report and appends it to the review; structure recipe adds a Section 4 with baseline tracking; review-code skill adds Step 3.5 describing how to calibrate review depth based on risk.

Retroactive validation against 3 merged PRs is cited in the description; signal correlation with actual critical findings looks plausible.

Findings

Security — correctly handled, one residual concern

The critical security story (tool must run from trusted base-branch code, not PR code) is executed well:

  • Sparse checkout expanded from .agents/recipes to also include .agents/tools; the script is invoked from base-agents/.agents/tools/structural_impact.py, not from the PR checkout.
  • --repo-root "${{ github.workspace }}" cleanly separates trusted tool location from untrusted analysis target.
  • continue-on-error: true on the structural step means a tool regression or graphify pin drift will not block PR review.

One residual concern in agentic-ci-pr-review.yml:

CHANGED_PY=$(gh pr diff "$PR_NUMBER" --name-only | grep '\.py$' || true)
...
--changed-files $CHANGED_PY \

$CHANGED_PY is deliberately unquoted to word-split into multiple args, but that also subjects the values to IFS word-splitting and glob expansion. A PR introducing a .py file whose name contains whitespace or glob characters would produce incorrect arg boundaries. Git allows such filenames. Low risk (the tool treats each path as a file and filters by Path.exists()/.suffix), but consider reading into an array:

mapfile -t CHANGED_PY < <(gh pr diff "$PR_NUMBER" --name-only | grep '\.py$' || true)
...
--changed-files "${CHANGED_PY[@]}"

This also fixes a latent bug: if CHANGED_PY is only newlines (no matches), the unquoted expansion produces zero args and nargs="+" will error — but the early if [ -z "$CHANGED_PY" ] guard covers that case today.

Correctness

  1. _dedup key truncation (structural_impact.py:60): dedup key uses label[:30] on each side. Two distinct entities that share the first 30 characters of their label would be merged. Probably rare in practice but not obviously intended — prefer the full label, or document the truncation as intentional.

  2. pip install ... | tail -3 (both workflows): piping suppresses the pip exit code under the default set +o pipefail. A failed install would proceed to run structural_impact.py, which would then ImportError on graphify. continue-on-error catches this, but the log would be confusing. Either set pipefail for these steps or drop | tail -3.

  3. Global _REPO_ROOT mutation (structural_impact.py:21, 308): the global _REPO_ROOT pattern works but is fragile — _PACKAGE_SUBDIRS is resolved at import time against the default, while _collect_source_files reads _REPO_ROOT at call time. If anyone later changes _PACKAGE_SUBDIRS to absolute paths, the override breaks silently. Pass repo_root explicitly through the call chain, or evaluate subdirs inside the function.

  4. _get_package interface detection (structural_impact.py:53-57): matches on the literal directory segment data-designer followed by src. This correctly skips data-designer-engine / data-designer-config because the earlier substring checks win, but the fallthrough logic is subtle. A brief comment explaining the ordering dependency would save a future reader.

  5. NetworkX API compatibility (structural_impact.py:273-277): the try: node_link_graph(..., edges="links") / except TypeError: node_link_graph(...) fallback covers a real API change across NetworkX versions. Good defensive coding; worth a one-line comment referencing the NetworkX version boundary so it isn't mistaken for dead code later.

Style / project conventions

  • Missing type annotations: project requires full annotations (AGENTS.md). _build_graph, _cross_package_edges, _changed_files_mode, _full_mode, main are missing return types on some paths; _build_graph returns dict which hides its structure — consider a TypedDict or a small @dataclass. The returned G is a NetworkX DiGraph; typing it as dict defeats downstream type checking.
  • from __future__ import annotations: present at top of file. Good.
  • Absolute imports only: satisfied.
  • SPDX header: present.
  • Lazy heavy imports (AGENTS.md "Fast imports" invariant): graphify.* and networkx are imported at module top-level. For a standalone CI tool invoked directly this is fine — the invariant is about keeping data_designer import time low, and this file is in .agents/tools/, not in the package. Worth confirming the tool isn't auto-imported from anywhere in data_designer.

Test coverage

  • The checklist marks unit tests as "N/A: CI tool, no testable library code." This is partially true but several pure helpers are clearly testable and carry real logic: _get_package, _dedup, the risk-level decision tree in _changed_files_mode, and the direction-violation classifier in _cross_package_edges. A small test module (even a single pytest file in .agents/tools/tests/) would protect against silent regressions — especially since graphify is pinned loosely (graphifyy, no version) and API drift is the most likely failure mode.
  • Retroactive validation on 3 merged PRs is good signal-quality evidence, but it isn't a regression guard.

Performance

  • ~6s per PR and ~2s for full audit (per description) against a small codebase is acceptable.
  • The full graph is built even for PR review (only changed files would not be enough to compute cross-package context). That is the correct trade-off.
  • graphify-out/ at ~9MB adds to the cache footprint but only for the structure suite. Fine.

Minor / nits

  • agentic-ci-pr-review.yml:166: pip install graphifyy --quiet with no version pin. Recommend pinning (e.g., graphifyy==0.4.23, the version the PR says was tested against) to match the existing graphify API signatures verified against graphifyy==0.4.23 claim in the description. Same for the daily workflow.
  • structural_impact.py:9-12: docstring usage examples are helpful; consider adding --full --previous-graph ... example too.
  • _fmt_gods mixes bullet-list output (affected) and table output (full) in one function via a flag — two small formatters would read more cleanly, but this is purely cosmetic.
  • Recipe change in pr-review/recipe.md asks the agent to append structural impact before the Verdict section. The skill's Step 3.5 also tells the agent to use it to calibrate review depth. These two instructions are compatible but slightly duplicative — confirm the agent applies both (use during review, then append the raw report). Not a blocker.

Verdict

Approve with minor follow-ups. The core design — trusted-path execution of a pure-AST tool with --repo-root pointing at the untrusted checkout — is sound, the fallback story (continue-on-error) is correct, and the integration into both review and audit surfaces is clean.

Recommended before merge:

  1. Quote/array-expand $CHANGED_PY in the workflow (latent filename-handling bug).
  2. Pin graphifyy version in both pip install steps.
  3. Either set pipefail or drop | tail -3 around pip install so install failures surface cleanly.

Recommended after merge:
4. Add a minimal pytest for _get_package, _dedup, and the risk-level branches in _changed_files_mode — graphify API drift is the most likely future break.
5. Replace the dict return type of _build_graph with a TypedDict/dataclass for proper typing.

No blocking issues; security model is correct; retroactive validation supports the signal claim.

@greptile-apps

greptile-apps Bot commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a graphify-based AST structural analysis tool that runs before the review agent starts, surfacing god nodes, import direction violations, and cross-package dependencies for each PR. It integrates into both the PR review workflow (pre-computed structural context) and the Wednesday structure audit (week-over-week graph diff).

  • structural_impact.py: new 411-line standalone tool with --changed-files (PR mode) and --full (audit mode); derives changed_node_ids from the already-built graph, deduplication uses full labels, and MEDIUM risk reasons correctly reflect the actual trigger — all issues surfaced in earlier review rounds have been addressed.
  • CI workflows: agentic-ci-pr-review.yml runs the tool from the base-branch sparse checkout (base-agents/) so fork PRs cannot inject code with GH_TOKEN in scope; agentic-ci-daily.yml adds a conditional install and extends the actions/cache path to persist graphify-out/ for week-over-week diffs. Both workflows pin to graphifyy==0.4.23.
  • Agent recipes / skill: pr-review recipe reads the pre-computed impact file before running /review-code; review-code SKILL inserts Step 3.5 to calibrate review depth based on risk level; structure recipe gains Section 4 with graphify instructions and baseline tracking.

Confidence Score: 5/5

This PR is safe to merge. All tooling runs from the base-branch checkout, fork-PR code injection is not possible, and the graphify version is pinned.

The security model is correctly implemented: structural_impact.py is always executed from the base-branch sparse checkout rather than from the PR's files, preventing fork PRs from running arbitrary code with GH_TOKEN in scope. The graph round-trip (to_jsonjson_graph.node_link_graph) is compatible based on graphify's node-link serialization. Issues from prior review rounds (ID stability, dedup truncation, risk-reason accuracy, version pinning) are all addressed. The step is non-blocking (continue-on-error: true), so failures degrade gracefully rather than blocking reviews.

No files require special attention. The workflow security model and the Python tool logic are both straightforward.

Important Files Changed

Filename Overview
.agents/tools/structural_impact.py New 411-line analysis tool; builds a full-codebase directed AST graph and reports risk level, god nodes, import violations, and cross-package deps. Previously-flagged issues (ID stability, dedup truncation, risk-reason accuracy) addressed in prior commits. Logic is sound.
.github/workflows/agentic-ci-pr-review.yml Adds structural impact pre-computation step; correctly runs the tool from the base-branch sparse checkout so fork PRs cannot inject code. Version pin, continue-on-error, and --repo-root flag all look correct.
.github/workflows/agentic-ci-daily.yml Adds conditional graphify install for the structure suite and expands the actions/cache path to persist graphify-out/ for week-over-week graph diffs. Pinned to 0.4.23. Straightforward.
.agents/recipes/pr-review/recipe.md Adds pre-read of the structural impact file and appends the section to the review output before the Verdict. Clean integration with the existing recipe flow.
.agents/recipes/structure/recipe.md Adds Section 4 with graphify instructions, baseline tracking, and week-over-week diff guidance. Summary section extended with god-node and cross-package-edge metrics. No issues.
.agents/skills/review-code/SKILL.md Inserts Step 3.5 to read the pre-computed structural impact file (if present) and calibrate review depth accordingly. Graceful skip when the file is absent.
.gitignore Adds graphify-out/ to gitignore so the CI-generated graph cache is never committed. Correct.

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Actions
    participant BC as Base-Branch Checkout (base-agents/)
    participant WS as PR Workspace (github.workspace)
    participant TMP as /tmp

    GH->>GH: Sparse-checkout base branch (.agents/recipes + .agents/tools)
    GH->>GH: gh pr diff --name-only | grep .py > changed-py.txt
    GH->>BC: python base-agents/.agents/tools/structural_impact.py --repo-root workspace --changed-files ...
    BC->>WS: _collect_source_files() reads all *.py via --repo-root
    BC->>BC: extract() -> build_from_json() -> cluster() -> god_nodes()
    BC->>BC: changed_node_ids derived from graph (source_file match)
    BC->>TMP: Write structural-impact-{pr}.md (risk + findings)
    GH->>GH: Run review agent
    GH->>TMP: Agent reads structural-impact-{pr}.md (Step 3.5)
    GH->>TMP: Agent appends Structural Impact section to review-{pr}.md
Loading

Reviews (6): Last reviewed commit: "Merge branch 'main' into andreatgretel/f..." | Re-trigger Greptile

Comment thread .github/workflows/agentic-ci-pr-review.yml Outdated
Comment thread .github/workflows/agentic-ci-daily.yml
Comment thread .agents/tools/structural_impact.py Outdated
Pin graphifyy==0.4.23 in both CI workflows to prevent
breaking changes from unpinned installs. Fix _dedup()
label truncation at 30 chars that could merge distinct
entities sharing a common prefix.
Comment thread .agents/tools/structural_impact.py Outdated
…filenames

Replace unquoted $CHANGED_PY word-split with mapfile + array
expansion to prevent glob expansion and correctly handle
filenames containing spaces or special characters.
Comment thread .agents/tools/structural_impact.py Outdated
Derive changed_node_ids from the already-built graph by matching
source_file paths instead of running a separate extraction pass.
Removes implicit dependency on graphify ID stability across
independent extractions.

Fix MEDIUM risk reason to reflect the actual trigger (cluster
spread vs high-connectivity entity) instead of always reporting
cluster count.
…stale artifacts

Split the workflow step to isolate GH_TOKEN from graphifyy execution,
preventing a compromised package release from exfiltrating write-scoped
tokens.

Scan both edge directions in _cross_package_edges so inbound dependents
and violations where the changed node is the target are visible. Detect
deleted files and report them as a risk signal.

Include relation type in dedup key so distinct edge types between the
same labels are not collapsed.

Clean stale /tmp artifacts before running analysis to prevent reruns
from reading old reports.
@nabinchha

Copy link
Copy Markdown
Contributor

Thanks for putting this together, @andreatgretel — the retroactive validation against three merged PRs is a really nice way to make the case for the signal quality.

Summary

Adds a graphify-based AST analyzer at .agents/tools/structural_impact.py that builds a directed graph of the codebase and produces a risk report consumed by both the PR-review workflow (pre-computed structural context for the agent) and the Wednesday structure audit (with week-over-week diff). Implementation matches the stated intent, and several rounds of self-review/Codex feedback have already addressed the most critical issues (version pinning, dedup correctness, MEDIUM risk reason, ID stability, edge-coverage scan-both-directions, base-branch security model, and changed-files word-splitting).

Findings

Warnings — Worth addressing

.agents/tools/structural_impact.py:85 — Missing type annotation on G parameter

  • What: _cross_package_edges(G, node_ids: set[str] | None = None) doesn't annotate G. STYLEGUIDE.md is explicit: "ALWAYS add type annotations to all functions, methods, and class attributes."
  • Why: Pure function in a new file, so this sets the precedent for follow-on contributions. _build_graph returns a bare dict for the same reason.
  • Suggestion: Since networkx is on its way out of the repo, annotating with nx.DiGraph would point in the wrong direction. The pragmatic answer for a private helper is G: Any — STYLEGUIDE discourages Any for return types, but it's acceptable on params when the underlying type is intentionally not part of the project's surface. For _build_graph, swap the dict-literal return for a small @dataclass (STYLEGUIDE prefers dataclass over TypedDict for "typed struct with no validation"), which also lets callers do analysis.graph instead of analysis["graph"]:
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True, slots=True)
class _Analysis:
    graph: Any
    communities: dict[int, list[str]]
    cohesion: dict[str, Any]
    god_nodes: list[dict[str, Any]]

def _build_graph(files: list[Path]) -> _Analysis: ...
def _cross_package_edges(G: Any, node_ids: set[str] | None = None) -> ...: ...

Call sites become G, communities, gods = analysis.graph, analysis.communities, analysis.god_nodes. If a richer type for G is wanted later without taking a dep, a Protocol with just the methods we use (.nodes, .edges, .degree, number_of_nodes, number_of_edges) is the structural-typing route.

.agents/tools/structural_impact.py:30-34_PACKAGE_SUBDIRS silently misses files in new packages

  • What: The package-source roots are hardcoded to the three current packages. If someone adds a new package (packages/data-designer-foo/...), _collect_source_files won't pick it up and _get_package won't classify it, so its files would silently report "0 AST entities" with no warning.
  • Why: The graphify report would falsely show a localized change for what could be a major new-package PR. This is a quiet failure mode — the kind the structural analyzer is meant to catch, not produce.
  • Suggestion: Either (a) discover packages dynamically by globbing packages/*/src/data_designer/ and inferring layer from the parent dirname, or (b) add a guard: when a --changed-files path doesn't fall under any known subdir, emit a visible warning line in the report (e.g., _Note: 3 changed files outside known package roots — analysis may be incomplete_). Option (a) is more future-proof; option (b) is a minimal hedge.

Suggestions — Take it or leave it

.agents/tools/structural_impact.py — No unit tests for pure helpers

  • What: PR description marks unit tests as N/A ("CI tool, no testable library code"), but _get_package, _dedup, and _collect_source_files are pure functions that don't depend on graphify and are exactly the kind of thing that breaks silently during refactors.
  • Why: AGENTS.md says "no untested code paths"; even though .agents/tools/ is meta-tooling rather than package code, the parser-classifier logic carries enough subtle behavior (substring vs path-segment matching, dedup key composition) that tests would pay for themselves the first time someone touches it. Codex/Greptile already caught two latent bugs here in the same review cycle.
  • Suggestion: A small tests/test_structural_impact.py covering _get_package (engine/config/interface/external paths), _dedup (collisions on shared prefixes — exactly the bug fixed in bfc5db69), and a smoke test for _collect_source_files against a tmp dir would be ~30 lines and prevent regressions.

.agents/tools/structural_impact.py:313 — Bare except Exception

  • What: The diff fallback catches Exception broadly to print _Could not diff against previous graph: {exc}_.
  • Why: STYLEGUIDE: "Prefer specific exception types over bare except." The intent (graceful degradation) is fine, but a broad catch will also swallow KeyboardInterrupt-adjacent bugs and typos.
  • Suggestion: Narrow to the realistic set, e.g., except (json.JSONDecodeError, KeyError, TypeError, OSError) as exc:. If graphify defines its own diff exception, include that.

.agents/tools/structural_impact.py:297, 317 — In-function imports

  • What: from networkx.readwrite import json_graph (line 297, inside conditional) and from graphify.export import to_json (line 317, unconditional within _full_mode).
  • Why: STYLEGUIDE: "Place imports at module level, not inside functions (exception: unavoidable for performance reasons)." to_json is always called when _full_mode runs, so deferring it doesn't save anything; networkx is already a transitive dep of graphifyy and loaded eagerly.
  • Suggestion: Hoist both to the existing top-level import block.

.agents/tools/structural_impact.py:25, 340 — Module-level mutable _REPO_ROOT

  • What: _REPO_ROOT is a module global mutated in main() via global _REPO_ROOT, then read by _collect_source_files, _rel, and _full_mode.
  • Why: Common CLI shortcut, but it makes the helpers harder to unit-test (you'd need to mutate a global to vary the input) and couples three otherwise-pure functions to module state.
  • Suggestion: Optional refactor — thread repo_root through as a parameter on the helpers that need it. Not blocking; just easier on future-you when adding tests per the Suggestion above.

.github/workflows/agentic-ci-daily.yml:124-126Install graphify shares the project venv

  • What: The PR-review workflow isolates graphifyy in /tmp/graphify-venv, but the daily workflow installs it into the project's .venv directly (.venv/bin/python -m pip install graphifyy==0.4.23). networkx is currently a real data-designer-engine dep so this is invisible today.
  • Why: Once the planned networkx removal lands (it's currently only used in samplers per offline context), this step will silently re-introduce networkx into .venv as a transitive of graphifyy. That partially undoes the dep cleanup, and you only notice if someone audits pip list mid-job.
  • Suggestion: Use the same temp-venv pattern as the PR-review workflow:
- name: Install graphify
  if: matrix.suite == 'structure'
  run: |
    python -m venv /tmp/graphify-venv
    /tmp/graphify-venv/bin/python -m pip install graphifyy==0.4.23 2>&1 | tail -3

Then invoke structural_impact.py via /tmp/graphify-venv/bin/python in the audit step. Symmetrical to the PR-review setup, and isolates graphifyy's transitives from the project venv permanently.

.agents/tools/structural_impact.py:196 vs 79 — Path-normalization asymmetry

  • What: changed_paths = {str(p.resolve()) for p in changed_files} resolves symlinks, but the source files passed to extract(files) (via _collect_source_files) are built as _REPO_ROOT / sub and only inherit whatever resolution _REPO_ROOT had. If _REPO_ROOT is already resolved and the package dirs contain no symlinks (the realistic CI case), they match. If a self-hosted runner ever has a symlinked checkout dir or a bind-mounted source tree, the sets could diverge and silently produce "0 entities."
  • Why: Retroactive validation against three real PRs shows it works today, so this is latent rather than active. But it's the same class of silent-mismatch bug we've fixed twice already in this PR.
  • Suggestion: Either drop .resolve() on changed_paths and trust _REPO_ROOT resolution, or also .resolve() the paths inside _collect_source_files. Symmetric is what we want.

What Looks Good

  • The split-step security model is the right call — structural_impact.py runs in a separate workflow step with no env: block, so even a compromised graphifyy release can't see GH_TOKEN or the API key. The base-branch sparse checkout for .agents/tools (so fork PRs can't inject the runner code itself) closes the obvious hole. The expanded SECURITY comment makes the intent legible to future maintainers.
  • The retroactive validation table (PRs feat: add skip.when conditional column generation #502, fix: bridge model.generate() to agenerate() for custom columns in async engine #545, feat: add RunConfig jinja rendering engine #557 with correlation between graphify risk and actual review findings) is exactly the kind of evidence that justifies a new tool. It also makes the LOW/MEDIUM/HIGH thresholds defensible rather than arbitrary.
  • Iterative response to review feedback is impressive: in six commits the PR went through five distinct fixes (pinning, dedup truncation, MEDIUM risk reason, ID stability via graph-derived nodes, both-direction edge coverage), each with a clear commit message explaining what changed and why. The integration is meaningfully more robust because of it.

Verdict

Ship it (with nits) — Only Warnings around type annotations and the hardcoded package list, plus a handful of Suggestions. Nothing blocking, but the type annotations are worth addressing in this PR since they set the convention for the new file. Everything else can land as-is or in a follow-up.


This review was generated by an AI assistant.

…w except, isolate daily graphify

- structural_impact.py:
  - replace bare _build_graph dict return with frozen _Analysis dataclass
  - add G: Any annotation on _cross_package_edges (STYLEGUIDE: all params typed)
  - hoist `from graphify.export import to_json` and
    `from networkx.readwrite import json_graph` to module top
    (no perf justification for deferred import)
  - narrow `except Exception` in graph-diff fallback to
    (JSONDecodeError, KeyError, TypeError, OSError)
- agentic-ci-daily.yml: install graphifyy into /tmp/graphify-venv instead of
  the project .venv, matching agentic-ci-pr-review.yml. Keeps graphify's
  transitives (networkx) out of the project venv permanently.
- structure/recipe.md: invoke the tool via /tmp/graphify-venv/bin/python
  to match the workflow change.
A new package under packages/ that isn't in _PACKAGE_SUBDIRS is silently
absent from the graph - the analyzer would falsely report LOW risk with
0 entities. Add a _Note line in the changed-files report when any changed
or deleted file lives under packages/<unknown>/, so the failure mode the
analyzer is supposed to surface isn't itself silent.

_KNOWN_PACKAGE_DIRS is derived from _PACKAGE_SUBDIRS so future additions
stay in sync without a second source of truth.
@andreatgretel

Copy link
Copy Markdown
Contributor Author

Thanks @nabinchha, this was a careful review and the type-annotation push set the right precedent for the file. Almost everything went into two follow-up commits on the branch:

508f119 — STYLEGUIDE-cited fixes + daily venv isolation

  • _build_graph now returns a frozen _Analysis dataclass; call sites use attribute access. _cross_package_edges(G: Any, ...) per your suggestion - graphify intentionally not in the project's typed surface, and networkx is on its way out so nx.DiGraph would point in the wrong direction.
  • Hoisted from graphify.export import to_json and from networkx.readwrite import json_graph to module top - no perf justification for either being deferred.
  • Narrowed the graph-diff fallback to (json.JSONDecodeError, KeyError, TypeError, OSError), matching what the try block can realistically raise.
  • Daily workflow now installs graphifyy into /tmp/graphify-venv, symmetric to the PR-review setup. Tweaked the structure recipe to invoke the tool via that interpreter so graphify's transitives stay out of .venv permanently - aligns with the planned networkx removal you mentioned.

07b0e74 — unknown-package hedge

  • Picked option (b) from your suggestion. _KNOWN_PACKAGE_DIRS is derived from _PACKAGE_SUBDIRS (single source of truth), and the changed-files report now emits a _Note: changes touch unknown package(s) (...)_ line when any changed or deleted file lives under packages/<unknown>/. The silent-miss failure mode the analyzer was producing is now self-reporting.
  • Smoke-tested with a synthetic packages/data-designer-foo/... path and confirmed the warning fires there but stays quiet for known packages.

Deferred to #601 — unit tests for _get_package, _dedup, _collect_source_files, plus the related _REPO_ROOT parameter-threading and path-resolution symmetry. The three are cheap to do together and the test wiring needs a small CI decision (.agents/tools/ is outside the package suites), so it lands cleaner as its own change.

@nabinchha

Copy link
Copy Markdown
Contributor

Thanks for the careful work here, @andreatgretel — the retroactive validation table on PRs #502/#545/#557 is a really nice touch and gives me good confidence that the signal is worth the complexity.

Summary

This PR adds a graphify-based AST analysis tool (structural_impact.py) wired into both the agentic PR-review workflow (pre-computed risk + god-node context for the agent) and the Wednesday structure audit (week-over-week graph diff). The implementation matches the stated intent in the description, the security model (running the tool from a base-branch sparse checkout) is sound, and the prior round of Greptile/Codex feedback has already been addressed in commits bfc5db69, 55f0bdda, and a63e74da.

I had a few thoughts on top of what's already been resolved.

Findings

Warnings — Worth addressing

.github/workflows/agentic-ci-pr-review.yml:170-176 — Filenames from gh pr diff flow into argparse without a -- terminator

  • What: mapfile -t CHANGED_PY < /tmp/changed-py.txt pulls names straight from gh pr diff --name-only and expands them as positional values to --changed-files. A fork PR can add or rename a Python file to something like --output=foo.py (it still ends in .py, so it passes the upstream grep '\.py$'), and argparse will happily consume it as the --output option instead of treating it as a filename. I confirmed this locally:
    Namespace(changed_files=['normal.py'], output='/tmp/evil', repo_root='/tmp')
    
  • Why: It doesn't break the security model (the script runs from base-agents/ and only writes to /tmp and graphify-out/), but an attacker-controlled filename can silently misdirect the report path, suppress analysis (-h), or set --repo-root to a junk dir so the analysis comes back empty / misleading. Given the whole point of the tool is "surface architectural risk before review starts", a quietly suppressed report on a hostile PR is the failure mode you most want to avoid.
  • Suggestion: Two small options — either prefix each path with ./ so it can never look like a flag, or filter dash-prefixed names out before passing them in:
    mapfile -t CHANGED_PY < <(sed 's|^|./|' /tmp/changed-py.txt)
    Belt-and-braces would be to also assert not raw.startswith("-") inside main() after parsing.

.agents/tools/structural_impact.py:70-79 and :34-43_get_package and _PACKAGE_SUBDIRS are independent sources of truth

  • What: _KNOWN_PACKAGE_DIRS is now derived from _PACKAGE_SUBDIRS (nice — the unknown-package warning stays in sync), but _get_package() still hard-codes the substrings "data-designer-engine", "data-designer-config", and the "data-designer/.../src" heuristic. If we ever add a fourth package, _PACKAGE_SUBDIRS updates would let the unknown-package check pass — but _get_package() would silently return "" for every node from that package, and _cross_package_edges() would then drop every edge touching it (the if not u_pkg or not v_pkg or u_pkg == v_pkg: continue line). The full-mode audit and the changed-files mode would both report no cross-package activity for the new package, with no warning.
  • Why: It's the same "silent absence" failure mode the unknown-package note was designed to prevent, but for _full_mode (which has no such warning) and for the cross-package edge counting in changed-files mode.
  • Suggestion: Drive _get_package() from _PACKAGE_SUBDIRS too — match each package's subdir prefix against the resolved path. Something like:
    _PKG_NAME_BY_SUBDIR = {
        Path("packages") / "data-designer-engine" / "src" / "data_designer" / "engine": "engine",
        Path("packages") / "data-designer-config" / "src" / "data_designer" / "config": "config",
        Path("packages") / "data-designer"        / "src" / "data_designer":            "interface",
    }
    
    def _get_package(filepath: str) -> str:
        try:
            rel = Path(filepath).resolve().relative_to(_REPO_ROOT)
        except ValueError:
            return ""
        for sub, name in _PKG_NAME_BY_SUBDIR.items():
            if rel.parts[: len(sub.parts)] == sub.parts:
                return name
        return ""
    This also fixes the (very minor) edge case where a path like /some/where/data-designer-engine-fork/src/... would be misclassified by the current substring match.

Suggestions — Take it or leave it

.agents/tools/structural_impact.py:231 and :311{"rank": ..., **g} order may be silently overridden

  • What: Both affected_gods = [{"rank": gods.index(g) + 1, **g} for g in gods if ...] and ranked_gods = [{"rank": i, **g} for i, g in enumerate(gods[:10], 1)] set "rank" first and then spread g. If graphify ever starts returning a "rank" key inside its god-node dicts (it doesn't today on 0.4.23, but the dict shape isn't documented as stable), our computed rank gets silently overwritten by graphify's, and _fmt_gods reads the wrong value.
  • Why: Cheap defense against an upstream change we're not in control of.
  • Suggestion: Flip to {**g, "rank": ...} so the explicit rank wins. Same shape, same cost.

.agents/tools/structural_impact.py:142,144 — Label truncation at 50 chars can still collide in _dedup

  • What: Now that _dedup keys on the full stored label, the truncation at [:50] happens upstream when we build the entry, so two distinct entities sharing a 50-char prefix still collapse into one row. Rarer than the 30-char case Greptile flagged, but the same class of bug.
  • Why: At the point we're computing the dedup key we already have G.nodes[u]["label"] available — there's no benefit to using the truncated form as the identity.
  • Suggestion: Either keep both a label (truncated for display) and an id field on the entry and dedup on id, or just drop the [:50] and truncate at format time inside _fmt_violations / _fmt_cross_pkg.

.agents/recipes/structure/recipe.md:117 — Wall-clock claim out of sync with the PR description

  • What: The recipe says the graphify pass runs in ~2s, but the PR description (and validation runs) put it at ~6s.
  • Why: Tiny, but the recipe is a prompt the agent reads — over-stating speed could mislead reviewers about whether to bother re-running.
  • Suggestion: Bump the recipe to match the description (~6s) or drop the specific number.

.agents/tools/structural_impact.py:304-330_full_mode doesn't surface the unknown-package warning

  • What: The unknown-package detection in _changed_files_mode is great. _full_mode is the long-running audit that's most likely to be the thing that surfaces "we added a new package and forgot to update the analyzer", but it has no equivalent check.
  • Why: A new packages/data-designer-foo/ would walk through the full audit reporting nothing about itself.
  • Suggestion: After _collect_source_files(), glob packages/*/ and warn for any directory not in _KNOWN_PACKAGE_DIRS. Cheap and keeps the two modes symmetric.

What Looks Good

  • Security model on the workflow. Splitting the structural-impact step into its own block, sourcing structural_impact.py from base-agents/, and using the --repo-root override to keep the analysis pointed at the PR's source is a clean way to keep GH_TOKEN out of graphifyy's blast radius. The comment on line 145-147 explaining why makes it easy for the next reader to preserve.
  • Retroactive validation. Running the tool against three already-merged PRs and showing that the high-connectivity entities the tool flagged correlate with the actual critical findings is exactly the right way to justify a new heuristic. That table is what turned this from "another CI tool" into "this is worth the maintenance cost".
  • Iterative response to feedback. The commit log walking through each fix (pinned graphifyy, full-label dedup, MEDIUM-risk reasoning, derive changed_node_ids from the built graph, scan both edge directions, narrow the bare except, hoist imports, frozen dataclass) makes the change easy to land — every previous round of feedback has a corresponding commit and a reply on the thread.

Verdict

Ship it (with nits) — once the argparse-injection guard is in (the only finding I'd actually want addressed before merge), everything else is take-it-or-leave-it. The package-naming source-of-truth point is worth tackling, but it's not blocking — adding a new package is rare, and the unknown-package warning gives us a soft tripwire today.


This review was generated by an AI assistant.

@andreatgretel andreatgretel merged commit b502dd3 into main May 5, 2026
49 checks passed
@andreatgretel andreatgretel deleted the andreatgretel/feat/graphify-structural-rebase branch May 5, 2026 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: agentic CI - automated PR reviews and scheduled maintenance

2 participants