gstack-global-discover: session counts conflate originator types and undercount CC by ~5x

## Summary

`gstack-global-discover` reports `sessions.{claude_code,codex,gemini}` counts that misrepresent actual development activity in two ways. `/retro global` (which builds narrative on these counts) inherits both errors.

Tested on gstack 1.26.2.0 with a 31-day window across 9 repos.

## Problem 1 — codex `sessions.codex` conflates Codex Desktop with codex_exec subagent calls

The script counts every `~/.codex/sessions/.../rollout-*.jsonl` whose `payload.cwd` resolves to a repo. But codex rollout files are produced by very different agents, and the `originator` field distinguishes them:

| originator | what it is | dev signal |
|---|---|---|
| `Codex Desktop` | user driving codex CLI interactively | yes — real codex dev |
| `codex_exec` | `codex exec -` (cron / scripts / subagent invocations) | no — usually CC firing codex as cross-model review or external voice |
| `Claude Code` | codex via CC's MCP / subagent integration | no — codex is the tool, CC is the driver |

Real numbers from one user's 31d window:

```
ak-ai-vela:        codex_exec=97  Codex Desktop=1   ← reported as 'codex 98 sessions' but vela was 100% CC dev
ak-where-to-go:    codex_exec=2   Codex Desktop=92  ← real codex dev phase (matches user's recollection)
ai-blogger-lab:    codex_exec=88  Codex Desktop=10  ← codex_exec are CC subagents during dev/review, not codex dev
ak-fund-advisor:   codex_exec=49  Codex Desktop=1   ← all CMR subagents, not codex dev
ak-cc-wiki:        codex_exec=57  Codex Desktop=0   ← CC subagents during wiki writing
```

**Effect**: the `/retro global` narrative concluded "codex was the primary execution tool (414 sessions across 7 repos)" when in fact codex drove dev for **only one repo's middle phase** (~92 Desktop sessions in where-to-go). The other ~309 codex_exec entries were CC firing codex as a cross-model review subagent.

## Problem 2 — CC session count under-reports by ~5x on cron-driven projects

Discovery reported `claude_code: 129` total sessions in the 31d window. A direct scan of `~/.claude/projects/*/*.jsonl` filtered by `mtime >= since` shows ~621 in the same window.

The biggest gap is `ai-blogger-lab`:
- Discovery: 6 CC sessions
- Direct scan: ~450 CC jsonl files (all CCR / cron-driven; no human in loop)
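
The direct scan used for comparison can be reproduced with a sketch like the one below. It counts `*.jsonl` files per project directory with `mtime >= since`, on the assumption that one file corresponds to one CC session; the helper name `count_cc_sessions` is made up for this illustration and is not part of gstack.

```python
import glob
import os
import time
from collections import Counter


def count_cc_sessions(root, since):
    """Count *.jsonl files per project dir under root with mtime >= since."""
    counts = Counter()
    for path in glob.glob(os.path.join(root, "*", "*.jsonl")):
        if os.stat(path).st_mtime >= since:
            # Key by the project directory name, exactly as it appears on disk.
            counts[os.path.basename(os.path.dirname(path))] += 1
    return counts


if __name__ == "__main__":
    since = time.time() - 31 * 86400  # same 31-day window as the report
    counts = count_cc_sessions(os.path.expanduser("~/.claude/projects"), since)
    for project, n in counts.most_common():
        print(f"{project:60s} {n:5d}")
    print(f"{'TOTAL':60s} {sum(counts.values()):5d}")
```

Keying by the raw directory name keeps worktree-style dirs visible as separate rows, which makes it easier to see where discovery and the direct scan diverge.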

Plausible causes (didn't fully diagnose):
- Discovery may skip CC project dirs that map to git worktree paths (`-claude-worktrees-*` / `--worktrees-*` show up as separate dirs)
- Or it dedupes by some session_id key that collapses identical-cwd entries

Either way the count is wrong by enough to flip narrative conclusions.
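
If the worktree hypothesis holds, one way to test it is to collapse worktree-style directory names onto their parent project before counting. The sketch below is an assumption-laden illustration: the suffix shapes are inferred only from the directory names mentioned above, and `repo_key` and `merge_worktrees` are hypothetical helpers, not gstack functions.

```python
import re
from collections import Counter

# Assumed suffix shapes, inferred from the dir names mentioned in this issue:
#   <project>--worktrees-<branch>  and  <project>-claude-worktrees-<branch>
# A repo whose own name contains "-worktrees-" would be mis-stripped by this pattern.
WORKTREE_SUFFIX = re.compile(r"-{1,2}(?:claude-)?worktrees-.+$")


def repo_key(project_dir):
    """Strip a worktree suffix so worktree dirs collapse onto their parent project."""
    return WORKTREE_SUFFIX.sub("", project_dir)


def merge_worktrees(per_dir_counts):
    """Sum per-directory session counts under the normalized project key."""
    merged = Counter()
    for project_dir, n in per_dir_counts.items():
        merged[repo_key(project_dir)] += n
    return merged
```

Feeding per-directory counts from a direct scan through `merge_worktrees` and comparing the result with discovery's per-repo numbers would show whether worktree dirs account for the gap.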

## Suggested fix

1. **Parse codex `payload.originator`** in `scanCodex()` and bucket separately:
   - `codex_desktop_sessions` — interactive dev signal
   - `codex_exec_invocations` — subagent / cron / scripted (annotate as "called by another agent" in retro)
   - `claude_code_subagent_invocations` — codex via CC

   The retro narrative should then attribute Codex Desktop counts to "codex dev" and codex_exec counts to "<caller> using codex as subagent", rather than lumping them together.

2. **Investigate the CC undercount** — likely worktree path normalization. Repro: any project whose CC sessions live in `WorkSpace-<repo>--worktrees-<branch>/` style dirs.

3. **Annotate `/retro global` output** to state that "sessions" means "tool invocations / file count", not "distinct dev sessions". Especially with CCR / cron drivers in the mix, the same repo can show 450 CC "sessions" with zero interactive work.
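
For item 1, the bucketing could look roughly like the sketch below. It is written in Python to match the repro script rather than the language of `scanCodex()`, and it assumes the `(repo, originator)` pairs have already been extracted from each rollout file. The bucket names follow the list above; the `unknown_originator` fallback and the function names are additions of this sketch.

```python
from collections import Counter

# Originator values as observed in the rollout files; bucket names as proposed above.
BUCKETS = {
    "Codex Desktop": "codex_desktop_sessions",
    "codex_exec": "codex_exec_invocations",
    "Claude Code": "claude_code_subagent_invocations",
}


def bucket_for(originator):
    """Map an originator string to a bucket; unseen values get a separate bucket."""
    return BUCKETS.get(originator, "unknown_originator")


def bucket_sessions(rollouts):
    """rollouts: iterable of (repo, originator) pairs, one per rollout file."""
    per_repo = {}
    for repo, originator in rollouts:
        per_repo.setdefault(repo, Counter())[bucket_for(originator)] += 1
    return per_repo
```

Keeping an explicit fallback bucket means a new originator value shows up as its own count instead of being silently folded into one of the existing ones.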

## Repro

```bash
# Originator breakdown for a 31d window:
python3 -c "
import os, json, time
since = time.time() - 31*86400
root = os.path.expanduser('~/.codex/sessions')
counts = {}
for dirpath, _, files in os.walk(root):
    for f in files:
        if not (f.startswith('rollout-') and f.endswith('.jsonl')): continue
        fp = os.path.join(dirpath, f)
        if os.stat(fp).st_mtime < since: continue
        # Only the first line is parsed; cap the read at 128 KiB.
        with open(fp,'rb') as fh: buf = fh.read(131072)
        line = buf.split(b'\n',1)[0].decode('utf-8','replace')
        try:
            d = json.loads(line)
        except ValueError:
            continue  # empty file or first line truncated by the read cap
        p = d.get('payload',{})
        repo = os.path.basename(p.get('cwd',''))
        orig = p.get('originator','?')
        counts[(repo,orig)] = counts.get((repo,orig),0)+1
for k,v in sorted(counts.items(), key=lambda x:-x[1]):
    print(f'{k[0]:35s} {k[1]:25s} {v:4d}')
"

# Compare against discovery output:
~/.claude/skills/gstack/bin/gstack-global-discover --since 31d
```

## Why this matters

Without these fixes, `/retro global` produces confidently-wrong narratives. In my case it told me codex was my main dev tool (driving 7 repos) when codex actually drove dev for one repo's middle phase. CC was the primary driver everywhere — including powering blogger-lab's cron pipeline through CCR. The session counts were the entire foundation of the retro's "tool usage analysis" section.

This has the same shape as a class of bugs already documented in the vault wiki: **a metric that conflates distinct things will mislead any narrative built on it, regardless of how thorough the narrative is.**
