Hi! Thanks for gstack. It's been a core part of my Claude Code workflow
for two consecutive sprints. I wanted to share data on the /review
false-positive rate that might be useful for tuning the prompt.
Context
Project: Django + DRF + PostgreSQL full-stack app (PoC for construction
site management)
/ship adversarial review (earlier sprint): 1+ false positive (Finding #8)
Sprint 2.5 (/review): 4 false positives out of 8 findings (50%)
Both runs raised concerns that were resolvable in under 5 minutes by
inspecting the relevant code or model definitions, suggesting the
adversarial review prompt may be raising hypotheses without first applying
basic self-verification.
Specific examples from Sprint 2.5
The four false positives all fit a pattern: "resolvable in <5 minutes by
viewing the actual code or running a simple grep".
| # | Concern raised | Resolution |
|---|---|---|
| FP-1 | `dict.get()` might be None-unsafe | Django form's `cleaned_data` is `{}`-initialized; visible by reading the form code |
| FP-2 | `rental.save()` might lose fields | Standard Django ORM INSERT behavior for unsaved instances |
| FP-3 | `update_fields` might miss `updated_at` | Field doesn't exist on the model; `grep` resolves immediately |
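A check like the one that resolves FP-3 can be automated. The following is a minimal sketch, assuming the model lives in a single Python source file; the function name `model_declares_field`, the model name `Rental`, and the example source are all hypothetical and only serve as illustration. It parses the source with the standard `ast` module instead of importing Django, so it only sees fields declared directly on the class, not fields inherited from an abstract base.

```python
import ast


def model_declares_field(source: str, model_name: str, field_name: str) -> bool:
    """Return True if class `model_name` assigns `field_name` at class level."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == model_name:
            for stmt in node.body:
                # Plain assignment, e.g. updated_at = models.DateTimeField(...)
                if isinstance(stmt, ast.Assign):
                    if any(isinstance(t, ast.Name) and t.id == field_name
                           for t in stmt.targets):
                        return True
                # Annotated assignment, e.g. updated_at: datetime = ...
                elif isinstance(stmt, ast.AnnAssign):
                    if isinstance(stmt.target, ast.Name) and stmt.target.id == field_name:
                        return True
    return False


# Hypothetical model source, for illustration only (not the real project code).
EXAMPLE_SOURCE = '''
from django.db import models

class Rental(models.Model):
    site = models.CharField(max_length=100)
    created_at = models.DateTimeField(auto_now_add=True)
'''

has_updated_at = model_declares_field(EXAMPLE_SOURCE, "Rental", "updated_at")
has_created_at = model_declares_field(EXAMPLE_SOURCE, "Rental", "created_at")
```

If `has_updated_at` is false, a concern of the form "`update_fields` might miss `updated_at`" can be dropped before it is ever reported.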
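What was correctly identified (true positives)
The same /review run also correctly identified:
- `rental.save()` directly bypasses `save_model` mutation testing (real coverage gap)
- F-3: Django 4.1+ `_post_clean()` calls `validate_constraints()` before `save_model`, causing UniqueConstraint to fire before the ServiceLayer can resolve the conflict (real bug, surfaced only via browser QA)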
These were valuable findings — they required cross-layer reasoning
(test/admin/form/DB) and were not resolvable by simple code inspection.
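To illustrate why the constraint finding needs cross-layer reasoning, here is a plain-Python toy of the ordering problem. It is not Django code: `run_admin_save`, `ConstraintError`, and the key set are hypothetical stand-ins, and the only thing modeled is that validation runs before the save hook, so a conflict rejects the request before any service-layer logic gets a chance to resolve it.

```python
class ConstraintError(Exception):
    """Stand-in for a uniqueness violation raised during form validation."""


def run_admin_save(existing_keys: set, new_key: str, call_log: list) -> bool:
    """Toy model of the admin flow: validate first, then call the save hook."""

    def validate_constraints() -> None:
        call_log.append("validate_constraints")
        if new_key in existing_keys:
            raise ConstraintError(f"duplicate key: {new_key}")

    def save_model() -> None:
        call_log.append("save_model")
        # Hypothetical service layer: it would release the old holder of the
        # key and then store the new one, resolving the conflict.
        existing_keys.discard(new_key)
        existing_keys.add(new_key)

    try:
        validate_constraints()  # form validation phase
    except ConstraintError:
        return False            # form is invalid, save hook never runs
    save_model()                # only reached when validation passed
    return True


conflict_log = []
conflict_ok = run_admin_save({"site-1"}, "site-1", conflict_log)

clean_log = []
clean_ok = run_admin_save({"site-1"}, "site-2", clean_log)
```

In the conflicting case the log contains only the validation step, which matches the behavior described above: the conflict-resolving code is never reached.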
Suggested improvement
What worked for me was adding a self-check before reporting findings:
Before reporting a finding, ask:
1. Can I resolve this by view-ing 1-2 files in under 5 minutes?
   - Yes → resolve it, don't report it
   - No → report it with Y-10 evidence
2. Is this reproducible by existing pytest tests?
   - Yes → likely already covered, re-check before flagging
   - No → likely a real-environment issue, worth flagging
3. Is the answer in the framework's surface-level documentation?
   - Yes → skip (or just cite the docs)
   - No → genuine internal-behavior concern, worth flagging
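The self-check above can also be sketched as a small decision function. This is a rough illustration, not gstack's implementation: the names `Hypothesis`, `Decision`, and `gate` are hypothetical, and evaluating the three questions as a strict cascade in the listed order is my assumption. How the two example hypotheses are classified is likewise an assumption for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RESOLVE_SILENTLY = "resolve it, don't report it"
    RECHECK = "likely already covered, re-check before flagging"
    SKIP_OR_CITE_DOCS = "skip (or just cite the docs)"
    REPORT = "report it with evidence"


@dataclass
class Hypothesis:
    summary: str
    resolvable_by_quick_inspection: bool  # 1-2 files, under 5 minutes
    reproducible_by_existing_tests: bool  # existing pytest tests exercise it
    answered_by_surface_docs: bool        # framework docs settle it


def gate(h: Hypothesis) -> Decision:
    """Apply the three questions in order; the first 'yes' decides."""
    if h.resolvable_by_quick_inspection:
        return Decision.RESOLVE_SILENTLY
    if h.reproducible_by_existing_tests:
        return Decision.RECHECK
    if h.answered_by_surface_docs:
        return Decision.SKIP_OR_CITE_DOCS
    return Decision.REPORT


# Assumed classification of two findings from this issue.
fp3 = Hypothesis("update_fields might miss updated_at", True, False, False)
f3 = Hypothesis("constraint fires before save_model", False, False, False)

fp3_decision = gate(fp3)
f3_decision = gate(f3)
```

Under these assumptions FP-3 is resolved without being reported, while F-3 passes all three questions and is reported.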
I added this as a project-level rule in my CLAUDE.md. Will measure
Sprint 3+ false positive rates to validate.
Question
Would gstack be open to:
(a) Adding a "self-verification gate" to the /review prompt before
finding generation, or
(b) Documenting this pattern in gstack docs (e.g., as a known caveat
with Django/mature frameworks)?
Happy to contribute either way. Let me know if you'd like to see the full
context (sprint retrospective notes are public in my repo).
Happy hacking 🙏