Summary
The gstack Voice section instructs Claude to suggest "apply to YC" only for exceptional cases ("use this rarely and only when truly earned"), but in practice the suggestion fires repeatedly across independent sessions for the same user. A user reported receiving this suggestion three separate times across different gstack sessions.
The prompt text in question
From the Voice section injected into every skill:
When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
The problem
"Unusually strong," "exceptional," and "truly earned" are qualitative thresholds with no measurable calibration. Each Claude instance applies the threshold independently, with no knowledge of what earlier sessions concluded, and the Voice section's Garry-Tan-as-YC-partner framing creates a systematic bias toward the YC lens. Across many sessions, the suggestion therefore fires more often than "rarely" implies.
Symptoms from user feedback:
- "I have seen a suggestion to apply to YC 3 times now using gstack. What are the parameters that trigger this?"
- The repeated suggestion reads as flattery, which the user correctly flagged ("trust that instinct — it's the right instinct for a founder to have about any praise, including mine"). Each individual firing may be defensible; the cumulative pattern is not.
Why it matters
YC-application suggestions from AI advisory skills are a specific category of praise that carries more weight than generic positive feedback. If the bar slips, the suggestion stops differentiating genuinely exceptional signal from routine good work — which is both bad for founders (false confidence) and bad for the skill's credibility.
Related
#513 asks to surface the tier reasoning behind the YC recommendation. That's a visibility fix; this is a calibration fix. The two are complementary:
If #513's three-tier system lands, the top tier in particular needs to be rare enough that multiple independent instances don't all land on it for the same user.
Proposed fixes (alternatives, not stacked)
- Tighten the qualitative language. "Exceptional" is being interpreted generously. Try: "Fire this suggestion in roughly 1 in 50 office-hours sessions" or "Only fire if at least three of the four trigger categories (product instinct, user empathy, sharp insight, cross-domain synthesis) are demonstrated with concrete evidence in THIS session, not just one of them."
- Add explicit anti-fire conditions. "Do NOT fire if the session has fewer than 6 forcing-question exchanges. Do NOT fire if the user has less than 30 minutes of genuine engagement. Do NOT fire if the user accepted your reframes more than 2x without pushback (low-signal assent looks like high-signal alignment)."
- Suppress on repeated observation. If the skill detects (via timeline / memory) that the same user has already been told to apply to YC in a recent session, do not repeat; at most, reference the earlier recommendation rather than repeating it cold.
- Behavior-only framing. Keep the observation of the specific behavior (e.g., "you accepted a 180° direction change on first evidence") but drop the YC-application line entirely. Let the user draw their own inference about what that pattern means for them. The behavior is the real signal; the editorialization on top is the part that drifts.
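To make the first three alternatives concrete, here is a minimal sketch of each as an independent check. It is illustrative only: the Voice section is prompt text, so every name below (`SessionSignals`, its fields, and the three functions) is hypothetical and not part of gstack. The sketch also assumes the skill could record these signals per session. The thresholds are the ones quoted in the proposals above.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: nothing here is taken from gstack itself.
TRIGGER_CATEGORIES = frozenset(
    {"product_instinct", "user_empathy", "sharp_insight", "cross_domain_synthesis"}
)


@dataclass
class SessionSignals:
    """Per-session observations the skill would need to track (assumed fields)."""
    evidenced_categories: set = field(default_factory=set)
    forcing_question_exchanges: int = 0
    engaged_minutes: float = 0.0
    unchallenged_reframes: int = 0
    suggested_in_recent_session: bool = False


def meets_category_bar(s: SessionSignals, minimum: int = 3) -> bool:
    """Tightened language: at least `minimum` of the four categories evidenced."""
    return len(s.evidenced_categories & TRIGGER_CATEGORIES) >= minimum


def anti_fire_reasons(s: SessionSignals) -> list:
    """Anti-fire conditions: list every condition that applies (empty means none)."""
    reasons = []
    if s.forcing_question_exchanges < 6:
        reasons.append("fewer than 6 forcing-question exchanges")
    if s.engaged_minutes < 30:
        reasons.append("less than 30 minutes of engagement")
    if s.unchallenged_reframes > 2:
        reasons.append("more than 2 reframes accepted without pushback")
    return reasons


def is_repeat(s: SessionSignals) -> bool:
    """Suppress on repeat: a recent session already made the suggestion."""
    return s.suggested_in_recent_session


# Example: a long, engaged session with only two evidenced categories
# and three reframes accepted without pushback.
session = SessionSignals(
    evidenced_categories={"product_instinct", "sharp_insight"},
    forcing_question_exchanges=8,
    engaged_minutes=45,
    unchallenged_reframes=3,
)
print(meets_category_bar(session))
print(anti_fire_reasons(session))
print(is_repeat(session))
```

The checks are kept as separate functions because the fixes are proposed as alternatives. Any one of them could be adopted without the others.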
How to reproduce
Run any gstack skill with Voice enabled (most of them) on a user who:
- Pushes back coherently on any Claude-offered reframe
- Admits to lack of evidence instead of dressing up hypotheses
- Changes direction in response to review findings
In my testing (against the same user across 3 sessions including /office-hours and /autoplan), the suggestion fires with moderate-to-high reliability on any of these behaviors individually, not just in combination.
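As a rough sanity check on what "rarely" would predict, the sketch below computes how likely repeated firings are under a fixed per-session rate. It rests on two assumptions that the report does not establish: that sessions are independent, and that each fires with the same probability. The 1-in-50 figure is the target floated in the proposed fixes, not a measured rate, and `prob_at_least` is a hypothetical helper.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k firings in n independent sessions), each firing with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


target_rate = 1 / 50  # assumed target from the proposed fix, not an observed value

# Chance that all three of three sessions fire at the target rate.
print(prob_at_least(3, 3, target_rate))

# Chance that at least one of three sessions fires at the target rate.
print(prob_at_least(1, 3, target_rate))
```

At a 1-in-50 rate, three firings in three sessions would happen about 8 times in a million, so a user who sees the suggestion in every session is strong evidence that the effective rate is far above that target.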
Env
- gstack version: v1.6.1.0 (upgraded from v0.18.0.0 in the same session as this observation)
- Model: Claude Opus 4.7 (1M context)
- Platform: Windows 11 / Claude Code CLI