docs: translate Korean text to English for improved LLM performance #7
Closed
junhoyeo wants to merge 2 commits into
Conversation
Collaborator
Author
Hey! Here's a prompt you can use to have your LLM agent review and merge this PR:
The rationale is explained in detail in the PR description - mainly about LLM token efficiency and preventing accidental Korean responses when reading project files.
Owner
Great take, but those files are all AI slop; I already removed them from my project. Anyway, thank you for your contribution.
calebrosario pushed a commit to calebrosario/oh-my-opencode that referenced this pull request on Feb 21, 2026
…-support ui: Simplify usage display with cleaner formatting
luobosibing2 pushed a commit to luobosibing2/oh-my-opencode that referenced this pull request on Feb 23, 2026
…vements LGTM! Type-safe helpers properly eliminate unsafe `as any` assertions.
luobosibing2 pushed a commit to luobosibing2/oh-my-opencode that referenced this pull request on Feb 23, 2026
…-yeongyu#309)
Critical fixes:
- Bug #1: SubagentStop hook defaulted all agents to 'failed' because SDK doesn't provide `success` field. Now defaults to 'completed' when undefined.
- Bug code-yeongyu#4: Token stats lost across TokenTracker instances. Constructor now restores session stats from global state for the same session ID.
- Bug code-yeongyu#5: Ultrawork session isolation bypassed when both session IDs were undefined (undefined === undefined). Now rejects all falsy session IDs.
High priority fixes:
- Bug code-yeongyu#6: Cancel skill force-clear missed 12+ state files (boulder, hud-state, subagent-tracking, checkpoints, etc). Added comprehensive list.
- Bug code-yeongyu#7: HUD semverCompare() returned NaN on pre-release versions like "3.9.5-beta". Fixed to use parseInt and handle pre-release ordering.
- Bug code-yeongyu#8: Silent JSON parse failures in critical state readers. Added error logging to ralph and ultrawork state readers.
- Bug code-yeongyu#9: Stale task detection had no default behavior when onStaleSession callback was not configured. Now auto-cleans after 2x threshold.
- Bug code-yeongyu#10: Hardcoded 3-architect assumption in validation. Extracted to REQUIRED_ARCHITECTS constant.
Medium priority fixes:
- Bug code-yeongyu#11: Auto-invoke history used non-atomic writes. Now uses atomicWriteJson to prevent corruption from concurrent sessions.
- Bug code-yeongyu#12: Ecomode docs said "all tasks" use Haiku, contradicting the escalation paths. Clarified to "most tasks" with upgrade criteria.
- Bug code-yeongyu#13: Added safeUnlinkSync/safeRmSync utilities to prevent ENOENT crashes during cleanup operations.
- Bug code-yeongyu#14: State files containing user prompts written with 0644 permissions. Now writes with 0600 (owner-only read/write).
- Bug code-yeongyu#15: Model names recorded inconsistently (e.g., 'claude-3-5-haiku' vs 'claude-haiku-4'). Now normalizes at recording time via exported normalizeModelName().
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
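The pre-release failure described in Bug code-yeongyu#7 can be illustrated with a small comparator. This is a minimal sketch, not the project's actual `semverCompare()`: the function body, the `parseVersion` helper, and the plain string ordering of pre-release tags are assumptions made for illustration. One common way to end up with NaN is converting a segment such as "5-beta" with `Number()`; `parseInt` reads the leading digits instead.

```typescript
// Hypothetical sketch; not the implementation from the referenced commit.
interface ParsedVersion {
  core: number[]; // [major, minor, patch]
  pre: string; // "" for a normal release, e.g. "beta" for "3.9.5-beta"
}

// Hypothetical helper: split "3.9.5-beta" into numeric core and pre-release tag.
function parseVersion(version: string): ParsedVersion {
  const cleaned = version.trim().replace(/^v/, "");
  const dash = cleaned.indexOf("-");
  const coreText = dash === -1 ? cleaned : cleaned.slice(0, dash);
  const pre = dash === -1 ? "" : cleaned.slice(dash + 1);
  const core = coreText.split(".").map((part) => {
    const n = parseInt(part, 10);
    return isNaN(n) ? 0 : n; // never let NaN leak into the comparison
  });
  return { core, pre };
}

// Returns a negative number, zero, or a positive number (sort-comparator style).
function semverCompare(a: string, b: string): number {
  const left = parseVersion(a);
  const right = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    const l = i < left.core.length ? left.core[i] : 0;
    const r = i < right.core.length ? right.core[i] : 0;
    if (l !== r) return l - r;
  }
  // Same core version: a release outranks any pre-release of it.
  if (left.pre === right.pre) return 0;
  if (left.pre === "") return 1;
  if (right.pre === "") return -1;
  // Simplification (assumption): compare pre-release tags as plain strings.
  return left.pre < right.pre ? -1 : 1;
}
```

With this sketch, `semverCompare("3.9.5-beta", "3.9.5")` is negative and `semverCompare("3.10.0", "3.9.5-beta")` is positive, so a pre-release sorts below its final release without ever producing NaN.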
thevanshit pushed a commit to thevanshit/oh-my-openagent that referenced this pull request on May 24, 2026
…iewer gate
Closes five gaps in the ultrawork prompt versus codex-plugins' parallel
directive, applied to all three model variants (default/Claude, GPT, Gemini)
with prompt-engineering entropy gate (each addition encodes a distinct
binding boolean, not narrative reinforcement):
1. TDD-MANDATORY (was conditional "when test infrastructure exists"):
every production change follows RED -> GREEN -> SURFACE. Failing test
first, capture assertion msg, smallest change to flip green, exercise
real surface, capture artifact. Exemption whitelist: formatting /
comment-only / version bump / rename-only, each must be justified
in writing; unjustified exemption = rejection.
2. Scenario contract (was free-form Test Plan Template): require 3+
scenarios upfront covering happy path, edge (boundary / empty /
malformed / concurrent), adjacent-surface regression. Each scenario
binds a binary pass condition, a real-surface artifact source, and
a test file + test id written test-first.
3. RED->GREEN evidence capture (was "all tests pass"): every scenario
requires TWO captured artifacts -- RED assertion msg before the
change AND GREEN assertion msg after -- alongside the real-surface
artifact (tmux / curl / browser / Playwright / computer-use /
CLI stdout / parsed config / DB diff). Tests are the floor (always
required); surface artifact is the ceiling (also required).
4. Durable notepad: mktemp -t ulw-*.md with append-only sections
(Plan, Scenarios, Now, Todo, Findings, Learnings). Survives context
loss; resume by re-reading.
5. Reviewer gate: trigger when user said strictly / rigorously /
"deeply", or task touches 3+ files / 20+ turns / 30+ min, or it is
refactor / migration / perf / security work. Reviewer verdict is
binding ("looks good but..." = rejection). Loop until unconditional
approval.
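The reviewer-gate trigger in item 5 is a disjunction of three conditions, which can be written as a single boolean check. The sketch below is illustrative only: the commit encodes this rule as prompt text, and the `TaskSignals` shape, its field names, and `needsReviewerGate` are hypothetical names introduced here.

```typescript
// Hypothetical shape; the real rule lives in prompt text, not in code.
interface TaskSignals {
  userMessage: string;
  filesTouched: number;
  turns: number;
  elapsedMinutes: number;
  workKind: string; // e.g. "feature", "refactor", "migration", "perf", "security"
}

const RIGOR_WORDS = ["strictly", "rigorously", "deeply"];
const GATED_KINDS = ["refactor", "migration", "perf", "security"];

function needsReviewerGate(task: TaskSignals): boolean {
  const message = task.userMessage.toLowerCase();
  // Condition 1: the user asked for rigor in so many words.
  const askedForRigor = RIGOR_WORDS.some((word) => message.indexOf(word) !== -1);
  // Condition 2: the task is large by any one of the three size thresholds.
  const largeScope =
    task.filesTouched >= 3 || task.turns >= 20 || task.elapsedMinutes >= 30;
  // Condition 3: the kind of work is inherently risky.
  const riskyKind = GATED_KINDS.indexOf(task.workKind) !== -1;
  return askedForRigor || largeScope || riskyKind;
}
```

Any single condition is enough to trigger the gate, which matches the "or" phrasing of the item; the binding-verdict loop that follows the trigger is not modeled here.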
Plus: TODO format upgraded from vague "track every step" to atomic
`path: <action> for <scenario-id> -- verify by <check>` with a GOOD
test-first / impl pair example and a BAD list including
"production code before its failing test".
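The atomic TODO format above can be sketched as a small formatter. This is a hypothetical illustration: `TodoItem`, `formatTodo`, the file paths, and the scenario id are invented here and do not come from the commit; only the line template is taken from the text.

```typescript
// Hypothetical types and names; only the output template follows the commit message.
interface TodoItem {
  path: string;
  action: string;
  scenarioId: string;
  check: string;
}

// Renders `path: <action> for <scenario-id> -- verify by <check>`.
function formatTodo(item: TodoItem): string {
  return `${item.path}: ${item.action} for ${item.scenarioId} -- verify by ${item.check}`;
}

// A test-first / impl pair: the failing test is listed before the production change.
const pair: TodoItem[] = [
  {
    path: "src/example.test.ts",
    action: "add failing test",
    scenarioId: "S1",
    check: "RED assertion message captured",
  },
  {
    path: "src/example.ts",
    action: "implement smallest change",
    scenarioId: "S1",
    check: "GREEN assertion message captured",
  },
];

const lines = pair.map(formatTodo);
console.log(lines.join("\n"));
```

Keeping the test entry ahead of the implementation entry mirrors the ordering rule that the BAD list calls out.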
Per-variant adaptation:
- default.ts (Claude): full structured sections.
- gpt.ts (GPT-5.x): outcome-first prose, shorter prose per gpt-5.5 guide.
- gemini.ts: explicit enforcement framing + anti-optimism checkpoint
upgraded with a TDD-violation question (code-yeongyu#7).
Verified by:
- bun test src/hooks/keyword-detector/ (119 pass / 0 fail).
- lsp_diagnostics clean on all three files.
- Module-load smoke test confirms each exported message string parses
and contains the new section anchors (TDD MANDATORY, SCENARIO
CONTRACT, DURABLE NOTEPAD, REVIEWER GATE).
Char deltas (directive body only):
- default 13646 -> 17144 (+26%)
- gpt 6740 -> 9215 (+37%, was the leanest start)
- gemini 14196 -> 16136 (+14%)
Existing tests only assert presence of "ULTRAWORK MODE ENABLED!" which
is preserved verbatim in every variant.
Summary
Problem
When Korean text exists in project files (like AGENTS.md, planning docs, notes), LLMs sometimes:
Technical Background: Why LLM Performance Drops with Korean
1. Tokenization Inefficiency
Korean is significantly less token-efficient than English:
Korean uses Hangul syllable blocks (가, 나, 다...), and most tokenizers (like BPE) were optimized for Latin scripts. This means:
2. Training Data Imbalance
Most LLMs are trained on predominantly English data:
Less exposure → weaker pattern learning → worse performance on:
3. Morphological Complexity
Korean is agglutinative — meanings are built by attaching suffixes:
Each variation may be tokenized differently, fragmenting the semantic relationship the model needs to learn.
4. Vocabulary Coverage
Tokenizer vocabularies are limited (~32K-100K tokens). With English-centric training:
5. Attention Dilution
With more tokens per semantic unit:
Practical Impact: Within the same context window (e.g., 8K tokens), Korean text conveys roughly one half to one third as much content as English text.
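One ingredient of this gap can be checked directly: in UTF-8, each Hangul syllable block occupies 3 bytes while each ASCII letter occupies 1, so a byte-level tokenizer starts from more units per Korean character before any merges apply. The sketch below is a rough proxy only, not a real token count. `utf8ByteLength` and `equivalentBudget` are hypothetical helpers, the 8K window is assumed to be 8192 tokens, and the 2x and 3x ratios are the ones stated in this description rather than measurements.

```typescript
// Count UTF-8 bytes by hand so the sketch needs no runtime-specific APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair: one code point outside the BMP
        i++;
      } else {
        bytes += 3; // lone surrogate, encoded as a replacement character
      }
    } else {
      bytes += 3; // rest of the BMP, including Hangul syllables (U+AC00..U+D7A3)
    }
  }
  return bytes;
}

// Hypothetical helper: English-equivalent content that fits in a window when
// the same meaning costs `ratio` times as many tokens.
function equivalentBudget(windowTokens: number, ratio: number): number {
  return Math.floor(windowTokens / ratio);
}

console.log(utf8ByteLength("hello")); // 5
console.log(utf8ByteLength("안녕하세요")); // 15
console.log(equivalentBudget(8192, 2)); // 4096
console.log(equivalentBudget(8192, 3)); // 2730
```

Actual ratios depend on the tokenizer and the text, so measuring with the target model's own tokenizer is the reliable way to size a prompt.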
Changes
- AGENTS.md
- test-dir/nested/AGENTS.md
- notepad.md
- local-ignore/comment-checker-ts-plan.md
Note: README.ko.md was intentionally not translated, as it's the designated Korean version of the documentation.