Unify token counting between check and tokens count#146
Conversation
There was a problem hiding this comment.
Pull request overview
This PR makes token counting consistent across waza check and waza tokens count by switching Skill.UnmarshalText() to use the shared BPE tokenizer counter and avoiding line-ending normalization that changes BPE tokenization results.
Changes:
- Introduce
tokens.DefaultCounter()(init-once shared BPE counter) andtokens.CountLines()utility. - Update skill parsing and checks/CLI token counting to use BPE counting on raw content (no CRLF→LF normalization).
- Add/adjust tests for line counting, shared counter behavior, and LRU cache concurrency.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/tokens/tokens.go | Adds CountLines and shared DefaultCounter() initialization. |
| internal/tokens/tokens_test.go | Adds test coverage for CountLines. |
| internal/skill/skill.go | Uses shared BPE counter during UnmarshalText() token calculation. |
| internal/checks/token_limits.go | Stops CRLF normalization before token counting. |
| internal/checks/token_budget_test.go | Adds test asserting token budget uses shared BPE counting on raw content (LF/CRLF). |
| cmd/waza/tokens/helpers.go | Removes local countLines helper (replaced by tokens.CountLines). |
| cmd/waza/tokens/count.go | Stops CRLF normalization and uses tokens.CountLines. |
| cmd/waza/tokens/compare.go | Uses tokens.CountLines for line counting. |
| internal/tokens/bpe/lru.go | Makes LRU cache operations mutex-protected for concurrent access. |
| internal/tokens/bpe/lru_test.go | Adds a concurrent-access test for the LRU cache. |
| cmd/waza/dev/helpers_test.go | Adds output-matching helper that masks token counts while still verifying initial token count. |
| cmd/waza/dev/loop_test.go | Updates dev loop tests to use the new output-matching helper and initial-skill token counts. |
| cmd/waza/cmd_check_test.go | Adds integration test ensuring check token budget matches tokens count for the same file. |
ff20c2c to
7898cb1
Compare
There was a problem hiding this comment.
Pull request overview
Aligns token counting across waza check and waza tokens count by switching skill token computation to the shared BPE tokenizer and preserving raw line endings during tokenization to match BPE behavior.
Changes:
- Use a shared default BPE counter (
tokens.DefaultCounter) for consistent token counts. - Stop CRLF→LF normalization before tokenizing; centralize line counting via
tokens.CountLines. - Add/update tests and introduce
testutil.StripTokenCountsto make CLI output assertions robust across platforms/tokenization changes.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/tokens/tokens.go | Adds CountLines and introduces a shared default counter used by NewCounter. |
| internal/tokens/tokens_test.go | Adds CountLines unit tests. |
| internal/tokens/bpe/lru.go | Makes LRU cache concurrency-safe via a mutex. |
| internal/tokens/bpe/lru_test.go | Adds a concurrent access test for the LRU cache. |
| internal/testutil/tokenstrip.go | Adds helper to mask token-count numbers in CLI output tests. |
| internal/skill/skill.go | Switches Skill.Tokens to BPE-based counting and uses tokens.CountLines. |
| internal/checks/token_limits.go | Stops normalizing CRLF to LF before counting tokens. |
| internal/checks/token_budget_test.go | Adds coverage ensuring token budgeting uses shared default tokenizer and raw content. |
| cmd/waza/tokens/helpers.go | Removes local countLines helper (replaced by tokens.CountLines). |
| cmd/waza/tokens/count.go | Stops normalizing CRLF to LF and uses tokens.CountLines. |
| cmd/waza/tokens/compare.go | Uses tokens.CountLines for line counts. |
| cmd/waza/tokens/count_test.go | Masks token counts in output comparisons. |
| cmd/waza/tokens/check_test.go | Masks token counts in output comparisons. |
| cmd/waza/dev/helpers_test.go | Adds requireOutputMatch to mask token counts but still validate the displayed token count. |
| cmd/waza/dev/loop_test.go | Updates assertions to use requireOutputMatch and validates token banner value against parsed skill tokens. |
| cmd/waza/cmd_check_test.go | Adds an integration-ish test ensuring check and tokens count agree for the same file. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #146 +/- ##
=======================================
Coverage ? 73.52%
=======================================
Files ? 141
Lines ? 15858
Branches ? 0
=======================================
Hits ? 11660
Misses ? 3351
Partials ? 847
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Aligns token counting across waza check and waza tokens count by switching Skill.Tokens to use the BPE tokenizer (and preserving raw line endings) so both commands report consistent counts.
Changes:
- Introduces a shared default BPE token counter and reuses it across call sites.
- Stops normalizing CRLF to LF before tokenization; updates line counting to a shared helper.
- Updates/strengthens tests to be resilient to platform line endings and token-count variability in formatted output.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/tokens/tokens.go | Adds CountLines and a shared DefaultCounter; routes NewCounter through the shared default for BPE. |
| internal/tokens/tokens_test.go | Adds unit tests for CountLines including CRLF inputs. |
| internal/tokens/bpe/lru.go | Makes LRU cache protected by a mutex for concurrent access. |
| internal/tokens/bpe/lru_test.go | Adds a concurrency test for the LRU implementation. |
| internal/testutil/tokenstrip.go | Adds helper to mask token-related numbers in CLI output assertions. |
| internal/skill/skill.go | Switches Skill.Tokens to BPE counting and uses shared CountLines. |
| internal/checks/token_limits.go | Stops normalizing line endings prior to token counting. |
| internal/checks/token_budget_test.go | Adds regression test to ensure token budget uses shared tokenizer across LF/CRLF raw content. |
| cmd/waza/tokens/helpers.go | Removes local countLines helper (now using tokens.CountLines). |
| cmd/waza/tokens/count.go | Stops CRLF normalization; uses tokens.CountLines. |
| cmd/waza/tokens/compare.go | Uses tokens.CountLines for line deltas. |
| cmd/waza/tokens/count_test.go | Computes expected token counts via DefaultCounter and masks token counts for table output. |
| cmd/waza/tokens/check_test.go | Masks token counts in output comparisons for stability. |
| cmd/waza/dev/loop_test.go | Masks token counts and validates displayed token count vs initial skill’s BPE tokens. |
| cmd/waza/dev/helpers_test.go | Adds requireOutputMatch helper using token masking + first token-count validation. |
| cmd/waza/cmd_check_test.go | Adds end-to-end test that check and tokens count agree for the same skill file. |
richardpark-msft
left a comment
There was a problem hiding this comment.
One request - let's file an issue to move away from having to screen-scrape our own output, but it's fine for now and at last we're consolidating, so it's still an improvement.
waza check(Token Budgets section) andwaza tokens countreported different token counts for the same file becausecheckuses theSkill.Tokensvalue set byskill.UnmarshalText()as a simple multiple of the input's character count, whereaswaza tokens countuses the BPE tokenizer. With this PR,skill.UnmarshalText()uses the BPE tokenizer viatokens.DefaultCounter(), makingSkill.Tokensmore accurate for all consumers and consistent withtokens countoutput.While I was at it, I also stopped normalizing line endings--replacing
\r\nwith\n--before tokenizing because the BPE algorithm encodes these sequences as a different number of tokens. I added a test helper to prevent this change breaking tests on certain platforms.