All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed temperature default breaking all LLM calls - `action.yml` defaulted `temperature` to `0`, but extended thinking (enabled in v0.7.0) requires `temperature=1` per Anthropic's API. Every rule check silently errored with a 400, causing false "0 issues found" results. Changed default to `1`.
- Fixed circular import crashing all GitHub Action runs - `style_checker/github.py` shadowed the PyGithub `github` package. When `github_handler.py` imported `from github import Github`, Python resolved it to the local `github.py` instead of PyGithub, causing a circular import. Renamed `github.py` to `action.py` to eliminate the name collision.
- Extended thinking - Claude now reasons internally before responding, which eliminates false positives (0% vs ~43% previously). The model verifies each candidate violation before including it in the output. Requires `temperature=1.0` (Anthropic constraint for extended thinking). See docs/testing-extended-thinking.md for experiment results.
- Prompt version archiving - Previous prompts saved in `style_checker/prompts/v0.6.1/` for comparison. Future prompt changes will be archived similarly.
- Testing documentation - New docs/testing-extended-thinking.md documenting the extended thinking experiments, prompt iteration results, and ground truth validation.
- Minimal rule-agnostic prompts — All 8 category-specific prompts replaced with a single unified ~40 line prompt (down from ~120 lines each). The prompt no longer contains category-specific instructions — scope and analysis context come from the rule definitions themselves. This reduces signal dilution and lets the model focus on the task.
- Default temperature changed to 1.0 — Required for extended thinking. The thinking budget (10,000 tokens) ensures careful reasoning despite higher temperature.
- Local CLI: `qestyle` - New command-line tool that shares the same review engine, prompts, rules, and fix logic as the GitHub Action. Install with `pip install git+https://github.com/QuantEcon/action-style-guide.git`. Run `qestyle lecture.md` to review, apply fixes, and write a report (`qestyle-lecture.md`). Use `--dry-run` to skip applying fixes. Warns if the file has uncommitted changes before applying fixes. Supports `--categories`, `--output`, `--model`, and `--temperature` options.
- Category names as PR labels - When specific categories are requested (e.g., `@qe-style-checker lectures/file.md code,math`), the category names are added as labels on the generated PR. Default labels (`automated`, `style-guide`, `review`) are always included.
- Removed `tool-style-checker/` directory - Replaced by the built-in `qestyle` CLI. The standalone tool had diverged from the main action (different LLM calling strategy, stale rule counts, old output format). The CLI uses the exact same `StyleReviewer` code path.
- Removed `tool-style-guide-development/` directory - Rules are now edited directly in `style_checker/rules/`; the standalone development tool and `style-guide-database.md` are no longer needed.
- Clarified qe-code-003 rule — Specified that package installation cells belong near the top of the lecture in one of the first code cells. Explicitly states that standard Anaconda packages (numpy, matplotlib, scipy, pandas, sympy) do NOT require installation cells. Previously the vague rule caused the LLM to hallucinate installation blocks and overwrite section headers.
- Region-based combined Applied Fixes report - Replaced per-violation reporting with region-based diff reporting. When multiple rules edit the same line (e.g., math converts `$\alpha$` → `α`, then writing bolds `learning rate`), the report now shows one combined entry with the true original → true final text and all contributing rules attributed. Eliminates confusing "identical original/fix" entries caused by sequential rule processing.
- Skip no-op fixes in fix_applier - When `current_text == suggested_fix` (LLM found no violation but emitted a violation block anyway), the fix is now skipped rather than applied as a no-op. `apply_fixes()` now returns a 3-tuple `(content, warnings, applied_violations)` so callers know which fixes actually changed content.
- Fixed zero-violation parser bug - When the LLM reported `Issues Found: 0` but still emitted violation blocks with commentary like `[No change needed]` as the suggested fix, the fix_applier would replace real lecture content with that commentary text. Parser now short-circuits when `Issues Found` is 0, skipping all violation parsing. All 8 prompts updated with stronger instructions to not emit violation blocks when no violations are found.
- Fixed `review_lecture_smart()` architecture bug - Was passing all rules at once to LLM despite documented evidence that single-rule evaluation is far more reliable. Now delegates to `review_lecture_single_rule()` for every category.
- Removed duplicate `format_pr_body()` - Two implementations existed in `github_handler.py`; kept the concise second version.
- Removed dead code - Deleted `parser_md.py` (unused module), `_review_category()`, `review_lecture()`, `check_style()` methods, unused `load_prompt` import, and `--github-ref` CLI argument.
- Removed dead test files - Deleted `test_parser_md.py`, `test_semantic_grouping.py`, `test_migration.py`, `verify_setup.py` (all tested removed code).
- Cleaned up dependencies - Removed `PyYAML` and `python-dateutil` from `requirements.txt` (neither imported by action code).
- Rebuilt RULES.md from source - Many rules had wrong types and descriptions. Regenerated from rule files (source of truth): 49 rules (32 rule, 13 style, 4 migrate). Previously said 48 rules with 15+ type/description mismatches.
- Fixed README version consistency — Updated badge and text from 0.5.0 to 0.5.1, fixed "50+ rules" to "49 rules".
- Fixed CONTRIBUTING.md stale references - Removed references to nonexistent `LLMProvider` abstract base class and `check_style()` method. Updated priority taxonomy from `critical/mandatory/best_practice/preference` to `rule/style/migrate`.
- Fixed tests/README.md - Removed references to deleted `test_basic.py` and `verify_setup.py`. Updated to list current test files.
- Fixed testing-quick-reference.md - Updated test file references and coverage stats; removed claims about a CI pipeline that didn't exist.
- Fixed production-testing.md - Updated workflow example from `@v0.4` to `@v0.5`.
- Fixed ci-cd-setup.md - Added note about original document, updated status.
- Fixed `migrate` type rule extraction - The regex in `extract_individual_rules()` only matched `rule|style` types, silently dropping all 4 `migrate` rules (code-004, code-005, jax-004, jax-006). Now matches `rule|style|migrate`.
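The effect of the narrower pattern can be reproduced with a small sketch. The rule-file layout used here (a `### Rule:` heading followed by a `**Type:**` line) and the helper name `extract_rule_types` are assumptions for illustration; the project's actual `extract_individual_rules()` may parse the files differently.

```python
import re

# Hypothetical rule-file excerpt; the real files in style_checker/rules/ may differ.
SAMPLE_RULES = """\
### Rule: qe-code-003
**Type:** rule

### Rule: qe-code-004
**Type:** migrate

### Rule: qe-jax-001
**Type:** style
"""

# Old pattern: only two of the three types are recognised.
OLD_TYPE_RE = re.compile(r"### Rule: (\S+)\n\*\*Type:\*\* (rule|style)\b")
# Fixed pattern: `migrate` is accepted as well.
NEW_TYPE_RE = re.compile(r"### Rule: (\S+)\n\*\*Type:\*\* (rule|style|migrate)\b")


def extract_rule_types(text: str, pattern: re.Pattern) -> dict:
    """Return a mapping of rule id to rule type for every match of `pattern`."""
    return {rule_id: rule_type for rule_id, rule_type in pattern.findall(text)}


old = extract_rule_types(SAMPLE_RULES, OLD_TYPE_RE)
new = extract_rule_types(SAMPLE_RULES, NEW_TYPE_RE)
print(sorted(old))  # the migrate rule is missing here
print(sorted(new))  # all three rules are present
```

Because a non-matching rule is skipped rather than raising an error, the drop is silent, which is why a rule-count test is a useful guard.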
- Configurable `temperature` parameter - Added `temperature` input to action.yml, CLI, and LLM provider. Defaults to `0` (deterministic) for consistent rule checking. Set to higher values (e.g., 0.5, 1.0) for more varied suggestions. Previously used Anthropic's default of 1.0, causing run-to-run variation.
- Prompt version tracking - Added `<!-- Prompt Version: 0.5.1 | ... -->` comments to all 8 prompt files. Previously only `writing-prompt.md` had version tracking.
- `RULE_EVALUATION_ORDER` for all categories - Defined optimal evaluation order (mechanical → structural → stylistic → migrate) for all 8 categories in `reviewer.py`. Previously only `writing` had a defined order.
- `test_fix_applier.py` (13 tests) - Tests `apply_fixes()` and `validate_fix_quality()`.
- `test_prompt_loader.py` (9 tests) - Tests `PromptLoader` class: single/multi category loading, all categories loadable, invalid category, version tracking.
- `test_reviewer.py` (15 tests) - Tests `extract_individual_rules()` and `RULE_EVALUATION_ORDER`: rule counts (49 total, 32/13/4 by type), field presence, evaluation order consistency.
- Rewrote `test_parsing.py` - Was reimplementing comment parsing with a `TestHandler` class instead of testing the real `GitHubHandler.extract_lecture_from_comment()` method. Now tests the actual code with 11 focused test cases. Fixes `PytestReturnNotNoneWarning`.
- Updated CI pipeline - Removed stale `verify_setup.py` reference, switched from flake8/black/isort to ruff, narrowed Python matrix to 3.11-3.13, removed `develop` branch trigger, removed integration job that depended on deleted file.
- Critical NameError Fix - Fixed production bug preventing v0.5.0 from working
- Added `pr_labels` parameter to `review_single_lecture()` function
- Function now receives `pr_labels` as explicit parameter instead of accessing `args.pr_labels`
- Fixed variable naming conflict by renaming local `pr_labels` to `labels`
- This fixes the "NameError: name 'args' is not defined" error at line 109
- Production Testing Guide (`docs/production-testing.md`) - Comprehensive guide for testing the action
- Local CLI testing workflow
- GitHub test repository setup (`test-action-style-guide`)
- Testing checklist and debugging guide
- Test Repository (`QuantEcon/test-action-style-guide`) - Dedicated repository for action testing
- Jupyter Book structure with test lectures
- Workflow for manual and comment-triggered testing
- Clean and violation test lectures for regression testing
- Scheduled weekly regression tests
- PR Labels Input (`pr-labels`) - Allow custom labels on created PRs
- New action input: `pr-labels` (comma-separated list)
- Custom labels added to default labels (automated, style-guide, review)
- Useful for test repos to add labels like 'do-not-merge'
- Simplified comment trigger syntax - Now only supports `@qe-style-checker`
- Removed legacy `@quantecon-style-guide` syntax
- Removed experimental `@github-actions style-guide` syntax
- Cleaner codebase with single comment pattern
- Updated documentation to reflect syntax
- RELEASE-GUIDE.md - Outdated release documentation removed
- Renamed rule classification from "Category" to "Type" - Clearer terminology throughout
- Rule files: `**Category:** rule|style` → `**Type:** rule|style`
- Code: `rule_category` → `rule_type` in reviewer.py, parser_md.py
- Prompts: Updated all 8 category prompts to use "`rule` type" and "`style` type"
- Documentation: Updated ARCHITECTURE.md and README.md to use "Type" terminology
- Tests updated to use new terminology
- Prevents confusion with topic categories (writing, math, code, etc.)
- Documented `migrate` type - Third rule type for legacy pattern updates
- Used in JAX and code categories for patterns like `tic/toc` → `qe.Timer()`
- Treated as suggestions (not auto-applied), similar to `style` type
- Added to Rule Types table in ARCHITECTURE.md
- Added new "Type: migrate" section in README.md
- tool-style-checker now shares prompts/rules with main action
- Deleted duplicate `tool-style-checker/prompts/` and `tool-style-checker/rules/` directories
- Tool now loads from `style_checker/prompts/` and `style_checker/rules/`
- Single source of truth - local testing uses same rules as production
- Updated README to document shared resource architecture
- Updated tool-style-guide-development for Type terminology
- `build_rules.py`: Updated regex to parse `**Type:**` instead of `**Category:**`
- `style-guide-database.md`: Renamed "Categories" section to "Types"
- Generated rule files in `rules/`: Updated all to use `**Type:**`
- README: Updated documentation to reflect changes
- ARCHITECTURE.md - Comprehensive developer documentation
- System architecture diagram (Mermaid)
- Data flow for single and weekly processing modes
- Component descriptions with key functions
- Configuration reference (inputs, environment variables)
- Development guide with testing instructions
- FUTURE-ENHANCEMENTS.md - Roadmap and research notes
- GitHub inline suggestions approach (checkbox-based style suggestions)
- Incremental PR review mode proposal
- Batch processing improvements (resume capability, progress reporting)
- Multi-model support for cost optimization
- Rule confidence scoring concept
- docs/README.md - Documentation index with quick links
- LLM integration tests - Updated to use current API signatures
- Changed from `review_lecture(rules_text=...)` to `review_lecture_single_rule(categories=...)`
- All 30 tests now passing (23 unit + 7 integration)
- Added new test for `review_lecture_smart()` method
- Test coverage improved from 36% to 53%
- README version badge - Updated from 0.3.17 to match current version
- qe-writing-002 rule conflict - Added constraint to prevent violating qe-writing-001
- qe-writing-002 was breaking long sentences into multiple sentences without blank lines
- This violated qe-writing-001 (one sentence per paragraph)
- Added "Important" note: Breaking sentences up requires blank lines between each sentence
- Changed category from `rule` to `style` for advisory guidance
- Self-contained constraint (no external links needed)
- qe-writing-002 rule conflict - Added explicit exception to prevent breaking sentences
- qe-writing-002 was breaking long sentences into multiple sentences
- This violated qe-writing-001 (one sentence per paragraph)
- Added "Important" section: Do NOT break sentences, instead simplify within single sentence
- Provides guidance: remove words, simplify clauses, use direct phrasing, restructure
- Prevents conflicting edits between rules processed sequentially
- Remove redundant CRITICAL instruction - Removed ambiguous "Current text and Suggested fix MUST be different" note
- Instruction was redundant (obviously fixes should change something)
- Was causing confusion and potentially skipping valid fixes
- Validation is already handled by `validate_fix_quality()` function
- Cleaner, simpler prompt with less confusion
- Updated writing prompt version to 0.3.23
- Whitespace fix application - Clarified CRITICAL instruction to allow subtle visual differences
- Changed from "MUST be different" (ambiguous) to "must change something"
- Added explicit note: "Even if the visual difference is subtle (like whitespace changes), ensure the suggested fix actually corrects the violation"
- Fixes issue where whitespace fixes were being skipped because they looked "the same"
- LLM was interpreting "different" too strictly, avoiding reporting valid whitespace violations
- Updated writing prompt version to 0.3.22
- Parser support for tilde fences - Updated violation parser to handle both backtick and tilde code fences
- Parser now accepts `~~~markdown` (as instructed in prompts) in addition to `` ```markdown ``
- Fixes issue where Current text and Suggested fix weren't being extracted from LLM responses
- Regex patterns updated: `` (?:```|~~~) `` to match either fence type
- Ensures PR comments display violations correctly
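A minimal sketch of a parser that accepts either fence type is shown below. The function name `extract_fenced_blocks` and the sample response are hypothetical, and the pattern is only one way to write such a match; it adds a backreference so a block opened with tildes must also be closed with tildes.

```python
import re

BACKTICKS = "`" * 3  # built programmatically to keep this example's own fence intact

# A block opens with either fence marker plus "markdown" and closes with the same marker.
FENCE_RE = re.compile(
    rf"^(?P<fence>{BACKTICKS}|~~~)markdown\n(?P<body>.*?)\n(?P=fence)[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_fenced_blocks(response: str) -> list:
    """Return the body of every markdown block fenced with backticks or tildes."""
    return [match.group("body") for match in FENCE_RE.finditer(response)]


# Hypothetical LLM response mixing both fence styles.
sample = (
    "**Current text:**\n"
    "~~~markdown\nold  text\n~~~\n"
    "**Suggested fix:**\n"
    f"{BACKTICKS}markdown\nold text\n{BACKTICKS}\n"
)

blocks = extract_fenced_blocks(sample)
print(blocks)
```

The backreference also lets a tilde-fenced block contain backtick-fenced MyST directives without ending early.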
- Writing prompt improvements - Updated to reflect single-rule processing architecture and simplified language
- Changed from "check all writing rules" to "check one specific rule"
- Emphasizes checking ONLY the provided rule, not other issues
- Clarified that rules are processed one at a time sequentially
- Simplified rule/style category descriptions (removed verbose explanations)
- Changed "Quality over quantity" to "Apply the rule appropriately"
- More concise and actionable instructions
- Updated version comment to `0.3.19` and added "Single rule per LLM call" description
- Better focused output with rule-specific summary messages
- qe-writing-001 false positives - Simplified rule definition to prevent reporting already-correct text
- Removed verbose "CRITICAL - Do NOT report" sections
- Changed "Check for" to focus on violations requiring fixes
- Simplified examples by removing redundant "DO NOT report" comments
- More concise rule definition following "simplicity above all" principle
- Duplicate explanation in PR comments - Removed list formatting from violation output templates
- Updated all 8 category prompts to use paragraph format instead of list items
- Changed from `- **Severity:** error` to `**Severity:** error` (no bullet point)
- Eliminates duplicate explanation text that appeared in PR comments
- Cleaner, more readable output format
- Affects: writing, math, code, jax, figures, references, links, admonitions prompts
- qe-writing-001 false positives - Clarified "Check for" criteria to prevent reporting already-correct text
- Updated rule to specify: "Only report when blank lines need to be ADDED to separate sentences"
- Simplified language by removing verbose "DO NOT report" sections
- Focuses on the actual violation: multiple sentences in a single paragraph block (no blank lines between them)
- Fixes issue where LLM would report correctly-formatted text as violations
- LLM prompt output format - Updated all prompts to use tilde fences in output examples
- Changed from triple backticks to `~~~markdown` for "Current text" and "Suggested fix" blocks in prompt templates
- LLM now generates responses that match GitHub handler's tilde fence format
- Ensures consistency between what LLM is instructed to output and what GitHub handler expects
- Affects all 8 category prompts: writing, math, code, jax, figures, references, links, admonitions
- Completes migration from 4-backtick approach (v0.3.15) to tilde fence standard (v0.3.16)
- PR comment fence markers - Use tilde (`~~~`) fences instead of backticks for markdown blocks
- Changed all PR comment code blocks from `` ````markdown `` to `~~~markdown`
- Uses tildes for outer fences, preserving backticks for MyST Markdown content
- Prevents fence depth conflicts with nested directives (e.g., `` ```{code-cell} ``, `` ```{note} ``)
- More elegant solution per GitHub Flavored Markdown spec
- Updated all prompt files to instruct LLM to use tilde fences
- Updated tests to verify tilde fence usage
- PR comment markdown formatting - Use four backticks for code blocks
- Changed all PR comment markdown code blocks from `` ```markdown `` to `` ````markdown ``
- Prevents rendering issues when MyST Markdown content contains nested directives with three-backtick code blocks
- Affects `format_detailed_report()`, `format_applied_fixes_report()`, and `format_style_suggestions_report()`
- Separate handling for rule vs style category violations - Two-comment PR system
- Rule category violations (mechanical fixes): Automatically applied to lecture content
- Rules: qe-writing-001, 002, 004, 005, 006, 008
- Applied fixes posted as collapsible PR comment for reference
- Includes all fix details (original text, applied fix, explanation)
- Style category violations (subjective suggestions): Collected but NOT auto-applied
- Rules: qe-writing-003 (logical flow), 007 (visual elements)
- Suggestions posted as OPEN PR comment requiring human review
- Prevents "over enthusiastic" LLM from making subjective changes
- Updated `extract_individual_rules()` to capture category field from rule definitions
- Modified `review_lecture_single_rule()` to filter fixes by category before applying
- Added `format_applied_fixes_report()` and `format_style_suggestions_report()` methods
- Two separate PR comments: one collapsible (applied), one open (suggestions)
- PR comment structure redesigned - Replaced single detailed report with two targeted comments
- Previous: One collapsible comment with ALL violations (mixed rule and style)
- Now: Separate comments for automatic fixes vs suggestions requiring human review
- Applied fixes comment: Collapsed by default (reference only, already applied)
- Style suggestions comment: Open by default (immediate visibility for review)
- Hardcoded rule evaluation order - Rules now checked in optimal sequence
- Defined `RULE_EVALUATION_ORDER` constant in `reviewer.py`
- Writing rules checked in priority: 008 → 001 → 004 → 006 → 005 → 002 → 003 → 007
- Order: mechanical → structural → stylistic → creative
- Previously: Rules checked in file order (001, 002, 003...) regardless of priority
- Now: Rules extracted and checked in optimal order for best results
- Whitespace (008) checked FIRST, visual elements (007) checked LAST
- Easy to maintain: update constant to change order, no need to reorder file
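As a rough illustration of how such a constant can drive the check order, the sketch below sorts rule ids against it. The dictionary shape (keyed by category) and the helper `order_rules` are assumptions; only the writing order itself is taken from the entry above.

```python
# Writing order as listed in this entry: mechanical -> structural -> stylistic -> creative.
RULE_EVALUATION_ORDER = {
    "writing": [
        "qe-writing-008", "qe-writing-001", "qe-writing-004", "qe-writing-006",
        "qe-writing-005", "qe-writing-002", "qe-writing-003", "qe-writing-007",
    ],
}


def order_rules(category: str, rule_ids: list) -> list:
    """Sort rule ids by the configured order; unknown ids keep file order at the end."""
    order = RULE_EVALUATION_ORDER.get(category, [])
    position = {rule_id: index for index, rule_id in enumerate(order)}
    # sorted() is stable, so ids missing from the constant stay in their original order.
    return sorted(rule_ids, key=lambda rule_id: position.get(rule_id, len(order)))


file_order = [f"qe-writing-00{n}" for n in range(1, 9)]
print(order_rules("writing", file_order))
```

Keeping the order in one constant means the rule files can stay in numeric order while the evaluation sequence changes independently.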
- Sequential fix application in single-rule evaluation - Critical bug fix
- Previous implementation collected ALL violations from ALL rules, then applied fixes at the end
- This meant rule 002 was checking against ORIGINAL content, not content fixed by rule 001
- Now applies fixes immediately after each rule before checking the next rule
- Rule 001 → find violations → apply fixes → Rule 002 checks the UPDATED content
- Ensures proper sequential processing: each rule sees the results of previous fixes
- More accurate detection as later rules work with already-cleaned content
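The difference between collecting all violations first and applying fixes after each rule can be sketched as follows. The checker functions stand in for per-rule LLM calls and are hypothetical, as is the `review_sequentially` helper; the real reviewer and fix applier are more involved.

```python
from typing import Callable, List, Tuple

Violation = Tuple[str, str]  # (current_text, suggested_fix)
Checker = Callable[[str], List[Violation]]


def review_sequentially(content: str, checkers: List[Checker]) -> str:
    """Run each checker on the content as left by the previous checker."""
    for check in checkers:
        for current_text, suggested_fix in check(content):
            # Apply immediately so the next rule sees the updated content.
            content = content.replace(current_text, suggested_fix, 1)
    return content


def collapse_double_spaces(text: str) -> List[Violation]:
    """Stand-in for a whitespace rule."""
    return [("  ", " ")] if "  " in text else []


def capitalise_opening(text: str) -> List[Violation]:
    """Stand-in for a later rule that only matches once whitespace is clean."""
    return [("the model", "The model")] if text.startswith("the model") else []


result = review_sequentially(
    "the  model converges.", [collapse_double_spaces, capitalise_opening]
)
print(result)
```

Run against the original text, the second checker finds nothing, which mirrors the bug described above: later rules only behave correctly when they see the output of earlier fixes.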
- REVERTED: Rule renumbering from v0.3.10 - Restored original rule numbers
- Testing showed renumbering didn't solve the problem - LLM still fixated on one rule (002 instead of 001)
- Maintaining sequential rule numbers based on evaluation order creates unnecessary maintenance burden
- Restored original numbering: 001=paragraph, 002=clarity, 003=flow, 004=caps, 005=bold/italic, 006=titles, 007=visual, 008=whitespace
- Root cause confirmed: LLM cannot reliably check multiple rules in a single pass, regardless of numbering or explicit STEP-by-STEP instructions
- Single-rule evaluation approach - New architecture for guaranteed rule coverage
- Instead of asking LLM to check all rules at once, loop through rules one at a time
- Each rule gets its own focused LLM call with that specific rule injected at bottom of prompt
- Guarantees every rule is evaluated independently
- Trade-off: 8× API calls for writing category, but reliable comprehensive coverage
- More expensive but ensures no rules are skipped or ignored
- Removed STEP-by-STEP sequential evaluation instructions - Didn't work
- LLM consistently ignores explicit ordering instructions when given multiple rules
- Simplified prompt back to basic "check systematically" approach
- Single-rule architecture makes sequential ordering unnecessary
- BREAKING: Renumbered writing rules to match evaluation priority - LLM was ignoring STEP-by-STEP order
- `qe-writing-001`: Whitespace formatting (was 008) - STEP 1
- `qe-writing-002`: Paragraph structure (was 001) - STEP 2
- `qe-writing-003`: Capitalization (was 004) - STEP 3
- `qe-writing-004`: Title capitalization (was 006) - STEP 4
- `qe-writing-005`: Bold/italic formatting (unchanged) - STEP 5
- `qe-writing-006`: Clarity and conciseness (was 002) - STEP 6
- `qe-writing-007`: Logical flow (was 003) - STEP 7
- `qe-writing-008`: Visual elements (was 007) - STEP 8
- Rationale: Testing revealed LLM was completely ignoring STEP instructions and always checking rule 001 first (29 violations for 001, zero for 008 despite explicit STEP 1: check 008 first). Hypothesis: LLM is biased toward checking lower-numbered rules first. By aligning rule numbers with evaluation priority, we work with this behavior instead of fighting it.
- Rules now match their evaluation order: 001 is checked in STEP 1, 002 in STEP 2, etc.
- Prompt version tracking - Added version comment to prompts for better debugging
- Format: `<!-- Prompt Version: 0.3.10 | Last Updated: 2025-10-10 | Description -->`
- Displayed in logs: "✓ Using writing prompt v0.3.10"
- Helps verify correct prompt is loaded (important for GitHub Actions cache)
- Replaces previous "SEQUENTIAL RULE EVALUATION" detection check
- Added action version to PR description - PR body now includes version number in summary section
- Fixed PR body length error - Exceeded GitHub's 65KB limit for PR descriptions
- Changed from listing all violation details to summarizing by rule
- Groups violations by rule and shows first 2 examples only
- Shows count of occurrences per rule (e.g., "15 occurrences")
- Includes automatic truncation at 60KB with warning if still too long
- Reduces PR body size by ~90% for large violation counts
- Users can still see all details in the diff
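The truncation safeguard might look roughly like the sketch below. It covers only the final size check, not the per-rule summarising; the constant names, the warning text, and the choice to count UTF-8 bytes rather than characters are all assumptions.

```python
MAX_BODY_BYTES = 60_000  # assumed threshold, kept below the GitHub limit mentioned above
TRUNCATION_WARNING = "\n\n_Output truncated; see the diff for full details._"


def truncate_body(body: str, limit: int = MAX_BODY_BYTES) -> str:
    """Return `body` unchanged if it fits, else cut it and append a warning."""
    encoded = body.encode("utf-8")
    if len(encoded) <= limit:
        return body
    budget = limit - len(TRUNCATION_WARNING.encode("utf-8"))
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    return encoded[:budget].decode("utf-8", errors="ignore") + TRUNCATION_WARNING


short_body = truncate_body("## Summary\n2 rules, 15 occurrences")
long_body = truncate_body("é" * 50_000)
print(len(long_body.encode("utf-8")) <= MAX_BODY_BYTES)
```

Reserving room for the warning inside the budget keeps the final body under the limit even after the notice is appended.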
- Strengthened rule evaluation order in writing prompt - Made order mandatory
- Changed from "for optimal results" to "CRITICAL: Apply rules in this EXACT order"
- Changed from bullet points to numbered list for emphasis
- Added "This sequence is MANDATORY, not optional"
- Added "Do NOT skip ahead or check rules out of sequence"
- Added instruction to check each rule in order before moving to next
- Prevents LLM from applying rules out of sequence
- Prevent identical current/fix violations - Quality control for LLM responses
- Added CRITICAL instruction: "Current text" and "Suggested fix" MUST be different
- If LLM cannot provide different fix, must NOT report as violation
- Prevents confusing quality warnings where current and fix are identical
- Action now fails on errors - Exit with code 1 when LLM errors occur
- Previously printed error but continued with exit code 0
- Now properly exits with failure code when errors detected
- Helps catch issues like "Overloaded" API errors
- Both single and bulk modes now fail appropriately on errors
- Updated qe-writing-001 - Clarified paragraph block definition to prevent false positives
- Strengthened definition: "Each paragraph block (text separated by blank lines)"
- Added explicit statement: "Line breaks within text (without blank lines) do NOT create new paragraphs"
- Added example showing CORRECT usage (text already following the rule)
- Added "Key distinction" section explaining blank lines vs line breaks
- Updated implementation note to emphasize paragraph blocks are defined by blank lines
- Fixes issue where LLM incorrectly flagged already-correct text as violations
- qe-writing-001 false positives - LLM now correctly understands paragraph block boundaries
- Previously flagged correct text (sentences separated by blank lines) as violations
- Added clearer examples and stronger language about blank lines vs line breaks
- Order of Evaluation - Added rule evaluation sequence to writing prompt
- Added explicit evaluation order to writing-prompt.md (instruction #3)
- Rules now processed from mechanical fixes → structural → stylistic → creative
- Sequence: whitespace → paragraph structure → capitalization → titles → formatting → clarity → flow → visual
- Improves consistency and quality of LLM-generated suggestions
- Each rule benefits from corrections made by earlier rules in the sequence
- Kept in prompt only (single source of truth, easier maintenance)
- Version Display - Action now prints version at startup
- Shows full version number in GitHub Action output (e.g., "v0.3.6")
- Helps identify which version is running when using floating tags like `@v0.3`
- Version defined in `style_checker/__init__.py`
- Version bumped to 0.3.6
- Updated qe-writing-001 - Clarified line break handling
- Added clarification that sentences can span multiple lines in markdown source
- Rule focuses on logical paragraph structure (one sentence per paragraph block), not physical line breaks
- Added example showing single sentence spanning multiple lines
- Updated implementation note to explain paragraph definition
- Paragraphs are defined by blank lines, not line breaks within the text
- New writing rule: qe-writing-008 - Whitespace linting
- Detects multiple consecutive spaces between words in MyST Markdown source
- Suggests reducing excessive whitespace to single spaces
- Improves markdown source consistency and readability
- Excludes code blocks, inline code, math blocks, and intentional formatting
- Linting-focused rule (doesn't affect HTML output, only source quality)
- Updated qe-writing-001 - Clarified scope
- Added "Associated rules" section referencing qe-writing-008 for whitespace issues
- Updated implementation note to reference the new whitespace rule
- Focuses on sentence structure and paragraph organization
- Updated writing prompt - Added whitespace formatting to checklist
- Removed "Corrected Content" from LLM responses - Major performance improvement
- Prompts no longer request full corrected lecture in response
- Fixes applied programmatically using existing `apply_fixes()` function
- Reduces output tokens by ~50% (e.g., 40K char lecture now ~20K tokens cheaper)
- Faster API responses and lower costs
- Matches `tool-style-checker` architecture for consistency
- Parser updated to not expect "Corrected Content" section
- All 8 prompt files updated to new streamlined format
- Token savings: ~20,000 output tokens saved per 40K character lecture
- Cost reduction: Approximately 50% reduction in output token costs
- Speed improvement: Faster API responses (less content to generate)
- Same functionality: Fixes still applied, just more efficiently
- CRITICAL BUG FIX: Parser format mismatch causing 100% failure in violation detection
- Issue: v0.3.0 and v0.3.1 reported "No issues found" even when violations existed
- Root Cause: Prompts requested free-form format but parser expected structured format
- Impact: Parser couldn't extract violations from Claude responses → always reported 0 issues
- Solution: Updated all 8 prompt files to request parser-compatible output format
- Prompts now explicitly request `## Issues Found\n[NUMBER]` with structured violations
- Parser can now successfully extract violation count and details
- Fixes complete failure of violation detection in v0.3.0 and v0.3.1
- See `BUG-REPORT-PARSER-FORMAT-MISMATCH.md` for detailed analysis
- All prompt files now include explicit, detailed output format specifications
- Format matches what `parse_markdown_response()` expects for reliable parsing
- Streaming Fallback: Added automatic fallback to streaming API for large lectures
- Anthropic now requires streaming for requests that may take longer than 10 minutes
- Non-streaming API tried first for better performance
- Automatically falls back to streaming if required
- Fixes error: "Streaming is required for operations that may take longer than 10 minutes"
- Particularly important for very large lectures (40K+ characters)
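The try-then-fall-back pattern described above can be sketched generically. Both callables are hypothetical wrappers around the API client, and detecting the condition by matching the quoted error message is an assumption; the actual provider code may inspect an exception type instead.

```python
from typing import Callable


def call_with_streaming_fallback(
    standard_call: Callable[[], str],
    streaming_call: Callable[[], str],
) -> str:
    """Try the non-streaming call first; retry with streaming only when required."""
    try:
        return standard_call()
    except Exception as error:
        # Assumed detection: match the message quoted in the changelog entry.
        if "Streaming is required" not in str(error):
            raise  # unrelated failures are not masked by the fallback
        return streaming_call()


def fake_standard() -> str:
    """Stand-in for a non-streaming request that the API rejects."""
    raise RuntimeError(
        "Streaming is required for operations that may take longer than 10 minutes"
    )


def fake_streaming() -> str:
    """Stand-in for a streaming request that succeeds."""
    return "streamed response"


print(call_with_streaming_fallback(fake_standard, fake_streaming))
```

Re-raising every other error keeps genuine failures, such as overload or authentication errors, visible to the caller.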
This is a major release with breaking changes. Version 0.3.0 is not backward compatible with 0.2.x.
- Removed: Legacy trigger `@quantecon-style-guide` (use `@qe-style-checker` instead)
- Removed: `style-guide-url` input parameter (rules now built-in)
- Removed: `all` category keyword (omit categories to check all)
- Removed: `style-guide-database.md` (replaced by focused prompts + rules)
- Removed: OpenAI and Google Gemini provider support (now Claude Sonnet 4.5 only)
- Removed: `llm-provider`, `openai-api-key`, and `google-api-key` inputs
- Changed: `anthropic-api-key` is now required (was optional)
- Changed: Updated to `@v0.3` in workflow examples
- Focused Prompts Architecture: Hand-written prompts + detailed rules for better quality and lower costs
- 8 category-specific prompts (~85 lines each) in `style_checker/prompts/`
- 8 detailed rule files (~120-235 lines each) in `style_checker/rules/`
- Total: 1901 lines (48% reduction from auto-generated 3610 lines)
- Matches proven `tool-style-checker` pattern
- Sequential Category Processing: Categories are now processed one at a time, feeding updated content between each
- Ensures all fixes are applied without conflicts
- Later categories see changes from earlier categories
- More reliable and complete results
- Matches `tool-style-checker` approach
- Smart Prompt Loading: `PromptLoader` class combines prompts + rules dynamically
- Better Test Organization: All tests consolidated in `tests/` directory
- Rule Development Workflow: New `tool-style-guide-development/` folder for managing style guide rules
- `style-guide-database.md`: Single source of truth for rule development
- `build_rules.py`: Script to generate category-specific rule files
- `README.md`: Complete workflow documentation
- Separates rule development from action runtime
- Enables independent updates to style guide content
- Simplified Configuration: Removed unnecessary inputs for cleaner setup
- Claude Sonnet 4.5 Exclusive: Simplified to use only the best LLM for style checking
- Default model: `claude-sonnet-4-5-20250929`
- Excellent comprehension for nuanced style rules
- Can add other providers in future releases if needed
- Simplified Architecture: Removed runtime database parsing
- Action reads directly from `style_checker/rules/*.md` files
- No more `StyleGuideDatabase` object or parsing overhead
- Faster startup and clearer code flow
- `parser_md.py` kept for backward compatibility in tests
- Simplified API Calls: Removed streaming interface
- Focused prompts are small enough (~5-12K tokens) for standard API calls
- Cleaner code without streaming context managers
- Better error handling and debugging
- Streaming was only needed for old large auto-generated prompts
- Updated Documentation: All references updated to v0.3.0
- Cleaner Repository: Removed outdated documentation and legacy code
- Improved README: Clearer quick start, removed verbose content
- Deprecated `generate_prompts.py` (replaced by focused prompts)
- Verbose CHANGELOG entries condensed
- Outdated migration documentation
- Legacy syntax references
- Update workflow to use `@v0.3` instead of `@v0.2`
- Remove `style-guide-url` parameter from workflow configuration
- Use `@qe-style-checker` trigger (not `@quantecon-style-guide`)
- Remove `all` category - omit categories to check all
- GitHub Actions deprecation warning (replaced `::set-output` with `GITHUB_OUTPUT`)
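For reference, a step output is now written by appending a `name=value` line to the file whose path GitHub Actions provides in the `GITHUB_OUTPUT` environment variable. The helper name `set_output` and the output name `pr-url` below are hypothetical, and this sketch handles single-line values only.

```python
import os


def set_output(name: str, value: str) -> None:
    """Append a `name=value` line to the file named by $GITHUB_OUTPUT."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path is None:
        # Outside GitHub Actions there is nowhere to write; print for local debugging.
        print(f"{name}={value}")
        return
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}={value}\n")


if __name__ == "__main__":
    set_output("pr-url", "https://example.invalid/pull/1")
```

Opening the file in append mode matters because several steps or calls may write outputs to the same file.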
- Documentation cleanup and updates
- Deprecated `parser.py` module
- Semantic group parallelization (2-3x faster, 25% cheaper)
- Markdown-based style guide database
- Parallel processing of up to 4 groups simultaneously
- Migrated from YAML to Markdown format
- Simplified main.py (removed chunking logic)
- Old YAML parser and database
- Deprecated chunking methods
- Migrated from JSON to Markdown format for LLM responses
- Comprehensive testing infrastructure
- CI/CD pipeline
- Integration tests
- Improved JSON parsing for malformed Claude responses
- Comprehensive review summary
- Better progress indicators
- Hardcoded fallback to old Claude model
- Implemented streaming for Claude API requests
- Updated default model to Claude Sonnet 4.5
- Reduced max_tokens for Claude 3.5 Sonnet compatibility
- Module import path issue in GitHub Actions
- Initial development release
- AI-powered style guide checking
- Single lecture and bulk review modes
- Automatic PR creation