Closed
…ifier Split LLM classification into two passes to fix verdict-reasoning contradictions (4/26 in PR #2493). Pass 1 produces reasoning and PR attribution without a verdict. Pass 2 reads the reasoning and assigns the verdict. This separates code analysis (hard) from labeling (easy), eliminating cases where the LLM commits to a verdict early and writes contradictory reasoning. Also adds --pyrefly-diff CLI flag to include the pyrefly PR code diff in each LLM call, enabling per-project attribution of which code change caused errors to appear or disappear.
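A minimal sketch of the two-pass idea described above, not the PR's actual implementation. The `llm` callable, the prompt wording, the `Classification` dataclass, and the verdict labels are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed label set; the real classifier may use different verdicts.
VALID_VERDICTS = ("improvement", "regression", "neutral")


@dataclass
class Classification:
    reasoning: str
    verdict: str


def classify_two_pass(
    errors_diff: str, llm: Callable[[str, str], str]
) -> Classification:
    """Run analysis and labeling as two separate LLM calls.

    `llm(system_prompt, user_prompt)` is a hypothetical client that
    returns the model's text response.
    """
    # Pass 1: code analysis only. The prompt forbids a verdict so the
    # model cannot commit to a label before finishing its reasoning.
    reasoning = llm(
        "Analyze the error diff and explain what changed and why. "
        "Do not state a verdict.",
        errors_diff,
    )
    # Pass 2: labeling only. The model sees the finished reasoning,
    # not the raw diff, and picks a single label.
    raw = llm(
        "Read the reasoning and answer with exactly one word: "
        + ", ".join(VALID_VERDICTS),
        reasoning,
    )
    verdict = raw.strip().lower()
    if verdict not in VALID_VERDICTS:
        verdict = "unknown"  # surface malformed output instead of guessing
    return Classification(reasoning=reasoning, verdict=verdict)
```

Because the verdict is derived from the reasoning text alone, a contradiction between the two would have to come from pass 2 misreading pass 1, which is the easier of the two tasks.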
Restructure format_markdown() to show an overview table with linked function names and file paths, collapsible detailed analysis, and a suggested fix section. Add helpers for function-name linkification and root cause extraction from PR attribution text.
Add --suggest CLI flag, Suggestion/SuggestionResult dataclasses, and generate_suggestions() LLM client that produces actionable source code fix suggestions from classification results and the PR diff.
Use a stricter regex (_INTERNAL_FUNCTION_PATTERN) that requires underscores to distinguish pyrefly internal function names like check_for_imported_final_reassignment() from common Python method names like get(), match(), set() that appear in error messages.
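To illustrate the underscore requirement, here is a hedged sketch. The pattern below is a hypothetical stand-in, not the actual `_INTERNAL_FUNCTION_PATTERN` from the PR, and `find_internal_functions` is an invented helper name.

```python
import re

# Hypothetical stand-in for the PR's pattern: a lowercase identifier that
# contains at least one underscore-separated segment, followed by "()".
INTERNAL_FUNCTION_SKETCH = re.compile(r"\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\(\)")


def find_internal_functions(text: str) -> list[str]:
    """Return snake_case function names mentioned as `name()` in text."""
    return INTERNAL_FUNCTION_SKETCH.findall(text)


message = (
    "check_for_imported_final_reassignment() now rejects this, "
    "unlike dict.get() or re.match()"
)
print(find_internal_functions(message))
```

Single-word names such as `get()`, `match()`, and `set()` contain no underscore, so they are skipped under this assumption.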
Asks reviewers to react with 👍 or 👎 so we can track classifier accuracy over time.
…e display
- Collect error_kinds from both project.added and project.removed so improvement-only diffs (all removals) still populate the Error Kinds column
- Add file path fallback in _extract_root_cause when no function name is found
- Add high-level summary paragraph aggregating patterns across projects
- Rename table header from "Error Kind" to "Error Kinds"
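The error_kinds change can be sketched as follows. The `ErrorEntry` and `ProjectDiff` shapes and the `collect_error_kinds` helper are assumptions made for illustration; only the `added` and `removed` attribute names come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorEntry:
    kind: str      # e.g. "bad-assignment" (hypothetical field)
    message: str   # hypothetical field


@dataclass
class ProjectDiff:
    added: list[ErrorEntry] = field(default_factory=list)
    removed: list[ErrorEntry] = field(default_factory=list)


def collect_error_kinds(project: ProjectDiff) -> list[str]:
    """Gather distinct error kinds from both sides of the diff.

    Reading only `added` would leave the column empty for diffs that
    consist entirely of removals.
    """
    return sorted({entry.kind for entry in project.added + project.removed})
```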
…lity
- Change workflow trigger from workflow_dispatch to workflow_run so it runs automatically when "Run mypy_primer" completes
- Move feedback prompt out of <sub> tag for better visibility
b135591 to 01c3601
48fa87d to 9dd6c3a
- Add --pr-description CLI flag to pass PR title/body to the classifier
- Include PR description in the LLM user prompt as "author's stated intent"
- Add INTENTIONAL REGRESSIONS guidance to system prompt so the LLM distinguishes conformance-driven regressions from unintentional bugs
- Update workflow to fetch PR title/body and pass it to the classifier
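One way the PR description could be folded into the user prompt is sketched below. The function name `build_user_prompt`, the section headings, and the layout are hypothetical; the source only states that the description is included as the "author's stated intent".

```python
from typing import Optional


def build_user_prompt(errors_diff: str, pr_description: Optional[str] = None) -> str:
    """Assemble the user prompt, prepending the PR description if given."""
    sections = []
    if pr_description:
        # Labeled as intent so the model can weigh deliberate,
        # conformance-driven regressions differently from accidental ones.
        sections.append(
            "Author's stated intent (PR description):\n" + pr_description.strip()
        )
    sections.append("Error diff:\n" + errors_diff.strip())
    return "\n\n".join(sections)
```

When no description is passed, the prompt falls back to the error diff alone, so the flag stays optional.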
9dd6c3a to eef5ab5
grievejia (Contributor) approved these changes on Feb 28, 2026 and left a comment:
Review automatically exported from Phabricator review in Meta.
Tested the tool on some PRs manually.
Example run: #2567
I would like to gather some feedback on the tool before tuning it further, so I am proposing to make the workflow automatic.
In the last diff, I added a thumbs up/thumbs down prompt to gather feedback and increased its font size to make it more visible.