Skip to content

Conversation

@armenzg
Copy link
Member

@armenzg armenzg commented Dec 22, 2025

Currently, we send GitHub webhook events to Overwatch which are processed and then Seer's called from it.

The changes included are preparatory changes to switch from sending the events from Overwatch to Seer.

This will enable porting one feature at a time.

This adds webhook handling for:

  • Issue comments
  • Pull request: open & close
  • Pull request: reviews
  • Pull request: comments

Fixes CW-86.

@armenzg armenzg self-assigned this Dec 22, 2025
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Dec 22, 2025
@armenzg armenzg changed the title 12 22/sentry processing/armenzg feat(code_review): Support sending webhook events to Seer Dec 22, 2025
@armenzg armenzg force-pushed the 12_22/sentry_processing/armenzg branch from 3b5f6de to e9d2e29 Compare December 22, 2025 19:42
@linear
Copy link

linear bot commented Dec 22, 2025

@codecov
Copy link

codecov bot commented Dec 22, 2025

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
260 1 259 19
View the top 1 failed test(s) by shortest run time
::tests.sentry.seer.code_review.test_utils
Stack Traces | 0s run time
#x1B[31mimport file mismatch:
imported module 'test_utils' has this __file__ attribute:
  .../sentry/profiles/test_utils.py
which is not the same as the test file we want to collect:
  .../seer/code_review/test_utils.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules#x1B[0m

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link
Member

@suejung-sentry suejung-sentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These staged changes make sense to me and behind those flags sound like a good way to test it out little by little


# Determine request_type based on event_type
# For now, we only support pr-review for these webhook types
request_type: Literal["pr-review", "unit-tests", "pr-closed"] = "pr-review"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use this opportunity to sunset unit-tests feature. This used to be the test generation feature that we will not take to GA


# Build RepoDefinition
repo_definition = {
"provider": "github", # All GitHub webhooks use "github" provider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A warning/heads up there can be confusion about when this value should be github or integrations:github (the latter being the key we need to query within various Integrations team related endpoints). Looks like in the case of the payload to seer either is acceptable and there is some stripping that happens here.

So fine as you have it, but a word of warning this could be a source of bugs

"provider": "github", # All GitHub webhooks use "github" provider
"owner": repository["owner"]["login"],
"name": repository["name"],
"external_id": str(repository["id"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important field here is base_commit_sha which should be the commit we are analyzing or head SHA in the case of a pull request (here)

"repo": repo_definition,
"pr_id": pr_number,
"codecov_status": None,
"more_readable_repos": [],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important field is also bug_prediction_specific_information.organization_id (here) and maybe also organization_slug. We can get rid of the rest of the fields in that bug_prediction_specific_information object.
Would be nice to clean up this payload and bring that field up to the top level.

The other one we need is config (here). We can clean this up a bunch too based on everything that's changed for GA.

        "config": {
            "bug_prediction_enabled": true
            "trigger": "on_new_commit"
        }

These were also added for metrics - trigger_comment_id, trigger_comment_type, trigger_user
Maybe we'd consider nesting that under a "trigger_metadata" key or something

Copy link
Member

@suejung-sentry suejung-sentry Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example payload of the minimum fields afaik

{
    "request_type": "pr-review",
    "external_owner_id": "176330896",
    "data": {
        "repo": {
            "provider": "github",
            "owner": "suejung-org",
            "name": "hello-vite",
            "external_id": "176330896",
            "base_commit_sha": "8fac7540fe09c9041fa9f254fc3af1a72300dfb8"
        },
        "pr_id": 4,
        "bug_prediction_specific_information": {
            "organization_id": 1
        },
        "config": {
            "features": {
                "bug_prediction": true
            },
            "trigger": "on_new_commit",
            "trigger_comment_id": .., // for metrics
            "trigger_comment_type": .., // for metrics
            "trigger_user": ... // for metrics
        }
    }
}

"owner": repository["owner"]["login"],
"name": repository["name"],
"external_id": str(repository["id"]),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing required base_commit_sha field in repo definition

The repo_definition dictionary in _transform_webhook_to_codegen_request is missing the base_commit_sha field, which was explicitly identified as an important required field in the PR discussion. According to the reviewer, this should be "the commit we are analyzing or head SHA in the case of a pull request." For pull request events, this value is available at event_payload["pull_request"]["head"]["sha"]. Without this field, Seer won't know which commit to analyze, causing the code review feature to fail or produce incorrect results.

Fix in Cursor Fix in Web

logger = logging.getLogger("sentry.webhooks")


class WebhookProcessor(Protocol):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature all GitHub webhook handlers need to have.

return user_type != "Bot"


def _handle_pr_webhook_for_autofix_processor(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would port this change to handle_github_pr_webhook_for_autofix itself.


if integration is not None:
self._handle(integration, event, org_integrations=org_integrations)
self._handle_organization_deletion(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is not a true subclassing of the _handle we can rename it.

String values preserve original GitHub naming for backward compatibility with existing metrics.
"""

CI_CHECK = "ci_check"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We rename the value from check_run since we have just implemented this feature in the last two weeks, it's barely used and I don't care if we have a discontinuity somewhere in the GitHub metrics.

String values preserve original GitHub naming for backward compatibility with existing metrics.
"""

CI_CHECK = "ci_check"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics value change contradicts backward compatibility comment

The docstring states "String values preserve original GitHub naming for backward compatibility with existing metrics," but CI_CHECK has value "ci_check" instead of the original GitHub value "check_run". The old code used CHECK_RUN = "check_run". This breaking change will cause existing metrics dashboards tracking "check_run" events to stop receiving data, contradicting the documented backward compatibility guarantee.

Fix in Cursor Fix in Web

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentional.

String values preserve original GitHub naming for backward compatibility with existing metrics.
"""

CI_CHECK = "ci_check"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm renaming this metric because is so new and I don't care about breaking continuity.

String values preserve original GitHub naming for backward compatibility with existing metrics.
"""

CI_CHECK = "ci_check"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the tests into test_webhooks.py.

@armenzg armenzg marked this pull request as ready for review December 29, 2025 14:55
@armenzg armenzg requested a review from a team as a code owner December 29, 2025 14:55
@armenzg armenzg requested review from a team December 29, 2025 14:55
@armenzg armenzg requested review from a team as code owners December 29, 2025 14:55
# These are always forwarded
assert GithubWebhookType.INSTALLATION in events
assert GithubWebhookType.INSTALLATION_REPOSITORIES in events
assert len(events) == 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test assertions have incorrect event count values

The test test_default_events_included asserts 6 specific items are in events (4 configurable + 2 installation events), then asserts len(events) == 5. This is mathematically impossible. Similarly, test_disabling_options_excludes_events with all options disabled leaves only 2 installation events, yet also asserts len(events) == 5. The first test likely needs len(events) == 6, and the second needs len(events) == 2. These tests will fail when executed, or if they pass, indicate the implementation has unexpected behavior.

Additional Locations (1)

Fix in Cursor Fix in Web

String values preserve original GitHub naming for backward compatibility with existing metrics.
"""

CI_CHECK = "ci_check"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metric value change contradicts backward compatibility guarantee

The docstring at line 432 explicitly states "String values preserve original GitHub naming for backward compatibility with existing metrics." However, CI_CHECK = "ci_check" does NOT match GitHub's event type naming of "check_run" (as defined in GithubWebhookType.CHECK_RUN). This is inconsistent with other values like MERGE_REQUEST = "pull_request" which DO preserve GitHub naming. The diff shows this was previously CHECK_RUN = "check_run", so changing to "ci_check" breaks the documented backward compatibility and could cause existing metrics dashboards monitoring event_type=check_run to stop working.

Fix in Cursor Fix in Web

# These are always forwarded
assert GithubWebhookType.INSTALLATION in events
assert GithubWebhookType.INSTALLATION_REPOSITORIES in events
assert len(events) == 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test assertion expects wrong count for default events

The test_default_events_included test asserts that 6 distinct event types are present in the returned set (PULL_REQUEST, PULL_REQUEST_REVIEW, PULL_REQUEST_REVIEW_COMMENT, ISSUE_COMMENT, INSTALLATION, INSTALLATION_REPOSITORIES), but then incorrectly claims len(events) == 5. Given _INSTALLATION_EVENTS contains 2 items and _WEBHOOK_TYPE_TO_OPTION contains 4 items (all defaulting to True), the set contains 6 items.

Similarly, test_disabling_options_excludes_events disables all 4 configurable options, leaving only the 2 installation events, but asserts len(events) == 5 instead of len(events) == 2. Both tests will fail with AssertionError.

Additional Locations (1)

Fix in Cursor Fix in Web

# Build RepoDefinition
repo_definition = {
"provider": "github", # All GitHub webhooks use "github" provider
"owner": repo.owner,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repository model lacks owner attribute accessed in transform function

The _transform_webhook_to_codegen_request function accesses repo.owner at line 101, but the Repository model has no owner field or property. The model only has name (which stores the full repository name like "owner/repo"), external_id, organization_id, and other fields. This will cause an AttributeError when handle_other_webhook_event is called. While currently not reached (the WEBHOOK_EVENT_PROCESSORS for ISSUE_COMMENT, PULL_REQUEST_REVIEW, etc. are empty), this will crash when those handlers are enabled. The owner likely needs to be extracted from repo.name.split("/")[0] or from the repository config.

Fix in Cursor Fix in Web

a task to forward the original run ID to Seer so it can rerun the PR review.
Args:
event_type: The type of the webhook event (as string)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The EventType enum value for CHECK_RUN is "ci_check", but GitHub sends "check_run", causing a ValueError during event parsing and breaking check_run webhook handling.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

The EventType enum defines CHECK_RUN with the value "ci_check". However, GitHub sends webhook events with the type "check_run". When the application receives a check_run event, the EventType.from_string("check_run") call in handle_webhook_event fails with a ValueError because no enum member has the value "check_run". This will cause the webhook handler to crash, preventing any check_run events from being processed.

💡 Suggested Fix

Change the value of EventType.CHECK_RUN in src/sentry/seer/code_review/webhooks/types.py back to "check_run" to match the string sent by the GitHub webhook API. This will ensure EventType.from_string("check_run") resolves correctly.

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/seer/code_review/webhooks/check_run.py#L90

Potential issue: The `EventType` enum defines `CHECK_RUN` with the value `"ci_check"`.
However, GitHub sends webhook events with the type `"check_run"`. When the application
receives a `check_run` event, the `EventType.from_string("check_run")` call in
`handle_webhook_event` fails with a `ValueError` because no enum member has the value
`"check_run"`. This will cause the webhook handler to crash, preventing any `check_run`
events from being processed.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 7995728

"""
# Determine request_type based on event_type
# For now, we only support pr-review for these webhook types
request_type: Literal["pr-review", "pr-closed"] = "pr-review"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The _transform_webhook_to_codegen_request function hardcodes request_type to "pr-review", ignoring the actual event_type parameter, which breaks logic for events like closing a pull request.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

The function _transform_webhook_to_codegen_request is intended to determine the request_type based on the event_type parameter it receives. However, the implementation hardcodes the request_type to "pr-review" and ignores the event_type parameter. This means that events like a pull request closing, which should generate a "pr-closed" request type, will instead incorrectly generate a "pr-review" request. This causes the wrong information to be sent to the Seer service, breaking the intended functionality for handling different pull request events.

💡 Suggested Fix

Update _transform_webhook_to_codegen_request to use the event_type parameter to conditionally set the request_type. For example, check if the event is a pull request close event and set request_type to "pr-closed" accordingly.

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/seer/code_review/utils.py#L83

Potential issue: The function `_transform_webhook_to_codegen_request` is intended to
determine the `request_type` based on the `event_type` parameter it receives. However,
the implementation hardcodes the `request_type` to `"pr-review"` and ignores the
`event_type` parameter. This means that events like a pull request closing, which should
generate a `"pr-closed"` request type, will instead incorrectly generate a `"pr-review"`
request. This causes the wrong information to be sent to the Seer service, breaking the
intended functionality for handling different pull request events.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 7995728

@armenzg armenzg merged commit f3c8cc7 into master Dec 29, 2025
65 checks passed
@armenzg armenzg deleted the 12_22/sentry_processing/armenzg branch December 29, 2025 16:15
armenzg added a commit that referenced this pull request Dec 31, 2025
This fixes the `check_run` re-run case since I regressed it in #105393
when we switched the `event_type` to `ci_check`.

Standardizes webhook event type handling by replacing the custom
`EventType` enum with the existing `GithubWebhookType` from the GitHub
integration package.

Changes:
- Pass `github_event` from `X-GitHub-Event` header through webhook chain
- Remove custom `EventType` enum and use `GithubWebhookType` throughout
- Update webhook processor signatures to accept `github_event` parameter
- Simplify event routing logic in `process_github_webhook_event` task
- Update all webhook processors and tests to use new signature

This makes the code more consistent with the GitHub integration package
and removes duplicate event type definitions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants