Centralize PR data fetching to prevent duplicate API calls
Summary
Multiple event handlers in the GitHub provider independently fetch pull request details, commit information, and changed files. This results in duplicate API calls when processing a single webhook event that triggers multiple handlers.
Problem
Location: pkg/provider/github/parse_payload.go
Multiple functions fetch the same PR data:
// In handleReRequestEvent() - line ~442
pr, _, err := wrapAPI(v, "get_pull_request", func() (*github.PullRequest, *github.Response, error) {
return v.Client().PullRequests.Get(ctx, runevent.Organization, runevent.Repository, prNumber)
})
// In handleCheckSuites() - line ~474
pr, _, err := wrapAPI(v, "get_pull_request", func() (*github.PullRequest, *github.Response, error) {
return v.Client().PullRequests.Get(ctx, runevent.Organization, runevent.Repository, prNumber)
})
// In handleIssueCommentEvent() - line ~516
pr, _, err := wrapAPI(v, "get_pull_request", func() (*github.PullRequest, *github.Response, error) {
return v.Client().PullRequests.Get(ctx, runevent.Organization, runevent.Repository, issueNumber)
})
Similarly, list_pull_request_files is called in multiple places.
Evidence from E2E Instrumentation
Example 1: Single Comment Strategy Webhook (15 calls)
2x /repos/.../pulls/5237 # get_pull_request
2x /repos/.../pulls/5237/files # list_pull_request_files
Different rate limits (6795 vs 4959) indicate calls from different auth contexts (GitHub App vs OAuth2 webhook).
Example 2: Github PullRequest (21 calls)
2x /repos/.../pulls/40613 # get_pull_request
2x /repos/.../pulls/40613/files # list_pull_request_files
2x /repos/.../commits/.../check-runs # list_check_runs_for_ref
Total impact:
get_pull_request: 3 duplicate calls (12.5% waste)
list_pull_request_files: 5 duplicate calls (21.7% waste)
list_check_runs_for_ref: 3 duplicate calls (8.8% waste)
get_commit_files: 7 duplicate calls (20.6% waste)
- Combined: ~18 wasted calls
Root Cause
-
Independent handlers: Each event handler (handleReRequestEvent, handleCheckSuites, handleIssueCommentEvent) fetches PR data independently.
-
Multiple auth contexts: Some flows use GitHub App auth, others use OAuth2. Each may fetch the same data.
-
No shared state: Handlers don't share fetched data, leading to redundant fetches.
-
Partial caching: While GetFiles() has cachedChangedFiles, this only works within a single Provider instance. If multiple Provider instances are created or if data is fetched before caching occurs, duplicates happen.
Proposed Solution
Option A: Add PR Cache to Provider (Recommended)
Add a cache for PR-related data in the Provider struct:
type Provider struct {
// ... existing fields
// Caches for PR-related data
prCache map[int]*github.PullRequest
prCacheMutex sync.RWMutex
checkRunsCache map[string]*github.ListCheckRunsResults
checkRunsCacheMutex sync.RWMutex
}
func (v *Provider) GetPullRequest(ctx context.Context, org, repo string, prNumber int) (*github.PullRequest, error) {
v.prCacheMutex.RLock()
if cached, ok := v.prCache[prNumber]; ok {
v.prCacheMutex.RUnlock()
return cached, nil
}
v.prCacheMutex.RUnlock()
pr, _, err := wrapAPI(v, "get_pull_request", func() (*github.PullRequest, *github.Response, error) {
return v.Client().PullRequests.Get(ctx, org, repo, prNumber)
})
if err != nil {
return nil, err
}
v.prCacheMutex.Lock()
v.prCache[prNumber] = pr
v.prCacheMutex.Unlock()
return pr, nil
}
Then update all handlers to use v.GetPullRequest() instead of direct API calls.
Option B: Fetch Once, Pass Through (More Invasive)
Restructure event handling to fetch PR data once at the entry point and pass it to handlers:
type eventContext struct {
PR *github.PullRequest
ChangedFiles []string
CheckRuns *github.ListCheckRunsResults
// ... other shared data
}
func (v *Provider) processEvent(ctx context.Context, ...) error {
eventCtx := &eventContext{}
// Fetch common data once
if isPREvent {
eventCtx.PR, _ = v.fetchPR(ctx, prNumber)
eventCtx.ChangedFiles, _ = v.fetchChangedFiles(ctx, prNumber)
}
// Pass to handlers
return v.handleEvent(ctx, eventCtx, ...)
}
Recommendation: Option A is less invasive and can be implemented incrementally.
Files to Modify
For Option A (Recommended)
-
pkg/provider/github/github.go
- Add
prCache, checkRunsCache fields to Provider struct
- Add
GetPullRequest(), GetCheckRunsForRef() helper methods
-
pkg/provider/github/parse_payload.go
- Update
handleReRequestEvent() to use v.GetPullRequest()
- Update
handleCheckSuites() to use v.GetPullRequest()
- Update
handleIssueCommentEvent() to use v.GetPullRequest()
- Update check runs fetching to use cached helper
-
pkg/provider/github/status.go (if applicable)
- Update any direct PR/check-runs API calls to use cached helpers
Testing
- Run existing unit tests
- Run e2e tests with
PAC_API_INSTRUMENTATION_DIR set
- Verify no duplicate
get_pull_request or list_check_runs_for_ref calls
Expected Impact
- Calls saved: ~18 per full e2e test suite
- Per-event savings: 2-4 calls for PR-related events
- Complexity: Medium - requires updating multiple call sites
Acceptance Criteria
Considerations
Cache Invalidation
Not needed. The Provider instance is short-lived (one webhook event), and PR data doesn't change during processing.
Auth Context Differences
If different auth contexts (App vs OAuth2) are used within the same event, they may have different permissions. The cache should be per-Provider, and each auth context should have its own Provider instance. If this is the current design, the duplication might be necessary. Investigation needed to confirm.
Thread Safety
Use sync.RWMutex for all caches to handle concurrent access.
/label ~enhancement ~performance ~github-provider
Centralize PR data fetching to prevent duplicate API calls
Summary
Multiple event handlers in the GitHub provider independently fetch pull request details, commit information, and changed files. This results in duplicate API calls when processing a single webhook event that triggers multiple handlers.
Problem
Location:
pkg/provider/github/parse_payload.goMultiple functions fetch the same PR data:
Similarly,
list_pull_request_filesis called in multiple places.Evidence from E2E Instrumentation
Example 1: Single Comment Strategy Webhook (15 calls)
Different rate limits (6795 vs 4959) indicate calls from different auth contexts (GitHub App vs OAuth2 webhook).
Example 2: Github PullRequest (21 calls)
Total impact:
get_pull_request: 3 duplicate calls (12.5% waste)list_pull_request_files: 5 duplicate calls (21.7% waste)list_check_runs_for_ref: 3 duplicate calls (8.8% waste)get_commit_files: 7 duplicate calls (20.6% waste)Root Cause
Independent handlers: Each event handler (
handleReRequestEvent,handleCheckSuites,handleIssueCommentEvent) fetches PR data independently.Multiple auth contexts: Some flows use GitHub App auth, others use OAuth2. Each may fetch the same data.
No shared state: Handlers don't share fetched data, leading to redundant fetches.
Partial caching: While
GetFiles()hascachedChangedFiles, this only works within a single Provider instance. If multiple Provider instances are created or if data is fetched before caching occurs, duplicates happen.Proposed Solution
Option A: Add PR Cache to Provider (Recommended)
Add a cache for PR-related data in the Provider struct:
Then update all handlers to use
v.GetPullRequest()instead of direct API calls.Option B: Fetch Once, Pass Through (More Invasive)
Restructure event handling to fetch PR data once at the entry point and pass it to handlers:
Recommendation: Option A is less invasive and can be implemented incrementally.
Files to Modify
For Option A (Recommended)
pkg/provider/github/github.goprCache,checkRunsCachefields to Provider structGetPullRequest(),GetCheckRunsForRef()helper methodspkg/provider/github/parse_payload.gohandleReRequestEvent()to usev.GetPullRequest()handleCheckSuites()to usev.GetPullRequest()handleIssueCommentEvent()to usev.GetPullRequest()pkg/provider/github/status.go(if applicable)Testing
PAC_API_INSTRUMENTATION_DIRsetget_pull_requestorlist_check_runs_for_refcallsExpected Impact
Acceptance Criteria
get_pull_requestcalls in instrumentation outputConsiderations
Cache Invalidation
Not needed. The Provider instance is short-lived (one webhook event), and PR data doesn't change during processing.
Auth Context Differences
If different auth contexts (App vs OAuth2) are used within the same event, they may have different permissions. The cache should be per-Provider, and each auth context should have its own Provider instance. If this is the current design, the duplication might be necessary. Investigation needed to confirm.
Thread Safety
Use
sync.RWMutexfor all caches to handle concurrent access./label ~enhancement ~performance ~github-provider