Walkthrough

This pull request adds context cancellation support across the enumeration framework. The core changes wrap result emissions in `select` statements that handle context cancellation, add `ctx.Done()` checks in pagination and processing loops, and ensure result counters are only incremented upon successful channel sends.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant Source
    participant Context
    participant ResultChan
    rect rgb(200, 220, 240)
        note over Source: OLD: Unconditional result emission
        Source->>ResultChan: Send result
        Source->>Source: s.results++
    end
    rect rgb(220, 240, 200)
        note over Source: NEW: Context-aware result emission
        loop Each result
            Source->>Context: Check ctx.Done()?
            alt Context cancelled
                Context-->>Source: Done signal received
                Source->>Source: Return early
            else Context active
                Source->>ResultChan: Send result (non-blocking via select)
                alt Send succeeded
                    ResultChan-->>Source: Result accepted
                    Source->>Source: s.results++
                else Send blocked (unusual)
                    Source->>Source: Wait on ctx or send
                end
            end
        end
    end
    Caller->>Context: Cancel context
    Context-->>Source: Propagate cancellation
    Source->>Source: Exit gracefully
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45-60 minutes
Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (24)
pkg/subscraping/sources/fullhunt/fullhunt.go (2)
47-51: Inconsistent context handling for error results.

Error results are sent directly to the channel without checking context cancellation, which can cause the send to block indefinitely if the context is cancelled and the receiver stops consuming. Additionally, `s.errors++` is incremented before the send completes, which is inconsistent with the pattern used for subdomain results (line 68).

Apply this diff to make error handling consistent with context cancellation:

```diff
 resp, err := session.Get(ctx, fmt.Sprintf("https://fullhunt.io/api/v1/domain/%s/subdomains", domain), "", map[string]string{"X-API-KEY": randomApiKey})
 if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
 	session.DiscardHTTPResponse(resp)
 	return
 }
```
56-61: Inconsistent context handling for error results.

Same issue as lines 47-51: error results are sent directly without checking context cancellation, and the counter is incremented before the send completes.

Apply this diff to make error handling consistent with context cancellation:

```diff
 var response fullHuntResponse
 err = jsoniter.NewDecoder(resp.Body).Decode(&response)
 if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
 	session.DiscardHTTPResponse(resp)
 	return
 }
```

pkg/subscraping/sources/threatminer/threatminer.go (1)
40-43: Guard error sends with `ctx.Done()` to avoid blocking after cancellation

Both error paths still use a plain send on `results`, which can block indefinitely if the consumer stops reading the channel when the context is cancelled (as per the PR objective to make callers able to abort enumeration). To fully honor cancellation and avoid leaking goroutines, consider wrapping error sends in the same `select` pattern:

```diff
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		session.DiscardHTTPResponse(resp)
-		return
-	}
+	if err != nil {
+		select {
+		case <-ctx.Done():
+			session.DiscardHTTPResponse(resp)
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		session.DiscardHTTPResponse(resp)
+		return
+	}
@@
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		return
-	}
+	if err != nil {
+		select {
+		case <-ctx.Done():
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		return
+	}
```

This keeps behavior identical under normal operation but ensures the goroutine can exit promptly when `ctx` is cancelled instead of blocking on an unreceived error.

Also applies to: 51-53
pkg/subscraping/sources/fofa/fofa.go (1)
98-99: Use context-aware select for result emission and conditional counter increment.

The current implementation sends results directly to the channel without context awareness and unconditionally increments the counter. This pattern can cause:
- Blocked sends if the receiver stops consuming after context cancellation
- Inflated result counts when sends don't complete
Per the AI summary, other sources "switch direct channel sends to context-aware selects to prevent blocking and to increment s.results only on successful sends."
Apply this diff to align with the cancellation-safe pattern:
```diff
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}
-	s.results++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
+		s.results++
+	}
```

pkg/subscraping/sources/zoomeyeapi/zoomeyeapi.go (1)
74-81: Critical: Error result sends can block after context cancellation.

The direct channel send at line 76 (and similarly at line 87) does not check for context cancellation. Since `results` is an unbuffered channel, these sends will block indefinitely if the receiver has stopped consuming due to context cancellation, violating the requirement that enumeration must stop immediately.

Apply this diff to wrap error sends in context-aware select statements:

```diff
 if err != nil {
 	if !isForbidden {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
+		select {
+		case <-ctx.Done():
+			session.DiscardHTTPResponse(resp)
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
 		session.DiscardHTTPResponse(resp)
 	}
 	return
 }
```

And similarly for the error case at lines 86-91:

```diff
 if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		_ = resp.Body.Close()
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
 	_ = resp.Body.Close()
 	return
 }
```

pkg/subscraping/sources/threatbook/threatbook.go (1)
56-56: Error sends must also check context cancellation to prevent goroutine leaks.

All error result sends use direct channel operations without checking `ctx.Done()`. Since the results channel is unbuffered (line 38), if the context is cancelled and the receiver stops reading, these sends will block indefinitely, causing the goroutine to leak.

Apply a similar select pattern to error sends. For example, for line 56:

```diff
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
```

Apply the same pattern to error sends at lines 65, 73-75, and 83 to ensure consistent context-aware behavior across all result emissions.
Also applies to: 65-65, 73-75, 83-83
pkg/subscraping/sources/onyphe/onyphe.go (1)
102-122: Wrap channel sends in select statements to prevent blocked sends after cancellation.

The channel sends to `results` are not wrapped in select statements with `ctx.Done()` checks. If the context is canceled while the goroutine attempts to send, and the receiver has stopped consuming from the channel, these sends will block indefinitely, causing a goroutine leak.

Additionally, result counters should only be incremented when the send succeeds (i.e., when the `case results <- ...` executes, not when `ctx.Done()` fires).

Apply this pattern to all result sends:

```diff
 for _, subdomain := range record.Subdomains {
 	if subdomain != "" {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}
-		s.results++
+		select {
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
+			s.results++
+		case <-ctx.Done():
+			return
+		}
 	}
 }
 if record.Hostname != "" {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: record.Hostname}
-	s.results++
+	select {
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: record.Hostname}:
+		s.results++
+	case <-ctx.Done():
+		return
+	}
 }
 if record.Forward != "" {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: record.Forward}
-	s.results++
+	select {
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: record.Forward}:
+		s.results++
+	case <-ctx.Done():
+		return
+	}
 }
 if record.Reverse != "" {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: record.Reverse}
-	s.results++
+	select {
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: record.Reverse}:
+		s.results++
+	case <-ctx.Done():
+		return
+	}
 }
```

pkg/subscraping/sources/builtwith/builtwith.go (1)
60-61: Wrap error sends in select statements to prevent blocking.

The error result sends at lines 60 and 69 use direct channel sends, which can block indefinitely if the context is cancelled and the receiver stops reading. This is inconsistent with the context-aware pattern applied to subdomain results at lines 77-82.

Apply this pattern to both error sends.

For lines 60-61:

```diff
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
```

For lines 69-70:

```diff
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
```

Also applies to: 69-70
pkg/subscraping/sources/facebook/ctlogs.go (1)
112-115: Wrap error result sends in select with ctx.Done() to prevent goroutine leaks.

The error result sends on lines 114, 121, and 128 are not wrapped in a select statement with `ctx.Done()`, unlike the regular result sends (lines 133-138). If the context is cancelled and the caller stops reading from the results channel, these sends will block indefinitely, causing goroutine leaks.

Apply this diff to add context handling for error sends:

```diff
 resp, err := session.Get(ctx, domainsURL, "", nil)
 if err != nil {
 	s.errors++
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+	}
 	return
 }
 bin, err := io.ReadAll(resp.Body)
 if err != nil {
 	s.errors++
 	gologger.Verbose().Msgf("failed to read response body: %s\n", err)
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+	}
 	return
 }
 session.DiscardHTTPResponse(resp)
 response := &response{}
 if err := json.Unmarshal(bin, response); err != nil {
 	s.errors++
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: errorutil.NewWithErr(err).Msgf("failed to unmarshal response: %s", string(bin))}
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: errorutil.NewWithErr(err).Msgf("failed to unmarshal response: %s", string(bin))}:
+	}
 	return
 }
```

Also applies to: 118-122, 126-129
pkg/subscraping/sources/shodan/shodan.go (1)
83-92: Critical: Unconditional channel send can block after context cancellation.

The select statement checks `ctx.Done()` but falls through to an unconditional channel send on line 89. If the context is cancelled between the select check and the send, the goroutine will block indefinitely, violating the PR's objective to stop enumeration immediately upon cancellation.

Apply this diff to wrap the send inside the select statement:

```diff
 for _, data := range response.Subdomains {
 	select {
 	case <-ctx.Done():
 		return
-	default:
-	}
-	value := fmt.Sprintf("%s.%s", data, response.Domain)
-	results <- subscraping.Result{
-		Source: s.Name(), Type: subscraping.Subdomain, Value: value,
-	}
-	s.results++
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: fmt.Sprintf("%s.%s", data, response.Domain)}:
+		s.results++
+	}
 }
```

pkg/subscraping/sources/securitytrails/securitytrails.go (1)
106-119: Inconsistent cancellation handling: subdomain send is unguarded.

The `Subdomains` loop checks `ctx.Done()` at the start (lines 107-111), but the actual channel send at line 117 is not wrapped in a select. This differs from the `Records` loop (lines 98-103), which correctly guards the send. If the context is cancelled during the send, this could block.

Apply this diff for consistency:

```diff
 for _, subdomain := range securityTrailsResponse.Subdomains {
-	select {
-	case <-ctx.Done():
-		return
-	default:
-	}
 	if strings.HasSuffix(subdomain, ".") {
 		subdomain += domain
 	} else {
 		subdomain = subdomain + "." + domain
 	}
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}
-	s.results++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
+		s.results++
+	}
 }
```

pkg/subscraping/sources/chaos/chaos.go (1)
44-57: Channel sends should be wrapped in `select` to prevent goroutine leaks.

The context check at the loop start is good, but the channel sends on lines 50 and 54-56 are not protected. If context is cancelled between the check and the send, and the receiver stops reading (because it also respects cancellation), this goroutine will block forever.

Per the PR objectives, use `select` with `ctx.Done()` when sending results to avoid blocked sends after cancellation:

```diff
 for result := range chaosClient.GetSubdomains(&chaos.SubdomainsRequest{
 	Domain: domain,
 }) {
 	select {
 	case <-ctx.Done():
 		return
 	default:
 	}
 	if result.Error != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: result.Error}
-		s.errors++
+		select {
+		case <-ctx.Done():
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: result.Error}:
+			s.errors++
+		}
 		break
 	}
-	results <- subscraping.Result{
-		Source: s.Name(), Type: subscraping.Subdomain, Value: fmt.Sprintf("%s.%s", result.Subdomain, domain),
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{
+		Source: s.Name(), Type: subscraping.Subdomain, Value: fmt.Sprintf("%s.%s", result.Subdomain, domain),
+	}:
+		s.results++
 	}
-	s.results++
 }
```

With this pattern, the initial `select` at lines 44-48 becomes redundant and can be removed.

pkg/subscraping/sources/robtex/robtext.go (2)
59-61: Error sends must respect context cancellation to prevent goroutine blocking.

The error result sends at lines 59 and 73 use direct channel operations without checking `ctx.Done()`. If the context is cancelled and the receiver stops consuming from the channel, these sends will block indefinitely, preventing the goroutine from exiting cleanly and defeating the PR's objective of immediate cancellation.

Apply this pattern to both error sends:

```diff
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
```

Also applies to: 73-75
104-116: Add context cancellation check in the response scanning loop.

The `enumerate` function processes the response line-by-line without checking for context cancellation. For large responses, this means enumeration will continue processing all lines even after the context is cancelled, contradicting the PR objective of stopping "immediately (within milliseconds)."

Apply this diff to respect cancellation during response processing:

```diff
 scanner := bufio.NewScanner(resp.Body)
 for scanner.Scan() {
+	select {
+	case <-ctx.Done():
+		return results, ctx.Err()
+	default:
+	}
 	line := scanner.Text()
 	if line == "" {
 		continue
```

pkg/subscraping/sources/chinaz/chinaz.go (1)
41-47: Error-result sends are still not ctx-aware, and `err` is likely nil in the `else` branch.

Two related points in the error paths:

1. Context handling on error sends

At lines 41-47 and 72-74, the error results are sent with a plain `results <- ...`. If the consumer stops reading from `results` when the context is cancelled, these sends can still block the goroutine. For consistency with the new pattern in the loop and the PR objective, consider wrapping these sends as:

```diff
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
-	session.DiscardHTTPResponse(resp)
-	return
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+		session.DiscardHTTPResponse(resp)
+		return
+	}
```

and similarly in the `else` branch.

2. Nil error value in the `else` branch

When `SubdomainList.ToBool()` is false (lines 71-74), `err` will typically be nil (the HTTP call and `io.ReadAll` have already succeeded), so you're emitting an error result with a nil `Error` field. It would be clearer to construct a descriptive error for this condition, e.g.:

```diff
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
+	results <- subscraping.Result{
+		Source: s.Name(),
+		Type:   subscraping.Error,
+		Error:  fmt.Errorf("chinaz: missing or empty ContributingSubdomainList in response"),
+	}
```

(and then optionally wrap that send in the ctx-aware `select` as above).

These are small, self-contained changes but would make error handling and cancellation behavior more consistent with the rest of the PR.

Also applies to: 72-74
pkg/subscraping/sources/windvane/windvane.go (1)
78-83: Consider wrapping error emissions in select for complete context cancellation support.

The error result sends at lines 79 and 90 are not wrapped in select statements. If the context is cancelled while an error is being processed, these sends could block indefinitely if the consumer has stopped reading from the channel.

For consistency with the PR's objective of comprehensive context cancellation support, consider applying the same select pattern used for subdomain results:

```diff
 if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
 	session.DiscardHTTPResponse(resp)
 	return
 }
```

Apply the same pattern at line 90.
Also applies to: 89-94
pkg/subscraping/sources/redhuntlabs/redhuntlabs.go (1)
59-61: Error paths still perform unguarded sends and can block after cancellation

All success-path result sends are now guarded by `select` on `ctx.Done()`, but the error paths still send directly to `results`:
- Line 61 (initial request error)
- Line 69 (initial decode error)
- Line 87 (paginated request error)
- Line 95 (paginated decode error)
If the upstream consumer stops reading from this source's channel when `ctx` is canceled, these unbuffered sends can still block indefinitely and keep the goroutine alive, which partially defeats the PR's goal of fast cancellation across sources.

Consider applying the same `select` pattern to error emissions, while ensuring responses are always discarded before potentially returning on cancellation. For example, for the paginated decode error case:

```diff
-	err = jsoniter.NewDecoder(resp.Body).Decode(&response)
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		session.DiscardHTTPResponse(resp)
-		s.errors++
-		continue
-	}
+	err = jsoniter.NewDecoder(resp.Body).Decode(&response)
+	if err != nil {
+		session.DiscardHTTPResponse(resp)
+		s.errors++
+		select {
+		case <-ctx.Done():
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		}
+		continue
+	}
```

A similar pattern can be applied to the other error branches so that all sends on `results` become cancellation-aware while still cleaning up HTTP responses.

Also applies to: 69-72, 86-90, 93-99
pkg/subscraping/sources/rsecloud/rsecloud.go (1)
50-64: Add ctx-aware select around error sends to avoid blocking after cancellation

The loop-level `ctx.Done()` check at lines 53-57 is good and prevents new requests after cancellation. However, in the error paths you still send on `results` without a `select`:

- Line 60: `results <- subscraping.Result{... Error: err}`
- Line 70: `results <- subscraping.Result{... Error: err}`

If the upstream has stopped reading from `results` because it honored `ctx.Done()`, these sends can block indefinitely, leaking this goroutine and partially defeating the context-cancellation guarantees you're adding. Also, `s.errors` is currently incremented even if the send would block.

Consider mirroring the pattern you use for successful results:

- Wrap both error sends in `select { case <-ctx.Done(): return; case results <- ... }`.
- Move `s.errors++` into the `results` case, so it only increments on successful delivery.

For example:

```diff
-	resp, err := session.Get(ctx, fmt.Sprintf("https://api.rsecloud.com/api/v2/subdomains/%s/%s?page=%d", endpoint, domain, page), "", headers)
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		session.DiscardHTTPResponse(resp)
-		return
-	}
+	resp, err := session.Get(ctx, fmt.Sprintf("https://api.rsecloud.com/api/v2/subdomains/%s/%s?page=%d", endpoint, domain, page), "", headers)
+	if err != nil {
+		session.DiscardHTTPResponse(resp)
+		select {
+		case <-ctx.Done():
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		return
+	}
@@
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		return
-	}
+	if err != nil {
+		select {
+		case <-ctx.Done():
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		return
+	}
```

This aligns error handling with the PR's goal of being fully cancellation-aware in all channel writes and counters.
pkg/subscraping/sources/intelx/intelx.go (1)
81-86: Guard error sends with `ctx.Done()` as well to avoid blocking on cancellation

All error paths still do a plain `results <- subscraping.Result{...}`. If the caller stops reading from `results` after canceling `ctx`, these sends can block indefinitely, leaking the goroutine and, in some branches, delaying `session.DiscardHTTPResponse(resp)`.

To fully meet the "no blocked sends after cancellation" objective, consider guarding these error emissions with the same pattern you used for subdomain results, e.g.:

```diff
-	body, err := json.Marshal(reqBody)
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		return
-	}
+	body, err := json.Marshal(reqBody)
+	if err != nil {
+		select {
+		case <-ctx.Done():
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		return
+	}
```

Similarly, for the branches that have a `resp` to discard, you can discard first, then use the same `select` pattern so response cleanup is not skipped when the context is canceled.

Also applies to: 88-94, 96-103, 115-121, 123-129, 131-137
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
49-65: Also guard error-path sends with `ctx.Done()` to avoid blocking after cancellation

In both error branches you still send directly to `results`:

```go
if err != nil {
	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
	s.errors++
	session.DiscardHTTPResponse(resp)
	return
}
...
if err != nil {
	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
	s.errors++
	session.DiscardHTTPResponse(resp)
	return
}
```

If the caller cancels the context and stops reading from the aggregated results channel, these plain sends can still block indefinitely and leak this goroutine, which undercuts the PR objective of making all emissions cancellation-aware.

You can mirror the pattern used in the records loop and only increment `s.errors` on a successful send, while respecting ctx and still cleaning up the HTTP response:

```diff
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		session.DiscardHTTPResponse(resp)
-		return
-	}
+	if err != nil {
+		select {
+		case <-ctx.Done():
+			session.DiscardHTTPResponse(resp)
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		session.DiscardHTTPResponse(resp)
+		return
+	}
@@
-	err = json.NewDecoder(resp.Body).Decode(&response)
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		session.DiscardHTTPResponse(resp)
-		return
-	}
+	err = json.NewDecoder(resp.Body).Decode(&response)
+	if err != nil {
+		select {
+		case <-ctx.Done():
+			// `defer session.DiscardHTTPResponse(resp)` will still run
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		return
+	}
```

This way all result emissions from this source are ctx-aware, and you avoid potential goroutine leaks on error paths.
pkg/subscraping/sources/riddler/riddler.go (1)
32-37: Error path still does a blocking send and can ignore context cancellation

On HTTP error you still do an unconditional:

```go
results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
s.errors++
session.DiscardHTTPResponse(resp)
return
```

If `ctx` has been cancelled and upstream has stopped reading from `results`, this send can block indefinitely, which is exactly what this PR is trying to prevent.

You should make this path context-aware as well, e.g.:

```diff
-	resp, err := session.SimpleGet(ctx, fmt.Sprintf("https://riddler.io/search?q=pld:%s&view_type=data_table", domain))
-	if err != nil {
-		results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-		s.errors++
-		session.DiscardHTTPResponse(resp)
-		return
-	}
+	resp, err := session.SimpleGet(ctx, fmt.Sprintf("https://riddler.io/search?q=pld:%s&view_type=data_table", domain))
+	if err != nil {
+		defer session.DiscardHTTPResponse(resp)
+		select {
+		case <-ctx.Done():
+			// Context cancelled; just exit without emitting an error result.
+			return
+		case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+			s.errors++
+		}
+		return
+	}
```

This keeps the error emission consistent with the new context-aware send logic and avoids a potential goroutine leak when cancellation races with an HTTP error.

pkg/subscraping/sources/gitlab/gitlab.go (2)
pkg/subscraping/sources/gitlab/gitlab.go (2)
36-60: Fix race condition: early returns bypass `wg.Wait()`, causing send-on-closed-channel panic

The returns at lines 127 and 135 exit `enumerate()` immediately when `ctx.Done()` fires or on a `QueryUnescape` error, but do so before reaching `wg.Wait()` at line 142. This leaves spawned goroutines (lines 91-121) still sending to `results` after `Run()` defers `close(results)` at line 44, causing a panic.

Change the early returns to break the pagination loop and defer the return until after `wg.Wait()`. A labeled break is needed in the first case, because a plain `break` inside the `select` would only exit the `select`, not the loop:

```diff
 linksHeader := linkheader.Parse(resp.Header.Get("Link"))
+cancelled := false
+paginate:
 for _, link := range linksHeader {
 	select {
 	case <-ctx.Done():
-		return
+		cancelled = true
+		break paginate
 	default:
 	}
 	if link.Rel == "next" {
 		nextURL, err := url.QueryUnescape(link.URL)
 		if err != nil {
 			results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 			s.errors++
-			return
+			cancelled = true
+			break paginate
 		}
 		s.enumerate(ctx, nextURL, domainRegexp, headers, session, results)
 	}
 }
 wg.Wait()
+if cancelled {
+	return
+}
```

This ensures all goroutines complete before `enumerate()` returns, preventing sends to a closed channel.
87-121: Move `defer wg.Done()` to the top of the goroutine to ensure it runs on all exit paths

Inside the per-item goroutine, `defer wg.Done()` is placed at line 121, at the very end of the function body. The error handling path at line 102 returns early before reaching the defer, skipping `wg.Done()` and leaving the WaitGroup counter > 0. When `wg.Wait()` is called at line 142, it will block indefinitely.

Move `defer wg.Done()` to the top of the goroutine immediately after the function declaration:

```diff
 for _, it := range items {
 	go func(item item) {
+		defer wg.Done()
+
 		// The original item.Path causes 404 error because the Gitlab API is expecting the url encoded path
 		fileUrl := fmt.Sprintf("https://gitlab.com/api/v4/projects/%d/repository/files/%s/raw?ref=%s", item.ProjectId, url.QueryEscape(item.Path), item.Ref)
 		resp, err := session.Get(ctx, fileUrl, "", headers)
 		if err != nil {
 			if resp == nil || (resp != nil && resp.StatusCode != http.StatusNotFound) {
 				session.DiscardHTTPResponse(resp)
 				results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 				s.errors++
 				return
 			}
 		}
 		if resp.StatusCode == http.StatusOK {
 			scanner := bufio.NewScanner(resp.Body)
 			for scanner.Scan() {
 				line := scanner.Text()
 				if line == "" {
 					continue
 				}
 				for _, subdomain := range domainRegexp.FindAllString(line, -1) {
 					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}
 					s.results++
 				}
 			}
 			session.DiscardHTTPResponse(resp)
 		}
-		defer wg.Done()
 	}(it)
 }
```
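The effect of defer placement can be demonstrated with a self-contained sketch; `failFast` here is an illustrative stand-in for the HTTP-error branch, not code from the gitlab source. With `defer wg.Done()` registered first, every exit path (including the early return) decrements the counter, so `wg.Wait()` always returns:

```go
package main

import (
	"fmt"
	"sync"
)

// run spawns one goroutine per item. Because defer wg.Done() is the first
// statement in each goroutine, the WaitGroup is decremented on every exit
// path, including the early return taken when failFast is set.
func run(items []int, failFast bool) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for range items {
		wg.Add(1)
		go func() {
			defer wg.Done() // registered before any early return
			if failFast {
				return // early exit still triggers wg.Done()
			}
			mu.Lock()
			done++
			mu.Unlock()
		}()
	}
	wg.Wait() // would deadlock if wg.Done() were skipped on the error path
	return done
}

func main() {
	fmt.Println(run([]int{1, 2, 3}, false)) // 3
	fmt.Println(run([]int{1, 2, 3}, true))  // 0
}
```

Placing the defer at the end of the function body, as the original code does, only works if no statement above it can return; putting it first removes that fragility entirely.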
41-79: Guard all error-path sends with ctx to prevent goroutine leaks on context cancellationThe unconditional sends at lines 43, 50, 57–60, 63, 70, and 76 can block indefinitely if the caller cancels
ctxand stops reading from the results channel. This leaves goroutines stuck on channel sends, preventingsync.WaitGroup.Wait()from completing in the manager and causing goroutine leaks.The select pattern with
ctx.Done()is already used incommoncrawl.goand should be applied consistently here to all six error-path sends:@@ - req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) - if err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - return - } + req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) + if err != nil { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + } + return + } @@ - resp, err := session.Client.Do(req) - if err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - return - } + resp, err := session.Client.Do(req) + if err != nil { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + } + return + } @@ - defer func() { - if err := resp.Body.Close(); err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - } - }() + defer func() { + if err := resp.Body.Close(); err != nil { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + } + } + }() @@ - if resp.StatusCode != http.StatusOK { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("unexpected status code: %d", resp.StatusCode)} - s.errors++ - return - } + if resp.StatusCode != http.StatusOK { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("unexpected status code: %d", resp.StatusCode)}: + s.errors++ + } + return + } @@ - body, err := io.ReadAll(resp.Body) - if err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - return - } + 
body, err := io.ReadAll(resp.Body) + if err != nil { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + } + return + } @@ - if err := json.Unmarshal(body, &tcResponse); err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - return - } + if err := json.Unmarshal(body, &tcResponse); err != nil { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + } + return + }A test that cancels
ctxduring HTTP failures or unmarshalling would validate this behavior.
♻️ Duplicate comments (1)
pkg/subscraping/sources/quake/quake.go (1)
105-109: Context check added correctly. The non-blocking context check before processing each item is appropriate and follows the PR objectives for adding `ctx.Done()` checks in processing loops.
Note: The need to wrap the channel send on line 114 with a select statement was already flagged in the previous comment.
🧹 Nitpick comments (15)
pkg/subscraping/sources/reconcloud/reconcloud.go (1)
64-71: Context-aware result emission is correct; consider applying the same pattern to error sends. The new
`select` around per-asset result emission cleanly respects `ctx.Done()` and avoids blocked sends, while only incrementing `s.results` on successful delivery. This aligns well with the PR’s cancellation goals. For full consistency with “don’t block on send after cancellation”, you might also want to wrap the two error sends (Lines 47–50 and 56–59) in a similar
`select { case <-ctx.Done(): ... case results <- ... }` pattern. That would ensure this goroutine never blocks on an error send if the upstream has already stopped consuming due to context cancellation.
pkg/subscraping/sources/leakix/leakix.go (1)
44-65: Consider making error result sends context-aware as well. The success-path sends are now guarded by a
`select` on `ctx.Done()`, but the error paths still do direct sends on `results` (Lines 44-47, 52-55, 60-64). If the upstream consumer ever stops reading `results` when the context is cancelled, these sends could block and keep the goroutine alive. To make this source fully consistent with the new pattern and the PR objective, consider wrapping error sends in a similar
`select`:

```diff
-if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
-	return
-}
+if err != nil {
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
+	return
+}
```

(and likewise for the non‑200 status and JSON decode error paths).
pkg/subscraping/sources/netlas/netlas.go (1)
160-166: Context-aware result emission looks correct; consider mirroring for error sends. The new
`select` on `ctx.Done()` vs `results <-` cleanly handles cancellation: the goroutine exits promptly on `ctx.Done()`, defers still run (so `results` gets closed and `timeTaken` recorded), and `s.results` is only incremented on a successful send. This is aligned with the PR’s cancellation goals. As a follow-up, you might also wrap the earlier error-result sends in this function in a similar
`select { case <-ctx.Done(): return; case results <- ... }` pattern, so they don’t block if the caller has already canceled and stopped reading from the channel.
pkg/subscraping/sources/onyphe/onyphe.go (1)
78-82: Consider wrapping error sends in select statements for consistency. For complete context cancellation support, the error result sends (lines 79 and 88) should also be wrapped in select statements with
`ctx.Done()` checks. While these are in error paths that return immediately, they could still block if the context is canceled and the receiver has stopped consuming. Apply this pattern:
```diff
 if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
+	select {
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	case <-ctx.Done():
+	}
 	session.DiscardHTTPResponse(resp)
 	return
 }
```

Also applies to: 86-91
pkg/subscraping/sources/digitorus/digitorus.go (1)
63-69: Result emission is now cancellation‑safe; consider doing the same for error sends. Wrapping subdomain emission in:
- a
`select` between `ctx.Done()` and `results <- ...`, and
s.resultsonly in the successful send branchis exactly what you want for cancellation‑safe, non‑blocking sends.
To fully match the PR goal (“use
ctx.Done()when sending results into channels”), consider applying the sameselect { case <-ctx.Done(): return ... }pattern to the error sends at Line 38 and Line 46. That would make all emissions from this source safe if the consumer stops reading after cancellation, not just the hot subdomain loop.pkg/subscraping/sources/waybackarchive/waybackarchive.go (1)
36-36: Consider wrapping error send in select for consistency. While the caller is typically still listening at this early stage, wrapping this error send in a
`select` with `ctx.Done()` would provide complete consistency with the pattern applied elsewhere in this PR. Apply this diff for consistency:
- results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + }pkg/subscraping/sources/robtex/robtext.go (1)
60-60: Consider aligning error counter semantics with results counter.The error counter is incremented unconditionally, while the results counter (line 82) is only incremented upon successful channel send. For consistency, consider incrementing the error counter only after successfully sending the error result to the channel (which would be inside the select statement if the critical issue above is addressed).
Also applies to: 74-74
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
53-53: Consider wrapping error sends in select statements for consistency.While the result sends (lines 97-102) correctly use
selectwithctx.Done()to prevent blocking, the error sends throughout the file do not follow the same pattern. If the context is cancelled and the receiver stops reading from the channel, these error sends could block indefinitely, potentially causing goroutine leaks.For consistency with the context-aware result sending pattern and to fully align with the PR objectives, consider wrapping error sends in select statements as well.
Example pattern for error sends:
- results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + }This pattern should be applied to all error sends (lines 53, 59, 74, 80, 82, 91) to ensure uniform context-cancellation behavior across all channel operations.
Also applies to: 59-59, 74-74, 80-80, 82-82, 91-91
pkg/subscraping/sources/chinaz/chinaz.go (1)
58-69: Context-aware loop/send looks correct; pre-loop select can be simplified. The new pattern in the loop correctly:
- Exits promptly on
ctx.Done().- Guards the send with a
select, so the goroutine won’t stay blocked after cancellation.- Increments
s.resultsonly on successful sends.The
`select` at Lines 58-62 is functionally redundant with the later `select` at Lines 64-69 (which already handles `ctx.Done()` before/while sending). For readability and a tiny simplification, you could either:
- Drop the first
selectentirely and rely on the sendselect, or- Replace it with a cheaper
if ctx.Err() != nil { return }pre-check if you want an explicit early-exit before doingjsoniter.Get.This is non-blocking for merge, just a minor cleanup.
pkg/subscraping/sources/pugrecon/pugrecon.go (1)
108-114: Context-aware result send looks good; consider aligning error sends with the same pattern. The new
selectaround the subdomain emission correctly:
- Stops the goroutine promptly on
ctx.Done().- Avoids blocking on
results <- ...when the consumer stops reading.- Increments
s.resultsonly on successful sends.To fully align this source with the PR’s cancellation goal and avoid any chance of blocking on error paths if the consumer stops reading after cancellation, consider wrapping the earlier error sends (
results <- subscraping.Result{Type: subscraping.Error, ...}at lines similar to 60, 76, 83, 95, 103) in the sameselectpattern and moving thes.errors++into the successful send branch.For example, the first error case could be updated like this:
- if err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("failed to marshal request body: %w", err)} - s.errors++ - return - } + if err != nil { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{ + Source: s.Name(), + Type: subscraping.Error, + Error: fmt.Errorf("failed to marshal request body: %w", err), + }: + s.errors++ + } + return + }You could apply the same pattern to the other error-emission sites in this function for consistency and to guarantee non-blocking behavior under cancellation.
pkg/subscraping/sources/virustotal/virustotal.go (1)
54-58: Context check in pagination loop is correct; consider simpler idiomatic formThe select here does the right thing and stops pagination promptly when
ctxis cancelled. If you want to shave some noise, this can be expressed a bit more idiomatically as a direct error check:- for { - select { - case <-ctx.Done(): - return - default: - } + for { + if err := ctx.Err(); err != nil { + return + }Behavior is the same, just a bit easier to scan.
pkg/subscraping/sources/c99/c99.go (1)
59-79: Consider making error-result sends context-aware as wellThe success path now respects
ctx.Done(), but the error paths still perform unconditional sends toresults:
- Deferred
resp.Body.Close()error (lines 59-63).- JSON decode error (lines 68-71).
- API-level error in
response.Error(lines 74-79).If the context is cancelled and upstream stops reading from
results, these sends can still block and keep the goroutine alive, partially defeating the goal of prompt cancellation.You can align these with the new pattern by guarding them with
ctx.Done()as well. For example:- defer func() { - if err := resp.Body.Close(); err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - } - }() + defer func() { + if err := resp.Body.Close(); err != nil { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + } + } + }() var response dnsdbLookupResponse err = jsoniter.NewDecoder(resp.Body).Decode(&response) if err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - return + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + s.errors++ + } + return } if response.Error != "" { - results <- subscraping.Result{ - Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("%v", response.Error), - } - s.errors++ - return + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{ + Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("%v", response.Error), + }: + s.errors++ + } + return }This keeps behavior the same under normal operation but prevents blocked sends after cancellation, bringing the error paths in line with the PR’s context-cancellation goal.
pkg/subscraping/sources/hackertarget/hackertarget.go (1)
49-53: Context check before processing each scanned line looks good. The non-blocking
selectonctx.Done()before handling each line should let this goroutine exit quickly on cancellation while avoiding any extra blocking. That aligns well with the PR goal.As a small optional improvement, you could also check
scanner.Err()after the loop and emit an error result (guarded byctx.Done()similarly), so read errors (including context-related ones) are surfaced instead of silently ignored.pkg/subscraping/sources/riddler/riddler.go (1)
42-59: Context-aware scanning and result emission look correct; minor duplication only. The added
selectblocks in the scan loop and per-subdomain send correctly:
- Stop processing promptly on
ctx.Done().- Ensure
session.DiscardHTTPResponse(resp)is called before returning.- Increment
s.resultsonly when the send onresultsactually succeeds.This matches the PR goal for cooperative cancellation. The only minor nit is that the
ctx.Done()+DiscardHTTPResponse+returnpattern is duplicated twice; if this shows up in many sources, consider a small helper to reduce repetition.pkg/subscraping/sources/gitlab/gitlab.go (1)
69-83: Align result sends with ctx cancellation to avoid blocked sends. Per the PR objective, sources should avoid blocking or continuing to emit results after ctx cancellation. In this file, all sends to
resultsare direct:results <- subscraping.Result{...}If downstream stops reading from
resultswhenctxis cancelled, these sends can still block goroutines even after pagination stops.To align with the rest of the PR, consider wrapping sends in a
selectthat respectsctx.Done(); for example:- results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}: + }and similarly for the subdomain emission:
- for _, subdomain := range domainRegexp.FindAllString(line, -1) { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain} - s.results++ - } + for _, subdomain := range domainRegexp.FindAllString(line, -1) { + select { + case <-ctx.Done(): + return + case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}: + } + s.results++ + }This keeps goroutines from getting stuck on channel sends once the caller cancels the context, matching the intended behavior described in the linked issue. Based on PR objectives, this makes the GitLab source consistent with other updated sources.
Also applies to: 95-115
```go
select {
case <-ctx.Done():
	return
default:
}
```
Context check is incomplete; result emission also needs protection.
While checking for cancellation before processing each subdomain is a good step, the subsequent result emission on line 98 is still a direct channel send that can block if the receiver has stopped due to cancellation.
Per the PR objectives and the pattern adopted by other sources, result emission should also be wrapped in a context-aware select.
🤖 Prompt for AI Agents
In pkg/subscraping/sources/fofa/fofa.go around lines 86 to 90, the code checks
ctx.Done() before processing each subdomain but still performs an unconditional
send to the result channel later (around line 98), which can block if the
receiver has stopped; change that send to be context-aware by replacing the
direct channel send with a select that either sends the result on the result
channel or returns/continues when ctx.Done() fires, ensuring you handle
cancellation without blocking (mirror the pattern used elsewhere in other source
implementations).
```diff
 for _, subdomain := range domainRegexp.FindAllString(normalizeContent(line), -1) {
-	results <- subscraping.Result{Source: name, Type: subscraping.Subdomain, Value: subdomain}
-	s.results++
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: name, Type: subscraping.Subdomain, Value: subdomain}:
+		s.results++
+	}
 }
```
Data race on s.results increment.
Multiple goroutines spawned at line 161 concurrently increment s.results without synchronization. This applies to both line 198 and line 216.
Options to fix:
- Use
`atomic.AddInt32` for `s.results` (change the field type to `int32`)
- Send increments through a channel and aggregate in the parent
```diff
+import "sync/atomic"

 type Source struct {
 	apiKeys   []string
 	timeTaken time.Duration
 	errors    int
-	results   int
+	results   int32
 	skipped   bool
 }
```

Then update the increments:
```diff
 case results <- subscraping.Result{Source: name, Type: subscraping.Subdomain, Value: subdomain}:
-	s.results++
+	atomic.AddInt32(&s.results, 1)
```

And update `Statistics()` to use `atomic.LoadInt32(&s.results)`.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/subscraping/sources/github/github.go around lines 192–200 (and the other
increment at ~216), s.results is being incremented concurrently causing a data
race; change s.results type to int32, import sync/atomic, replace all direct
increments (s.results++ at lines ~198 and ~216) with atomic.AddInt32(&s.results,
1), and update Statistics() to return atomic.LoadInt32(&s.results) (or convert
the loaded int32 to the expected return type). Ensure the new import is added
and all other accesses to s.results use atomic operations.
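The suggested `sync/atomic` fix can be sketched independently of the subfinder types; the `counter` struct below is illustrative, not the actual `Source`:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter mirrors the suggested fix: an int32 field incremented with
// atomic.AddInt32 from many goroutines and read with atomic.LoadInt32.
type counter struct {
	results int32
}

func (c *counter) inc()        { atomic.AddInt32(&c.results, 1) }
func (c *counter) load() int32 { return atomic.LoadInt32(&c.results) }

func main() {
	var c counter
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ { // eight concurrent "search goroutines"
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.load()) // → 8000, with no data race under -race
}
```

A plain `c.results++` in the goroutines would be flagged by the race detector and could lose increments.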
```go
select {
case <-ctx.Done():
	return
case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
	s.results++
}
```
Result send is now ctx-aware, but error/close sends can still block after cancellation
The select around the subdomain result send correctly prevents this goroutine from blocking indefinitely if ctx is canceled and the caller stops reading from results, and only increments s.results on a successful send. That part looks solid.
However, the error sends at Line 34 (SimpleGet error) and Line 42 (resp.Body.Close() error) still write to results unconditionally. If ctx is already canceled and upstream has stopped consuming from this channel, those sends can block and keep the goroutine alive, which undermines the “stop promptly on cancellation” objective.
Consider wrapping those error sends in the same select pattern you use here, for example:
```go
select {
case <-ctx.Done():
	return
case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
	s.errors++
}
```

This would make all sends from this source respect context cancellation consistently, including error paths.
🤖 Prompt for AI Agents
In pkg/subscraping/sources/hackertarget/hackertarget.go around lines 60 to 65,
the result send uses a ctx-aware select but the earlier error sends (around
lines ~34 and ~42) still write to results unconditionally and can block after
cancellation; change those unconditional error sends to the same ctx-aware
select pattern (case <-ctx.Done(): return; case results <-
subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
s.errors++), and apply the same pattern for any resp.Body.Close() error send so
error sends respect context cancellation and only increment counters on a
successful non-blocking send.
```go
select {
case <-ctx.Done():
	return
case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
	s.results++
}
```
Context‑aware send for subdomain results is good; align error paths with the same pattern
The new select around the subdomain result send correctly respects ctx.Done() and only increments s.results on successful sends. This matches the PR goal of making result emission cancellation‑aware. 👍
However, the two error paths above still do a bare send to results:
- Line 45: `results <- subscraping.Result{... Type: subscraping.Error ...}`
- Line 55: `results <- subscraping.Result{... Type: subscraping.Error ...}`
If the caller cancels the context and stops draining the results channel, these sends can still block indefinitely, leaking this goroutine and partially defeating the context‑cancellation objective.
I’d recommend using the same select pattern for the error cases and only incrementing s.errors when the send succeeds, e.g.:
```diff
@@
-if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
-	session.DiscardHTTPResponse(resp)
-	return
-}
+if err != nil {
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
+	session.DiscardHTTPResponse(resp)
+	return
+}
@@
-if err != nil {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-	s.errors++
-	session.DiscardHTTPResponse(resp)
-	return
-}
+if err != nil {
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
+	session.DiscardHTTPResponse(resp)
+	return
+}
```

If this PR introduces a shared helper for cancellation‑aware sends elsewhere, you could also delegate to that here to keep all sources consistent.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/subscraping/sources/hudsonrock/hudsonrock.go around lines 63-68, the
select used for sending subdomain results is cancellation-aware but the earlier
error sends at lines ~45 and ~55 still do bare sends that can block if the
caller cancels; change those two bare sends to use the same select pattern
(select on ctx.Done() and on sending the subscraping.Result to results) and only
increment s.errors when the send case succeeds; if you have a shared
cancellation-aware send helper, call that instead to keep behavior consistent
across sources.
```go
select {
case <-ctx.Done():
	return
default:
}
```
Good addition, but implementation is incomplete.
The context check correctly prevents starting a new pagination iteration after cancellation. However, the PR objectives explicitly state: "Use select with ctx.Done() when sending results into channels to avoid blocked sends after cancellation."
All channel sends in this function (lines 76, 85, 93-95, 114) still use direct send operations, which can block indefinitely if the receiver stops consuming due to context cancellation.
Apply this pattern to all channel sends:
```diff
-results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-s.errors++
+select {
+case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+	s.errors++
+case <-ctx.Done():
+	return
+}
```

Similarly for lines 85, 93-95, and 114-115. For line 114-115, ensure the counter is only incremented when the send succeeds:
```diff
-results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}
-s.results++
+select {
+case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
+	s.results++
+case <-ctx.Done():
+	return
+}
```

Committable suggestion skipped: line range outside the PR's diff.
```go
select {
case <-ctx.Done():
	return
case results <- subscraping.Result{Source: "sitedossier", Type: subscraping.Subdomain, Value: subdomain}:
	s.results++
}
```
Context-aware result send looks good, but error paths still ignore ctx cancellation
The new select around the subdomain send correctly avoids blocking when ctx is canceled and the consumer stops reading. However, the two error sends above (around Line 56 and Line 64) still use a plain results <- ..., so they can still block indefinitely after cancellation and leak this goroutine, which goes against the PR’s objective.
I’d mirror the same pattern for error emissions, e.g.:
```diff
@@
-if err != nil && !isnotfound {
-	results <- subscraping.Result{Source: "sitedossier", Type: subscraping.Error, Error: err}
-	s.errors++
-	session.DiscardHTTPResponse(resp)
-	return
-}
+if err != nil && !isnotfound {
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: "sitedossier", Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
+	session.DiscardHTTPResponse(resp)
+	return
+}
@@
-body, err := io.ReadAll(resp.Body)
-if err != nil {
-	results <- subscraping.Result{Source: "sitedossier", Type: subscraping.Error, Error: err}
-	s.errors++
-	session.DiscardHTTPResponse(resp)
-	return
-}
+body, err := io.ReadAll(resp.Body)
+if err != nil {
+	select {
+	case <-ctx.Done():
+		session.DiscardHTTPResponse(resp)
+		return
+	case results <- subscraping.Result{Source: "sitedossier", Type: subscraping.Error, Error: err}:
+		s.errors++
+	}
+	session.DiscardHTTPResponse(resp)
+	return
+}
```

(You could also factor this into a small helper to avoid repetition if this pattern appears in multiple sources.)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/subscraping/sources/sitedossier/sitedossier.go around lines 56, 64 and
73-78, the error sends use a blocking results <- ... which can hang after ctx
cancellation; replace those plain sends with the same context-aware select used
for subdomain sends (select { case <-ctx.Done(): return; case results <-
subscraping.Result{...}: s.results++ }) so errors won't block when the consumer
stops, and optionally factor the pattern into a small helper like
sendResult(ctx, results, result) to avoid repetition.
```diff
 for _, subdomain := range data.Data {
-	results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain.Id}
-	s.results++
+	select {
+	case <-ctx.Done():
+		return
+	case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain.Id}:
+		s.results++
+	}
```
Make all sends to results context‑aware, not just subdomain emissions
The new select around subdomain emission is good: it avoids blocking on results when ctx is cancelled and only increments s.results on successful sends.
However, other sends in this function are still unconditional:
- HTTP error path (Lines 63-68)
- `resp.Body.Close()` error in the defer (Lines 70-75)
If the caller cancels ctx and stops draining results, these sends can still block indefinitely, defeating the goal of context cancellation and potentially leaking this goroutine.
You can apply the same pattern you used for subdomain results to the error paths, for example:
resp, err := session.Get(ctx, url, "", map[string]string{"x-apikey": randomApiKey})
if err != nil {
- results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
- s.errors++
- session.DiscardHTTPResponse(resp)
- return
+ select {
+ case <-ctx.Done():
+ // context cancelled; just clean up and exit
+ case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+ s.errors++
+ }
+ session.DiscardHTTPResponse(resp)
+ return
}
defer func() {
if err := resp.Body.Close(); err != nil {
- results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
- s.errors++
+ select {
+ case <-ctx.Done():
+ return
+ case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+ s.errors++
+ }
}
}()
@@
var data response
err = jsoniter.NewDecoder(resp.Body).Decode(&data)
if err != nil {
- results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
- s.errors++
- return
+ select {
+ case <-ctx.Done():
+ return
+ case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}:
+ s.errors++
+ }
+ return
}This keeps behavior the same when ctx is active while ensuring the source fully respects cancellation and never blocks on results after ctx.Done().
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/subscraping/sources/virustotal/virustotal.go around lines 63-91, several
sends to the results channel (HTTP error path lines 63-68, resp.Body.Close()
error in the defer lines 70-75, JSON decode error lines 77-82) are unconditional
and can block if the caller cancels ctx; make them context-aware like the
subdomain send at lines 85-91 by replacing each direct send with a select that
first checks <-ctx.Done() (and returns / skips sending) and otherwise sends to
results, and only increment s.results in the case branch where the send
succeeds; apply the same pattern inside the defer (use a select that avoids
blocking if ctx is done) so no send can block after cancellation.
closes #1679
Summary by CodeRabbit
Release Notes