
Add request tracking to -stats flag#1699

Merged
Mzack9999 merged 3 commits into dev from feature/request-tracking-stats
Jan 7, 2026

Conversation

@dogancanbakir
Member

@dogancanbakir dogancanbakir commented Jan 5, 2026

Summary

  • Adds Requests field to Statistics struct to track HTTP requests per source
  • Updates all 53 sources to count HTTP requests made during enumeration
  • Displays request count in the -stats output table

This helps users monitor API usage and debug quota consumption issues.
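At its core the change is a single field addition. A minimal sketch of the new struct shape (an illustrative copy, not the actual `subscraping` package type; the surrounding field names are taken from the `Statistics` literal shown later in this review):

```go
package main

import (
	"fmt"
	"time"
)

// Statistics mirrors subscraping.Statistics with the new Requests field.
// This is an illustrative stand-in, not the real package type.
type Statistics struct {
	Errors    int
	Results   int
	Requests  int // new: HTTP requests made during enumeration
	TimeTaken time.Duration
	Skipped   bool
}

func main() {
	stats := Statistics{Results: 15, Requests: 3}
	fmt.Printf("results=%d requests=%d\n", stats.Results, stats.Requests)
}
```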

Example Output

 Source               Duration      Results   Requests     Errors
────────────────────────────────────────────────────────────────────
 virustotal           1.2s               15          3          0
 shodan               0.8s               42          2          0

Test plan

  • Run subfinder -d example.com -stats and verify request counts appear
  • Verify request counts match actual API calls for paginated sources

Closes #1698
Related to #1562

Summary by CodeRabbit

  • New Features
    • Added per-source request counting across all subscraping sources.
    • Statistics output now includes a Requests column alongside Results and Errors; table header and separator widened to align with the new column.


Adds request counting to all sources to help users monitor API usage
and debug quota consumption issues like #1562.

Changes:
- Add Requests field to Statistics struct
- Track HTTP requests in all 53 sources
- Display request count in stats output

Closes #1698
@coderabbitai

coderabbitai bot commented Jan 5, 2026

Walkthrough

Adds per-source HTTP request tracking: a new Requests field on subscraping.Statistics, updates ~50 subscraping sources to count requests (some using atomics), and updates runner stats output to include a Requests column.

Changes

Cohort / File(s) Summary
Type Definitions
pkg/subscraping/types.go
Added Requests int to subscraping.Statistics.
Runner output
pkg/runner/stats.go
Updated header to include "Requests", extended separator length and adjusted column formatting for the new column.
Concurrency‑sensitive sources
pkg/subscraping/sources/...
`pkg/subscraping/sources/github/github.go`, `pkg/subscraping/sources/gitlab/gitlab.go`, `pkg/subscraping/sources/driftnet/driftnet.go`
Replaced or added atomic counters for errors/results/requests where concurrent updates occur (use atomic.Int32: Store, Add, Load) and surface values in Statistics().
Updated subscraping sources (bulk)
`pkg/subscraping/sources/{alienvault,anubis,bevigil,bufferover,builtwith,c99,censys,certspotter,chaos,chinaz,commoncrawl,crtsh,digitalyama,digitorus,dnsdb,dnsdumpster,dnsrepo,domainsproject,facebook/ctlogs,fofa,fullhunt,merklemap,netlas,onyphe,profundis,pugrecon,quake,rapiddns,reconcloud,reconeer,redhuntlabs,riddler,rsecloud,securitytrails,shodan,sitedossier,thc,threatbook,threatcrowd,threatminer,virustotal,waybackarchive,whoisxmlapi,windvane,zoomeyeapi,...}/*.go`
Added unexported requests int (or atomic counter), initialize in Run (s.requests = 0 or Store(0)), increment per HTTP/API call (s.requests++ or Add(1)), and include Requests: s.requests in Statistics() return.
High‑attention files
pkg/subscraping/sources/gitlab/gitlab.go, pkg/subscraping/sources/github/github.go, pkg/subscraping/sources/rsecloud/rsecloud.go
gitlab/github introduce broader atomicization of counters (concurrency changes); review proper atomic usage and casts in Statistics(); rsecloud has denser diffs—verify initialization and consistent Statistics() ordering.
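For single-goroutine sources, the bulk pattern described above can be sketched as follows (a sketch with a hypothetical `exampleSource`; the real sources make actual HTTP calls where the comment indicates):

```go
package main

import (
	"fmt"
	"time"
)

// Statistics mirrors the shape each source's Statistics() returns.
type Statistics struct {
	Errors    int
	Results   int
	Requests  int
	TimeTaken time.Duration
	Skipped   bool
}

// exampleSource is a hypothetical single-goroutine source following the
// bulk pattern: a plain int counter, reset in Run, incremented per call.
type exampleSource struct {
	errors, results, requests int
	timeTaken                 time.Duration
}

func (s *exampleSource) Run(pages int) {
	start := time.Now()
	s.requests = 0 // initialize at the start of Run
	for page := 1; page <= pages; page++ {
		s.requests++ // count before each HTTP/API call
		// resp, err := session.Get(ctx, url, "", headers) would go here
		s.results++ // pretend each page yields one subdomain
	}
	s.timeTaken = time.Since(start)
}

func (s *exampleSource) Statistics() Statistics {
	return Statistics{Errors: s.errors, Results: s.results, Requests: s.requests, TimeTaken: s.timeTaken}
}

func main() {
	s := &exampleSource{}
	s.Run(3)
	fmt.Println(s.Statistics().Requests)
}
```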

Sequence Diagram(s)

sequenceDiagram
  participant Runner
  participant Source
  participant HTTP
  Runner->>Source: Run()
  Source->>Source: s.requests = 0
  loop per request
    Source->>Source: s.requests++ (or atomic.Add)
    Source->>HTTP: perform HTTP/API call
    HTTP-->>Source: response
    Source->>Source: parse results, s.results++ / s.errors++ (atomic where used)
  end
  Source-->>Runner: Statistics{Errors, Results, Requests, TimeTaken, Skipped}
  Runner->>Runner: format line with Requests column
  Runner-->>Console: print stats table (includes Requests)
  note right of Runner: New column added to overall output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through code and counted each call,
Requests now tick as I watch numbers fall,
Sources whisper totals, neat and small,
A rabbit’s tally lines them all,
Hooray — the stats now stand tall! 🥕📊

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Add request tracking to -stats flag' directly and accurately describes the main change: adding request tracking functionality that is displayed in the -stats flag output.
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from issue #1698: adds a Requests field to the Statistics struct, updates all 53 sources to track HTTP requests, and displays request counts in the -stats output table.
Out of Scope Changes check ✅ Passed All changes are directly related to request tracking: the types.go change adds the Requests field to Statistics, the runner/stats.go change updates the display output, and all source files consistently implement request counting and exposure via Statistics().




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @pkg/subscraping/sources/gitlab/gitlab.go:
- Around line 94-98: Concurrent goroutines incrementing s.requests in the
goroutine started in the anonymous func cause a data race; change the
Source.requests field to an int64 and import sync/atomic, replace the direct
increment s.requests++ with atomic.AddInt64(&s.requests, 1) inside the
goroutine, and update Statistics() (or any reader) to use
atomic.LoadInt64(&s.requests) so reads/writes are atomic and race-free.
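The suggested fix reduces to this pattern (a minimal sketch with a hypothetical `raceFreeCounter` type, assuming the counter is written only from the spawned goroutines and read after they finish):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// raceFreeCounter demonstrates the suggested gitlab fix: an int64 field
// updated via sync/atomic so concurrent goroutines can increment safely.
type raceFreeCounter struct {
	requests int64
}

func (c *raceFreeCounter) run(goroutines int) {
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&c.requests, 1) // replaces the racy s.requests++
		}()
	}
	wg.Wait()
}

func (c *raceFreeCounter) load() int64 {
	return atomic.LoadInt64(&c.requests) // replaces a plain read in Statistics()
}

func main() {
	c := &raceFreeCounter{}
	c.run(100)
	fmt.Println(c.load()) // always 100, with or without -race
}
```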

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/subscraping/sources/redhuntlabs/redhuntlabs.go (1)

60-115: Pre-existing issue: Page 1 requested twice when pagination is needed.

When ResultCount > pageSize, the pagination logic has a pre-existing inefficiency:

  1. Line 60-62: Initial request fetches page 1
  2. Line 79: Detects pagination is needed
  3. Line 81-115: Loop starts from page = 1 and re-requests page 1 at line 88
  4. The subdomains from the initial request (line 60-62) are never processed in the pagination branch

This results in an unnecessary duplicate request for page 1. The loop at line 81 should start from page = 2 instead, and the first request's subdomains should be processed before entering the pagination loop.

Note: This is a pre-existing bug not introduced by this PR. The request tracking feature correctly counts all requests, including the duplicate.

🔎 Suggested fix to eliminate duplicate page 1 request
 		session.DiscardHTTPResponse(resp)
+		
+		// Process subdomains from the first request
+		for _, subdomain := range response.Subdomains {
+			select {
+			case <-ctx.Done():
+				return
+			case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
+				s.results++
+			}
+		}
+		
 		if response.Metadata.ResultCount > pageSize {
 			totalPages := (response.Metadata.ResultCount + pageSize - 1) / pageSize
-			for page := 1; page <= totalPages; page++ {
+			for page := 2; page <= totalPages; page++ {
 				select {
 				case <-ctx.Done():
 					return
 				default:
 				}
 				getUrl = fmt.Sprintf("%s?domain=%s&page=%d&page_size=%d", baseUrl, domain, page, pageSize)
 				s.requests++
 				resp, err := session.Get(ctx, getUrl, "", requestHeaders)
 				if err != nil {
 					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("encountered error: %v; note: if you get a 'limit has been reached' error, head over to https://devportal.redhuntlabs.com", err)}
 					session.DiscardHTTPResponse(resp)
 					s.errors++
 					return
 				}
 
 				err = jsoniter.NewDecoder(resp.Body).Decode(&response)
 				if err != nil {
 					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 					session.DiscardHTTPResponse(resp)
 					s.errors++
 					continue
 				}
 
 				session.DiscardHTTPResponse(resp)
 
 				for _, subdomain := range response.Subdomains {
 					select {
 					case <-ctx.Done():
 						return
 					case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
 						s.results++
 					}
 				}
 			}
-		} else {
-			for _, subdomain := range response.Subdomains {
-				select {
-				case <-ctx.Done():
-					return
-				case results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}:
-					s.results++
-				}
-			}
 		}
pkg/subscraping/sources/github/github.go (1)

152-237: Critical: Race condition on request counter.

The s.requests counter is incremented at line 173 inside a goroutine spawned at line 164. Multiple goroutines increment this counter concurrently without any synchronization (mutex or atomic operations), causing a data race.

Similarly, s.results is incremented at lines 202 and 220 within the same goroutines, creating additional race conditions.

🔎 Recommended fix using sync/atomic

Add atomic imports and change the counter types:

 import (
 	"bufio"
 	"context"
 	"fmt"
 	"net/http"
 	"net/url"
 	"regexp"
 	"strconv"
 	"strings"
 	"sync"
+	"sync/atomic"
 	"time"

Update the Source struct to use atomic-friendly types:

 type Source struct {
 	apiKeys   []string
 	timeTaken time.Duration
-	errors    int
-	results   int
-	requests  int
+	errors    int32
+	results   int32
+	requests  int32
 	skipped   bool
 }

Replace all increments with atomic operations:

-	s.requests++
+	atomic.AddInt32(&s.requests, 1)
-	s.results++
+	atomic.AddInt32(&s.results, 1)

Update Statistics() to return atomic values:

 func (s *Source) Statistics() subscraping.Statistics {
 	return subscraping.Statistics {
-		Errors:    s.errors,
-		Results:   s.results,
-		Requests:  s.requests,
+		Errors:    int(atomic.LoadInt32(&s.errors)),
+		Results:   int(atomic.LoadInt32(&s.results)),
+		Requests:  int(atomic.LoadInt32(&s.requests)),
 		TimeTaken: s.timeTaken,
 		Skipped:   s.skipped,
 	}
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1977c0 and e96320e.

📒 Files selected for processing (54)
  • pkg/runner/stats.go
  • pkg/subscraping/sources/alienvault/alienvault.go
  • pkg/subscraping/sources/anubis/anubis.go
  • pkg/subscraping/sources/bevigil/bevigil.go
  • pkg/subscraping/sources/bufferover/bufferover.go
  • pkg/subscraping/sources/builtwith/builtwith.go
  • pkg/subscraping/sources/c99/c99.go
  • pkg/subscraping/sources/censys/censys.go
  • pkg/subscraping/sources/certspotter/certspotter.go
  • pkg/subscraping/sources/chaos/chaos.go
  • pkg/subscraping/sources/chinaz/chinaz.go
  • pkg/subscraping/sources/commoncrawl/commoncrawl.go
  • pkg/subscraping/sources/crtsh/crtsh.go
  • pkg/subscraping/sources/digitalyama/digitalyama.go
  • pkg/subscraping/sources/digitorus/digitorus.go
  • pkg/subscraping/sources/dnsdb/dnsdb.go
  • pkg/subscraping/sources/dnsdumpster/dnsdumpster.go
  • pkg/subscraping/sources/dnsrepo/dnsrepo.go
  • pkg/subscraping/sources/domainsproject/domainsproject.go
  • pkg/subscraping/sources/driftnet/driftnet.go
  • pkg/subscraping/sources/facebook/ctlogs.go
  • pkg/subscraping/sources/fofa/fofa.go
  • pkg/subscraping/sources/fullhunt/fullhunt.go
  • pkg/subscraping/sources/github/github.go
  • pkg/subscraping/sources/gitlab/gitlab.go
  • pkg/subscraping/sources/hackertarget/hackertarget.go
  • pkg/subscraping/sources/hudsonrock/hudsonrock.go
  • pkg/subscraping/sources/intelx/intelx.go
  • pkg/subscraping/sources/leakix/leakix.go
  • pkg/subscraping/sources/merklemap/merklemap.go
  • pkg/subscraping/sources/netlas/netlas.go
  • pkg/subscraping/sources/onyphe/onyphe.go
  • pkg/subscraping/sources/profundis/profundis.go
  • pkg/subscraping/sources/pugrecon/pugrecon.go
  • pkg/subscraping/sources/quake/quake.go
  • pkg/subscraping/sources/rapiddns/rapiddns.go
  • pkg/subscraping/sources/reconcloud/reconcloud.go
  • pkg/subscraping/sources/reconeer/reconeer.go
  • pkg/subscraping/sources/redhuntlabs/redhuntlabs.go
  • pkg/subscraping/sources/riddler/riddler.go
  • pkg/subscraping/sources/rsecloud/rsecloud.go
  • pkg/subscraping/sources/securitytrails/securitytrails.go
  • pkg/subscraping/sources/shodan/shodan.go
  • pkg/subscraping/sources/sitedossier/sitedossier.go
  • pkg/subscraping/sources/thc/thc.go
  • pkg/subscraping/sources/threatbook/threatbook.go
  • pkg/subscraping/sources/threatcrowd/threatcrowd.go
  • pkg/subscraping/sources/threatminer/threatminer.go
  • pkg/subscraping/sources/virustotal/virustotal.go
  • pkg/subscraping/sources/waybackarchive/waybackarchive.go
  • pkg/subscraping/sources/whoisxmlapi/whoisxmlapi.go
  • pkg/subscraping/sources/windvane/windvane.go
  • pkg/subscraping/sources/zoomeyeapi/zoomeyeapi.go
  • pkg/subscraping/types.go
🧰 Additional context used
🧬 Code graph analysis (13)
pkg/subscraping/sources/reconcloud/reconcloud.go (11)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/riddler/riddler.go (2)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/threatcrowd/threatcrowd.go (16)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/github/github.go (1)
  • Source (41-48)
pkg/subscraping/sources/hackertarget/hackertarget.go (1)
  • Source (14-21)
pkg/subscraping/sources/intelx/intelx.go (1)
  • Source (41-48)
pkg/subscraping/sources/securitytrails/securitytrails.go (1)
  • Source (29-36)
pkg/subscraping/sources/shodan/shodan.go (1)
  • Source (15-22)
pkg/subscraping/sources/sitedossier/sitedossier.go (1)
  • Source (22-27)
pkg/subscraping/sources/threatminer/threatminer.go (1)
  • Source (21-26)
pkg/subscraping/sources/virustotal/virustotal.go (1)
  • Source (28-35)
pkg/subscraping/sources/waybackarchive/waybackarchive.go (1)
  • Source (16-21)
pkg/subscraping/sources/netlas/netlas.go (16)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/bevigil/bevigil.go (1)
  • Source (19-26)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/builtwith/builtwith.go (1)
  • Source (33-40)
pkg/subscraping/sources/c99/c99.go (1)
  • Source (16-23)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/anubis/anubis.go (18)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/bevigil/bevigil.go (1)
  • Source (19-26)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/builtwith/builtwith.go (1)
  • Source (33-40)
pkg/subscraping/sources/c99/c99.go (1)
  • Source (16-23)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/dnsrepo/dnsrepo.go (1)
  • Source (15-22)
pkg/subscraping/types.go (2)
  • Session (72-79)
  • Result (82-87)
pkg/resolve/resolve.go (1)
  • Result (35-42)
pkg/subscraping/sources/fullhunt/fullhunt.go (11)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/onyphe/onyphe.go (1)
  • Result (25-32)
pkg/subscraping/sources/crtsh/crtsh.go (2)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/sitedossier/sitedossier.go (1)
  • Source (22-27)
pkg/subscraping/sources/bevigil/bevigil.go (5)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/redhuntlabs/redhuntlabs.go (16)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/bevigil/bevigil.go (1)
  • Source (19-26)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/builtwith/builtwith.go (1)
  • Source (33-40)
pkg/subscraping/sources/c99/c99.go (1)
  • Source (16-23)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/reconeer/reconeer.go (4)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (16)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/github/github.go (1)
  • Source (41-48)
pkg/subscraping/sources/hackertarget/hackertarget.go (1)
  • Source (14-21)
pkg/subscraping/sources/intelx/intelx.go (1)
  • Source (41-48)
pkg/subscraping/sources/securitytrails/securitytrails.go (1)
  • Source (29-36)
pkg/subscraping/sources/shodan/shodan.go (1)
  • Source (15-22)
pkg/subscraping/sources/sitedossier/sitedossier.go (1)
  • Source (22-27)
pkg/subscraping/sources/threatcrowd/threatcrowd.go (1)
  • Source (22-27)
pkg/subscraping/sources/threatminer/threatminer.go (1)
  • Source (21-26)
pkg/subscraping/sources/virustotal/virustotal.go (1)
  • Source (28-35)
pkg/subscraping/sources/waybackarchive/waybackarchive.go (1)
  • Source (16-21)
pkg/subscraping/sources/hudsonrock/hudsonrock.go (6)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/digitorus/digitorus.go (12)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/c99/c99.go (1)
  • Source (16-23)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/dnsrepo/dnsrepo.go (1)
  • Source (15-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Builds (ubuntu-latest)
  • GitHub Check: Test Builds (windows-latest)
  • GitHub Check: Test Builds (macOS-latest)
  • GitHub Check: release-test
  • GitHub Check: Analyze (go)
🔇 Additional comments (64)
pkg/subscraping/sources/quake/quake.go (1)

39-39: LGTM! Request tracking implementation is correct.

The request counter is properly declared, initialized, incremented before each API call (including paginated requests), and exposed via Statistics().

Also applies to: 48-48, 74-74, 159-159

pkg/subscraping/sources/rapiddns/rapiddns.go (1)

22-22: LGTM! Request tracking correctly handles pagination.

The implementation correctly increments the counter for each page request in the pagination loop, ensuring accurate tracking across all API calls.

Also applies to: 30-30, 46-46, 121-121

pkg/subscraping/sources/threatbook/threatbook.go (1)

33-33: LGTM! Request tracking implementation is correct.

The counter is properly initialized, incremented before the API call, and included in the statistics output.

Also applies to: 42-42, 56-56, 133-133

pkg/subscraping/sources/waybackarchive/waybackarchive.go (1)

20-20: LGTM! Request tracking implementation is correct.

The implementation follows the established pattern: field declaration, initialization, increment before the HTTP call, and statistics reporting.

Also applies to: 28-28, 36-36, 103-103

pkg/subscraping/sources/certspotter/certspotter.go (1)

25-25: LGTM! Request tracking correctly handles both initial and paginated requests.

The implementation properly increments the counter for both the initial issuances call (Line 51) and subsequent paginated requests (Line 94), ensuring accurate tracking across all API interactions.

Also applies to: 34-34, 51-51, 94-94, 159-159

pkg/subscraping/sources/intelx/intelx.go (1)

46-46: LGTM! Request tracking correctly implemented.

The implementation properly tracks all HTTP requests:

  • Initial POST request (line 90) to initiate the search
  • Subsequent GET requests (line 118) in the polling loop

The use of a plain int counter is appropriate since Run() uses a single goroutine.

Also applies to: 60-60, 90-90, 118-118, 186-186
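The intelx flow described here (one POST to initiate the search, then a GET per polling iteration, each counted) can be sketched as (a hypothetical `pollSearch` helper, illustrative only):

```go
package main

import "fmt"

// pollSearch is a hypothetical sketch of the intelx flow: one request to
// start a search, then one request per polling iteration, all counted.
// A plain int is fine here because everything runs in a single goroutine.
func pollSearch(polls int) (requests int) {
	requests++ // initial POST that initiates the search
	for i := 0; i < polls; i++ {
		requests++ // each polling GET is also counted
	}
	return requests
}

func main() {
	fmt.Println(pollSearch(4)) // 1 initial + 4 polls = 5
}
```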

pkg/subscraping/sources/zoomeyeapi/zoomeyeapi.go (1)

30-30: LGTM! Paginated request tracking implemented correctly.

The request counter is properly incremented inside the pagination loop (line 74), ensuring all page requests are tracked. The counter is reset at the start of Run() and correctly exposed via Statistics().

Also applies to: 39-39, 74-74, 138-138

pkg/subscraping/sources/driftnet/driftnet.go (1)

31-31: LGTM! Concurrent request tracking with proper synchronization.

Excellent use of atomic.Int32 for the request counter. Since Run() spawns multiple goroutines (one per endpoint at lines 86-88), the atomic operations (Store, Add, Load) correctly prevent race conditions when tracking concurrent requests.

Also applies to: 69-69, 145-145, 124-124

pkg/subscraping/sources/domainsproject/domainsproject.go (1)

21-21: LGTM! Straightforward request tracking implementation.

The implementation correctly tracks the single API request made per Run() invocation. The counter is incremented before the HTTP request (line 55) and properly exposed via Statistics().

Also applies to: 40-40, 55-55, 137-137

pkg/subscraping/sources/threatminer/threatminer.go (1)

25-25: LGTM! Clean request tracking implementation.

The implementation correctly tracks the single API request made per Run() invocation. The counter is incremented before the HTTP call (line 41) and properly included in the Statistics() return value.

Also applies to: 33-33, 41-41, 99-99

pkg/subscraping/sources/hudsonrock/hudsonrock.go (1)

29-29: LGTM! Request tracking correctly implemented.

The request tracking follows the established pattern consistently:

  • Field added to struct
  • Initialized at Run() start
  • Incremented before the HTTP call
  • Exposed via Statistics()

The placement before the API call (line 45) correctly tracks attempted requests, which is appropriate for quota monitoring.

Also applies to: 37-37, 45-45, 105-105

pkg/subscraping/sources/shodan/shodan.go (1)

20-20: LGTM! Request tracking correctly handles pagination.

The implementation properly tracks each paginated request:

  • Counter incremented inside the pagination loop (line 60)
  • Each page fetch is counted individually
  • Early exit when API key is missing (lines 46-49) correctly avoids incrementing

This ensures accurate request counts for quota tracking even with multiple pages.

Also applies to: 37-37, 60-60, 133-133

pkg/subscraping/sources/leakix/leakix.go (1)

19-19: LGTM! Request tracking correctly implemented.

The implementation follows the established pattern consistently:

  • Field initialization and increment placement are correct
  • Request counted before the HTTP call (line 45)
  • Properly integrated with optional API key handling

Also applies to: 28-28, 45-45, 106-106

pkg/subscraping/sources/whoisxmlapi/whoisxmlapi.go (1)

36-36: LGTM! Request tracking correctly implemented.

The implementation properly handles the required API key check:

  • Early exit when no API key (lines 54-56) occurs before the request counter increment
  • Counter incremented immediately before the HTTP call (line 59)
  • Follows the consistent pattern across all sources

Also applies to: 45-45, 59-59, 119-119

pkg/subscraping/sources/alienvault/alienvault.go (1)

26-26: LGTM! Request tracking correctly implemented.

The implementation follows the established pattern:

  • API key validation (lines 44-48) correctly exits before incrementing when key is missing
  • Request counter incremented before the HTTP call (line 50)
  • Consistent with all other sources in this PR

Excellent consistency across all reviewed sources - the uniform implementation pattern will aid maintainability.

Also applies to: 36-36, 50-50, 116-116

pkg/subscraping/types.go (1)

29-35: LGTM! Clean addition to Statistics struct.

The Requests field is correctly added to track HTTP request counts per source. The field type and naming are appropriate.

pkg/subscraping/sources/hackertarget/hackertarget.go (1)

19-19: LGTM! Request tracking correctly implemented.

The implementation follows a clean pattern:

  • Adds requests field to track counter
  • Initializes to 0 at Run start
  • Increments before the HTTP request
  • Exposes via Statistics()

The counter increments before the actual HTTP call, which correctly tracks attempted requests (useful for quota monitoring even if requests fail).

Also applies to: 28-28, 45-45, 115-115

pkg/subscraping/sources/onyphe/onyphe.go (1)

39-39: LGTM! Pagination correctly handled.

The request tracking is correctly implemented for this paginated source:

  • Counter increments inside the pagination loop (lines 67-134)
  • Each page request is tracked separately
  • Total requests accurately reflect all API calls made during enumeration

This is the correct pattern for sources that make multiple requests.

Also applies to: 48-48, 78-78, 165-165

pkg/subscraping/sources/facebook/ctlogs.go (1)

81-81: LGTM! Request tracking correctly implemented for paginated source.

The implementation properly handles pagination:

  • Counter increments inside the pagination loop (lines 107-148)
  • Each page request is tracked
  • The FetchAccessToken HTTP call (line 43) is intentionally not counted, which is correct since it occurs during initialization via AddApiKeys, not during the enumeration phase

Also applies to: 90-90, 113-113, 198-198

pkg/subscraping/sources/builtwith/builtwith.go (1)

38-38: LGTM! Request tracking correctly implemented.

The implementation follows the established pattern consistently:

  • Adds tracking field and initializes at Run start
  • Increments before the HTTP request (line 61)
  • Properly exposes the counter via Statistics()

Also applies to: 47-47, 60-60, 118-118

pkg/subscraping/sources/bufferover/bufferover.go (1)

30-30: LGTM! Request tracking implementation is correct.

The request counter follows the established pattern consistently:

  • Field added to Source struct
  • Initialized in Run()
  • Incremented before the HTTP call in getData()
  • Exposed via Statistics()

This correctly tracks request attempts regardless of success or failure.

Also applies to: 39-39, 60-60, 137-137

pkg/subscraping/sources/threatcrowd/threatcrowd.go (1)

26-26: LGTM! Request tracking correctly implemented.

The implementation follows the established pattern:

  • Counter field added to Source struct
  • Properly initialized at the start of Run()
  • Incremented immediately before the HTTP request
  • Exposed through Statistics()

The placement at line 50 (before session.Client.Do) correctly tracks the request attempt.

Also applies to: 34-34, 50-50, 128-128

pkg/subscraping/sources/fofa/fofa.go (1)

30-30: LGTM! Request tracking implementation is correct.

The implementation is consistent with the PR-wide pattern:

  • Counter field properly added to Source struct
  • Initialized to 0 at Run() start
  • Incremented before the API call at line 60
  • Returned through Statistics()

The single request tracks correctly with the bulk retrieval approach (size=10000 on line 61).

Also applies to: 44-44, 60-60, 137-137

pkg/subscraping/sources/chinaz/chinaz.go (1)

20-20: LGTM! Request tracking correctly implemented.

The implementation follows the established pattern consistently:

  • Counter field added to Source struct
  • Properly initialized in Run()
  • Incremented before the HTTP call (line 43)
  • Exposed via Statistics()

All changes align with the PR objectives for tracking API requests.

Also applies to: 29-29, 43-43, 109-109

pkg/subscraping/sources/sitedossier/sitedossier.go (1)

26-26: LGTM! Request tracking correctly handles pagination.

The implementation properly tracks requests in a paginated scenario:

  • Counter field added to Source struct
  • Initialized in Run()
  • Incremented at the start of enumerate() before each HTTP call
  • Correctly counts multiple requests when pagination occurs (recursive enumerate at line 86)
  • Exposed through Statistics()

This accurately tracks the total number of API requests across all pages.

Also applies to: 34-34, 55-55, 116-116
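The recursive variant described above can be sketched as follows; incrementing at the top of each invocation means recursion into the next page is still counted (types are illustrative, not sitedossier's actual code):

```go
package main

import "fmt"

type source struct {
	requests int
}

// enumerate sketches a recursive paginator: fetch one page, then recurse
// while a "next page" link exists. The counter increments once per call,
// i.e. once per HTTP request.
func (s *source) enumerate(page, lastPage int) {
	s.requests++ // one request per invocation, including recursive ones
	if page < lastPage {
		s.enumerate(page+1, lastPage)
	}
}

func main() {
	s := &source{}
	s.enumerate(1, 4) // pages 1..4
	fmt.Println(s.requests) // 4
}
```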

pkg/subscraping/sources/thc/thc.go (1)

27-27: LGTM! Request tracking implementation is correct.

The requests counter is properly declared, initialized, incremented before each API call, and exposed via Statistics(). The placement of the increment at line 68 correctly tracks each paginated request.

Also applies to: 42-42, 68-68, 129-129

pkg/subscraping/sources/windvane/windvane.go (1)

41-41: LGTM! Request tracking implementation is correct.

The requests counter is properly declared, initialized, incremented before each paginated API call, and exposed via Statistics().

Also applies to: 49-49, 77-77, 156-156

pkg/subscraping/sources/netlas/netlas.go (1)

43-43: LGTM! Request tracking implementation is correct.

The requests counter properly tracks both API calls (count query and bulk download). The increments at lines 68 and 126 are correctly placed before their respective HTTP requests.

Also applies to: 51-51, 68-68, 126-126, 203-203

pkg/subscraping/sources/gitlab/gitlab.go (1)

25-25: Sequential request tracking is correct, but see concurrency issue at line 97.

The field declaration, initialization, sequential increment at line 71, and statistics exposure are all implemented correctly. However, the concurrent increment at line 97 requires fixing.

Also applies to: 41-41, 71-71, 180-180

pkg/runner/stats.go (1)

27-27: LGTM! Statistics output formatting is correct.

The new "Requests" column is properly integrated into both the format string and header. The column width (10 chars) and separator length (68 chars) are consistent with the overall layout.

Also applies to: 32-32

pkg/subscraping/sources/pugrecon/pugrecon.go (1)

35-35: LGTM! Request tracking correctly implemented.

The request counting follows the correct pattern: field declaration, initialization, increment before the HTTP call, and exposure via Statistics(). The implementation accurately tracks the single API request made to the pugrecon service.

Also applies to: 44-44, 76-76, 156-156

pkg/subscraping/sources/virustotal/virustotal.go (1)

33-33: LGTM! Pagination-aware request tracking correctly implemented.

The implementation correctly increments the request counter inside the pagination loop (line 65), ensuring accurate tracking when VirusTotal returns multiple pages of results. This is essential for monitoring actual API quota consumption.

Also applies to: 42-42, 65-65, 131-131

pkg/subscraping/sources/reconeer/reconeer.go (1)

25-25: LGTM! Request tracking correctly implemented.

The implementation follows the established pattern and accurately tracks the single API request to the Reconeer service.

Also applies to: 33-33, 51-51, 112-112

pkg/subscraping/sources/riddler/riddler.go (1)

18-18: LGTM! Request tracking correctly implemented.

The implementation follows the established pattern and accurately tracks the single HTTP request to the Riddler service.

Also applies to: 26-26, 34-34, 97-97

pkg/subscraping/sources/chaos/chaos.go (1)

19-19: The implementation is correct. The Chaos API's GetSubdomains endpoint makes a single HTTP request that returns all subdomains in one response with no pagination. Incrementing the request counter once before calling GetSubdomains() accurately reflects the actual API usage.

Likely an incorrect or invalid review comment.

pkg/subscraping/sources/digitorus/digitorus.go (1)

21-21: LGTM! Request tracking correctly implemented.

The request counter is properly added to the Source struct, initialized at the start of Run, incremented before the HTTP call, and exposed via Statistics(). The implementation follows the consistent pattern across all sources.

Also applies to: 29-29, 37-37, 105-105

pkg/subscraping/sources/censys/censys.go (1)

78-78: LGTM! Request tracking correctly handles pagination.

The request counter is properly implemented with correct pagination handling. Each HTTP request in the pagination loop (up to maxCensysPages) is counted by incrementing s.requests at line 141 before the API call at line 142. This ensures accurate tracking of all API requests made to Censys.

Also applies to: 87-87, 141-141, 226-226

pkg/subscraping/sources/crtsh/crtsh.go (1)

32-32: LGTM! Request tracking correctly distinguishes HTTP from database queries.

The implementation correctly counts only HTTP requests. The getSubdomainsFromSQL method (which performs direct database queries via PostgreSQL connection) does not increment the counter, while getSubdomainsFromHTTP appropriately increments at line 150 before the HTTP call at line 151. This properly reflects the feature's intent to track HTTP/API requests.

Also applies to: 40-40, 150-150, 218-218

pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)

28-28: LGTM! Request tracking correctly implemented.

The request counter is properly integrated following the standard pattern: field added to Source struct, initialized at Run start, incremented before the HTTP call at line 52, and exposed via Statistics(). Implementation is correct and consistent with other sources.

Also applies to: 37-37, 51-51, 109-109

pkg/subscraping/sources/reconcloud/reconcloud.go (1)

32-32: LGTM! Request tracking correctly implemented.

The request counter follows the consistent pattern across all sources: added to Source struct, initialized at the start of Run, incremented before the HTTP call at line 49, and properly exposed via Statistics(). The implementation is correct.

Also applies to: 40-40, 48-48, 108-108

pkg/subscraping/sources/securitytrails/securitytrails.go (4)

34-34: LGTM! Request tracking field added.

The requests field is correctly added to track HTTP requests made by this source.


43-43: LGTM! Request counter initialized.

The counter is properly reset at the start of each run.


71-71: LGTM! All HTTP requests tracked correctly.

The implementation correctly increments the request counter before:

  • The initial POST request (line 71)
  • Each pagination GET request (line 75)
  • The retry GET request on 403 errors (line 80)

This ensures accurate tracking of all API calls made during enumeration.

Also applies to: 75-75, 80-80


164-164: LGTM! Request metric exposed in statistics.

The Requests field is correctly included in the statistics output.

pkg/subscraping/sources/profundis/profundis.go (1)

21-21: LGTM! Request tracking implemented correctly.

The implementation follows the correct pattern:

  • Field declared in struct (line 21)
  • Counter initialized at run start (line 30)
  • Incremented before the HTTP POST request (line 57)
  • Exposed via Statistics() (line 129)

Also applies to: 30-30, 57-57, 129-129

pkg/subscraping/sources/anubis/anubis.go (1)

20-20: LGTM! Request tracking implemented correctly.

The implementation follows the correct pattern:

  • Field declared in struct (line 20)
  • Counter initialized at run start (line 28)
  • Incremented before the HTTP GET request (line 36)
  • Exposed via Statistics() (line 100)

Also applies to: 28-28, 36-36, 100-100

pkg/subscraping/sources/c99/c99.go (1)

21-21: LGTM! Request tracking implemented correctly.

The implementation follows the correct pattern:

  • Field declared in struct (line 21)
  • Counter initialized at run start (line 40)
  • Incremented before the HTTP GET request (line 55)
  • Exposed via Statistics() (line 125)

Also applies to: 40-40, 55-55, 125-125

pkg/subscraping/sources/redhuntlabs/redhuntlabs.go (1)

31-31: LGTM! Request tracking implemented correctly.

The implementation follows the correct pattern:

  • Field declared in struct (line 31)
  • Counter initialized at run start (line 39)
  • Incremented before the initial request (line 61) and each pagination request (line 88)
  • Exposed via Statistics() (line 157)

The request tracking accurately counts all HTTP calls made during enumeration, including pagination.

Also applies to: 39-39, 61-61, 88-88, 157-157

pkg/subscraping/sources/commoncrawl/commoncrawl.go (4)

35-35: LGTM: Request tracking field added.

The requests field is correctly added to the Source struct, following the same pattern as other sources in the codebase.


43-51: LGTM: Request counter properly initialized and incremented.

The counter is correctly initialized to 0 at the start of the Run method and incremented before the initial HTTP request to the index URL.


135-136: LGTM: Per-subdomain request tracking implemented correctly.

The request counter is correctly incremented before each HTTP call in the getSubdomains loop, ensuring all API requests are tracked.


119-126: LGTM: Statistics properly exposes request count.

The Statistics method correctly includes the Requests field, making the request count available for the -stats output.

pkg/subscraping/sources/fullhunt/fullhunt.go (1)

26-26: LGTM: Request tracking correctly implemented.

The request tracking implementation is correct:

  • Field added to Source struct (line 26)
  • Initialized at Run start (line 34)
  • Incremented before the HTTP request (line 48)
  • Exposed via Statistics (line 104)

Also applies to: 34-34, 48-48, 104-104

pkg/subscraping/sources/dnsdb/dnsdb.go (3)

46-46: LGTM: Request tracking properly initialized.

The request counter is correctly added to the struct, initialized, and incremented before the getMaxOffset call (which internally makes an HTTP request to the rate_limit endpoint).

Also applies to: 55-55, 75-76


98-99: LGTM: Paginated requests properly tracked.

The request counter is correctly incremented for each paginated request in the loop.


198-206: LGTM: Statistics properly exposes request count.

The Statistics method correctly includes the Requests field in the returned structure.

pkg/subscraping/sources/dnsrepo/dnsrepo.go (1)

20-20: LGTM: Request tracking correctly implemented.

The request tracking is properly implemented across all necessary points:

  • Field declaration (line 20)
  • Initialization (line 32)
  • Increment before HTTP request (line 55)
  • Exposure via Statistics (line 118)

Also applies to: 32-32, 55-56, 118-118

pkg/subscraping/sources/github/github.go (3)

46-46: LGTM: Request counter field and initialization.

The requests field is properly added to the Source struct and initialized at the start of the Run method.

Also applies to: 55-55


91-92: LGTM: Initial search request tracked correctly.

The request counter is correctly incremented before the initial GitHub search API call.


280-288: Statistics method structure is correct, but affected by race condition.

While the Statistics method properly includes the Requests field, the value will be inaccurate due to the race condition flagged in the previous comment.

pkg/subscraping/sources/digitalyama/digitalyama.go (1)

19-19: LGTM!

The request tracking implementation follows the established pattern across all source files:

  • Field added to struct
  • Initialization in Run
  • Increment before the HTTP request
  • Exposure via Statistics()

Also applies to: 38-38, 53-53, 137-137

pkg/subscraping/sources/bevigil/bevigil.go (1)

24-24: LGTM!

Request tracking implementation is consistent with the codebase pattern and correctly placed before the HTTP request.

Also applies to: 32-32, 48-48, 112-112

pkg/subscraping/sources/merklemap/merklemap.go (1)

24-24: LGTM!

Request tracking correctly handles pagination by incrementing in fetchPage, ensuring each page request is counted. Implementation is consistent with the codebase pattern.

Also applies to: 33-33, 102-102, 159-159

pkg/subscraping/sources/rsecloud/rsecloud.go (1)

27-27: LGTM!

Request tracking correctly handles the dual-endpoint pagination pattern. The increment inside fetchSubdomains properly counts all requests across both "active" and "passive" endpoints and their respective pages.

Also applies to: 36-36, 60-60, 128-128

use atomic.Int32 for counter fields to prevent data races when
multiple goroutines concurrently increment requests/results/errors

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/subscraping/sources/github/github.go (1)

154-238: Incomplete error counting in concurrent item processing.

When multiple goroutines in proccesItems encounter errors (line 180), they're collected in errChan, but only the first error is returned (lines 231-235) and counted in statistics (lines 127-129). If several items fail concurrently, the error count will be underreported.

Consider incrementing the error counter atomically within the goroutine when an error occurs, similar to how results are tracked.

🔎 Suggested fix to track all concurrent errors
 			resp, err := session.SimpleGet(ctx, rawURL(responseItem.HTMLURL))
 			if err != nil {
+				s.errors.Add(1)
 				if resp != nil && resp.StatusCode != http.StatusNotFound {
 					session.DiscardHTTPResponse(resp)
 				}
-				errChan <- err
 				return
 			}

Then you can simplify the error channel handling or remove it if it's no longer needed for propagation.

🧹 Nitpick comments (1)
pkg/subscraping/sources/gitlab/gitlab.go (1)

95-126: Pre-existing bug: defer wg.Done() placement can cause deadlock.

The defer wg.Done() at line 124 is registered after potential early returns (e.g., line 106). If the goroutine returns before reaching line 124, wg.Done() is never called, causing wg.Wait() at line 147 to hang indefinitely.

Move the defer to the beginning of the goroutine to ensure it's always registered.

🔎 Proposed fix
 	for _, it := range items {
 		go func(item item) {
+			defer wg.Done()
 			// The original item.Path causes 404 error because the Gitlab API is expecting the url encoded path
 			fileUrl := fmt.Sprintf("https://gitlab.com/api/v4/projects/%d/repository/files/%s/raw?ref=%s", item.ProjectId, url.QueryEscape(item.Path), item.Ref)
 			s.requests.Add(1)
 			resp, err := session.Get(ctx, fileUrl, "", headers)
 			if err != nil {
 				if resp == nil || (resp != nil && resp.StatusCode != http.StatusNotFound) {
 					session.DiscardHTTPResponse(resp)

 					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 					s.errors.Add(1)
 					return
 				}
 			}

 			if resp.StatusCode == http.StatusOK {
 				scanner := bufio.NewScanner(resp.Body)
 				for scanner.Scan() {
 					line := scanner.Text()
 					if line == "" {
 						continue
 					}
 					for _, subdomain := range domainRegexp.FindAllString(line, -1) {
 						results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}
 						s.results.Add(1)
 					}
 				}
 				session.DiscardHTTPResponse(resp)
 			}
-			defer wg.Done()
 		}(it)
 	}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e96320e and 9b47052.

📒 Files selected for processing (2)
  • pkg/subscraping/sources/github/github.go
  • pkg/subscraping/sources/gitlab/gitlab.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/subscraping/sources/github/github.go (1)
pkg/subscraping/sources/gitlab/gitlab.go (1)
  • Source (21-28)
pkg/subscraping/sources/gitlab/gitlab.go (1)
pkg/subscraping/sources/github/github.go (1)
  • Source (42-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test Builds (macOS-latest)
  • GitHub Check: Test Builds (ubuntu-latest)
  • GitHub Check: Test Builds (windows-latest)
  • GitHub Check: Analyze (go)
🔇 Additional comments (9)
pkg/subscraping/sources/gitlab/gitlab.go (4)

21-28: LGTM! Atomic counters properly address the race condition.

The switch from primitive int to atomic.Int32 for errors, results, and requests fields correctly resolves the data race flagged in the previous review. This is consistent with the GitHub source implementation.


40-42: LGTM!

Proper initialization of atomic counters using Store(0) ensures clean state at the start of each run.


72-76: LGTM!

The request counter is correctly incremented before the API call, and error counter is updated atomically on failure.


177-185: LGTM!

The Statistics() method correctly uses Load() to safely read the atomic counter values.
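The approved counter pattern can be sketched in isolation: writers call Add(1) from concurrent goroutines and readers call Load(); a plain int counter here would be flagged as a data race under `go test -race`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// source sketches the atomic counter fields used by the gitlab/github
// sources; the field name matches the review, the rest is illustrative.
type source struct {
	requests atomic.Int32
}

func main() {
	var s source
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.requests.Add(1) // safe concurrent increment
		}()
	}
	wg.Wait()
	fmt.Println(s.requests.Load()) // 100
}
```

atomic.Int32 (Go 1.19+) also zero-initializes, so an explicit Store(0) is only needed when reusing a counter across runs.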

pkg/subscraping/sources/github/github.go (5)

15-15: LGTM! Atomic counters prevent race conditions.

The introduction of sync/atomic and the conversion of counter fields to atomic.Int32 correctly addresses the race condition with concurrent goroutines processing items (spawned at line 165).

Also applies to: 45-47


54-56: LGTM! Counter initialization is correct.

Using Store(0) properly resets all atomic counters at the start of each enumeration run.


92-92: LGTM! Request tracking is correctly implemented.

Both the initial GitHub search request and per-item raw file fetches are properly tracked with atomic increments before the HTTP calls.

Also applies to: 174-174


203-203: LGTM! Results tracking is thread-safe and accurate.

The atomic increments correctly track each subdomain result found, both from file content and text matches, with proper synchronization for concurrent processing.

Also applies to: 221-221


281-289: LGTM! Statistics method correctly reports atomic counter values.

The method properly uses Load() to safely read the atomic counters and returns the complete statistics including the new Requests field.

Member

@Mzack9999 Mzack9999 left a comment


lgtm

$ go run . -d scanme.sh -stats  

 ...
[INF] Enumerating subdomains for scanme.sh
honey.scanme.sh
demo.scanme.sh
[INF] Found 2 subdomains for scanme.sh in 439 milliseconds 62 microseconds
[INF] Printing source statistics for scanme.sh

 Source               Duration      Results   Requests     Errors
────────────────────────────────────────────────────────────────────
 anubis               437ms               0          1          0
 builtwith            0s                  0          0          0
 crtsh                366ms               2          0          0
 digitorus            111ms               0          1          0
 driftnet             33ms                0          4          4
 leakix               210ms               0          1          1
 thc                  383ms               0          1          0
 virustotal           0s                  0          0          0
...

@Mzack9999 Mzack9999 merged commit 4984bfb into dev Jan 7, 2026
6 of 10 checks passed
@Mzack9999 Mzack9999 deleted the feature/request-tracking-stats branch January 7, 2026 14:10

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/subscraping/sources/gitlab/gitlab.go (1)

95-126: Deadlock: defer wg.Done() placed after early return.

The defer wg.Done() at line 124 is never reached if the goroutine returns early at line 106. This causes wg.Wait() at line 147 to block indefinitely, resulting in a deadlock when any file fetch fails with a non-404 error.

🐛 Proposed fix: move defer to start of goroutine
 	for _, it := range items {
 		go func(item item) {
+			defer wg.Done()
 			// The original item.Path causes 404 error because the Gitlab API is expecting the url encoded path
 			fileUrl := fmt.Sprintf("https://gitlab.com/api/v4/projects/%d/repository/files/%s/raw?ref=%s", item.ProjectId, url.QueryEscape(item.Path), item.Ref)
 			s.requests.Add(1)
 			resp, err := session.Get(ctx, fileUrl, "", headers)
 			if err != nil {
 				if resp == nil || (resp != nil && resp.StatusCode != http.StatusNotFound) {
 					session.DiscardHTTPResponse(resp)

 					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 					s.errors.Add(1)
 					return
 				}
 			}

 			if resp.StatusCode == http.StatusOK {
 				scanner := bufio.NewScanner(resp.Body)
 				for scanner.Scan() {
 					line := scanner.Text()
 					if line == "" {
 						continue
 					}
 					for _, subdomain := range domainRegexp.FindAllString(line, -1) {
 						results <- subscraping.Result{Source: s.Name(), Type: subscraping.Subdomain, Value: subdomain}
 						s.results.Add(1)
 					}
 				}
 				session.DiscardHTTPResponse(resp)
 			}
-			defer wg.Done()
 		}(it)
 	}
pkg/subscraping/sources/redhuntlabs/redhuntlabs.go (1)

79-88: Pre-existing pagination issue: page 1 fetched twice.

The initial request (line 60-62) fetches page 1, but when pagination is needed (line 79), the loop starts at page := 1 (line 81), causing page 1 to be fetched again at line 89. This wastes an API call and may confuse users monitoring request counts.

The request tracking added by this PR is correct and accurately counts the duplicate request, but the underlying pagination logic should be fixed.

♻️ Suggested fix for pagination logic

Option 1: Start pagination loop from page 2 (preferred if initial request subdomains are already processed):

 	session.DiscardHTTPResponse(resp)
 	if response.Metadata.ResultCount > pageSize {
 		totalPages := (response.Metadata.ResultCount + pageSize - 1) / pageSize
-		for page := 1; page <= totalPages; page++ {
+		for page := 2; page <= totalPages; page++ {
 			select {
 			case <-ctx.Done():
 				return

Option 2: Skip initial request if pagination is needed (fetch all pages in loop):

 	session.DiscardHTTPResponse(resp)
+
+	// Process initial page response only if no pagination needed
 	if response.Metadata.ResultCount > pageSize {
+		// Fetch all pages via pagination
 		totalPages := (response.Metadata.ResultCount + pageSize - 1) / pageSize
 		for page := 1; page <= totalPages; page++ {
 			// ... pagination logic
 		}
 	} else {
+		// Process subdomains from initial request
 		for _, subdomain := range response.Subdomains {
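The totalPages arithmetic used above is integer ceiling division, which is worth verifying by hand; combined with starting the loop at page 2, it avoids refetching the page the initial request already returned:

```go
package main

import "fmt"

// totalPages computes ceil(resultCount / pageSize) using integer math,
// matching the expression in the suggested fix.
func totalPages(resultCount, pageSize int) int {
	return (resultCount + pageSize - 1) / pageSize
}

func main() {
	fmt.Println(totalPages(1000, 1000)) // 1: everything fit in the initial request
	fmt.Println(totalPages(1001, 1000)) // 2: one extra paginated request needed
	fmt.Println(totalPages(2500, 1000)) // 3
}
```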
🤖 Fix all issues with AI agents
In @pkg/subscraping/sources/c99/c99.go:
- Line 21: Change the requests, errors, and results fields in the c99 Source struct from plain int to atomic.Int32 (as in gitlab.go/github.go/driftnet.go) so the stats are thread-safe: import sync/atomic, increment with Add(1) in Run(), and read with Load() in Statistics(). The zero value of atomic.Int32 is ready to use, so explicit initialization is unnecessary; add int(...) conversions where Statistics() expects plain ints.

In @pkg/subscraping/sources/fofa/fofa.go:
- Line 30: Change the plain int requests field on the Source struct to atomic.Int32 (import sync/atomic), update every increment in Run() or any goroutine to requests.Add(1), and change reads (e.g., in Statistics()) to requests.Load() so all accesses are thread-safe. Update any code that assumed requests was a plain int, converting loaded values with int(...) where needed.

In @pkg/subscraping/sources/hackertarget/hackertarget.go:
- Around line 42-44: Remove the duplicated API-key append that mutates
htSearchUrl a second time (the assignment using fmt.Sprintf with randomApiKey)
so the URL isn’t built with a duplicate or empty apikey parameter; ensure the
single API-key append remains only in the earlier logic that already handles
randomApiKey (or, alternatively, gate any append with randomApiKey != ""), and
leave the s.requests++ increment as-is before the HTTP call.
🧹 Nitpick comments (1)
pkg/subscraping/sources/bufferover/bufferover.go (1)

30-30: Consider using atomic operations for thread safety.

While this implementation follows the existing pattern for errors and results counters, there's a potential data race when Statistics() is called concurrently while the Run goroutine modifies the counter. Since the AI summary indicates some sources use atomics, consider using atomic.Int32 or atomic.Int64 for consistency and guaranteed thread safety.

🔒 Proposed refactor using atomic.Int32
+import (
+	"sync/atomic"
+)
+
 type Source struct {
 	apiKeys   []string
 	timeTaken time.Duration
 	errors    int
 	results   int
-	requests  int
+	requests  atomic.Int32
 	skipped   bool
 }

Then update the increment at line 60:

-	s.requests++
+	s.requests.Add(1)

And the statistics exposure at line 141:

-		Requests:  s.requests,
+		Requests:  int(s.requests.Load()),

Note: Initialization at line 39 can be removed since atomic.Int32 is zero-initialized.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b47052 and 4bba544.

📒 Files selected for processing (53)
  • pkg/subscraping/sources/alienvault/alienvault.go
  • pkg/subscraping/sources/anubis/anubis.go
  • pkg/subscraping/sources/bevigil/bevigil.go
  • pkg/subscraping/sources/bufferover/bufferover.go
  • pkg/subscraping/sources/builtwith/builtwith.go
  • pkg/subscraping/sources/c99/c99.go
  • pkg/subscraping/sources/censys/censys.go
  • pkg/subscraping/sources/certspotter/certspotter.go
  • pkg/subscraping/sources/chaos/chaos.go
  • pkg/subscraping/sources/chinaz/chinaz.go
  • pkg/subscraping/sources/commoncrawl/commoncrawl.go
  • pkg/subscraping/sources/crtsh/crtsh.go
  • pkg/subscraping/sources/digitalyama/digitalyama.go
  • pkg/subscraping/sources/digitorus/digitorus.go
  • pkg/subscraping/sources/dnsdb/dnsdb.go
  • pkg/subscraping/sources/dnsdumpster/dnsdumpster.go
  • pkg/subscraping/sources/dnsrepo/dnsrepo.go
  • pkg/subscraping/sources/domainsproject/domainsproject.go
  • pkg/subscraping/sources/driftnet/driftnet.go
  • pkg/subscraping/sources/facebook/ctlogs.go
  • pkg/subscraping/sources/fofa/fofa.go
  • pkg/subscraping/sources/fullhunt/fullhunt.go
  • pkg/subscraping/sources/github/github.go
  • pkg/subscraping/sources/gitlab/gitlab.go
  • pkg/subscraping/sources/hackertarget/hackertarget.go
  • pkg/subscraping/sources/hudsonrock/hudsonrock.go
  • pkg/subscraping/sources/intelx/intelx.go
  • pkg/subscraping/sources/leakix/leakix.go
  • pkg/subscraping/sources/merklemap/merklemap.go
  • pkg/subscraping/sources/netlas/netlas.go
  • pkg/subscraping/sources/onyphe/onyphe.go
  • pkg/subscraping/sources/profundis/profundis.go
  • pkg/subscraping/sources/pugrecon/pugrecon.go
  • pkg/subscraping/sources/quake/quake.go
  • pkg/subscraping/sources/rapiddns/rapiddns.go
  • pkg/subscraping/sources/reconcloud/reconcloud.go
  • pkg/subscraping/sources/reconeer/reconeer.go
  • pkg/subscraping/sources/redhuntlabs/redhuntlabs.go
  • pkg/subscraping/sources/riddler/riddler.go
  • pkg/subscraping/sources/rsecloud/rsecloud.go
  • pkg/subscraping/sources/securitytrails/securitytrails.go
  • pkg/subscraping/sources/shodan/shodan.go
  • pkg/subscraping/sources/sitedossier/sitedossier.go
  • pkg/subscraping/sources/thc/thc.go
  • pkg/subscraping/sources/threatbook/threatbook.go
  • pkg/subscraping/sources/threatcrowd/threatcrowd.go
  • pkg/subscraping/sources/threatminer/threatminer.go
  • pkg/subscraping/sources/virustotal/virustotal.go
  • pkg/subscraping/sources/waybackarchive/waybackarchive.go
  • pkg/subscraping/sources/whoisxmlapi/whoisxmlapi.go
  • pkg/subscraping/sources/windvane/windvane.go
  • pkg/subscraping/sources/zoomeyeapi/zoomeyeapi.go
  • pkg/subscraping/types.go
🚧 Files skipped from review as they are similar to previous changes (28)
  • pkg/subscraping/sources/builtwith/builtwith.go
  • pkg/subscraping/sources/merklemap/merklemap.go
  • pkg/subscraping/sources/alienvault/alienvault.go
  • pkg/subscraping/sources/fullhunt/fullhunt.go
  • pkg/subscraping/sources/anubis/anubis.go
  • pkg/subscraping/sources/zoomeyeapi/zoomeyeapi.go
  • pkg/subscraping/sources/crtsh/crtsh.go
  • pkg/subscraping/sources/certspotter/certspotter.go
  • pkg/subscraping/sources/facebook/ctlogs.go
  • pkg/subscraping/sources/digitalyama/digitalyama.go
  • pkg/subscraping/sources/domainsproject/domainsproject.go
  • pkg/subscraping/sources/profundis/profundis.go
  • pkg/subscraping/sources/reconeer/reconeer.go
  • pkg/subscraping/sources/rsecloud/rsecloud.go
  • pkg/subscraping/sources/securitytrails/securitytrails.go
  • pkg/subscraping/sources/shodan/shodan.go
  • pkg/subscraping/sources/windvane/windvane.go
  • pkg/subscraping/types.go
  • pkg/subscraping/sources/rapiddns/rapiddns.go
  • pkg/subscraping/sources/sitedossier/sitedossier.go
  • pkg/subscraping/sources/github/github.go
  • pkg/subscraping/sources/digitorus/digitorus.go
  • pkg/subscraping/sources/pugrecon/pugrecon.go
  • pkg/subscraping/sources/leakix/leakix.go
  • pkg/subscraping/sources/hudsonrock/hudsonrock.go
  • pkg/subscraping/sources/threatcrowd/threatcrowd.go
  • pkg/subscraping/sources/riddler/riddler.go
  • pkg/subscraping/sources/chaos/chaos.go
🧰 Additional context used
🧬 Code graph analysis (6)
pkg/subscraping/sources/netlas/netlas.go (10)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/c99/c99.go (1)
  • Source (16-23)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/reconcloud/reconcloud.go (16)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/bevigil/bevigil.go (1)
  • Source (19-26)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/builtwith/builtwith.go (1)
  • Source (33-40)
pkg/subscraping/sources/c99/c99.go (1)
  • Source (16-23)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (16)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/github/github.go (1)
  • Source (42-49)
pkg/subscraping/sources/hackertarget/hackertarget.go (1)
  • Source (14-21)
pkg/subscraping/sources/intelx/intelx.go (1)
  • Source (41-48)
pkg/subscraping/sources/securitytrails/securitytrails.go (1)
  • Source (29-36)
pkg/subscraping/sources/shodan/shodan.go (1)
  • Source (15-22)
pkg/subscraping/sources/sitedossier/sitedossier.go (1)
  • Source (22-27)
pkg/subscraping/sources/threatcrowd/threatcrowd.go (1)
  • Source (22-27)
pkg/subscraping/sources/threatminer/threatminer.go (1)
  • Source (21-26)
pkg/subscraping/sources/virustotal/virustotal.go (1)
  • Source (28-35)
pkg/subscraping/sources/waybackarchive/waybackarchive.go (1)
  • Source (16-21)
pkg/subscraping/sources/gitlab/gitlab.go (3)
pkg/subscraping/types.go (2)
  • Result (95-100)
  • Error (108-108)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/github/github.go (1)
  • Source (42-49)
pkg/subscraping/sources/bevigil/bevigil.go (16)
pkg/subscraping/sources/alienvault/alienvault.go (1)
  • Source (22-29)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/bufferover/bufferover.go (1)
  • Source (25-32)
pkg/subscraping/sources/builtwith/builtwith.go (1)
  • Source (33-40)
pkg/subscraping/sources/c99/c99.go (1)
  • Source (16-23)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
pkg/subscraping/sources/dnsrepo/dnsrepo.go (1)
  • Source (15-22)
pkg/subscraping/sources/redhuntlabs/redhuntlabs.go (11)
pkg/subscraping/sources/anubis/anubis.go (1)
  • Source (16-21)
pkg/subscraping/sources/censys/censys.go (1)
  • Source (73-80)
pkg/subscraping/sources/certspotter/certspotter.go (1)
  • Source (20-27)
pkg/subscraping/sources/chaos/chaos.go (1)
  • Source (14-21)
pkg/subscraping/sources/chinaz/chinaz.go (1)
  • Source (15-22)
pkg/subscraping/sources/commoncrawl/commoncrawl.go (1)
  • Source (31-36)
pkg/subscraping/sources/crtsh/crtsh.go (1)
  • Source (28-33)
pkg/subscraping/sources/digitalyama/digitalyama.go (1)
  • Source (14-21)
pkg/subscraping/sources/digitorus/digitorus.go (1)
  • Source (17-22)
pkg/subscraping/sources/dnsdb/dnsdb.go (1)
  • Source (41-48)
pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (1)
  • Source (23-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test Builds (windows-latest)
  • GitHub Check: Test Builds (macOS-latest)
  • GitHub Check: Test Builds (ubuntu-latest)
  • GitHub Check: Analyze (go)
🔇 Additional comments (74)
pkg/subscraping/sources/netlas/netlas.go (1)

43-43: LGTM! Request tracking implemented correctly.

The request tracking implementation follows the established pattern consistently:

  • Field added to the Source struct (line 43)
  • Properly initialized to 0 at the start of Run (line 51)
  • Incremented immediately before each HTTP request (lines 68, 126)
  • Exposed in the Statistics() method (line 207)

The tracking accurately captures both HTTP requests made by this source (GET for domain count, POST for domain download). Since all operations execute sequentially within a single goroutine, no atomic operations are needed.

Also applies to: 51-51, 68-68, 126-126, 207-207
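
The single-goroutine counting pattern described above can be sketched as a minimal, self-contained example. The `Source`, `Statistics`, and `fakeGet` names here are illustrative stand-ins, not subfinder's actual types; the real sources increment before calls made through the shared `session` helpers.

```go
package main

import "fmt"

// Source is a hypothetical stand-in for a subscraping source that does
// all of its work in a single goroutine, so plain ints suffice.
type Source struct {
	requests int
	results  int
	errors   int
}

// Statistics mirrors the shape of the per-source stats struct.
type Statistics struct {
	Requests int
	Results  int
	Errors   int
}

func (s *Source) Run(urls []string) {
	s.requests, s.results, s.errors = 0, 0, 0 // reset at the start of each run
	for _, u := range urls {
		s.requests++ // increment before the call so failed attempts count too
		if err := fakeGet(u); err != nil {
			s.errors++
			continue
		}
		s.results++
	}
}

func (s *Source) Statistics() Statistics {
	return Statistics{Requests: s.requests, Results: s.results, Errors: s.errors}
}

// fakeGet simulates an HTTP call; the "bad" URL fails.
func fakeGet(u string) error {
	if u == "bad" {
		return fmt.Errorf("request to %s failed", u)
	}
	return nil
}

func main() {
	s := &Source{}
	s.Run([]string{"ok", "bad", "ok"})
	st := s.Statistics()
	fmt.Println(st.Requests, st.Results, st.Errors) // 3 2 1
}
```

Counting before the request, rather than after a successful response, is what makes the Requests column useful for quota debugging: every attempt that consumed API quota shows up, even when the call errored.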

pkg/subscraping/sources/chinaz/chinaz.go (4)

20-20: LGTM! Field declaration follows existing pattern.

The requests field is consistent with the existing errors and results tracking fields.


27-29: LGTM! Proper initialization of tracking fields.

The requests counter is correctly initialized alongside errors and results at the start of each Run invocation.


43-44: LGTM! Request counter correctly placed.

The increment is properly positioned immediately before the HTTP request, ensuring all request attempts are counted regardless of outcome.


109-117: LGTM! Statistics correctly expose request count.

The Requests field is properly included in the Statistics return value, consistent with other tracked metrics.

pkg/subscraping/sources/waybackarchive/waybackarchive.go (3)

20-20: LGTM! Request counter properly declared and initialized.

The requests field follows the same pattern as the existing errors and results counters, and is correctly initialized at the start of Run().

Also applies to: 28-28


36-36: Correct placement of request counter increment.

The counter is incremented immediately before the HTTP call, ensuring both successful and failed requests are tracked.


107-107: Request count properly exposed in statistics.

The Requests field is correctly included in the Statistics return, enabling the -stats flag to display request counts as intended.

pkg/subscraping/sources/gitlab/gitlab.go (5)

12-12: Atomic counters correctly address the race condition.

The migration to atomic.Int32 for errors, results, and requests fields aligns with the pattern used in github.go and properly addresses the previously identified race condition from concurrent goroutine access.

Also applies to: 24-26


40-42: Counter reset ensures clean state per invocation.

Initializing all atomic counters to zero at the start of Run() ensures accurate statistics for each enumeration run.


72-72: Atomic increments in the main enumerate flow look correct.

Request and error counters are properly incremented using atomic operations in the synchronous parts of enumerate().

Also applies to: 76-76, 87-87


139-139: Atomic error increment is correct here.


181-188: Statistics correctly reads atomic counters.

Using .Load() to read and casting to int for the Statistics struct is the correct approach for safely retrieving atomic values.
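
As a sketch of this thread-safe variant (names are illustrative, not gitlab.go's actual code), counters shared across goroutines use `atomic.Int32` and `Statistics()` reads them with `.Load()`:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// concurrentSource is an illustrative source that fans work out to
// goroutines, so its counters must be atomic (Go 1.19+ atomic.Int32).
type concurrentSource struct {
	requests atomic.Int32
	results  atomic.Int32
	errors   atomic.Int32
}

func (s *concurrentSource) Run(urls []string) {
	// Reset per invocation so repeated runs report accurate counts.
	s.requests.Store(0)
	s.results.Store(0)
	s.errors.Store(0)

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			s.requests.Add(1) // safe to increment from any goroutine
			if u == "bad" {
				s.errors.Add(1)
				return
			}
			s.results.Add(1)
		}(u)
	}
	wg.Wait()
}

// Statistics reads the atomics with Load and casts to int, matching
// the plain-int Statistics struct used by the rest of the sources.
func (s *concurrentSource) Statistics() (requests, results, errors int) {
	return int(s.requests.Load()), int(s.results.Load()), int(s.errors.Load())
}

func main() {
	s := &concurrentSource{}
	s.Run([]string{"ok", "bad", "ok", "ok"})
	req, res, errs := s.Statistics()
	fmt.Println(req, res, errs) // 4 3 1
}
```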

pkg/subscraping/sources/driftnet/driftnet.go (4)

31-31: LGTM! Consistent with existing counter fields.

The requests field follows the same atomic.Int32 pattern as the existing errors and results counters, ensuring thread-safe access across the concurrent goroutines spawned for each endpoint.


69-69: LGTM! Proper initialization before concurrent access.

The counter is correctly reset at the start of each Run invocation, before the goroutines are spawned, ensuring a clean state and matching the initialization pattern of the error and result counters.


150-150: LGTM! Request tracking correctly placed.

The counter increment is positioned immediately before the HTTP request, ensuring all API calls are tracked (including failed ones). Since runSubsource is invoked once per endpoint in concurrent goroutines, the atomic increment safely counts all requests across the 4 configured endpoints.


129-129: LGTM! Statistics correctly expose the request count.

The atomic Load() safely reads the request count and returns it as part of the statistics, matching the pattern used for errors and results.

pkg/subscraping/sources/threatbook/threatbook.go (4)

33-33: LGTM: Request tracking field added.

The requests field is correctly declared and follows the same pattern as other statistics fields in the struct.


40-42: LGTM: Initialization is correct.

The requests counter is properly initialized to 0 at the start of each Run invocation, consistent with other statistics fields.


56-57: LGTM: Request increment correctly placed.

The request counter is incremented immediately before the API call, which correctly tracks the request regardless of success or failure. This aligns with the PR's objective to monitor actual API usage.


131-139: LGTM: Statistics correctly expose request count.

The Requests field is properly included in the returned Statistics, enabling the -stats flag to display request counts as described in the PR objectives.

pkg/subscraping/sources/reconcloud/reconcloud.go (4)

32-32: LGTM! Consistent field addition.

The requests field addition follows the established pattern across all sources in this PR and is correctly positioned alongside other tracking metrics.


38-40: LGTM! Proper initialization.

The request counter is correctly initialized to 0 at the start of Run, consistent with the existing error and result counters.


48-48: LGTM! Correct request tracking.

The request counter is incremented at the appropriate location—immediately before the HTTP request—accurately tracking the single API call made by this source.


107-114: LGTM! Statistics correctly updated.

The Statistics() method now returns the request count, properly surfacing the tracked metric for the -stats flag output as intended by the PR objectives.

pkg/subscraping/sources/commoncrawl/commoncrawl.go (5)

35-35: LGTM! Field addition follows established pattern.

The requests int field is consistent with other sources and appropriate for single-goroutine access.


41-43: LGTM! Proper initialization.

The requests field is correctly initialized alongside other metrics at the start of Run.


51-52: LGTM! Initial request correctly tracked.

The increment before session.SimpleGet(ctx, indexURL) accurately counts the index fetch request.


139-140: LGTM! Subdomain search requests correctly tracked.

The increment before session.Get(...) accurately counts each subdomain search request made in the loop.


127-127: LGTM! Statistics correctly updated.

The Requests field is properly included in the statistics output, enabling request tracking for the -stats flag.

pkg/subscraping/sources/threatminer/threatminer.go (4)

25-25: LGTM!

The requests field is appropriately added to track HTTP requests, consistent with the existing metric fields.


31-33: LGTM!

The initialization of s.requests is consistent with the existing pattern for s.errors and s.results.


41-42: LGTM! Correct placement for request tracking.

The increment occurs immediately before the HTTP request, ensuring all attempted requests are counted (including those that may fail). This is the correct behavior for tracking API quota usage.


98-105: LGTM!

The Requests field is properly included in the Statistics return value, consistent with the other tracked metrics.

pkg/subscraping/sources/thc/thc.go (3)

27-27: LGTM! Field addition and initialization are correct.

The requests field addition to the Source struct and its initialization in Run() follow the established pattern for other metrics (errors, results). The field is appropriately scoped as unexported.

Also applies to: 42-42


68-68: LGTM! Request increment is correctly placed.

The counter is incremented before the HTTP POST call, ensuring all request attempts are tracked (including those that may fail). The placement within the pagination loop correctly accounts for multiple requests when fetching paginated results.


133-133: LGTM! Statistics exposure is consistent.

The Requests field is properly exposed in the Statistics() method return value, following the same pattern as other metrics.

pkg/subscraping/sources/intelx/intelx.go (1)

46-46: LGTM! Request tracking implemented consistently.

The request counter is properly initialized, incremented before each HTTP call (1 POST + N GETs in the polling loop), and exposed via Statistics(). The implementation follows the established pattern used for error and result tracking across all sources.

Also applies to: 60-60, 90-90, 118-118, 190-190

pkg/subscraping/sources/virustotal/virustotal.go (1)

33-33: LGTM! Request tracking correctly handles pagination.

The implementation properly counts each paginated request in the cursor-based loop and follows the consistent pattern established across all sources. The counter is initialized, incremented before each HTTP call, and exposed via Statistics().

Also applies to: 42-42, 65-65, 135-135

pkg/subscraping/sources/dnsdb/dnsdb.go (1)

46-46: LGTM! Request tracking accounts for both offset lookup and data retrieval.

The implementation correctly counts both the initial rate limit/offset request (line 75, executed within getMaxOffset() at line 215) and each subsequent data retrieval request (line 98) in the pagination loop. The pattern is consistent with other sources.

Also applies to: 55-55, 75-75, 98-98, 206-206

pkg/subscraping/sources/bevigil/bevigil.go (1)

24-24: LGTM! Request tracking follows the established pattern.

The implementation is straightforward and consistent with the pattern used across all other sources in this PR. The counter is properly initialized, incremented before the single HTTP request, and exposed via Statistics(). This matches the approach used in the 15+ sources shown in the relevant code snippets.

Also applies to: 32-32, 48-48, 116-116

pkg/subscraping/sources/censys/censys.go (1)

78-78: LGTM! Request tracking implemented correctly.

The request counter follows the established pattern for error and result tracking in this source. The increment is correctly placed before the HTTP call (line 141), ensuring all request attempts are counted, including those that may fail. The initialization and statistics exposure are properly handled.

Also applies to: 87-87, 141-141, 230-230

pkg/subscraping/sources/dnsdumpster/dnsdumpster.go (4)

28-28: LGTM: Request tracking field added.

The requests field is properly added to the Source struct and follows the same pattern as the existing errors and results fields.


37-37: LGTM: Proper initialization.

The requests counter is correctly initialized to zero at the start of each Run() execution, consistent with the existing error and result counters.


51-51: LGTM: Request increment correctly placed.

The increment is properly positioned immediately before the HTTP call and after API key validation. This ensures that only actual HTTP requests are counted, and skipped sources don't increment the counter.


113-113: LGTM: Statistics properly updated.

The Requests field is correctly included in the returned statistics, enabling the -stats flag to display per-source request counts as intended by the PR.

pkg/subscraping/sources/quake/quake.go (4)

39-39: LGTM! Field addition is consistent with existing metrics.

The requests field follows the same pattern as errors and results, and is appropriately unexported for encapsulation.


48-48: LGTM! Proper initialization alongside other counters.

Explicitly resetting the counter at the start of Run() ensures accurate tracking across multiple invocations.


74-74: LGTM! Request counting is accurate for paginated API calls.

The increment is correctly placed before the HTTP call within the pagination loop, ensuring all requests (including failed ones) are tracked. This accurately reflects API usage for quota management.


157-165: LGTM! Statistics correctly includes request count.

The Requests field is properly returned alongside other metrics, enabling the -stats flag output as intended by the PR objectives.

pkg/subscraping/sources/hackertarget/hackertarget.go (3)

19-19: LGTM: Request tracking field added.

The requests field is appropriately added to track HTTP requests. Since the Run method operates in a single goroutine, a simple int without atomic operations is sufficient.


28-28: LGTM: Request counter initialization.

The request counter is correctly initialized at the start of the Run method.


117-117: LGTM: Statistics correctly expose request count.

The Requests field is properly included in the returned statistics.

pkg/subscraping/sources/redhuntlabs/redhuntlabs.go (5)

31-31: LGTM! Request tracking field follows established pattern.

The requests int field addition is consistent with the pattern used across other sources (anubis, commoncrawl, crtsh, etc.).


39-39: LGTM! Proper initialization.

Initializing s.requests = 0 at the start of Run() ensures a clean state for each enumeration, consistent with the existing pattern for errors and results.


61-61: LGTM! Request increment correctly placed.

The increment before the initial HTTP request correctly tracks the API call.


88-88: LGTM! Pagination request increment correctly placed.

The increment before each paginated HTTP request correctly tracks API calls in the pagination loop.


155-162: LGTM! Statistics output correctly includes request count.

The Requests: s.requests field is properly included in the returned statistics, making the tracked request count available to the runner's stats output.

pkg/subscraping/sources/dnsrepo/dnsrepo.go (4)

20-20: Field addition looks good.

The requests field is correctly added to track HTTP request counts, following the same pattern as the existing errors and results fields.


32-32: Initialization is correct.

The requests counter is properly reset to 0 at the start of each Run() invocation, ensuring accurate per-run tracking.


55-55: Request increment is correctly placed.

The counter is incremented immediately before the HTTP request, ensuring it accurately reflects the number of API calls attempted regardless of success or failure.


118-126: Statistics correctly exposes the request count.

The Requests field is properly returned in the statistics payload and follows the established pattern used across the vast majority of sources in this codebase (50 out of 53 sources use plain int fields without synchronization).

pkg/subscraping/sources/onyphe/onyphe.go (4)

39-39: LGTM! Field addition is consistent with existing counters.

The requests field follows the same pattern as errors and results, maintaining consistency within the Source struct.


48-48: LGTM! Proper initialization.

The counter is correctly reset at the start of each Run invocation, consistent with the initialization pattern for errors and results.


78-78: LGTM! Request counting correctly placed.

The increment is positioned inside the pagination loop right before each HTTP call, ensuring all requests (including those that may fail) are accurately counted. This placement correctly tracks requests across multiple pages.


169-169: LGTM! Statistics correctly populated.

The request count is properly included in the returned Statistics struct, making the tracked data available to consumers.

pkg/subscraping/sources/whoisxmlapi/whoisxmlapi.go (4)

36-36: LGTM! Field addition follows existing pattern.

The requests field is added consistently with the existing errors and results fields in the struct.


45-45: LGTM! Initialization is correctly placed.

The s.requests counter is properly initialized before the goroutine starts, consistent with the pattern for errors and results.


117-125: LGTM! Statistics output correctly includes the new field.

The Requests field is properly included in the Statistics() return value, maintaining consistency with the other tracked metrics.


59-59: LGTM! Request increment correctly placed.

The counter is incremented immediately before the HTTP request, ensuring all request attempts are counted regardless of outcome, consistent with the pattern used across the other sources.

pkg/subscraping/sources/c99/c99.go (1)

21-21: Request tracking implementation looks correct.

The request counter is properly initialized, incremented before the HTTP request, and exposed via Statistics(). The placement ensures that only actual request attempts are counted, correctly skipping the increment when no API key is available (lines 48-52).

Also applies to: 40-40, 55-55, 129-129

pkg/subscraping/sources/bufferover/bufferover.go (3)

37-39: LGTM! Initialization follows the established pattern.

The counter is properly initialized to 0 at the start of Run(), consistent with the existing errors and results initialization.


60-60: LGTM! Request tracking correctly counts all attempts.

The increment is placed before the HTTP request and counts all attempts (including failures), which is appropriate for tracking API usage and quota consumption as specified in the PR objectives.


141-141: LGTM! Statistics exposure is correct.

The Requests field is properly exposed in the Statistics() method, enabling the -stats flag to display request counts as intended.

timeTaken time.Duration
errors int
results int
requests int

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check atomic usage patterns across subscraping sources

echo "=== Sources using atomic types for requests/errors/results ==="
rg -n "atomic\.(Int32|Int64|Int)" pkg/subscraping/sources/ -A 2 -B 2 | head -100

echo -e "\n=== Sources with plain int fields (like c99.go) ==="
rg -n "^\s+(requests|errors|results)\s+int\s*$" pkg/subscraping/sources/ -C 2 | head -100

echo -e "\n=== File structure of c99.go to understand the context ==="
head -50 pkg/subscraping/sources/c99/c99.go

Repository: projectdiscovery/subfinder

Length of output: 8530


🏁 Script executed:

#!/bin/bash
# Examine c99.go more thoroughly

echo "=== Full c99.go file ==="
wc -l pkg/subscraping/sources/c99/c99.go

echo -e "\n=== Full Run() method and Statistics() method ==="
cat -n pkg/subscraping/sources/c99/c99.go | sed -n '33,130p'

Repository: projectdiscovery/subfinder

Length of output: 3169


🏁 Script executed:

#!/bin/bash
# Check how Statistics() is used and when it's called

echo "=== Check Statistics() usage in the codebase ==="
rg -n "Statistics()" --type go -B 2 -A 2 | head -50

echo -e "\n=== Check for -stats flag implementation ==="
rg -n "stats" pkg/subscraping/ --type go -i | grep -i flag | head -20

echo -e "\n=== Check if consumers wait for channel to close before calling Statistics() ==="
rg -n "Range\|<-" pkg/subscraping/ --type go -A 1 -B 1 | grep -A 3 -B 3 Statistics | head -30

Repository: projectdiscovery/subfinder

Length of output: 2864


🏁 Script executed:

#!/bin/bash
# Check Runner implementation to understand timing of Statistics() calls

echo "=== Check agent.go or runner implementation ==="
fd -t f "agent|runner" pkg/subscraping/ --type go | head -5

echo -e "\n=== Look at PassiveAgent interface and implementation ==="
rg -n "GetStatistics|passiveAgent" pkg/ --type go -B 2 -A 5 | head -80

echo -e "\n=== Check how sources are run in the enumeration flow ==="
rg -n "\.Run\(" pkg/subscraping/ --type go -B 2 -A 3 | head -60

Repository: projectdiscovery/subfinder

Length of output: 5326


🏁 Script executed:

#!/bin/bash
# Examine the complete enumeration flow

echo "=== Check enumerate.go around the GetStatistics call ==="
sed -n '50,210p' pkg/runner/enumerate.go | cat -n

echo -e "\n=== Check PassiveAgent.EnumerateSubdomainsWithCtx implementation ==="
sed -n '1,80p' pkg/passive/passive.go | cat -n

Repository: projectdiscovery/subfinder

Length of output: 10137


Use atomic types for thread-safe statistics to match other sources.

The requests, errors, and results fields are written in the Run() goroutine and read by Statistics(). The current enumeration flow synchronizes these accesses via wg.Wait() before calling GetStatistics(), but that relies on an implicit ordering guarantee rather than a defensive pattern, and it is inconsistent with the sources that already use atomics.

Three sources already use atomic.Int32 for these fields: gitlab.go, github.go, and driftnet.go. Consider migrating c99.go to use atomic.Int32 for consistency and to guard against future code changes that may not maintain the current synchronization guarantee.

🤖 Prompt for AI Agents
In @pkg/subscraping/sources/c99/c99.go at line 21, The fields requests, errors,
and results in the c99 source struct should be changed from plain int to
atomic.Int32 (like gitlab.go/github.go/driftnet.go) to make stats thread-safe:
replace their declarations with atomic.Int32, import sync/atomic (or use the
atomic package type), increment them using Add(1) in Run(), and read them using
Load() in Statistics(); ensure zero-value initialization is fine and update any
casts when returning numbers to match the Statistics() return types.

timeTaken time.Duration
errors int
results int
requests int

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if other sources use atomic operations for request tracking

echo "=== Searching for request tracking in sources ==="
rg -n 'requests\s+(int|atomic)' pkg/subscraping/sources/ --type go

echo ""
echo "=== Checking for atomic operations in request handling ==="
rg -n 'atomic\..*[Rr]equest|sync\.Mutex.*[Rr]equest' pkg/subscraping/sources/ --type go

Repository: projectdiscovery/subfinder

Length of output: 3516


🏁 Script executed:

#!/bin/bash
# Examine the FOFA source implementation
echo "=== FOFA source implementation ==="
cat -n pkg/subscraping/sources/fofa/fofa.go | sed -n '25,65p'
echo ""
echo "=== FOFA Statistics method ==="
cat -n pkg/subscraping/sources/fofa/fofa.go | sed -n '135,145p'

Repository: projectdiscovery/subfinder

Length of output: 1999


🏁 Script executed:

#!/bin/bash
# Check how other sources implement request tracking to understand concurrent access patterns
echo "=== Checking Run() method signatures in sources ==="
rg -n 'func \(.*Source\) Run\(' pkg/subscraping/sources/ --type go -A 3 | head -40

echo ""
echo "=== Check for goroutine spawning patterns ==="
rg -n 'go ' pkg/subscraping/sources/fofa/fofa.go --type go -B 2 -A 2

Repository: projectdiscovery/subfinder

Length of output: 3448


Use atomic.Int32 for thread-safe request counting to match safer implementations in the codebase.

The request tracking logic is correct, but the plain int field risks a data race: if Statistics() is called while the Run() goroutine is still active, the concurrent access is unsafe. While most sources in the codebase currently use plain int, newer implementations (gitlab, github, driftnet) use atomic.Int32, indicating the codebase is moving toward thread-safe counters.

Replace the plain int requests field with atomic.Int32 and use its Add and Load methods to increment and read the counter for consistency and safety.

🤖 Prompt for AI Agents
In @pkg/subscraping/sources/fofa/fofa.go at line 30, replace the plain int field
named requests with atomic.Int32 on the struct (from sync/atomic), update all
increments in Run() (or any goroutine spawned from it) to requests.Add(1), and
change any reads (e.g., in Statistics()) to requests.Load() so all accesses are
thread-safe; ensure sync/atomic is imported and update any type assumptions
where requests was previously treated as an int (e.g., cast requests.Load() to
int when populating the Statistics struct).

Comment on lines +42 to +44
htSearchUrl = fmt.Sprintf("%s&apikey=%s", htSearchUrl, randomApiKey)

s.requests++

⚠️ Potential issue | 🔴 Critical

Critical: Duplicate API key appending breaks URL construction.

Lines 42-43 duplicate the API key logic already handled in lines 38-39, causing two issues:

  1. When randomApiKey is not empty: The API key gets appended twice, creating an invalid URL with duplicate apikey parameters.
  2. When randomApiKey is empty: Line 42 appends &apikey= with an empty value, breaking the request.

Lines 42-43 should be removed entirely.

🐛 Proposed fix
 		if randomApiKey != "" {
 			htSearchUrl = fmt.Sprintf("%s&apikey=%s", htSearchUrl, randomApiKey)
 		}
 
-		htSearchUrl = fmt.Sprintf("%s&apikey=%s", htSearchUrl, randomApiKey)
-
 		s.requests++
 		resp, err := session.SimpleGet(ctx, htSearchUrl)

Note: The s.requests++ increment on line 44 is correctly placed before the HTTP call and is safe without atomics since it executes in a single goroutine.

🤖 Prompt for AI Agents
In @pkg/subscraping/sources/hackertarget/hackertarget.go around lines 42 - 44,
Remove the duplicated API-key append that mutates htSearchUrl a second time (the
assignment using fmt.Sprintf with randomApiKey) so the URL isn’t built with a
duplicate or empty apikey parameter; ensure the single API-key append remains
only in the earlier logic that already handles randomApiKey (or, alternatively,
gate any append with randomApiKey != ""), and leave the s.requests++ increment
as-is before the HTTP call.
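
To illustrate why the duplicate append is harmful, here is a small sketch (the base URL and key values are made up for the example) contrasting the guarded construction the fix keeps with the unconditional second append the fix removes:

```go
package main

import "fmt"

// buildSearchURL appends the API key at most once, and only when it is
// non-empty -- the guarded form retained by the proposed fix.
func buildSearchURL(base, apiKey string) string {
	if apiKey != "" {
		return fmt.Sprintf("%s&apikey=%s", base, apiKey)
	}
	return base
}

func main() {
	base := "https://api.hackertarget.com/hostsearch/?q=example.com"

	// Guarded: one apikey parameter when a key exists, none otherwise.
	fmt.Println(buildSearchURL(base, "secret"))
	fmt.Println(buildSearchURL(base, ""))

	// Unconditional second append (the bug): the parameter is duplicated,
	// producing ...&apikey=secret&apikey=secret.
	broken := fmt.Sprintf("%s&apikey=%s", buildSearchURL(base, "secret"), "secret")
	fmt.Println(broken)
}
```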



Development

Successfully merging this pull request may close these issues.

Introduce request tracking with -stats flag

3 participants