feat: add merklemap source #1683
Walkthrough
Adds a new passive subdomain source "merklemap" with a complete Source implementation (API key handling, HTTP requests with pagination, JSON parsing, result streaming, and statistics) and registers it in the passive sources list.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Caller as Scraper
participant Source as merklemap.Source
participant API as Merklemap API (HTTP)
participant Session as subscraping.Session / results chan
Caller->>Source: Run(ctx, domain, session)
Note right of Source: select API key, build headers
loop per page
Source->>API: GET /v1/hosts?query=<domain>&page=N (Authorization)
API-->>Source: 200 JSON (Count, Results[]) or error
alt 200
Source->>Source: decode JSON, map Results to subscraping.Result
Source->>Session: send each subscraping.Result
else error
Source->>Session: send error result, increment error counter
Source-->>Source: stop pagination
end
Source->>Source: determine if more pages needed
end
Source-->>Session: close results channel
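The pagination loop in the diagram can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: `pageFetcher`, `collectHostnames`, and the zero-based page index are assumptions, and the HTTP call is replaced by a function value so the stop conditions are easy to see. Only the `Count`, `Results`, and `Hostname` names come from the walkthrough.

```go
package main

import (
	"errors"
	"fmt"
)

// result and response mirror the fields named in the walkthrough
// (Count, Results, Hostname). Everything else here is hypothetical.
type result struct {
	Hostname string
}

type response struct {
	Count   int
	Results []result
}

// pageFetcher stands in for the HTTP request to the API (hypothetical type).
type pageFetcher func(page int) (*response, error)

// collectHostnames requests pages until Count hostnames have been seen,
// a page comes back empty, or a fetch fails.
func collectHostnames(fetch pageFetcher) ([]string, error) {
	var hostnames []string
	for page := 0; ; page++ { // assumption: the first page is 0
		resp, err := fetch(page)
		if err != nil {
			// Stop pagination on the first error and keep what was collected.
			return hostnames, err
		}
		for _, r := range resp.Results {
			hostnames = append(hostnames, r.Hostname)
		}
		if len(resp.Results) == 0 || len(hostnames) >= resp.Count {
			return hostnames, nil
		}
	}
}

func main() {
	// Two fake pages holding three hostnames in total.
	pages := [][]string{{"a.example.com", "b.example.com"}, {"c.example.com"}}
	fetch := func(page int) (*response, error) {
		if page >= len(pages) {
			return nil, errors.New("page out of range")
		}
		resp := &response{Count: 3}
		for _, h := range pages[page] {
			resp.Results = append(resp.Results, result{Hostname: h})
		}
		return resp, nil
	}

	hostnames, err := collectHostnames(fetch)
	fmt.Println(hostnames, err)
}
```

Checking for an empty page as well as the total count guards against looping forever if the reported count is larger than what the API actually returns.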
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
📜 Recent review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/subscraping/sources/merklemap/merklemap.go (1)
95-125: Consider consolidating response body reads.
The response body is read twice: once at line 106 for error handling and once at line 114 for decoding. While this works correctly, it's slightly inefficient.
Apply this diff to read the body once:
 func (s *Source) fetchPage(ctx context.Context, baseURL string, page int, headers map[string]string, session *subscraping.Session) (*response, error) {
     url := baseURL + "&page=" + strconv.Itoa(page)

     resp, err := session.Get(ctx, url, "", headers)
     if err != nil {
         return nil, err
     }
     defer session.DiscardHTTPResponse(resp)

+    respBody, err := io.ReadAll(resp.Body)
+    if err != nil {
+        return nil, err
+    }
+
     if resp.StatusCode != 200 {
-        respBody, err := io.ReadAll(resp.Body)
-        if err != nil {
-            return nil, fmt.Errorf("request failed with status %d: %s", resp.StatusCode, err)
-        }
         return nil, fmt.Errorf("request failed with status %d: %s", resp.StatusCode, string(respBody))
     }

     var pageResponse response
-    respBody, err := io.ReadAll(resp.Body)
-    if err != nil {
-        return nil, err
-    }
-
     decoder := json.NewDecoder(bytes.NewReader(respBody))
     if err := decoder.Decode(&pageResponse); err != nil {
         return nil, err
     }

     return &pageResponse, nil
 }
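The read-once pattern suggested above can also be tried outside subfinder. The sketch below is a standalone illustration under stated assumptions: `parsePage` and `parseString` are hypothetical helpers, and the JSON field names (`count`, `results`, `hostname`) are guesses, since the review only confirms that the response has Count, Results, and Hostname fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// response models one page of results.
// The JSON tag names are assumptions for this sketch.
type response struct {
	Count   int `json:"count"`
	Results []struct {
		Hostname string `json:"hostname"`
	} `json:"results"`
}

// parsePage reads the body exactly once, then reuses the same bytes for
// either the error message or the JSON decoding.
func parsePage(statusCode int, body io.Reader) (*response, error) {
	raw, err := io.ReadAll(body)
	if err != nil {
		return nil, err
	}

	if statusCode != http.StatusOK {
		return nil, fmt.Errorf("request failed with status %d: %s", statusCode, raw)
	}

	var page response
	if err := json.Unmarshal(raw, &page); err != nil {
		return nil, err
	}
	return &page, nil
}

// parseString feeds a literal body through parsePage, standing in for a
// live HTTP response in this example.
func parseString(statusCode int, raw string) (*response, error) {
	return parsePage(statusCode, strings.NewReader(raw))
}

func main() {
	page, err := parseString(http.StatusOK, `{"count":1,"results":[{"hostname":"a.example.com"}]}`)
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(page.Results[0].Hostname)

	// A non-200 status surfaces the body text in the error.
	_, err = parseString(http.StatusTooManyRequests, "rate limited")
	fmt.Println(err)
}
```

Because the bytes are already in memory after the single read, `json.Unmarshal` can replace the decoder wrapped around `bytes.NewReader`; either choice behaves the same for a single JSON document.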
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pkg/passive/sources.go (3 hunks)
pkg/subscraping/sources/merklemap/merklemap.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/subscraping/sources/merklemap/merklemap.go (2)
pkg/subscraping/types.go (1)
Statistics (29-34)
pkg/subscraping/utils.go (1)
PickRandom (12-20)
🔇 Additional comments (7)
pkg/subscraping/sources/merklemap/merklemap.go (5)
1-15: LGTM! Package structure and imports are appropriate for the implementation.
17-24: LGTM! The Source struct follows the expected pattern with appropriate fields for tracking API keys, timing, and statistics.
26-57: LGTM! The Run method correctly implements the expected pattern: it resets counters, picks a random API key, sets appropriate headers (including a fixed User-Agent to avoid Cloudflare protection), and delegates to the pagination handler. The deferred time tracking and channel closing are properly structured.
127-156: LGTM! All metadata methods are correctly implemented according to the Source interface. The source correctly indicates it needs an API key, supports recursive searches, and is not enabled by default.
158-165: LGTM! The response type correctly models the API response structure with appropriate JSON tags. The Hostname field is properly extracted for subdomain results.
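For readers unfamiliar with the Run pattern described in the 26-57 comment, here is a rough standalone sketch of its shape: reset counters, pick a random key, build headers, stream results, and record timing and close the channel in a deferred call. It does not use the subfinder packages; `source`, `pickKey`, `buildHeaders`, the `Bearer` prefix, and the User-Agent value are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// result is a simplified stand-in for subscraping.Result.
type result struct {
	Value string
}

// source is a hypothetical reduction of the Source struct from the review.
type source struct {
	apiKeys   []string
	timeTaken time.Duration
	results   int
}

// pickKey is a stand-in for the PickRandom helper; it returns "" when no
// keys are configured.
func pickKey(keys []string) string {
	if len(keys) == 0 {
		return ""
	}
	return keys[rand.Intn(len(keys))]
}

// buildHeaders assembles the request headers. The "Bearer " prefix and the
// User-Agent value are assumptions, not taken from the PR.
func buildHeaders(apiKey string) map[string]string {
	return map[string]string{
		"Authorization": "Bearer " + apiKey,
		"User-Agent":    "example-client/1.0",
	}
}

// run resets the counter, then streams one result per hostname on a channel
// that is always closed, with the elapsed time recorded just before closing.
func (s *source) run(hostnames []string) <-chan result {
	out := make(chan result)
	s.results = 0

	go func() {
		defer func(start time.Time) {
			s.timeTaken = time.Since(start)
			close(out)
		}(time.Now())

		key := pickKey(s.apiKeys)
		if key == "" {
			return // no API key configured: emit nothing
		}
		_ = buildHeaders(key) // a real source would pass these to its HTTP client

		for _, h := range hostnames {
			out <- result{Value: h}
			s.results++
		}
	}()

	return out
}

func main() {
	s := &source{apiKeys: []string{"key-1", "key-2"}}
	for r := range s.run([]string{"a.example.com", "b.example.com"}) {
		fmt.Println(r.Value)
	}
	fmt.Println("results:", s.results)
}
```

Putting the time tracking and the channel close in the same deferred function means every exit path, including the early return when no key is configured, still closes the channel for the consumer.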
pkg/passive/sources.go (2)
39-39: LGTM! The import statement is correctly placed and necessary for registering the new source.
85-85: LGTM! The merklemap source is correctly added to the AllSources list, making it available for subdomain enumeration.
Thanks for the PR! We previously decided not to include this source (#1482). We'd love to hear your thoughts on why you believe it should be added and how it could benefit Subfinder users. Looking forward to your insights!
Hey @dogancanbakir!
@nohehf This helps a lot, thanks!
@nohehf Tests fail; we should add it to the default and recursive lists in
Should be good, thanks!
This PR adds a source for the https://www.merklemap.com/ API (docs: https://www.merklemap.com/documentation/search), which allows searching for subdomains in a database of CT logs.
It can be used as a drop-in replacement for crt.sh, to avoid spamming it.