feat: add merklemap source#1683

Merged
dogancanbakir merged 5 commits into projectdiscovery:dev from nohehf:main
Dec 19, 2025
Conversation

@nohehf
Contributor

@nohehf nohehf commented Dec 16, 2025

This PR adds a source for the https://www.merklemap.com/ API (docs: https://www.merklemap.com/documentation/search), which allows searching for subdomains in a database of CT logs.
It can be used as a drop-in replacement for crt.sh, to avoid spamming it.

Summary by CodeRabbit

  • New Features

    • Added merklemap as a new passive data source.
    • Provides paginated retrieval of results, optional API key configuration, per-run statistics, and clearer error reporting when fetches fail.
    • Returns discovered hostnames with associated metadata.
  • Tests

    • Updated test expectations to include the new source in default and recursive source lists.

@coderabbitai

coderabbitai bot commented Dec 16, 2025

Walkthrough

Adds a new passive subdomain source "merklemap" with a complete Source implementation (API key handling, HTTP requests with pagination, JSON parsing, result streaming, and statistics) and registers it in the passive sources list.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Passive source registration: pkg/passive/sources.go | Imports the merklemap package and inserts &merklemap.Source{} into the AllSources list. |
| Merklemap source implementation: pkg/subscraping/sources/merklemap/merklemap.go | New Source type implementing Run, pagination (fetchAllPages / fetchPage), HTTP request/response handling (including non-200 errors), JSON response parsing (Count, Results.Hostname / SubjectCommonName), API key management (AddApiKeys), metadata methods, and statistics counters. |
| Tests updated: pkg/passive/sources_test.go | Adds "merklemap" to expected source lists in tests (AllSources, expectedDefaultSources, expectedDefaultRecursiveSources). |
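To illustrate why registering a source also means touching the expected lists in the tests, here is a minimal, self-contained sketch. It assumes the name lists are derived from the registered sources by filtering on metadata flags. The `source` interface, `fakeSource` type, `allSources` variable, and the flag values are hypothetical stand-ins for this example, not the project's actual code.

```go
package main

import "fmt"

// source is a hypothetical, trimmed-down stand-in for a passive source;
// only the metadata methods needed for this sketch are included.
type source interface {
	Name() string
	IsDefault() bool
	HasRecursiveSupport() bool
}

// fakeSource is an illustrative implementation, not a real source.
type fakeSource struct {
	name      string
	isDefault bool
	recursive bool
}

func (f *fakeSource) Name() string              { return f.name }
func (f *fakeSource) IsDefault() bool           { return f.isDefault }
func (f *fakeSource) HasRecursiveSupport() bool { return f.recursive }

// allSources plays the role of the registration list. The flag values
// are assumptions chosen for the example.
var allSources = []source{
	&fakeSource{name: "crtsh", isDefault: true, recursive: true},
	&fakeSource{name: "merklemap", isDefault: true, recursive: true},
	&fakeSource{name: "other", isDefault: false, recursive: false},
}

// names returns the names of all registered sources accepted by keep.
func names(keep func(source) bool) []string {
	var out []string
	for _, s := range allSources {
		if keep(s) {
			out = append(out, s.Name())
		}
	}
	return out
}

func main() {
	fmt.Println("all:      ", names(func(source) bool { return true }))
	fmt.Println("default:  ", names(source.IsDefault))
	fmt.Println("recursive:", names(source.HasRecursiveSupport))
}
```

Under these assumptions, a test that compares hard-coded expected lists against the derived ones fails as soon as a new source is registered, until the new name is added to each list it belongs to.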

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller as Scraper
    participant Source as merklemap.Source
    participant API as Merklemap API (HTTP)
    participant Session as subscraping.Session / results chan

    Caller->>Source: Run(ctx, domain, session)
    Note right of Source: select API key, build headers
    loop per page
        Source->>API: GET /v1/hosts?query=<domain>&page=N (Authorization)
        API-->>Source: 200 JSON (Count, Results[]) or error
        alt 200
            Source->>Source: decode JSON, map Results to subscraping.Result
            Source->>Session: send each subscraping.Result
        else error
            Source->>Session: send error result, increment error counter
            Source-->>Source: stop pagination
        end
        Source->>Source: determine if more pages needed
    end
    Source-->>Session: close results channel
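The per-page loop in the diagram can be sketched as a small standalone program. This is not the code from the PR: it uses plain `net/http` against a local `httptest` server instead of the project's session type, and the JSON tags (`count`, `results`, `hostname`), the zero-based `page` parameter, and the stop condition (empty page, or collected hosts reaching `count`) are all assumptions made for illustration.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// host and pageResponse use assumed JSON field names.
type host struct {
	Hostname string `json:"hostname"`
}

type pageResponse struct {
	Count   int    `json:"count"`
	Results []host `json:"results"`
}

// fetchPage requests a single page and treats any non-200 status as an error.
func fetchPage(ctx context.Context, client *http.Client, baseURL string, page int) (*pageResponse, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"&page="+strconv.Itoa(page), nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("request failed with status %d", resp.StatusCode)
	}
	var out pageResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

// fetchAllPages walks pages until a page is empty or the reported count is
// reached. On error it stops and returns what was collected so far.
func fetchAllPages(ctx context.Context, client *http.Client, baseURL string) ([]string, error) {
	var hosts []string
	for page := 0; ; page++ {
		pr, err := fetchPage(ctx, client, baseURL, page)
		if err != nil {
			return hosts, err
		}
		for _, r := range pr.Results {
			hosts = append(hosts, r.Hostname)
		}
		if len(pr.Results) == 0 || len(hosts) >= pr.Count {
			return hosts, nil
		}
	}
}

// newFakeAPI serves the given hostnames in fixed-size pages.
func newFakeAPI(all []string, pageSize int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		page, _ := strconv.Atoi(r.URL.Query().Get("page"))
		start := page * pageSize
		if start < 0 || start > len(all) {
			start = len(all)
		}
		end := start + pageSize
		if end > len(all) {
			end = len(all)
		}
		resp := pageResponse{Count: len(all)}
		for _, h := range all[start:end] {
			resp.Results = append(resp.Results, host{Hostname: h})
		}
		_ = json.NewEncoder(w).Encode(resp)
	}))
}

// runDemo fetches three hostnames spread over two pages.
func runDemo() ([]string, error) {
	srv := newFakeAPI([]string{"a.example.com", "b.example.com", "c.example.com"}, 2)
	defer srv.Close()
	return fetchAllPages(context.Background(), srv.Client(), srv.URL+"/?query=example.com")
}

func main() {
	hosts, err := runDemo()
	fmt.Println(hosts, err)
}
```

A real implementation would probably also want an upper bound on the number of pages, since this loop trusts the server to eventually return an empty page or a consistent count.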

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect pagination termination and potential off-by-one logic in fetchAllPages.
  • Verify HTTP error construction and non-200 handling in fetchPage.
  • Confirm JSON struct fields match the real API (Count, Results.Hostname/SubjectCommonName).
  • Review concurrency/safety of API key selection and statistics counters.

Poem

🐰 I nibbled keys and headers bright,
Paging through the net by moonlit light,
Each hostname a carrot found,
I send them hopping, one by one,
The warren grows — merklemap's delight.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'feat: add merklemap source' accurately and concisely describes the primary change: integrating a new MerkleMap source for subdomain discovery. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2d7ecf and 322a433.

📒 Files selected for processing (2)
  • pkg/passive/sources.go (2 hunks)
  • pkg/passive/sources_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/passive/sources.go
  • pkg/passive/sources_test.go

@ehsandeep ehsandeep changed the base branch from main to dev December 16, 2025 15:50

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/subscraping/sources/merklemap/merklemap.go (1)

95-125: Consider consolidating response body reads.

The response body is read twice: once at line 106 for error handling and once at line 114 for decoding. While this works correctly, it's slightly inefficient.

Apply this diff to read the body once:

 func (s *Source) fetchPage(ctx context.Context, baseURL string, page int, headers map[string]string, session *subscraping.Session) (*response, error) {
 	url := baseURL + "&page=" + strconv.Itoa(page)
 
 	resp, err := session.Get(ctx, url, "", headers)
 	if err != nil {
 		return nil, err
 	}
 	defer session.DiscardHTTPResponse(resp)
 
+	respBody, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, err
+	}
+
 	if resp.StatusCode != 200 {
-		respBody, err := io.ReadAll(resp.Body)
-		if err != nil {
-			return nil, fmt.Errorf("request failed with status %d: %s", resp.StatusCode, err)
-		}
 		return nil, fmt.Errorf("request failed with status %d: %s", resp.StatusCode, string(respBody))
 	}
 
 	var pageResponse response
-	respBody, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return nil, err
-	}
-
 	decoder := json.NewDecoder(bytes.NewReader(respBody))
 	if err := decoder.Decode(&pageResponse); err != nil {
 		return nil, err
 	}
 
 	return &pageResponse, nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7842ebf and 7ac8b6c.

📒 Files selected for processing (2)
  • pkg/passive/sources.go (3 hunks)
  • pkg/subscraping/sources/merklemap/merklemap.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/subscraping/sources/merklemap/merklemap.go (2)
pkg/subscraping/types.go (1)
  • Statistics (29-34)
pkg/subscraping/utils.go (1)
  • PickRandom (12-20)
🔇 Additional comments (7)
pkg/subscraping/sources/merklemap/merklemap.go (5)

1-15: LGTM!

Package structure and imports are appropriate for the implementation.


17-24: LGTM!

The Source struct follows the expected pattern with appropriate fields for tracking API keys, timing, and statistics.


26-57: LGTM!

The Run method correctly implements the expected pattern: it resets counters, picks a random API key, sets appropriate headers (including a fixed User-Agent to avoid Cloudflare protection), and delegates to the pagination handler. The deferred time tracking and channel closing are properly structured.
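The pattern described above can be sketched roughly as follows. This is a hedged illustration rather than the PR's code: `sketchSource`, `stats`, `result`, `fetchFunc`, and `pickKey` are hypothetical names, the `Bearer` authorization scheme and the User-Agent string are assumptions, and the fetch step is injected as a function so the example runs without network access.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// result carries either a discovered hostname or an error.
type result struct {
	Value string
	Err   error
}

// stats mirrors the kind of per-run counters the review mentions.
type stats struct {
	Results   int
	Errors    int
	TimeTaken time.Duration
}

// fetchFunc stands in for the pagination handler (hypothetical signature).
type fetchFunc func(ctx context.Context, domain string, headers map[string]string) ([]string, error)

type sketchSource struct {
	apiKeys []string
	stats   stats
}

// pickKey returns a random key, or "" when none are configured.
func pickKey(keys []string) string {
	if len(keys) == 0 {
		return ""
	}
	return keys[rand.Intn(len(keys))]
}

// Run resets the counters, then streams results from a goroutine. The
// deferred function records the elapsed time and closes the channel on
// every exit path.
func (s *sketchSource) Run(ctx context.Context, domain string, fetch fetchFunc) <-chan result {
	results := make(chan result)
	s.stats = stats{}

	go func() {
		defer func(start time.Time) {
			s.stats.TimeTaken = time.Since(start)
			close(results)
		}(time.Now())

		key := pickKey(s.apiKeys)
		if key == "" {
			return // no key configured: skip this source
		}
		headers := map[string]string{
			"Authorization": "Bearer " + key,     // assumed scheme
			"User-Agent":    "sketch-client/1.0", // fixed, illustrative value
		}

		hosts, err := fetch(ctx, domain, headers)
		if err != nil {
			s.stats.Errors++
			select {
			case results <- result{Err: err}:
			case <-ctx.Done():
			}
			return
		}
		for _, h := range hosts {
			select {
			case results <- result{Value: h}:
				s.stats.Results++
			case <-ctx.Done():
				return
			}
		}
	}()
	return results
}

// runDemo drains one run against an in-memory fetch function.
func runDemo() ([]string, stats) {
	s := &sketchSource{apiKeys: []string{"key-1", "key-2"}}
	fetch := func(_ context.Context, domain string, headers map[string]string) ([]string, error) {
		if headers["Authorization"] == "" {
			return nil, errors.New("missing API key")
		}
		return []string{"a." + domain, "b." + domain}, nil
	}
	var hosts []string
	for r := range s.Run(context.Background(), "example.com", fetch) {
		if r.Err == nil {
			hosts = append(hosts, r.Value)
		}
	}
	return hosts, s.stats
}

func main() {
	hosts, st := runDemo()
	fmt.Println(hosts, st.Results, st.Errors)
}
```

Because the counters are only written inside the goroutine and the channel is closed after the last write, a caller that drains the channel before reading the statistics sees consistent values in this sketch.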


127-156: LGTM!

All metadata methods are correctly implemented according to the Source interface. The source correctly indicates it needs an API key, supports recursive searches, and is not enabled by default.


158-165: LGTM!

The response type correctly models the API response structure with appropriate JSON tags. The Hostname field is properly extracted for subdomain results.
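As a rough illustration of how such a response type maps onto a JSON payload, the sketch below decodes a hand-written sample. The JSON key names (`count`, `results`, `hostname`, `subject_common_name`) and the sample payload are assumptions made for this example; the actual tags should be checked against the API documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response models the fields named in the walkthrough (Count,
// Results.Hostname, Results.SubjectCommonName) with assumed JSON tags.
type response struct {
	Count   int `json:"count"`
	Results []struct {
		Hostname          string `json:"hostname"`
		SubjectCommonName string `json:"subject_common_name"`
	} `json:"results"`
}

// sample is a made-up payload, not captured API output.
const sample = `{
  "count": 2,
  "results": [
    {"hostname": "a.example.com", "subject_common_name": "example.com"},
    {"hostname": "b.example.com", "subject_common_name": "*.example.com"}
  ]
}`

// hostnames decodes a payload and extracts the Hostname of every result.
func hostnames(raw []byte) ([]string, error) {
	var r response
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	hosts := make([]string, 0, len(r.Results))
	for _, item := range r.Results {
		hosts = append(hosts, item.Hostname)
	}
	return hosts, nil
}

func main() {
	hosts, err := hostnames([]byte(sample))
	fmt.Println(hosts, err)
}
```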

pkg/passive/sources.go (2)

39-39: LGTM!

The import statement is correctly placed and necessary for registering the new source.


85-85: LGTM!

The merklemap source is correctly added to the AllSources list, making it available for subdomain enumeration.

@dogancanbakir
Member

Thanks for the PR! We previously decided not to include this source #1482. We'd love to hear your thoughts on why you believe it should be added and how it could benefit Subfinder users. Looking forward to your insights!

@nohehf
Contributor Author

nohehf commented Dec 18, 2025

Hey @dogancanbakir !
I read #1482. The Cloudflare issue is fixed (I contacted them and they changed the WAF rules, and you can see in this PR that I use a fixed user agent, which helped).
I believe this source is great if you want to use subfinder at scale, on a schedule for instance, and don't want to spam crt.sh (which now seems to block major cloud providers anyway).
It is indeed a paid-only API, but I believe it is worth it for enterprises that have high volume. We plan to use it in combination with other paid APIs.
I already use my subfinder fork with merklemap at scale and it's been a great drop-in replacement for crt.sh.
Hope this helps clarify the use case. I believe this integration does add value. Let me know if you need further info.

@dogancanbakir
Member

@nohehf This helps a lot, thanks!

@dogancanbakir
Member

@nohehf Tests fail; we should add it to the default and recursive lists in the sources_test.go file.

@dogancanbakir dogancanbakir self-requested a review December 19, 2025 14:36
Member

@dogancanbakir dogancanbakir left a comment

minor change

@nohehf
Contributor Author

nohehf commented Dec 19, 2025

Should be good, thanks!

@dogancanbakir dogancanbakir merged commit 2e0982c into projectdiscovery:dev Dec 19, 2025
10 checks passed