Update Censys PAT integration and docs#1654

Merged
dogancanbakir merged 4 commits into dev from feature/censys-api-key on Dec 9, 2025
Conversation

Contributor

@knakul853 knakul853 commented Oct 3, 2025

closes #1614

deprecate censys v2 api

Summary by CodeRabbit

  • New Features

    • Added support for organization IDs in Censys API authentication
  • Refactor

    • Improved Censys API integration with cursor-based pagination and optimized request structure
  • Tests

    • Added comprehensive test suite for Censys integration covering key scenarios and functionality



coderabbitai bot commented Oct 3, 2025

Walkthrough

The Censys source integration is migrated from a GET-based search API to Censys API v3 using POST requests. The change introduces new request/response structures, API key management with organization ID support, cursor-based pagination via NextPageToken, and adds a public AddApiKeys method for key parsing.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Censys API v3 Integration: pkg/subscraping/sources/censys/censys.go | Migrates from GET search to POST-based API v3; introduces apiKey type for key management; adds AddApiKeys method to parse and store PAT and PAT:ORG_ID credentials; replaces previous certificate model with streamlined request/response structures using searchRequest body and cursor-based pagination via NextPageToken; adds X-Organization-ID header support; refactors Run method to select random API key, build and send POST request, and iterate hits for domain extraction. |
| Censys Test Suite: pkg/subscraping/sources/censys/censys_test.go | Adds comprehensive test coverage for missing API keys, context cancellation, metadata exposure, API key parsing for both PAT and PAT:ORG_ID formats, and statistics reporting; includes helper for multi-rate limiter construction. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • API integration logic: Review new searchRequest structure, POST request construction, and response decoding to ensure correct API v3 compliance
  • Pagination handling: Verify NextPageToken cursor-based pagination implementation and hits iteration logic
  • API key management: Validate random key selection, organization ID header handling, and the new AddApiKeys parsing logic
  • Run method refactoring: Ensure error handling, time tracking, and data streaming are correctly maintained through the control flow changes

Poem

🐰 A hop through Censys v3's door,
With POST requests where GET lived before,
Keys shuffled like carrots at random,
Cursors paginate through the handsom,
Organization IDs now run deep—
The API's secrets are ours to keep! 🔐

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Update Censys PAT integration and docs' is directly related to the main change of updating the Censys integration from API v2 to v3 PAT-based authentication. |
| Linked Issues check | ✅ Passed | The PR implements the Censys API v3 integration requested in issue #1614, replacing GET requests with POST, handling PAT API keys, and updating the response parsing structure accordingly. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the Censys integration refactoring for API v3: source implementation, API key handling, request/response structures, and corresponding tests. |

@dogancanbakir dogancanbakir self-requested a review October 7, 2025 12:53
@yutasuzuki-0206

Hello. I have been informed by a distributor that Censys API v2 is scheduled for EOL (End of Life) on December 15th.

Since many users rely on the Censys integration in subfinder, failing to address this before the deadline will likely cause issues for a significant number of people.

Do you have a rough timeline for when this could be reviewed or merged?

I understand the team is busy, but we would greatly appreciate it if you could handle this before the 15th. Thank you!

@knakul853 knakul853 force-pushed the feature/censys-api-key branch from 333eaec to b7193b4 Compare December 8, 2025 14:35
@knakul853 knakul853 marked this pull request as ready for review December 8, 2025 14:49
@knakul853 knakul853 requested a review from Ice3man543 December 8, 2025 14:51

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
pkg/subscraping/sources/censys/censys.go (2)

149-163: Consider validating HTTP status code before decoding.

The code decodes the response body directly without checking resp.StatusCode. If the API returns a 4xx (e.g., 401 Unauthorized, 429 Too Many Requests) or 5xx error, the error response body will be decoded into the success response struct, likely resulting in silently empty results rather than a meaningful error.

 		if err != nil {
 			results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 			s.errors++
 			session.DiscardHTTPResponse(resp)
 			return
 		}

+		if resp.StatusCode != http.StatusOK {
+			_ = resp.Body.Close()
+			results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("unexpected status code: %d", resp.StatusCode)}
+			s.errors++
+			return
+		}
+
 		var censysResponse response

Note: You'll need to import "fmt" if not already imported.
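
As a standalone illustration of the suggestion above, the following sketch shows how an error body can decode "successfully" into an empty result unless the status code is checked first. The `searchResponse` type, its `hits` field, and the `decodeIfOK` and `decodeFake` helpers are hypothetical names for this example, not the types used in censys.go, and the fake responses are built with `httptest.NewRecorder` so no network is involved.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// searchResponse is a hypothetical stand-in for the source's success response type.
type searchResponse struct {
	Hits []string `json:"hits"`
}

// decodeIfOK decodes the body only when the server answered 200 OK.
// Any other status is turned into an explicit error instead of an empty result.
func decodeIfOK(resp *http.Response) (*searchResponse, error) {
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		// Drain the body so the underlying connection could be reused.
		_, _ = io.Copy(io.Discard, resp.Body)
		return nil, fmt.Errorf("unexpected status code: %d", resp.StatusCode)
	}
	var out searchResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

// decodeFake builds an in-memory response with the given status and body
// and passes it through decodeIfOK.
func decodeFake(status int, body string) (*searchResponse, error) {
	rec := httptest.NewRecorder()
	rec.WriteHeader(status)
	_, _ = rec.WriteString(body)
	return decodeIfOK(rec.Result())
}

func main() {
	// An error payload is valid JSON, so without the status check it would
	// decode into a searchResponse with zero hits and no error.
	_, err := decodeFake(http.StatusUnauthorized, `{"error":"bad token"}`)
	fmt.Println("401:", err)

	res, err := decodeFake(http.StatusOK, `{"hits":["a.example.com"]}`)
	fmt.Println("200:", res.Hits, err)
}
```

Whether to close the body directly or go through the project's own response-discarding helper is a choice for the real code; the sketch only demonstrates the ordering of the check and the decode.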


204-217: Clarify docstring to document both supported formats.

The docstring mentions PAT:ORG_ID format but the implementation also supports PAT-only keys for free users (lines 211-216). Consider updating the docstring to document both formats explicitly.

-// AddApiKeys parses and adds API keys.
-// Format: "PAT:ORG_ID" where ORG_ID is required for paid accounts.
-// Example: "censys_xxx_token:12345678-91011-1213"
+// AddApiKeys parses and adds API keys.
+// Supported formats:
+//   - "PAT:ORG_ID" for paid accounts (e.g., "censys_xxx_token:12345678-91011-1213")
+//   - "PAT" for free users (e.g., "censys_xxx_token")
 func (s *Source) AddApiKeys(keys []string) {
pkg/subscraping/sources/censys/censys_test.go (1)

17-25: Consider handling the error from NewMultiLimiter.

Ignoring the error could mask test setup issues. While unlikely to fail with these options, handling it improves test reliability.

-func createTestMultiRateLimiter(ctx context.Context) *ratelimit.MultiLimiter {
-	mrl, _ := ratelimit.NewMultiLimiter(ctx, &ratelimit.Options{
+func createTestMultiRateLimiter(t *testing.T, ctx context.Context) *ratelimit.MultiLimiter {
+	t.Helper()
+	mrl, err := ratelimit.NewMultiLimiter(ctx, &ratelimit.Options{
 		Key:         "censys",
 		IsUnlimited: false,
 		MaxCount:    math.MaxInt32,
 		Duration:    time.Millisecond,
 	})
+	require.NoError(t, err)
 	return mrl
 }

Note: Update all call sites to pass t as the first argument.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between deccd62 and 0b6432d.

📒 Files selected for processing (2)
  • pkg/subscraping/sources/censys/censys.go (5 hunks)
  • pkg/subscraping/sources/censys/censys_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/subscraping/sources/censys/censys.go (1)
pkg/subscraping/utils.go (2)
  • PickRandom (12-20)
  • CreateApiKeys (22-30)
🔇 Additional comments (9)
pkg/subscraping/sources/censys/censys.go (4)

16-33: LGTM!

The constants are well-documented and clearly define the API endpoint, pagination limits, and header values for the Censys Platform API v3 integration.


35-70: LGTM!

The request/response structures are well-defined and appropriately map to the Censys Platform API v3 schema. The searchRequest supports cursor-based pagination, and the response hierarchy correctly targets certificate_v1.resource.names for subdomain extraction.
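
For readers unfamiliar with this shape, here is a minimal sketch of a request body with a page cursor and a response that nests names under certificate_v1.resource.names. Every JSON tag and field name below (`query`, `page_size`, `page_token`, `result`, `hits`, `next_page_token`) is an assumption made for illustration; the actual structs in censys.go and the Censys schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchRequest is a hypothetical POST body; tag names are assumptions.
type searchRequest struct {
	Query     string `json:"query"`
	PageSize  int    `json:"page_size,omitempty"`
	PageToken string `json:"page_token,omitempty"` // empty on the first page
}

// hit mirrors the certificate_v1.resource.names path mentioned in the review.
type hit struct {
	CertificateV1 struct {
		Resource struct {
			Names []string `json:"names"`
		} `json:"resource"`
	} `json:"certificate_v1"`
}

// response is a hypothetical envelope holding the hits and the next cursor.
type response struct {
	Result struct {
		Hits          []hit  `json:"hits"`
		NextPageToken string `json:"next_page_token"`
	} `json:"result"`
}

// extractNames flattens all certificate names in a response body and
// returns the cursor for the next page (empty when there is none).
func extractNames(body []byte) (names []string, next string, err error) {
	var r response
	if err = json.Unmarshal(body, &r); err != nil {
		return nil, "", err
	}
	for _, h := range r.Result.Hits {
		names = append(names, h.CertificateV1.Resource.Names...)
	}
	return names, r.Result.NextPageToken, nil
}

func main() {
	req, _ := json.Marshal(searchRequest{Query: "example.com", PageSize: 100})
	fmt.Println(string(req))

	body := []byte(`{"result":{"hits":[{"certificate_v1":{"resource":{"names":["a.example.com","b.example.com"]}}}],"next_page_token":"abc"}}`)
	names, next, err := extractNames(body)
	fmt.Println(names, next, err)
}
```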


93-101: LGTM!

Good use of the PickRandom utility for load distribution across multiple API keys, with clear documentation explaining the rationale. The empty PAT check correctly handles the skip case.
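
The pattern described here can be sketched roughly as follows. The `apiKey` fields and the `pickRandom` helper are hypothetical stand-ins written for this example; they are not the project's `PickRandom` utility, whose exact signature is not shown in this thread.

```go
package main

import (
	"fmt"
	"math/rand"
)

// apiKey is a hypothetical credential pair: a PAT plus an optional org ID.
type apiKey struct {
	pat   string
	orgID string
}

// pickRandom returns a uniformly chosen key, or the zero value when the
// slice is empty, so callers can treat an empty pat as "nothing configured".
func pickRandom(keys []apiKey) apiKey {
	if len(keys) == 0 {
		return apiKey{}
	}
	return keys[rand.Intn(len(keys))]
}

func main() {
	keys := []apiKey{{pat: "token-a"}, {pat: "token-b", orgID: "org-1"}}
	key := pickRandom(keys)
	if key.pat == "" {
		fmt.Println("no API key configured, skipping source")
		return
	}
	fmt.Println("using key with PAT", key.pat)
}
```

Picking a key per run spreads requests across all configured credentials without keeping any rotation state.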


165-181: LGTM!

The result extraction correctly iterates through hits and their certificate names, with proper context cancellation handling. The pagination logic correctly uses NextPageToken and respects the max page limit.
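
A cursor loop with a page cap, as described above, might look like the sketch below. The `page` type, the `fetchFunc` signature, and the `maxPages` value are assumptions for illustration; the real code issues HTTP requests and streams results over a channel instead of collecting a slice.

```go
package main

import "fmt"

// maxPages is a hypothetical cap on how many pages are requested per run.
const maxPages = 3

// page is a simplified result page: the names found plus the next cursor.
type page struct {
	names         []string
	nextPageToken string
}

// fetchFunc retrieves one page for the given cursor ("" means first page).
type fetchFunc func(pageToken string) (page, error)

// collectNames follows nextPageToken until it is empty or maxPages is reached.
func collectNames(fetch fetchFunc) ([]string, error) {
	var all []string
	token := ""
	for n := 0; n < maxPages; n++ {
		p, err := fetch(token)
		if err != nil {
			return all, err
		}
		all = append(all, p.names...)
		if p.nextPageToken == "" {
			break // last page reached
		}
		token = p.nextPageToken
	}
	return all, nil
}

func main() {
	pages := map[string]page{
		"":   {names: []string{"a.example.com"}, nextPageToken: "t1"},
		"t1": {names: []string{"b.example.com"}},
	}
	names, err := collectNames(func(token string) (page, error) {
		return pages[token], nil
	})
	fmt.Println(names, err)
}
```

The cap matters because a server that always returns a non-empty cursor would otherwise keep the loop running indefinitely.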

pkg/subscraping/sources/censys/censys_test.go (5)

27-51: LGTM!

This test correctly validates that the source is skipped when no API keys are configured. Since the skip happens before any HTTP request is made, the test is deterministic and doesn't require network access.


91-98: LGTM!

Simple and effective metadata verification for the source interface methods.


100-122: LGTM!

Thorough testing of both API key formats. The subtests clearly validate that:

  • PAT:ORG_ID format correctly populates both fields
  • PAT format (for free users) populates only the PAT field with empty orgID

Good use of require.Len before accessing slice elements to prevent panics.
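
The two key formats exercised by these subtests could be parsed along the lines of this sketch. `parseKeys` and `headersFor` are hypothetical helpers, not the project's `AddApiKeys` or `CreateApiKeys`; the `X-Organization-ID` header name comes from the walkthrough above, while the `Authorization: Bearer` scheme is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// apiKey is a hypothetical parsed credential.
type apiKey struct {
	pat   string
	orgID string // empty for PAT-only keys
}

// parseKeys accepts "PAT" and "PAT:ORG_ID" entries and skips blank PATs.
func parseKeys(raw []string) []apiKey {
	var keys []apiKey
	for _, entry := range raw {
		// strings.Cut splits on the first ":"; orgID is "" when none is present.
		pat, orgID, _ := strings.Cut(entry, ":")
		if pat == "" {
			continue
		}
		keys = append(keys, apiKey{pat: pat, orgID: orgID})
	}
	return keys
}

// headersFor builds request headers, adding the organization header only
// when an org ID was supplied. The bearer scheme is an assumption.
func headersFor(k apiKey) map[string]string {
	h := map[string]string{"Authorization": "Bearer " + k.pat}
	if k.orgID != "" {
		h["X-Organization-ID"] = k.orgID
	}
	return h
}

func main() {
	for _, k := range parseKeys([]string{"censys_xxx_token:12345678-91011-1213", "censys_xxx_token"}) {
		fmt.Printf("pat=%q orgID=%q headers=%d\n", k.pat, k.orgID, len(headersFor(k)))
	}
}
```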


124-137: LGTM!

Correctly validates that the Statistics() method properly maps all internal fields to the returned subscraping.Statistics struct.


53-89: Context cancellation is properly checked before HTTP requests are made.

The Run() method checks ctx.Done() at the start of its main loop (before building any request body or making HTTP calls), so when cancel() is called immediately after Run() returns, the context will be cancelled before any HTTP request is sent. The theoretical race window between goroutine startup and the context check is negligible in practice. Additionally, HTTPRequest() uses http.NewRequestWithContext() which respects context cancellation. The test design is sound and follows the project's patterns.

Likely an incorrect or invalid review comment.
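
The behaviour described in this comment, checking the context at the top of the loop before any request is built, can be illustrated with a small sketch. `runLoop` and `demo` are hypothetical names for this example and do not reproduce the actual Run method.

```go
package main

import (
	"context"
	"fmt"
)

// runLoop checks ctx before every request and returns how many requests
// were actually sent. sendRequest reports whether another page remains.
func runLoop(ctx context.Context, sendRequest func() (more bool)) int {
	sent := 0
	for {
		select {
		case <-ctx.Done():
			return sent // cancelled: stop before building the next request
		default:
		}
		sent++
		if !sendRequest() {
			return sent
		}
	}
}

// demo runs the loop once with an already-cancelled context and once with
// a live context that has two pages to fetch.
func demo() (whenCancelled, whenLive int) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	whenCancelled = runLoop(ctx, func() bool { return true })

	remaining := 2
	whenLive = runLoop(context.Background(), func() bool {
		remaining--
		return remaining > 0
	})
	return whenCancelled, whenLive
}

func main() {
	cancelled, live := demo()
	fmt.Println("requests with cancelled context:", cancelled)
	fmt.Println("requests with live context:", live)
}
```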

@knakul853
Contributor Author

[Image: Screenshot 2025-12-08 at 8 30 36 PM]

@dogancanbakir dogancanbakir merged commit 5ad10c6 into dev Dec 9, 2025
10 checks passed
@dogancanbakir dogancanbakir deleted the feature/censys-api-key branch December 9, 2025 08:06
Development

Successfully merging this pull request may close these issues.

implementing Censys API v3

3 participants