Detect and ignore path loops #558

Merged: NGTmeaty merged 9 commits into main from fix-pathloop on Feb 9, 2026
NGTmeaty (Collaborator) commented Feb 8, 2026

Detect and stop loops in path and query.

Copilot AI (Contributor) left a comment

Pull request overview

Adds crawler-trap detection during URL normalization by rejecting URLs with excessive repetition in path segments and/or query parameters, helping prevent infinite/degenerate crawls in the preprocessor.

Changes:

  • Introduces hasPathLoop() to detect repeated path segments and repeated query parameter pairs.
  • Integrates loop detection into NormalizeURL for both CGO and non-CGO URL parsing backends.
  • Expands preprocessor test coverage with new path-loop specific tests and additional NormalizeURL cases.
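The kind of check described in the list above can be sketched as follows. This is a minimal illustration, not the PR's `hasPathLoop()`: the names `looksLikePathLoop` and `exceedsRepeatLimit` and the `maxRepeats` threshold are assumptions, and the real detection rules may differ (for example, they may look for consecutive repeats rather than total counts).

```go
package main

import (
	"fmt"
	"strings"
)

// maxRepeats is a hypothetical threshold chosen for this sketch;
// the limit used in the PR is not shown in the conversation.
const maxRepeats = 3

// exceedsRepeatLimit reports whether any non-empty item occurs
// more than limit times in items.
func exceedsRepeatLimit(items []string, limit int) bool {
	counts := make(map[string]int, len(items))
	for _, item := range items {
		if item == "" {
			continue // skip empty pieces such as the one before a leading "/"
		}
		counts[item]++
		if counts[item] > limit {
			return true
		}
	}
	return false
}

// looksLikePathLoop is an illustrative stand-in for a trap detector.
// It flags a URL when a path segment or a key=value query pair
// repeats more than maxRepeats times.
func looksLikePathLoop(pathname, search string) bool {
	if exceedsRepeatLimit(strings.Split(pathname, "/"), maxRepeats) {
		return true
	}
	query := strings.TrimPrefix(search, "?")
	return exceedsRepeatLimit(strings.Split(query, "&"), maxRepeats)
}

func main() {
	fmt.Println(looksLikePathLoop("/a/b/a/b/a/b/a/b", ""))        // true
	fmt.Println(looksLikePathLoop("/docs/guide/intro", "?page=2")) // false
	fmt.Println(looksLikePathLoop("/search", "?x=1&x=1&x=1&x=1"))  // true
}
```

Counting total occurrences keeps the sketch short, but it can flag legitimate URLs that reuse a segment name at several depths, so a production check would likely need a more careful rule.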

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| internal/pkg/preprocessor/url_test.go | Adds NormalizeURL test cases that should now fail when path repetition indicates a loop/trap. |
| internal/pkg/preprocessor/url_cgofree.go | Captures pathname/search from ada parser and rejects URLs flagged by hasPathLoop(). |
| internal/pkg/preprocessor/url_cgo.go | Same as above for the CGO-backed ada parser. |
| internal/pkg/preprocessor/pathloop_test.go | New unit tests validating path/query repetition detection behavior. |
| internal/pkg/preprocessor/pathloop.go | New loop/trap detection helper used by NormalizeURL. |
| internal/pkg/preprocessor/error.go | Adds a new exported error for loop/trap detection failures. |

Comment thread internal/pkg/preprocessor/error.go Outdated
Comment thread internal/pkg/preprocessor/pathloop.go Outdated
codecov-commenter commented Feb 8, 2026

Codecov Report

❌ Patch coverage is 87.75510% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.81%. Comparing base (8d29277) to head (5bc1ffb).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/pkg/preprocessor/pathloop.go | 84.61% | 3 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #558      +/-   ##
==========================================
+ Coverage   56.66%   56.81%   +0.14%     
==========================================
  Files         131      132       +1     
  Lines        6621     6667      +46     
==========================================
+ Hits         3752     3788      +36     
- Misses       2495     2502       +7     
- Partials      374      377       +3     
| Flag | Coverage Δ |
| --- | --- |
| e2etests | 42.22% <53.06%> (-0.01%) ⬇️ |
| unittests | 29.23% <82.97%> (+0.37%) ⬆️ |


This reverts commit 2f72ca0. Idea still makes sense, but implementation needs work.
Comment thread internal/pkg/preprocessor/url_cgofree.go
Comment thread internal/pkg/preprocessor/pathloop_test.go
Comment thread e2e/test/nxdomain/nxdomain_test.go
Comment thread internal/pkg/preprocessor/pathloop.go Outdated
Comment thread internal/pkg/preprocessor/pathloop.go Outdated
Comment thread internal/pkg/preprocessor/pathloop.go Outdated
@NGTmeaty NGTmeaty merged commit aa455d9 into main Feb 9, 2026
5 checks passed
@NGTmeaty NGTmeaty deleted the fix-pathloop branch February 9, 2026 23:14
4 participants