terminal: Sanitize trailing periods in URL detection#37684
Merged
probably-neb merged 3 commits intozed-industries:mainfrom Sep 9, 2025
Merged
terminal: Sanitize trailing periods in URL detection#37684probably-neb merged 3 commits intozed-industries:mainfrom
probably-neb merged 3 commits intozed-industries:mainfrom
Conversation
Contributor
|
probably-neb
requested changes
Sep 8, 2025
Collaborator
probably-neb
left a comment
There was a problem hiding this comment.
Please fix the formatting errors in CI (just need to run cargo fmt and push), and I'll be ready to merge. Curious if @davewa has any thoughts on this as well.
davewa
reviewed
Sep 8, 2025
Contributor
davewa
left a comment
There was a problem hiding this comment.
Looks nice! I think this is a useful heuristic to have for urls. I left a few minor suggestions.
Improve URL detection in terminal by removing unbalanced trailing parentheses from captured URLs. This handles common cases like markdown links [text](url) and parenthetical URLs (url) where the closing parenthesis should not be part of the clickable link. The sanitization logic: - Counts opening and closing parentheses in detected URLs - Removes trailing ')' characters when there are more closing than opening parentheses - Preserves balanced parentheses in legitimate URLs like Wikipedia disambiguation pages This provides better user experience by correctly handling the majority of real-world URL contexts, despite technically making some valid URLs unclickable (URLs that legitimately end with unbalanced closing parentheses are extremely rare).
URLs ending with periods now have trailing periods removed when they appear to be sentence punctuation rather than part of the URL structure. This improves UX by preventing clicks on "Visit https://example.com." from opening "https://example.com." (with period) instead of the intended "https://example.com". The sanitization uses heuristics to distinguish sentence punctuation from legitimate URL structure: - Multiple trailing periods (..) are always removed - Single trailing periods are removed when preceded by alphanumeric characters or forward slashes - Periods within URL structure (domains, paths) are preserved Extends the existing punctuation sanitization to handle both parentheses and periods in a unified approach.
- Use single fold traversal for parentheses counting instead of two filters - Remove redundant parentheses recounting in while loop - Simplify trailing periods counting with take_while iterator - Use let-else pattern for cleaner conditional logic - Add test imports for better readability (Line, Column, Match) - Apply cargo fmt formatting Addresses all review feedback from @davewa and @probably-neb
3bce24e to
9814a43
Compare
probably-neb
approved these changes
Sep 9, 2025
Collaborator
probably-neb
left a comment
There was a problem hiding this comment.
Looks good. Thanks!
1 task
tidely
pushed a commit
to tidely/zed
that referenced
this pull request
Sep 10, 2025
…37684) Fixes zed-industries#12338, related to zed-industries#37616 This change improves URL detection in the terminal by removing trailing periods that appear to be sentence punctuation rather than part of the URL structure. It builds upon the parentheses sanitization work from zed-industries#37076 by consolidating both approaches into a unified `sanitize_url_punctuation` function. ## Changes - Combines parentheses and period sanitization into a single `sanitize_url_punctuation` function - Uses optimized single traversal with `fold()` for parentheses counting (addressing code review feedback) - Removes trailing periods using heuristics to distinguish sentence punctuation from legitimate URL components - Removes multiple trailing periods (always considered punctuation) - Removes single trailing periods when they appear after alphanumeric characters or slashes - Preserves periods that are part of legitimate URL structure (e.g., version numbers, IP addresses, subdomains) - Maintains existing parentheses balancing logic from zed-industries#37076 ## Implementation Details - **Parentheses handling**: Counts opening and closing parentheses, removes trailing `)` when unbalanced - **Period handling**: Uses `take_while()` iterator for efficient period counting - **Performance**: Single pass counting with optimized loop to avoid redundant work - **Code clarity**: Uses let-else pattern for readable conditional logic ## Testing - Added comprehensive test coverage for both parentheses and period sanitization - Tests cover balanced vs unbalanced parentheses cases - Tests cover various period scenarios including legitimate URL periods vs sentence punctuation - All existing tests continue to pass ## Release Notes - Improved terminal URL detection by further trimming trailing punctuation. URLs ending with periods (like `https://example.com.`) and unbalanced parentheses (like `https://example.com/path)`) are now properly detected without including the trailing punctuation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #12338, related to #37616
This change improves URL detection in the terminal by removing trailing periods that appear to be sentence punctuation rather than part of the URL structure. It builds upon the parentheses sanitization work from #37076 by consolidating both approaches into a unified
sanitize_url_punctuationfunction.Changes
sanitize_url_punctuationfunctionfold()for parentheses counting (addressing code review feedback)Implementation Details
)when unbalancedtake_while()iterator for efficient period countingTesting
Release Notes
https://example.com.) and unbalanced parentheses (likehttps://example.com/path)) are now properly detected without including the trailing punctuation. (#12338, #37616)