Skip to content

terminal: Sanitize trailing periods in URL detection#37684

Merged
probably-neb merged 3 commits intozed-industries:mainfrom
Mearman:fix/terminal-url-periods
Sep 9, 2025
Merged

terminal: Sanitize trailing periods in URL detection#37684
probably-neb merged 3 commits intozed-industries:mainfrom
Mearman:fix/terminal-url-periods

Conversation

@Mearman
Copy link
Contributor

@Mearman Mearman commented Sep 6, 2025

Fixes #12338, related to #37616

This change improves URL detection in the terminal by removing trailing periods that appear to be sentence punctuation rather than part of the URL structure. It builds upon the parentheses sanitization work from #37076 by consolidating both approaches into a unified sanitize_url_punctuation function.

Changes

  • Combines parentheses and period sanitization into a single sanitize_url_punctuation function
  • Uses optimized single traversal with fold() for parentheses counting (addressing code review feedback)
  • Removes trailing periods using heuristics to distinguish sentence punctuation from legitimate URL components
  • Removes multiple trailing periods (always considered punctuation)
  • Removes single trailing periods when they appear after alphanumeric characters or slashes
  • Preserves periods that are part of legitimate URL structure (e.g., version numbers, IP addresses, subdomains)
  • Maintains existing parentheses balancing logic from terminal: Sanitize trailing parentheses in URL detection #37076

Implementation Details

  • Parentheses handling: Counts opening and closing parentheses, removes trailing ) when unbalanced
  • Period handling: Uses take_while() iterator for efficient period counting
  • Performance: Single pass counting with optimized loop to avoid redundant work
  • Code clarity: Uses let-else pattern for readable conditional logic

Testing

  • Added comprehensive test coverage for both parentheses and period sanitization
  • Tests cover balanced vs unbalanced parentheses cases
  • Tests cover various period scenarios including legitimate URL periods vs sentence punctuation
  • All existing tests continue to pass

Release Notes

  • Improved: Terminal URL detection now correctly handles trailing punctuation from sentence context. URLs ending with periods (like https://example.com.) and unbalanced parentheses (like https://example.com/path)) are now properly detected without including the trailing punctuation. (#12338, #37616)

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label Sep 6, 2025
@MrSubidubi MrSubidubi changed the title fix(terminal): sanitize trailing periods in URL detection terminal: Sanitize trailing periods in URL detection Sep 6, 2025
@zed-industries-bot
Copy link
Contributor

zed-industries-bot commented Sep 8, 2025

Warnings
⚠️

This PR is missing release notes.

Please add a "Release Notes" section that describes the change:

Release Notes:

- Added/Fixed/Improved ...

If your change is not user-facing, you can use "N/A" for the entry:

Release Notes:

- N/A
Messages
📖

This PR includes links to the following GitHub Issues: #12338, #37616
If this PR aims to close an issue, please include a Closes #ISSUE line at the top of the PR body.

Generated by 🚫 dangerJS against 9814a43

Copy link
Collaborator

@probably-neb probably-neb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the formatting errors in CI (just need to run cargo fmt and push), and I'll be ready to merge. Curious if @davewa has any thoughts on this as well.

Copy link
Contributor

@davewa davewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! I think this is a useful heuristic to have for urls. I left a few minor suggestions.

Improve URL detection in terminal by removing unbalanced trailing
parentheses from captured URLs. This handles common cases like
markdown links [text](url) and parenthetical URLs (url) where
the closing parenthesis should not be part of the clickable link.

The sanitization logic:
- Counts opening and closing parentheses in detected URLs
- Removes trailing ')' characters when there are more closing
  than opening parentheses
- Preserves balanced parentheses in legitimate URLs like
  Wikipedia disambiguation pages

This provides better user experience by correctly handling the
majority of real-world URL contexts, despite technically making
some valid URLs unclickable (URLs that legitimately end with
unbalanced closing parentheses are extremely rare).
URLs ending with periods now have trailing periods removed when they
appear to be sentence punctuation rather than part of the URL structure.

This improves UX by preventing clicks on "Visit https://example.com."
from opening "https://example.com." (with period) instead of the intended
"https://example.com".

The sanitization uses heuristics to distinguish sentence punctuation
from legitimate URL structure:
- Multiple trailing periods (..) are always removed
- Single trailing periods are removed when preceded by alphanumeric
  characters or forward slashes
- Periods within URL structure (domains, paths) are preserved

Extends the existing punctuation sanitization to handle both parentheses
and periods in a unified approach.
- Use single fold traversal for parentheses counting instead of two filters
- Remove redundant parentheses recounting in while loop
- Simplify trailing periods counting with take_while iterator
- Use let-else pattern for cleaner conditional logic
- Add test imports for better readability (Line, Column, Match)
- Apply cargo fmt formatting

Addresses all review feedback from @davewa and @probably-neb
@Mearman Mearman force-pushed the fix/terminal-url-periods branch from 3bce24e to 9814a43 Compare September 8, 2025 21:04
@Mearman Mearman requested a review from davewa September 8, 2025 21:05
Copy link
Contributor

@davewa davewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks!

Copy link
Collaborator

@probably-neb probably-neb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@probably-neb probably-neb merged commit 0ac1752 into zed-industries:main Sep 9, 2025
23 checks passed
tidely pushed a commit to tidely/zed that referenced this pull request Sep 10, 2025
…37684)

Fixes zed-industries#12338, related to zed-industries#37616

This change improves URL detection in the terminal by removing trailing
periods that appear to be sentence punctuation rather than part of the
URL structure. It builds upon the parentheses sanitization work from
zed-industries#37076 by consolidating both approaches into a unified
`sanitize_url_punctuation` function.

## Changes
- Combines parentheses and period sanitization into a single
`sanitize_url_punctuation` function
- Uses optimized single traversal with `fold()` for parentheses counting
(addressing code review feedback)
- Removes trailing periods using heuristics to distinguish sentence
punctuation from legitimate URL components
- Removes multiple trailing periods (always considered punctuation)
- Removes single trailing periods when they appear after alphanumeric
characters or slashes
- Preserves periods that are part of legitimate URL structure (e.g.,
version numbers, IP addresses, subdomains)
- Maintains existing parentheses balancing logic from zed-industries#37076

## Implementation Details
- **Parentheses handling**: Counts opening and closing parentheses,
removes trailing `)` when unbalanced
- **Period handling**: Uses `take_while()` iterator for efficient period
counting
- **Performance**: Single pass counting with optimized loop to avoid
redundant work
- **Code clarity**: Uses let-else pattern for readable conditional logic

## Testing
- Added comprehensive test coverage for both parentheses and period
sanitization
- Tests cover balanced vs unbalanced parentheses cases  
- Tests cover various period scenarios including legitimate URL periods
vs sentence punctuation
- All existing tests continue to pass

## Release Notes

- Improved terminal URL detection by further trimming trailing
punctuation. URLs ending with periods (like
`https://example.com.`) and unbalanced parentheses (like
`https://example.com/path)`) are now properly detected without including
the trailing punctuation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed The user has signed the Contributor License Agreement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cmd+click to linkify file in terminal doesn't work when there are whitespace or certain separators in the filename

4 participants