Improve whitespace trimming in get_canonical_url() #3534

acicovic · 2025-07-15T06:28:12Z

Description

With this PR, we're improving whitespace trimming in the get_canonical_url() function, as the previous implementation could result in double slashes when there was trailing whitespace in the canonical. We're also introducing the tests mentioned in #3525.

Motivation and context

Fix trailing whitespace edge case.
Complete and add the integration tests mentioned in Fix: Allow domain override in canonical URLs #3525.

How has this been tested?

Through the integration tests introduced.

Summary by CodeRabbit

Bug Fixes
- Improved handling and normalization of canonical URL domains to ensure more consistent and accurate URL processing.
Tests
- Added new integration tests to verify canonical URL handling and normalization across various domain formats.

coderabbitai · 2025-07-15T06:28:18Z

📝 Walkthrough

Walkthrough

The update revises the order of string processing in the get_canonical_url method to first trim whitespace, then remove trailing slashes, and finally strip protocol prefixes. Additionally, a new integration test class is introduced to comprehensively verify canonical URL normalization and handling within the plugin.

Changes

File(s)	Change Summary
src/class-parsely.php	Adjusts the order of trimming, trailing slash removal, and protocol stripping in canonical URL processing logic.
tests/Integration/CanonicalURLsTest.php	Adds a new integration test class to verify canonical URL normalization, overrides, and storage behaviors.

Sequence Diagram(s)

sequenceDiagram
    participant Test as CanonicalURLsTest
    participant Parsely as class-parsely.php
    participant WP as WordPress

    Test->>WP: Set up permalink structure and options
    Test->>Parsely: Call get_canonical_url()
    Parsely->>Parsely: Trim, remove slashes, strip protocol from domain
    Parsely-->>Test: Return normalized canonical URL
    Test->>Parsely: Call set_canonical_url()
    Parsely-->>WP: Store canonical URL
    Test->>Parsely: Call get_canonical_url_from_post()
    Parsely->>WP: Retrieve post and metadata
    Parsely-->>Test: Return canonical URL for post

Possibly related PRs

Fix: Allow domain override in canonical URLs #3525: Refines the trimming and slash removal order in the get_canonical_url method and adds comprehensive integration tests for canonical URL handling, directly related to the normalization logic updated here.

Suggested labels

Maintenance

Suggested reviewers

vaurdan

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 PHPStan (2.1.17)

Note: Using configuration file /phpstan.neon.
Invalid configuration:
Unexpected item 'parameters › type_coverage'.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 988b5a1 and 6b48b3a.

📒 Files selected for processing (2)

src/class-parsely.php (1 hunks)
tests/Integration/CanonicalURLsTest.php (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{html,php}

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🧠 Learnings (1)

📓 Common learnings

Learnt from: acicovic
PR: Parsely/wp-parsely#0
File: :0-0
Timestamp: 2024-07-26T21:07:21.167Z
Learning: User: acicovic
URL: https://github.com/Parsely/wp-parsely/pull/2355

Timestamp: 2024-04-03T08:04:35.576Z
Learning: In the context of the `wp-parsely` project's documentation, bullet points are consistently capitalized. This standard should be respected in reviews and suggestions regarding document formatting.

Learnt from: acicovic
PR: Parsely/wp-parsely#0
File: :0-0
Timestamp: 2024-10-16T13:03:58.056Z
Learning: User: acicovic
URL: https://github.com/Parsely/wp-parsely/pull/2355

Timestamp: 2024-04-03T08:04:35.576Z
Learning: In the context of the `wp-parsely` project's documentation, bullet points are consistently capitalized. This standard should be respected in reviews and suggestions regarding document formatting.

Learnt from: acicovic
PR: Parsely/wp-parsely#2554
File: src/UI/class-settings-page.php:28-29
Timestamp: 2024-10-12T10:01:08.699Z
Learning: acicovic prefers to use the term "resolve" to indicate that an issue or request has been addressed satisfactorily.

Learnt from: acicovic
PR: Parsely/wp-parsely#2554
File: src/UI/class-settings-page.php:28-29
Timestamp: 2024-06-18T09:33:19.519Z
Learning: acicovic prefers to use the term "resolve" to indicate that an issue or request has been addressed satisfactorily.

🔇 Additional comments (7)

src/class-parsely.php (1)

1046-1047: LGTM! Improved processing order resolves the whitespace issue.

The change to process trimming before removing trailing slashes is logically sound and addresses the issue described in the PR. This prevents scenarios where trailing whitespace could interfere with proper slash removal.

tests/Integration/CanonicalURLsTest.php (6)

21-60: Well-structured test class with proper setup/teardown.

The test class follows WordPress testing best practices with appropriate setup and teardown methods. The permalink structure management ensures tests run in a consistent environment.

83-88: Basic functionality test is clean and focused.

The test correctly verifies that get_canonical_url() uses the Site ID as the domain when no filter is set. The assertion is clear and validates the expected behavior.

117-134: Comprehensive test for filtered canonical domain.

This test properly validates that the wp_parsely_canonical_url_domain filter works correctly. The use of a data provider ensures multiple domain formats are tested, and the filter cleanup is properly handled.

164-189: Thorough test for canonical URL retrieval from post.

This test validates both scenarios: when a post has no stored canonical URL and when it has an incorrect one. The test ensures the method returns the correct canonical URL in both cases, which is important for data integrity.

216-240: Comprehensive test for canonical URL storage.

The test properly validates that set_canonical_url() stores the normalized canonical URL, not the original input. This ensures data consistency and validates the trimming/normalization logic.

247-296: Excellent data provider covering edge cases.

The data provider comprehensively covers various domain formats including:

Domains with and without protocols

Trailing and leading whitespace scenarios

Multiple trailing slashes

Subdomain variations

Path variations

This directly validates the whitespace trimming fixes implemented in the main code.

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

acicovic · 2025-07-15T06:40:10Z

SonarCloud incorrectly complains of "possible security issues" because the integration tests uses http in test strings, which is totally acceptable and in line with the canonicals used by many customers. The URLs aren't used to fetch data or perform any operations.

acicovic · 2025-07-15T06:51:22Z

I've marked the issues as "safe" in SonarCloud and will be merging this now.

…rimming-in-get_canonical_url" (4b6dbea)

Improve whitespace trimming in get_canonical_url()

6b48b3a

acicovic added this to the 3.20.5 milestone Jul 15, 2025

acicovic self-assigned this Jul 15, 2025

acicovic added the Changelog: Fixed PR to be added under the changelog's "Fixed" section label Jul 15, 2025

acicovic marked this pull request as ready for review July 15, 2025 06:32

acicovic requested a review from a team as a code owner July 15, 2025 06:33

coderabbitai bot approved these changes Jul 15, 2025

View reviewed changes

acicovic merged commit 4b6dbea into develop Jul 15, 2025
32 checks passed

acicovic deleted the fix/improve-whitespace-trimming-in-get_canonical_url branch July 15, 2025 06:51

github-actions bot added a commit that referenced this pull request Jul 15, 2025

Build "Merge pull request #3534 from Parsely/fix/improve-whitespace-t…

8324a94

…rimming-in-get_canonical_url" (4b6dbea)

github-actions bot mentioned this pull request Jul 15, 2025

Update version number and changelog for 3.20.5 release #3536

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve whitespace trimming in get_canonical_url() #3534

Improve whitespace trimming in get_canonical_url() #3534

Uh oh!

acicovic commented Jul 15, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jul 15, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

acicovic commented Jul 15, 2025

Uh oh!

acicovic commented Jul 15, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Improve whitespace trimming in get_canonical_url() #3534

Improve whitespace trimming in get_canonical_url() #3534

Uh oh!

Conversation

acicovic commented Jul 15, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and context

How has this been tested?

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

acicovic commented Jul 15, 2025

Uh oh!

acicovic commented Jul 15, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

acicovic commented Jul 15, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 15, 2025 •

edited

Loading