Skip to content

Conversation

@acicovic
Copy link
Collaborator

@acicovic acicovic commented Jul 15, 2025

Description

With this PR, we're improving whitespace trimming in the get_canonical_url() function, as the previous implementation could result in double slashes when there was trailing whitespace in the canonical. We're also introducing the tests mentioned in #3525.

Motivation and context

How has this been tested?

Through the integration tests introduced.

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling and normalization of canonical URL domains to ensure more consistent and accurate URL processing.
  • Tests

    • Added new integration tests to verify canonical URL handling and normalization across various domain formats.

@acicovic acicovic added this to the 3.20.5 milestone Jul 15, 2025
@acicovic acicovic self-assigned this Jul 15, 2025
@acicovic acicovic added the Changelog: Fixed PR to be added under the changelog's "Fixed" section label Jul 15, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 15, 2025

📝 Walkthrough

Walkthrough

The update revises the order of string processing in the get_canonical_url method to first trim whitespace, then remove trailing slashes, and finally strip protocol prefixes. Additionally, a new integration test class is introduced to comprehensively verify canonical URL normalization and handling within the plugin.

Changes

File(s) Change Summary
src/class-parsely.php Adjusts the order of trimming, trailing slash removal, and protocol stripping in canonical URL processing logic.
tests/Integration/CanonicalURLsTest.php Adds a new integration test class to verify canonical URL normalization, overrides, and storage behaviors.

Sequence Diagram(s)

sequenceDiagram
    participant Test as CanonicalURLsTest
    participant Parsely as class-parsely.php
    participant WP as WordPress

    Test->>WP: Set up permalink structure and options
    Test->>Parsely: Call get_canonical_url()
    Parsely->>Parsely: Trim, remove slashes, strip protocol from domain
    Parsely-->>Test: Return normalized canonical URL
    Test->>Parsely: Call set_canonical_url()
    Parsely-->>WP: Store canonical URL
    Test->>Parsely: Call get_canonical_url_from_post()
    Parsely->>WP: Retrieve post and metadata
    Parsely-->>Test: Return canonical URL for post
Loading

Possibly related PRs

  • Fix: Allow domain override in canonical URLs #3525: Refines the trimming and slash removal order in the get_canonical_url method and adds comprehensive integration tests for canonical URL handling, directly related to the normalization logic updated here.

Suggested labels

Maintenance

Suggested reviewers

  • vaurdan

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 PHPStan (2.1.17)

Note: Using configuration file /phpstan.neon.
Invalid configuration:
Unexpected item 'parameters › type_coverage'.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 988b5a1 and 6b48b3a.

📒 Files selected for processing (2)
  • src/class-parsely.php (1 hunks)
  • tests/Integration/CanonicalURLsTest.php (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{html,php}

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🧠 Learnings (1)
📓 Common learnings
Learnt from: acicovic
PR: Parsely/wp-parsely#0
File: :0-0
Timestamp: 2024-07-26T21:07:21.167Z
Learning: User: acicovic
URL: https://github.com/Parsely/wp-parsely/pull/2355

Timestamp: 2024-04-03T08:04:35.576Z
Learning: In the context of the `wp-parsely` project's documentation, bullet points are consistently capitalized. This standard should be respected in reviews and suggestions regarding document formatting.
Learnt from: acicovic
PR: Parsely/wp-parsely#0
File: :0-0
Timestamp: 2024-10-16T13:03:58.056Z
Learning: User: acicovic
URL: https://github.com/Parsely/wp-parsely/pull/2355

Timestamp: 2024-04-03T08:04:35.576Z
Learning: In the context of the `wp-parsely` project's documentation, bullet points are consistently capitalized. This standard should be respected in reviews and suggestions regarding document formatting.
Learnt from: acicovic
PR: Parsely/wp-parsely#2554
File: src/UI/class-settings-page.php:28-29
Timestamp: 2024-10-12T10:01:08.699Z
Learning: acicovic prefers to use the term "resolve" to indicate that an issue or request has been addressed satisfactorily.
Learnt from: acicovic
PR: Parsely/wp-parsely#2554
File: src/UI/class-settings-page.php:28-29
Timestamp: 2024-06-18T09:33:19.519Z
Learning: acicovic prefers to use the term "resolve" to indicate that an issue or request has been addressed satisfactorily.
🔇 Additional comments (7)
src/class-parsely.php (1)

1046-1047: LGTM! Improved processing order resolves the whitespace issue.

The change to process trimming before removing trailing slashes is logically sound and addresses the issue described in the PR. This prevents scenarios where trailing whitespace could interfere with proper slash removal.

tests/Integration/CanonicalURLsTest.php (6)

21-60: Well-structured test class with proper setup/teardown.

The test class follows WordPress testing best practices with appropriate setup and teardown methods. The permalink structure management ensures tests run in a consistent environment.


83-88: Basic functionality test is clean and focused.

The test correctly verifies that get_canonical_url() uses the Site ID as the domain when no filter is set. The assertion is clear and validates the expected behavior.


117-134: Comprehensive test for filtered canonical domain.

This test properly validates that the wp_parsely_canonical_url_domain filter works correctly. The use of a data provider ensures multiple domain formats are tested, and the filter cleanup is properly handled.


164-189: Thorough test for canonical URL retrieval from post.

This test validates both scenarios: when a post has no stored canonical URL and when it has an incorrect one. The test ensures the method returns the correct canonical URL in both cases, which is important for data integrity.


216-240: Comprehensive test for canonical URL storage.

The test properly validates that set_canonical_url() stores the normalized canonical URL, not the original input. This ensures data consistency and validates the trimming/normalization logic.


247-296: Excellent data provider covering edge cases.

The data provider comprehensively covers various domain formats including:

  • Domains with and without protocols
  • Trailing and leading whitespace scenarios
  • Multiple trailing slashes
  • Subdomain variations
  • Path variations

This directly validates the whitespace trimming fixes implemented in the main code.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@acicovic acicovic marked this pull request as ready for review July 15, 2025 06:32
@acicovic acicovic requested a review from a team as a code owner July 15, 2025 06:33
@acicovic
Copy link
Collaborator Author

SonarCloud incorrectly complains of "possible security issues" because the integration tests uses http in test strings, which is totally acceptable and in line with the canonicals used by many customers. The URLs aren't used to fetch data or perform any operations.

@acicovic
Copy link
Collaborator Author

I've marked the issues as "safe" in SonarCloud and will be merging this now.

@acicovic acicovic merged commit 4b6dbea into develop Jul 15, 2025
32 checks passed
@acicovic acicovic deleted the fix/improve-whitespace-trimming-in-get_canonical_url branch July 15, 2025 06:51
github-actions bot added a commit that referenced this pull request Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Changelog: Fixed PR to be added under the changelog's "Fixed" section

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants