Collaborator

@acicovic acicovic commented Jul 8, 2025

Description

With this PR, we're making the domain overridable in canonical URLs. This is possible through the wp_parsely_canonical_url_domain filter. For example:

add_filter(
	'wp_parsely_canonical_url_domain',
	function () {
		// Domain to use in canonical URLs in place of the default one.
		return 'mydomain.com';
	}
);

This is useful for sites whose Site ID does not match their URL, as it restores features that would otherwise break.

Motivation and context

Fixes #3523.

How has this been tested?

Locally, using manual and automated testing. Some integration tests have been written and will be made available in a future PR, as they are still being cleaned up.

Summary by CodeRabbit

  • New Features

    • Added support for overriding the canonical domain in URLs via a filter, allowing greater flexibility in domain management.
  • Improvements

    • Canonical URLs are now consistently formatted and sanitized before being saved or retrieved.
    • Improved internationalization by translating the fallback string for missing permalinks.

@acicovic acicovic added this to the 3.20.4 milestone Jul 8, 2025
@acicovic acicovic self-assigned this Jul 8, 2025
@acicovic acicovic requested a review from a team as a code owner July 8, 2025 13:09
@acicovic acicovic added the Changelog: Fixed (PR to be added under the changelog's "Fixed" section) label Jul 8, 2025
Contributor

coderabbitai bot commented Jul 8, 2025

Walkthrough

The changes refactor canonical URL handling in src/class-parsely.php by introducing a filter for domain overrides, ensuring canonical URLs are consistently normalized and sanitized when retrieved or stored, and internationalizing a fallback string. The logic now allows for dynamic domain replacement and aligns canonical URLs with the configured Site ID or a filtered domain.

Changes

File(s) | Change Summary
src/class-parsely.php | Refactored canonical URL logic: added domain override filter, improved domain replacement, normalized and sanitized URLs on set/get, and internationalized fallback string.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ParselyClass

    Client->>ParselyClass: set_canonical_url(post, url)
    ParselyClass->>ParselyClass: get_canonical_url(url)
    ParselyClass->>ParselyClass: sanitize_url(url)
    ParselyClass->>ParselyClass: update_post_meta(post, canonical_url)

    Client->>ParselyClass: get_canonical_url_from_post(post)
    ParselyClass->>ParselyClass: get_post_meta(post)
    ParselyClass->>ParselyClass: get_canonical_url(meta_url)
    ParselyClass-->>Client: canonical_url
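
The flow in the diagram can be modelled outside WordPress as a rough sketch. Everything here is hypothetical and not the plugin's code: `Canonical_Flow_Sketch` is an illustrative class, a plain array stands in for post meta, and the injected callable stands in for `get_canonical_url()`. The sanitization step and the permalink fallback are left out for brevity.

```php
<?php
/**
 * Hypothetical in-memory model of the diagram. An array stands in for post
 * meta and the injected callable stands in for get_canonical_url().
 */
final class Canonical_Flow_Sketch {
	/** @var array<int, string> Post ID => stored canonical URL. */
	public $meta = array();

	/** @var callable Normalizer applied on both write and read. */
	private $normalize;

	public function __construct( callable $normalize ) {
		$this->normalize = $normalize;
	}

	// Write path: normalize first, then store.
	public function set_canonical_url( int $post_id, string $url ): void {
		$this->meta[ $post_id ] = ( $this->normalize )( $url );
	}

	// Read path: normalize the stored value again before returning it.
	public function get_canonical_url_from_post( int $post_id ): ?string {
		if ( ! isset( $this->meta[ $post_id ] ) ) {
			return null; // Simplification: the sketch has no permalink fallback.
		}
		return ( $this->normalize )( $this->meta[ $post_id ] );
	}
}

$flow = new Canonical_Flow_Sketch(
	static function ( string $url ): string {
		// Assumed override: move URLs from old.example.com to mydomain.com.
		return str_replace( '//old.example.com', '//mydomain.com', $url );
	}
);

// A value stored before the override existed is still corrected on read.
$flow->meta[7] = 'https://old.example.com/hello/';
echo $flow->get_canonical_url_from_post( 7 ), PHP_EOL; // Prints: https://mydomain.com/hello/
```

Normalizing on read as well as on write is what lets canonicals that were stored incorrectly come back correct without a data migration.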

Assessment against linked issues

Objective Addressed Explanation
Ensure canonical URLs are correct when Site ID and URL mismatch (#3523)
Provide a way to return correct canonical URLs when stored canonicals are incorrect (#3523)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes detected.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 PHPStan (2.1.15)

Note: Using configuration file /phpstan.neon.
Invalid configuration:
Unexpected item 'parameters › type_coverage'.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4efe9fd and e865efa.

📒 Files selected for processing (1)
  • src/class-parsely.php (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{html,php}: "Perform a detailed review of the provided code with following key aspects in mind:

  • Review the HTML and PHP code to ensure it is well-structured and adheres to best practices.
  • Ensure the code follows WordPress coding standards and is well-documented.
  • Confirm the code is secure and free from vulnerabilities.
  • Optimize the code for performance, removing any unnecessary elements.
  • Validate comments for accuracy, currency, and adherence to WordPress coding standards.
  • Ensure each line comment concludes with a period.
  • Verify code compatibility with the latest version of WordPress, avoiding deprecated functions or features."

⚙️ Source: CodeRabbit Configuration File

List of files the instruction was applied to:

  • src/class-parsely.php
🧠 Learnings (2)
📓 Common learnings
Learnt from: acicovic
PR: Parsely/wp-parsely#0
File: :0-0
Timestamp: 2024-10-16T13:03:58.056Z
Learning: User: acicovic
URL: https://github.com/Parsely/wp-parsely/pull/2355

Timestamp: 2024-04-03T08:04:35.576Z
Learning: In the context of the `wp-parsely` project's documentation, bullet points are consistently capitalized. This standard should be respected in reviews and suggestions regarding document formatting.
Learnt from: acicovic
PR: Parsely/wp-parsely#0
File: :0-0
Timestamp: 2024-07-26T21:07:21.167Z
Learning: User: acicovic
URL: https://github.com/Parsely/wp-parsely/pull/2355

Timestamp: 2024-04-03T08:04:35.576Z
Learning: In the context of the `wp-parsely` project's documentation, bullet points are consistently capitalized. This standard should be respected in reviews and suggestions regarding document formatting.
Learnt from: acicovic
PR: Parsely/wp-parsely#2554
File: src/UI/class-settings-page.php:28-29
Timestamp: 2024-06-18T09:33:19.519Z
Learning: acicovic prefers to use the term "resolve" to indicate that an issue or request has been addressed satisfactorily.
Learnt from: acicovic
PR: Parsely/wp-parsely#2554
File: src/UI/class-settings-page.php:28-29
Timestamp: 2024-10-12T10:01:08.699Z
Learning: acicovic prefers to use the term "resolve" to indicate that an issue or request has been addressed satisfactorily.
src/class-parsely.php (2)
Learnt from: vaurdan
PR: Parsely/wp-parsely#2507
File: src/RemoteAPI/class-referrers-post-detail-api.php:32-37
Timestamp: 2024-10-12T10:01:08.699Z
Learning: The `is_available_to_current_user` method in `class-referrers-post-detail-api.php` and potentially other similar methods now include an optional `$request` parameter to enhance flexibility while retaining backward compatibility.
Learnt from: vaurdan
PR: Parsely/wp-parsely#2507
File: src/RemoteAPI/class-referrers-post-detail-api.php:32-37
Timestamp: 2024-07-26T21:07:21.167Z
Learning: The `is_available_to_current_user` method in `class-referrers-post-detail-api.php` and potentially other similar methods now include an optional `$request` parameter to enhance flexibility while retaining backward compatibility.
🧬 Code Graph Analysis (1)
src/class-parsely.php (1)
wp-parsely.php (1)
  • get_parsely (71-77)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: PHP 8.0
  • GitHub Check: PHP 8.2
  • GitHub Check: PHP 7.4
  • GitHub Check: PHP 8.4
  • GitHub Check: PHP 8.1
  • GitHub Check: PHP 8.3
  • GitHub Check: E2E against WordPress latest
  • GitHub Check: build
  • GitHub Check: Analyze (javascript)
  • GitHub Check: Basic CS and QA checks
🔇 Additional comments (7)
src/class-parsely.php (7)

1011-1011: Good improvement: Ensures consistency in canonical URL processing.

This change ensures that stored canonical URLs are also processed through the get_canonical_url() method, applying domain override logic and maintaining consistency with the new functionality.


1017-1017: Excellent internationalization implementation.

The fallback string is now properly internationalized using the __() function with the correct text domain. This follows WordPress coding standards and ensures proper localization support.


1024-1031: Well-documented method changes.

The documentation clearly explains the new domain override functionality and includes proper version information. The description accurately reflects the method's behavior.


1037-1040: Proper implementation of WordPress filter pattern.

The filter hook wp_parsely_canonical_url_domain follows WordPress conventions and provides a clean way for developers to override the canonical domain. The null default value is appropriate.


1042-1053: Robust domain override logic with proper validation.

The implementation correctly handles domain override with proper input validation:

  • Strips trailing slashes and protocol prefixes
  • Validates that the processed domain is not empty
  • Uses wp_parse_url() for reliable URL parsing
  • Performs safe string replacement

The regex pattern #^https?://# is correctly implemented to remove protocol prefixes.
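
As a minimal sketch of the steps listed above, assuming plain PHP outside WordPress: `replace_domain_sketch()` is a hypothetical helper (not the plugin's method), and `parse_url()` stands in for `wp_parse_url()`. The override value would come from the `wp_parsely_canonical_url_domain` filter, which defaults to null.

```php
<?php
/**
 * Hypothetical helper, not the plugin's actual code: swaps the host of $url
 * for $override after normalizing the override value.
 */
function replace_domain_sketch( string $url, ?string $override ): string {
	if ( null === $override ) {
		return $url; // No override supplied; leave the URL untouched.
	}

	// Strip the protocol prefix and any trailing slashes.
	$domain = rtrim( (string) preg_replace( '#^https?://#', '', trim( $override ) ), '/' );
	if ( '' === $domain ) {
		return $url; // An empty override is ignored.
	}

	// parse_url() stands in for wp_parse_url() outside WordPress.
	$host = parse_url( $url, PHP_URL_HOST );
	if ( ! is_string( $host ) || '' === $host ) {
		return $url; // URL without a recognizable host.
	}

	// Replace only the first occurrence of the host.
	$pos = strpos( $url, $host );
	return false === $pos ? $url : substr_replace( $url, $domain, $pos, strlen( $host ) );
}

// Prints: https://mydomain.com/2025/07/post/
echo replace_domain_sketch( 'https://staging.example.com/2025/07/post/', 'https://mydomain.com/' ), PHP_EOL;
```

Replacing only the first occurrence is a choice made for this sketch, so that a path segment that happens to repeat the host name is left alone.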


1055-1064: Improved domain replacement logic.

The updated logic is more efficient and clearer:

  • Only performs replacement when necessary (when home URL host differs from site ID)
  • Uses proper URL parsing with wp_parse_url()
  • Includes helpful inline comments explaining the logic

This is a significant improvement over the previous implementation.
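
The "only when necessary" condition might look roughly like the following. This is a sketch under assumptions: `align_with_site_id_sketch()` is a hypothetical name, the home URL and Site ID are passed in as plain strings, and `parse_url()` again stands in for `wp_parse_url()`.

```php
<?php
/**
 * Hypothetical sketch: points the host of $url at $site_id, but only when the
 * host of the site's home URL differs from the Site ID.
 */
function align_with_site_id_sketch( string $url, string $home_url, string $site_id ): string {
	$home_host = parse_url( $home_url, PHP_URL_HOST );

	// Nothing to do when the host is unknown or already matches the Site ID.
	if ( ! is_string( $home_host ) || $home_host === $site_id ) {
		return $url;
	}

	// Swap the first occurrence of the home host for the Site ID.
	$pos = strpos( $url, $home_host );
	return false === $pos ? $url : substr_replace( $url, $site_id, $pos, strlen( $home_host ) );
}

// Mismatch: the host is replaced. Prints: https://example.com/news/
echo align_with_site_id_sketch( 'https://wp.example.org/news/', 'https://wp.example.org', 'example.com' ), PHP_EOL;

// Match: the URL is returned as-is. Prints: https://example.com/news/
echo align_with_site_id_sketch( 'https://example.com/news/', 'https://example.com', 'example.com' ), PHP_EOL;
```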


1077-1084: Enhanced URL processing with proper sanitization.

The changes improve security and consistency:

  • URL is normalized through get_canonical_url() before saving
  • Proper sanitization with sanitize_url() restricting to HTTP/HTTPS schemes
  • Maintains the same return value pattern

This ensures that stored canonical URLs are always in the correct format and properly sanitized.
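
To illustrate the scheme restriction, here is a simplified stand-in written in plain PHP. In WordPress itself the call would be `sanitize_url( $url, array( 'http', 'https' ) )`; `restrict_to_http_sketch()` below is a hypothetical approximation that only checks the scheme, whereas the real function also cleans invalid characters and handles scheme-less input differently.

```php
<?php
/**
 * Hypothetical stand-in for sanitize_url( $url, array( 'http', 'https' ) ):
 * returns the trimmed URL if its scheme is http or https, else an empty string.
 */
function restrict_to_http_sketch( string $url ): string {
	$url    = trim( $url );
	$scheme = parse_url( $url, PHP_URL_SCHEME );

	if ( ! is_string( $scheme ) || ! in_array( strtolower( $scheme ), array( 'http', 'https' ), true ) ) {
		return ''; // Disallowed or missing scheme: nothing is saved.
	}

	return $url;
}

var_export( restrict_to_http_sketch( 'https://mydomain.com/post/' ) ); // 'https://mydomain.com/post/'
echo PHP_EOL;
var_export( restrict_to_http_sketch( 'javascript:alert(1)' ) );        // ''
echo PHP_EOL;
```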

@acicovic acicovic merged commit 7905106 into develop Jul 8, 2025
32 checks passed
@acicovic acicovic deleted the fix/allow-domain-override-in-canonical-urls branch July 8, 2025 13:14
github-actions bot added a commit that referenced this pull request Jul 8, 2025

Labels

Changelog: Fixed (PR to be added under the changelog's "Fixed" section)

Development

Successfully merging this pull request may close these issues.

Fix Engagement Boost canonical URL issues

2 participants