
Refactor URLValidationCron to schedule events to validate individual URLs #5750

@westonruter

Description


Feature description

As noted in #5515 (comment):

We can address this in a subsequent PR, but I have an idea for how this could be modified to avoid using sleep(). Instead of one single event that scans all of the URLs, we can register a separate event for each URL and offset the time() for each event by 10-minute intervals (for example). In this way only one URL would be validated during any given cron request, and there would be no need for sleep().
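To illustrate the staggering idea, here is a minimal, language-neutral sketch in Python (not the plugin's PHP). It assumes a hypothetical hook name `amp_validate_url` and uses the 600-second spacing from the "10-minute intervals" example. It only computes the schedule; it does not call WP-Cron.

```python
from dataclasses import dataclass

# Spacing taken from the "10-minute intervals" example in the comment.
TEN_MINUTES = 10 * 60


@dataclass(frozen=True)
class ScheduledEvent:
    timestamp: int  # Unix time at which the single-run event should fire
    hook: str       # action hook name (hypothetical)
    url: str        # the one URL this event validates


def schedule_staggered_events(urls, now, interval=TEN_MINUTES, hook="amp_validate_url"):
    """Return one single-run event per URL, each offset by `interval` seconds.

    Consecutive events are `interval` seconds apart, so a cron request that runs
    at least once per interval typically finds at most one due event, and no
    sleep() is needed between validations.
    """
    return [
        ScheduledEvent(timestamp=now + i * interval, hook=hook, url=url)
        for i, url in enumerate(urls)
    ]


events = schedule_staggered_events(
    ["https://example.com/", "https://example.com/about/"], now=1_700_000_000
)
```

One caveat with this approach: on a low-traffic site where cron has not run for a while, several staggered events may all be due in the same request, which is one reason to prefer a queue that is drained one URL per run.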

In other words, this would be similar to SavePostValidationEvent in which one URL is scheduled for validation. In fact, that class could be refactored to pass the URL as the schedule argument as opposed to the $post_id. It could then be used for scheduling validation after editing a post as well as for individually re-validating the sample set of URLs.
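As a rough model of passing the URL (rather than `$post_id`) as the event argument, the Python sketch below uses a dict in place of the cron event store. The class name, hook name, and duplicate-skipping rule are assumptions for illustration, not the existing SavePostValidationEvent API. Keying pending events on the URL lets the same event serve both the post-edit case and the sample-set re-validation.

```python
class UrlValidationEventScheduler:
    """Hypothetical model: single validation events whose only argument is a URL."""

    def __init__(self, hook="amp_validate_single_url"):
        self.hook = hook
        self.pending = {}  # maps (hook, args) -> scheduled Unix timestamp

    def schedule(self, url, timestamp):
        """Schedule validation of `url`; return False if one is already pending."""
        key = (self.hook, (url,))
        if key in self.pending:
            return False  # skip duplicate events for the same URL
        self.pending[key] = timestamp
        return True

    def schedule_for_post_update(self, permalink, now):
        # After a post is edited, validate its permalink as soon as possible.
        return self.schedule(permalink, now)

    def schedule_sample_set(self, urls, now, interval=600):
        # Re-validating the sample set reuses the same event, staggered in time.
        return [self.schedule(url, now + i * interval) for i, url in enumerate(urls)]
```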


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • Only one URL is validated per process (cron run). This will prevent timeouts from occurring.

Implementation brief

  1. Create a new option to store a queue of URLs to validate (e.g. amp_url_validation_queue).
  2. Once a day (or week), the URLValidationCron service should enqueue the URLs returned by \AmpProject\AmpWP\Validation\ScannableURLProvider::get_urls() into this stored option.
  3. Register a recurring event (every 10 minutes) to dequeue a URL from the stored option. If there is no URL, then abort.
  4. Fetch the validation results for the URL and store the results.
  5. Remove Conditional from the URLValidationCron service (and thus remove the amp_temp_validation_cron_tasks_enabled filter).
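Steps 1 to 4 above can be sketched as the following Python model. A plain dict stands in for the options table, and `validate` stands in for fetching validation results. Only `amp_url_validation_queue` comes from the brief; the results option name is hypothetical. The real service would be PHP using WP-Cron.

```python
QUEUE_OPTION = "amp_url_validation_queue"
RESULTS_OPTION = "amp_url_validation_results"  # hypothetical option name


def enqueue_urls(options, urls):
    """Daily/weekly task: add scannable URLs to the stored queue, skipping duplicates."""
    queue = list(options.get(QUEUE_OPTION, []))
    for url in urls:
        if url not in queue:
            queue.append(url)
    options[QUEUE_OPTION] = queue


def process_next_url(options, validate):
    """Recurring task (e.g. every 10 minutes): validate at most one URL per run.

    Returns the validated URL, or None when the queue is empty (abort).
    """
    queue = list(options.get(QUEUE_OPTION, []))
    if not queue:
        return None
    url = queue.pop(0)
    # Persist the dequeue before the slow validation request, so a URL whose
    # validation times out is not retried on every subsequent run.
    options[QUEUE_OPTION] = queue
    results = dict(options.get(RESULTS_OPTION, {}))
    results[url] = validate(url)
    options[RESULTS_OPTION] = results
    return url
```

Because each recurring run dequeues exactly one URL, the acceptance criterion of one validation per cron process holds even if several runs were missed.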

QA testing instructions

Demo

Changelog entry
