Refactor validator specification integration #4566

@schlessera

Description

Some of the validation our sanitizers perform is based on the official AMP validator specification, which lives in https://github.com/ampproject/amphtml.

Current Process

  1. Clone the https://github.com/ampproject/amphtml repository.
  2. Use Python to assemble all protoascii files (main + extensions).
  3. Use Python to parse the assembled protoascii file into a PHP class containing the spec in array form (see https://github.com/ampproject/amp-wp/blob/1.4.4/includes/sanitizers/class-amp-allowed-tags-generated.php).
  4. Load the PHP class into the plugin and traverse the provided arrays in several ways to power the sanitizer logic.
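
To make step 3 more concrete, here is a minimal sketch of how an already parsed spec could be rendered as a PHP array literal. It is not the actual generator script: the function name `to_php_literal` and the sample data are hypothetical, and the protoascii parsing itself is assumed to have happened beforehand.

```python
def to_php_literal(value, indent=0):
    """Render a nested Python structure as PHP array source (hypothetical helper)."""
    pad = "\t" * indent
    if isinstance(value, dict):
        if not value:
            return "array()"
        lines = ["array("]
        for key, item in value.items():
            lines.append(f"{pad}\t{to_php_literal(key)} => {to_php_literal(item, indent + 1)},")
        lines.append(f"{pad})")
        return "\n".join(lines)
    if isinstance(value, (list, tuple)):
        if not value:
            return "array()"
        lines = ["array("]
        for item in value:
            lines.append(f"{pad}\t{to_php_literal(item, indent + 1)},")
        lines.append(f"{pad})")
        return "\n".join(lines)
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else is emitted as a single-quoted PHP string.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


# Hypothetical sample data, standing in for the parsed spec.
sample_spec = {
    "example-tag": [
        {"spec_name": "example spec", "mandatory": True, "attrs": ["src", "width"]},
    ],
}

if __name__ == "__main__":
    print("<?php\nreturn " + to_php_literal(sample_spec) + ";")
```

The sketch shows why the generated file grows quickly: every nested message becomes another nested `array(...)`, and repeated strings are written out in full each time.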

Problems with this process that need solving

  • The project cloned in step 1 is very large, so cloning it costs a lot of time and bandwidth.
  • Steps 2 and 3 require Python, which adds an entire additional ecosystem to the project's dependencies just to parse the spec.
  • Step 4 loads the spec in an all-or-nothing fashion. To validate a single value, the complete spec (more specifically, the subset we converted to PHP) needs to be loaded into memory.
  • The data loaded in step 4 is a huge list of arrays within arrays that contain repeated strings and other values that could be normalized.
  • Because step 4 is based on arrays, which are efficient for key-based retrieval but inefficient for value-based retrieval, we end up with a structure that is optimized mostly for one predetermined dimension and is ill-suited to any other dimension, including ones that may only come up in future requirements (see "Refactor generated spec data to facilitate looking up by spec name" #3817 for an example).
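
The last point can be illustrated with a small sketch. It uses Python only to keep the example short, but PHP arrays behave the same way. The data layout, the spec names and both function names are hypothetical and only loosely modeled on the generated class.

```python
# Hypothetical spec data, keyed by tag name (the one pre-determined dimension).
SPEC_BY_TAG = {
    "example-script": [
        {"spec_name": "example script A", "attrs": ["async", "src"]},
        {"spec_name": "example script B", "attrs": ["async", "src", "custom"]},
    ],
    "example-img": [
        {"spec_name": "example image", "attrs": ["src", "width", "height"]},
    ],
}


def find_by_spec_name_scan(spec, spec_name):
    """Value-based retrieval: every entry has to be visited until one matches."""
    for tag, entries in spec.items():
        for entry in entries:
            if entry["spec_name"] == spec_name:
                return tag, entry
    return None


def build_spec_name_index(spec):
    """Build a second index once so that later lookups by spec name are key-based."""
    return {
        entry["spec_name"]: (tag, entry)
        for tag, entries in spec.items()
        for entry in entries
    }


if __name__ == "__main__":
    # Key-based retrieval on the existing dimension is a direct lookup.
    print(len(SPEC_BY_TAG["example-script"]))

    # Looking up by spec name means a full scan...
    print(find_by_spec_name_scan(SPEC_BY_TAG, "example image"))

    # ...unless an additional index is built for that dimension.
    index = build_spec_name_index(SPEC_BY_TAG)
    print(index["example image"])
```

Each new lookup dimension needs either a scan like this or yet another derived index, which is the limitation described above.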

Status

In Progress
