Revisit PHP representation of AMP validator spec

The AMP validator spec is converted from protoascii into PHP via the [`bin/amphtml-update.py`](https://github.com/ampproject/amp-wp/blob/4d47a064dc7e0e833682318a4afe3428a07fc934/bin/amphtml-update.py) Python script. The script tries to extract only the information that the tag-and-attribute sanitizer needs, but the generated PHP file is still large: 451KB, so large in fact that it crashes phpcs: https://github.com/ampproject/amp-wp/pull/2767#issuecomment-509386369. Just requiring this file increases memory usage by 4MB. We could look at splitting up the file by tag so that we only load the data for tags that are actually encountered.
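To make the per-tag split concrete, here is a minimal sketch of how the generator side could group specs by tag name and write one file per tag. It is only an illustration: the `tag_name` key, the function names, and the use of JSON files in place of generated PHP files are all assumptions made to keep the example self-contained, not the current behavior of `bin/amphtml-update.py`.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def group_specs_by_tag(tag_specs):
    """Group a flat list of tag specs by lowercased tag name.

    Assumes each spec is a dict with a "tag_name" key (hypothetical shape).
    """
    grouped = defaultdict(list)
    for spec in tag_specs:
        grouped[spec["tag_name"].lower()].append(spec)
    return dict(grouped)


def write_per_tag_files(grouped, out_dir):
    """Write one file per tag so a consumer can load a single tag's data."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for tag_name, specs in grouped.items():
        # Tag names are used directly as file names here; a real
        # implementation would likely need to sanitize them first.
        (out_dir / f"{tag_name}.json").write_text(json.dumps(specs))


def load_specs_for_tag(out_dir, tag_name):
    """Load the specs for one tag only; unknown tags yield an empty list."""
    path = Path(out_dir) / f"{tag_name.lower()}.json"
    if not path.exists():
        return []
    return json.loads(path.read_text())
```

With a layout like this, the sanitizer would only pay the memory cost for tags that actually appear in the document being processed, at the price of one file read per distinct tag.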

Stepping back a bit further, we should also consider whether the Python script is the best way to extract the validator spec into PHP. It turns out that @fstanis has worked on making the validator spec available in JSON format: https://github.com/ampproject/amphtml/pull/22528. At the very least, this would mean that we could rewrite the spec extraction logic in PHP (or even JS) instead of Python (which has the required protobuf library, though there is probably a PHP protobuf library that could have been used instead). In any case, JSON is much more comfortable to work with than protoascii, as long as there is no loss of fidelity in the [conversion](https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-module), which is done in [`validator/validator_gen_js.py`](https://github.com/ampproject/amphtml/blob/b66f2fa8dde074f0adadc030316e69c92c47da8d/validator/validator_gen_js.py#L899-L919):

```py
  rules = validator_pb2.ValidatorRules()
  text_format.Merge(open(specfile).read(), rules)
  out.append(json_format.MessageToJson(rules))
```

Another benefit is that we would no longer have to download the entire amphtml repo, since the full spec in JSON format is always available at https://cdn.ampproject.org/v0/validator.json

This JSON file is only ~250KB as opposed to an archive export of the amphtml repo which is 100MB+.
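As a rough illustration of consuming the JSON spec directly, the sketch below fetches the file and lists the tag names it defines. The `tags` and `tagName` keys are assumptions based on how protobuf's JSON mapping renders a repeated `tags` field and a `tag_name` field; they should be checked against the actual file before relying on them. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

VALIDATOR_JSON_URL = "https://cdn.ampproject.org/v0/validator.json"


def fetch_validator_rules(url=VALIDATOR_JSON_URL):
    """Download and parse the validator spec (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def tag_names(rules):
    """Return the sorted, de-duplicated tag names found in the parsed spec.

    Assumes the top-level object has a "tags" list whose items carry a
    "tagName" key; entries without that key are skipped.
    """
    return sorted(
        {tag["tagName"] for tag in rules.get("tags", []) if "tagName" in tag}
    )
```

Calling `tag_names(fetch_validator_rules())` would then give a quick overview of which tags the spec covers, without cloning or exporting the amphtml repo.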

Revisit PHP representation of AMP validator spec #2769
