Filter out script links from .md files during check-links and make failures PR blocking#9552
Filter out script links from .md files during check-links and make failures PR blocking#9552rsimha merged 4 commits intoampproject:masterfrom rsimha:2017-05-25-LinkChecker
Conversation
| </tr> | ||
| <tr> | ||
| <td width="40%"><strong>Required Script</strong></td> | ||
| <td><code><script async custom-element="amp-animation" src="https://cdn.ampproject.org/v0/amp-animation-0.1.js"></script></code></td> |
There was a problem hiding this comment.
Are you referring to the escaped "<" tags in the raw .md source? I searched through all the AMP source code and found numerous instances of escaped tags. Perhaps we need to do a wholesale fix?
There was a problem hiding this comment.
That was supposed to appear as "& l t ;" (without spaces) and not as an open angle brace.
There was a problem hiding this comment.
No, you added foobar. which doesn't make sense.
There was a problem hiding this comment.
Oh, that was a test change that I didn't mean to commit :) I hadn't yet published this PR for review, and was merely testing a definitely-broken script tag link to make sure it got filtered.
I'll republish this PR without that change.
There was a problem hiding this comment.
Removed test change.
| } | ||
| var markdown = fs.readFileSync(markdownFile).toString(); | ||
| var filteredMarkdown = filterLocalhostLinks(markdown); | ||
| filteredMarkdown = filterScriptTagLinks(filteredMarkdown); |
There was a problem hiding this comment.
Would it make sense to filter all links that start with https? and don't have a https://github.com/ampproject/ prefix?
There was a problem hiding this comment.
I don't think it would work if we filtered all https:// links that are not on github.com, since there are several genuine https links. For example, https://wiki.saucelabs.com/display/DOCS/Setting+Up+Sauce+Connect+Proxy.
| * @param {string} markdown Original markdown. | ||
| * @return {string} Markdown after filtering out links in script tags. | ||
| */ | ||
| function filterScriptTagLinks(markdown) { |
There was a problem hiding this comment.
theres nothing dynamic here, just use a regexp literal
|
@cramforce, @erwinmombay, I just noticed that replacing the escaped html tags like "& l t ;" with angular braces caused the spurious failures to go away. I can send out a separate PR to clean up tags in all our md files. In the meantime, I still think it's worth merging this PR because there are md files like spec/amp-var-substitutions.md that refer to links like https://foo.com/pixel?host=AMPDOC_HOST within script tags, and we'd need to filter them out before we can block PRs. Let me know what you think. |
|
I'd not be surprised if replacing the entities breaks downstream processing of these files. @pbakaus Could you take a look. |
|
@pbakaus, here's an example file: https://raw.githubusercontent.com/ampproject/amphtml/master/extensions/amp-animation/amp-animation.md. If you see the raw file contents, there are a couple of escaped "& lt;" tags, while the corresponding > tags are unescaped. I was talking about replacing all escaped "& lt;" and "& gt;" tags in .md files with their unescaped equivalents. Let me know if you think this will break things downstream. |
|
Update: I've sent out PR #9557 to fix the inconsistent tags in all our .md files. That PR can be merged independent of this one. |
|
@rsimha-amp this will most definitely break us, at least in your example. The reason the escaping is in place in the linked example is the HTML table, that doesn't allow you to do markdown-style codeblocks within. When we import in https://github.com/ampproject/docs, we can deal with unescaped stuff in proper code blocks, but since this is arbitrary HTML, it's more tricky. |
| * @param {string} markdown Original markdown. | ||
| * @return {string} Markdown after filtering out links in script tags. | ||
| */ | ||
| function filterScriptTagLinks(markdown) { |
There was a problem hiding this comment.
This regex is too greedy.
There was a problem hiding this comment.
Nice catch. Fix sent out for review in #9596.
Fixes #9470