Skip to content

Filter out script links from .md files during check-links and make failures PR blocking#9552

Merged
rsimha merged 4 commits intoampproject:masterfrom
rsimha:2017-05-25-LinkChecker
May 25, 2017
Merged

Filter out script links from .md files during check-links and make failures PR blocking#9552
rsimha merged 4 commits intoampproject:masterfrom
rsimha:2017-05-25-LinkChecker

Conversation

@rsimha
Copy link
Copy Markdown
Contributor

@rsimha rsimha commented May 25, 2017

Fixes #9470

</tr>
<tr>
<td width="40%"><strong>Required Script</strong></td>
<td><code>&lt;script async custom-element="amp-animation" src="https://cdn.ampproject.org/v0/amp-animation-0.1.js">&lt;/script></code></td>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the escaped "<" tags in the raw .md source? I searched through all the AMP source code and found numerous instances of escaped tags. Perhaps we need to do a wholesale fix?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was supposed to appear as "& l t ;" (without spaces) and not as an open angle brace.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you added foobar. which doesn't make sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that was a test change that I didn't mean to commit :) I hadn't yet published this PR for review, and was merely testing a definitely-broken script tag link to make sure it got filtered.

I'll republish this PR without that change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed test change.

}
var markdown = fs.readFileSync(markdownFile).toString();
var filteredMarkdown = filterLocalhostLinks(markdown);
filteredMarkdown = filterScriptTagLinks(filteredMarkdown);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to filter all links that start with https? and don't have a https://github.com/ampproject/ prefix?

Copy link
Copy Markdown
Contributor Author

@rsimha rsimha May 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would work if we filtered all https:// links that are not on github.com, since there are several genuine https links. For example, https://wiki.saucelabs.com/display/DOCS/Setting+Up+Sauce+Connect+Proxy.

* @param {string} markdown Original markdown.
* @return {string} Markdown after filtering out links in script tags.
*/
function filterScriptTagLinks(markdown) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

theres nothing dynamic here, just use a regexp literal

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rsimha rsimha self-assigned this May 25, 2017
@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 25, 2017

@cramforce, @erwinmombay, I just noticed that replacing the escaped html tags like "& l t ;" with angular braces caused the spurious failures to go away. I can send out a separate PR to clean up tags in all our md files.

In the meantime, I still think it's worth merging this PR because there are md files like spec/amp-var-substitutions.md that refer to links like https://foo.com/pixel?host=AMPDOC_HOST within script tags, and we'd need to filter them out before we can block PRs.

Let me know what you think.

@cramforce
Copy link
Copy Markdown
Member

I'd not be surprised if replacing the entities breaks downstream processing of these files. @pbakaus Could you take a look.

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 25, 2017

@pbakaus, here's an example file: https://raw.githubusercontent.com/ampproject/amphtml/master/extensions/amp-animation/amp-animation.md. If you see the raw file contents, there are a couple of escaped "& lt;" tags, while the corresponding > tags are unescaped.

I was talking about replacing all escaped "& lt;" and "& gt;" tags in .md files with their unescaped equivalents. Let me know if you think this will break things downstream.

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 25, 2017

Update: I've sent out PR #9557 to fix the inconsistent tags in all our .md files. That PR can be merged independent of this one.

@pbakaus
Copy link
Copy Markdown
Contributor

pbakaus commented May 25, 2017

@rsimha-amp this will most definitely break us, at least in your example. The reason the escaping is in place in the linked example is the HTML table, that doesn't allow you to do markdown-style codeblocks within. When we import in https://github.com/ampproject/docs, we can deal with unescaped stuff in proper code blocks, but since this is arbitrary HTML, it's more tricky.

@rsimha rsimha merged commit 5ae9d8f into ampproject:master May 25, 2017
* @param {string} markdown Original markdown.
* @return {string} Markdown after filtering out links in script tags.
*/
function filterScriptTagLinks(markdown) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is too greedy.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Fix sent out for review in #9596.

@rsimha rsimha deleted the 2017-05-25-LinkChecker branch May 26, 2017 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants