Add a PR check test to catch dead links during docs changes#9099
Add a PR check test to catch dead links during docs changes#9099kmh287 merged 40 commits intoampproject:masterfrom rsimha:2017-05-02-LinkCheck
Conversation
|
Update: The invalid link I inserted into the PR was successfully caught. However, it looks like we need to disregard localhost:8000 links as they are unlikely to work during Travis runs. See https://travis-ci.org/ampproject/amphtml/builds/228065575 Investigating... |
|
@aghassemi / @mrjoro, we now have a working test for broken links. See https://travis-ci.org/ampproject/amphtml/builds/228145555 for a sample output. This PR is now ready for review. |
build-system/pr-check.js
Outdated
| execOrDie('npm run ava'); | ||
| }, | ||
| testDocumentLinks: function(files) { | ||
| files.forEach((file) => { |
There was a problem hiding this comment.
nit: no parenthesis: forEach(file => { ...
build-system/pr-check.js
Outdated
| testDocumentLinks: function(files) { | ||
| files.forEach((file) => { | ||
| if (isDocFile(file)) { | ||
| execOrDie('build-system/test-links.sh ' + file); |
There was a problem hiding this comment.
Could we do this in JS?
- create a new file in
tasksfolder. - Use https://github.com/tcort/markdown-link-check/blob/master/markdown-link-check as reference to read file, replace localhost and then call
markdownLinkCheck()JS method instead of the binary. - call the newly created task from here similar to other tasks like
compile.
my experience with BASH has always involved some incompatibility between platforms, would be nice to avoid it.
package.json
Outdated
| "karma-sinon-chai": "1.2.0", | ||
| "lazypipe": "1.0.1", | ||
| "lolex": "1.4.0", | ||
| "markdown-link-check": "3.0.3", |
There was a problem hiding this comment.
the yarn.lock file needs updating. if you use yarn add markdown-link-check it would add it to both files.
rsimha
left a comment
There was a problem hiding this comment.
@aghassemi: I've removed the shell script and added a new gulp task called check-links. I set out to implement the link checker entirely in JS, but ended up having to duplicate too much of the original code in markdown-link-check.
I then used a hybrid approach, similar to other parts of pr-check.js, involving execOrDie. I'm still debugging an issue that has to do with node exiting before the link checking is complete, but other than that, this PR is more or less complete.
I'll pick this up tomorrow, but figured you should have another look.
build-system/pr-check.js
Outdated
| execOrDie('npm run ava'); | ||
| }, | ||
| testDocumentLinks: function(files) { | ||
| files.forEach((file) => { |
package.json
Outdated
| "karma-sinon-chai": "1.2.0", | ||
| "lazypipe": "1.0.1", | ||
| "lolex": "1.4.0", | ||
| "markdown-link-check": "3.0.3", |
aghassemi
left a comment
There was a problem hiding this comment.
Left some comments. I think a pure JS approach can be simpler, take a look and see if my proposal makes sense.
| docFiles.push(file); | ||
| } | ||
| }); | ||
| execOrDie(`${gulp} check-links --files ${docFiles.join(',')}`); |
There was a problem hiding this comment.
We already import all the functions defined in tasks/*, we should be able to make this just be:
checkLinks(docFiles);
Unless there is something I am missing?
There was a problem hiding this comment.
Good point. Done. I was following the example set in pr-check.js for the other gulp tests.
| /** | ||
| * Checks for dead links in the list of files. | ||
| */ | ||
| function checkLinks() { |
There was a problem hiding this comment.
to make my comment above work we need to:
exports.checkLinks(files) {///}
then we can get rid of the argv logic here and remove execOrDie
There was a problem hiding this comment.
The reason I inserted the argv logic is to enable the use case where someone makes a local change and wants to do "gulp check-links --files FOO.md" before committing / pushing. Removing the argv logic will break this use case.
I have added the export and removed execOrDie. Done.
There was a problem hiding this comment.
got it. makes sense. let me know when ready for final look.
There was a problem hiding this comment.
This is now ready for a final look. See https://travis-ci.org/ampproject/amphtml/builds/228915159 for a sample.
(Note that gulp has some issues where it reports the wrong time, but I'll look into that in a separate PR. BTW, this will now work locally from the command line too, like so: "gulp check-links --files dir_a/FOO.md,dir_b/BAR.md")
build-system/tasks/check-links.js
Outdated
| * @param {string} filteredMarkdownFile Path of file, relative to src root. | ||
| */ | ||
| function runLinkChecker(filteredMarkdownFile) { | ||
| var cmd = 'markdown-link-check ' + filteredMarkdownFile; |
There was a problem hiding this comment.
Let's call markdownLinkCheck directly. This way we don't need to rewrite the filtered file back to the filesystem either.
Something like (untested) :
function runLinkChecker(markdownFile) {
var filteredMarkdownContent = filterLocalhostLinks(markdownFile),
var opts = {};
opts.baseUrl = 'file://' + path.dirname(path.resolve(filepath));
markdownLinkCheck(filteredMarkdownContent, opts, function (err, results) {
results.forEach(function (result) {
if(result.status === 'dead') {
error = true;
console.log('[%s] %s', 'X', result.link);
}
});
if(error) {
console.error('\nERROR: dead links found!');
process.exit(1);
}
});
}
There was a problem hiding this comment.
Refactored and rewritten based on one of my previous commits.
build-system/tasks/check-links.js
Outdated
| function filterLocalhostLinks(markdownFile) { | ||
| var contents = fs.readFileSync(markdownFile).toString(); | ||
| var filteredontents = contents.replace(/http:\/\/localhost:8000\//g, ''); | ||
| var filteredMarkdownFile = markdownFile + '.without_localhost_links'; |
There was a problem hiding this comment.
see my comment below, if we go with that approach, we can just return filteredontents; here
build-system/tasks/check-links.js
Outdated
| markdownFiles.forEach(function(markdownFile) { | ||
| util.log('Checking links in', util.colors.magenta(files), '...'); | ||
| var filteredMarkdownFile = filterLocalhostLinks(markdownFile); | ||
| runLinkChecker(filteredMarkdownFile); |
There was a problem hiding this comment.
see my comment below, if we go with that approach this becomes:
runLinkChecker(markdownFile);
build-system/pr-check.js
Outdated
| }, | ||
| testDocumentLinks: function(files) { | ||
| let docFiles = []; | ||
| files.forEach(file => { |
There was a problem hiding this comment.
change to docFiles = files.filter(isDocFile);
build-system/tasks/check-links.js
Outdated
| */ | ||
| function checkLinksInFiles(markdownFiles) { | ||
| util.log('Checking links in', util.colors.magenta(markdownFiles), '...'); | ||
| markdownFiles.forEach(function(markdownFile) { |
There was a problem hiding this comment.
markdownFiles.forEach(runLinkChecker);
build-system/tasks/check-links.js
Outdated
| var files = ''; | ||
| if (argv.files) { | ||
| files = argv.files; | ||
| } else { |
There was a problem hiding this comment.
lets make the branching simpler by assigning early and doing an if (!files) check.
build-system/tasks/check-links.js
Outdated
| opts.baseUrl = 'file://' + path.dirname(path.resolve((markdownFile))); | ||
|
|
||
| var filteredMarkdown = filterLocalhostLinks(markdownFile); | ||
| markdownLinkCheck(filteredMarkdown, opts, function (error, results) { |
There was a problem hiding this comment.
nit, remove space between "function" and "("
here and below
|
@aghassemi, this is ready for check in. Could you take a final look, and merge this if you're satisfied? |
aghassemi
left a comment
There was a problem hiding this comment.
Looks great. Just a small request to use BlueBird (already a dependency) instead of Q (not an existing dependency)
build-system/tasks/check-links.js
Outdated
| var gulp = require('gulp-help')(require('gulp')); | ||
| var markdownLinkCheck = require('markdown-link-check'); | ||
| var path = require('path'); | ||
| var Q = require('q'); |
There was a problem hiding this comment.
we don't have q as a dependency in package.json use bluebird instead: var BBPromise = require('bluebird');
build-system/tasks/check-links.js
Outdated
| } | ||
|
|
||
| var markdownFiles = files.split(','); | ||
| deferred = Q.defer(); |
There was a problem hiding this comment.
var promise = new BBPromise(r => {
resolve = r;
});
....
return promise;
There was a problem hiding this comment.
Done, without arrow function. Gulp needs to be ES5 compatible or some machines will not be able to run "gulp test".
build-system/tasks/check-links.js
Outdated
| var Q = require('q'); | ||
| var util = require('gulp-util'); | ||
|
|
||
| var deferred; |
build-system/tasks/check-links.js
Outdated
| util.colors.magenta(markdownFile), 'are alive.'); | ||
| } | ||
| if (++filesChecked == fileCount) { | ||
| deferred.resolve(); |
The comments in this review are already addressed.
|
@aghassemi, could I ask you to merge this? (I don't have permissions yet.) |
build-system/tasks/check-links.js
Outdated
| resolve = r; | ||
| }); | ||
| fileCount = markdownFiles.length; | ||
| markdownFiles.forEach(runLinkChecker); |
There was a problem hiding this comment.
This could be done with Promise.all and a map, avoiding the global resolve.
There was a problem hiding this comment.
I tried this exact approach yesterday and it did not work, most likely because markdownLinkCheck (a third party library) calls linkCheck, yet another (asynchronous) third party library, which invokes the callback that's passed in here.
There was a problem hiding this comment.
Can you point to the commit? I'm 100% confident this will work correctly, because it'd be unbelievably taboo for an npm package to break nodeback conventions.
There was a problem hiding this comment.
I didn't commit it, since it failed when I gulp tested it locally. I can put it together once again.
| * @return {string} Contents of markdown file after filtering localhost links. | ||
| */ | ||
| function filterLocalhostLinks(markdownFile) { | ||
| var markdown = fs.readFileSync(markdownFile).toString(); |
There was a problem hiding this comment.
It's weird that a filter function does that actual reading.
There was a problem hiding this comment.
Good point. This isn't necessary any more. Moved two lines into the caller. Done.
build-system/tasks/check-links.js
Outdated
| baseUrl : 'file://' + path.dirname(path.resolve((markdownFile))) | ||
| }; | ||
|
|
||
| markdownLinkCheck(filteredMarkdown, opts, function(error, results) { |
There was a problem hiding this comment.
Converting markdownLinkCheck into a promise returning function makes it easier to deal with. BB provides .promisify to do so:
// top of file
var markdownLinkCheck = BBPromise .promisify(require('markdown-link-check'));
// here, now returns a promise
return markdownLinkCheck(filteredMarkdown, opts).then(results => {
// do stuff
});
build-system/tasks/check-links.js
Outdated
| error = true; | ||
| util.log('[%s] %s', chalk.red('✖'), result.link); | ||
| } else { | ||
| util.log('[%s] %s', chalk.green('✓'), result.link); |
There was a problem hiding this comment.
Intentional, since this will only be run in the relatively rare instance when a documentation change is made. The typical PR won't have this task run against it.
|
@jridgewell / @aghassemi, can one of you please merge this? |
build-system/tasks/check-links.js
Outdated
|
|
||
| markdownLinkCheck(filteredMarkdown, opts, function(error, results) { | ||
| return markdownLinkCheck(filteredMarkdown, opts) | ||
| .then(function(error, results) { |
There was a problem hiding this comment.
results will always be undefined here, as error will cause the promise to reject (and is thus not present in the fulfilled path). Rename error to results, and remove the second param.
There was a problem hiding this comment.
Good catch. That was the part I missed when I switched to using bluebird promises. Fixed.
build-system/tasks/check-links.js
Outdated
| // resolve = r; | ||
| // }); | ||
| // fileCount = markdownFiles.length; | ||
| markdownFiles.forEach(runLinkChecker); |
There was a problem hiding this comment.
This should be a #map, now.
build-system/tasks/check-links.js
Outdated
| var markdownFiles = files.split(','); | ||
| var linkCheckers = []; | ||
| markdownFiles.forEach(function(markdownFile) { | ||
| linkCheckers.push(runLinkChecker(markdownFile)); |
There was a problem hiding this comment.
This is a #map:
var checks = markdownFiles.map(function(file) {
return runLinkChecker(file);
});
return BBPromise.all(checks);
build-system/tasks/check-links.js
Outdated
| util.log( | ||
| util.colors.red('ERROR'), 'Dead links found in', | ||
| util.colors.magenta(markdownFile), '(please update it).'); | ||
| process.exit(1); |
There was a problem hiding this comment.
So this will fail the build after the first failed link in a single markdown file, but there may be others in other files. It'd be more helpful to check all markdown files, then fail at the end. We can do that by not going through all the errors now, but after the Promise.all call (which will receive an array of results).
There was a problem hiding this comment.
I exited immediately in keeping with the pattern in pr-check, which bails at the first error. You make a good point about being able to fix all links in all files in one shot. Done.
There was a problem hiding this comment.
Looking good at https://travis-ci.org/ampproject/amphtml/builds/229325351. Tested failure case locally. This also works from the command line as "gulp check-links --files dir_a/FOO.md,dir_b/BAR.md".
@jridgewell If there are no more comments, can you merge this?
|
@aghassemi, could you please merge this, since I don't have write access? |
|
Merged at @rsimha-amp 's request. |
|
This PR made the master red: |
|
@muxin, the error seems to suggest that there were no files in the list of files in that particular run (files was undefined, when it should have contained the list of files in the PR). Is this to be expected? And how do I see the recent builds on master to see if it's still red? |
|
The problem seems to be line 241 where the method is invoked without args. You can see "master" and other branches here: https://travis-ci.org/ampproject/amphtml/branches |
|
Fix coming up. |
|
Fix in PR #9213. |
|
Confirmed that master is past the error caused by this PR. |
This PR adds a new step to pr-check.js that checks for dead links when a docs change is made. It uses an npm command line tool called markdown-link-check.
Fixes #7946