Skip to content

Add a PR check test to catch dead links during docs changes#9099

Merged
kmh287 merged 40 commits intoampproject:masterfrom
rsimha:2017-05-02-LinkCheck
May 8, 2017
Merged

Add a PR check test to catch dead links during docs changes#9099
kmh287 merged 40 commits intoampproject:masterfrom
rsimha:2017-05-02-LinkCheck

Conversation

@rsimha
Copy link
Copy Markdown
Contributor

@rsimha rsimha commented May 2, 2017

This PR adds a new step to pr-check.js that checks for dead links when a docs change is made. It uses an npm command line tool called markdown-link-check.

Fixes #7946

@rsimha rsimha requested a review from aghassemi May 2, 2017 18:53
@rsimha rsimha self-assigned this May 2, 2017
@rsimha rsimha requested a review from mrjoro May 2, 2017 18:53
@rsimha rsimha added this to the Sprint H1 May milestone May 2, 2017
@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 2, 2017

Update: The invalid link I inserted into the PR was successfully caught. However, it looks like we need to disregard localhost:8000 links as they are unlikely to work during Travis runs. See https://travis-ci.org/ampproject/amphtml/builds/228065575

Investigating...

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 2, 2017

@aghassemi / @mrjoro, we now have a working test for broken links. See https://travis-ci.org/ampproject/amphtml/builds/228145555 for a sample output.

This PR is now ready for review.

@rsimha rsimha requested a review from erwinmombay May 2, 2017 23:02
execOrDie('npm run ava');
},
testDocumentLinks: function(files) {
files.forEach((file) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no parenthesis: forEach(file => { ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

testDocumentLinks: function(files) {
files.forEach((file) => {
if (isDocFile(file)) {
execOrDie('build-system/test-links.sh ' + file);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do this in JS?

my experience with BASH has always involved some incompatibility between platforms, would be nice to avoid it.

package.json Outdated
"karma-sinon-chai": "1.2.0",
"lazypipe": "1.0.1",
"lolex": "1.4.0",
"markdown-link-check": "3.0.3",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the yarn.lock file needs updating. if you use yarn add markdown-link-check it would add it to both files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Contributor Author

@rsimha rsimha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aghassemi: I've removed the shell script and added a new gulp task called check-links. I set out to implement the link checker entirely in JS, but ended up having to duplicate too much of the original code in markdown-link-check.

I then used a hybrid approach, similar to other parts of pr-check.js, involving execOrDie. I'm still debugging an issue that has to do with node exiting before the link checking is complete, but other than that, this PR is more or less complete.

I'll pick this up tomorrow, but figured you should have another look.

execOrDie('npm run ava');
},
testDocumentLinks: function(files) {
files.forEach((file) => {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

package.json Outdated
"karma-sinon-chai": "1.2.0",
"lazypipe": "1.0.1",
"lolex": "1.4.0",
"markdown-link-check": "3.0.3",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Contributor

@aghassemi aghassemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. I think a pure JS approach can be simpler, take a look and see if my proposal makes sense.

docFiles.push(file);
}
});
execOrDie(`${gulp} check-links --files ${docFiles.join(',')}`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already import all the functions defined in tasks/*, we should be able to make this just be:

checkLinks(docFiles);

Unless there is something I am missing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done. I was following the example set in pr-check.js for the other gulp tests.

/**
* Checks for dead links in the list of files.
*/
function checkLinks() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make my comment above work we need to:

exports.checkLinks(files) {///}

then we can get rid of the argv logic here and remove execOrDie

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I inserted the argv logic is to enable the use case where someone makes a local change and wants to do "gulp check-links --files FOO.md" before committing / pushing. Removing the argv logic will break this use case.

I have added the export and removed execOrDie. Done.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. makes sense. let me know when ready for final look.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now ready for a final look. See https://travis-ci.org/ampproject/amphtml/builds/228915159 for a sample.

(Note that gulp has some issues where it reports the wrong time, but I'll look into that in a separate PR. BTW, this will now work locally from the command line too, like so: "gulp check-links --files dir_a/FOO.md,dir_b/BAR.md")

* @param {string} filteredMarkdownFile Path of file, relative to src root.
*/
function runLinkChecker(filteredMarkdownFile) {
var cmd = 'markdown-link-check ' + filteredMarkdownFile;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call markdownLinkCheck directly. This way we don't need to rewrite the filtered file back to the filesystem either.

Something like (untested) :

function runLinkChecker(markdownFile) {
  var filteredMarkdownContent = filterLocalhostLinks(markdownFile), 

   var opts = {};
   opts.baseUrl = 'file://' + path.dirname(path.resolve(filepath));

    markdownLinkCheck(filteredMarkdownContent, opts, function (err, results) {
        results.forEach(function (result) {
            if(result.status === 'dead') {
                error = true;
                console.log('[%s] %s', 'X', result.link);
            }
        });
        if(error) {
            console.error('\nERROR: dead links found!');
            process.exit(1);
        }
    });
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored and rewritten based on one of my previous commits.

function filterLocalhostLinks(markdownFile) {
var contents = fs.readFileSync(markdownFile).toString();
var filteredontents = contents.replace(/http:\/\/localhost:8000\//g, '');
var filteredMarkdownFile = markdownFile + '.without_localhost_links';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment below, if we go with that approach, we can just return filteredontents; here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored. Done.

markdownFiles.forEach(function(markdownFile) {
util.log('Checking links in', util.colors.magenta(files), '...');
var filteredMarkdownFile = filterLocalhostLinks(markdownFile);
runLinkChecker(filteredMarkdownFile);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment below, if we go with that approach this becomes:

runLinkChecker(markdownFile);

},
testDocumentLinks: function(files) {
let docFiles = [];
files.forEach(file => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to docFiles = files.filter(isDocFile);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*/
function checkLinksInFiles(markdownFiles) {
util.log('Checking links in', util.colors.magenta(markdownFiles), '...');
markdownFiles.forEach(function(markdownFile) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markdownFiles.forEach(runLinkChecker);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

var files = '';
if (argv.files) {
files = argv.files;
} else {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make the branching simpler by assigning early and doing an if (!files) check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

opts.baseUrl = 'file://' + path.dirname(path.resolve((markdownFile)));

var filteredMarkdown = filterLocalhostLinks(markdownFile);
markdownLinkCheck(filteredMarkdown, opts, function (error, results) {
Copy link
Copy Markdown
Member

@erwinmombay erwinmombay May 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, remove space between "function" and "("

here and below

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 5, 2017

@aghassemi, this is ready for check in. Could you take a final look, and merge this if you're satisfied?

Copy link
Copy Markdown
Contributor

@aghassemi aghassemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just a small request to use BlueBird (already a dependency) instead of Q (not an existing dependency)

var gulp = require('gulp-help')(require('gulp'));
var markdownLinkCheck = require('markdown-link-check');
var path = require('path');
var Q = require('q');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have q as a dependency in package.json use bluebird instead: var BBPromise = require('bluebird');

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

var markdownFiles = files.split(',');
deferred = Q.defer();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  var promise = new BBPromise(r => {
    resolve = r;
  });

....

return promise;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, without arrow function. Gulp needs to be ES5 compatible or some machines will not be able to run "gulp test".

var Q = require('q');
var util = require('gulp-util');

var deferred;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var resolve

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

util.colors.magenta(markdownFile), 'are alive.');
}
if (++filesChecked == fileCount) {
deferred.resolve();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rsimha rsimha dismissed aghassemi’s stale review May 5, 2017 21:12

The comments in this review are already addressed.

Copy link
Copy Markdown
Contributor

@aghassemi aghassemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 5, 2017

@aghassemi, could I ask you to merge this? (I don't have permissions yet.)

resolve = r;
});
fileCount = markdownFiles.length;
markdownFiles.forEach(runLinkChecker);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be done with Promise.all and a map, avoiding the global resolve.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this exact approach yesterday and it did not work, most likely because markdownLinkCheck (a third party library) calls linkCheck, yet another (asynchronous) third party library, which invokes the callback that's passed in here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point to the commit? I'm 100% confident this will work correctly, because it'd be unbelievably taboo for an npm package to break nodeback conventions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't commit it, since it failed when I gulp tested it locally. I can put it together once again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See latest commit.

* @return {string} Contents of markdown file after filtering localhost links.
*/
function filterLocalhostLinks(markdownFile) {
var markdown = fs.readFileSync(markdownFile).toString();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird that a filter function does that actual reading.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This isn't necessary any more. Moved two lines into the caller. Done.

baseUrl : 'file://' + path.dirname(path.resolve((markdownFile)))
};

markdownLinkCheck(filteredMarkdown, opts, function(error, results) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting markdownLinkCheck into a promise returning function makes it easier to deal with. BB provides .promisify to do so:

// top of file
var markdownLinkCheck = BBPromise .promisify(require('markdown-link-check'));

// here, now returns a promise
return markdownLinkCheck(filteredMarkdown, opts).then(results => {
  // do stuff
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

error = true;
util.log('[%s] %s', chalk.red('✖'), result.link);
} else {
util.log('[%s] %s', chalk.green('✓'), result.link);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll be noisy.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional, since this will only be run in the relatively rare instance when a documentation change is made. The typical PR won't have this task run against it.

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 5, 2017

@jridgewell / @aghassemi, can one of you please merge this?


markdownLinkCheck(filteredMarkdown, opts, function(error, results) {
return markdownLinkCheck(filteredMarkdown, opts)
.then(function(error, results) {
Copy link
Copy Markdown
Contributor

@jridgewell jridgewell May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

results will always be undefined here, as error will cause the promise to reject (and is thus not present in the fulfilled path). Rename error to results, and remove the second param.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. That was the part I missed when I switched to using bluebird promises. Fixed.

// resolve = r;
// });
// fileCount = markdownFiles.length;
markdownFiles.forEach(runLinkChecker);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a #map, now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

var markdownFiles = files.split(',');
var linkCheckers = [];
markdownFiles.forEach(function(markdownFile) {
linkCheckers.push(runLinkChecker(markdownFile));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a #map:

var checks = markdownFiles.map(function(file) {
  return runLinkChecker(file);
});
return BBPromise.all(checks);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

util.log(
util.colors.red('ERROR'), 'Dead links found in',
util.colors.magenta(markdownFile), '(please update it).');
process.exit(1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will fail the build after the first failed link in a single markdown file, but there may be others in other files. It'd be more helpful to check all markdown files, then fail at the end. We can do that by not going through all the errors now, but after the Promise.all call (which will receive an array of results).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I exited immediately in keeping with the pattern in pr-check, which bails at the first error. You make a good point about being able to fix all links in all files in one shot. Done.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good at https://travis-ci.org/ampproject/amphtml/builds/229325351. Tested failure case locally. This also works from the command line as "gulp check-links --files dir_a/FOO.md,dir_b/BAR.md".

@jridgewell If there are no more comments, can you merge this?

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 8, 2017

@aghassemi, could you please merge this, since I don't have write access?

@kmh287 kmh287 merged commit 14c6c14 into ampproject:master May 8, 2017
@kmh287
Copy link
Copy Markdown
Contributor

kmh287 commented May 8, 2017

Merged at @rsimha-amp 's request.

@rsimha rsimha deleted the 2017-05-02-LinkCheck branch May 8, 2017 15:21
@muxin
Copy link
Copy Markdown
Contributor

muxin commented May 8, 2017

This PR made the master red:

pr-check.js: Done running npm run ava. Total time: 0m 3s.
/home/travis/build/ampproject/amphtml/build-system/pr-check.js:205
    let docFiles = files.filter(isDocFile);
                        ^
TypeError: Cannot read property 'filter' of undefined
    at Object.testDocumentLinks (/home/travis/build/ampproject/amphtml/build-system/pr-check.js:205:25)
    at runAllCommands (/home/travis/build/ampproject/amphtml/build-system/pr-check.js:241:11)
    at main (/home/travis/build/ampproject/amphtml/build-system/pr-check.js:260:5)
    at Object.<anonymous> (/home/travis/build/ampproject/amphtml/build-system/pr-check.js:339:14)
    at Module._compile (module.js:571:32)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:488:32)
    at tryModuleLoad (module.js:447:12)
    at Function.Module._load (module.js:439:3)
    at Module.runMain (module.js:605:10)

https://travis-ci.org/ampproject/amphtml/builds/230000578

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 8, 2017

@muxin, the error seems to suggest that there were no files in the list of files in that particular run (files was undefined, when it should have contained the list of files in the PR). Is this to be expected? And how do I see the recent builds on master to see if it's still red?

@dreamofabear
Copy link
Copy Markdown

The problem seems to be line 241 where the method is invoked without args.

You can see "master" and other branches here: https://travis-ci.org/ampproject/amphtml/branches

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 8, 2017

Fix coming up.

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 8, 2017

Fix in PR #9213.

@rsimha
Copy link
Copy Markdown
Contributor Author

rsimha commented May 8, 2017

Confirmed that master is past the error caused by this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants