Skip to content

Linter plugins: Add APIs to support sharing per-file data between rules #20700

@overlookmotel

Description

@overlookmotel

The problematic pattern

Some plugins utilize a WeakMap keyed by SourceCode object as a way to share data between different rules for the same file.

The pattern looks like this:

type FileData = /* whatever */;

const cache = new WeakMap<SourceCode, FileData>();

function getDataForFile(context: Context): FileData {
  const cachedData = cache.get(context.sourceCode);
  if (cachedData !== undefined) return cachedData;

  const data = calculateData(context);
  cache.set(context.sourceCode, data);
  return data;
}

const rule1 = {
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
};

const rule2 = {
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
};

Why this is a problem in Oxlint

ESLint creates a fresh SourceCode class instance for each file. That same object is the value of context.sourceCode in every rule running on that file, but each file gets a new SourceCode object.

Oxlint, on the other hand, has one singleton SOURCE_CODE object, which is reused for every file. All properties of SOURCE_CODE are getters, which return the right data for the file currently being linted.

This breaks the above pattern. Once the WeakMap has been populated with an entry, cache.get(context.sourceCode) will return the same data for every file. The cache never gets cleared, and every file gets the data which was calculated for the 1st file.

This problem was seen previously in eslint-plugin-panda (#16867, #19986). Now discovered again in eslint-plugin-mocha (#20611).

GitHub search suggests that eslint-plugin-prettier and SonarJS plugin are also affected (though weirdly SonarJS's tests all pass in our conformance tests). However, that search likely hasn't unearthed all the plugins that use this pattern.

Motivations for Oxlint's current design

The motivations for having a singleton SOURCE_CODE object:

  1. Oxlint aggressively recycles objects wherever possible, to reduce GC pressure.
  2. Getters are on the SOURCE_CODE object itself, not a class prototype, making (I believe) every property access slightly cheaper as JS engine doesn't have to walk up the prototype chain.
  3. Getters allow lazy computation. For example, the lines array is not built unless a rule accesses sourceCode.lines.
  4. Works well with our "alternative API" where in createOnce each rule also has the same Context object across all files.

Note that point 3 is somewhat orthogonal - we could still use lazy getters if context.sourceCode was an instance of a SourceCode class.

Why the WeakMap pattern is not ideal

WeakMap in JS is pretty slow (as is Map). It's also not the best data structure for this use case because ESLint (and also Oxlint) only lints 1 file at a time, so the WeakMap only has 1 active element at any time.

A simpler and more efficient pattern like this would work equally well:

let cachedData: FileData | null = null;

function getDataForFile(context: Context): FileData {
  if (cachedData === null) cachedData = calculateData(context);
  return cachedData;
}

// Called when a file is finished linting
function onFileFinish() {
  cachedData = null;
}

The problem is that ESLint does not offer any API to be notified when a file finishes linting, so no way to call onFileFinish at the right time.

In the absence of this API, the WeakMap pattern is the simplest option.

Why other patterns don't work

Cache keyed by file path

let cachedData: FileData | null = null;
let currentFilename: string | null = null;

function getDataForFile(context: Context): FileData {
  if (context.filename === currentFilename) return cachedData!;

  currentFilename = context.filename;
  cachedData = calculateData(context);
  return cachedData;
}

This would work in ESLint CLI, but wouldn't work in language server, because the same file can be linted over and over in quick succession.

This addition makes it work with language server:

let cachedData: FileData | null = null;
let currentFilename: string | null = null;
let resetTimerSet: boolean = false;

function getDataForFile(context: Context): FileData {
  if (context.filename === currentFilename) return cachedData!;

  currentFilename = context.filename;
  cachedData = calculateData(context);

  if (!resetTimerSet) {
    queueMicrotask(resetCache);
    resetTimerSet = true;
  }

  return cachedData;
}

function resetCache() {
  cachedData = null;
  currentFilename = null;
  resetTimerSet = false;
}

resetCache gets called in next micro-tick, and clears the cache before next file is linted, so the next lint run on same file gets a clean cache.

This depends on assumption that all language servers have a micro-tick between linting files. From my research, this appears to be a solid assumption, but it's still a little fragile.

But it still has a problem! There is one circumstance in which the same file gets linted multiple times in succession in a synchronous loop: When applying auto-fixes, ESLint will by default lint the same file up to 10 times in succession in a tight loop. There's no async break between passes, so resetCache does not get called in between. 2nd pass will get stale cache from the 1st pass, even though the code has changed.

Manually track when files finish being linted

const rule1 = createRule({
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
});

const rule2 = createRule({
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
});

function createRule(rule: Rule): Rule {
  const originalCreate = rule.create;

  rule.create = function (context: Context): Visitor {
    lintingStarted();

    const program = context.sourceCode.ast;

    const visitor = originalCreate.call(this, context);

    // Add listener for when AST traversal completes
    const originalOnCodePathEnd = visitor.onCodePathEnd;
    visitor.onCodePathEnd = originalOnCodePathEnd == null
      ? function (_codePath, node) {
        if (node === program) lintingFinished();
      }
      : function (codePath, node) {
        originalOnCodePathEnd.call(this, codePath, node);
        if (node === program) lintingFinished();
      };

    return visitor;
  };

  return rule;
}

let rulesRunningCount = 0;

function lintingStarted() {
  // Clear cache on next micro-tick
  if (rulesRunningCount === 0) queueMicrotask(resetCache);
  // Increment counter
  rulesRunningCount++;
}

function lintingFinished() {
  // Decrement counter
  rulesRunningCount--;
  // If counter reached 0, all rules have finished running on this file
  if (rulesRunningCount === 0) resetCache();
}

function resetCache() {
  cachedData = null;
  rulesRunningCount = 0;
}

Notes:

  • onCodePathEnd event on Program is the last call to the visitor for each file. It fires after Program:exit visitor.
  • Queuing resetCache on next micro-tick is required to handle case where a rule throws an error during AST traversal, in which case onCodePathEnd will not fire. The fallback handles clearing the cache in this case in language server.

This works in ESLint CLI, in language server, and when linting a file multiple times in a synchronous loop when applying auto-fixes. It would also work in Oxlint.

However, it has a major downside in Oxlint: In Oxlint we use a fast AST visitor in common case where no rules register listeners for CFG events. If any rule does register a CFG event listener, we have to use a much slower walker which passes over the AST twice. Adding an onCodePathEnd listener to the visitor will cause a major slowdown in Oxlint.

It's also a lot of fiddly code to ask plugin authors to implement, though we could wrap it in an NPM package.

Metadata

Metadata

Assignees

Labels

Priority

None yet

Start date

None yet

Target date

None yet

Effort

None yet

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions