Linter plugins: Add APIs to support sharing per-file data between rules

## The problematic pattern

Some plugins utilize a `WeakMap` keyed by `SourceCode` object as a way to share data between different rules for the same file.

The pattern looks like this:

```ts
type FileData = /* whatever */;

const cache = new WeakMap<SourceCode, FileData>();

function getDataForFile(context: Context): FileData {
  const cachedData = cache.get(context.sourceCode);
  if (cachedData !== undefined) return cachedData;

  const data = calculateData(context);
  cache.set(context.sourceCode, data);
  return data;
}

const rule1 = {
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
};

const rule2 = {
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
};
```

## Why this is a problem in Oxlint

ESLint creates a fresh `SourceCode` class instance for each file. That same object is the value of `context.sourceCode` in every rule running on that file, but each file gets a new `SourceCode` object.

Oxlint, on the other hand, has one singleton `SOURCE_CODE` object, which is reused for every file. All properties of `SOURCE_CODE` are getters, which return the right data for the file currently being linted.

This breaks the above pattern. Once the `WeakMap` has been populated with an entry, `cache.get(context.sourceCode)` will return the same data for *every* file. The cache never gets cleared, and every file gets the data which was calculated for the 1st file.

This problem was seen previously in `eslint-plugin-panda` (#16867, #19986). Now discovered again in `eslint-plugin-mocha` (#20611).

[GitHub search](https://github.com/search?q=WeakMap%3CSourceCode%2C&type=code) suggests that `eslint-plugin-prettier` and SonarJS plugin are also affected (though weirdly SonarJS's tests all pass in our conformance tests). However, that search likely hasn't unearthed all the plugins that use this pattern.

## Motivations for Oxlint's current design

The motivations for having a singleton `SOURCE_CODE` object:

1. Oxlint aggressively recycles objects wherever possible, to reduce GC pressure.
2. Getters are on the `SOURCE_CODE` object itself, not a class prototype, making (I believe) every property access slightly cheaper as JS engine doesn't have to walk up the prototype chain.
3. Getters allow lazy computation. For example, the lines array is not built unless a rule accesses `sourceCode.lines`.
4. Works well with our "alternative API" where in `createOnce` each rule also has the same `Context` object across all files.

Note that point 3 is somewhat orthogonal - we could still use lazy getters if `context.sourceCode` was an instance of a `SourceCode` class.

## Why the `WeakMap` pattern is not ideal

`WeakMap` in JS is pretty slow (as is `Map`). It's also not the best data structure for this use case because ESLint (and also Oxlint) only lints 1 file at a time, so the `WeakMap` only has 1 active element at any time.

A simpler and more efficient pattern like this would work equally well:

```ts
let cachedData: FileData | null = null;

function getDataForFile(context: Context): FileData {
  if (cachedData === null) cachedData = calculateData(context);
  return cachedData;
}

// Called when a file is finished linting
function onFileFinish() {
  cachedData = null;
}
```

The problem is that ESLint does not offer any API to be notified when a file finishes linting, so no way to call `onFileFinish` at the right time.

In the absence of this API, the `WeakMap` pattern is the simplest option.

## Why other patterns don't work

### Cache keyed by file path

```js
let cachedData: FileData | null = null;
let currentFilename: string | null = null;

function getDataForFile(context: Context): FileData {
  if (context.filename === currentFilename) return cachedData!;

  currentFilename = context.filename;
  cachedData = calculateData(context);
  return cachedData;
}
```

This would work in ESLint CLI, but wouldn't work in language server, because the same file can be linted over and over in quick succession.

This addition makes it work with language server:

```js
let cachedData: FileData | null = null;
let currentFilename: string | null = null;
let resetTimerSet: boolean = false;

function getDataForFile(context: Context): FileData {
  if (context.filename === currentFilename) return cachedData!;

  currentFilename = context.filename;
  cachedData = calculateData(context);

  if (!resetTimerSet) {
    queueMicrotask(resetCache);
    resetTimerSet = true;
  }

  return cachedData;
}

function resetCache() {
  cachedData = null;
  currentFilename = null;
  resetTimerSet = false;
}
```

`resetCache` gets called in next micro-tick, and clears the cache before next file is linted, so the next lint run on same file gets a clean cache.

This depends on assumption that all language servers have a micro-tick between linting files. [From my research](https://github.com/chakra-ui/eslint-plugin-panda/pull/308#discussion_r2892462230), this appears to be a solid assumption, but it's still a little fragile.

But it still has a problem! There is one circumstance in which the same file gets linted multiple times in succession in a synchronous loop: When applying auto-fixes, ESLint will by default lint the same file up to 10 times in succession in a tight loop. There's no async break between passes, so `resetCache` does not get called in between. 2nd pass will get stale cache from the 1st pass, even though the code has changed.

### Manually track when files finish being linted

```ts
const rule1 = createRule({
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
});

const rule2 = createRule({
  create(context: Context) {
    const data = getDataForFile(context);
    // ...
  },
});

function createRule(rule: Rule): Rule {
  const originalCreate = rule.create;

  rule.create = function (context: Context): Visitor {
    lintingStarted();

    const program = context.sourceCode.ast;

    const visitor = originalCreate.call(this, context);

    // Add listener for when AST traversal completes
    const originalOnCodePathEnd = visitor.onCodePathEnd;
    visitor.onCodePathEnd = originalOnCodePathEnd == null
      ? function (_codePath, node) {
        if (node === program) lintingFinished();
      }
      : function (codePath, node) {
        originalOnCodePathEnd.call(this, codePath, node);
        if (node === program) lintingFinished();
      };

    return visitor;
  };

  return rule;
}

let rulesRunningCount = 0;

function lintingStarted() {
  // Clear cache on next micro-tick
  if (rulesRunningCount === 0) queueMicrotask(resetCache);
  // Increment counter
  rulesRunningCount++;
}

function lintingFinished() {
  // Decrement counter
  rulesRunningCount--;
  // If counter reached 0, all rules have finished running on this file
  if (rulesRunningCount === 0) resetCache();
}

function resetCache() {
  cachedData = null;
  rulesRunningCount = 0;
}
```

Notes:

* `onCodePathEnd` event on `Program` is the last call to the visitor for each file. It fires *after* `Program:exit` visitor.
* Queuing `resetCache` on next micro-tick is required to handle case where a rule throws an error during AST traversal, in which case `onCodePathEnd` will not fire. The fallback handles clearing the cache in this case in language server.

This works in ESLint CLI, in language server, *and* when linting a file multiple times in a synchronous loop when applying auto-fixes. It would also work in Oxlint.

However, it has a major downside in Oxlint: In Oxlint we use a fast AST visitor in common case where no rules register listeners for CFG events. If any rule *does* register a CFG event listener, we have to use a much slower walker which passes over the AST twice. Adding an `onCodePathEnd` listener to the visitor will cause a major slowdown in Oxlint.

It's also a lot of fiddly code to ask plugin authors to implement, though we could wrap it in an NPM package.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Linter plugins: Add APIs to support sharing per-file data between rules #20700

The problematic pattern

Why this is a problem in Oxlint

Motivations for Oxlint's current design

Why the `WeakMap` pattern is not ideal

Why other patterns don't work

Cache keyed by file path

Manually track when files finish being linted

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Linter plugins: Add APIs to support sharing per-file data between rules #20700

Description

The problematic pattern

Why this is a problem in Oxlint

Motivations for Oxlint's current design

Why the WeakMap pattern is not ideal

Why other patterns don't work

Cache keyed by file path

Manually track when files finish being linted

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Why the `WeakMap` pattern is not ideal