Skip to content

fix: _removeFromScope#14430

Draft
liuxingbaoyu wants to merge 4 commits intobabel:mainfrom
liuxingbaoyu:fix-_removeFromScope
Draft

fix: _removeFromScope#14430
liuxingbaoyu wants to merge 4 commits intobabel:mainfrom
liuxingbaoyu:fix-_removeFromScope

Conversation

@liuxingbaoyu
Copy link
Member

@liuxingbaoyu liuxingbaoyu commented Apr 7, 2022

Q                       A
Fixed Issues? Fixes #14429
Patch: Bug Fix?
Major: Breaking Change? ×
Minor: New Feature? ×
Tests Added + Pass?
Documentation PR Link
Any Dependency Changes? ×
License MIT

@babel-bot
Copy link
Collaborator

babel-bot commented Apr 7, 2022

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/51643/

}

export function _removeFromScope(this: NodePath) {
const bindings = this.getBindingIdentifiers();
Copy link
Contributor

@JLHwung JLHwung Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally this issue should have been fixed in getBindingIdentifiers. An assignment expression does not introduce new bindings and therefore it should not contain any binding identifiers: In the spec the AssignmentPattern production is resolved to IdentifierReference:

https://tc39.es/ecma262/#sec-destructuring-assignment

However, practically the getBindingIdentifiers is used to retrieve identifier references / binding identifiers in a destructuring pattern so unless we come up with a getBindingAndReferenceIdentifiers, we may have to live with that.

The assignment expression is registered as a constant violation in the binding info, so to fix this issue, we can check whether this is an AssignmentExpression, if so,
instead of removing the binding, we should check the binding info and remove the registered constantViolations.

In other words, in your example posted in #14429.

var traverse = require("@babel/traverse").default;
var parser = require("@babel/parser");

var ast = parser.parse('var a=1;a=2;');
traverse(ast, {
  AssignmentExpression(path) {
    console.log(path.scope.getBinding("a"));
    path.remove();
    console.log(path.scope.getBinding("a"));
  }
});

The expected output after removing the assignment should be

Binding {identifier: Node, scope: Scope, path: NodePath, kind: 'var', constantViolations: Array(0), …}

And we should also add a new test case for ForXStatement:

var a = 1;
for (a of []) {} // <-- removing this should not remove the binding

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your great point! It should work now!

@liuxingbaoyu liuxingbaoyu force-pushed the fix-_removeFromScope branch from e9a0fd0 to b1379bb Compare April 7, 2022 17:51
@nicolo-ribaudo
Copy link
Member

Does this also fix #14413?

@liuxingbaoyu
Copy link
Member Author

liuxingbaoyu commented Apr 8, 2022

It seems not.

The two tests you proposed at #14413 (comment) now pass, but #14413 problem still exists.

REPL

 
edited:

#14413 (comment) This test also seems to be fixed

REPL
 

edited:

Oops, I forgot it was a separate repository

https://github.com/facebook/regenerator

We need to wait for it to have this pr to test.....

@liuxingbaoyu
Copy link
Member Author

liuxingbaoyu commented Apr 8, 2022

New discoveries!

Because I think this is also a bug, so in 5c70514 removed code reserved for compatibility.

But now when I'm researching the issue mentioned by @nicolo-ribaudo ,I see what this code does.

Please see #2101

And the reason for #14413 is also this code.

Please share your opinions!

@liuxingbaoyu
Copy link
Member Author

Are there any updates?

Can it be merged first?

Although it still has bugs in its handling of uids, there is currently no better solution, and the previous bug was more serious.

Before this, the uids policy was eager deletion, now it becomes passive deletion.

It used to affect all plugins using generateUid, but now it only partially affects hasBinding.

We can also add the uids strategy to BABEL_8_BREAKING.

Copy link
Contributor

@JLHwung JLHwung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incremental scope info update algorithm presented in this PR has O(NBP) complexity, where N is number of nodes in the this NodePath, B is the number of resolvable bindings in current scope and P is max(#referencePath, #constantViolations).

It can be slow when this is a large AST tree. We can improve the algorithm to O(DBP), where D is the maximum depth of NodePath under the closest Program NodePath.

  1. Construct a set NR from the ancestry NodePaths of this up to the closest Program node.
  2. Construct a set R including only this
  3. Construct a list SA from this.scope and its ancestry scopes.
  4. For each scope S in SA
    4.1. For every binding B of this.scope
    4.1.1 Remove binding if S === this.scope and B.identifier === this.node
    4.1.2 for each cv of B.constantViolations
    if pathIsThisDescendant(cv, NR, R) returns true, then remove cv
    4.1.3 for each rp of B.referencePaths
    if pathIsThisDescendant(rp, NR, R) returns true, then remove rp

The subroutine pathIsThisDescendant(path, NR, R):

  1. Construct an empty list p
  2. Loop
    2.1 let parent = path.parentPath
    2.2 if NR has parent then
    add p to NR
    return false
    2.3 if R has parent then
    add p to R
    return true
    2.4 add parent to p
    2.5 assign parent to path

@liuxingbaoyu
Copy link
Member Author

Amazing algorithm! This will improve performance a lot!

4.1.1 Remove binding if S === this.scope and B.identifier === this.node

It looks like the bindings registered in the child nodes are being missed here, maybe we should have NodePath for each binding.

  registerBinding(
    kind: Binding["kind"],
    path: NodePath,
    bindingPath: NodePath = path,
  ) {

@JLHwung
Copy link
Contributor

JLHwung commented Apr 23, 2022

It looks like the bindings registered in the child nodes are being missed here

Good catch. In that case we can perform pathIsThisDescendant(binding.path, NR, R). I think we can further skip this check for let/const block-scoped bindings.

maybe we should have NodePath for each binding.

In this case binding.path should be sufficient since it is always the path/parentPath of binding.identifier.

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 26, 2022

@liuxingbaoyu would you like me to take a crack at implementing @JLHwung's algorithm? I've read through this change and understand his comments. I really want to get scope updates during replacement fixed so that my astx tool I'm building will work well.

@liuxingbaoyu
Copy link
Member Author

I've actually implemented it, but seem to be running into weird performance issues when testing in my personal project.
Later, I started to do other things that caused this to not be updated for a long time. I also want to say sorry to @JLHwung .

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 26, 2022

@liuxingbaoyu @JLHwung

I'm confused by this:

  1. For each scope S in SA
    4.1. For every binding B of this.scope
    4.1.1 Remove binding if S === this.scope and B.identifier === this.node

Isn't this equivalent to

  1. For every binding B of this.scope
    4.1. Remove binding if this.scope in SA (always true...) and B.identifier === this.node

Seems like you meant to go each binding in SA and see if the binding path is a descendant of this path?

Depending on the amount of bindings in ancestor scopes vs the amount of descendants of this path, it might be more efficient to traverse all identifier descendants of this path and see if a given identifier path is a binding path in SA.

@JLHwung
Copy link
Contributor

JLHwung commented Aug 26, 2022

@jedwards1211 Thank you for your efforts on tackling the issue.

Seems like you meant to go each binding in SA and see if the binding path is a descendant of this path?

Yes, I think 4.1 should have been "For every binding B of S", so they add up to:

For each scope S in SA
4.1. For every binding B of S
4.1.1 Remove binding B if S === this.scope and B.identifier === this.node
4.1.2 for each cv of B.constantViolations
if pathIsThisDescendant(cv, NR, R) returns true, then remove cv from B.constantViolations
4.1.3 for each rp of B.referencePaths
if pathIsThisDescendant(rp, NR, R) returns true, then remove rp from B.referencePaths

If S is this.scope, the task is straightforward. For other scopes, we have to update the referencePaths and constantViolations due to the AST nodes removal.

Depending on the amount of bindings in ancestor scopes vs the amount of descendants of this path, it might be more efficient to traverse all identifier descendants of this path and see if a given identifier path is a binding path in SA.

Yes if AST node is small and the removed node is deeply nested. Note that the check "if a given identifier path is a binding path in SA" still requires going through ancestry scopes and a linear search within its bindings. Besides bindingPath, we also need taking care of constantViolations and referencePaths.

Presumably we don't know the size of ast node without traversing. It could be as small as an identifier, or as big as a React component. I guess either would be fine, after all it should be still faster than scope.crawl().

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 26, 2022

@JLHwung gotcha about the linear search through ancestry scopes in that check.

Do we need to look at ancestor scopes/paths above the closest enclosing function? I don't understand how any descendant could bind in a scope above that, unless in non-strict mode Babel is intended to treat the first assignment of an undeclared identifier as a binding?

Edit: nevermind, now I understand the constant violations and reference paths could exist in any descendant scope, even if the binding is above a function scope

Also, if the path is the root of its own scope, we can exclude that scope from the list that we're checking right?

@liuxingbaoyu
Copy link
Member Author

liuxingbaoyu commented Aug 26, 2022

To be honest I can't remember.😰
I just did a rebase and noticed a lot of new weird errors.
In addition, I remembered the problems I encountered before. The new implementation may take dozens of times of time in some extreme scenarios, so I don't know what to do.
Maybe we need to keep the two-way reference, but I think that would be a big project.

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 26, 2022

And if I understand correctly, instead of

4.1.1 Remove binding B if S === this.scope and B.identifier === this.node

(in which case it would be pointless to iterate bindings of ancestor scopes)

It should be

4.1.1 Remove binding if pathIsThisDescendant(binding.path, NR, R)

Right? (Is this the case where a var is declared inside an if/loop block etc?)

@liuxingbaoyu
Copy link
Member Author

#14430 (comment)

I guess yes.

@jedwards1211
Copy link
Contributor

Definitely need to test the fact that var declarations can bind in parent scopes:

const { parse } = require('@babel/parser')
const traverse = require('@babel/traverse').default

const ast = parse(`
  if (true) {
    var a
    let b
  }
`)

traverse(ast, {
  BlockStatement(path) {
    const aBinding = path.scope.getBinding('a')
    const bBinding = path.scope.getBinding('b')
    console.log(aBinding.scope === bBinding.scope) // false
  },
})

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 27, 2022

@JLHwung

In this case binding.path should be sufficient since it is always the path/parentPath of binding.identifier.

Unfortunately not, as I'm finding while debugging failed tests...

const [a, [{b, ...c}], {d, ...e}, [{ f, ...g}, {h: [i, {j, ...k}] }]] = x;

For all of the bindings here, binding.path is the whole entire VariableDeclarator. (Is that a bug?)

For example, when babel-plugin-proposal-object-rest-spread is trying to replace {b, ...c} with a uid reference:
Ultimately we need to remove the c binding, but first we need to determine if binding.identifier is a descendant of {b, ...c}...but we don't have a path to that identifier; we only have references to the node, the path being removed ({b, ...c}, about ~4 levels above c) and the path to the VariableDeclarator (many levels above). So we can't use your pathIsThisDescendant algorithm...

I think @liuxingbaoyu is right that having a binding.identifierPath would work best.
If we're not going to do that, the only reasonable option seems to be to traverse all identifiers in the node being removed, look up their names in the scope, and remove the identifiers that match.

I don't have any experience with the const violations and reference paths yet. I assume a const violation path always points to an AssignmentExpression, and reference paths always point to Identifier nodes that reference the binding?

@jedwards1211
Copy link
Contributor

I guess I could traverse the removed node to build a set of all identifier nodes in it, and use that to determine if each binding is a descendant of the removed node... I wouldn't have to traverse inside nested functions since nothing inside them could bind at a higher scope.

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 27, 2022

Okay here's my plan:

  1. Build a set of potentially binding identifier nodes DN inside the node being removed:
    1a. During traversal, we can skip a lot of decendants that can't possibly contain bindings in an ancestor scope:
    • Classes (but add id of declarations)
    • Interfaces (but add id)
    • Functions (but add id of declarations)
    • Type Annotations
    • All Expressions? I could be wrong about this, but I can't think how they could contain a binding in an ancestor scope
  2. For each ancestor scope S (including current scope):
    2a. Remove any binding B in S where DN.has(B.identifier)
    2b. Remove any const violation CV in S where DN.has(CV.node)
    2c. Remove any reference path RP in S where DN.has(RP.node)

If const violation paths don't necessarily point to identifiers, I'll have to also build a set of identifier path inside the current node and use the pathIsThisDescendant algorithm on them.

Thanks to the set DN, step 2 is just O(number of ancestor scopes * max(number of bindings/const violations/reference paths per scope)).

Maybe for Babel 8 we could make a breaking change to require registering bindings with a path to the identifier instead of the node?

@liuxingbaoyu
Copy link
Member Author

I don't have any experience with the const violations and reference paths yet. I assume a const violation path always points to an AssignmentExpression, and reference paths always point to Identifier nodes that reference the binding?

Presumably so, since current citations are not always reliable.

I guess I could traverse the removed node to build a set of all identifier nodes in it, and use that to determine if each binding is a descendant of the removed node... I wouldn't have to traverse inside nested functions since nothing inside them could bind at a higher scope.

But maybe there are identifiers that refer to outside bindings?

Maybe for Babel 8 we could make a breaking change to require registering bindings with a path to the identifier instead of the node?

Yes, that's probably the only good way. But we need to make sure the path is always reliable first, and I'm not sure about that at the moment.😭

@jedwards1211
Copy link
Contributor

Oh that'strue about references. Maybe pathIsThisDescendant will be sufficient for const violations and reference paths

@jedwards1211
Copy link
Contributor

This is starting to seem like an intractable mess :(

There are problems like this:

const f = ({ a = get(), b, c, ...z }) => {
  const v = b + 3;
};

First babel-plugin-proposal-object-rest-spread unshifts const { a = get(), b, c, ...z } = _ref into the body. That updates the scope bindings for a, b, c , and z...but the binding.identifier nodes are the same objects that came from the function param. Meaning that when the function param gets replaced and _removeFromScope() runs, my binding.indentifier set lookup concludes the identifiers are inside the function param, and removes them from scope.

Now there are no bindings for a, b, c on scope at all, and removeUnusedExcludedKeys crashes because it expects them to still be present.

I think having binding.identifierPath is our only hope.

And even then I worry that the way scope is maintained is too inconsistent. For example:

     if (container) {
        container.push(declar); // this doesn't crawl and update scope bindings
      } else {
        parentPath.ensureBlock();
        parentPath.get("body").unshiftContainer("body", declar); // but this does
      }

@jedwards1211
Copy link
Contributor

Wait though...if it's true what I'm saying about the plugin inserting the new declaration before removing the function param, it should cause duplicate binding errors...need to dig harder into what's really going on...

@liuxingbaoyu
Copy link
Member Author

I noticed that you are writing astx.
To be honest, probably doing a simple process in your project is a better option.
Because a full detection would increase the time dozens of times in extreme cases, this may be unacceptable for babel.
In addition, there are currently a large number of non-standard plugin usage, which may also cause compatibility problems.
I really like the work you've done on this, but it's probably a lot harder than you expect, so here's a quick reminder.😃

the binding.identifier nodes are the same objects that came from the function param.

Are they the same node object? Maybe you can modify babel-plugin-proposal-object-rest-spread to update it manually? Plugins don't necessarily have fully spec behavior.

And even then I worry that the way scope is maintained is too inconsistent. For example:

You're right, it would be more difficult.

@jedwards1211
Copy link
Contributor

Okay no this plugin code is definitely flawed:

    if (paramPath.isObjectPattern() && hasRestElement(paramPath)) {
      const uid = parentPath.scope.generateUidIdentifier("ref");

      const declar = t.variableDeclaration("let", [
        t.variableDeclarator(paramPath.node, uid),
      ]);

      if (container) {
        container.push(declar);
      } else {
        parentPath.ensureBlock();
        parentPath.get("body").unshiftContainer("body", declar);
      }
      paramPath.replaceWith(t.cloneNode(uid));
    }

Between the unshiftContainer and the replaceWith, the AST is temporarily in this bad state:

const f = ({ a = get(), b, c, ...z } /* about to get replaced with _ref */) => {
  const { a = get(), b, c, ...z } = _ref
  const v = b + 3;
};

I don't know why duplicate declaration errors aren't getting triggered. But if there's any hope of managing scope properly, one of the cardinal rules has to be to never temporarily insert duplicate bindings into the AST...instead plugins need to remove or replace the old bindings first...

@liuxingbaoyu
Copy link
Member Author

I don't know why duplicate declaration errors aren't getting triggered.

The binding seems to need to be registered manually, if the plugin doesn't register it manually, it won't be detected.

@jedwards1211
Copy link
Contributor

@liuxingbaoyu well unshiftContainer can trigger a scope crawl, right? But I think in this case the scope was already initialized

@liuxingbaoyu
Copy link
Member Author

@liuxingbaoyu well unshiftContainer can trigger a scope crawl, right? But I think in this case the scope was already initialized

It seems so, kind of weird.

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 27, 2022

To be honest, probably doing a simple process in your project is a better option

Yeah that's what I'm leaning toward now.
I could still finish this PR, it's just that a lot of plugins just don't work correctly if I make replaceWith automatically remove from scope (which I don't have to include in this PR anyway). The architecture seems kind of schizophrenic because remove manipulates scope, and a lot of insertion methods manipulate scope, but it's seemingly done manually in a lot of replaceWith cases...

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 27, 2022

@JLHwung @liuxingbaoyu one good way to catch a lot of existing bugs would be to have all the tests compare the final bindings of the scopes to ones built from scratch by recrawling the final AST

@liuxingbaoyu
Copy link
Member Author

The architecture seems kind of schizophrenic because remove manipulates scope, and a lot of insertion methods manipulate scope, but it's seemingly done manually in a lot of replaceWith cases...

Yes, babel is so free that there are countless practical use cases.

one good way to catch a lot of existing bugs would be to have all the tests compare the final bindings of the scopes to ones built from scratch by recrawling the final AST

Yes, that is correct, currently we have a simple test to make sure the binding is not lost. There are probably many 3rd party plugins that don't ensure this though, so how to start making changes can be a headache.😓

@jedwards1211
Copy link
Contributor

jedwards1211 commented Aug 28, 2022

I meant more in terms of dangling bindings that should have gotten removed. But I've realized, most of babel's transpiling doesn't need to remove any bindings on scope. It mostly just needs to introduce intermediary variables. So I guess that's why there hasn't been a great need for a more systematic way of managing removal.

I just realized another thing none of us has considered - if a binding that shadows another gets removed, we'd also need to transfer its const violations and reference paths to the un-shadowed binding

@liuxingbaoyu
Copy link
Member Author

liuxingbaoyu commented Aug 28, 2022

I meant more in terms of dangling bindings that should have gotten removed. But I've realized, most of babel's transpiling doesn't need to remove any bindings on scope. It mostly just needs to introduce intermediary variables. So I guess that's why there hasn't been a great need for a more systematic way of managing removal.

Yes, actually converting JS to a compatible version doesn't require a particularly robust analysis system.

I don't know if you have come across Container is falsy, which can be common in complex operations and hard to debug.😓

I just realized another thing none of us has considered - if a binding that shadows another gets removed, we'd also need to transfer its const violations and reference paths to the un-shadowed binding

Amazing idea, I really hadn't thought of this. Unfortunately this may also have compatibility implications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: path.remove() removes var binding by mistake

5 participants