Skip to content

Bind top-most parent when importing nested module#14946

Merged
dcreager merged 31 commits intomainfrom
dcreager/module-resolution
Dec 16, 2024
Merged

Bind top-most parent when importing nested module#14946
dcreager merged 31 commits intomainfrom
dcreager/module-resolution

Conversation

@dcreager
Copy link
Member

@dcreager dcreager commented Dec 12, 2024

When importing a nested module, we were correctly creating a binding for the top-most parent, but we were binding that to the nested module, not to that parent module. Moreover, we weren't treating those submodules as members of their containing parents. This PR addresses both issues, so that nested imports work as expected.

As discussed in Slack whatever chat app I find myself in these days 😄, this requires keeping track of which modules have been imported within the current file, so that when we resolve member access on a module reference, we can see if that member has been imported as a submodule. If so, we return the submodule reference immediately, instead of checking whether the parent module's definition defines the symbol.

This is currently done in a flow insensitive manner. The SemanticIndex now tracks all of the modules that are imported (via import, not via from...import). The member access logic mentioned above currently only considers module imports in the file containing the attribute expression.

@dcreager
Copy link
Member Author

(This is still marked as draft because I still plan on adding some additional test cases, but should be code complete if folks want to take an initial look at the implementation)

@github-actions
Copy link
Contributor

github-actions bot commented Dec 12, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@MichaReiser MichaReiser added the ty Multi-file analysis & type inference label Dec 13, 2024
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I only have a few nit comments

* main:
  [red-knot] Display definition range in trace logs (#14955)
  [red-knot] Move the `ClassBase` enum to its own submodule (#14957)
  [red-knot] mdtest: python version requirements (#14954)
  [airflow]: Import modules that has been moved to airflow providers (AIR303) (#14764)
  [red-knot] Support `typing.TYPE_CHECKING` (#14952)
  Add tracing support to mdtest (#14935)
  Re-enable the fuzzer job on PRs (#14953)
  [red-knot] Improve `match` mdtests (#14951)
  Rename `custom-typeshed-dir`, `target-version` and `current-directory` CLI options (#14930)
  [red-knot] Add narrowing for 'while' loops (#14947)
  [`ruff`]  Skip SQLModel base classes for `mutable-class-default` (`RUF012`) (#14949)
  [red-knot] Tests for 'while' loop boundness (#14944)
* main:
  [red-knot] Use `type[Unknown]` rather than `Unknown` as the fallback metaclass for invalid classes (#14961)
  [red-knot] Make `is_subtype_of` exhaustive (#14924)
  [red-knot] Alphabetize rules (#14960)
  [red-knot] Understand `Annotated` (#14950)
@dcreager dcreager marked this pull request as ready for review December 13, 2024 21:49
@carljm
Copy link
Contributor

carljm commented Dec 13, 2024

As discussed in Slack

I doubt it 😆

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely!

Comment on lines +1444 to +1445
// this module named [name], then the result is that submodule, even if the module
// also defines a (non-module) symbol with that name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize now reading this that the priority between a non-submodule package attribute and a submodule is a bit of a subtle thing.

What actually happens at runtime is that, when the submodule is imported, it writes its own name to the containing package namespace. This means that which takes priority actually depends on whether the package __init__.py itself imports the submodule before creating its own attribute of that same name, or not.

In other words, given pkg/a.py and these two versions of pkg/__init__.py:

a = 1
from . import a as _
a = 1

Then import pkg.a; print(pkg.a) will print <module 'pkg.a' ...> in the first case, but it will print 1 in the second case. Because in the first case, we first execute pkg/__init__.py fully, and only then import the submodule, overwriting the value 1 for the name "a" in the pkg namespace with the submodule. In the second case, we first import the submodule, which sets the submodule as the value for the a name, and then we continue executing __init__.py, overwriting the submodule with the value 1.

Neither mypy nor pyright model this accurately. Pyright prefers the submodule in both scenarios, and mypy prefers 1 in both scenarios.

What you have here currently would match pyright.

Is it worth trying to model this accurately? I think probably not, at least not for now. But we could put some of this context into this comment? And/or we could add some tests for the above two cases, with commentary on how they behave at runtime vs in our model?

I think the behavior you have here currently is good for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment and added a test case with commentary

fn module_ty_from_name(&self, module_name: &ModuleName) -> Option<Type<'db>> {
resolve_module(self.db, module_name).map(|module| Type::ModuleLiteral(module.file()))
resolve_module(self.db, module_name)
.map(|module| Type::ModuleLiteral(ModuleLiteralType::new(self.db, self.file, module)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize now that what I had originally imagined we would do would be to look up the submodule data from self.index here and populate the interned ModuleLiteralType with the set of its imported submodule names. This means we would store more data per ModuleLiteralType (a set of submodule names, instead of just a File), but member lookups (in particular, those which don't match a submodule) would be a bit cheaper: the initial check would just be a membership test for the attribute name in the submodule set, without needing to build a ModuleName.

But I suspect all of this is just in the noise performance-wise (and the extra storage per ModuleLiteralType may well outweigh the cheaper lookups), so I don't see any reason to switch to that approach. Just thought the alternative worth mentioning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good point that this would make us construct a ModuleName every time we dereference a module. I had considered a more complex representation for ModuleName, but figured it would be better to experiment with that separately instead of trying to fold it into this. (Or better to just say this is good enough and leave it alone for now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that unless we a) are seeing a clear regression in the existing benchmark, or b) there's a very clear alternative that will definitely perform better, it's best not to spend much time on performance optimizations when we don't even know if they are meaningful in the overall performance profile. So I would suggest leaving this as-is for now.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/// module_name.push(&Name::new("bar"));
/// assert_eq!(&module_name, "foo.bar");
/// ```
pub fn push(&mut self, name: &Name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we work quite hard in the other methods of this struct to maintain the invariant that the wrapped string can only be a valid module name. I think this method breaks that invariant, since the Name node could wrap a string that constitutes an invalid identifier. (Remember that we have a parser that supports error recovery, so it can produce Name nodes that wrap invalid identifiers! There are few guarantees.)

I think to fix this we could:

  • Change the signature of this method so that it accepts a ModuleName rather than a Name -- but then it has the same signature of ModuleName::extend.
  • Verify that the string passed in is valid using ModuleName::is_valid_name, and change the return type of this method to return a Result, where an error variant is returned if you passed in a string that constitutes an invalid identifier. But then it becomes less ergonomic to use the method.

Overall I'm wondering if we need this new method, or if it might just be better to use the existing ModuleName::extend()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's a great point — I had gotten it into my mind somehow that Name was similarly checked, and so appending one would be safe just like extending a ModuleName is. But it's not, and even if we added one, I believe we have to handle __getattr__ calls here, too, which aren't guaranteed to be valid identifiers.

Updated this to use extend and to check the result of ModuleName::new

dcreager and others added 10 commits December 16, 2024 10:42
Co-authored-by: Carl Meyer <carl@astral.sh>
* main: (25 commits)
  [`pydocstyle`] Skip leading whitespace for `D403` (#14963)
  Update pre-commit dependencies (#15008)
  Check diagnostic refresh support from client capability (#15014)
  Update Rust crate colored to v2.2.0 (#15010)
  Update dependency monaco-editor to v0.52.2 (#15006)
  Update Rust crate thiserror to v2.0.7 (#15005)
  Update Rust crate serde to v1.0.216 (#15004)
  Update Rust crate libc to v0.2.168 (#15003)
  Update Rust crate fern to v0.7.1 (#15002)
  Update Rust crate chrono to v0.4.39 (#15001)
  Update Rust crate bstr to v1.11.1 (#15000)
  Update NPM Development dependencies (#14999)
  Update dependency ruff to v0.8.3 (#15007)
  Pin mdformat plugins in pre-commit (#14992)
  Use stripping block (`|-`) for page descriptions (#14980)
  [`perflint`] Fix panic in `perf401` (#14971)
  Improve the documentation of E201/E202 (#14983)
  [ruff_python_ast] Add name and default functions to TypeParam. (#14964)
  [red-knot] Emit an error if a bare `Annotated` or `Literal` is used in a type expression (#14973)
  [red-knot] Fix bugs relating to assignability of dynamic `type[]` types (#14972)
  ...
* main:
  Bump zizmor pre-commit hook to the latest version and fix new warnings (#15022)
  Add `actionlint` as a pre-commit hook (with shellcheck integration) (#15021)
  Update dependency mdformat-mkdocs to v4 (#15011)
Comment on lines +2893 to +2897
/// The file in which this module was imported.
///
/// We need this in order to know which submodules should be attached to it as attributes
/// (because the submodules were also imported in this file).
pub importing_file: File,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we have multiple ModuleLiteralTypes for the same module? One for each place from which it's imported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's right

Comment on lines +1450 to +1455
if importing_index.imports_module(&full_submodule_name) {
if let Some(submodule) = resolve_module(db, &full_submodule_name) {
let submodule_ty = Type::module_literal(db, importing_file, submodule);
return Symbol::Type(submodule_ty, Boundness::Bound);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a fine-tuning comment:
It could make sense to define an imported_modules(db, file) query that returns the set with all known imported modules (it would just be a thin wrapper around semantic_index(db, file).imported_modules().

The advantage of having this intermediate query is that currenlty, Type::member has to re-execute whenever the AST of importing-file changes. Salsa might to truncate any dependent queries because it can determine that the type returned by Type::member is still the same and, therefore, dependent queries don't have to re-run. However, salsa could determine this sooner if Type::member only dependent on a imported_modules query. It would then only have to re-run the imported_modules query and, if the imports are unchanged, can't stop immediately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the suggestion! I wrapped the imported_modules set in an Arc, since it seems like the set now needs to be able to outlive the containing SemanticIndex, is that right?

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks merge-ready to me!

@dcreager dcreager merged commit 4ddf922 into main Dec 16, 2024
@dcreager dcreager deleted the dcreager/module-resolution branch December 16, 2024 21:15
dcreager added a commit that referenced this pull request Dec 17, 2024
#14946 fixed our handling of nested imports with the `import` statement,
but didn't touch `from...import` statements.

cf
#14826 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants