Bind top-most parent when importing nested module#14946
Conversation
|
(This is still marked as draft because I still plan on adding some additional test cases, but should be code complete if folks want to take an initial look at the implementation) |
|
MichaReiser
left a comment
There was a problem hiding this comment.
This is great. I only have a few nit comments
* main: [red-knot] Display definition range in trace logs (#14955) [red-knot] Move the `ClassBase` enum to its own submodule (#14957) [red-knot] mdtest: python version requirements (#14954) [airflow]: Import modules that has been moved to airflow providers (AIR303) (#14764) [red-knot] Support `typing.TYPE_CHECKING` (#14952) Add tracing support to mdtest (#14935) Re-enable the fuzzer job on PRs (#14953) [red-knot] Improve `match` mdtests (#14951) Rename `custom-typeshed-dir`, `target-version` and `current-directory` CLI options (#14930) [red-knot] Add narrowing for 'while' loops (#14947) [`ruff`] Skip SQLModel base classes for `mutable-class-default` (`RUF012`) (#14949) [red-knot] Tests for 'while' loop boundness (#14944)
I doubt it 😆 |
crates/red_knot_python_semantic/resources/mdtest/import/tracking.md
Outdated
Show resolved
Hide resolved
| // this module named [name], then the result is that submodule, even if the module | ||
| // also defines a (non-module) symbol with that name. |
There was a problem hiding this comment.
I realize now reading this that the priority between a non-submodule package attribute and a submodule is a bit of a subtle thing.
What actually happens at runtime is that, when the submodule is imported, it writes its own name to the containing package namespace. This means that which takes priority actually depends on whether the package __init__.py itself imports the submodule before creating its own attribute of that same name, or not.
In other words, given pkg/a.py and these two versions of pkg/__init__.py:
a = 1from . import a as _
a = 1Then import pkg.a; print(pkg.a) will print <module 'pkg.a' ...> in the first case, but it will print 1 in the second case. Because in the first case, we first execute pkg/__init__.py fully, and only then import the submodule, overwriting the value 1 for the name "a" in the pkg namespace with the submodule. In the second case, we first import the submodule, which sets the submodule as the value for the a name, and then we continue executing __init__.py, overwriting the submodule with the value 1.
Neither mypy nor pyright model this accurately. Pyright prefers the submodule in both scenarios, and mypy prefers 1 in both scenarios.
What you have here currently would match pyright.
Is it worth trying to model this accurately? I think probably not, at least not for now. But we could put some of this context into this comment? And/or we could add some tests for the above two cases, with commentary on how they behave at runtime vs in our model?
I think the behavior you have here currently is good for now.
There was a problem hiding this comment.
Updated the comment and added a test case with commentary
| fn module_ty_from_name(&self, module_name: &ModuleName) -> Option<Type<'db>> { | ||
| resolve_module(self.db, module_name).map(|module| Type::ModuleLiteral(module.file())) | ||
| resolve_module(self.db, module_name) | ||
| .map(|module| Type::ModuleLiteral(ModuleLiteralType::new(self.db, self.file, module))) |
There was a problem hiding this comment.
I realize now that what I had originally imagined we would do would be to look up the submodule data from self.index here and populate the interned ModuleLiteralType with the set of its imported submodule names. This means we would store more data per ModuleLiteralType (a set of submodule names, instead of just a File), but member lookups (in particular, those which don't match a submodule) would be a bit cheaper: the initial check would just be a membership test for the attribute name in the submodule set, without needing to build a ModuleName.
But I suspect all of this is just in the noise performance-wise (and the extra storage per ModuleLiteralType may well outweigh the cheaper lookups), so I don't see any reason to switch to that approach. Just thought the alternative worth mentioning.
There was a problem hiding this comment.
Yeah that's a good point that this would make us construct a ModuleName every time we dereference a module. I had considered a more complex representation for ModuleName, but figured it would be better to experiment with that separately instead of trying to fold it into this. (Or better to just say this is good enough and leave it alone for now)
There was a problem hiding this comment.
I think that unless we a) are seeing a clear regression in the existing benchmark, or b) there's a very clear alternative that will definitely perform better, it's best not to spend much time on performance optimizations when we don't even know if they are meaningful in the overall performance profile. So I would suggest leaving this as-is for now.
| /// module_name.push(&Name::new("bar")); | ||
| /// assert_eq!(&module_name, "foo.bar"); | ||
| /// ``` | ||
| pub fn push(&mut self, name: &Name) { |
There was a problem hiding this comment.
Hmm, we work quite hard in the other methods of this struct to maintain the invariant that the wrapped string can only be a valid module name. I think this method breaks that invariant, since the Name node could wrap a string that constitutes an invalid identifier. (Remember that we have a parser that supports error recovery, so it can produce Name nodes that wrap invalid identifiers! There are few guarantees.)
I think to fix this we could:
- Change the signature of this method so that it accepts a
ModuleNamerather than aName-- but then it has the same signature ofModuleName::extend. - Verify that the string passed in is valid using
ModuleName::is_valid_name, and change the return type of this method to return aResult, where an error variant is returned if you passed in a string that constitutes an invalid identifier. But then it becomes less ergonomic to use the method.
Overall I'm wondering if we need this new method, or if it might just be better to use the existing ModuleName::extend()?
There was a problem hiding this comment.
Yes that's a great point — I had gotten it into my mind somehow that Name was similarly checked, and so appending one would be safe just like extending a ModuleName is. But it's not, and even if we added one, I believe we have to handle __getattr__ calls here, too, which aren't guaranteed to be valid identifiers.
Updated this to use extend and to check the result of ModuleName::new
Co-authored-by: Carl Meyer <carl@astral.sh>
* main: (25 commits) [`pydocstyle`] Skip leading whitespace for `D403` (#14963) Update pre-commit dependencies (#15008) Check diagnostic refresh support from client capability (#15014) Update Rust crate colored to v2.2.0 (#15010) Update dependency monaco-editor to v0.52.2 (#15006) Update Rust crate thiserror to v2.0.7 (#15005) Update Rust crate serde to v1.0.216 (#15004) Update Rust crate libc to v0.2.168 (#15003) Update Rust crate fern to v0.7.1 (#15002) Update Rust crate chrono to v0.4.39 (#15001) Update Rust crate bstr to v1.11.1 (#15000) Update NPM Development dependencies (#14999) Update dependency ruff to v0.8.3 (#15007) Pin mdformat plugins in pre-commit (#14992) Use stripping block (`|-`) for page descriptions (#14980) [`perflint`] Fix panic in `perf401` (#14971) Improve the documentation of E201/E202 (#14983) [ruff_python_ast] Add name and default functions to TypeParam. (#14964) [red-knot] Emit an error if a bare `Annotated` or `Literal` is used in a type expression (#14973) [red-knot] Fix bugs relating to assignability of dynamic `type[]` types (#14972) ...
| /// The file in which this module was imported. | ||
| /// | ||
| /// We need this in order to know which submodules should be attached to it as attributes | ||
| /// (because the submodules were also imported in this file). | ||
| pub importing_file: File, |
There was a problem hiding this comment.
Does that mean we have multiple ModuleLiteralTypes for the same module? One for each place from which it's imported?
| if importing_index.imports_module(&full_submodule_name) { | ||
| if let Some(submodule) = resolve_module(db, &full_submodule_name) { | ||
| let submodule_ty = Type::module_literal(db, importing_file, submodule); | ||
| return Symbol::Type(submodule_ty, Boundness::Bound); | ||
| } | ||
| } |
There was a problem hiding this comment.
This is more of a fine-tuning comment:
It could make sense to define an imported_modules(db, file) query that returns the set with all known imported modules (it would just be a thin wrapper around semantic_index(db, file).imported_modules().
The advantage of having this intermediate query is that currenlty, Type::member has to re-execute whenever the AST of importing-file changes. Salsa might to truncate any dependent queries because it can determine that the type returned by Type::member is still the same and, therefore, dependent queries don't have to re-run. However, salsa could determine this sooner if Type::member only dependent on a imported_modules query. It would then only have to re-run the imported_modules query and, if the imports are unchanged, can't stop immediately.
There was a problem hiding this comment.
Done, thanks for the suggestion! I wrapped the imported_modules set in an Arc, since it seems like the set now needs to be able to outlive the containing SemanticIndex, is that right?
carljm
left a comment
There was a problem hiding this comment.
This looks merge-ready to me!
#14946 fixed our handling of nested imports with the `import` statement, but didn't touch `from...import` statements. cf #14826 (comment)
When importing a nested module, we were correctly creating a binding for the top-most parent, but we were binding that to the nested module, not to that parent module. Moreover, we weren't treating those submodules as members of their containing parents. This PR addresses both issues, so that nested imports work as expected.
As discussed in
Slackwhatever chat app I find myself in these days 😄, this requires keeping track of which modules have been imported within the current file, so that when we resolve member access on a module reference, we can see if that member has been imported as a submodule. If so, we return the submodule reference immediately, instead of checking whether the parent module's definition defines the symbol.This is currently done in a flow insensitive manner. The
SemanticIndexnow tracks all of the modules that are imported (viaimport, not viafrom...import). The member access logic mentioned above currently only considers module imports in the file containing the attribute expression.