Bind top-most parent when importing nested module by dcreager · Pull Request #14946 · astral-sh/ruff

dcreager · 2024-12-12T20:37:25Z

When importing a nested module, we were correctly creating a binding for the top-most parent, but we were binding that to the nested module, not to that parent module. Moreover, we weren't treating those submodules as members of their containing parents. This PR addresses both issues, so that nested imports work as expected.

As discussed in ~~Slack~~ whatever chat app I find myself in these days 😄, this requires keeping track of which modules have been imported within the current file, so that when we resolve member access on a module reference, we can see if that member has been imported as a submodule. If so, we return the submodule reference immediately, instead of checking whether the parent module's definition defines the symbol.

This is currently done in a flow insensitive manner. The SemanticIndex now tracks all of the modules that are imported (via import, not via from...import). The member access logic mentioned above currently only considers module imports in the file containing the attribute expression.

dcreager · 2024-12-12T20:37:58Z

(This is still marked as draft because I still plan on adding some additional test cases, but should be code complete if folks want to take an initial look at the implementation)

github-actions · 2024-12-12T20:44:53Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

MichaReiser

This is great. I only have a few nit comments

crates/red_knot_python_semantic/src/module_name.rs

crates/red_knot_python_semantic/src/semantic_index/builder.rs

crates/red_knot_python_semantic/src/types.rs

crates/red_knot_python_semantic/src/types/display.rs

crates/red_knot_python_semantic/src/types/infer.rs

crates/red_knot_python_semantic/src/types.rs

* main: [red-knot] Display definition range in trace logs (#14955) [red-knot] Move the `ClassBase` enum to its own submodule (#14957) [red-knot] mdtest: python version requirements (#14954) [airflow]: Import modules that has been moved to airflow providers (AIR303) (#14764) [red-knot] Support `typing.TYPE_CHECKING` (#14952) Add tracing support to mdtest (#14935) Re-enable the fuzzer job on PRs (#14953) [red-knot] Improve `match` mdtests (#14951) Rename `custom-typeshed-dir`, `target-version` and `current-directory` CLI options (#14930) [red-knot] Add narrowing for 'while' loops (#14947) [`ruff`] Skip SQLModel base classes for `mutable-class-default` (`RUF012`) (#14949) [red-knot] Tests for 'while' loop boundness (#14944)

* main: [red-knot] Use `type[Unknown]` rather than `Unknown` as the fallback metaclass for invalid classes (#14961) [red-knot] Make `is_subtype_of` exhaustive (#14924) [red-knot] Alphabetize rules (#14960) [red-knot] Understand `Annotated` (#14950)

carljm · 2024-12-13T22:23:38Z

As discussed in Slack

I doubt it 😆

carljm

Lovely!

crates/red_knot_python_semantic/resources/mdtest/import/tracking.md

crates/red_knot_python_semantic/src/types.rs

carljm · 2024-12-13T22:54:30Z

crates/red_knot_python_semantic/src/types.rs

+                // this module named [name], then the result is that submodule, even if the module
+                // also defines a (non-module) symbol with that name.


I realize now reading this that the priority between a non-submodule package attribute and a submodule is a bit of a subtle thing.

What actually happens at runtime is that, when the submodule is imported, it writes its own name to the containing package namespace. This means that which takes priority actually depends on whether the package __init__.py itself imports the submodule before creating its own attribute of that same name, or not.

In other words, given pkg/a.py and these two versions of pkg/__init__.py:

a = 1

from . import a as _ a = 1

Then import pkg.a; print(pkg.a) will print <module 'pkg.a' ...> in the first case, but it will print 1 in the second case. Because in the first case, we first execute pkg/__init__.py fully, and only then import the submodule, overwriting the value 1 for the name "a" in the pkg namespace with the submodule. In the second case, we first import the submodule, which sets the submodule as the value for the a name, and then we continue executing __init__.py, overwriting the submodule with the value 1.

Neither mypy nor pyright model this accurately. Pyright prefers the submodule in both scenarios, and mypy prefers 1 in both scenarios.

What you have here currently would match pyright.

Is it worth trying to model this accurately? I think probably not, at least not for now. But we could put some of this context into this comment? And/or we could add some tests for the above two cases, with commentary on how they behave at runtime vs in our model?

I think the behavior you have here currently is good for now.

Updated the comment and added a test case with commentary

carljm · 2024-12-13T23:07:13Z

crates/red_knot_python_semantic/src/types/infer.rs

    fn module_ty_from_name(&self, module_name: &ModuleName) -> Option<Type<'db>> {
-        resolve_module(self.db, module_name).map(|module| Type::ModuleLiteral(module.file()))
+        resolve_module(self.db, module_name)
+            .map(|module| Type::ModuleLiteral(ModuleLiteralType::new(self.db, self.file, module)))


I realize now that what I had originally imagined we would do would be to look up the submodule data from self.index here and populate the interned ModuleLiteralType with the set of its imported submodule names. This means we would store more data per ModuleLiteralType (a set of submodule names, instead of just a File), but member lookups (in particular, those which don't match a submodule) would be a bit cheaper: the initial check would just be a membership test for the attribute name in the submodule set, without needing to build a ModuleName.

But I suspect all of this is just in the noise performance-wise (and the extra storage per ModuleLiteralType may well outweigh the cheaper lookups), so I don't see any reason to switch to that approach. Just thought the alternative worth mentioning.

Yeah that's a good point that this would make us construct a ModuleName every time we dereference a module. I had considered a more complex representation for ModuleName, but figured it would be better to experiment with that separately instead of trying to fold it into this. (Or better to just say this is good enough and leave it alone for now)

I think that unless we a) are seeing a clear regression in the existing benchmark, or b) there's a very clear alternative that will definitely perform better, it's best not to spend much time on performance optimizations when we don't even know if they are meaningful in the overall performance profile. So I would suggest leaving this as-is for now.

AlexWaygood

Thanks!

AlexWaygood · 2024-12-13T22:08:42Z

crates/red_knot_python_semantic/src/module_name.rs

+    /// module_name.push(&Name::new("bar"));
+    /// assert_eq!(&module_name, "foo.bar");
+    /// ```
+    pub fn push(&mut self, name: &Name) {


Hmm, we work quite hard in the other methods of this struct to maintain the invariant that the wrapped string can only be a valid module name. I think this method breaks that invariant, since the Name node could wrap a string that constitutes an invalid identifier. (Remember that we have a parser that supports error recovery, so it can produce Name nodes that wrap invalid identifiers! There are few guarantees.)

I think to fix this we could:

Change the signature of this method so that it accepts a ModuleName rather than a Name -- but then it has the same signature of ModuleName::extend.

Verify that the string passed in is valid using ModuleName::is_valid_name, and change the return type of this method to return a Result, where an error variant is returned if you passed in a string that constitutes an invalid identifier. But then it becomes less ergonomic to use the method.

Overall I'm wondering if we need this new method, or if it might just be better to use the existing ModuleName::extend()?

Yes that's a great point — I had gotten it into my mind somehow that Name was similarly checked, and so appending one would be safe just like extending a ModuleName is. But it's not, and even if we added one, I believe we have to handle __getattr__ calls here, too, which aren't guaranteed to be valid identifiers.

Updated this to use extend and to check the result of ModuleName::new

crates/red_knot_python_semantic/src/semantic_index/builder.rs

crates/red_knot_python_semantic/src/types.rs

Co-authored-by: Carl Meyer <carl@astral.sh>

* main: (25 commits) [`pydocstyle`] Skip leading whitespace for `D403` (#14963) Update pre-commit dependencies (#15008) Check diagnostic refresh support from client capability (#15014) Update Rust crate colored to v2.2.0 (#15010) Update dependency monaco-editor to v0.52.2 (#15006) Update Rust crate thiserror to v2.0.7 (#15005) Update Rust crate serde to v1.0.216 (#15004) Update Rust crate libc to v0.2.168 (#15003) Update Rust crate fern to v0.7.1 (#15002) Update Rust crate chrono to v0.4.39 (#15001) Update Rust crate bstr to v1.11.1 (#15000) Update NPM Development dependencies (#14999) Update dependency ruff to v0.8.3 (#15007) Pin mdformat plugins in pre-commit (#14992) Use stripping block (`|-`) for page descriptions (#14980) [`perflint`] Fix panic in `perf401` (#14971) Improve the documentation of E201/E202 (#14983) [ruff_python_ast] Add name and default functions to TypeParam. (#14964) [red-knot] Emit an error if a bare `Annotated` or `Literal` is used in a type expression (#14973) [red-knot] Fix bugs relating to assignability of dynamic `type[]` types (#14972) ...

* main: Bump zizmor pre-commit hook to the latest version and fix new warnings (#15022) Add `actionlint` as a pre-commit hook (with shellcheck integration) (#15021) Update dependency mdformat-mkdocs to v4 (#15011)

crates/red_knot_python_semantic/src/types.rs

MichaReiser · 2024-12-16T18:39:35Z

crates/red_knot_python_semantic/src/types.rs

+    /// The file in which this module was imported.
+    ///
+    /// We need this in order to know which submodules should be attached to it as attributes
+    /// (because the submodules were also imported in this file).
+    pub importing_file: File,


Does that mean we have multiple ModuleLiteralTypes for the same module? One for each place from which it's imported?

Yes that's right

MichaReiser · 2024-12-16T18:45:21Z

crates/red_knot_python_semantic/src/types.rs

+                    if importing_index.imports_module(&full_submodule_name) {
+                        if let Some(submodule) = resolve_module(db, &full_submodule_name) {
+                            let submodule_ty = Type::module_literal(db, importing_file, submodule);
+                            return Symbol::Type(submodule_ty, Boundness::Bound);
+                        }
+                    }


This is more of a fine-tuning comment:
It could make sense to define an imported_modules(db, file) query that returns the set with all known imported modules (it would just be a thin wrapper around semantic_index(db, file).imported_modules().

The advantage of having this intermediate query is that currenlty, Type::member has to re-execute whenever the AST of importing-file changes. Salsa might to truncate any dependent queries because it can determine that the type returned by Type::member is still the same and, therefore, dependent queries don't have to re-run. However, salsa could determine this sooner if Type::member only dependent on a imported_modules query. It would then only have to re-run the imported_modules query and, if the imports are unchanged, can't stop immediately.

Done, thanks for the suggestion! I wrapped the imported_modules set in an Arc, since it seems like the set now needs to be able to outlive the containing SemanticIndex, is that right?

carljm

This looks merge-ready to me!

#14946 fixed our handling of nested imports with the `import` statement, but didn't touch `from...import` statements. cf #14826 (comment)

dcreager added 4 commits December 12, 2024 15:03

Include module name in Type::ModuleLiteral

0ba077a

Resolve submodule references

2c093dd

Bind top-most parent when importing nested module

4877db4

Mark all parent modules as imported too

8b4e4f9

Placate the linter

9d4f731

MichaReiser added the ty Multi-file analysis & type inference label Dec 13, 2024

MichaReiser reviewed Dec 13, 2024

View reviewed changes

dcreager added 15 commits December 13, 2024 09:58

Add nested module test cases

370c133

Handle nested imports with as clause correctly

d707e9f

Fix merge conflicts

75e7d30

Linter

008bafe

Rename back to file

58c0ddc

file → module

4d5afcc

Rename the new ModuleName methods

9a64764

Use extend

74bfcac

Store Module in module reference type

42f49f9

I guess this has to be interned, not tracked?

6e9b295

Track containing file in Type::ModuleLiteral

5fb1246

Placate clippy

8215bc4

Spelling typo

c4df0a2

dcreager marked this pull request as ready for review December 13, 2024 21:49

dcreager requested review from AlexWaygood, carljm and sharkdp as code owners December 13, 2024 21:49

carljm approved these changes Dec 13, 2024

View reviewed changes

AlexWaygood reviewed Dec 14, 2024

View reviewed changes

dcreager and others added 10 commits December 16, 2024 10:42

Apply suggestions from code review

d22a522

Co-authored-by: Carl Meyer <carl@astral.sh>

Finish rename

ba828fd

Reflow mdtest

631a7f3

Just use ModuleName::extend

2abf3e6

Use deref

6646795

Add smart constructor

14ab423

Clarify that submodules do not always take precedence

9829538

Merge branch 'main' into dcreager/module-resolution

5c0739a

* main: Bump zizmor pre-commit hook to the latest version and fix new warnings (#15022) Add `actionlint` as a pre-commit hook (with shellcheck integration) (#15021) Update dependency mdformat-mkdocs to v4 (#15011)

linter

0b50375

MichaReiser reviewed Dec 16, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Show resolved Hide resolved

MichaReiser reviewed Dec 16, 2024

View reviewed changes

Add add'l salsa caching

65291c5

carljm approved these changes Dec 16, 2024

View reviewed changes

dcreager merged commit 4ddf922 into main Dec 16, 2024

dcreager deleted the dcreager/module-resolution branch December 16, 2024 21:15

dcreager mentioned this pull request Dec 16, 2024

Handle nested imports correctly in from ... import #15026

Merged

dcreager added a commit that referenced this pull request Dec 17, 2024

Handle nested imports correctly in from ... import (#15026)

91c9168

#14946 fixed our handling of nested imports with the `import` statement, but didn't touch `from...import` statements. cf #14826 (comment)

dcreager mentioned this pull request Dec 17, 2024

Incorrect resolution of nested modules #14826

Closed

AlexWaygood mentioned this pull request Nov 23, 2025

[ty] Implement from x.y import z ensuring x.y resolves in the current file #21583

Draft

		// this module named [name], then the result is that submodule, even if the module
		// also defines a (non-module) symbol with that name.

Conversation

dcreager commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dcreager commented Dec 12, 2024

Uh oh!

github-actions bot commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

carljm commented Dec 13, 2024

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AlexWaygood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dcreager commented Dec 12, 2024 •

edited

Loading

github-actions bot commented Dec 12, 2024 •

edited

Loading

`ruff-ecosystem` results