refactor(assembly): improve link-time symbol resolution internals by bitwalker · Pull Request #2637 · 0xMiden/miden-vm

bitwalker · 2026-02-02T20:51:27Z

In the course of looking into an issue described
here, I discovered that a lot of the symbol resolver complexity was due to handling edge cases at the wrong level. Not only did this result in a lot of excess complexity, but it also failed to account for issues such as the one identified in the linked comment above.

This commit rewrites the core resolve_path implementation to work in terms of "expanding" a path into its absolute form, and then resolving the absolute path. This is much more efficient, since we can take a couple of shortcuts based on a few rules, and results in much more intuitive behavior, and a considerably simpler implementation.

Two new tests were added to catch a few edge cases that were not well represented by our existing test suite, and can be used to prevent future regressions as well.

This is a non-breaking change in the public interface of the assembler.

@bobbinth This is targeted against main since it fixes issues in the current release without any breaking changes (at least, things that would be caught by our test suite and compilation of the stdlib).

In the course of looking into an issue described [here](0xMiden/protocol#2377 (comment)), I discovered that a lot of the symbol resolver complexity was due to handling edge cases at the wrong level. Not only did this result in a lot of excess complexity, but it also failed to account for issues such as the one identified in the linked comment above. This commit rewrites the core `resolve_path` implementation to work in terms of "expanding" a path into its absolute form, and then resolving the absolute path. This is much more efficient, since we can take a couple of shortcuts based on a few rules, and results in much more intuitive behavior, and a considerably simpler implementation. Two new tests were added to catch a few edge cases that were not well represented by our existing test suite, and can be used to prevent future regressions as well. This is a non-breaking change in the public interface of the assembler.

bitwalker · 2026-02-02T21:11:51Z

GitHub seems to be choking at the moment, I'll try to get all the jobs re-run as soon as it settles down again

bobbinth

Looks good! Thank you! I left a few comments inline - mostly about adding more context/clarifications.

Overall, while this does simplify things (and that's great), I can't say that I fully followed all the code as it is still quite complex (e.g., I only did a cursory review of the SymbolResolver::expand_path() method). I do wonder if in the longer term we can simplify this even more by maybe giving up some efficiency - though, I don't have any specific proposals for this.

bobbinth · 2026-02-03T05:31:50Z

-        <Self as StartsWith<str>>::starts_with_exactly(self, prefix.as_str())
+        let mut components = self.components();
+        for prefix_component in prefix.components() {
+            let prefix_component = prefix_component.expect("invalid prefix path");


Could we add a safety comment explaining why we expect prefix_component to always be valid?

More generally, why do we need to do component validation on iteration rather than on Path construction? Seems like it would be easier to reason about some of these things if we could always assume that a Path is well-formed.

Even if we assert in Path::new (which just losslessly converts from str to Path), we'd need two implementations of the components iterator: one used when validating/constructing from a string, and one that assumes the components are valid and just asserts on that assumption being violated. Most of the implementation could actually be shared, we'd probably just have the "assume valid" Iterator wrap the current impl and assert on Err. That might be better than the current setup.

In any case, I couldn't make those sorts of changes here because they'd be breaking changes, and I wanted to keep this as non-invasive as possible.

bobbinth · 2026-02-03T05:53:55Z

    #[inline]
    fn starts_with_exactly(&self, prefix: &str) -> bool {


It may be worth adding a comment noting that this compares the prefix only against the first component (excluding the root marker). It took me a bit of time to understand this.

Good point - we document this for join now, but didn't reflect those semantics in the docs here.

bobbinth · 2026-02-03T05:55:16Z

+    pub fn strip_prefix<'a>(&'a self, prefix: &Self) -> Option<&'a Self> {
+        let mut components = self.components();
+        for prefix_component in prefix.components() {
+            let prefix_component = prefix_component.expect("invalid prefix path");


Similar comment as before re adding a note about why we are guaranteed to have a valid prefix_component here.

We're not guaranteed to - but it is part of the inherent contract of these APIs that paths are valid once constructed. If we make the changes discussed in the other comment, then we can actually enforce that, but that sort of change will have to happen in next

bobbinth · 2026-02-03T06:04:04Z

+                // 1. A reference to a symbol in the current module (possibly imported)
+                // 1. A reference to a symbol relative to an imported module, e.g. `push.mod::CONST`
+                // 2. A reference to a symbol relative to an imported module, e.g.
+                //    `push.mod::submod::CONST`
+                // 3. An absolute path expressed relatively, e.g. `push.root::mod::submod::CONST`,
+                //    which should have been expressed as `push.::root::mod::submod::CONST`, but the
+                //    `::` prefix was omitted/forgotten.


Was Line 269 supposed to be deleted? The content seems to be the same as line 270, but it is numbered as 1 (same as line 268).

Whoops, no, it's just incorrectly numbered

huitseeker · 2026-02-03T11:37:25Z

+                //
+                // If the symbol is an imported item, then we expand the imported path recursively.
+                break match self
+                    .resolve_local_with_index(context.module, Span::new(span, symbol.as_str()))?


The old resolve function had a branch that rebound gid.module to kernel_index when context.in_syscall() was true. That guard is gone here, so syscall targets can now bind to non-kernel items.

For example, I think that if a user module defines foo and the kernel also has foo, syscall.foo might now resolve to the user's foo instead of the kernel's foo.

Good point, this should be enforced by resolve_invoke_target, seems like we don't have a regression test for that either, so I'll add one

huitseeker · 2026-02-03T11:37:28Z

+                // and the result of this recursive expansion is what gets returned from this
+                // function.
+                let (imported_symbol, subpath) = path.split_first().expect("multi-component path");
+                if ignored_imports.contains(imported_symbol) {


The old find function tracked visited full paths like [a, b, a] to detect cycles. This only tracks import names like a, so multi-alias cycles can slip through.

IIUC, if module A imports b and module B imports a, then resolving a::b::a::x would keep expanding instead of raising a recursive alias error. The ignored_imports set catches use lib::lib style cycles, but not cycles with distinct names.

I'll add a regression test for this and make sure we're still properly handling cycles like that - but I don't believe the issue you've described is actually an infinite expansion (at least not without a reproducer), though the behavior would certainly depend on where one is resolving a path like a::b::a::x from. But let's assume for a second that it is from b, which looks like so:

pub use a pub proc foo exec.a::b::a::x end

And a looks like:

pub use b pub proc x push.1 end

So we have a cyclic dependency between a and b, but one that should be resolvable. Resolving exec.a::b::a::x in b would proceed as follows:

The path is relative, so we split off a to resolve as an import, which expands to just a. We add a to ignored_imports and proceed to resolve the now partially-expanded path a::b::a::x

The path is relative, so we split off a to resolve as an import, but this time it is ignored, instead we attempt to expand the path as absolute, i.e. ::a::b::a::x

The path is absolute, so we search for a module with the longest matching prefix, which is a, so we attempt to resolve b::a::x relative to a

The path is relative, so we split off b to resolve as an import, which expands to just b, which is added to ignored_imports, and we proceed to resolve the now partially-expanded path b::a::x

The path is relative, and again we ignore the import b, and convert the path to absolute form and attempt to resolve as ::b::a::x

The longest matching prefix of ::b::a::x is b, so we attempt to resolve a::x relative to b

We expand the a import to a, ignore it, and proceed to resolve ::a::x

The longest matching prefix of ::a::x is a, so we attempt to resolve x relative to a

The relative path x is a symbol, so we resolve it to a local definition in a

I did find a particularly pathological case though which does produce an issue - but I'm not sure if it would have succeeded previously either, in any case, here's the test:

#[test] fn test_linking_recursive_expansion() -> TestResult { let context = TestContext::default(); let a_lib = context.parse_module_with_path( "a", source_file!( &context, r#" pub use b::a pub proc x push.1 end "# ), )?; let b_lib = context.parse_module_with_path( "b", source_file!( &context, r#" pub use a::a pub proc foo exec.a::x end "# ), )?; let assembler = Assembler::new(context.source_manager()); let _ = assembler.assemble_library([a_lib, b_lib])?; Ok(()) }

huitseeker · 2026-02-03T11:37:32Z

+                } else {
+                    prefix
+                };
+                components.next().is_some_and(


Nit: This only checks the first component against the entire prefix string, so path("foo::bar").starts_with("foo") returns true but path("foo::bar").starts_with("foo::bar") returns false.

The old behavior used string prefix matching, so path("foo::bar").starts_with("foo") would return true as well, but the behavior was different for multi-component prefixes. This might be quite fine if the API is only used with single-component prefixes, but it's a subtle change.

Correct - however in our uses we only ever intend to match the leading component when using a string, and the subtleties around matching portions of a path with strings caused bugs when dealing with quoted identifiers.

This change, while technically breaking, actually guarantees the correct semantics for our intended uses of these functions. Essentially, like join, we want to ensure that when component structure matters, one has to be explicit about constructing a Path for multi-component matching/joining, in order to guarantee consistent behavior when dealing with paths that may have quoted components.

…ions

bobbinth

Looks good! Thank you! Let's create an issue to address the remaining items (i.e., #2637 (comment) and #2637 (comment)), and probably more broadly, to figure out how we can simplify path/symbol resolution (e.g., by imposing some restrictions or giving up some performance).

bobbinth · 2026-02-04T21:30:18Z

Question: what's the motivation for moving this file?

We had the miden-assembly crate depending on miden-processor and miden-core-lib (both of which depend on miden-assembly) just for two integration tests (and also used by the doctests in the miden-assembly README).

The problem is that this meant that changing anything in the assembler and then running an assembler test, would first need to build both of those crates (and of particular note, miden-core-lib). If there was an issue in the assembler that you were trying to work out, and that issue affected any of the MASM in miden-core-lib, you would be unable to run any of the miden-assembly tests without getting miden-core-lib to build successfully first. That was obviously a problem for me when tinkering with anything involving the linker/symbol resolution.

Now, the two tests that used the processor live in the processor test suite, and this cyclic dependency is now gone. The only downside is that a couple of the doctests in the miden-assembly readme can no longer reference miden-core-lib, but the need for that is going away soon anyway.

bobbinth · 2026-02-04T21:36:53Z

+    let assembler = Assembler::with_kernel(source_manager, kernel_lib);
+    // If you wanted to link against the core library, you'd extend the above
+    // with: `.with_dynamic_library(&miden_core_lib::CoreLibrary::default())?;`


I don't mind this change, but what's the motivation for making it?

See my other comment above.

More generally though, I think the miden-assembly README should probably not be referencing stuff from miden-core-lib just for the sake of an example. We can illustate the APIs just fine without doing that, and it saves us the need to have a cyclic dependency between miden-assembly and miden-core-lib which can be a real pain as mentioned in my other comment.

Lastly, referencing Library via Rust crates like that is going away anyway, so all of these examples will need to be rewritten regardless.

I left the comment in though just to preserve the usage that was documented there, it just isn't statically checked anymore.

bitwalker requested review from bobbinth and plafer February 2, 2026 20:51

bitwalker self-assigned this Feb 2, 2026

bitwalker added the assembly Related to Miden assembly label Feb 2, 2026

bitwalker force-pushed the bitwalker/symbol-resolution-fixes branch from a5d8bb2 to 4b17762 Compare February 2, 2026 20:52

bobbinth approved these changes Feb 3, 2026

View reviewed changes

PhilippGackstatter mentioned this pull request Feb 3, 2026

Replace const re-declaration when bug is fixed 0xMiden/protocol#2392

Closed

huitseeker reviewed Feb 3, 2026

View reviewed changes

bitwalker added 4 commits February 4, 2026 15:29

doc(assembly): a few documentation/comment fixes/improvements

2b3196c

fix(assembly): ensure resolve_invoke_target validates syscall resolut…

bde8b1c

…ions

test: move assembler tests which require processor to processor crate

9fb2ef9

test(assembly): add regression tests for specific linker edge cases

d626391

bitwalker force-pushed the bitwalker/symbol-resolution-fixes branch from 85f780f to d626391 Compare February 4, 2026 20:47

bobbinth approved these changes Feb 4, 2026

View reviewed changes

bobbinth merged commit c4ed229 into main Feb 4, 2026
17 checks passed

bobbinth deleted the bitwalker/symbol-resolution-fixes branch February 4, 2026 21:43

bitwalker mentioned this pull request Feb 4, 2026

Refine semantics of imports/paths expressed in Miden Assembly #2643

Open

		#[inline]
		fn starts_with_exactly(&self, prefix: &str) -> bool {

Conversation

bitwalker commented Feb 2, 2026

Uh oh!

bitwalker commented Feb 2, 2026

Uh oh!

bobbinth left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

huitseeker Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bobbinth left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

huitseeker Feb 3, 2026 •

edited

Loading