Skip to content

refactor(assembly): improve link-time symbol resolution internals#2637

Merged
bobbinth merged 5 commits intomainfrom
bitwalker/symbol-resolution-fixes
Feb 4, 2026
Merged

refactor(assembly): improve link-time symbol resolution internals#2637
bobbinth merged 5 commits intomainfrom
bitwalker/symbol-resolution-fixes

Conversation

@bitwalker
Copy link
Copy Markdown
Collaborator

In the course of looking into an issue described
here, I discovered that a lot of the symbol resolver complexity was due to handling edge cases at the wrong level. Not only did this result in a lot of excess complexity, but it also failed to account for issues such as the one identified in the linked comment above.

This commit rewrites the core resolve_path implementation to work in terms of "expanding" a path into its absolute form, and then resolving the absolute path. This is much more efficient, since we can take a couple of shortcuts based on a few rules, and results in much more intuitive behavior, and a considerably simpler implementation.

Two new tests were added to catch a few edge cases that were not well represented by our existing test suite, and can be used to prevent future regressions as well.

This is a non-breaking change in the public interface of the assembler.

@bobbinth This is targeted against main since it fixes issues in the current release without any breaking changes (at least, things that would be caught by our test suite and compilation of the stdlib).

@bitwalker bitwalker requested review from bobbinth and plafer February 2, 2026 20:51
@bitwalker bitwalker self-assigned this Feb 2, 2026
@bitwalker bitwalker added the assembly Related to Miden assembly label Feb 2, 2026
In the course of looking into an issue described
[here](0xMiden/protocol#2377 (comment)),
I discovered that a lot of the symbol resolver complexity was due to
handling edge cases at the wrong level. Not only did this result in a
lot of excess complexity, but it also failed to account for issues such
as the one identified in the linked comment above.

This commit rewrites the core `resolve_path` implementation to work in
terms of "expanding" a path into its absolute form, and then resolving
the absolute path. This is much more efficient, since we can take a
couple of shortcuts based on a few rules, and results in much more
intuitive behavior, and a considerably simpler implementation.

Two new tests were added to catch a few edge cases that were not well
represented by our existing test suite, and can be used to prevent
future regressions as well.

This is a non-breaking change in the public interface of the assembler.
@bitwalker bitwalker force-pushed the bitwalker/symbol-resolution-fixes branch from a5d8bb2 to 4b17762 Compare February 2, 2026 20:52
@bitwalker
Copy link
Copy Markdown
Collaborator Author

GitHub seems to be choking at the moment, I'll try to get all the jobs re-run as soon as it settles down again

Copy link
Copy Markdown
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left a few comments inline - mostly about adding more context/clarifications.

Overall, while this does simplify things (and that's great), I can't say that I fully followed all the code as it is still quite complex (e.g., I only did a cursory review of the SymbolResolver::expand_path() method). I do wonder if in the longer term we can simplify this even more by maybe giving up some efficiency - though, I don't have any specific proposals for this.

<Self as StartsWith<str>>::starts_with_exactly(self, prefix.as_str())
let mut components = self.components();
for prefix_component in prefix.components() {
let prefix_component = prefix_component.expect("invalid prefix path");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a safety comment explaining why we expect prefix_component to always be valid?

More generally, why do we need to do component validation on iteration rather than on Path construction? Seems like it would be easier to reason about some of these things if we could always assume that a Path is well-formed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we assert in Path::new (which just losslessly converts from str to Path), we'd need two implementations of the components iterator: one used when validating/constructing from a string, and one that assumes the components are valid and just asserts on that assumption being violated. Most of the implementation could actually be shared, we'd probably just have the "assume valid" Iterator wrap the current impl and assert on Err. That might be better than the current setup.

In any case, I couldn't make those sorts of changes here because they'd be breaking changes, and I wanted to keep this as non-invasive as possible.

Comment on lines 466 to 467
#[inline]
fn starts_with_exactly(&self, prefix: &str) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding a comment noting that this compares the prefix only against the first component (excluding the root marker). It took me a bit of time to understand this.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - we document this for join now, but didn't reflect those semantics in the docs here.

pub fn strip_prefix<'a>(&'a self, prefix: &Self) -> Option<&'a Self> {
let mut components = self.components();
for prefix_component in prefix.components() {
let prefix_component = prefix_component.expect("invalid prefix path");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as before re adding a note about why we are guaranteed to have a valid prefix_component here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not guaranteed to - but it is part of the inherent contract of these APIs that paths are valid once constructed. If we make the changes discussed in the other comment, then we can actually enforce that, but that sort of change will have to happen in next

Comment on lines +268 to +274
// 1. A reference to a symbol in the current module (possibly imported)
// 1. A reference to a symbol relative to an imported module, e.g. `push.mod::CONST`
// 2. A reference to a symbol relative to an imported module, e.g.
// `push.mod::submod::CONST`
// 3. An absolute path expressed relatively, e.g. `push.root::mod::submod::CONST`,
// which should have been expressed as `push.::root::mod::submod::CONST`, but the
// `::` prefix was omitted/forgotten.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was Line 269 supposed to be deleted? The content seems to be the same as line 270, but it is numbered as 1 (same as line 268).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, no, it's just incorrectly numbered

//
// If the symbol is an imported item, then we expand the imported path recursively.
break match self
.resolve_local_with_index(context.module, Span::new(span, symbol.as_str()))?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old resolve function had a branch that rebound gid.module to kernel_index when context.in_syscall() was true. That guard is gone here, so syscall targets can now bind to non-kernel items.

For example, I think that if a user module defines foo and the kernel also has foo, syscall.foo might now resolve to the user's foo instead of the kernel's foo.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this should be enforced by resolve_invoke_target, seems like we don't have a regression test for that either, so I'll add one

// and the result of this recursive expansion is what gets returned from this
// function.
let (imported_symbol, subpath) = path.split_first().expect("multi-component path");
if ignored_imports.contains(imported_symbol) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old find function tracked visited full paths like [a, b, a] to detect cycles. This only tracks import names like a, so multi-alias cycles can slip through.

IIUC, if module A imports b and module B imports a, then resolving a::b::a::x would keep expanding instead of raising a recursive alias error. The ignored_imports set catches use lib::lib style cycles, but not cycles with distinct names.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a regression test for this and make sure we're still properly handling cycles like that - but I don't believe the issue you've described is actually an infinite expansion (at least not without a reproducer), though the behavior would certainly depend on where one is resolving a path like a::b::a::x from. But let's assume for a second that it is from b, which looks like so:

pub use a

pub proc foo
    exec.a::b::a::x
end

And a looks like:

pub use b

pub proc x
    push.1
end

So we have a cyclic dependency between a and b, but one that should be resolvable. Resolving exec.a::b::a::x in b would proceed as follows:

  • The path is relative, so we split off a to resolve as an import, which expands to just a. We add a to ignored_imports and proceed to resolve the now partially-expanded path a::b::a::x
  • The path is relative, so we split off a to resolve as an import, but this time it is ignored, instead we attempt to expand the path as absolute, i.e. ::a::b::a::x
  • The path is absolute, so we search for a module with the longest matching prefix, which is a, so we attempt to resolve b::a::x relative to a
  • The path is relative, so we split off b to resolve as an import, which expands to just b, which is added to ignored_imports, and we proceed to resolve the now partially-expanded path b::a::x
  • The path is relative, and again we ignore the import b, and convert the path to absolute form and attempt to resolve as ::b::a::x
  • The longest matching prefix of ::b::a::x is b, so we attempt to resolve a::x relative to b
  • We expand the a import to a, ignore it, and proceed to resolve ::a::x
  • The longest matching prefix of ::a::x is a, so we attempt to resolve x relative to a
  • The relative path x is a symbol, so we resolve it to a local definition in a

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did find a particularly pathological case though which does produce an issue - but I'm not sure if it would have succeeded previously either, in any case, here's the test:

#[test]
fn test_linking_recursive_expansion() -> TestResult {
    let context = TestContext::default();

    let a_lib = context.parse_module_with_path(
        "a",
        source_file!(
            &context,
            r#"
        pub use b::a
        pub proc x
            push.1
        end
        "#
        ),
    )?;

    let b_lib = context.parse_module_with_path(
        "b",
        source_file!(
            &context,
            r#"
        pub use a::a
        pub proc foo
            exec.a::x
        end
        "#
        ),
    )?;

    let assembler = Assembler::new(context.source_manager());
    let _ = assembler.assemble_library([a_lib, b_lib])?;

    Ok(())
}

} else {
prefix
};
components.next().is_some_and(
Copy link
Copy Markdown
Collaborator

@huitseeker huitseeker Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This only checks the first component against the entire prefix string, so path("foo::bar").starts_with("foo") returns true but path("foo::bar").starts_with("foo::bar") returns false.

The old behavior used string prefix matching, so path("foo::bar").starts_with("foo") would return true as well, but the behavior was different for multi-component prefixes. This might be quite fine if the API is only used with single-component prefixes, but it's a subtle change.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct - however in our uses we only ever intend to match the leading component when using a string, and the subtleties around matching portions of a path with strings caused bugs when dealing with quoted identifiers.

This change, while technically breaking, actually guarantees the correct semantics for our intended uses of these functions. Essentially, like join, we want to ensure that when component structure matters, one has to be explicit about constructing a Path for multi-component matching/joining, in order to guarantee consistent behavior when dealing with paths that may have quoted components.

@bitwalker bitwalker force-pushed the bitwalker/symbol-resolution-fixes branch from 85f780f to d626391 Compare February 4, 2026 20:47
Copy link
Copy Markdown
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! Let's create an issue to address the remaining items (i.e., #2637 (comment) and #2637 (comment)), and probably more broadly, to figure out how we can simplify path/symbol resolution (e.g., by imposing some restrictions or giving up some performance).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: what's the motivation for moving this file?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had the miden-assembly crate depending on miden-processor and miden-core-lib (both of which depend on miden-assembly) just for two integration tests (and also used by the doctests in the miden-assembly README).

The problem is that this meant that changing anything in the assembler and then running an assembler test, would first need to build both of those crates (and of particular note, miden-core-lib). If there was an issue in the assembler that you were trying to work out, and that issue affected any of the MASM in miden-core-lib, you would be unable to run any of the miden-assembly tests without getting miden-core-lib to build successfully first. That was obviously a problem for me when tinkering with anything involving the linker/symbol resolution.

Now, the two tests that used the processor live in the processor test suite, and this cyclic dependency is now gone. The only downside is that a couple of the doctests in the miden-assembly readme can no longer reference miden-core-lib, but the need for that is going away soon anyway.

Comment thread crates/assembly/README.md
Comment on lines +207 to +209
let assembler = Assembler::with_kernel(source_manager, kernel_lib);
// If you wanted to link against the core library, you'd extend the above
// with: `.with_dynamic_library(&miden_core_lib::CoreLibrary::default())?;`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind this change, but what's the motivation for making it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment above.

More generally though, I think the miden-assembly README should probably not be referencing stuff from miden-core-lib just for the sake of an example. We can illustate the APIs just fine without doing that, and it saves us the need to have a cyclic dependency between miden-assembly and miden-core-lib which can be a real pain as mentioned in my other comment.

Lastly, referencing Library via Rust crates like that is going away anyway, so all of these examples will need to be rewritten regardless.

I left the comment in though just to preserve the usage that was documented there, it just isn't statically checked anymore.

@bobbinth bobbinth merged commit c4ed229 into main Feb 4, 2026
17 checks passed
@bobbinth bobbinth deleted the bitwalker/symbol-resolution-fixes branch February 4, 2026 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

assembly Related to Miden assembly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants