StringLifting? #7370

@kripken

It would be nice to allow toolchains to emit magic JS string imports all the time, which would make the output immediately runnable in VMs. That would be instead of emitting stringref and letting Binaryen lower it. The benefit of stringref is that Binaryen can optimize strings (it has them in the IR), but in a debug build you don't need that, and just want to run the build. Right now, toolchains can do some work to emit either JS string imports or stringref, depending on build type (debug or optimized), but we could save them the effort if Binaryen could read JS string imports and turn them into optimizable stringref.

We already have a StringLowering pass that turns stringref into JS string imports, and it works well, so we could add a StringLifting pass that does the inverse. However, the inverse problem is a lot harder. Consider, for example:

(module
  (import "\'" "foo" (global $string.foo externref))
  (func $use
    (local $temp externref)
    (local.set $temp (global.get $string.foo))
  )
)

We can turn that imported JS string into a string.const, but the type would change from externref to stringref, and no longer fit in the local.

The existing lowering pass handles this by just turning every stringref into externref, which is fine as the goal is to lower away all native wasm strings. But we can't do that in a lifting pass, as there might be legitimate and unrelated externref uses to keep.
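
To make the asymmetry concrete, here is a sketch of the stringref form of the same module, i.e. roughly what a toolchain would emit today for Binaryen to lower. The function and local names simply mirror the example above and are illustrative only.

```wat
(module
  (func $use
    ;; In the stringref form, the local is declared as stringref up front.
    (local $temp stringref)
    ;; The string is a constant in the IR rather than an imported global.
    (local.set $temp (string.const "foo"))
  )
)
```

Going in the lowering direction, every stringref in a module like this can be rewritten to externref without ambiguity. Going in the lifting direction, the pass starts from the import-based module and has to work out which externref locations originally held strings.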

Inferring the types to change (in locals, globals, params, results, struct and array fields, tags, etc. etc.) would be... challenging, and likely brittle.

If we used type imports this could work - toolchains would not use raw externref but something more specific. But that proposal is far off (phase 1), so toolchains can't depend on it.

We could use custom annotations instead. I looked a little into how that might work, but it seems like in contexts.h, where we get the annotations, we'd need to do something with them. That seems like a widespread and annoying change at the parsing level. Perhaps instead we could stash the annotations on the IR or on the side (like we do with debug info), and then a lifting pass could use that?

To be honest, that doesn't seem very appealing either, both in terms of the work needed on the Binaryen side and the burden on toolchains: they'd need to add these annotations everywhere, and forgetting an annotation, say on some struct field, would lead to very odd errors.

As all of this is meant to save toolchains time, I looked at a huge 33MB wasm file from Java (the largest file I have that uses strings). Running --string-lowering-magic-imports (and reading and writing the binary) takes 14 seconds on my modest laptop - on a beefy machine it would be significantly faster. So we are talking single digits of seconds here, most likely. In that case, the benefit to toolchains seems pretty modest?

@tlively What are your thoughts here?
