StringLifting? #7370

It would be nice to allow toolchains to emit magic JS string imports all the time, which would make their output immediately runnable in VMs, instead of emitting stringref and letting Binaryen lower it. The benefit of stringref is that Binaryen can optimize strings (it has them in the IR), but in a debug build you don't need that and just want to run the build. Right now, toolchains have to do some work to emit either JS string imports or stringref, depending on build type (debug or optimized), but we could save them the effort if Binaryen could read JS string imports and turn them into optimizable stringref.

We already have a String**Lowering** pass that turns stringref into JS string imports, which works well, so we could have a String**Lifting** pass that does the inverse. However, the inverse problem is a lot harder. Consider, for example:

```wat
(module
  (import "\'" "foo" (global $string.foo externref))
  (func $use
    (local $temp externref)
    (local.set $temp (global.get $string.foo))
  )
)
```

We can turn that imported JS string into a `string.const`, but the type of the value would change from `externref` to `stringref`, and it would no longer fit in the local as declared.
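
For comparison, here is a hand-written sketch of the stringref form of the same module, roughly what a lifting pass would ideally produce. It is an illustration only, not the output of any existing pass, and it assumes Binaryen's text syntax for the stringref proposal:

```wat
(module
  (func $use
    ;; The local's declared type had to change along with the value.
    (local $temp stringref)
    ;; The imported global is replaced by an inline string constant.
    (local.set $temp (string.const "foo"))
  )
)
```

Note that the edit is not local to the `global.get`: the declaration of `$temp` changed too, and the same would apply to anything else the value flows into.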

The existing lowering pass handles this by just turning every `stringref` into `externref`, which is fine as the goal is to lower away all native wasm strings. But we can't do that in a lifting pass, as there might be legitimate and unrelated `externref` uses to keep.
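
A small sketch of the ambiguity, with a hypothetical `"env" "get_object"` import standing in for any non-string use of `externref` (the import and the local names are made up for illustration):

```wat
(module
  ;; A magic string import: this externref is really a string.
  (import "\'" "foo" (global $string.foo externref))
  ;; Hypothetical import returning an arbitrary JS object, not a string.
  (import "env" "get_object" (func $get_object (result externref)))
  (func $use
    (local $str externref) ;; would ideally be lifted to stringref
    (local $obj externref) ;; must stay externref
    (local.set $str (global.get $string.foo))
    (local.set $obj (call $get_object))
  )
)
```

Both locals have the same declared type, so a lifting pass cannot blanket-replace `externref`; it would have to work out from the data flow which declarations hold strings.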

Inferring which types to change (in locals, globals, params, results, struct and array fields, tags, etc.) would be... challenging, and likely brittle.

If we used type imports this could work - toolchains would not use raw externref but something more specific. But that proposal is far off (phase 1), so toolchains can't depend on it.

We could use custom annotations instead. I looked a little into how that might work, but it seems like in `contexts.h`, where we get the annotations, we'd need to do something with them. That seems like a widespread and annoying change at the parsing level. Perhaps instead we could stash the annotations on the IR or on the side (like we do with debug info), and then a lifting pass could use that?

To be honest, that doesn't seem very appealing either, both in terms of the work needed on the Binaryen side and for toolchains: they'd need to add these annotations everywhere, and forgetting an annotation, say on some struct field, would lead to very odd errors.

As all of this is meant to save toolchains time, I looked at a huge 33MB wasm file from Java (the largest file I have that uses strings). Running `--string-lowering-magic-imports` (and reading and writing the binary) takes 14 seconds on my modest laptop - on a beefy machine it would be significantly faster. So we are talking single digits of seconds here, most likely. In that case, the benefit to toolchains seems pretty modest?

@tlively What are your thoughts here?
