Skip to content

Conversation

@zherczeg
Copy link
Contributor

@zherczeg zherczeg commented Apr 8, 2025

This patch moves (and reworks) param_type_names / result_type_names out of IR to the parser, and fix the following issues:

Parse (ref index) form, not just (ref $name) form
Resolve references in types and locals
Display location when a named reference is not found

@zherczeg zherczeg force-pushed the parser_structures branch 3 times, most recently from cbf8b0f to 9df02a7 Compare April 8, 2025 16:13
@zherczeg zherczeg marked this pull request as ready for review April 8, 2025 16:28
@zherczeg
Copy link
Contributor Author

zherczeg commented Apr 8, 2025

This patch can be a superior patch for the issue, thank you @sbc100 . It removes many things added in #2571 so it might better to merge the two patches, and drop the other PR.

The patch keeps all parser related data structures in the parser, and use pointers to the original data. It works as long as the data structures are allocated in memory, and not moved (copied) during parsing.

FuncDeclaration func_decl;
CHECK_RESULT(ParseTypeUseOpt(&func_decl));
CHECK_RESULT(ParseUnboundFuncSignature(&func_decl.sig));
CHECK_RESULT(ParseUnboundFuncSignature(&decl->sig));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using locals for storing type vector is not possible anymore. I hope this is the only part where it caused an issue. I cannot check it with asserts.

This is the weakness of the patch.


Result result = Result::Ok;

for (auto it : context_.resolve_types) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core part. We resolve all names after the module is parsed in a single loop.

@zherczeg zherczeg force-pushed the parser_structures branch 3 times, most recently from 71679ff to 8922fe8 Compare April 9, 2025 07:25
@zherczeg
Copy link
Contributor Author

zherczeg commented Apr 9, 2025

I have simplified the code, this method does not even need a separate class, everything is inside WabtParser as private constructs. I only needed to move one static function to static private.

I think the two patch could be merged, and #2571 can be closed.

@zherczeg zherczeg force-pushed the parser_structures branch 2 times, most recently from 44781cc to 31c1144 Compare April 9, 2025 09:37
@zherczeg
Copy link
Contributor Author

zherczeg commented Apr 9, 2025

I have found even more simplifications.

@zherczeg
Copy link
Contributor Author

What do you think about this patch?

@zherczeg zherczeg force-pushed the parser_structures branch 2 times, most recently from fe3ff14 to 8b88d5e Compare April 14, 2025 11:57
Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about the title now. Does this patch just refactor the internal state or does it also change / fix some behaviour too?


// Parser may parse multiple modules. Reset internal status.
resolve_types_.clear();
resolve_funcs_.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be whole bunch of other state stored in WastParser (e.g. errors_, last_module_index_, etc) .. are you sure re-use is supported? Or is the other state reset somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. A spec test may contain several modules,

This is an example call stack:

#0  wabt::WastParser::ParseModuleFieldList
#1  wabt::WastParser::ParseScriptModule
#2  wabt::WastParser::ParseModuleCommand
#3  wabt::WastParser::ParseCommand
#4  wabt::WastParser::ParseCommandList
#5  wabt::WastParser::ParseScript
#6  wabt::ParseWastScript

The ParseCommandList has a loop. But nothing looks like a reset. I suspect the new module construction is the "reset". The point of this patch is moving out parser related stuff from the module, so I suspect this is the reason of the missing reset. Any suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the rest of the state is not per-module state but per-wast-file state.

In that case I agree this is probably needed. Maybe update the commend to say something like "Reset module-specific state"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch updated. Thank you!

@zherczeg zherczeg changed the title Move parser private data into WastParserContext Move parser related data from IR to the parser and fix some issues Apr 14, 2025
@zherczeg zherczeg force-pushed the parser_structures branch from 8b88d5e to ca17a81 Compare April 14, 2025 18:14
@zherczeg
Copy link
Contributor Author

I have updated the patch description.

@zherczeg zherczeg force-pushed the parser_structures branch from ca17a81 to 502529f Compare April 14, 2025 18:44
@sbc100
Copy link
Member

sbc100 commented Apr 14, 2025

Can you update the PR description? It seems out of date (unless I'm misunderstanding).

@zherczeg
Copy link
Contributor Author

I have updated the PR description (the first comment). Is it ok?

@zherczeg
Copy link
Contributor Author

I have found another issue, which should be solved in some way, although not in this patch.

Currently this is happily accepted (func $f (param (ref 20))) even if there is no type 20. The parser only checks the invalid names. The parser could check the index as well, but invalid references can be read by the binary reader, so it seems to me that this is a shared-validator business. However, the shared validator has no location info, which is ok for binary data, but not really for text.

The easiest solution is doing checks both in the parser and the shared validator, but double checking is not that nice. Do you have any suggestions what to do with this issue? I can do the coding.

@sbc100 sbc100 changed the title Move parser related data from IR to the parser and fix some issues Move parser related data from IR to the parser and improve reference type parsing Apr 15, 2025
Result result = Result::Ok;

for (auto it : resolve_types_) {
result |= ResolveRefTypes(module, *it.target_types, it.vars, errors_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not CHECK_RESULT here? I suppose so that we report more than just the first error? Is that necessary though? Do we do it elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually get multiple errors. But if you prefer a single error, I can change the code.

}
}

Result result = Result::Ok;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to the effect of "Module parsing is now completely. Type names can now be resolved"

Result WastParser::ParseBlockDeclaration(BlockDeclaration* decl) {
WABT_TRACE(ParseBlockDeclaration);
FuncDeclaration func_decl;
CHECK_RESULT(ParseTypeUseOpt(&func_decl));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simply remove func_decl and just use decl for the ParseTypeUseOpt call too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the two structures are the same.


Result WastParser::ResolveRefTypes(const Module* module,
TypeVector& types,
ReferenceVars& ref_vars,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the style we use it either a const ref (for const arguments) or pointer to mutable argument. So these two args should either be const ref of pointer I think.

@zherczeg zherczeg force-pushed the parser_structures branch from 502529f to c2ebd48 Compare April 15, 2025 16:37
Parse (ref index) form, not just (ref $name) form
Resolve references in types and locals
Display location when a named reference is not found
@zherczeg zherczeg force-pushed the parser_structures branch from c2ebd48 to d590c3f Compare April 15, 2025 17:17
@zherczeg
Copy link
Contributor Author

Patch is rebased to master

@sbc100 sbc100 merged commit 32f04e7 into WebAssembly:main Apr 15, 2025
18 checks passed
@zherczeg zherczeg deleted the parser_structures branch April 16, 2025 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants