
Support non-uniform layer structures in MFU estimation #19919

@aramasethu

The current MFU estimator assumes that all layers have the same structure. Models such as Qwen3.5-397B-A17B have non-uniform layer structures, which the estimator does not account for, so its FLOPs and memory bandwidth estimates would be inaccurate for these architectures. Other models with varying layer widths or mixed configurations may also not fit the current assumptions. Refer to the following comment:

Should we consider cases where the model's layer structures differ? For example, the latest Qwen3.5-397B-A17B: https://huggingface.co/Qwen/Qwen3.5-397B-A17B. Of course, I think this issue can be addressed later.

Originally posted by @sufeng-buaa in #19395 (comment)
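
To make the gap concrete, here is a minimal sketch (not the estimator's actual code) that contrasts a uniform-layer FLOPs estimate with a per-layer sum. `LayerSpec`, its fields, and all function names are hypothetical, and the numbers in the usage lines are made up for illustration. It assumes the common approximation of 2 FLOPs per active weight per token and ignores sequence-length-dependent attention-score FLOPs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LayerSpec:
    """Hypothetical per-layer description (not a real config type)."""
    mixer_params: int       # weights in the token mixer (attention or other) used per token
    ffn_active_params: int  # FFN weights used per token (for MoE: only the active experts)


def layer_flops_per_token(layer: LayerSpec) -> int:
    # Approximation: each active weight costs one multiply and one add per token.
    return 2 * (layer.mixer_params + layer.ffn_active_params)


def uniform_flops_per_token(layers: Sequence[LayerSpec]) -> int:
    # Uniform assumption: every layer looks like the first one.
    return len(layers) * layer_flops_per_token(layers[0])


def per_layer_flops_per_token(layers: Sequence[LayerSpec]) -> int:
    # Non-uniform handling: sum the cost of each layer individually.
    return sum(layer_flops_per_token(layer) for layer in layers)


def mfu(tokens_per_second: float, flops_per_token: float, peak_flops_per_second: float) -> float:
    # MFU is achieved FLOPs/s divided by the hardware's peak FLOPs/s.
    return tokens_per_second * flops_per_token / peak_flops_per_second


# Illustrative, made-up layer stack with one layer that differs from the rest.
layers = [LayerSpec(100, 200), LayerSpec(100, 200), LayerSpec(50, 400)]
uniform = uniform_flops_per_token(layers)
exact = per_layer_flops_per_token(layers)
```

With a stack like this, the two estimates diverge as soon as any layer differs from the first, and the error carries straight into the MFU ratio. A per-layer sum reduces to the uniform result when all layers match, so it would not change estimates for existing uniform models.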
