[Bug] Difficulty tuning mem-fraction-static for MTP models #8472

@lifuhuang

Motivation

During GLM4.5 onboarding we found that mem-fraction-static is very difficult to tune, which suggests potential bugs in the memory estimation logic for speculative decoding / MTP models. Creating a ticket for tracking. (cc @zhyncs @hebiao064)

Behaviors:

  • None of the GLM4.5 models can run with MTP enabled unless mem-fraction-static is specified explicitly.
  • The full-weight (300B) model starts successfully with 0.8, but the lighter Air version (100B) runs into OOM unless we lower mem-fraction-static to 0.5 or below.
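
For context, here is a rough mental model of what the flag controls, written as a small Python sketch. It assumes that mem-fraction-static reserves a fraction of each GPU's memory for static allocations (model weights plus the KV cache pool) and leaves the rest as headroom for activations, CUDA graphs, and other runtime buffers. The function name, the `MemorySplit` type, and all numbers are hypothetical and for illustration only; this is not the actual estimation code.

```python
from dataclasses import dataclass


@dataclass
class MemorySplit:
    static_gib: float    # budget reserved for weights + KV cache pool
    kv_cache_gib: float  # part of the static budget left after weights
    headroom_gib: float  # memory outside the static budget


def split_gpu_memory(total_gib: float,
                     mem_fraction_static: float,
                     weights_gib: float) -> MemorySplit:
    """Split one GPU's memory according to a static fraction (simplified model)."""
    if not 0.0 < mem_fraction_static < 1.0:
        raise ValueError("mem_fraction_static must be in (0, 1)")
    static_gib = total_gib * mem_fraction_static
    kv_cache_gib = static_gib - weights_gib
    if kv_cache_gib <= 0:
        raise ValueError("weights do not fit in the static budget")
    return MemorySplit(static_gib, kv_cache_gib, total_gib - static_gib)


if __name__ == "__main__":
    # Hypothetical 80 GiB GPU holding a 30 GiB weight shard.
    for fraction in (0.8, 0.5):
        print(fraction, split_gpu_memory(80.0, fraction, 30.0))
```

Under this model, lowering the fraction from 0.8 to 0.5 grows the headroom and shrinks the KV cache pool. If the smaller model only starts at 0.5, one plausible reading is that the default estimate does not leave enough headroom for the extra memory MTP needs, which is the suspected estimation bug.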

Related resources

No response
