Checklist
Motivation
During GLM4.5 onboarding, we found that mem-fraction-static is very difficult to tune, which suggests potential bugs in the memory estimation logic for speculative decoding / MTP models. Creating this ticket for tracking. (cc @zhyncs @hebiao064)
Behaviors:
- None of the GLM4.5 models can run with MTP enabled unless mem-fraction-static is specified explicitly.
- The full-weight (300B) model starts successfully with mem-fraction-static set to 0.8, but the lighter Air version (100B) runs into OOM unless mem-fraction-static is lowered to 0.5 or below.
Related resources
No response