The new HybridModel class is a “heterogeneous layer” model built to replace the existing GPTModel and MambaModel classes. The HybridModel class will be used as the foundation for future Nemotron models and next-generation open-source models. It will be highly customizable to support varying layer types and sizes.
Please feel free to read our design doc and ask any follow-up questions in this issue!
Authors: @Phlip79, @janEbert