Tune best performance for FSDP on GB300 for LLAMA31 405B #2118

@sanandaraj5597

We need to tune LLAMA3.1 405B FP8-CS performance with FSDP on GB300. Optimizations to enable:

  1. Gradient Accumulation Fusion - @dingqingy-nv
  2. Low SM NVLink communication - @dingqingy-nv (please get help from Youngeun on this)
  3. Try TP1 by using more memory on GB300 GPU - @malay-nagda
  4. If TP1 does not deliver the expected performance, try Hybrid-FSDP - @malay-nagda
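
For anyone sizing item 3, here is a rough back-of-the-envelope sketch of why TP1 needs more memory per GPU than TP2 under FSDP. Everything in it is an assumption for illustration and not taken from this issue: the helper name `fsdp_memory_gib`, 16 bytes of persistent state per parameter (weights, gradients, optimizer), 2 bytes per gathered parameter, 126 transformer layers, and a 512-GPU job. It ignores activations, embeddings, and communication buffers, so treat the numbers as a lower bound only.

```python
def fsdp_memory_gib(num_params, num_layers, total_gpus, tp=1,
                    state_bytes_per_param=16, gathered_bytes_per_param=2):
    """Rough per-GPU model-state memory in GiB under FSDP (activations ignored).

    All defaults are illustrative assumptions, not measured values.
    """
    gib = 1024 ** 3
    # Persistent state is sharded across every GPU in the job,
    # so it does not depend on the TP size for a fixed GPU count.
    sharded = num_params * state_bytes_per_param / total_gpus
    # During compute, one layer is all-gathered; TP splits it across tp ranks,
    # so TP1 holds the full layer on each GPU.
    gathered = (num_params / num_layers) * gathered_bytes_per_param / tp
    return {"sharded_gib": sharded / gib, "gathered_layer_gib": gathered / gib}


# Hypothetical job size; replace with the real configuration.
tp1 = fsdp_memory_gib(405e9, num_layers=126, total_gpus=512, tp=1)
tp2 = fsdp_memory_gib(405e9, num_layers=126, total_gpus=512, tp=2)
print("TP1:", tp1)
print("TP2:", tp2)
```

Under these assumptions the sharded state is the same for TP1 and TP2, while the gathered layer footprint doubles with TP1. The real deciding factor is likely activation memory, which this sketch leaves out.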

cc: @erhoo82

Metadata

Labels

area:perf: Performance optimizations and benchmarking
feature: New capabilities, enhancements, or enablement work
performance
performance/optimize: Performance optimization tracking
tracking: Tracking issue for an ongoing project with smaller steps
