Logging to MLFlow #1387
Open
Labels: area:training, feature, waiting-on-maintainers
Is your feature request related to a problem? Please describe.
I need to use MLFlow for experiment tracking and logging in Megatron Bridge training runs. Currently, Megatron Bridge only supports TensorBoard and wandb for logging metrics, hyperparameters, and artifacts.
Describe the solution you'd like
Add native MLFlow logging support to Megatron Bridge's `LoggerConfig` and training, following the same pattern as the existing wandb integration. This would include:
- `LoggerConfig`
- Gets an MLFlow experiment and starts a run with the specified name and tags
- Handles distributed training (e.g. logging only from the last rank)
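To make the request concrete, here is a rough sketch of what the items above might look like. It is not Megatron Bridge code: `MLFlowSettings`, `maybe_start_mlflow_run`, and `log_metrics_if_active` are hypothetical names, and the field names are only a guess at what `LoggerConfig` could carry. The only real API assumed is the public `mlflow` module (`set_tracking_uri`, `set_experiment`, `start_run`, `log_metrics`). Rank and world size are passed in explicitly; in a real run they would come from the distributed runtime.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MLFlowSettings:
    """Hypothetical config fields; not the actual LoggerConfig."""

    experiment_name: Optional[str] = None  # None means MLFlow logging is off
    run_name: Optional[str] = None
    tracking_uri: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)


def is_last_rank(rank: int, world_size: int) -> bool:
    """True only for the highest-numbered rank in the job."""
    return rank == world_size - 1


def maybe_start_mlflow_run(
    settings: MLFlowSettings,
    rank: int,
    world_size: int,
    mlflow_module: Any = None,
) -> Any:
    """Start an MLFlow run on the last rank only; return None elsewhere.

    `mlflow_module` exists so a stand-in can be injected for testing.
    When omitted, the real `mlflow` package is imported lazily.
    """
    if settings.experiment_name is None:
        return None
    if not is_last_rank(rank, world_size):
        return None
    if mlflow_module is None:
        import mlflow as mlflow_module  # requires `pip install mlflow`
    if settings.tracking_uri:
        mlflow_module.set_tracking_uri(settings.tracking_uri)
    # set_experiment creates the experiment if it does not exist yet
    mlflow_module.set_experiment(settings.experiment_name)
    return mlflow_module.start_run(
        run_name=settings.run_name,
        tags=settings.tags or None,
    )


def log_metrics_if_active(
    run: Any,
    metrics: Dict[str, float],
    step: int,
    mlflow_module: Any = None,
) -> bool:
    """Log metrics only on the rank that owns a run; report whether it logged."""
    if run is None:
        return False
    if mlflow_module is None:
        import mlflow as mlflow_module
    mlflow_module.log_metrics(metrics, step=step)
    return True
```

With something like this, every rank could call `maybe_start_mlflow_run(...)` unconditionally and only the last rank would talk to the tracking server, which is roughly how I understand the wandb path to behave.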
Describe alternatives you've considered
Additional context