Logging to MLFlow #1387

@sbhavani

Is your feature request related to a problem? Please describe.
I need to use MLflow for experiment tracking and logging in Megatron Bridge training runs. Currently, Megatron Bridge supports only TensorBoard and wandb for logging metrics, hyperparameters, and artifacts.

Describe the solution you'd like
Add native MLflow logging support to Megatron Bridge's LoggerConfig and training loop, following the same pattern as the existing wandb integration. This would include:

  1. Configuration options in LoggerConfig
  2. Logger initialization that:
    - Gets an MLflow experiment and starts a run with the specified name and tags
    - Handles distributed training (e.g., logging only from the last rank)
  3. Metric logging throughout the training loop for all metrics currently logged to wandb
  4. Artifact logging for checkpoints
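
To make the four items above concrete, here is a minimal sketch of what the integration could look like. The `MLflowLoggerConfig` fields, the `MLflowLogger` class, and the `mlflow_module` injection parameter are all hypothetical names chosen for illustration; they are not the actual Megatron Bridge `LoggerConfig` schema or the wandb integration's structure. The sketch only uses the public `mlflow` calls `set_tracking_uri`, `set_experiment`, `start_run`, `log_params`, `log_metrics`, `log_artifacts`, and `end_run`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MLflowLoggerConfig:
    """Hypothetical config fields; NOT the real LoggerConfig schema."""

    mlflow_experiment: Optional[str] = None  # None disables MLflow logging
    mlflow_run_name: Optional[str] = None
    mlflow_tracking_uri: Optional[str] = None
    mlflow_tags: Dict[str, str] = field(default_factory=dict)


def is_logging_rank(rank: int, world_size: int) -> bool:
    """Log only from the last rank, as suggested in the request."""
    return rank == world_size - 1


class MLflowLogger:
    """Hypothetical wrapper that is a no-op on every rank except the logging rank."""

    def __init__(self, cfg: MLflowLoggerConfig, rank: int, world_size: int,
                 mlflow_module: Any = None) -> None:
        self.enabled = (cfg.mlflow_experiment is not None
                        and is_logging_rank(rank, world_size))
        self._mlflow = None
        if not self.enabled:
            return
        if mlflow_module is None:
            # Imported lazily so that non-logging ranks never need the package.
            import mlflow as mlflow_module
        self._mlflow = mlflow_module
        if cfg.mlflow_tracking_uri:
            self._mlflow.set_tracking_uri(cfg.mlflow_tracking_uri)
        # set_experiment activates the experiment, creating it if it is missing.
        self._mlflow.set_experiment(cfg.mlflow_experiment)
        self._mlflow.start_run(run_name=cfg.mlflow_run_name,
                               tags=cfg.mlflow_tags or None)

    def log_params(self, params: Dict[str, Any]) -> None:
        if self.enabled:
            self._mlflow.log_params(params)

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        if self.enabled:
            clean = {name: float(value) for name, value in metrics.items()}
            self._mlflow.log_metrics(clean, step=step)

    def log_checkpoint(self, checkpoint_dir: str,
                       artifact_path: str = "checkpoints") -> None:
        if self.enabled:
            self._mlflow.log_artifacts(checkpoint_dir, artifact_path=artifact_path)

    def close(self) -> None:
        if self.enabled:
            self._mlflow.end_run()
```

In a training loop, every rank would construct the logger and call `log_metrics` unconditionally; the rank check inside the wrapper keeps call sites free of `if rank == ...` branches. Uploading full checkpoints as artifacts can be slow for large models, so a real implementation would probably make that step opt-in.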

Describe alternatives you've considered

  • Implementing a general logger plugin: A more extensible solution would be a plugin interface for loggers (e.g., callbacks), which would let users add any logger without modifying core code. This is a larger change that could be considered separately.
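
For comparison, the plugin alternative could be sketched roughly as follows. `MetricLogger`, `InMemoryLogger`, and `LoggerCollection` are hypothetical names used only for illustration; nothing here reflects an existing Megatron Bridge interface.

```python
from typing import Dict, Iterable, List, Optional, Protocol, Tuple


class MetricLogger(Protocol):
    """Hypothetical interface that any logger backend would implement."""

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None: ...

    def close(self) -> None: ...


class InMemoryLogger:
    """Toy backend that stores records in a list; stands in for MLflow, wandb, etc."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, float]]] = []

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        self.records.append((step, dict(metrics)))

    def close(self) -> None:
        pass


class LoggerCollection:
    """Fans each call out to every registered backend."""

    def __init__(self, loggers: Optional[Iterable[MetricLogger]] = None) -> None:
        self._loggers: List[MetricLogger] = list(loggers or [])

    def register(self, logger: MetricLogger) -> None:
        self._loggers.append(logger)

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        for logger in self._loggers:
            logger.log_metrics(metrics, step)

    def close(self) -> None:
        for logger in self._loggers:
            logger.close()
```

With this shape, the training loop would hold a single `LoggerCollection` and users would register their own backend at startup, which is why it avoids core-code changes but touches more of the existing logging path.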

Additional context

Labels: area:training, feature, waiting-on-maintainers
