# Configurations
This page is a comprehensive reference for all configuration parameters available in AReaL's command-line interface. Parameters are defined as dataclasses and can be specified in YAML configuration files or overridden via command-line arguments.
## Usage

Configuration files are specified using the `--config` parameter:

```bash
python -m areal.launcher --config path/to/config.yaml
```

You can override specific parameters from the command line:

```bash
python -m areal.launcher --config path/to/config.yaml actor.lr=1e-4 seed=42
```

For detailed examples, see the experiment configurations in the `examples/` directory.
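A configuration file combines the sections documented below. The following is an illustrative sketch only: the top-level keys `experiment_name`, `trial_name`, and `actor` are assumptions based on the CLI override example above, not a verified schema, so check the dataclass definitions and the `examples/` directory for exact field names:

```yaml
# Hypothetical sketch; exact keys come from AReaL's config dataclasses.
experiment_name: my-experiment   # no '_' or '/'
trial_name: trial0               # no '-' or '/'
seed: 42
actor:
  lr: 1.0e-4
```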
## Table of Contents

- Core Experiment Configurations
- Training Configurations
- Inference Configurations
- Dataset
- System and Cluster Configurations
- Logging and Monitoring
- Others
## BaseExperiment Configuration

Base configuration class for all experiment types with common settings.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | Name of the experiment (no `_` or `/`). |
| – | string | Required | Name of the trial (no `-` or `/`). |
| – | – | Required | Cluster specification. Mainly used by Slurm. |
| – | string | – | Pattern-based GPU parallel strategy allocation mode. |
| – | integer | – | Random seed for reproducibility. |
| – | boolean | – | Whether to enable training offload using torch_memory_saver. Requires setting up the environment for TMS (e.g., via `LD_PRELOAD`). |
| – | integer | – | Total number of epochs to train the model. |
| – | integer \| None | – | Terminate training after this number of steps. For benchmarking purposes only; None indicates normal training. |
| – | integer \| None | – | Terminate training after consuming this number of samples. For benchmarking purposes only; None indicates normal training. |
| – | string | – | Path to the tokenizer. |
| – | – | Required | – |
| – | – | – | – |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
| – | – | – | Performance tracer configuration. None means disabled. |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
## GRPO Configuration

A placeholder GRPO configuration kept for backward compatibility. It contains all of the BaseExperiment parameters above, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | – | Generation hyperparameters for evaluation. If None, use `gconfig`. |
| – | boolean | – | Enable dynamic batch sizing in `prepare_batch`. When True, batch collection stops when (accepted + rejected) >= batch_size, returning only accepted results. This results in variable-sized batches of valid data. |
## PPO Configuration

Configuration for Proximal Policy Optimization (PPO) reinforcement learning experiments. It contains all of the BaseExperiment parameters above, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | – | Generation hyperparameters for evaluation. If None, use `gconfig`. |
| – | boolean | – | Enable dynamic batch sizing in `prepare_batch`. When True, batch collection stops when (accepted + rejected) >= batch_size, returning only accepted results. This results in variable-sized batches of valid data. |
## RW Configuration

Configuration for Reward Model (RW) training experiments. It shares the common parameters documented under the BaseExperiment Configuration above.
## SFT Configuration

Configuration for Supervised Fine-Tuning (SFT) experiments. It shares the common parameters documented under the BaseExperiment Configuration above.
## FSDPEngine Configuration

Configuration for the Fully Sharded Data Parallel (FSDP) training backend.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | – | FSDP wrap policy, specifying which model layers to wrap. |
| – | boolean | – | Whether to offload FSDP parameters to CPU. |
## FSDPWrapPolicy

Policy configuration for FSDP model-layer wrapping. None defaults to wrapping the transformer decoder layers defined by `transformers`.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | list of string \| None | – | A list of transformer layer names for FSDP to wrap. |
## MicroBatch Specification

Specification for splitting micro-batches during training.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer \| None | – | Number of micro-batches (or the minimum number if `max_tokens_per_mb` is set). |
| – | integer | – | Granularity of each micro-batch. Adjacent sequences are grouped by this size when dividing micro-batches. |
| – | integer \| None | – | Maximum tokens per micro-batch for each forward pass. When set, `n_mbs` becomes the minimum number of micro-batches. |
| – | integer | – | Divisor for the number of micro-batches. The final number of micro-batches is adjusted to be divisible by this value. |
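To make the interaction of these fields concrete, here is a hedged sketch of how a micro-batch count could satisfy a token budget, a minimum count, and a divisibility constraint at once. This is not AReaL's actual implementation; the function name and signature are illustrative:

```python
import math

def num_microbatches(seq_lens, n_mbs=1, max_tokens_per_mb=None, n_mbs_divisor=1):
    """Illustrative only: choose a micro-batch count that respects a token
    budget, a minimum count, and a divisibility constraint."""
    total_tokens = sum(seq_lens)
    count = max(n_mbs, 1)
    if max_tokens_per_mb is not None:
        # Enough micro-batches that each stays under the token budget,
        # but never fewer than the configured minimum n_mbs.
        count = max(count, math.ceil(total_tokens / max_tokens_per_mb))
    # Round up to a multiple of the divisor.
    count = math.ceil(count / n_mbs_divisor) * n_mbs_divisor
    return count
```

For example, ten 100-token sequences with a 300-token budget need at least four micro-batches, which already satisfies a divisor of 2.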
## Norm Configuration

Configuration for reward/advantage normalization.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | Mean level for normalization. None disables mean normalization. |
| – | boolean | – | Whether to use the leave-one-out average. |
| – | string \| None | – | Standard deviation level for normalization. None disables std normalization. |
| – | boolean | – | Whether to use unbiased standard deviation computation. Defaults to True (changed from False in v0.3.4). |
| – | float | – | Epsilon added when dividing by the standard deviation to avoid numerical issues. |
| – | integer | – | Group size for group-level normalization. |
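The options above compose as follows: rewards are normalized per group, the mean can optionally exclude the current sample (leave-one-out), and an epsilon guards the division. The sketch below illustrates that composition under assumed semantics; it is not AReaL's exact code, and the function and argument names are hypothetical:

```python
import statistics

def normalize(rewards, group_size, mean_leave_one_out=False,
              std_unbiased=True, eps=1e-5):
    """Illustrative group-level reward normalization (not AReaL's exact code)."""
    out = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        n = len(group)
        mean = statistics.fmean(group)
        if n > 1:
            # Unbiased (sample) vs. biased (population) standard deviation.
            std = statistics.stdev(group) if std_unbiased else statistics.pstdev(group)
        else:
            std = 0.0
        for r in group:
            if mean_leave_one_out and n > 1:
                # Exclude the current sample from the group mean.
                m = (mean * n - r) / (n - 1)
            else:
                m = mean
            out.append((r - m) / (std + eps))
    return out
```

With leave-one-out enabled, a below-average reward is pushed further negative because its own value no longer pulls the baseline down.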
## Optimizer Configuration

Configuration for model optimization during training.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | Optimizer type. adam_bf16 is currently only supported by the FSDP engine. |
| – | float | – | Learning rate. |
| – | float | – | Weight decay. |
| – | float | – | Adam beta1 parameter. Only effective when optimizer_type is adam/adam_bf16. |
| – | float | – | Adam beta2 parameter. Only effective when optimizer_type is adam/adam_bf16. |
| – | float | – | Adam epsilon parameter. Only effective when optimizer_type is adam/adam_bf16. |
| – | float | – | Minimum learning rate ratio after annealing. |
| – | string | – | Learning rate scheduler type. |
| – | float | – | Proportion of training steps used for warmup. |
| – | boolean | – | Enable optimizer state offloading. |
| – | float | – | Initial loss scaling factor. |
| – | float | – | Minimum loss scaling factor. |
| – | float | – | Window size for loss scaling adjustment. |
| – | integer | – | Hysteresis (scaling factor) for loss scaling. |
| – | float | – | Gradient clipping threshold. |
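As a hedged illustration of how such an optimizer section might look in YAML (the key names below are assumptions, not the verified schema; consult the optimizer dataclass for the exact fields):

```yaml
# Hypothetical sketch; exact key names come from the optimizer dataclass.
actor:
  optimizer:
    type: adam
    lr: 1.0e-5
    weight_decay: 0.01
    warmup_steps_proportion: 0.02
    gradient_clipping: 1.0
```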
## PPOActor Configuration

Configuration for the PPO actor model, a subclass of TrainEngine. It contains all of the TrainEngine parameters documented below, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer | – | Number of minibatches for each PPO update. |
| – | float | – | Clipping factor for the policy ratio. |
| – | float \| None | – | Clipping factor (higher value) for the policy ratio. Default is None. When set (decoupled clipping), eps_clip is used as the lower value. |
| – | float \| None | – | Dual clipping factor for the policy ratio; must be > 1.0. None disables dual clipping. |
| – | float \| None | – | The second-momentum threshold for M2PO. |
| – | – | – | Normalization configuration for rewards. |
| – | float | – | Reward scaling factor. |
| – | float | – | Reward bias. |
| – | float | – | Maximum absolute value for reward clipping. |
| – | boolean | – | Penalty for overlong sequences. Used within DAPO. |
| – | integer \| None | – | Number of tokens in the tail that receive a penalty. |
| – | float \| None | – | Penalty factor for tokens in the tail. |
| – | boolean | – | Mask truncated generations (no EOS token) and exclude them from training. |
| – | float | – | Discount factor for future rewards. |
| – | float | – | Lambda parameter for GAE. |
| – | – | – | Normalization configuration for advantages. |
| – | float | – | KL divergence coefficient. |
| – | string | – | KL divergence estimator. |
| – | boolean | – | Use the SAPO loss (mutually exclusive with PPO clipping). |
| – | float | – | SAPO temperature for positive advantages. |
| – | float | – | SAPO temperature for negative advantages. |
| – | boolean | – | Recompute the log probability and replace the log probability returned by inference. |
| – | boolean | – | Use the decoupled loss. Implicitly enables recompute_logprob. |
| – | float \| None | – | Filter out tokens whose behav_imp_weight exceeds behav_imp_weight_cap when computing the loss; must be > 1.0. Requires use_decoupled_loss to be True. |
| – | string | – | Level at which to compute importance-sampling ratios. 'token': per-token ratios (standard PPO); 'sequence': sequence-level geometric mean of per-token ratios (GSPO). |
| – | string | – | Method for computing proximal-policy log-probabilities in decoupled PPO. Only effective when use_decoupled_loss=True. 'recompute' (default): standard decoupled PPO, recompute the proximal policy via a forward pass; 'loglinear': use log-linear interpolation to approximate the proximal policy (skips the forward pass); 'metrics': like 'recompute', but also computes approximation metrics for evaluation. |
| – | boolean | – | Log statistics for agent trajectories. |
| – | list of string | Required | Keys for logging agent trajectory statistics. |
| – | integer | – | Maximum number of new tokens to generate. |
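The interaction between the clipping factor and the optional higher clipping factor can be sketched for a single token as follows. This is an illustrative rendering of the standard clipped PPO objective with an asymmetric ("clip-higher") range, not AReaL's actual loss code; the function name is hypothetical:

```python
def ppo_token_loss(ratio, advantage, eps_clip=0.2, eps_clip_higher=None):
    """Illustrative PPO clipped objective for one token (not AReaL's code).

    When eps_clip_higher is set, the clip range becomes asymmetric:
    [1 - eps_clip, 1 + eps_clip_higher]; eps_clip serves as the lower bound.
    """
    hi = eps_clip_higher if eps_clip_higher is not None else eps_clip
    clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + hi)
    # Pessimistic (min) over unclipped and clipped surrogates; negated as a loss.
    return -min(ratio * advantage, clipped * advantage)
```

For a positive advantage, raising the upper bound lets large ratios contribute more gradient before clipping kicks in.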
## PPOCritic Configuration

Configuration for the PPO critic model, a subclass of TrainEngine. It contains all of the TrainEngine parameters documented below, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer | – | Number of minibatches for each PPO update. |
| – | float | – | Clipping factor for the value loss. |
| – | boolean | – | Mask truncated generations (no EOS token) and exclude them from training. |
## TrainEngine Configuration

Core configuration for model training, including optimization and backend settings.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | – | Path to the HuggingFace checkpoint. |
| – | string | – | Attention implementation for the HuggingFace transformers model. |
| – | boolean | – | Initialize model weights randomly. |
| – | boolean | – | Whether to use a critic/reward model. |
| – | float | – | Temperature during generation. |
| – | – | Required | – |
| – | boolean | – | Whether to pad each microbatch to the length upper bound specified by mb_spec. Can reduce memory fragmentation but slows down training. |
| – | boolean | – | Disable dropout layers during training. |
| – | boolean | – | Enable gradient checkpointing. |
| – | string | – | Parameter data type. |
| – | string | – | Gradient reduction data type. |
| – | – | – | Optimizer configuration. None means no training. |
| – | string | – | Weight update backend type. |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
| – | boolean | – | Whether to use LoRA. Only supported with FSDP; must be enabled together with vLLM/SGLang. |
| – | integer | – | LoRA rank. |
| – | integer | – | LoRA alpha. |
| – | list of string | Required | LoRA target modules. |
| – | string | – | PEFT method type. Only LoRA is supported for now. |
| – | – | Required | Train engine scheduling specs. Accepts one or two SchedulingSpec entries: with one spec, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the TrainController. |
| – | – | Required | The scheduling strategy of this TrainEngine, either separation or colocation. Currently only used by the TrainController. |
## GenerationHyperparameters

Controls text generation behavior for rollout.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer | – | Number of sequences to generate per prompt. |
| – | integer | – | Maximum number of tokens to generate. |
| – | integer | – | Minimum number of tokens to generate. |
| – | integer | – | Maximum number of tokens, including prompt and generated tokens. |
| – | boolean | – | Whether to use greedy decoding (maximum probability). |
| – | float | – | Nucleus sampling probability threshold (0.0, 1.0]. |
| – | integer | – | Number of highest-probability tokens to consider. |
| – | float | – | Sampling temperature. Higher values increase diversity. |
| – | list of integer | Required | Stop generation when encountering these token IDs. |
| – | boolean | – | Do not stop generation when EOS is encountered. |
| – | boolean | – | Skip special tokens when decoding/displaying outputs. |
| – | list of string \| None | – | One or more stop words. Generation stops if one of these words is sampled. |
| – | float | – | Penalizes tokens based on their frequency in the generation so far. Must be between -2 and 2; negative values encourage repetition. |
| – | string | – | LoRA name to be used for this generation. |
| – | boolean | – | Enable beam search in the vLLM engine. When enabled, sampling parameters like temperature, top-p, and top-k are automatically ignored. |
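A generation section (`gconfig` is referenced elsewhere on this page) might look like the following. The field names are assumptions chosen to mirror the descriptions above, not the verified schema:

```yaml
# Hypothetical sketch; field names are illustrative, check the dataclass.
gconfig:
  n_samples: 8            # sequences generated per prompt
  max_new_tokens: 1024
  temperature: 1.0
  top_p: 1.0
  greedy: false
```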
## InferenceEngine Configuration

Configuration for inference servers, including off-policyness control.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | Root directory for logs and trajectory dumps. |
| – | integer \| None | – | Maximum number of concurrent rollouts to the inference engine. Defaults to consumer_batch_size. |
| – | integer \| None | – | Input/output queue size for async rollout. |
| – | integer | – | Batch size for consuming rollouts from the queue. |
| – | integer | – | Maximum off-policyness for the head. If the current version is more than this many versions behind, the request will not be accepted. |
| – | boolean | – | Whether to output verbose tracing messages for each generation request. |
| – | boolean | – | Whether to check the format of trajectories produced by a customized workflow. Useful when debugging the workflow in isolation; should be False during RL training. |
| – | string | – | Request scheduling policy. |
| – | string | – | Path to the tokenizer for trajectory text decoding. |
| – | boolean | – | Whether to dump trajectories to files under fileroot. |
| – | float | – | Timeout in seconds for connecting to remote servers or launching local servers. |
| – | float | – | Timeout for HTTP requests. |
| – | integer | – | Number of retries for failed requests. |
| – | float | – | Grace period after calling /pause_generation. Waits until all requests have been dropped. |
| – | – | Required | Inference engine scheduling specs. Accepts one or two SchedulingSpec entries: with one spec, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the RolloutController. |
| – | – | Required | The scheduling strategy of this InferenceEngine, either separation or colocation. Currently only used by the RolloutController. |
| – | boolean | – | Whether to use LoRA. Should match the actor's LoRA option. |
| – | – | – | OpenAI proxy configuration (used when the workflow is an AgentWorkflow). |
## SGLang Configuration

Configuration for the SGLang runtime. Refer to sgl-project/sglang for detailed documentation.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | integer \| None | – | – |
| – | list of integer \| None | – | – |
| – | string | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | string \| None | – | – |
| – | boolean | – | – |
| – | string \| None | – | – |
| – | integer \| None | – | – |
| – | float \| None | – | – |
| – | integer \| None | – | – |
| – | integer \| None | – | – |
| – | integer | – | – |
| – | string | – | – |
| – | float | – | – |
| – | integer | – | – |
| – | string | – | – |
| – | string | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | boolean \| None | – | – |
| – | integer \| None | – | – |
| – | list of string \| None | – | – |
| – | list of string \| None | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | string | – | – |
| – | string | – | – |
| – | string \| None | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
## vLLM Configuration

Configuration for the vLLM runtime. Refer to https://docs.vllm.ai/en/stable/api/index.html for detailed documentation.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | string | – | – |
| – | string | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | float | – | – |
| – | boolean | – | – |
| – | integer \| None | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | float | – | – |
| – | string | – | – |
| – | boolean | – | – |
| – | string | – | – |
| – | boolean | – | – |
| – | string | – | – |
## TrainDataset Configuration

Configuration for training dataset loading and preprocessing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | Path to the dataset. Can be a local path or a HuggingFace dataset name. |
| – | string | Required | Type of training method, e.g., 'sft', 'rl', etc. |
| – | integer | – | Batch size for the dataloader. |
| – | boolean | – | Whether to shuffle the dataset. |
| – | boolean | – | Pin memory for faster data loading (set True for GPU training). |
| – | integer | – | Number of worker processes for data loading. |
| – | boolean | – | Drop the last incomplete batch. |
| – | integer \| None | – | Maximum token length of sequences in the dataset. Longer sequences are filtered out. |
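A dataset section could be sketched as follows. Both the section name and field names are assumptions made to mirror the descriptions above; verify them against the dataset dataclass and the `examples/` configs:

```yaml
# Hypothetical sketch; key names are illustrative.
train_dataset:
  path: openai/gsm8k     # local path or HuggingFace dataset name
  type: rl               # training method, e.g. 'sft' or 'rl'
  batch_size: 256
  shuffle: true
  drop_last: true
  max_length: 2048       # longer sequences are filtered out
```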
## ValidDataset Configuration

Configuration for validation dataset loading and preprocessing. It has different default values from TrainDatasetConfig: shuffle and drop_last default to False.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | Path to the dataset. Can be a local path or a HuggingFace dataset name. |
| – | string | Required | Type of training method, e.g., 'sft', 'rl', etc. |
| – | integer | – | Batch size for the dataloader. |
| – | boolean | – | Whether to shuffle the dataset. |
| – | boolean | – | Pin memory for faster data loading (set True for GPU training). |
| – | integer | – | Number of worker processes for data loading. |
| – | boolean | – | Drop the last incomplete batch. |
| – | integer \| None | – | Maximum token length of sequences in the dataset. Longer sequences are filtered out. |
## Cluster Specification Configuration

Configuration for cluster specification and distributed computing setup.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | Required | Name-resolving configuration. |
| – | string | – | Name of the cluster. Used to set cluster-specific environment variables. |
| – | string | – | Root for logs and checkpoints. Should be available on all nodes. |
| – | integer | – | The size of the cluster. Used to decide the Slurm hostname suffix. |
| – | integer | – | Number of GPUs per node (physical). |
## NameResolve Configuration

Configuration for distributed name resolution and service discovery.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | Type of the distributed KV store for name resolving. |
| – | string | – | Record root for NFS name resolving. Should be available on all nodes. |
| – | string | – | Address of the ETCD3 server. |
| – | string | – | Name of the distributed Ray KV store. |
## Evaluator Configuration

Configuration for model evaluation scheduling and timing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | integer \| None | – | Trigger frequency in epochs. None disables epoch-based triggering. |
| – | integer \| None | – | Trigger frequency in steps. None disables step-based triggering. |
| – | integer \| None | – | Trigger frequency in seconds. None disables time-based triggering. |
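The epoch-, step-, and time-based frequencies combine disjunctively: any one satisfied condition fires the trigger, and a None frequency disables that basis. A hedged sketch of that logic (not AReaL's actual scheduler; the function name and signature are hypothetical):

```python
def should_trigger(step, steps_per_epoch, last_time, now,
                   freq_epochs=None, freq_steps=None, freq_secs=None):
    """Illustrative epoch/step/time trigger logic (not AReaL's code).

    Returns True if any enabled frequency condition is met.
    """
    if freq_epochs is not None and step % (freq_epochs * steps_per_epoch) == 0:
        return True  # an epoch boundary matching the epoch frequency
    if freq_steps is not None and step % freq_steps == 0:
        return True  # a step matching the step frequency
    if freq_secs is not None and now - last_time >= freq_secs:
        return True  # enough wall-clock time has elapsed
    return False
```

The same scheduling shape applies to the Evaluator, Recover, and Saver sections, which share these three trigger fields.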
## Recover Configuration

Configuration for experiment recovery and fault tolerance.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | integer \| None | – | Trigger frequency in epochs. None disables epoch-based saving. |
| – | integer \| None | – | Trigger frequency in steps. None disables step-based saving. |
| – | integer \| None | – | Trigger frequency in seconds. None disables time-based saving. |
| – | string | – | Recovery mode for the launcher. 'disabled': never recover from previous runs; 'auto': automatically recover from previous runs if recover info and checkpoints are available; 'fault': only recover from previous runs if the new run fails; 'resume': force resumption, raising an error if no recover info is found (never resumes again after another failure). |
| – | integer | – | Number of recovery retries (auto/fault modes only). |
## Saver Configuration

Configuration for model checkpoint saving scheduling and timing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | integer \| None | – | Trigger frequency in epochs. None disables epoch-based saving. |
| – | integer \| None | – | Trigger frequency in steps. None disables step-based saving. |
| – | integer \| None | – | Trigger frequency in seconds. None disables time-based saving. |
## StatsLogger Configuration

Configuration for experiment statistics logging and tracking services.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | – | Required | Weights & Biases configuration. |
| – | – | Required | SwanLab configuration. |
| – | – | Required | TensorBoard configuration. Only the 'path' field is required. |
## Swanlab Configuration

Configuration for SwanLab experiment tracking and monitoring.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | – | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
## TensorBoard Configuration

Configuration for TensorBoard logging and visualization.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | – |
## WandB Configuration

Configuration for Weights & Biases experiment tracking.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | – |
| – | string | – | – |
| – | string | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | list of string \| None | – | – |
| – | – | – | – |
| – | string \| None | – | – |
## ArchonEngine Configuration

Configuration for the Archon Engine training backend.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | Attention backend type. |
| – | boolean | – | Whether to offload FSDP parameters to CPU. |
| – | boolean | – | Enable torch.compile for TransformerBlocks. |
| – | string | – | Activation checkpointing granularity. |
| – | integer | – | For selective recompute: checkpoint every N layers. Set to 0 for op-level selective checkpointing. |
## DistributedDataParallel Configuration

Configuration for Megatron's DistributedDataParallel. Refer to the Megatron-LM documentation for details.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer \| None | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
## FP8Engine Configuration

Configuration for FP8 (8-bit floating-point) training. This configuration encapsulates all FP8-related parameters and can be reused across different engines (e.g., Megatron, FSDP). When None in the parent config, FP8 training is disabled.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | FP8 precision mode. Options: 'e4m3' (uniform e4m3), 'hybrid' (e4m3 for activations/weights, e5m2 for output activation gradients). |
| – | string | – | FP8 scaling recipe. Options: 'tensorwise', 'delayed', 'mxfp8' (Blackwell only), 'blockwise'. |
| – | boolean | – | Keep parameters in FP8 precision to save memory. Not all parameters are converted to FP8; biases, for example, remain unchanged. |
| – | integer | – | Margin for FP8 scaling-factor computation. |
| – | integer | – | Length of the amax history window for scaling-factor computation. |
| – | string | – | Algorithm for choosing the amax value. Options: 'max' (largest in the history window), 'most_recent'. |
| – | boolean | – | When False, override the FP8 config and compute weight gradients in higher precision. |
| – | boolean | – | Use the FP8 implementation of dot-product attention. |
| – | boolean | – | Use the FP8 implementation of multi-head attention. |
| – | boolean | – | Reduce FP8 amax only in the TP or TP-CP domain. |
| – | boolean | – | Retain the first and last N TransformerBlocks in BF16 instead of FP8. |
| – | integer | – | Number of layers at the start to keep in BF16 when first_last_layers_bf16 is True. |
| – | integer | – | Number of layers at the end to keep in BF16 when first_last_layers_bf16 is True. |
| – | boolean | – | Whether to use direct FP8 conversion during weight updates and save/load. When True, FP8 parameters are converted directly between TE FP8 and PyTorch FP8 without intermediate dequantization/quantization. |
MegatronEngine Configuration#
Configuration for Megatron-LM training framework.
Refer to Megatron-LM documentation for implementation details.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
boolean |
|
- |
|
boolean |
|
- |
|
boolean |
|
- |
|
Required |
- |
|
|
integer |
|
Virtual pipeline parallel size for Megatron interleaved schedule. Set to >1 to enable VPP. Default is 1 (disabled). |
|
boolean |
|
- |
|
boolean |
|
- |
|
string |
|
- |
|
string |
|
- |
|
string |
|
- |
|
string |
|
- |
|
boolean |
|
- |
|
boolean |
|
- |
|
boolean |
|
- |
|
string | None |
|
- |
|
string | None |
|
- |
|
integer | None |
|
- |
|
boolean | None |
|
- |
|
list of string | None |
|
- |
|
string | None |
|
- |
|
boolean |
|
Enable overlapping between shared expert computations and dispatcher communications. Without this, the shared experts execute after the routed experts. |
|
boolean |
|
- |
|
string |
|
Type of token dispatcher. Options: ‘allgather’,’alltoall’ and ‘flex’. |
|
boolean |
|
Fuse token rearrangement ops during token dispatching. |
|
|
|
- |
|
boolean |
|
Enable tree training with flex attention module. |
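For illustration, the MoE- and pipeline-related options above might be set as in the sketch below. The key names follow common Megatron-LM conventions but are assumptions here; check the dataclass for the exact fields.

```yaml
# Hypothetical Megatron engine overrides -- key names are illustrative.
megatron:
  virtual_pipeline_parallel_size: 2    # >1 enables the interleaved (VPP) schedule
  moe_token_dispatcher_type: alltoall  # or 'allgather' / 'flex'
  moe_shared_expert_overlap: true      # overlap shared experts with dispatch
  moe_permute_fusion: true             # fuse token rearrangement ops
```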
OpenAIProxy Configuration#
Configuration for the OpenAI proxy used with AgentWorkflow workflows.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string |
|
OpenAI proxy mode: ‘inline’ (in-process) or ‘subproc’ (subprocess). |
|
string |
|
Parser for tool calls in model output. |
|
string |
|
Parser for reasoning content in model output. |
|
string |
|
Chat template type: ‘hf’ (standard) or ‘concat’ (multi-turn concatenation). |
|
integer | None |
|
Maximum total tokens for the engine (prompt + completion). |
|
float |
|
Discount factor for multi-turn reward propagation. |
|
string |
|
Export style: ‘individual’ (all interactions) or ‘concat’ (leaf nodes only). |
|
integer |
|
Maximum number of worker processes for subprocess mode execution pool. |
|
integer |
|
Session timeout in seconds. Sessions inactive longer than this will be garbage collected. |
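A proxy block might look like the following sketch. The key names are assumptions based on the descriptions above, not guaranteed field names.

```yaml
# Hypothetical OpenAI proxy config -- key names are illustrative.
openai_proxy:
  mode: subproc               # run the proxy in a subprocess
  chat_template_type: hf      # standard chat template
  turn_discount: 1.0          # no discount across turns
  export_style: concat        # export leaf nodes only
  session_timeout_secs: 3600  # GC sessions idle longer than an hour
```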
PerfTracer Configuration#
Configuration for perf tracer emission.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string |
Required |
- |
|
string |
Required |
- |
|
string |
Required |
- |
|
boolean |
|
Explicitly enable or disable perf tracing. Set to true to capture perf traces. |
|
integer |
|
Flush trace events to disk every N calls to save(step=…). A value of 1 writes on every step; values <= 0 fall back to 1. |
|
list of integer | None |
|
List of step numbers at which to capture detailed profiling traces. If None, no detailed profiling traces are captured. |
|
|
|
Session tracing configuration. |
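Putting the tracer options together, an enabled tracer could be sketched as below; the key names are assumptions inferred from the descriptions and may differ from the actual dataclass fields.

```yaml
# Hypothetical perf tracer config -- key names are illustrative.
perf_tracer:
  enabled: true
  flush_interval: 10       # flush every 10 calls to save(step=...)
  profile_steps: [1, 100]  # capture detailed traces at these steps
  session:
    enabled: true          # also write per-session records to sessions.jsonl
```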
Scheduler Configuration#
Configuration for worker scheduling. Used in the single-controller mode. Experimental.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string | None |
|
- |
|
string |
|
- |
|
string |
|
- |
|
string |
|
- |
|
|
Required |
- |
|
string |
|
- |
|
string |
|
- |
Scheduling Specification#
Configuration class: SchedulingSpec
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
integer |
|
Number of CPU cores required per GPU. |
|
integer |
|
Number of GPU units required. Used only when allocating pods. |
|
integer |
|
Amount of memory (GB) required per GPU. |
|
integer |
|
Number of ports to expose. |
|
string |
|
Docker/Singularity container image to use. Currently only used by Slurm. Will be potentially used by Kubernetes in the future. |
|
string |
|
Task type (e.g., worker, engine). |
|
|
Required |
Environment variables for the container. |
|
string | None |
|
Command to execute inside the container. Defaults to AReaL’s RPC server. |
|
string |
|
Additional arguments to pass to the srun command. Only used by Slurm. |
|
list of string | None |
|
Additional bash commands to set up the container before running the torchrun command. Only used by Slurm. |
|
string |
|
Type of container used by Slurm. |
|
string |
|
Mount path for Slurm. |
|
string | None |
|
sbatch/srun’s |
|
string | None |
|
sbatch/srun’s |
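A per-worker scheduling spec might be written as follows; the key names are assumptions based on the table above and may not match the dataclass exactly.

```yaml
# Hypothetical scheduling spec -- key names are illustrative.
cpu: 16                        # CPU cores required per GPU
gpu: 8                         # GPUs per pod
mem: 100                       # memory (GB) per GPU
port_count: 2                  # ports to expose
container_image: areal:latest  # used by Slurm
env_vars:
  NCCL_DEBUG: WARN             # environment passed into the container
```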
SchedulingStrategy#
Configuration class: SchedulingStrategy
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string |
|
- |
|
string | None |
|
The target role to colocate with. |
|
boolean |
|
When True with colocation, the target worker spawns a new process on the same node/GPUs instead of sharing its process. Provides process isolation while sharing GPU resources. |
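For example, colocating a worker role with another role while keeping process isolation could be sketched as below; the key names are assumptions, not verified field names.

```yaml
# Hypothetical colocation strategy -- key names are illustrative.
schedule_strategy:
  type: colocate       # share node/GPUs with another role
  target: actor        # the role to colocate with
  spawn_process: true  # spawn a separate process instead of sharing one
```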
SessionTracer Configuration#
Configuration for per-session lifecycle tracing.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
boolean |
|
Enable per-session lifecycle tracing alongside perf events. When true, session metadata is captured to sessions.jsonl. |
|
integer |
|
Flush session trace records once this many entries are ready. Values <= 0 fall back to 1. |