# Configurations
This page is a comprehensive reference for all configuration parameters available in AReaL's command-line interface. Parameters are defined as dataclasses and can be specified in YAML configuration files or overridden via command-line arguments.
## Usage

Configuration files are specified using the `--config` parameter:

```bash
python -m areal.launcher --config path/to/config.yaml
```

You can override specific parameters from the command line:

```bash
python -m areal.launcher --config path/to/config.yaml actor.lr=1e-4 seed=42
```

For detailed examples, see the experiment configurations in the `examples/` directory.
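A configuration file combines the sections documented below. The following is an illustrative sketch only: the top-level keys `experiment_name`, `trial_name`, and `actor` are assumptions based on the CLI override example above, not a verified schema, so check the dataclass definitions and the `examples/` directory for exact field names:

```yaml
# Hypothetical sketch; exact keys come from AReaL's config dataclasses.
experiment_name: my-experiment   # no '_' or '/'
trial_name: trial0               # no '-' or '/'
seed: 42
actor:
  lr: 1.0e-4
```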
## Table of Contents

- Core Experiment Configurations
- Training Configurations
- Inference Configurations
- Dataset
- System and Cluster Configurations
- Logging and Monitoring
- Others
## BaseExperiment Configuration

Base configuration class for all experiment types with common settings.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | Name of the experiment (no `_` or `/`). |
| – | string | Required | Name of the trial (no `-` or `/`). |
| – | – | Required | Cluster specification. Mainly used by Slurm. |
| – | string | – | Pattern-based GPU parallel strategy allocation mode. |
| – | integer | – | Random seed for reproducibility. |
| – | boolean | – | Whether to enable training offload using torch_memory_saver. Requires setting up the environment for TMS (e.g., via `LD_PRELOAD`). |
| – | integer | – | Total number of epochs to train the model. |
| – | integer \| None | – | Terminate training after this number of steps. For benchmarking purposes only; None indicates normal training. |
| – | integer \| None | – | Terminate training after consuming this number of samples. For benchmarking purposes only; None indicates normal training. |
| – | string | – | Path to the tokenizer. |
| – | – | Required | – |
| – | – | – | – |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
| – | – | – | Performance tracer configuration. None means disabled. |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
## GRPO Configuration

A placeholder GRPO configuration kept for backward compatibility. It contains all of the BaseExperiment parameters above, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | – | Generation hyperparameters for evaluation. If None, use `gconfig`. |
| – | boolean | – | Enable dynamic batch sizing in `prepare_batch`. When True, batch collection stops when (accepted + rejected) >= batch_size, returning only accepted results. This results in variable-sized batches of valid data. |
## PPO Configuration

Configuration for Proximal Policy Optimization (PPO) reinforcement learning experiments. It contains all of the BaseExperiment parameters above, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | – | Generation hyperparameters for evaluation. If None, use `gconfig`. |
| – | boolean | – | Enable dynamic batch sizing in `prepare_batch`. When True, batch collection stops when (accepted + rejected) >= batch_size, returning only accepted results. This results in variable-sized batches of valid data. |
## RW Configuration

Configuration for Reward Model (RW) training experiments. It shares the common parameters documented under the BaseExperiment Configuration above.
## SFT Configuration

Configuration for Supervised Fine-Tuning (SFT) experiments. It shares the common parameters documented under the BaseExperiment Configuration above.
## FSDPEngine Configuration

Configuration for the Fully Sharded Data Parallel (FSDP) training backend.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | – | FSDP wrap policy, specifying which model layers to wrap. |
| – | boolean | – | Whether to offload FSDP parameters to CPU. |
## FSDPWrapPolicy

Policy configuration for FSDP model-layer wrapping. None defaults to wrapping the transformer decoder layers defined by `transformers`.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | list of string \| None | – | A list of transformer layer names for FSDP to wrap. |
## MicroBatch Specification

Specification for splitting micro-batches during training.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer \| None | – | Number of micro-batches (or the minimum number if `max_tokens_per_mb` is set). |
| – | integer | – | Granularity of each micro-batch. Adjacent sequences are grouped by this size when dividing micro-batches. |
| – | integer \| None | – | Maximum tokens per micro-batch for each forward pass. When set, `n_mbs` becomes the minimum number of micro-batches. |
| – | integer | – | Divisor for the number of micro-batches. The final number of micro-batches is adjusted to be divisible by this value. |
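To make the interaction of these fields concrete, here is a hedged sketch of how a micro-batch count could satisfy a token budget, a minimum count, and a divisibility constraint at once. This is not AReaL's actual implementation; the function name and signature are illustrative:

```python
import math

def num_microbatches(seq_lens, n_mbs=1, max_tokens_per_mb=None, n_mbs_divisor=1):
    """Illustrative only: choose a micro-batch count that respects a token
    budget, a minimum count, and a divisibility constraint."""
    total_tokens = sum(seq_lens)
    count = max(n_mbs, 1)
    if max_tokens_per_mb is not None:
        # Enough micro-batches that each stays under the token budget,
        # but never fewer than the configured minimum n_mbs.
        count = max(count, math.ceil(total_tokens / max_tokens_per_mb))
    # Round up to a multiple of the divisor.
    count = math.ceil(count / n_mbs_divisor) * n_mbs_divisor
    return count
```

For example, ten 100-token sequences with a 300-token budget need at least four micro-batches, which already satisfies a divisor of 2.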
## Norm Configuration

Configuration for reward/advantage normalization.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | Mean level for normalization. None disables mean normalization. |
| – | boolean | – | Whether to use the leave-one-out average. |
| – | string \| None | – | Standard deviation level for normalization. None disables std normalization. |
| – | boolean | – | Whether to use unbiased standard deviation computation. Defaults to True (changed from False in v0.3.4). |
| – | float | – | Epsilon added when dividing by the standard deviation to avoid numerical issues. |
| – | integer | – | Group size for group-level normalization. |
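The options above compose as follows: rewards are normalized per group, the mean can optionally exclude the current sample (leave-one-out), and an epsilon guards the division. The sketch below illustrates that composition under assumed semantics; it is not AReaL's exact code, and the function and argument names are hypothetical:

```python
import statistics

def normalize(rewards, group_size, mean_leave_one_out=False,
              std_unbiased=True, eps=1e-5):
    """Illustrative group-level reward normalization (not AReaL's exact code)."""
    out = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        n = len(group)
        mean = statistics.fmean(group)
        if n > 1:
            # Unbiased (sample) vs. biased (population) standard deviation.
            std = statistics.stdev(group) if std_unbiased else statistics.pstdev(group)
        else:
            std = 0.0
        for r in group:
            if mean_leave_one_out and n > 1:
                # Exclude the current sample from the group mean.
                m = (mean * n - r) / (n - 1)
            else:
                m = mean
            out.append((r - m) / (std + eps))
    return out
```

With leave-one-out enabled, a below-average reward is pushed further negative because its own value no longer pulls the baseline down.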
## Optimizer Configuration

Configuration for model optimization during training.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | Optimizer type. adam_bf16 is currently only supported by the FSDP engine. |
| – | float | – | Learning rate. |
| – | float | – | Weight decay. |
| – | float | – | Adam beta1 parameter. Only effective when optimizer_type is adam/adam_bf16. |
| – | float | – | Adam beta2 parameter. Only effective when optimizer_type is adam/adam_bf16. |
| – | float | – | Adam epsilon parameter. Only effective when optimizer_type is adam/adam_bf16. |
| – | float | – | Minimum learning rate ratio after annealing. |
| – | string | – | Learning rate scheduler type. |
| – | float | – | Proportion of training steps used for warmup. |
| – | boolean | – | Enable optimizer state offloading. |
| – | float | – | Initial loss scaling factor. |
| – | float | – | Minimum loss scaling factor. |
| – | float | – | Window size for loss scaling adjustment. |
| – | integer | – | Hysteresis (scaling factor) for loss scaling. |
| – | float | – | Gradient clipping threshold. |
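As a hedged illustration of how such an optimizer section might look in YAML (the key names below are assumptions, not the verified schema; consult the optimizer dataclass for the exact fields):

```yaml
# Hypothetical sketch; exact key names come from the optimizer dataclass.
actor:
  optimizer:
    type: adam
    lr: 1.0e-5
    weight_decay: 0.01
    warmup_steps_proportion: 0.02
    gradient_clipping: 1.0
```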
## PPOActor Configuration

Configuration for the PPO actor model, a subclass of TrainEngine. It contains all of the TrainEngine parameters documented below, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer | – | Number of minibatches for each PPO update. |
| – | float | – | Clipping factor for the policy ratio. |
| – | float \| None | – | Clipping factor (higher value) for the policy ratio. Default is None. When set (decoupled clipping), eps_clip is used as the lower value. |
| – | float \| None | – | Dual clipping factor for the policy ratio; must be > 1.0. None disables dual clipping. |
| – | float \| None | – | The second-momentum threshold for M2PO. |
| – | – | – | Normalization configuration for rewards. |
| – | float | – | Reward scaling factor. |
| – | float | – | Reward bias. |
| – | float | – | Maximum absolute value for reward clipping. |
| – | boolean | – | Penalty for overlong sequences. Used within DAPO. |
| – | integer \| None | – | Number of tokens in the tail that receive a penalty. |
| – | float \| None | – | Penalty factor for tokens in the tail. |
| – | boolean | – | Mask truncated generations (no EOS token) and exclude them from training. |
| – | float | – | Discount factor for future rewards. |
| – | float | – | Lambda parameter for GAE. |
| – | – | – | Normalization configuration for advantages. |
| – | float | – | KL divergence coefficient. |
| – | string | – | KL divergence estimator. |
| – | boolean | – | Use the SAPO loss (mutually exclusive with PPO clipping). |
| – | float | – | SAPO temperature for positive advantages. |
| – | float | – | SAPO temperature for negative advantages. |
| – | boolean | – | Recompute the log probability and replace the log probability returned by inference. |
| – | boolean | – | Use the decoupled loss. Implicitly enables recompute_logprob. |
| – | float \| None | – | Filter out tokens whose behav_imp_weight exceeds behav_imp_weight_cap when computing the loss; must be > 1.0. Requires use_decoupled_loss to be True. |
| – | string | – | Level at which to compute importance-sampling ratios. 'token': per-token ratios (standard PPO); 'sequence': sequence-level geometric mean of per-token ratios (GSPO). |
| – | string | – | Method for computing proximal-policy log-probabilities in decoupled PPO. Only effective when use_decoupled_loss=True. 'recompute' (default): standard decoupled PPO, recompute the proximal policy via a forward pass; 'loglinear': use log-linear interpolation to approximate the proximal policy (skips the forward pass); 'metrics': like 'recompute', but also computes approximation metrics for evaluation. |
| – | boolean | – | Log statistics for agent trajectories. |
| – | list of string | Required | Keys for logging agent trajectory statistics. |
| – | integer | – | Maximum number of new tokens to generate. |
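The interaction between the clipping factor and the optional higher clipping factor can be sketched for a single token as follows. This is an illustrative rendering of the standard clipped PPO objective with an asymmetric ("clip-higher") range, not AReaL's actual loss code; the function name is hypothetical:

```python
def ppo_token_loss(ratio, advantage, eps_clip=0.2, eps_clip_higher=None):
    """Illustrative PPO clipped objective for one token (not AReaL's code).

    When eps_clip_higher is set, the clip range becomes asymmetric:
    [1 - eps_clip, 1 + eps_clip_higher]; eps_clip serves as the lower bound.
    """
    hi = eps_clip_higher if eps_clip_higher is not None else eps_clip
    clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + hi)
    # Pessimistic (min) over unclipped and clipped surrogates; negated as a loss.
    return -min(ratio * advantage, clipped * advantage)
```

For a positive advantage, raising the upper bound lets large ratios contribute more gradient before clipping kicks in.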
## PPOCritic Configuration

Configuration for the PPO critic model, a subclass of TrainEngine. It contains all of the TrainEngine parameters documented below, plus the following:

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer | – | Number of minibatches for each PPO update. |
| – | float | – | Clipping factor for the value loss. |
| – | boolean | – | Mask truncated generations (no EOS token) and exclude them from training. |
## TrainEngine Configuration

Core configuration for model training, including optimization and backend settings.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | – | Path to the HuggingFace checkpoint. |
| – | string | – | Attention implementation for the HuggingFace transformers model. |
| – | boolean | – | Initialize model weights randomly. |
| – | boolean | – | Whether to use a critic/reward model. |
| – | float | – | Temperature during generation. |
| – | – | Required | – |
| – | boolean | – | Whether to pad each microbatch to the length upper bound specified by mb_spec. Can reduce memory fragmentation but slows down training. |
| – | boolean | – | Disable dropout layers during training. |
| – | boolean | – | Enable gradient checkpointing. |
| – | string | – | Parameter data type. |
| – | string | – | Gradient reduction data type. |
| – | – | – | Optimizer configuration. None means no training. |
| – | string | – | Weight update backend type. |
| – | – | Required | – |
| – | – | Required | – |
| – | – | Required | – |
| – | boolean | – | Whether to use LoRA. Only supported with FSDP; must be enabled together with vLLM/SGLang. |
| – | integer | – | LoRA rank. |
| – | integer | – | LoRA alpha. |
| – | list of string | Required | LoRA target modules. |
| – | string | – | PEFT method type. Only LoRA is supported for now. |
| – | – | Required | Train engine scheduling specs. Accepts one or two SchedulingSpec entries: with one spec, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the TrainController. |
| – | – | Required | The scheduling strategy of this TrainEngine, either separation or colocation. Currently only used by the TrainController. |
## GenerationHyperparameters

Controls text generation behavior for rollout.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | integer | – | Number of sequences to generate per prompt. |
| – | integer | – | Maximum number of tokens to generate. |
| – | integer | – | Minimum number of tokens to generate. |
| – | integer | – | Maximum number of tokens, including prompt and generated tokens. |
| – | boolean | – | Whether to use greedy decoding (maximum probability). |
| – | float | – | Nucleus sampling probability threshold (0.0, 1.0]. |
| – | integer | – | Number of highest-probability tokens to consider. |
| – | float | – | Sampling temperature. Higher values increase diversity. |
| – | list of integer | Required | Stop generation when encountering these token IDs. |
| – | boolean | – | Do not stop generation when EOS is encountered. |
| – | boolean | – | Skip special tokens when decoding/displaying outputs. |
| – | list of string \| None | – | One or more stop words. Generation stops if one of these words is sampled. |
| – | float | – | Penalizes tokens based on their frequency in the generation so far. Must be between -2 and 2; negative values encourage repetition. |
| – | string | – | LoRA name to be used for this generation. |
| – | boolean | – | Enable beam search in the vLLM engine. When enabled, sampling parameters like temperature, top-p, and top-k are automatically ignored. |
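A generation section (`gconfig` is referenced elsewhere on this page) might look like the following. The field names are assumptions chosen to mirror the descriptions above, not the verified schema:

```yaml
# Hypothetical sketch; field names are illustrative, check the dataclass.
gconfig:
  n_samples: 8            # sequences generated per prompt
  max_new_tokens: 1024
  temperature: 1.0
  top_p: 1.0
  greedy: false
```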
## InferenceEngine Configuration

Configuration for inference servers, including off-policyness control.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | Root directory for logs and trajectory dumps. |
| – | integer \| None | – | Maximum number of concurrent rollouts to the inference engine. Defaults to consumer_batch_size. |
| – | integer \| None | – | Input/output queue size for async rollout. |
| – | integer | – | Batch size for consuming rollouts from the queue. |
| – | integer | – | Maximum off-policyness for the head. If the current version is more than this many versions behind, the request will not be accepted. |
| – | boolean | – | Whether to output verbose tracing messages for each generation request. |
| – | boolean | – | Whether to check the format of trajectories produced by a customized workflow. Useful when debugging the workflow in isolation; should be False during RL training. |
| – | string | – | Request scheduling policy. |
| – | string | – | Path to the tokenizer for trajectory text decoding. |
| – | boolean | – | Whether to dump trajectories to files under fileroot. |
| – | float | – | Timeout in seconds for connecting to remote servers or launching local servers. |
| – | float | – | Timeout for HTTP requests. |
| – | integer | – | Number of retries for failed requests. |
| – | float | – | Grace period after calling /pause_generation. Waits until all requests have been dropped. |
| – | – | Required | Inference engine scheduling specs. Accepts one or two SchedulingSpec entries: with one spec, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the RolloutController. |
| – | – | Required | The scheduling strategy of this InferenceEngine, either separation or colocation. Currently only used by the RolloutController. |
| – | boolean | – | Whether to use LoRA. Should match the actor's LoRA option. |
| – | – | – | OpenAI proxy configuration (used when the workflow is an AgentWorkflow). |
## SGLang Configuration

Configuration for the SGLang runtime. Refer to sgl-project/sglang for detailed documentation.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | integer \| None | – | – |
| – | list of integer \| None | – | – |
| – | string | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | string \| None | – | – |
| – | boolean | – | – |
| – | string \| None | – | – |
| – | integer \| None | – | – |
| – | float \| None | – | – |
| – | integer \| None | – | – |
| – | integer \| None | – | – |
| – | integer | – | – |
| – | string | – | – |
| – | float | – | – |
| – | integer | – | – |
| – | string | – | – |
| – | string | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | boolean \| None | – | – |
| – | integer \| None | – | – |
| – | list of string \| None | – | – |
| – | list of string \| None | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | string | – | – |
| – | string | – | – |
| – | string \| None | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
## vLLM Configuration

Configuration for the vLLM runtime. Refer to https://docs.vllm.ai/en/stable/api/index.html for detailed documentation.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | – |
| – | integer | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | string | – | – |
| – | string | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | integer | – | – |
| – | float | – | – |
| – | boolean | – | – |
| – | integer \| None | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | float | – | – |
| – | string | – | – |
| – | boolean | – | – |
| – | string | – | – |
| – | boolean | – | – |
| – | string | – | – |
## TrainDataset Configuration

Configuration for training dataset loading and preprocessing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | Path to the dataset. Can be a local path or a HuggingFace dataset name. |
| – | string | Required | Type of training method, e.g., 'sft', 'rl', etc. |
| – | integer | – | Batch size for the dataloader. |
| – | boolean | – | Whether to shuffle the dataset. |
| – | boolean | – | Pin memory for faster data loading (set True for GPU training). |
| – | integer | – | Number of worker processes for data loading. |
| – | boolean | – | Drop the last incomplete batch. |
| – | integer \| None | – | Maximum token length of sequences in the dataset. Longer sequences are filtered out. |
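A dataset section could be sketched as follows. Both the section name and field names are assumptions made to mirror the descriptions above; verify them against the dataset dataclass and the `examples/` configs:

```yaml
# Hypothetical sketch; key names are illustrative.
train_dataset:
  path: openai/gsm8k     # local path or HuggingFace dataset name
  type: rl               # training method, e.g. 'sft' or 'rl'
  batch_size: 256
  shuffle: true
  drop_last: true
  max_length: 2048       # longer sequences are filtered out
```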
## ValidDataset Configuration

Configuration for validation dataset loading and preprocessing. It has different default values from TrainDatasetConfig: shuffle and drop_last default to False.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | Path to the dataset. Can be a local path or a HuggingFace dataset name. |
| – | string | Required | Type of training method, e.g., 'sft', 'rl', etc. |
| – | integer | – | Batch size for the dataloader. |
| – | boolean | – | Whether to shuffle the dataset. |
| – | boolean | – | Pin memory for faster data loading (set True for GPU training). |
| – | integer | – | Number of worker processes for data loading. |
| – | boolean | – | Drop the last incomplete batch. |
| – | integer \| None | – | Maximum token length of sequences in the dataset. Longer sequences are filtered out. |
## Cluster Specification Configuration

Configuration for cluster specification and distributed computing setup.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | – | Required | Name-resolving configuration. |
| – | string | – | Name of the cluster. Used to set cluster-specific environment variables. |
| – | string | – | Root for logs and checkpoints. Should be available on all nodes. |
| – | integer | – | The size of the cluster. Used to decide the Slurm hostname suffix. |
| – | integer | – | Number of GPUs per node (physical). |
## NameResolve Configuration

Configuration for distributed name resolution and service discovery.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | Type of the distributed KV store for name resolving. |
| – | string | – | Record root for NFS name resolving. Should be available on all nodes. |
| – | string | – | Address of the ETCD3 server. |
| – | string | – | Name of the distributed Ray KV store. |
## Evaluator Configuration

Configuration for model evaluation scheduling and timing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | integer \| None | – | Trigger frequency in epochs. None disables epoch-based triggering. |
| – | integer \| None | – | Trigger frequency in steps. None disables step-based triggering. |
| – | integer \| None | – | Trigger frequency in seconds. None disables time-based triggering. |
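The epoch-, step-, and time-based frequencies combine disjunctively: any one satisfied condition fires the trigger, and a None frequency disables that basis. A hedged sketch of that logic (not AReaL's actual scheduler; the function name and signature are hypothetical):

```python
def should_trigger(step, steps_per_epoch, last_time, now,
                   freq_epochs=None, freq_steps=None, freq_secs=None):
    """Illustrative epoch/step/time trigger logic (not AReaL's code).

    Returns True if any enabled frequency condition is met.
    """
    if freq_epochs is not None and step % (freq_epochs * steps_per_epoch) == 0:
        return True  # an epoch boundary matching the epoch frequency
    if freq_steps is not None and step % freq_steps == 0:
        return True  # a step matching the step frequency
    if freq_secs is not None and now - last_time >= freq_secs:
        return True  # enough wall-clock time has elapsed
    return False
```

The same scheduling shape applies to the Evaluator, Recover, and Saver sections, which share these three trigger fields.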
## Recover Configuration

Configuration for experiment recovery and fault tolerance.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | integer \| None | – | Trigger frequency in epochs. None disables epoch-based saving. |
| – | integer \| None | – | Trigger frequency in steps. None disables step-based saving. |
| – | integer \| None | – | Trigger frequency in seconds. None disables time-based saving. |
| – | string | – | Recovery mode for the launcher. 'disabled': never recover from previous runs; 'auto': automatically recover from previous runs if recover info and checkpoints are available; 'fault': only recover from previous runs if the new run fails; 'resume': force resumption, raising an error if no recover info is found (never resumes again after another failure). |
| – | integer | – | Number of recovery retries (auto/fault modes only). |
## Saver Configuration

Configuration for model checkpoint saving scheduling and timing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | integer \| None | – | Trigger frequency in epochs. None disables epoch-based saving. |
| – | integer \| None | – | Trigger frequency in steps. None disables step-based saving. |
| – | integer \| None | – | Trigger frequency in seconds. None disables time-based saving. |
## StatsLogger Configuration

Configuration for experiment statistics logging and tracking services.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | Required | – |
| – | string | Required | – |
| – | string | Required | – |
| – | – | Required | Weights & Biases configuration. |
| – | – | Required | SwanLab configuration. |
| – | – | Required | TensorBoard configuration. Only the 'path' field is required. |
## Swanlab Configuration

Configuration for SwanLab experiment tracking and monitoring.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | – | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
## TensorBoard Configuration

Configuration for TensorBoard logging and visualization.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string \| None | – | – |
## WandB Configuration

Configuration for Weights & Biases experiment tracking.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | – |
| – | string | – | – |
| – | string | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | string \| None | – | – |
| – | list of string \| None | – | – |
| – | – | – | – |
| – | string \| None | – | – |
## ArchonEngine Configuration

Configuration for the Archon Engine training backend.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | Attention backend type. |
| – | boolean | – | Whether to offload FSDP parameters to CPU. |
| – | boolean | – | Enable torch.compile for TransformerBlocks. |
| – | string | – | Activation checkpointing granularity. |
| – | integer | – | For selective recompute: checkpoint every N layers. Set to 0 for op-level selective checkpointing. |
## DistributedDataParallel Configuration

Configuration for Megatron's DistributedDataParallel. Refer to the Megatron-LM documentation for details.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
| – | integer \| None | – | – |
| – | boolean | – | – |
| – | boolean | – | – |
## FP8Engine Configuration

Configuration for FP8 (8-bit floating-point) training. This configuration encapsulates all FP8-related parameters and can be reused across different engines (e.g., Megatron, FSDP). When None in the parent config, FP8 training is disabled.

| Parameter | Type | Default | Description |
|---|---|---|---|
| – | string | – | FP8 precision mode. Options: 'e4m3' (uniform e4m3), 'hybrid' (e4m3 for activations/weights, e5m2 for output activation gradients). |
| – | string | – | FP8 scaling recipe. Options: 'tensorwise', 'delayed', 'mxfp8' (Blackwell only), 'blockwise'. |
| – | boolean | – | Keep parameters in FP8 precision to save memory. Not all parameters are converted to FP8; biases, for example, remain unchanged. |
| – | integer | – | Margin for FP8 scaling-factor computation. |
| – | integer | – | Length of the amax history window for scaling-factor computation. |
| – | string | – | Algorithm for choosing the amax value. Options: 'max' (largest in the history window), 'most_recent'. |
| – | boolean | – | When False, override the FP8 config and compute weight gradients in higher precision. |
| – | boolean | – | Use the FP8 implementation of dot-product attention. |
| – | boolean | – | Use the FP8 implementation of multi-head attention. |
| – | boolean | – | Reduce FP8 amax only in the TP or TP-CP domain. |
| – | boolean | – | Retain the first and last N TransformerBlocks in BF16 instead of FP8. |
| – | integer | – | Number of layers at the start to keep in BF16 when first_last_layers_bf16 is True. |
| – | integer | – | Number of layers at the end to keep in BF16 when first_last_layers_bf16 is True. |
| – | boolean | – | Whether to use direct FP8 conversion during weight updates and save/load. When True, FP8 parameters are converted directly between TE FP8 and PyTorch FP8 without intermediate dequantization/quantization. |
MegatronEngine Configuration#
Configuration for Megatron-LM training framework.
Refer to Megatron-LM documentation for implementation details.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
boolean |
|
- |
|
boolean |
|
- |
|
boolean |
|
- |
|
Required |
- |
|
|
integer |
|
Virtual pipeline parallel size for Megatron interleaved schedule. Set to >1 to enable VPP. Default is 1 (disabled). |
|
boolean |
|
- |
|
boolean |
|
- |
|
string |
|
- |
|
string |
|
- |
|
string |
|
- |
|
string |
|
- |
|
boolean |
|
- |
|
boolean |
|
- |
|
boolean |
|
- |
|
string | None |
|
- |
|
string | None |
|
- |
|
integer | None |
|
- |
|
boolean | None |
|
- |
|
list of string | None |
|
- |
|
string | None |
|
- |
|
boolean |
|
Enable overlapping between shared expert computations and dispatcher communications. Without this, the shared experts execute after the routed experts. |
|
boolean |
|
- |
|
string |
|
Type of token dispatcher. Options: ‘allgather’,’alltoall’ and ‘flex’. |
|
boolean |
|
Fuse token rearrangement ops during token dispatching. |
|
|
|
- |
|
boolean |
|
Enable tree training with flex attention module. |
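For illustration, the MoE- and pipeline-related options above might be set as in the sketch below. The key names follow common Megatron-LM conventions but are assumptions here; check the dataclass for the exact fields.

```yaml
# Hypothetical Megatron engine overrides -- key names are illustrative.
megatron:
  virtual_pipeline_parallel_size: 2    # >1 enables the interleaved (VPP) schedule
  moe_token_dispatcher_type: alltoall  # or 'allgather' / 'flex'
  moe_shared_expert_overlap: true      # overlap shared experts with dispatch
  moe_permute_fusion: true             # fuse token rearrangement ops
```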
OpenAIProxy Configuration#
Configuration for the OpenAI proxy used with AgentWorkflow workflows.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string |
|
OpenAI proxy mode: ‘inline’ (in-process) or ‘subproc’ (subprocess). |
|
string |
|
Parser for tool calls in model output. |
|
string |
|
Parser for reasoning content in model output. |
|
string |
|
Chat template type: ‘hf’ (standard) or ‘concat’ (multi-turn concatenation). |
|
integer | None |
|
Maximum total tokens for the engine (prompt + completion). |
|
float |
|
Discount factor for multi-turn reward propagation. |
|
string |
|
Export style: ‘individual’ (all interactions) or ‘concat’ (leaf nodes only). |
|
integer |
|
Maximum number of worker processes for subprocess mode execution pool. |
|
integer |
|
Session timeout in seconds. Sessions inactive longer than this will be garbage collected. |
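A proxy block might look like the following sketch. The key names are assumptions based on the descriptions above, not guaranteed field names.

```yaml
# Hypothetical OpenAI proxy config -- key names are illustrative.
openai_proxy:
  mode: subproc               # run the proxy in a subprocess
  chat_template_type: hf      # standard chat template
  turn_discount: 1.0          # no discount across turns
  export_style: concat        # export leaf nodes only
  session_timeout_secs: 3600  # GC sessions idle longer than an hour
```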
PerfTracer Configuration#
Configuration for perf tracer emission.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string |
Required |
- |
|
string |
Required |
- |
|
string |
Required |
- |
|
boolean |
|
Explicitly enable or disable perf tracing. Set to true to capture perf traces. |
|
integer |
|
Flush trace events to disk every N calls to save(step=…). A value of 1 writes on every step; values <= 0 fall back to 1. |
|
list of integer | None |
|
List of step numbers at which to capture detailed profiling traces. If None, no detailed profiling traces are captured. |
|
|
|
Session tracing configuration. |
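Putting the tracer options together, an enabled tracer could be sketched as below; the key names are assumptions inferred from the descriptions and may differ from the actual dataclass fields.

```yaml
# Hypothetical perf tracer config -- key names are illustrative.
perf_tracer:
  enabled: true
  flush_interval: 10       # flush every 10 calls to save(step=...)
  profile_steps: [1, 100]  # capture detailed traces at these steps
  session:
    enabled: true          # also write per-session records to sessions.jsonl
```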
Scheduler Configuration#
Configuration for worker scheduling. Used in the single-controller mode. Experimental.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string | None |
|
- |
|
string |
|
- |
|
string |
|
- |
|
string |
|
- |
|
|
Required |
- |
|
string |
|
- |
|
string |
|
- |
Scheduling Specification#
Configuration class: SchedulingSpec
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
integer |
|
Number of CPU cores required per GPU. |
|
integer |
|
Number of GPU units required. Used only when allocating pods. |
|
integer |
|
Amount of memory (GB) required per GPU. |
|
integer |
|
Number of ports to expose. |
|
string |
|
Docker/Singularity container image to use. Currently only used by Slurm. Will be potentially used by Kubernetes in the future. |
|
string |
|
Task type (e.g., worker, engine). |
|
|
Required |
Environment variables for the container. |
|
string | None |
|
Command to execute inside the container. Defaults to AReaL’s RPC server. |
|
string |
|
Additional arguments to pass to the srun command. Only used by Slurm. |
|
list of string | None |
|
Additional bash commands to set up the container before running the torchrun command. Only used by Slurm. |
|
string |
|
Type of container used by Slurm. |
|
string |
|
Mount path for Slurm. |
|
string | None |
|
sbatch/srun’s |
|
string | None |
|
sbatch/srun’s |
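A per-worker scheduling spec might be written as follows; the key names are assumptions based on the table above and may not match the dataclass exactly.

```yaml
# Hypothetical scheduling spec -- key names are illustrative.
cpu: 16                        # CPU cores required per GPU
gpu: 8                         # GPUs per pod
mem: 100                       # memory (GB) per GPU
port_count: 2                  # ports to expose
container_image: areal:latest  # used by Slurm
env_vars:
  NCCL_DEBUG: WARN             # environment passed into the container
```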
SchedulingStrategy#
Configuration class: SchedulingStrategy
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
string |
|
- |
|
string | None |
|
The target role to colocate with. |
|
boolean |
|
When True with colocation, the target worker spawns a new process on the same node/GPUs instead of sharing its process. Provides process isolation while sharing GPU resources. |
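For example, colocating a worker role with another role while keeping process isolation could be sketched as below; the key names are assumptions, not verified field names.

```yaml
# Hypothetical colocation strategy -- key names are illustrative.
schedule_strategy:
  type: colocate       # share node/GPUs with another role
  target: actor        # the role to colocate with
  spawn_process: true  # spawn a separate process instead of sharing one
```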
SessionTracer Configuration#
Configuration for per-session lifecycle tracing.
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
boolean |
|
Enable per-session lifecycle tracing alongside perf events. When true, session metadata is captured to sessions.jsonl. |
|
integer |
|
Flush session trace records once this many entries are ready. Values <= 0 fall back to 1. |