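Hyperparameter tuning using Ray Tune#
Created on: Aug 31, 2020 | Last updated: Jan 8, 2026 | Last verified: Nov 5, 2024
Author: Ricardo Decal
This tutorial shows how to integrate Ray Tune into your PyTorch training workflow to perform scalable and efficient hyperparameter tuning.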
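What you will learn:
How to modify a PyTorch training loop for Ray Tune
How to scale a hyperparameter search to multiple nodes and GPUs without code changes
How to define a hyperparameter search space and run a search with tune.Tuner
How to use an early-stopping scheduler (ASHA) and report metrics/save checkpoints
How to use checkpoints to resume training and load the best model
Prerequisites:
PyTorch v2.9+ and torchvision
Ray Tune (ray[tune]) v2.52.1+
GPUs are optional, but recommended for faster training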
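Ray, a project of the PyTorch Foundation, is an open source unified framework for scaling AI and Python applications. It helps run distributed jobs by handling the complexity of distributed computing. Ray Tune is a hyperparameter tuning library built on Ray that lets you scale a hyperparameter search from a single machine to a large cluster without code changes.
This tutorial adapts the PyTorch tutorial for training a CIFAR10 classifier to show how to run multi-GPU hyperparameter searches with Ray Tune.
Setup#
To run this tutorial, install the following dependencies: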
pip install "ray[tune]" torchvision
Start by importing the required libraries:
from functools import partial
import os
import tempfile
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
# New: imports for Ray Tune
import ray
from ray import tune
from ray.tune import Checkpoint
from ray.tune.schedulers import ASHAScheduler
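Data loading#
Wrap the data loaders in a constructor function. In this tutorial, a global data directory is passed to the function so that the dataset can be reused across trials. In a cluster environment, you can use shared storage, such as a network file system, to prevent each node from downloading the data separately.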
def load_data(data_dir="./data"):
    # Mean and standard deviation of the CIFAR10 training subset.
    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize(
                (0.4914, 0.48216, 0.44653), (0.2022, 0.19932, 0.20086)
            ),
        ]
    )
    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform
    )
    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform
    )
    return trainset, testset
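The `transforms.Normalize` step in `load_data` shifts and scales each channel as `(x - mean) / std`. The following is a minimal pure-Python sketch of that arithmetic for a single pixel, using the per-channel constants from `load_data`. The helper name `normalize_pixel` is hypothetical and is not part of torchvision.

```python
# Per-channel constants copied from load_data.
MEAN = (0.4914, 0.48216, 0.44653)
STD = (0.2022, 0.19932, 0.20086)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (x - mean) / std to one RGB pixel with values in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
centered = normalize_pixel(MEAN)
# A pure white pixel (1.0 after ToTensor scaling) maps to positive values.
white = normalize_pixel((1.0, 1.0, 1.0))
print(centered, white)
```

After this step the inputs are roughly centered around zero, which is the usual motivation for normalizing before training.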
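Model architecture#
This tutorial searches for the best sizes of the fully connected layers and the best learning rate. To enable this, the Net class exposes the layer sizes l1 and l2 as configurable parameters that Ray Tune can search over.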
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
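The `16 * 5 * 5` input size of `fc1` follows from the convolution and pooling arithmetic on 32x32 CIFAR10 images. Here is a minimal sketch of that calculation, assuming stride 1 and no padding for the convolutions, as in the `Net` definition. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 32                                  # CIFAR10 images are 32x32
size = pool_out(conv_out(size, 5), 2, 2)   # conv1 (5x5) then 2x2 pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, 5), 2, 2)   # conv2 (5x5) then 2x2 pool: 14 -> 10 -> 5
flat_features = 16 * size * size           # conv2 produces 16 channels
print(size, flat_features)                 # 5 400
```

Because only `l1` and `l2` vary between trials, the flattened feature size stays at 400 for every configuration in the search.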
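Define the search space#
Next, define the hyperparameters to tune and how Ray Tune samples them. Ray Tune offers a variety of search space distributions to suit different parameter types: loguniform, uniform, choice, randint, grid, and more. You can also use conditional search spaces to express complex dependencies between parameters, or sample from arbitrary functions.
Here is the search space for this tutorial: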
config = {
    "l1": tune.choice([2**i for i in range(9)]),
    "l2": tune.choice([2**i for i in range(9)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16]),
}
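tune.choice() accepts a list of values and samples from them uniformly. In this example, the l1 and l2 parameters are powers of 2 between 1 and 256, and the learning rate is sampled on a log scale between 0.0001 and 0.1. Sampling on a log scale explores several orders of magnitude on a relative scale rather than an absolute one.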
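To make the log-scale idea concrete, here is a minimal standard-library sketch of log-uniform sampling. It is not Ray Tune's implementation, and `sample_loguniform` is a hypothetical helper; it only illustrates that each decade of the range receives roughly one third of the samples.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_loguniform(1e-4, 1e-1, rng) for _ in range(30_000)]

# Count how many samples fall in each decade of the range.
decades = [(1e-4, 1e-3), (1e-3, 1e-2), (1e-2, 1e-1)]
shares = [sum(lo <= s < hi for s in samples) / len(samples) for lo, hi in decades]
print([round(share, 2) for share in shares])
```

A plain uniform draw over the same interval would instead place about 90% of the samples between 0.01 and 0.1, leaving the small learning rates almost unexplored.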
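Training function#
Ray Tune requires a training function that accepts a configuration dictionary and runs the main training loop. As Ray Tune runs different trials, it updates the configuration dictionary for each trial.
Here is the full training function, followed by explanations of the key Ray Tune integration points: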
def train_cifar(config, data_dir=None):
    net = Net(config["l1"], config["l2"])
    device = config["device"]

    net = net.to(device)
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Load checkpoint if resuming training
    checkpoint = tune.get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
            checkpoint_state = torch.load(checkpoint_path)
            start_epoch = checkpoint_state["epoch"]
            net.load_state_dict(checkpoint_state["net_state_dict"])
            optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
    else:
        start_epoch = 0

    trainset, _testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs]
    )

    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )

    for epoch in range(start_epoch, 10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(
                    "[%d, %5d] loss: %.3f"
                    % (epoch + 1, i + 1, running_loss / epoch_steps)
                )
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        # Save checkpoint and report metrics
        checkpoint_data = {
            "epoch": epoch,
            "net_state_dict": net.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        }
        with tempfile.TemporaryDirectory() as checkpoint_dir:
            checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
            torch.save(checkpoint_data, checkpoint_path)

            checkpoint = Checkpoint.from_directory(checkpoint_dir)
            tune.report(
                {"loss": val_loss / val_steps, "accuracy": correct / total},
                checkpoint=checkpoint,
            )

    print("Finished Training")
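Key integration points#
Using hyperparameters from the configuration dictionary#
Ray Tune updates the config dictionary with the hyperparameters for each trial. In this example, the model architecture and the optimizer receive their hyperparameters from the config dictionary.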
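The contract between a tuner and a training function can be sketched without Ray at all. The following standard-library example is a hand-rolled random search, not Ray Tune's API: `train_fn`, the `hidden` parameter, and the made-up objective are hypothetical stand-ins. It shows only that every trial receives its own freshly sampled config dictionary and reads all hyperparameters from it.

```python
import random


def train_fn(config):
    """Stand-in for a training function: read hyperparameters from config, return a loss."""
    # A made-up objective that is smallest near lr=0.01 and hidden=64.
    return (config["lr"] - 0.01) ** 2 + ((config["hidden"] - 64) / 64) ** 2


# Each entry is a callable that draws one value, loosely mirroring a search space.
search_space = {
    "lr": lambda rng: 10 ** rng.uniform(-4, -1),
    "hidden": lambda rng: rng.choice([2**i for i in range(9)]),
}

rng = random.Random(0)
trials = []
for _ in range(20):
    # A fresh config dictionary is built for every trial.
    config = {name: draw(rng) for name, draw in search_space.items()}
    trials.append((train_fn(config), config))

best_loss, best_config = min(trials, key=lambda item: item[0])
print(best_loss, best_config)
```

Keeping every tunable value inside the dictionary, rather than in globals or function defaults, is what lets the same function run unchanged for each sampled configuration.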
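Reporting metrics and saving checkpoints#
The most important integration is communicating with Ray Tune. Ray Tune uses the validation metrics to determine the best hyperparameter configuration and to stop underperforming trials early, which saves resources.
Checkpointing lets you load the trained models later, resume hyperparameter searches, and recover from failures. It is also required by some Ray Tune schedulers, such as Population Based Training (PBT), that pause and resume trials during the search.
This code from the training function checks for an existing checkpoint at startup and, if one exists, loads the model and optimizer state: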
checkpoint = tune.get_checkpoint()
if checkpoint:
    with checkpoint.as_directory() as checkpoint_dir:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
        checkpoint_state = torch.load(checkpoint_path)
        start_epoch = checkpoint_state["epoch"]
        net.load_state_dict(checkpoint_state["net_state_dict"])
        optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
At the end of each epoch, save a checkpoint and report the validation metrics:
checkpoint_data = {
    "epoch": epoch,
    "net_state_dict": net.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
with tempfile.TemporaryDirectory() as checkpoint_dir:
    checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
    torch.save(checkpoint_data, checkpoint_path)

    checkpoint = Checkpoint.from_directory(checkpoint_dir)
    tune.report(
        {"loss": val_loss / val_steps, "accuracy": correct / total},
        checkpoint=checkpoint,
    )
Ray Tune's checkpointing supports local file systems, cloud storage, and distributed file systems. For more information, see the Ray Tune storage documentation.
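The save-then-resume pattern itself can be illustrated with the standard library alone. The sketch below is a toy loop, not Ray Tune's checkpoint API: `run_epochs`, the JSON file name, and the `weights` counter are hypothetical. Unlike the training function in this tutorial, which stores the index of the epoch just completed, this sketch stores the next epoch to run so that a resumed run does not repeat work; that choice is an assumption of the sketch.

```python
import json
import tempfile
from pathlib import Path


def run_epochs(checkpoint_dir, total_epochs, stop_after=None):
    """Toy loop that resumes from checkpoint.json if present and saves after each epoch."""
    path = Path(checkpoint_dir) / "checkpoint.json"
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"next_epoch": 0, "weights": 0.0}

    for epoch in range(state["next_epoch"], total_epochs):
        state["weights"] += 1.0          # stand-in for one epoch of training
        state["next_epoch"] = epoch + 1  # record where a resumed run should start
        path.write_text(json.dumps(state))
        if stop_after is not None and epoch + 1 >= stop_after:
            break                        # simulate an interruption
    return state


with tempfile.TemporaryDirectory() as tmp:
    interrupted = run_epochs(tmp, total_epochs=5, stop_after=2)
    resumed = run_epochs(tmp, total_epochs=5)

print(interrupted, resumed)
```

The second call picks up at epoch 2 and finishes with the same state an uninterrupted five-epoch run would have produced.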
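Multi-GPU support#
Image classification models can be greatly accelerated by using GPUs. The training function supports multi-GPU training by wrapping the model in nn.DataParallel:
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)
This training function supports training on CPUs, a single GPU, multiple GPUs, or multiple nodes without code changes. Ray Tune automatically distributes the trials across the nodes according to the available resources. Ray Tune also supports fractional GPUs, so that one GPU can be shared among multiple trials, provided that the models, optimizers, and data batches fit into the GPU memory.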
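Validation split#
The original CIFAR10 dataset has only train and test subsets. This is sufficient for training a single model, but hyperparameter tuning requires a validation subset. The training function creates one by reserving 20% of the training subset. The test subset is used to evaluate the best model's generalization error after the search completes.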
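The split sizes follow directly from the 50,000 images in the CIFAR10 training subset. Here is a minimal index-based sketch of the 80/20 split using only the standard library. The fixed seed is an assumption added for repeatability and is not part of the tutorial's training function, which calls random_split without a seed.

```python
import random

num_train = 50_000                    # size of the CIFAR10 training subset
train_count = int(num_train * 0.8)    # 40,000 samples stay in the training split
val_count = num_train - train_count   # 10,000 samples are held out for validation

indices = list(range(num_train))
random.Random(42).shuffle(indices)    # fixed seed: an assumption of this sketch
train_idx, val_idx = indices[:train_count], indices[train_count:]
print(len(train_idx), len(val_idx))   # 40000 10000
```

With a batch size of 2, the 40,000 training samples give 20,000 mini-batches per epoch, which matches the step counts printed in the training log.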
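Evaluation function#
After finding the best hyperparameters, test the model on the held-out test set to estimate the generalization error: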
def test_accuracy(net, device="cpu", data_dir=None):
    _trainset, testset = load_data(data_dir)

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2
    )

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            image_batch, labels = data
            image_batch, labels = image_batch.to(device), labels.to(device)
            outputs = net(image_batch)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total
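Configure and run Ray Tune#
With the training and evaluation functions defined, configure Ray Tune to run the hyperparameter search.
Scheduler for early stopping#
Ray Tune provides schedulers that make the hyperparameter search more efficient by detecting underperforming trials and stopping them early. The ASHAScheduler uses the Asynchronous Successive Halving Algorithm (ASHA) to aggressively terminate low-performing trials: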
scheduler = ASHAScheduler(
    max_t=max_num_epochs,
    grace_period=1,
    reduction_factor=2,
)
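The idea behind successive halving can be sketched in a few lines. The code below is a simplified, synchronous illustration and not Ray Tune's ASHA implementation, which makes these decisions asynchronously as results arrive. The helper names, trial names, and loss values are hypothetical. It assumes trials are compared at iterations `grace_period * reduction_factor**k` and that roughly the best `1 / reduction_factor` of them continue.

```python
def rung_milestones(grace_period, reduction_factor, max_t):
    """Iterations at which trials are compared: grace_period * reduction_factor**k."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def survivors(losses, reduction_factor):
    """Keep the best 1/reduction_factor of trials at a rung (lower loss is better)."""
    keep = max(1, len(losses) // reduction_factor)
    return sorted(losses, key=losses.get)[:keep]


print(rung_milestones(grace_period=1, reduction_factor=2, max_t=10))  # [1, 2, 4, 8]

# Hypothetical validation losses of four trials at the first rung.
rung_losses = {"trial_a": 2.30, "trial_b": 1.45, "trial_c": 1.80, "trial_d": 2.10}
print(survivors(rung_losses, reduction_factor=2))  # ['trial_b', 'trial_c']
```

With the scheduler settings used in this tutorial, a trial can therefore be stopped after as little as one epoch, and only a shrinking fraction of trials runs for the full budget.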
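Ray Tune also provides advanced search algorithms that select the next set of hyperparameters based on previous results, instead of relying only on random or grid search. Examples include Optuna and BayesOpt.
Resource allocation#
Tell Ray Tune what resources to allocate for each trial by passing a resources dictionary to tune.with_resources: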
tune.with_resources(
    partial(train_cifar, data_dir=data_dir),
    resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial}
)
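Ray Tune automatically manages the placement of these trials and ensures that they run in isolation, so you don't have to manually assign GPUs to processes.
For example, if you run this experiment on a cluster of 20 machines, each with 8 GPUs, you can set gpus_per_trial = 0.5 to schedule two concurrent trials per GPU. This configuration runs 320 trials in parallel across the cluster.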
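The 320 figure is simple arithmetic, shown here as a small sketch. The helper `max_concurrent_trials` is hypothetical, and it considers only GPUs; in practice the CPU request per trial can lower the number further.

```python
def max_concurrent_trials(num_machines, gpus_per_machine, gpus_per_trial):
    """Upper bound on the trials that fit on the cluster's GPUs at the same time."""
    trials_per_gpu = 1 / gpus_per_trial
    return int(num_machines * gpus_per_machine * trials_per_gpu)


print(max_concurrent_trials(20, 8, 0.5))  # 320
print(max_concurrent_trials(1, 1, 1))     # 1
```

The second call corresponds to the single-GPU run logged at the end of this tutorial, where gpus_per_trial=1 means only one trial is RUNNING at a time while the others stay PENDING.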
Note
To run this tutorial without GPUs, set gpus_per_trial=0 and expect significantly longer runtimes.
To avoid long runtimes during development, start with a small number of trials and epochs.
Creating the Tuner#
The Ray Tune API is modular and composable. Pass your configuration to the tune.Tuner class to create a tuner object, then call tuner.fit() to start training:
tuner = tune.Tuner(
    tune.with_resources(
        partial(train_cifar, data_dir=data_dir),
        resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial}
    ),
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
        scheduler=scheduler,
        num_samples=num_trials,
    ),
    param_space=config,
)
results = tuner.fit()
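After training completes, retrieve the best-performing trial, load its checkpoint, and evaluate it on the test set.
Putting it all together#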
def main(num_trials=10, max_num_epochs=10, gpus_per_trial=0, cpus_per_trial=2):
    print("Starting hyperparameter tuning.")
    ray.init(include_dashboard=False)

    data_dir = os.path.abspath("./data")
    load_data(data_dir)  # Pre-download the dataset
    device = "cuda" if torch.cuda.is_available() else "cpu"
    config = {
        "l1": tune.choice([2**i for i in range(9)]),
        "l2": tune.choice([2**i for i in range(9)]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16]),
        "device": device,
    }
    scheduler = ASHAScheduler(
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2,
    )

    tuner = tune.Tuner(
        tune.with_resources(
            partial(train_cifar, data_dir=data_dir),
            resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial}
        ),
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            scheduler=scheduler,
            num_samples=num_trials,
        ),
        param_space=config,
    )
    results = tuner.fit()

    best_result = results.get_best_result("loss", "min")
    print(f"Best trial config: {best_result.config}")
    print(f"Best trial final validation loss: {best_result.metrics['loss']}")
    print(f"Best trial final validation accuracy: {best_result.metrics['accuracy']}")

    best_trained_model = Net(best_result.config["l1"], best_result.config["l2"])
    best_trained_model = best_trained_model.to(device)
    if gpus_per_trial > 1:
        best_trained_model = nn.DataParallel(best_trained_model)

    best_checkpoint = best_result.checkpoint
    with best_checkpoint.as_directory() as checkpoint_dir:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
        best_checkpoint_data = torch.load(checkpoint_path)

        best_trained_model.load_state_dict(best_checkpoint_data["net_state_dict"])
        test_acc = test_accuracy(best_trained_model, device, data_dir)
        print(f"Best trial test set accuracy: {test_acc}")


if __name__ == "__main__":
    # Set the number of trials, epochs, and GPUs per trial here:
    main(num_trials=10, max_num_epochs=10, gpus_per_trial=1)
Starting hyperparameter tuning.
2026-06-03 00:26:34,834 WARNING services.py:2213 -- WARNING: The object store is using /tmp/ray instead of /dev/shm because /dev/shm has only 2147471360 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2026-06-03 00:26:37,003 INFO worker.py:2012 -- Started a local Ray instance.
/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py:2051: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
warnings.warn(
0%| | 0.00/170M [00:00<?, ?B/s]
0%| | 426k/170M [00:00<00:40, 4.23MB/s]
5%|▍ | 7.77M/170M [00:00<00:03, 44.8MB/s]
11%|█ | 19.1M/170M [00:00<00:01, 76.1MB/s]
17%|█▋ | 29.4M/170M [00:00<00:01, 86.3MB/s]
23%|██▎ | 39.4M/170M [00:00<00:01, 91.2MB/s]
30%|██▉ | 50.8M/170M [00:00<00:01, 98.9MB/s]
36%|███▌ | 60.9M/170M [00:00<00:01, 99.7MB/s]
42%|████▏ | 71.2M/170M [00:00<00:00, 101MB/s]
48%|████▊ | 82.6M/170M [00:00<00:00, 105MB/s]
55%|█████▍ | 93.1M/170M [00:01<00:00, 103MB/s]
61%|██████ | 103M/170M [00:01<00:00, 103MB/s]
67%|██████▋ | 115M/170M [00:01<00:00, 105MB/s]
73%|███████▎ | 125M/170M [00:01<00:00, 106MB/s]
80%|███████▉ | 136M/170M [00:01<00:00, 105MB/s]
86%|████████▌ | 147M/170M [00:01<00:00, 106MB/s]
92%|█████████▏| 157M/170M [00:01<00:00, 106MB/s]
99%|█████████▊| 168M/170M [00:01<00:00, 105MB/s]
100%|██████████| 170M/170M [00:01<00:00, 97.9MB/s]
╭────────────────────────────────────────────────────────────────────╮
│ Configuration for experiment train_cifar_2026-06-03_00-26-42 │
├────────────────────────────────────────────────────────────────────┤
│ Search algorithm BasicVariantGenerator │
│ Scheduler AsyncHyperBandScheduler │
│ Number of trials 10 │
╰────────────────────────────────────────────────────────────────────╯
View detailed results here: /var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2026-06-03_00-26-33_172623_4337/artifacts/2026-06-03_00-26-42/train_cifar_2026-06-03_00-26-42/driver_artifacts`
Trial status: 10 PENDING
Current time: 2026-06-03 00:26:43. Total running time: 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 PENDING 128 1 0.000393605 2 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰───────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_e5524_00000 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00000 config │
├──────────────────────────────────────────────────┤
│ batch_size 2 │
│ device cuda │
│ l1 128 │
│ l2 1 │
│ lr 0.00039 │
╰──────────────────────────────────────────────────╯
(func pid=5495) [1, 2000] loss: 2.399
(func pid=5495) [1, 4000] loss: 1.167
(func pid=5495) [1, 6000] loss: 0.769
(func pid=5495) [1, 8000] loss: 0.576
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:27:13. Total running time: 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰───────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [1, 10000] loss: 0.461
(func pid=5495) [1, 12000] loss: 0.384
(func pid=5495) [1, 14000] loss: 0.329
(func pid=5495) [1, 16000] loss: 0.288
(func pid=5495) [1, 18000] loss: 0.256
(func pid=5495) [1, 20000] loss: 0.230
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:27:43. Total running time: 1min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰───────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000000)
(func pid=5495) [2, 2000] loss: 2.303
(func pid=5495) [2, 4000] loss: 1.151
(func pid=5495) [2, 6000] loss: 0.768
(func pid=5495) [2, 8000] loss: 0.576
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:28:13. Total running time: 1min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.303092365074158 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 1 63.0144 2.30309 0.1011 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [2, 10000] loss: 0.461
(func pid=5495) [2, 12000] loss: 0.384
(func pid=5495) [2, 14000] loss: 0.329
(func pid=5495) [2, 16000] loss: 0.288
(func pid=5495) [2, 18000] loss: 0.256
(func pid=5495) [2, 20000] loss: 0.230
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:28:43. Total running time: 2min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.303092365074158 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 1 63.0144 2.30309 0.1011 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000001)
(func pid=5495) [3, 2000] loss: 2.303
(func pid=5495) [3, 4000] loss: 1.152
(func pid=5495) [3, 6000] loss: 0.768
(func pid=5495) [3, 8000] loss: 0.576
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:29:13. Total running time: 2min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3028803560733797 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 2 123.937 2.30288 0.096 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [3, 10000] loss: 0.461
(func pid=5495) [3, 12000] loss: 0.384
(func pid=5495) [3, 14000] loss: 0.329
(func pid=5495) [3, 16000] loss: 0.288
(func pid=5495) [3, 18000] loss: 0.256
(func pid=5495) [3, 20000] loss: 0.230
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:29:43. Total running time: 3min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3028803560733797 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 2 123.937 2.30288 0.096 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000002)
(func pid=5495) [4, 2000] loss: 2.303
(func pid=5495) [4, 4000] loss: 1.151
(func pid=5495) [4, 6000] loss: 0.767
(func pid=5495) [4, 8000] loss: 0.576
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:30:13. Total running time: 3min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.302904331731796 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 3 185.068 2.3029 0.0988 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [4, 10000] loss: 0.461
(func pid=5495) [4, 12000] loss: 0.384
(func pid=5495) [4, 14000] loss: 0.329
(func pid=5495) [4, 16000] loss: 0.288
(func pid=5495) [4, 18000] loss: 0.256
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:30:43. Total running time: 4min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.302904331731796 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 3 185.068 2.3029 0.0988 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [4, 20000] loss: 0.230
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000003)
(func pid=5495) [5, 2000] loss: 2.303
(func pid=5495) [5, 4000] loss: 1.152
(func pid=5495) [5, 6000] loss: 0.768
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:31:13. Total running time: 4min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3031228340148924 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 4 245.919 2.30312 0.1012 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [5, 8000] loss: 0.576
(func pid=5495) [5, 10000] loss: 0.461
(func pid=5495) [5, 12000] loss: 0.384
(func pid=5495) [5, 14000] loss: 0.329
(func pid=5495) [5, 16000] loss: 0.288
(func pid=5495) [5, 18000] loss: 0.256
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:31:43. Total running time: 5min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3031228340148924 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 4 245.919 2.30312 0.1012 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [5, 20000] loss: 0.230
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000004)
(func pid=5495) [6, 2000] loss: 2.303
(func pid=5495) [6, 4000] loss: 1.151
(func pid=5495) [6, 6000] loss: 0.768
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:32:13. Total running time: 5min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3031033515930175 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 5 306.857 2.3031 0.0955 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [6, 8000] loss: 0.576
(func pid=5495) [6, 10000] loss: 0.461
(func pid=5495) [6, 12000] loss: 0.384
(func pid=5495) [6, 14000] loss: 0.329
(func pid=5495) [6, 16000] loss: 0.288
(func pid=5495) [6, 18000] loss: 0.256
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:32:43. Total running time: 6min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3031033515930175 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 5 306.857 2.3031 0.0955 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [6, 20000] loss: 0.230
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000005)
(func pid=5495) [7, 2000] loss: 2.303
(func pid=5495) [7, 4000] loss: 1.151
(func pid=5495) [7, 6000] loss: 0.768
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:33:13. Total running time: 6min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3036095594882964 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 6 367.29 2.30361 0.1012 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [7, 8000] loss: 0.573
(func pid=5495) [7, 10000] loss: 0.448
(func pid=5495) [7, 12000] loss: 0.367
(func pid=5495) [7, 14000] loss: 0.310
(func pid=5495) [7, 16000] loss: 0.269
(func pid=5495) [7, 18000] loss: 0.236
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:33:44. Total running time: 7min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.3036095594882964 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 6 367.29 2.30361 0.1012 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [7, 20000] loss: 0.212
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000006)
(func pid=5495) [8, 2000] loss: 2.105
(func pid=5495) [8, 4000] loss: 1.049
(func pid=5495) [8, 6000] loss: 0.692
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:34:14. Total running time: 7min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.114957945179939 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 7 428.086 2.11496 0.1883 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [8, 8000] loss: 0.521
(func pid=5495) [8, 10000] loss: 0.413
(func pid=5495) [8, 12000] loss: 0.345
(func pid=5495) [8, 14000] loss: 0.294
(func pid=5495) [8, 16000] loss: 0.255
(func pid=5495) [8, 18000] loss: 0.225
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:34:44. Total running time: 8min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.114957945179939 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 7 428.086 2.11496 0.1883 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [8, 20000] loss: 0.204
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000007)
(func pid=5495) [9, 2000] loss: 2.018
(func pid=5495) [9, 4000] loss: 0.999
(func pid=5495) [9, 6000] loss: 0.664
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:35:14. Total running time: 8min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.0022196177482603 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 8 488.942 2.00222 0.1998 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [9, 8000] loss: 0.491
(func pid=5495) [9, 10000] loss: 0.389
(func pid=5495) [9, 12000] loss: 0.323
(func pid=5495) [9, 14000] loss: 0.273
(func pid=5495) [9, 16000] loss: 0.237
(func pid=5495) [9, 18000] loss: 0.211
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:35:44. Total running time: 9min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=2.0022196177482603 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 8 488.942 2.00222 0.1998 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [9, 20000] loss: 0.189
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000008)
(func pid=5495) [10, 2000] loss: 1.889
(func pid=5495) [10, 4000] loss: 0.942
(func pid=5495) [10, 6000] loss: 0.619
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:36:14. Total running time: 9min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=1.870682253921032 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 9 549.663 1.87068 0.2497 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [10, 8000] loss: 0.463
(func pid=5495) [10, 10000] loss: 0.373
(func pid=5495) [10, 12000] loss: 0.311
(func pid=5495) [10, 14000] loss: 0.263
(func pid=5495) [10, 16000] loss: 0.233
(func pid=5495) [10, 18000] loss: 0.205
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-06-03 00:36:44. Total running time: 10min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=1.870682253921032 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 RUNNING 128 1 0.000393605 2 9 549.663 1.87068 0.2497 │
│ train_cifar_e5524_00001 PENDING 2 16 0.00450586 8 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5495) [10, 20000] loss: 0.186
Trial train_cifar_e5524_00000 completed after 10 iterations at 2026-06-03 00:36:57. Total running time: 10min 14s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00000 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 60.83569 │
│ time_total_s 610.49862 │
│ training_iteration 10 │
│ accuracy 0.2146 │
│ loss 1.85962 │
╰────────────────────────────────────────────────────────────╯
(func pid=5495) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00000_0_batch_size=2,l1=128,l2=1,lr=0.0004_2026-06-03_00-26-42/checkpoint_000009)
Trial train_cifar_e5524_00001 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00001 config │
├──────────────────────────────────────────────────┤
│ batch_size 8 │
│ device cuda │
│ l1 2 │
│ l2 16 │
│ lr 0.00451 │
╰──────────────────────────────────────────────────╯
(func pid=7388) [1, 2000] loss: 1.988
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-06-03 00:37:14. Total running time: 10min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=1.8596160467505456 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00001 RUNNING 2 16 0.00450586 8 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7388) [1, 4000] loss: 0.925
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000000)
(func pid=7388) [2, 2000] loss: 1.821
(func pid=7388) [2, 4000] loss: 0.913
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000001)
(func pid=7388) [3, 2000] loss: 1.812
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-06-03 00:37:44. Total running time: 11min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=1.8596160467505456 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00001 RUNNING 2 16 0.00450586 8 2 34.1512 1.86154 0.2562 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7388) [3, 4000] loss: 0.911
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000002)
(func pid=7388) [4, 2000] loss: 1.810
(func pid=7388) [4, 4000] loss: 0.899
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000003)
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-06-03 00:38:14. Total running time: 11min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=1.8596160467505456 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00001 RUNNING 2 16 0.00450586 8 4 66.4551 1.90142 0.2509 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7388) [5, 2000] loss: 1.838
(func pid=7388) [5, 4000] loss: 0.898
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000004)
(func pid=7388) [6, 2000] loss: 1.832
(func pid=7388) [6, 4000] loss: 0.918
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000005)
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-06-03 00:38:44. Total running time: 12min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00001 with loss=1.794007800102234 and params={'l1': 2, 'l2': 16, 'lr': 0.004505860994583894, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00001 RUNNING 2 16 0.00450586 8 6 98.7458 1.79401 0.3131 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7388) [7, 2000] loss: 1.806
(func pid=7388) [7, 4000] loss: 0.905
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000006)
(func pid=7388) [8, 2000] loss: 1.769
(func pid=7388) [8, 4000] loss: 0.882
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000007)
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-06-03 00:39:14. Total running time: 12min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00001 with loss=1.77416847448349 and params={'l1': 2, 'l2': 16, 'lr': 0.004505860994583894, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00001 RUNNING 2 16 0.00450586 8 8 130.953 1.77417 0.3225 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7388) [9, 2000] loss: 1.751
(func pid=7388) [9, 4000] loss: 0.874
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000008)
(func pid=7388) [10, 2000] loss: 1.752
(func pid=7388) [10, 4000] loss: 0.873
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-06-03 00:39:44. Total running time: 13min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00000 with loss=1.8596160467505456 and params={'l1': 128, 'l2': 1, 'lr': 0.0003936045633222101, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00001 RUNNING 2 16 0.00450586 8 9 148.072 1.88573 0.2817 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00002 PENDING 4 8 0.015076 8 │
│ train_cifar_e5524_00003 PENDING 256 1 0.000242106 16 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_e5524_00001 completed after 10 iterations at 2026-06-03 00:39:46. Total running time: 13min 4s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00001 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 16.11117 │
│ time_total_s 164.18301 │
│ training_iteration 10 │
│ accuracy 0.3412 │
│ loss 1.72291 │
╰────────────────────────────────────────────────────────────╯
(func pid=7388) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00001_1_batch_size=8,l1=2,l2=16,lr=0.0045_2026-06-03_00-26-42/checkpoint_000009)
Trial train_cifar_e5524_00002 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00002 config │
├──────────────────────────────────────────────────┤
│ batch_size 8 │
│ device cuda │
│ l1 4 │
│ l2 8 │
│ lr 0.01508 │
╰──────────────────────────────────────────────────╯
(func pid=8401) [1, 2000] loss: 2.237
(func pid=8401) [1, 4000] loss: 1.052
(func pid=8401) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00002_2_batch_size=8,l1=4,l2=8,lr=0.0151_2026-06-03_00-26-42/checkpoint_000000)
Trial train_cifar_e5524_00002 completed after 1 iterations at 2026-06-03 00:40:08. Total running time: 13min 26s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00002 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000000 │
│ time_this_iter_s 17.61613 │
│ time_total_s 17.61613 │
│ training_iteration 1 │
│ accuracy 0.0998 │
│ loss 2.30927 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_e5524_00003 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00003 config │
├──────────────────────────────────────────────────┤
│ batch_size 16 │
│ device cuda │
│ l1 256 │
│ l2 1 │
│ lr 0.00024 │
╰──────────────────────────────────────────────────╯
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-06-03 00:40:14. Total running time: 13min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00001 with loss=1.722906385421753 and params={'l1': 2, 'l2': 16, 'lr': 0.004505860994583894, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00003 RUNNING 256 1 0.000242106 16 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00004 PENDING 32 16 0.0140813 16 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8586) [1, 2000] loss: 2.404
Trial train_cifar_e5524_00003 completed after 1 iterations at 2026-06-03 00:40:24. Total running time: 13min 41s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00003 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000000 │
│ time_this_iter_s 10.78123 │
│ time_total_s 10.78123 │
│ training_iteration 1 │
│ accuracy 0.0997 │
│ loss 2.3504 │
╰────────────────────────────────────────────────────────────╯
(func pid=8586) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00003_3_batch_size=16,l1=256,l2=1,lr=0.0002_2026-06-03_00-26-42/checkpoint_000000)
Trial train_cifar_e5524_00004 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00004 config │
├──────────────────────────────────────────────────┤
│ batch_size 16 │
│ device cuda │
│ l1 32 │
│ l2 16 │
│ lr 0.01408 │
╰──────────────────────────────────────────────────╯
(func pid=8736) [1, 2000] loss: 1.896
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000000)
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-06-03 00:40:44. Total running time: 14min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00001 with loss=1.722906385421753 and params={'l1': 2, 'l2': 16, 'lr': 0.004505860994583894, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00004 RUNNING 32 16 0.0140813 16 1 12.0182 1.79904 0.3323 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8736) [2, 2000] loss: 1.755
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000001)
(func pid=8736) [3, 2000] loss: 1.745
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000002)
(func pid=8736) [4, 2000] loss: 1.727
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000003)
(func pid=8736) [5, 2000] loss: 1.713
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000004)
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-06-03 00:41:14. Total running time: 14min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00004 with loss=1.6898325820922853 and params={'l1': 32, 'l2': 16, 'lr': 0.014081266300228953, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00004 RUNNING 32 16 0.0140813 16 5 46.2241 1.68983 0.3911 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8736) [6, 2000] loss: 1.719
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000005)
(func pid=8736) [7, 2000] loss: 1.732
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000006)
(func pid=8736) [8, 2000] loss: 1.731
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000007)
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-06-03 00:41:44. Total running time: 15min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00004 with loss=1.705219381904602 and params={'l1': 32, 'l2': 16, 'lr': 0.014081266300228953, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00004 RUNNING 32 16 0.0140813 16 8 72.1881 1.70522 0.3505 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00005 PENDING 8 32 0.00226625 8 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8736) [9, 2000] loss: 1.722
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000008)
(func pid=8736) [10, 2000] loss: 1.743
Trial train_cifar_e5524_00004 completed after 10 iterations at 2026-06-03 00:41:58. Total running time: 15min 15s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00004 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 8.67563 │
│ time_total_s 89.66228 │
│ training_iteration 10 │
│ accuracy 0.3904 │
│ loss 1.76806 │
╰────────────────────────────────────────────────────────────╯
(func pid=8736) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00004_4_batch_size=16,l1=32,l2=16,lr=0.0141_2026-06-03_00-26-42/checkpoint_000009)
Trial train_cifar_e5524_00005 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00005 config │
├──────────────────────────────────────────────────┤
│ batch_size 8 │
│ device cuda │
│ l1 8 │
│ l2 32 │
│ lr 0.00227 │
╰──────────────────────────────────────────────────╯
(func pid=9632) [1, 2000] loss: 2.001
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-06-03 00:42:14. Total running time: 15min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00001 with loss=1.722906385421753 and params={'l1': 2, 'l2': 16, 'lr': 0.004505860994583894, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00005 RUNNING 8 32 0.00226625 8 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9632) [1, 4000] loss: 0.822
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000000)
(func pid=9632) [2, 2000] loss: 1.504
(func pid=9632) [2, 4000] loss: 0.735
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000001)
(func pid=9632) [3, 2000] loss: 1.395
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-06-03 00:42:45. Total running time: 16min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.460151535654068 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00005 RUNNING 8 32 0.00226625 8 2 34.2869 1.46015 0.4805 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9632) [3, 4000] loss: 0.687
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000002)
(func pid=9632) [4, 2000] loss: 1.330
(func pid=9632) [4, 4000] loss: 0.673
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000003)
(func pid=9632) [5, 2000] loss: 1.304
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-06-03 00:43:15. Total running time: 16min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.286154672551155 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00005 RUNNING 8 32 0.00226625 8 4 66.6463 1.28615 0.5453 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9632) [5, 4000] loss: 0.653
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000004)
(func pid=9632) [6, 2000] loss: 1.277
(func pid=9632) [6, 4000] loss: 0.640
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000005)
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-06-03 00:43:45. Total running time: 17min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3504340207338332 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00005 RUNNING 8 32 0.00226625 8 6 98.7505 1.35043 0.5232 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9632) [7, 2000] loss: 1.259
(func pid=9632) [7, 4000] loss: 0.640
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000006)
(func pid=9632) [8, 2000] loss: 1.241
(func pid=9632) [8, 4000] loss: 0.620
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000007)
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-06-03 00:44:15. Total running time: 17min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.2835253473043442 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00005 RUNNING 8 32 0.00226625 8 8 131.184 1.28353 0.5503 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9632) [9, 2000] loss: 1.232
(func pid=9632) [9, 4000] loss: 0.627
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000008)
(func pid=9632) [10, 2000] loss: 1.239
(func pid=9632) [10, 4000] loss: 0.613
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-06-03 00:44:45. Total running time: 18min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3233039300918579 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00005 RUNNING 8 32 0.00226625 8 9 147.435 1.3233 0.5434 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00006 PENDING 128 64 0.00233169 4 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_e5524_00005 completed after 10 iterations at 2026-06-03 00:44:46. Total running time: 18min 3s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00005 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 16.01236 │
│ time_total_s 163.44693 │
│ training_iteration 10 │
│ accuracy 0.5447 │
│ loss 1.30352 │
╰────────────────────────────────────────────────────────────╯
(func pid=9632) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00005_5_batch_size=8,l1=8,l2=32,lr=0.0023_2026-06-03_00-26-42/checkpoint_000009)
Trial train_cifar_e5524_00006 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00006 config │
├──────────────────────────────────────────────────┤
│ batch_size 4 │
│ device cuda │
│ l1 128 │
│ l2 64 │
│ lr 0.00233 │
╰──────────────────────────────────────────────────╯
(func pid=10645) [1, 2000] loss: 1.964
(func pid=10645) [1, 4000] loss: 0.854
(func pid=10645) [1, 6000] loss: 0.541
(func pid=10645) [1, 8000] loss: 0.398
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:45:15. Total running time: 18min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [1, 10000] loss: 0.309
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000000)
(func pid=10645) [2, 2000] loss: 1.502
(func pid=10645) [2, 4000] loss: 0.748
(func pid=10645) [2, 6000] loss: 0.495
(func pid=10645) [2, 8000] loss: 0.371
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:45:45. Total running time: 19min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 1 32.883 1.56131 0.4314 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [2, 10000] loss: 0.291
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000001)
(func pid=10645) [3, 2000] loss: 1.380
(func pid=10645) [3, 4000] loss: 0.704
(func pid=10645) [3, 6000] loss: 0.477
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:46:15. Total running time: 19min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 2 63.9038 1.5635 0.4505 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [3, 8000] loss: 0.360
(func pid=10645) [3, 10000] loss: 0.283
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000002)
(func pid=10645) [4, 2000] loss: 1.347
(func pid=10645) [4, 4000] loss: 0.683
(func pid=10645) [4, 6000] loss: 0.455
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:46:45. Total running time: 20min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 3 94.8003 1.40337 0.5052 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [4, 8000] loss: 0.346
(func pid=10645) [4, 10000] loss: 0.276
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000003)
(func pid=10645) [5, 2000] loss: 1.312
(func pid=10645) [5, 4000] loss: 0.666
(func pid=10645) [5, 6000] loss: 0.450
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:47:15. Total running time: 20min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 4 125.822 1.35232 0.5257 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [5, 8000] loss: 0.336
(func pid=10645) [5, 10000] loss: 0.278
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000004)
(func pid=10645) [6, 2000] loss: 1.270
(func pid=10645) [6, 4000] loss: 0.673
(func pid=10645) [6, 6000] loss: 0.445
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:47:45. Total running time: 21min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 5 156.739 1.41589 0.5151 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [6, 8000] loss: 0.340
(func pid=10645) [6, 10000] loss: 0.270
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000005)
(func pid=10645) [7, 2000] loss: 1.280
(func pid=10645) [7, 4000] loss: 0.642
(func pid=10645) [7, 6000] loss: 0.441
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:48:15. Total running time: 21min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 6 187.642 1.40365 0.5154 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [7, 8000] loss: 0.334
(func pid=10645) [7, 10000] loss: 0.277
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000006)
(func pid=10645) [8, 2000] loss: 1.290
(func pid=10645) [8, 4000] loss: 0.640
(func pid=10645) [8, 6000] loss: 0.438
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:48:45. Total running time: 22min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 7 218.187 1.40006 0.5102 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [8, 8000] loss: 0.341
(func pid=10645) [8, 10000] loss: 0.273
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000007)
(func pid=10645) [9, 2000] loss: 1.262
(func pid=10645) [9, 4000] loss: 0.646
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:49:15. Total running time: 22min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 8 249.054 1.53305 0.4781 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [9, 6000] loss: 0.437
(func pid=10645) [9, 8000] loss: 0.335
(func pid=10645) [9, 10000] loss: 0.268
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000008)
(func pid=10645) [10, 2000] loss: 1.316
(func pid=10645) [10, 4000] loss: 0.643
Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-06-03 00:49:45. Total running time: 23min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00006 RUNNING 128 64 0.00233169 4 9 279.923 1.47054 0.5085 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00007 PENDING 256 8 0.000329713 4 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=10645) [10, 6000] loss: 0.443
(func pid=10645) [10, 8000] loss: 0.329
(func pid=10645) [10, 10000] loss: 0.261
Trial train_cifar_e5524_00006 completed after 10 iterations at 2026-06-03 00:50:01. Total running time: 23min 18s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00006 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 30.85755 │
│ time_total_s 310.78093 │
│ training_iteration 10 │
│ accuracy 0.4851 │
│ loss 1.5189 │
╰────────────────────────────────────────────────────────────╯
(func pid=10645) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00006_6_batch_size=4,l1=128,l2=64,lr=0.0023_2026-06-03_00-26-42/checkpoint_000009)
Trial train_cifar_e5524_00007 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00007 config │
├──────────────────────────────────────────────────┤
│ batch_size 4 │
│ device cuda │
│ l1 256 │
│ l2 8 │
│ lr 0.00033 │
╰──────────────────────────────────────────────────╯
(func pid=11958) [1, 2000] loss: 2.281
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:50:15. Total running time: 23min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [1, 4000] loss: 1.047
(func pid=11958) [1, 6000] loss: 0.629
(func pid=11958) [1, 8000] loss: 0.435
(func pid=11958) [1, 10000] loss: 0.326
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000000)
(func pid=11958) [2, 2000] loss: 1.572
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:50:45. Total running time: 24min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 1 32.5404 1.63059 0.4078 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [2, 4000] loss: 0.763
(func pid=11958) [2, 6000] loss: 0.495
(func pid=11958) [2, 8000] loss: 0.361
(func pid=11958) [2, 10000] loss: 0.284
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000001)
(func pid=11958) [3, 2000] loss: 1.368
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:51:15. Total running time: 24min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 2 63.3747 1.43499 0.4883 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [3, 4000] loss: 0.676
(func pid=11958) [3, 6000] loss: 0.445
(func pid=11958) [3, 8000] loss: 0.326
(func pid=11958) [3, 10000] loss: 0.260
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000002)
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:51:45. Total running time: 25min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00005 with loss=1.3035233695745467 and params={'l1': 8, 'l2': 32, 'lr': 0.002266249962427044, 'batch_size': 8, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 3 94.2179 1.30373 0.5323 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [4, 2000] loss: 1.215
(func pid=11958) [4, 4000] loss: 0.622
(func pid=11958) [4, 6000] loss: 0.406
(func pid=11958) [4, 8000] loss: 0.305
(func pid=11958) [4, 10000] loss: 0.244
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000003)
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:52:16. Total running time: 25min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.2764593084335327 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 4 124.972 1.27646 0.5417 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [5, 2000] loss: 1.134
(func pid=11958) [5, 4000] loss: 0.577
(func pid=11958) [5, 6000] loss: 0.377
(func pid=11958) [5, 8000] loss: 0.286
(func pid=11958) [5, 10000] loss: 0.227
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000004)
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:52:46. Total running time: 26min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1927819685190917 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 5 155.739 1.19278 0.5795 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [6, 2000] loss: 1.059
(func pid=11958) [6, 4000] loss: 0.524
(func pid=11958) [6, 6000] loss: 0.358
(func pid=11958) [6, 8000] loss: 0.270
(func pid=11958) [6, 10000] loss: 0.215
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000005)
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:53:16. Total running time: 26min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.2195624839290977 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 6 186.583 1.21956 0.5786 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [7, 2000] loss: 1.003
(func pid=11958) [7, 4000] loss: 0.493
(func pid=11958) [7, 6000] loss: 0.334
(func pid=11958) [7, 8000] loss: 0.257
(func pid=11958) [7, 10000] loss: 0.202
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000006)
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:53:46. Total running time: 27min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1873233204752207 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 7 218.089 1.18732 0.5904 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [8, 2000] loss: 0.913
(func pid=11958) [8, 4000] loss: 0.463
(func pid=11958) [8, 6000] loss: 0.320
(func pid=11958) [8, 8000] loss: 0.233
(func pid=11958) [8, 10000] loss: 0.197
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000007)
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:54:16. Total running time: 27min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1913595205791294 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 8 249.008 1.19136 0.595 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) [9, 2000] loss: 0.881
(func pid=11958) [9, 4000] loss: 0.432
(func pid=11958) [9, 6000] loss: 0.294
(func pid=11958) [9, 8000] loss: 0.220
(func pid=11958) [9, 10000] loss: 0.184
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:54:46. Total running time: 28min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1913595205791294 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 8 249.008 1.19136 0.595 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000008)
(func pid=11958) [10, 2000] loss: 0.787
(func pid=11958) [10, 4000] loss: 0.417
(func pid=11958) [10, 6000] loss: 0.276
(func pid=11958) [10, 8000] loss: 0.212
(func pid=11958) [10, 10000] loss: 0.171
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-06-03 00:55:16. Total running time: 28min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.2191801403423772 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00007 RUNNING 256 8 0.000329713 4 9 279.762 1.21918 0.5878 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00008 PENDING 4 2 0.00884237 2 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_e5524_00007 completed after 10 iterations at 2026-06-03 00:55:17. Total running time: 28min 34s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00007 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 30.70419 │
│ time_total_s 310.46617 │
│ training_iteration 10 │
│ accuracy 0.6036 │
│ loss 1.18664 │
╰────────────────────────────────────────────────────────────╯
(func pid=11958) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00007_7_batch_size=4,l1=256,l2=8,lr=0.0003_2026-06-03_00-26-42/checkpoint_000009)
Trial train_cifar_e5524_00008 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00008 config │
├──────────────────────────────────────────────────┤
│ batch_size 2 │
│ device cuda │
│ l1 4 │
│ l2 2 │
│ lr 0.00884 │
╰──────────────────────────────────────────────────╯
(func pid=13255) [1, 2000] loss: 2.316
(func pid=13255) [1, 4000] loss: 1.156
(func pid=13255) [1, 6000] loss: 0.771
(func pid=13255) [1, 8000] loss: 0.579
Trial status: 8 TERMINATED | 1 RUNNING | 1 PENDING
Current time: 2026-06-03 00:55:46. Total running time: 29min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1866376930650324 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00008 RUNNING 4 2 0.00884237 2 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00007 TERMINATED 256 8 0.000329713 4 10 310.466 1.18664 0.6036 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=13255) [1, 10000] loss: 0.463
(func pid=13255) [1, 12000] loss: 0.385
(func pid=13255) [1, 14000] loss: 0.330
(func pid=13255) [1, 16000] loss: 0.289
(func pid=13255) [1, 18000] loss: 0.257
(func pid=13255) [1, 20000] loss: 0.231
Trial status: 8 TERMINATED | 1 RUNNING | 1 PENDING
Current time: 2026-06-03 00:56:16. Total running time: 29min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1866376930650324 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00008 RUNNING 4 2 0.00884237 2 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00007 TERMINATED 256 8 0.000329713 4 10 310.466 1.18664 0.6036 │
│ train_cifar_e5524_00009 PENDING 256 2 0.000383082 4 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_e5524_00008 completed after 1 iterations at 2026-06-03 00:56:24. Total running time: 29min 41s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00008 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000000 │
│ time_this_iter_s 62.47919 │
│ time_total_s 62.47919 │
│ training_iteration 1 │
│ accuracy 0.0964 │
│ loss 2.30843 │
╰────────────────────────────────────────────────────────────╯
(func pid=13255) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00008_8_batch_size=2,l1=4,l2=2,lr=0.0088_2026-06-03_00-26-42/checkpoint_000000)
Trial train_cifar_e5524_00009 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00009 config │
├──────────────────────────────────────────────────┤
│ batch_size 4 │
│ device cuda │
│ l1 256 │
│ l2 2 │
│ lr 0.00038 │
╰──────────────────────────────────────────────────╯
(func pid=13516) [1, 2000] loss: 2.334
(func pid=13516) [1, 4000] loss: 1.132
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2026-06-03 00:56:46. Total running time: 30min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1866376930650324 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00009 RUNNING 256 2 0.000383082 4 │
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00007 TERMINATED 256 8 0.000329713 4 10 310.466 1.18664 0.6036 │
│ train_cifar_e5524_00008 TERMINATED 4 2 0.00884237 2 1 62.4792 2.30843 0.0964 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=13516) [1, 6000] loss: 0.724
(func pid=13516) [1, 8000] loss: 0.520
(func pid=13516) [1, 10000] loss: 0.404
Trial train_cifar_e5524_00009 completed after 1 iterations at 2026-06-03 00:57:01. Total running time: 30min 18s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_e5524_00009 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000000 │
│ time_this_iter_s 32.65647 │
│ time_total_s 32.65647 │
│ training_iteration 1 │
│ accuracy 0.1907 │
│ loss 2.02742 │
╰────────────────────────────────────────────────────────────╯
2026-06-03 00:57:01,514 INFO tune.py:1001 -- Wrote the latest version of all result files and experiment state to '/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42' in 0.0100s.
Trial status: 10 TERMINATED
Current time: 2026-06-03 00:57:01. Total running time: 30min 18s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: e5524_00007 with loss=1.1866376930650324 and params={'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_e5524_00000 TERMINATED 128 1 0.000393605 2 10 610.499 1.85962 0.2146 │
│ train_cifar_e5524_00001 TERMINATED 2 16 0.00450586 8 10 164.183 1.72291 0.3412 │
│ train_cifar_e5524_00002 TERMINATED 4 8 0.015076 8 1 17.6161 2.30927 0.0998 │
│ train_cifar_e5524_00003 TERMINATED 256 1 0.000242106 16 1 10.7812 2.3504 0.0997 │
│ train_cifar_e5524_00004 TERMINATED 32 16 0.0140813 16 10 89.6623 1.76806 0.3904 │
│ train_cifar_e5524_00005 TERMINATED 8 32 0.00226625 8 10 163.447 1.30352 0.5447 │
│ train_cifar_e5524_00006 TERMINATED 128 64 0.00233169 4 10 310.781 1.5189 0.4851 │
│ train_cifar_e5524_00007 TERMINATED 256 8 0.000329713 4 10 310.466 1.18664 0.6036 │
│ train_cifar_e5524_00008 TERMINATED 4 2 0.00884237 2 1 62.4792 2.30843 0.0964 │
│ train_cifar_e5524_00009 TERMINATED 256 2 0.000383082 4 1 32.6565 2.02742 0.1907 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Best trial config: {'l1': 256, 'l2': 8, 'lr': 0.0003297126639410268, 'batch_size': 4, 'device': 'cuda'}
Best trial final validation loss: 1.1866376930650324
Best trial final validation accuracy: 0.6036
(func pid=13516) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-06-03_00-26-42/train_cifar_e5524_00009_9_batch_size=4,l1=256,l2=2,lr=0.0004_2026-06-03_00-26-42/checkpoint_000000)
Best trial test set accuracy: 0.6036
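Results#
Your Ray Tune trial summary should look similar to the output below. The text table summarizes the validation performance of each trial and highlights the best hyperparameter configuration.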
Number of trials: 10/10 (10 TERMINATED)
+-----+--------------+------+------+-------------+--------+---------+------------+
| ... | batch_size | l1 | l2 | lr | iter | loss | accuracy |
|-----+--------------+------+------+-------------+--------+---------+------------|
| ... | 2 | 1 | 256 | 0.000668163 | 1 | 2.31479 | 0.0977 |
| ... | 4 | 64 | 8 | 0.0331514 | 1 | 2.31605 | 0.0983 |
| ... | 4 | 2 | 1 | 0.000150295 | 1 | 2.30755 | 0.1023 |
| ... | 16 | 32 | 32 | 0.0128248 | 10 | 1.66912 | 0.4391 |
| ... | 4 | 8 | 128 | 0.00464561 | 2 | 1.7316 | 0.3463 |
| ... | 8 | 256 | 8 | 0.00031556 | 1 | 2.19409 | 0.1736 |
| ... | 4 | 16 | 256 | 0.00574329 | 2 | 1.85679 | 0.3368 |
| ... | 8 | 2 | 2 | 0.00325652 | 1 | 2.30272 | 0.0984 |
| ... | 2 | 2 | 2 | 0.000342987 | 2 | 1.76044 | 0.292 |
| ... | 4 | 64 | 32 | 0.003734 | 8 | 1.53101 | 0.4761 |
+-----+--------------+------+------+-------------+--------+---------+------------+
Best trial config: {'l1': 64, 'l2': 32, 'lr': 0.0037339984519545164, 'batch_size': 4}
Best trial final validation loss: 1.5310075663924216
Best trial final validation accuracy: 0.4761
Best trial test set accuracy: 0.4737
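In this example summary, most trials were stopped early to save resources. The best-performing trial reached a validation accuracy of about 47.6%, and its test set accuracy of about 47.4% confirms that result. Exact numbers vary from run to run: the run logged above, for instance, ended with a best validation accuracy of 60.36%.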
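To make the "stopped early" claim concrete, the rows of the example summary table can be analyzed with plain Python. This is a minimal sketch, not part of the tutorial script: the rows are copied by hand from the table above, and `MAX_EPOCHS = 10` is an assumption based on the longest trials in the logged run finishing after 10 iterations.

```python
# Rows copied from the example summary table above:
# (batch_size, l1, l2, lr, iter, loss, accuracy)
trials = [
    (2, 1, 256, 0.000668163, 1, 2.31479, 0.0977),
    (4, 64, 8, 0.0331514, 1, 2.31605, 0.0983),
    (4, 2, 1, 0.000150295, 1, 2.30755, 0.1023),
    (16, 32, 32, 0.0128248, 10, 1.66912, 0.4391),
    (4, 8, 128, 0.00464561, 2, 1.7316, 0.3463),
    (8, 256, 8, 0.00031556, 1, 2.19409, 0.1736),
    (4, 16, 256, 0.00574329, 2, 1.85679, 0.3368),
    (8, 2, 2, 0.00325652, 1, 2.30272, 0.0984),
    (2, 2, 2, 0.000342987, 2, 1.76044, 0.292),
    (4, 64, 32, 0.003734, 8, 1.53101, 0.4761),
]
MAX_EPOCHS = 10  # assumption: per-trial epoch budget of the run

fields = ("batch_size", "l1", "l2", "lr", "iter", "loss", "accuracy")
rows = [dict(zip(fields, t)) for t in trials]

# The best trial is the one with the lowest validation loss.
best = min(rows, key=lambda r: r["loss"])

# Trials that ran fewer epochs than the budget were stopped by the scheduler.
stopped_early = [r for r in rows if r["iter"] < MAX_EPOCHS]

# Compare the epochs actually trained with the budget of a full search.
epochs_run = sum(r["iter"] for r in rows)
epochs_budget = MAX_EPOCHS * len(rows)

print("best config:", {k: best[k] for k in ("l1", "l2", "lr", "batch_size")})
print(f"stopped early: {len(stopped_early)} of {len(rows)} trials")
print(f"epochs trained: {epochs_run} of {epochs_budget}")
```

In a live session the same selection is normally done on the object returned by the tuner rather than on copied numbers; the hand-built list here only keeps the sketch self-contained.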
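Observability#
Monitoring is essential when you run large-scale experiments. Ray provides a dashboard that lets you view trial status in real time, inspect cluster resource usage, and read logs.
For debugging, Ray also provides distributed debugging tools that let you attach a debugger to a trial running on the cluster.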
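When only a saved console log is available (for example, output from a CI job like the one above), the same progress information can be recovered from the periodic `Trial status:` lines. The following is a minimal sketch that assumes the line format shown in this tutorial's output; `parse_trial_status` is a hypothetical helper, not a Ray API.

```python
import re


def parse_trial_status(line):
    """Return {status: count} for a 'Trial status:' line, else None."""
    prefix = "Trial status:"
    if not line.startswith(prefix):
        return None
    counts = {}
    # Entries are separated by '|', e.g. "8 TERMINATED | 1 RUNNING".
    for part in line[len(prefix):].split("|"):
        match = re.fullmatch(r"\s*(\d+)\s+([A-Z_]+)\s*", part)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


# Sample lines copied from the console output above.
log_lines = [
    "Trial status: 8 TERMINATED | 1 RUNNING | 1 PENDING",
    "Current time: 2026-06-03 00:55:46. Total running time: 29min 3s",
    "Trial status: 10 TERMINATED",
]

for line in log_lines:
    status = parse_trial_status(line)
    if status is not None:
        done = status.get("TERMINATED", 0)
        total = sum(status.values())
        print(f"{done}/{total} trials finished: {status}")
```

The console format is not a stable interface, so for anything beyond a quick look at a log file the dashboard remains the better tool.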
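Conclusion#
In this tutorial, you learned how to tune the hyperparameters of a PyTorch model with Ray Tune. You saw how to integrate Ray Tune into a PyTorch training loop, define a hyperparameter search space, use an efficient scheduler such as ASHAScheduler to terminate underperforming trials early, save checkpoints and report metrics to Ray Tune, and run the search and analyze the results.
Ray Tune makes it straightforward to scale experiments from a single machine to a large cluster, helping you find the best model configuration efficiently.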
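The idea behind early termination can be illustrated with a toy version of successive halving. This sketch is not Ray's implementation: ASHA is asynchronous and does not wait for every trial to reach a rung, whereas the function below is a simplified synchronous variant. The loss curves are made-up illustrative values, and the parameter names only mirror the `grace_period`, `reduction_factor`, and `max_t` arguments of `ASHAScheduler`.

```python
def successive_halving(loss_curves, grace_period=1, reduction_factor=2, max_t=4):
    """Return {trial: epochs_run} under synchronous successive halving.

    loss_curves maps a trial name to its validation loss after each epoch.
    """
    active = list(loss_curves)
    epochs_run = {name: 0 for name in active}
    rung = grace_period
    while active:
        rung = min(rung, max_t)
        # Every surviving trial trains up to the current rung.
        for name in active:
            epochs_run[name] = rung
        if rung == max_t:
            break
        # Keep only the best 1/reduction_factor of trials at this rung.
        ranked = sorted(active, key=lambda n: loss_curves[n][rung - 1])
        keep = max(1, len(ranked) // reduction_factor)
        active = ranked[:keep]
        rung *= reduction_factor
    return epochs_run


# Hypothetical per-epoch validation losses for four trials.
curves = {
    "a": [2.3, 2.3, 2.3, 2.3],
    "b": [2.0, 1.8, 1.7, 1.6],
    "c": [1.9, 1.6, 1.4, 1.3],
    "d": [2.2, 2.1, 2.0, 1.9],
}

epochs = successive_halving(curves)
print(epochs)
print("epochs trained:", sum(epochs.values()), "of", 4 * len(curves))
```

Only the most promising trial receives the full epoch budget, which is why most rows in the summary table show far fewer iterations than the maximum.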
Further reading#
Total running time of the script: (30 minutes 34.637 seconds)