
SREGym: A Benchmarking Platform for SRE Agents

🔍Overview | 📦Installation | 🚀Quick Start | ⚙️Usage | 🤝Contributing | 📖Docs | Slack

🔍 Overview

SREGym is an AI-native platform for designing, developing, and evaluating AI agents for Site Reliability Engineering (SRE). The core idea is to create live system environments in which SRE agents solve real-world SRE problems. SREGym provides a comprehensive SRE benchmark suite with a wide variety of problems for evaluating SRE agents, as well as for training next-generation AI agents.

SREGym Overview

SREGym is inspired by our prior work on AIOpsLab and ITBench. It is architected with AI-native usability and extensibility as first-class principles. The SREGym benchmark suite contains 86 different SRE problems. It supports all the problems from AIOpsLab and ITBench, and includes new problems such as OS-level faults, metastable failures, and concurrent failures. See our problem set for a complete list of problems.

📦 Installation

Requirements

  - git and uv, used by the commands below
  - kubectl, with access to a Kubernetes cluster (or kind for a local emulated cluster; see Quick Start)

git clone --recurse-submodules https://github.com/SREGym/SREGym
cd SREGym
uv sync
uv run pre-commit install

🚀 Quick Start

Set up your cluster

Choose either a) or b) to set up your cluster and then proceed to the next steps.

a) Kubernetes Cluster (Recommended)

SREGym supports any Kubernetes cluster that your kubectl context is set to, whether it's a cluster from a cloud provider or one you build yourself.
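Because SREGym targets whatever cluster the current kubectl context points to, a quick sanity check before running the benchmark can save debugging time (assumes kubectl is installed and configured):

```shell
# Show the context SREGym will use
kubectl config current-context

# Confirm the cluster is reachable; nodes should report Ready
kubectl get nodes
```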

We have an Ansible playbook to set up clusters on providers like CloudLab, as well as on your own machines. Follow this README to set up your own cluster.

b) Emulated cluster

SREGym can be run on an emulated cluster using kind on your local machine. However, not all problems are supported on kind.

# For x86 machines
kind create cluster --config kind/kind-config-x86.yaml

# For ARM machines
kind create cluster --config kind/kind-config-arm.yaml
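Once kind finishes, kubectl should point at the new cluster automatically; the context name is `kind-` followed by the cluster name from the config file (assumed here to be the default, `kind`):

```shell
# Verify the emulated cluster is up (adjust the context name if your
# kind config sets a non-default cluster name)
kubectl cluster-info --context kind-kind
kubectl get nodes
```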

⚙️ Usage

Running an Agent

Quick Start

To get started with the included Stratus agent:

  1. Create your .env file:

cp .env.example .env

  2. Open the .env file and configure your model and API key.

  3. Run the benchmark:

python main.py --agent <agent-name> --model <model-id>

For example, to run the Stratus agent:

python main.py --agent stratus --model gpt-4o

Model Selection

SREGym supports multiple LLM providers. Specify your model using the --model flag:

python main.py --agent <agent-name> --model <model-id>

Available Models

| Model ID | Provider | Model Name | Required Environment Variables |
|---|---|---|---|
| gpt-4o | OpenAI | GPT-4o | OPENAI_API_KEY |
| gemini-2.5-pro | Google | Gemini 2.5 Pro | GEMINI_API_KEY |
| claude-sonnet-4 | Anthropic | Claude Sonnet 4 | ANTHROPIC_API_KEY |
| bedrock-claude-sonnet-4.5 | AWS Bedrock | Claude Sonnet 4.5 | AWS_PROFILE, AWS_DEFAULT_REGION |
| moonshot | Moonshot | Moonshot | MOONSHOT_API_KEY |
| watsonx-llama | IBM watsonx | Llama 3.3 70B | WATSONX_API_KEY, WX_PROJECT_ID |
| glm-4 | GLM | GLM-4 | GLM_API_KEY |
| azure-openai-gpt-4o | Azure OpenAI | GPT-4o | AZURE_API_KEY, AZURE_API_BASE |

Default: if no model is specified, gpt-4o is used.
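Before launching a run, you may want to fail fast if the variables listed in the table are missing. A minimal sketch (the `require` helper is hypothetical, not part of SREGym):

```shell
# Hypothetical pre-flight check: fail fast if a variable from the table is unset
require() {
  for var in "$@"; do
    if [ -z "$(eval "echo \${$var:-}")" ]; then
      echo "missing required environment variable: $var" >&2
      return 1
    fi
  done
  echo "all required variables are set"
}

# Example: AWS Bedrock needs both AWS_PROFILE and AWS_DEFAULT_REGION
export AWS_PROFILE="bedrock" AWS_DEFAULT_REGION="us-east-2"
require AWS_PROFILE AWS_DEFAULT_REGION
```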

Examples

OpenAI:

# In .env file
OPENAI_API_KEY="sk-proj-..."

# Run with GPT-4o
python main.py --agent stratus --model gpt-4o

Anthropic:

# In .env file
ANTHROPIC_API_KEY="sk-ant-api03-..."

# Run with Claude Sonnet 4
python main.py --agent stratus --model claude-sonnet-4

AWS Bedrock:

# In .env file
AWS_PROFILE="bedrock"
AWS_DEFAULT_REGION=us-east-2

# Run with Claude Sonnet 4.5 on Bedrock
python main.py --agent stratus --model bedrock-claude-sonnet-4.5

Note: For AWS Bedrock, ensure your AWS credentials are configured via ~/.aws/credentials and your profile has permissions to access Bedrock.
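A minimal sketch of the matching credentials file, assuming a profile named `bedrock` to match the AWS_PROFILE value above (placeholders, not real credentials):

```ini
# ~/.aws/credentials
[bedrock]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```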

Acknowledgements

This project is generously supported by a Slingshot grant from the Laude Institute.

License

Licensed under the MIT license.
