🔍Overview | 📦Installation | 🚀Quick Start | ⚙️Usage | 🤝Contributing | 📖Docs
SREGym is inspired by our prior work on AIOpsLab and ITBench. It is architected with AI-native usability and extensibility as first-class principles. The SREGym benchmark suite contains 86 SRE problems: it supports all the problems from AIOpsLab and ITBench, and adds new ones such as OS-level faults, metastable failures, and concurrent failures. See our problem set for a complete list.
- MCP Inspector to test MCP tools.
- k9s to observe the cluster.
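Both tools attach to whatever you already have running; a minimal sketch of launching them (assuming Node.js/npx and k9s are installed):

```shell
# Launch MCP Inspector (assumes Node.js/npx is available)
npx @modelcontextprotocol/inspector

# Launch k9s against the cluster your current kubectl context points to
k9s
```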
```
git clone --recurse-submodules https://github.com/SREGym/SREGym
cd SREGym
uv sync
uv run pre-commit install
```

Choose either a) or b) to set up your cluster, then proceed to the next steps.
SREGym supports any Kubernetes cluster that your kubectl context points to, whether it's a cluster from a cloud provider or one you built yourself.
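A quick way to confirm which cluster SREGym will target (assuming kubectl is installed and configured):

```shell
# Show the context SREGym's kubectl calls will use
kubectl config current-context

# Switch to a different cluster if needed
kubectl config use-context <your-cluster-context>

# Sanity-check connectivity
kubectl get nodes
```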
We provide an Ansible playbook to set up clusters on providers such as CloudLab, as well as on our own machines. Follow this README to set up your own cluster.
SREGym can be run on an emulated cluster using kind on your local machine. However, not all problems are supported.
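If kind is not installed yet, one common way to get it (an assumption; see kind's own documentation for other install methods):

```shell
# macOS/Linux with Homebrew (assumption: Homebrew is installed)
brew install kind

# Or with a Go toolchain
go install sigs.k8s.io/kind@latest
```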
```
# For x86 machines
kind create cluster --config kind/kind-config-x86.yaml

# For ARM machines
kind create cluster --config kind/kind-config-arm.yaml
```

To get started with the included Stratus agent:
- Create your `.env` file:

  ```
  mv .env.example .env
  ```

- Open the `.env` file and configure your model and API key.

- Run the benchmark:

  ```
  python main.py --agent <agent-name> --model <model-id>
  ```

For example, to run the Stratus agent:
```
python main.py --agent stratus --model gpt-4o
```

SREGym supports multiple LLM providers. Specify your model using the `--model` flag:

```
python main.py --agent <agent-name> --model <model-id>
```

| Model ID | Provider | Model Name | Required Environment Variables |
|---|---|---|---|
| `gpt-4o` | OpenAI | GPT-4o | `OPENAI_API_KEY` |
| `gemini-2.5-pro` | Google | Gemini 2.5 Pro | `GEMINI_API_KEY` |
| `claude-sonnet-4` | Anthropic | Claude Sonnet 4 | `ANTHROPIC_API_KEY` |
| `bedrock-claude-sonnet-4.5` | AWS Bedrock | Claude Sonnet 4.5 | `AWS_PROFILE`, `AWS_DEFAULT_REGION` |
| `moonshot` | Moonshot | Moonshot | `MOONSHOT_API_KEY` |
| `watsonx-llama` | IBM watsonx | Llama 3.3 70B | `WATSONX_API_KEY`, `WX_PROJECT_ID` |
| `glm-4` | GLM | GLM-4 | `GLM_API_KEY` |
| `azure-openai-gpt-4o` | Azure OpenAI | GPT-4o | `AZURE_API_KEY`, `AZURE_API_BASE` |
Default: if no model is specified, `gpt-4o` is used.
OpenAI:

```
# In .env file
OPENAI_API_KEY="sk-proj-..."

# Run with GPT-4o
python main.py --agent stratus --model gpt-4o
```

Anthropic:
```
# In .env file
ANTHROPIC_API_KEY="sk-ant-api03-..."

# Run with Claude Sonnet 4
python main.py --agent stratus --model claude-sonnet-4
```

AWS Bedrock:
```
# In .env file
AWS_PROFILE="bedrock"
AWS_DEFAULT_REGION=us-east-2

# Run with Claude Sonnet 4.5 on Bedrock
python main.py --agent stratus --model bedrock-claude-sonnet-4.5
```

Note: for AWS Bedrock, ensure your AWS credentials are configured via `~/.aws/credentials` and your profile has permissions to access Bedrock.
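For reference, a `~/.aws/credentials` profile matching the `AWS_PROFILE` above might look like this (values are illustrative placeholders, not real keys):

```
[bedrock]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
```

You can sanity-check the profile with `aws sts get-caller-identity --profile bedrock` before running the benchmark.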
This project is generously supported by a Slingshot grant from the Laude Institute.
Licensed under the MIT license.
