Skip to content

brandonhimpfen/awesome-ai-safety-alignment

Repository files navigation

Awesome AI Safety & Alignment Awesome Lists

DOI
GitHub Sponsor   Buy Me a Coffee   Ko-Fi   PayPal

A curated list of research, frameworks, tools, evaluations, and resources focused on AI alignment, safety, robustness, red-teaming, model governance, and responsible development.

Support ongoing maintenance and curation via GitHub Sponsors.

Contents

Research Organizations

Safety Frameworks

Red Teaming & Threat Modeling

Evaluation & Benchmarks

  • HELM – Holistic evaluation of language models across safety and risk domains.
  • Anthropic Evaluations – Safety evaluations for frontier models.
  • OpenAI Evals – Framework for testing model safety, reasoning, and reliability.
  • Red Teaming Benchmarks – Community-driven safety evaluations.
  • ToxiGen – Datasets for evaluating harmful or toxic outputs.
  • SafetyBench – Benchmark framework for AI safety scenarios.

Model Governance & Policy

Datasets

  • JailbreakBench – Evaluation dataset for jailbreak susceptibility.
  • HarmBench – Multi-domain dataset for AI harm classification and safety testing.
  • RealToxicityPrompts – Adversarial or harmful prompts used in robustness evaluation.
  • AdvBench – Dataset for adversarial attacks and safety testing.

Learning Resources

Related Awesome Lists

Contribute

Contributions are welcome. Please ensure your submission fully follows the requirements outlined in CONTRIBUTING.md, including formatting, scope alignment, and category placement.

Pull requests that do not adhere to the contribution guidelines may be closed.

License

CC0

About

A curated list of research, frameworks, tools, evaluations, and resources focused on AI alignment, safety, robustness, red-teaming, model governance, and responsible development.

Topics

Resources

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages