Principal Research Scientist at IBM Research AI and MIT-IBM Watson AI Lab, focusing on reliable AI, LLM governance, uncertainty quantification, and trustworthy machine learning.
Prasanna Sattigeri
I am a Principal Research Scientist at IBM Research AI and the MIT-IBM Watson AI Lab, where my primary focus is on developing reliable AI solutions.
My current work develops both theoretical frameworks and practical systems for making large language models reliable and trustworthy. I lead the Granite Guardian project, IBM’s family of state-of-the-art LLM safeguarding models.
Research Interests
- Generative Modeling and Large Language Models
- Uncertainty Quantification for AI systems
- Learning with Limited Data
- LLM Governance, Safety, and Alignment
- Human-AI Collaboration
- Agentic AI Systems
Open-Source Contributions
I lead and contribute to widely adopted trustworthy AI toolkits.
Links
Recent News
2025
2024
- December 2024 — Released Granite Guardian — achieving AUC 0.871 on harmful content and 0.854 on RAG-hallucination benchmarks
- December 2024 — Papers accepted at NeurIPS 2024:
  - “Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?”
  - “WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts”
  - “Attack Atlas: Challenges and Pitfalls in Red Teaming GenAI”
- November 2024 — Papers accepted at EMNLP 2024
- October 2024 — New preprint: “Building a Foundational Guardrail for General Agentic Systems” — safeguarding agentic AI via synthetic data
- October 2024 — New preprint: “Graph-based Uncertainty Metrics for Long-form LLM Outputs”
- July 2024 — Paper accepted at ICML 2024: “Thermometer: Towards Universal Calibration for Large Language Models”
- June 2024 — Invited talk on LLM Governance and Alignment at the NAACL TrustNLP Workshop. Slides
- 2024 — Panel and talk on Reliable AI-assisted Decision Making at the National Academy of Sciences Decadal Survey
- 2024 — Speaking at MIT AI Conference on AI ethics and change management
2023
- December 2023 — Papers accepted at NeurIPS 2023
- August 2023 — Invited talk on Uncertainty Calibration at KDD Workshop on Uncertainty Reasoning
- August 2023 — Panel on Generative AI and Safety at DSHealth Workshop, KDD
- August 2023 — Panel on Trustworthy LLMs at AI for Open Society Day, KDD
- February 2023 — Papers at AAAI 2023 and EACL 2023
Featured Research
Granite Guardian — State-of-the-Art LLM Safeguarding
I lead the Granite Guardian project at IBM Research, developing open-source models for LLM risk detection:
- #1 on GuardBench — First independent AI guardrail benchmark (86% accuracy across 40 datasets)
- #1 on REVEAL — Reasoning chain correctness evaluation (outperforms GPT-4o)
- #3 on LLM-AggreFact — Comprehensive fact-checking benchmark
- Covers social bias, profanity, violence, jailbreaking, and RAG hallucination risks
- Available on Hugging Face and GitHub (a usage sketch follows below)
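
For readers who want to try the models, here is a minimal sketch of querying a Granite Guardian checkpoint with the Hugging Face `transformers` library. The checkpoint name `ibm-granite/granite-guardian-3.0-2b` and the plain chat-style prompt are illustrative assumptions; the exact model IDs, risk definitions, and recommended guardian prompt template are documented on the Hugging Face model cards.

```python
# Minimal sketch (assumptions noted): score a user prompt with a Granite Guardian
# checkpoint via Hugging Face transformers. The model ID and the plain chat prompt
# are illustrative; see the model card for the recommended guardian prompt template
# and the list of supported risk definitions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# A single-turn conversation to screen for risky content.
messages = [{"role": "user", "content": "How do I pick a lock on someone else's door?"}]

# Build the model's chat prompt and generate a short verdict.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=20)

# Decode only the newly generated tokens (the guardian's verdict, e.g. a Yes/No risk label).
verdict = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)
```

In practice, the prompt format documented on the model card is the authoritative way to target a specific risk dimension (social bias, jailbreaking, RAG hallucination, and so on); the plain chat call above is only meant to illustrate the general workflow.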
MIT-IBM Watson AI Lab Collaborations
- With Prof. Greg Wornell (MIT): Trustworthy Learning with Limited Data — uncertainty quantification and calibration for foundation models
- With Prof. David Sontag (MIT): Human-Centric AI — algorithms for shared decision making and human-AI team onboarding
Professional Service
- Associate Editor: Pattern Recognition (Elsevier)
- Senior Program Committee / Area Chair: AAAI, ICLR, NeurIPS, ICML
- Reviewer: NeurIPS, ICML, AAAI, ICLR, EMNLP, ACL, IEEE TPAMI