Principal Research Scientist at IBM Research AI and MIT-IBM Watson AI Lab, focusing on reliable AI, LLM governance, uncertainty quantification, and trustworthy machine learning.
Prasanna Sattigeri
I am a Principal Research Scientist at IBM Research AI and the MIT-IBM Watson AI Lab, where my primary focus is on developing reliable AI solutions.
My current work develops both theoretical frameworks and practical systems for making large language models reliable and trustworthy. I lead the Granite Guardian project, IBM’s family of state-of-the-art LLM safeguarding models.
Research Interests
- Generative Modeling and Large Language Models
- Uncertainty Quantification for AI systems
- Learning with Limited Data
- LLM Governance, Safety, and Alignment
- Human-AI Collaboration
- Agentic AI Systems
Open-Source Contributions
I lead and contribute to widely adopted trustworthy AI toolkits.
Links
Recent News
2025
2024
- December 2024 — Released Granite Guardian — achieving AUC 0.871 on harmful content and 0.854 on RAG-hallucination benchmarks
- December 2024 — Papers accepted at NeurIPS 2024:
  - “Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?”
  - “WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts”
  - “Attack Atlas: Challenges and Pitfalls in Red Teaming GenAI”
- November 2024 — Papers accepted at EMNLP 2024
- October 2024 — New preprint: “Building a Foundational Guardrail for General Agentic Systems” — safeguarding agentic AI via synthetic data
- October 2024 — New preprint: “Graph-based Uncertainty Metrics for Long-form LLM Outputs”
- July 2024 — Paper accepted at ICML 2024: “Thermometer: Towards Universal Calibration for Large Language Models”
- June 2024 — Invited talk on LLM Governance and Alignment at the NAACL TrustNLP Workshop. Slides
- 2024 — Panel and talk on Reliable AI-assisted Decision Making at the National Academy of Sciences Decadal Survey
- 2024 — Speaking at MIT AI Conference on AI ethics and change management
2023
- December 2023 — Papers accepted at NeurIPS 2023
- August 2023 — Invited talk on Uncertainty Calibration at KDD Workshop on Uncertainty Reasoning
- August 2023 — Panel on Generative AI and Safety at DSHealth Workshop, KDD
- August 2023 — Panel on Trustworthy LLMs at AI for Open Society Day, KDD
- February 2023 — Papers at AAAI 2023 and EACL 2023
Featured Research
Granite Guardian — State-of-the-Art LLM Safeguarding
I lead the Granite Guardian project at IBM Research, developing open-source models for LLM risk detection:
- #1 on GuardBench — First independent AI guardrail benchmark (86% accuracy across 40 datasets)
- #1 on REVEAL — Reasoning chain correctness evaluation (outperforms GPT-4o)
- #3 on LLM-AggreFact — Comprehensive fact-checking benchmark
- Covers social bias, profanity, violence, jailbreaking, and RAG hallucination risks
- Available on Hugging Face and GitHub (a usage sketch follows below)
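
For readers who want to try the models, here is a minimal sketch of querying a Granite Guardian checkpoint with the Hugging Face `transformers` library. The checkpoint name `ibm-granite/granite-guardian-3.0-2b` and the plain chat-style prompt are illustrative assumptions; the exact model IDs, risk definitions, and recommended guardian prompt template are documented on the Hugging Face model cards.

```python
# Minimal sketch (assumptions noted): score a user prompt with a Granite Guardian
# checkpoint via Hugging Face transformers. The model ID and the plain chat prompt
# are illustrative; see the model card for the recommended guardian prompt template
# and the list of supported risk definitions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# A single-turn conversation to screen for risky content.
messages = [{"role": "user", "content": "How do I pick a lock on someone else's door?"}]

# Build the model's chat prompt and generate a short verdict.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=20)

# Decode only the newly generated tokens (the guardian's verdict, e.g. a Yes/No risk label).
verdict = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)
```

In practice, the prompt format documented on the model card is the authoritative way to target a specific risk dimension (social bias, jailbreaking, RAG hallucination, and so on); the plain chat call above is only meant to illustrate the general workflow.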
MIT-IBM Watson AI Lab Collaborations
- With Prof. Greg Wornell (MIT): Trustworthy Learning with Limited Data — uncertainty quantification and calibration for foundation models
- With Prof. David Sontag (MIT): Human-Centric AI — algorithms for shared decision making and human-AI team onboarding
Professional Service
- Associate Editor: Pattern Recognition (Elsevier)
- Senior Program Committee / Area Chair: AAAI, ICLR, NeurIPS, ICML
- Reviewer: NeurIPS, ICML, AAAI, ICLR, EMNLP, ACL, IEEE TPAMI