Inspiration
Evidence about AI incidents is scattered across incident records, linked reports, and multiple taxonomy systems (MIT, GMF, CSET), with uneven coverage and changing snapshot schemas. We were inspired to build a workflow that makes this data usable, reproducible, and responsibly interpreted for researchers, policy teams, and oversight stakeholders.
What it does
AI Incident Observatory ingests AI Incident Database snapshots, normalizes schema and incident IDs, links incidents to reports, and aligns taxonomy labels at the incident level. It produces reproducible analysis through seven notebooks and provides an interactive Streamlit dashboard for filtering, trend exploration, and exporting CSV/PNG/PDF outputs. It is explicitly descriptive, with built-in guardrails against causal overclaiming.
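The incident-to-report linking step can be sketched in pandas as a join on the normalized incident ID. This is a minimal illustration, not the project's actual code; the table and column names (`incident_id`, `report_id`) are assumptions standing in for the snapshot schema.

```python
import pandas as pd

# Toy stand-ins for snapshot tables; the column names here are
# assumptions, not the actual AIID snapshot schema.
incidents = pd.DataFrame({"incident_id": [1, 2],
                          "title": ["Incident A", "Incident B"]})
reports = pd.DataFrame({"report_id": [10, 11, 12],
                        "incident_id": [1, 1, 2]})

# Link each incident to its reports on the normalized incident ID.
linked = incidents.merge(reports, on="incident_id", how="left")

# Per-incident report counts, a typical descriptive summary.
report_counts = linked.groupby("incident_id")["report_id"].count()
```

A `how="left"` join keeps incidents with no matching reports visible in the output, which supports coverage-aware, descriptive summaries rather than silently dropping rows.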
How we built it
We built the system in Python using pandas, matplotlib, and streamlit, organized into modular components:
- src/io.py for snapshot extraction and table loading
- src/transform.py for normalization, joins, and taxonomy alignment
- src/notebook_utils.py for shared notebook helpers
- src/dashboard.py for interactive exploration and exports
We structured the analysis as a reproducible 7-notebook pipeline (01 to 07) and ensured deterministic outputs, schema-flexible column detection, and graceful handling of optional/missing tables.
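Schema-flexible column detection can be as simple as probing an alias list and falling back gracefully. A minimal sketch, assuming hypothetical alias names rather than the project's real lookup tables:

```python
import pandas as pd

def find_column(df, candidates):
    """Return the first candidate column present in df, else None.

    Snapshot schemas vary across releases, so the same field may
    appear under different names; `candidates` is an assumed alias
    list, checked in priority order.
    """
    for name in candidates:
        if name in df.columns:
            return name
    return None

# Example: one snapshot spells the ID field differently.
df = pd.DataFrame({"Incident ID": [1, 2]})
col = find_column(df, ["incident_id", "Incident ID", "incidentId"])
```

Returning `None` instead of raising lets callers decide whether a missing field means "skip this chart" or "fail the validation notebook."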
Challenges we ran into
The biggest challenge was data heterogeneity across snapshots: inconsistent field names, partial taxonomy coverage, and optional tables that may be absent. We also had to handle reporting bias and date/lag ambiguities without overstating conclusions. Another challenge was keeping notebook analysis and dashboard behavior consistent so interactive results match reproducible pipeline logic.
Accomplishments that we're proud of
We delivered an end-to-end, reproducible system that combines technical robustness with responsible interpretation:
- A complete notebook workflow from validation to responsible interpretation
- A dashboard that adapts to available data and avoids invalid charting paths
- Defensive checks and graceful degradation instead of brittle failures
- Exportable artifacts (figures, CSV summaries, PDF reports) for real-world use
- Clear Responsible AI documentation integrated into the project, not bolted on later
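The "graceful degradation instead of brittle failures" pattern above can be illustrated with an optional-table loader. The one-CSV-per-table layout is an assumption for this sketch, not the project's actual snapshot format:

```python
import pandas as pd
from pathlib import Path

def load_optional_table(snapshot_dir, name):
    """Load an optional snapshot table, degrading gracefully.

    Returns an empty DataFrame (rather than raising) when the table
    file is absent, so downstream notebooks and the dashboard can
    skip the related charts instead of crashing.
    """
    path = Path(snapshot_dir) / f"{name}.csv"  # assumed layout
    if not path.exists():
        return pd.DataFrame()
    return pd.read_csv(path)
```

Callers can then branch on `df.empty` to hide a chart or show a coverage note, keeping interactive behavior consistent with the notebook pipeline.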
What we learned
We learned that in AI safety analytics, data engineering quality is as important as visualization quality. Coverage-aware interpretation is essential when taxonomies are uneven. We also learned that transparent, deterministic pipelines build trust: users can rerun, audit, and verify results rather than relying on opaque workflows.
What's next for AI Incident Observatory
Next, we plan to:
- Add cross-snapshot drift analysis to track how categories and coverage change over time
- Add taxonomy consistency diagnostics for label agreement and ambiguity detection
- Expand structured data quality reports (missingness, duplicates, schema drift)
- Improve governance metadata for each exported chart (source fields, filters, snapshot hash)
- Package standardized reporting bundles for policy/research publication workflows
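The governance-metadata idea above could look roughly like this: a small record attached to each export, hashing the snapshot so any figure is traceable to its exact inputs. The field names are assumptions for illustration, not a committed format:

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def export_metadata(snapshot_path, source_fields, filters):
    """Build governance metadata for an exported chart (a sketch).

    Records the source fields, the active dashboard filters, and a
    SHA-256 digest of the snapshot file, so an exported figure can
    be audited and reproduced later.
    """
    digest = hashlib.sha256(Path(snapshot_path).read_bytes()).hexdigest()
    return {
        "snapshot_sha256": digest,
        "source_fields": source_fields,
        "filters": filters,
        "exported_at": datetime.now(timezone.utc).isoformat(),
    }
```

Writing this dict alongside each CSV/PNG/PDF export would make the artifacts self-describing for policy and research workflows.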
Built With
- cloud
- community
- deployment
- docker
- git
- github
- jupyterlab-notebooks
- matplotlib/networkx
- pandas/numpy/pyarrow
- python
- scikit-learn
- streamlit