
BFFM-XGB: Big Five From 20 Questions

Open-source pipeline for training XGBoost quantile regression models that predict Big Five personality scores from partial questionnaire responses. Trained on ~603k respondents from the IPIP-BFFM dataset with sparsity augmentation, the 15 exported ONNX models (5 domains x 3 quantiles) produce percentile scores with calibrated 90% prediction intervals from as few as 20 items.
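Concretely, each domain's three quantile models can be combined into a point estimate and interval like this (a minimal sketch: the 0.05/0.50/0.95 levels are an assumption consistent with a 90% interval, and `combine_quantiles` is a hypothetical helper, not the package API):

```python
def combine_quantiles(q05: float, q50: float, q95: float) -> dict:
    """Median prediction is the point estimate; the outer
    quantiles bound a 90% prediction interval."""
    return {"percentile": q50, "interval_90": (q05, q95)}

# One hypothetical domain (e.g. Extraversion), percentile-scale outputs:
pred = combine_quantiles(q05=41.0, q50=62.0, q95=83.0)
print(pred)  # {'percentile': 62.0, 'interval_90': (41.0, 83.0)}
```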

What's here:

  • Models — Pre-trained ONNX models, public domain, on HuggingFace
  • Pipeline — Reproducible end-to-end training: data download through ONNX export and publication figures
  • Inference packages — Python and TypeScript libraries for running predictions
  • Live demo — big5.shawnprice.com

Comparison with the Mini-IPIP

The Mini-IPIP is the standard short personality test in psychology research (Donnellan et al., 2006). Both approaches use 20 items (4 per domain) to recover the full 50-item IPIP-BFFM scale scores. All r values are Pearson correlations with the full 50-item scale on a held-out test set (N = 90,499).
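The evaluation metric is the ordinary Pearson correlation between predicted and full-scale scores, which can be computed with NumPy (toy data below, not the real test set):

```python
import numpy as np

# Toy percentile scores: model predictions vs. full 50-item scale scores.
predicted = np.array([52.0, 61.0, 33.0, 78.0, 45.0])
full_scale = np.array([50.0, 64.0, 30.0, 80.0, 47.0])

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(predicted, full_scale)[0, 1]
print(round(r, 3))
```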

|                          | Mini-IPIP              | BFFM-XGB-20                 |
|--------------------------|------------------------|-----------------------------|
| Items                    | 20 (4 per domain)      | 20 (4 per domain)           |
| Item selection           | Expert-curated brevity | Top-4 by within-domain r    |
| Scoring                  | Simple scale averaging | XGBoost quantile regression |
| Overall r                | .906                   | .927                        |
| MAE (percentile pts)     | 9.2                    | 8.2                         |
| 90% prediction intervals | ✗                      | ✓ (89.5% coverage)          |

Per-domain accuracy at K = 20:

| Domain                | Mini-IPIP α | Mini-IPIP r | BFFM-XGB-20 r |
|-----------------------|-------------|-------------|---------------|
| Extraversion          | .77         | .939        | .947          |
| Agreeableness         | .70         | .911        | .920          |
| Conscientiousness     | .69         | .909        | .919          |
| Emotional Stability   | .68         | .929        | .937          |
| Intellect/Imagination | .65         | .842        | .910          |

At 15 items, BFFM-XGB already matches the Mini-IPIP's 20-item accuracy (r = .908 vs .906).

Quick Start

| Language   | Directory     | Install                                      | Docs            |
|------------|---------------|----------------------------------------------|-----------------|
| Python     | `python/`     | `pip install onnxruntime numpy scipy pytest` | Inference guide |
| TypeScript | `typescript/` | `npm ci`                                     | Inference guide |

Give it answers (1–5 scale, with reverse-keyed items reversed) and get percentile scores with 90% prediction intervals. See docs/inference.md for full code examples.
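Reverse-keyed items use the standard Likert reversal on a 1–5 scale. A minimal sketch of that pre-processing step (the raw responses and which items are reverse-keyed are hypothetical; the real scoring key ships with the dataset):

```python
def reverse_score(response: int, scale_max: int = 5) -> int:
    """Standard Likert reversal: on a 1-5 scale, 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return scale_max + 1 - response

# Hypothetical raw responses; True marks a reverse-keyed item.
raw = [(4, False), (2, True), (5, False), (1, True)]
scored = [reverse_score(r) if rev else r for r, rev in raw]
print(scored)  # [4, 4, 5, 5]
```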

Reproduce

make setup    # Python, TypeScript, and web dependencies
make all      # Full pipeline: download through figures
make test     # All tests: lib, inference, and web

See docs/pipeline.md for pipeline stages, training variants, hyperparameter tuning, and research evaluation.

Directory Structure

bffm-xgb/
├── artifacts/          Pipeline artifacts and static reference data
├── configs/            YAML training configurations (reference + ablation variants)
├── data/               Downloaded and processed data (gitignored)
├── docs/               Documentation (inference, pipeline, research, infrastructure)
├── figures/            Generated publication figures (gitignored, regenerable)
├── infra/              Terraform configs for AWS CPU/GPU instances
├── lib/                Shared Python library
├── models/             Trained model checkpoints (gitignored)
├── notes/              Research notes (auto-generated data sections)
├── output/             Exported ONNX models by variant (reference/, ablation_*/)
├── pipeline/           Numbered pipeline scripts (01 through 13)
├── python/             Python inference package with tests
├── scripts/            Pipeline utilities (research summary, notes, provenance, backup, deployment)
├── templates/          Jinja2 templates for generated outputs
├── tests/              Unit tests for lib/ modules (pytest)
├── typescript/         TypeScript inference package with tests (vitest)
├── web/                React + Hono web assessment app (deployed to HuggingFace Spaces)
├── .env.example        Template for HF_TOKEN (required for `make upload-hf`)
├── .gitignore
├── LICENSE.md          MIT License
├── Makefile            Orchestrates the full pipeline
├── NOTICES.md          Third-party attributions (CC0, IPIP, OSPP)
├── pyproject.toml      pytest configuration (pythonpath, testpaths)
├── requirements.txt    Python dependencies
└── README.md           This file

Documentation

  • Inference Guide — Python/TypeScript usage, code examples, reverse-scoring
  • Pipeline Guide — Full reproduction, pipeline stages, training, research evaluation
  • Research Notes — Model architecture, sparsity augmentation, norms, data, limitations
  • Infrastructure Guide — AWS remote training (CPU/GPU, spot/on-demand)
  • Web App — React + Hono assessment app
  • Model Cards — ONNX model details and provenance
  • NOTES.md — Auto-generated research notes with cross-variant evaluation data

Limitations

  • Norms are derived from self-selected online respondents (OSPP); they may not represent the general population
  • Models are trained on English-language IPIP items only
  • Accuracy degrades with fewer items; 20 items is the recommended minimum for reliable scoring
  • Intended for educational use only

License

MIT. See NOTICES.md for third-party attributions.
