OverSite is an online assessment (OA) platform for technical hiring—similar to CodeSignal or HackerRank—with a key difference: candidates can use AI during the assessment, and the platform reports on how they use it. Recruiters get both correctness and behavioral insights: how candidates prompt the AI, review suggestions, and verify their work.
Candidates code in a browser-based IDE with an embedded AI assistant. Sessions are scored for behavioral signals (e.g. over-reliant vs. strategic AI use) and summarized in recruiter-facing reports. OverSite is built for the reality that AI-assisted coding is the norm—assess and hire for how people work with AI, not in spite of it.
- IDE & session environment: VS Code–like workspace (Monaco editor, file explorer, terminal) with phased flows (Orientation → Implementation → Verification). Candidates open a problem, edit code, run tests, and use an in-panel AI chat—all in the browser.
- Real-time telemetry: Editor events (edits, deltas), execute events (terminal runs, exit codes), panel focus, and AI suggestion accept/reject/modify decisions are logged with timestamps. Cursor-style diff UI captures how candidates use AI suggestions before committing.
- Behavioral scoring: A multi-model pipeline that evaluates candidate behavior. It uses an XGBoost classifier for structural signals (e.g., prompt frequency) and reports telemetry like acceptance rates and deliberation time. A Gemini-based judge provides a session summary and highlights key behavioral evidence.
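The telemetry metrics above can be derived from the logged accept/reject/modify decisions on AI suggestions. This is a minimal sketch under stated assumptions. The `SuggestionEvent` record and its fields are hypothetical, not OverSite's schema. The metric definitions are also assumed: acceptance rate is accepted/total decisions, edit rate is modified/total, and deliberation time is the mean gap between a suggestion appearing and the candidate acting on it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Literal


# Hypothetical event record; OverSite's actual telemetry schema may differ.
@dataclass
class SuggestionEvent:
    shown_at: float    # seconds into the session when the AI suggestion appeared
    decided_at: float  # seconds into the session when the candidate acted on it
    decision: Literal["accept", "reject", "modify"]


def suggestion_metrics(events: list[SuggestionEvent]) -> dict[str, float]:
    """Assumed definitions: rates are fractions of all decisions; deliberation is the mean decision delay."""
    if not events:
        return {"acceptance_rate": 0.0, "edit_rate": 0.0, "mean_deliberation_s": 0.0}
    total = len(events)
    accepted = sum(e.decision == "accept" for e in events)
    modified = sum(e.decision == "modify" for e in events)
    return {
        "acceptance_rate": accepted / total,
        "edit_rate": modified / total,
        "mean_deliberation_s": mean(e.decided_at - e.shown_at for e in events),
    }


if __name__ == "__main__":
    log = [
        SuggestionEvent(10.0, 12.0, "accept"),
        SuggestionEvent(30.0, 45.0, "modify"),
        SuggestionEvent(60.0, 61.0, "reject"),
        SuggestionEvent(90.0, 98.0, "accept"),
    ]
    print(suggestion_metrics(log))
    # {'acceptance_rate': 0.5, 'edit_rate': 0.25, 'mean_deliberation_s': 6.5}
```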
OverSite evaluates candidates across three primary dimensions using a multi-model scoring system:
- Behavioral Patterns: An XGBoost classifier trained on the Microsoft CUPS dataset.
  - Classifier Path: Uses a subset of three structural features (e.g., prompt frequencies and iteration depth).
  - Evidence Path: Passes telemetry (acceptance rate, deliberation time, edit rate) and metrics such as `deliberation_to_action_ratio` to the Gemini judge.
  - Calibration: Uses Platt scaling so that output probabilities are reliable for final scoring.
- Prompt Quality: An XGBoost model trained on the AI2 WildChat dataset. It evaluates prompts based on engineering specificity, distinguishing between skilled prompting and simple copy-pasting.
- Critical Review: A heuristic engine that measures the Levenshtein distance between AI-proposed code and the final submission, indicating how much a candidate modified AI suggestions rather than submitting them verbatim.
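The critical-review signal can be illustrated with a standard dynamic-programming Levenshtein distance. Normalizing it by the longer string gives 0.0 when the AI code was submitted unchanged and 1.0 when it was completely rewritten. This is a sketch under assumptions. `modification_ratio` and its normalization are hypothetical, and the real engine may compare characters, tokens, or lines, or use a library implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        previous = current
    return previous[-1]


def modification_ratio(ai_code: str, final_code: str) -> float:
    """Hypothetical normalized score: 0.0 = AI code submitted verbatim, 1.0 = completely rewritten."""
    longest = max(len(ai_code), len(final_code))
    return levenshtein(ai_code, final_code) / longest if longest else 0.0


if __name__ == "__main__":
    suggestion = "def add(a, b):\n    return a + b\n"
    submitted = "def add(a: int, b: int) -> int:\n    return a + b\n"
    print(levenshtein("kitten", "sitting"))  # 3
    print(round(modification_ratio(suggestion, submitted), 3))
```

Edit distance only captures how much code changed. A candidate who reviews a correct suggestion and keeps it unchanged scores 0.0, so this signal is most informative alongside execution and deliberation telemetry.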
- Input / Ingestion: Candidates start a session, and the frontend streams editor events, execution results, panel focus, and AI chat interactions.
- Processing / Validation: The backend stores events in SQLite. At session end, the scoring engine extracts features, applies the XGBoost models, and runs the rule-based logic. A Gemini call generates the session summary.
- Execution / State Update: Session data, files, and `session_scores` (results, summary, feature importance) are persisted. Admins can trigger re-scoring if needed.
- Output / Response: Recruiters view session scores and the Gemini-generated summary in the admin dashboard. The candidate-facing UI manages the active IDE session.
- Fallback mode: If ML models are unavailable, setting `SCORING_FALLBACK_MODE=true` switches scoring to rule-based behavioral patterns.
- Session lifecycle: Assessments have fixed time limits. Scoring runs at session end; only the Gemini-based summary runs asynchronously.
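The ingestion step can be sketched with Python's built-in `sqlite3` module: the backend appends each streamed event to a table, then reads one session's events back in time order at session end for feature extraction. The `events` table layout, the event `kind` values, and the helper names are hypothetical; OverSite's actual schema may differ.

```python
import json
import sqlite3
import time
from typing import Any, Optional

# Hypothetical table layout, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    ts REAL NOT NULL,
    kind TEXT NOT NULL,     -- e.g. 'edit', 'execute', 'panel_focus', 'ai_decision'
    payload TEXT NOT NULL   -- JSON-encoded event details
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_event(conn: sqlite3.Connection, session_id: str, kind: str,
              payload: dict[str, Any], ts: Optional[float] = None) -> None:
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO events (session_id, ts, kind, payload) VALUES (?, ?, ?, ?)",
            (session_id, time.time() if ts is None else ts, kind, json.dumps(payload)),
        )


def session_events(conn: sqlite3.Connection,
                   session_id: str) -> list[tuple[float, str, dict[str, Any]]]:
    """Return one session's events in timestamp order, ready for feature extraction."""
    rows = conn.execute(
        "SELECT ts, kind, payload FROM events WHERE session_id = ? ORDER BY ts, id",
        (session_id,),
    )
    return [(ts, kind, json.loads(payload)) for ts, kind, payload in rows]


if __name__ == "__main__":
    conn = open_store()
    log_event(conn, "s1", "execute", {"command": "pytest", "exit_code": 1}, ts=5.0)
    log_event(conn, "s1", "edit", {"file": "main.py", "delta": "+3 lines"}, ts=2.0)
    for event in session_events(conn, "s1"):
        print(event)
```

Parameterized `?` placeholders keep candidate-supplied text (file names, terminal output) out of the SQL string itself.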
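Here is a minimal sketch of how a `SCORING_FALLBACK_MODE` switch could route scoring. It assumes the flag is read from the environment and that fallback also applies when no model is loaded. Only the flag name and the `over_reliant`/`strategic` labels come from this README. The rule, its thresholds, and the `score_session`/`rule_based_label` helpers are hypothetical.

```python
import os
from typing import Callable, Optional

Features = dict[str, float]


def fallback_enabled() -> bool:
    """True when SCORING_FALLBACK_MODE is set to 'true' (case-insensitive)."""
    return os.environ.get("SCORING_FALLBACK_MODE", "").strip().lower() == "true"


def rule_based_label(features: Features) -> str:
    # Hypothetical rule and thresholds, for illustration only.
    if features["acceptance_rate"] >= 0.8 and features["mean_deliberation_s"] < 3.0:
        return "over_reliant"
    return "strategic"


def score_session(features: Features,
                  model: Optional[Callable[[Features], str]] = None) -> str:
    """Use the ML model when one is loaded; otherwise, or when fallback is forced, use rules."""
    if model is None or fallback_enabled():
        return rule_based_label(features)
    return model(features)


if __name__ == "__main__":
    os.environ["SCORING_FALLBACK_MODE"] = "true"
    print(score_session({"acceptance_rate": 0.9, "mean_deliberation_s": 1.2}))  # over_reliant
```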
- Technical hiring & OAs: Run take-home or timed coding assessments with an embedded AI assistant; evaluate both correctness and how candidates use AI (decomposition, prompting, verification, critical review).
- Behavioral differentiation: Produce consistent, defensible behavioral labels and narratives (e.g. over_reliant vs. strategic) from real session data instead of post-hoc interviewer judgment.
- Recruiter dashboards: Admin view for session lists, detailed scores, and Gemini-generated summaries to support hiring.
- SETUP.md: Installation, environment configuration, and startup instructions.
- ARCHITECTURE.md: System design, data flow, glossary, and design decisions.
- API.md: REST (JSON) API and session/auth reference.
- TESTING.md: Testing guidelines.
- STYLE.md: Coding standards, testing guidelines, and repository conventions.
See the LICENSE file for details.