Inspiration
Technical interviews have changed. Companies no longer just ask candidates to solve algorithm puzzles; now, they give real codebases and expect effective use of AI tools. This shift creates a new challenge: if everyone uses AI, how do you distinguish great engineers from those who just copy-paste? OverSite answers this by observing not only what candidates produce, but how they get there—their prompts, accepted suggestions, code changes, and verification steps. In 2026, the best engineers aren’t the ones who avoid AI, but those who know how to work with it.
What it does
OverSite is an AI-assisted interview evaluation platform supporting every stage of hiring. Companies assign realistic coding tasks and track AI usage in a controlled environment; candidates complete problems in a web IDE with built-in tools; and admins monitor progress, review scores, and analyze AI collaboration quality. OverSite measures prompt clarity, iteration, validation, collaboration, and code quality, shifting the focus from “Can you solve this alone?” to “Can you build effectively with AI?”
How we built it
During each interview, we capture the candidate’s “work trail”: their prompts and the AI’s responses, what they changed in the editor, when they ran code or tests, and how their solution evolved over time. We log those actions as a clean event stream and roll them up into signals our rubric can score, such as how often they verify, how much they iterate, and whether they critically review AI suggestions rather than blindly accept them.
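The rollup step above can be sketched as follows; the event names, fields, and signal definitions here are illustrative assumptions for the example, not our actual schema:

```python
from collections import Counter

# Hypothetical event types: "prompt", "ai_suggestion", "accept",
# "edit", "run_tests" -- illustrative, not the production schema.
def rollup_signals(events):
    """Aggregate a session's raw event stream into rubric-ready signals."""
    counts = Counter(e["type"] for e in events)
    suggestions = counts["ai_suggestion"]
    return {
        "prompt_count": counts["prompt"],
        "test_runs": counts["run_tests"],
        # Fraction of AI suggestions the candidate accepted.
        "accept_rate": counts["accept"] / suggestions if suggestions else 0.0,
        # Edits made per accepted suggestion hint at critical review
        # rather than blind acceptance.
        "iteration_ratio": counts["edit"] / max(counts["accept"], 1),
    }

session = [
    {"type": "prompt"}, {"type": "ai_suggestion"}, {"type": "accept"},
    {"type": "edit"}, {"type": "edit"}, {"type": "run_tests"},
]
print(rollup_signals(session))
```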
At session end, candidate behavior is scored on 15 behavioral signals (such as AI suggestion acceptance, deliberation time, and verification) together with prompt quality, and the session is labeled as over-reliant, balanced, or strategic. An LLM judge produces a narrative summary. We trained the model using real-world Copilot telemetry, WildChat prompts, and SWE-bench Lite sessions labeled for judgment quality, ensuring the system rewards thoughtful AI use over speed.
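The labeling step can be sketched as a weighted score over normalized signals mapped to thresholds; the three signals, weights, and cutoffs below are illustrative assumptions, not our production rubric:

```python
def label_session(signals, weights=None):
    """Score normalized signals (each in [0, 1]) and map to a usage label.

    Weights and cutoffs are illustrative, not the production rubric.
    """
    weights = weights or {"verification": 0.4, "iteration": 0.3, "critique": 0.3}
    score = sum(w * signals.get(name, 0.0) for name, w in weights.items())
    if score < 0.35:
        return "over-reliant", score
    if score < 0.7:
        return "balanced", score
    return "strategic", score

label, score = label_session(
    {"verification": 0.9, "iteration": 0.8, "critique": 0.7}
)
print(label, round(score, 2))
```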
Challenges we ran into
Defining “good usage” was our first major hurdle: while measuring correctness is simple, measuring how someone uses AI is not, so we formalized what effective collaboration looks like, including prompt quality, validation practices, and critical evaluation of suggestions. Balancing fairness and AI assistance required careful prompt engineering and behavioral constraints to find the right guardrails. Building a realistic, multi-file web IDE that stayed lightweight, responsive, and secure—especially for the accept/reject workflow with code diff highlighting—was technically demanding. Scoring explainability was the final challenge, as we wanted more than a black-box score; the system needed to justify evaluations with clear reasoning and concrete examples from the candidate’s actual chat history and interactions.
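The accept/reject workflow with diff highlighting mentioned above can be illustrated with Python's standard difflib (a minimal sketch of computing reviewable hunks; our IDE's actual implementation differs):

```python
import difflib

def suggestion_diff(original: str, suggested: str):
    """Return unified-diff lines a reviewer can accept or reject."""
    return list(difflib.unified_diff(
        original.splitlines(), suggested.splitlines(),
        fromfile="current", tofile="ai_suggestion", lineterm="",
    ))

before = "def add(a, b):\n    return a - b\n"
after = "def add(a, b):\n    return a + b\n"
for line in suggestion_diff(before, after):
    print(line)
```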
Accomplishments that we're proud of
We are particularly proud of building something that genuinely reflects how engineering works today. We use real-world behavioral data to tackle a problem that hiring teams are actively facing. Successfully reframing AI from a “cheating tool” into an evaluable, measurable skill is the core achievement we’re most excited about.
We built a complete interview flow for both candidates and admins, end-to-end. We created a structured rubric for AI collaboration quality, designed a clean and intuitive IDE layout that feels familiar to working engineers, and developed explainable scoring with narrative feedback rather than opaque metrics.
What we learned
Developing OverSite reinforced that AI fluency is not binary. It exists on a spectrum, and people fall at very different points on it. The strongest candidates don’t just copy AI output; they guide it, question it, and refine it. How someone approaches a problem often tells you more than the final answer they submit.
We also learned that when scoring is transparent and explainable, it builds trust on both sides of the hiring process. More broadly, this project made clear that the future of technical interviews will thoughtfully incorporate AI assistance. It’s important to stay ahead of that shift rather than pretend it isn’t happening.
What's next for OverSite
Looking ahead, we plan to expand OverSite with more advanced usage analytics, including prompt embeddings and behavior clustering. We will build role-specific evaluation rubrics for frontend, backend, data science, and other specializations. A session replay feature will let reviewers watch how candidates worked in real time. We also hope to gather more targeted training data from real user sessions.