Inspiration

Most people have probably played the Wikipedia link-jumping game at some point: start on one page and try to reach a completely unrelated target just by clicking links. It reveals something surprising: almost everything is connected. That game sparked the idea for TraceBack. We also noticed that history is usually taught linearly, where event A caused event B, which caused event C. But history doesn't actually work that way. It's a branching, interconnected web of ideas, people, and consequences. We wanted to build something that made that web visible and turned the Wikipedia link game into something educational and meaningful for students.

What it does

TraceBack takes two historical topics and finds the hidden chain connecting them, using Wikipedia as its knowledge base. Every Wikipedia page is a node, and every link between pages is an edge. A bidirectional search finds a short chain of links between the two topics, and an AI then narrates each step, explaining how one idea led to the next. Each edge in the path is tagged with a relationship type (causal, temporal, thematic, or geographic), so students can see not just that two ideas connect, but how they connect. The result is an interactive map of how history actually works.
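A tagged path can be represented as a list of small edge records. The following is a minimal sketch that assumes the model is prompted to answer with one of the four relationship labels. `classify` is a hypothetical, synchronous stand-in for the model call (a real inference request would be asynchronous). `normalizeRelationship` and `tagPath` are illustrative names, not TraceBack's actual code.

```javascript
// The four relationship types an edge can carry.
const RELATIONSHIP_TYPES = ["causal", "temporal", "thematic", "geographic"];

// Map free-form model output onto one of the four labels.
// Assumption: the model is asked to answer with a single label; we take the
// first word in its reply that is an allowed label, or null if none appears.
function normalizeRelationship(rawOutput) {
  const words = String(rawOutput).toLowerCase().match(/[a-z]+/g) ?? [];
  return words.find((w) => RELATIONSHIP_TYPES.includes(w)) ?? null;
}

// Turn a path of page titles into labelled edges.
// `classify(from, to)` is a hypothetical stand-in for the model call and
// returns the model's raw text for one pair of connected pages.
function tagPath(path, classify) {
  const edges = [];
  for (let i = 0; i + 1 < path.length; i++) {
    const from = path[i];
    const to = path[i + 1];
    edges.push({ from, to, relationship: normalizeRelationship(classify(from, to)) });
  }
  return edges;
}
```

Validating the label on our side means an off-format reply becomes `null` instead of an unknown tag reaching the frontend.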

How we built it

TraceBack is built on a JavaScript backend with a REST API that handles graph traversal and classification logic. We scrape Wikipedia HTML directly to extract the relevant page links and build the graph in real time. Bidirectional BFS searches outward from both topics at once and stops when the two searches meet, so it typically explores far fewer pages than a one-directional search and finds paths efficiently across thousands of nodes. The frontend gives users a clean, interactive interface for exploring connections and edge types. AI narrative generation runs through a custom inference API, which takes the path nodes as context and produces step-by-step historical explanations. Each link in the path is also passed to the model to classify the relationship between the two connected pages.
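The following is a minimal sketch of bidirectional BFS over a directed link graph. It assumes the graph is already in memory as a `Map` from page title to its outgoing link titles. The live system scrapes links on demand instead. Wikipedia links point one way, so the backward search has to follow incoming links, and the sketch builds a reverse index first. The `maxVisited` budget is a hypothetical cap that makes the search give up instead of running indefinitely on distant topics.

```javascript
// Build an index of incoming links so the backward search can walk edges in reverse.
function buildReverse(graph) {
  const reverse = new Map();
  for (const [from, targets] of graph) {
    for (const to of targets) {
      if (!reverse.has(to)) reverse.set(to, []);
      reverse.get(to).push(from);
    }
  }
  return reverse;
}

// Follow parent pointers from `node` back to the root of one search tree.
function walk(parents, node) {
  const out = [];
  for (let cur = node; cur !== null; cur = parents.get(cur)) out.push(cur);
  return out;
}

// Returns an array of titles from start to goal, or null if no path is found
// before the frontiers run out or the (hypothetical) visit budget is exceeded.
function bidirectionalBfs(graph, start, goal, maxVisited = 100000) {
  if (start === goal) return [start];
  const reverse = buildReverse(graph);
  const fwdParents = new Map([[start, null]]); // title -> previous title toward start
  const bwdParents = new Map([[goal, null]]);  // title -> next title toward goal
  let fwdFrontier = [start];
  let bwdFrontier = [goal];

  while (fwdFrontier.length > 0 && bwdFrontier.length > 0) {
    if (fwdParents.size + bwdParents.size > maxVisited) return null;
    // Expand one full level of whichever frontier is currently smaller.
    const forward = fwdFrontier.length <= bwdFrontier.length;
    const frontier = forward ? fwdFrontier : bwdFrontier;
    const adjacency = forward ? graph : reverse;
    const mine = forward ? fwdParents : bwdParents;
    const other = forward ? bwdParents : fwdParents;
    const next = [];

    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (mine.has(neighbor)) continue;
        mine.set(neighbor, node);
        if (other.has(neighbor)) {
          // The two searches met: stitch start..neighbor to neighbor..goal.
          const left = walk(fwdParents, neighbor).reverse();
          const right = walk(bwdParents, neighbor).slice(1);
          return [...left, ...right];
        }
        next.push(neighbor);
      }
    }
    if (forward) fwdFrontier = next;
    else bwdFrontier = next;
  }
  return null;
}
```

Always growing the smaller frontier is a common way to keep each step cheap when some pages link to thousands of others.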

Challenges we ran into

Wikipedia is enormous, so scraping relevant links from raw HTML was harder than expected. Disambiguation pages, redirect loops, and irrelevant cross-links added significant noise to the graph. Getting the bidirectional BFS to terminate efficiently was another challenge. On the AI side, prompting the model to produce historically grounded narratives rather than generic summaries took many rounds of iteration.
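Much of this noise can be filtered with a few string checks before a link enters the graph. The following is a minimal, regex-based sketch. It assumes article links appear in the scraped HTML as `href="/wiki/Title"`. It also assumes a `redirects` map is available. That map is hypothetical: for example, it could be filled in whenever a fetched page turns out to be a redirect. The namespace list is not exhaustive. The disambiguation check only catches pages whose title ends in `(disambiguation)`.

```javascript
// Namespace prefixes for non-article pages (illustrative, not exhaustive).
const SKIP_PREFIXES = ["File:", "Category:", "Help:", "Special:", "Talk:", "Template:", "Wikipedia:", "Portal:"];

// Heuristic: only catches disambiguation pages titled with this suffix.
function isDisambiguation(title) {
  return /\(disambiguation\)$/i.test(title);
}

// Pull unique article titles out of raw page HTML.
function extractArticleLinks(html) {
  const titles = new Set();
  // Capture the path after /wiki/, stopping before any #fragment or ?query.
  const hrefRe = /href="\/wiki\/([^"#?]+)/g;
  for (const match of html.matchAll(hrefRe)) {
    let title;
    try {
      title = decodeURIComponent(match[1]).replace(/_/g, " ");
    } catch {
      continue; // skip malformed percent-encoding
    }
    if (SKIP_PREFIXES.some((prefix) => title.startsWith(prefix))) continue;
    if (isDisambiguation(title)) continue;
    titles.add(title);
  }
  return [...titles];
}

// Follow a chain of redirects to its final title; return null on a loop.
function resolveRedirect(title, redirects) {
  const seen = new Set();
  let current = title;
  while (redirects.has(current)) {
    if (seen.has(current)) return null;
    seen.add(current);
    current = redirects.get(current);
  }
  return current;
}
```

Returning `null` for a redirect loop lets the search drop that edge instead of revisiting the same pages forever.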

Accomplishments that we're proud of

We're most proud of the dynamic node graph visualization. Seeing a path from the Black Death to the Renaissance, and learning that the first step is causal while the last is thematic, changes how you read the connection. The node physics also makes the graph fun to play around with.

What we learned

We learned that history is full of connections that feel obvious in hindsight but are actually surprising when you trace them step by step.

What's next for TraceBack

The immediate priority is a more efficient search algorithm, since BFS across Wikipedia can get slow for distant topics. Beyond that, we want to grow the knowledge graph past Wikipedia to the broader web, pulling from historical databases and encyclopedias to find more accurate connections. We also want to make the experience far more interactive by adding a real-time chatbot layer, so students can ask follow-up questions about any node or connection and have a conversation with the history they just uncovered. The goal is to turn TraceBack from a search tool into a full learning companion.
