Inspiration

Ever since ChatGPT burst onto the scene, conversations have revolved around the idea that AI could eventually replace various professions. However, one crucial point is often overlooked: AI still lacks the holistic intelligence and grounded understanding that humans bring to the table. While models like GPT excel at standardized tests such as the MCAT, those tests measure only a fraction of human intelligence and reasoning. This disconnect underscored the need for tools that don’t just generate responses but truly bridge the gap between AI’s raw computational power and the depth of human understanding.

As model parameter counts skyrocket, we’ve noticed that quality improvements are only marginal—sometimes as little as a 2% increase in performance. This diminishing return made us realize that throwing more parameters at a problem isn’t the solution. Instead, we need systems that can explain their reasoning, align more closely with human thought processes, and offer ethical transparency. Whitebox was born from this insight: an AI that’s not just smarter, but more explainable, responsible, and ultimately more useful in the real world.

What It Does

Whitebox bridges the gap between AI and human reasoning by transforming credible texts, such as medical textbooks, into structured graphs. This transformation not only enhances the AI’s ability to understand and reason through complex material but also provides a level of explainability that traditional AI models often lack. By breaking the content down into interconnected nodes, Whitebox enables users to trace the AI’s thought process step by step, unlocking ethical applications in fields where transparency is critical.

In high-stakes environments like healthcare, law, and finance, the black-box nature of traditional AI models creates trust and ethical barriers. Whitebox addresses this by making the AI’s decision-making process fully visible and understandable, allowing users to see exactly how conclusions were reached. This transparency ensures that the AI can be confidently applied to areas where explainability and ethics are non-negotiable. By making information more accessible, actionable, and ethical, Whitebox is unlocking AI’s potential to transform industries that demand responsible, transparent technology.

How we built it

We developed Whitebox through a multi-layered approach designed to handle complex medical literature and extract meaningful insights. At the core of the system, we employed summary generation and title classification techniques, powered by a fine-tuned LLaMA 3.1:7B-uncensored model. This allowed us to efficiently process the monumental "Davidson's Principles and Practice of Medicine" and convert it into a highly structured graph database in Neo4j, consisting of approximately 130,000 relationships and 60,000 nodes.
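The graph-building step above can be sketched roughly as follows. The `Chapter`/`Section` labels and the `HAS_SECTION` relationship are illustrative placeholders, not the actual Whitebox schema, which the LLaMA summarization and title-classification passes produce in richer form.

```python
# Sketch: each (chapter, section, summary) triple from the summarization
# pass becomes a pair of nodes plus one relationship in Neo4j.
# Labels and property names here are illustrative assumptions.

def to_cypher(chapter: str, section: str, summary: str) -> list[str]:
    """Emit MERGE statements for one textbook section."""
    return [
        f'MERGE (c:Chapter {{title: "{chapter}"}})',
        f'MERGE (s:Section {{title: "{section}", summary: "{summary}"}})',
        'MERGE (c)-[:HAS_SECTION]->(s)',
    ]

stmts = to_cypher("Cardiology", "Heart Failure", "Impaired cardiac output.")
```

In a real pipeline you would pass `chapter`/`section`/`summary` as query parameters through the Neo4j driver rather than interpolating strings, both for safety and so repeated MERGEs can be batched.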

To enable rapid and intelligent retrieval of information, we integrated MiniLM weights, which allowed us to run optimized Cypher queries against the graph. This provided a seamless way to extract key data points. For deeper analysis and processing, the retrieved data was then fed into Phi 3:3B-instruct, a smaller yet highly effective model, which enabled us to maintain robust reasoning and explanation capabilities without overwhelming computational resources.
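The retrieval layer boils down to ranking graph nodes by similarity to the user's query. In production that similarity comes from MiniLM sentence embeddings; in this self-contained sketch a toy bag-of-words vector stands in, but the ranking logic is the same shape.

```python
# Sketch of the retrieval step: rank node texts by cosine similarity to a
# query. A Counter-based bag-of-words vector substitutes for MiniLM
# embeddings so the example runs without any model weights.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

nodes = {
    "n1": "heart failure reduced ejection fraction",
    "n2": "renal tubular acidosis",
}
query = embed("ejection fraction in heart failure")
best = max(nodes, key=lambda n: cosine(query, embed(nodes[n])))
```

The top-ranked node id is then used to anchor a Cypher query that pulls the node and its neighborhood out of the graph for the downstream Phi 3 analysis step.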

By combining these layers—data generation, graph structuring, retrieval, and in-depth analysis—we built a powerful yet efficient AI platform capable of tackling massive datasets with precision, all while maintaining explainability and transparency.

Challenges we ran into

During development, we faced a multitude of challenges, many of which significantly impacted our workflow and timeline. One of the most frustrating setbacks involved having to recreate the database twice—an endeavor that took around 30 hours in total. The database cleared itself unexpectedly (apparently a known bug with the Aura instance), forcing us to rebuild everything from scratch. Additionally, we encountered an issue where our Python environment became completely bricked at the root level, leading to a time-consuming restoration process that drained valuable development hours.

On top of these hurdles, we experienced severe internet connectivity problems that lasted for about 30 hours over the weekend, during what should have been our peak productivity time. To mitigate the downtime, we had to physically transport our main server from our base to the 1819 Innovation Hub at 11 p.m., where we set up shop and worked overnight in one of the conference rooms just to maintain our development pace.

Another major bottleneck was working with Neo4j, specifically in visualizing the graph database. Many of the graph visualization libraries compatible with Neo4j are either overly complex or poorly documented. The best solution we found was G6-AntV, but its documentation was primarily in Chinese, with only a rough English translation available. To work around this, we set up a Flask instance to process the data and send it to the frontend via APIs, allowing us to display the graph properly.
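Our Flask workaround looked roughly like this. The route path and the hard-coded rows are illustrative stand-ins for the real Cypher results; the `nodes`/`edges` shape with `id`, `source`, and `target` fields follows G6's basic data format.

```python
# Sketch of the Flask bridge: reshape Neo4j-style triples into the JSON
# that G6-AntV expects, and serve it to the frontend over an API route.
from flask import Flask, jsonify

app = Flask(__name__)

def to_g6(rows):
    """Convert (source, relationship, target) triples into G6's
    nodes/edges JSON shape."""
    names = {n for s, _, t in rows for n in (s, t)}
    return {
        "nodes": [{"id": n, "label": n} for n in sorted(names)],
        "edges": [{"source": s, "target": t, "label": r} for s, r, t in rows],
    }

@app.route("/graph")
def graph():
    # In the real app these rows come from a Cypher query against Neo4j.
    rows = [("Sepsis", "TREATED_WITH", "Antibiotics")]
    return jsonify(to_g6(rows))
```

Keeping the reshaping in a pure function like `to_g6` meant we could test it without a running database, and swap the hard-coded rows for live query results later.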

Moreover, Neo4j Aura—the free online instance we used—presented its own set of complications. Recently, it developed an issue where the dump file would not dockerize without an enterprise (paid) version. To resolve this, we had to pull the dump file from the Aura instance and convert it from a BLOCK file into a usable format using the open-source version of Neo4j, adding another layer of complexity to the setup.

Setting up the actual database to efficiently explain the thought process of the AI, allow for concept traversal, and perform legitimate stack traces took days of trial and error. The fine-tuning required to make the database both efficient and interpretable by the AI was a delicate balancing act, often leading to hours of testing and tweaking.

Finally, on Sunday night, while attempting to collaborate remotely, GitHub went down, causing significant delays in merging our changes. This outage led to a night full of frustrating merge conflicts, which we had to resolve once GitHub services were restored. Despite these setbacks, the team’s resilience and ability to overcome each of these issues played a crucial role in moving the project forward.

Accomplishments That We're Proud Of

Despite the numerous challenges we faced, we successfully developed the database, frontend, and backend components of Whitebox. While we encountered difficulties with dockerizing the entire system, we managed to create a functional setup where all components work as intended.

This hackathon was particularly significant because it was the first time we approached development the "right way"—without relying on shortcuts or temporary fixes. We couldn’t be more proud of delivering our initial project goals within a tight 10-day timeframe, all while balancing the demands of the first week of school.

Although the system is not yet a full-fledged, monetizable stack, we believe that we captured the essence of looking toward the "future of data." As large language models (LLMs) begin to hit their ceiling—where adding more parameters results in diminishing returns—we see the future in smaller, more specialized assistants. These assistants leverage real data rather than making educated guesses, facilitating more meaningful and productive conversations.

We believe that our solution addresses this emerging challenge with LLMs. For any graph node exceeding a certain weight, we propose deploying a hyper-specialized agent. This agent feeds into a general "hollow model"—one that knows English but lacks domain-specific knowledge—creating a system where specialized data drives real conversation. This approach bridges the gap between traditional LLMs and the need for specialized, data-driven dialogue systems.
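As a concrete illustration of that routing idea (the threshold value and agent names below are hypothetical, not part of the current system):

```python
# Hypothetical routing for the "hollow model" architecture: a query whose
# best-matching graph node is heavy enough goes to a topic specialist;
# otherwise the general model answers on its own.
WEIGHT_THRESHOLD = 0.8  # illustrative cutoff

def route(best_node: dict) -> str:
    """Pick which agent handles a query, given its best-matching node."""
    if best_node["weight"] >= WEIGHT_THRESHOLD:
        return f"specialist:{best_node['topic']}"
    return "general"
```

The key design point is that specialization lives in the graph (node weights), not in the model, so new specialists can be added without retraining anything.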

What We Learned

Throughout the project, we gained invaluable insights into Neo4j and graph databases, as well as a deeper understanding of AI architecture. Working through the complexities of building a functional system from the ground up—while facing the various challenges we encountered—helped us sharpen our skills in database management, AI integration, and backend/frontend coordination.

We learned firsthand how crucial it is to approach development with proper planning and structure. This hackathon taught us the importance of doing things "the right way," and not taking shortcuts. By doing so, we gained a stronger grasp on the importance of scalable, maintainable code.

On the AI side, we explored the limits of large language models and recognized the growing necessity for specialized, data-driven assistants. We now have a deeper appreciation for how to balance the need for specialized agents within an AI system, ensuring they complement general models effectively.

Biggest Takeaway: Do not use Neo4j if you actually want to create an interactive visualization of your database. Holy!!! The frustration we experienced trying to make Neo4j work with visualizations was a major eye-opener. The lack of intuitive tools and clear documentation made it one of the toughest parts of the project. While Neo4j is powerful for database management, its visualization capabilities left much to be desired.

What's Next for Whitebox

The name Whitebox is no coincidence—it reflects our mission to tackle one of the most critical issues in AI today: explainability. Traditional AI models are often referred to as "black boxes" due to their lack of transparency. This prevents them from being used in high-stakes fields like healthcare, law, and finance, where ethical concerns and the inability to justify decisions are major roadblocks.

With Whitebox, we aim to crack open that black box. Our vision is to provide a transparent AI system that not only generates answers but also offers a clear stack trace of how those outputs were derived. By making the decision-making process fully visible, we can bridge the gap between AI's incredible potential and the trust required for its adoption in critical domains.

One of our next steps is to deploy specialized RAG (Retrieval-Augmented Generation) agents on nodes with higher weights in our graph database. This will allow Whitebox to deliver detailed, explainable answers backed by data, ensuring that users understand not just what the AI says but how it arrived at those conclusions.
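A minimal sketch of that planned step, assuming nodes carry a `weight` property and a short `summary` (the node label, property names, and threshold are all assumptions for illustration):

```python
# Planned per-node RAG step (sketch): select high-weight nodes, then fold
# their summaries into the prompt so the answer is grounded in retrieved
# data rather than the model's guesses.
HIGH_WEIGHT_QUERY = """
MATCH (n:Concept)
WHERE n.weight >= $threshold
RETURN n.title AS title, n.summary AS summary
"""

def build_prompt(question: str, records: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved node summaries."""
    context = "\n".join(f"- {r['title']}: {r['summary']}" for r in records)
    return f"Answer using only this context:\n{context}\n\nQ: {question}"
```

Because every line of context maps back to a named graph node, the answer comes with its own provenance: the stack trace of nodes the agent consulted.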

Our goal is to build an AI platform that is both powerful and trustworthy—one that can be used confidently in areas where AI’s impact could be transformative, but is currently constrained by ethical concerns and a lack of transparency. Whitebox represents a crucial step forward, making AI not only more intelligent but also more accountable.

And we’re doing this without relying on massive models with 10 trillion parameters. Instead, we focus on efficiency, using targeted, specialized agents to deliver explainable, high-quality results. By emphasizing smart architecture over sheer scale, Whitebox proves that true innovation doesn't come from throwing more parameters at a problem, but from creating systems that are transparent, data-driven, and designed for trust.

We want to end with a special thanks to everyone. The energy was so high, and the people around us were incredibly supportive (cheers @wellnesssbritt lol! we were going to pass out from sleep deprivation before you pumped us up). Another thanks to Byron, who also got us hyped up at the very end!!!!!
