Inspiration

When doing research, whether for a thesis, a seminar, or any other project, one of the first things a person does is go to Google Scholar, search for publications, read their titles and abstracts, and decide whether they are relevant. More often than not, even with filters set up, the search returns thousands of publications, and screening them all is a long and tiring task. It would be beneficial to have a better way to do this.

What it does

Since there can be thousands of publications, we can't just scrape them all and paste them into an LLM's context; we need a different approach. We built TreeLit, a system that, given some arXiv filters, loads thousands of papers and builds a tree. Each leaf is one publication, and each inner node can be seen as a cluster centroid whose text is a summary of its children. Inner nodes (clusters) are generated using k-means clustering at each tree level. This tree is returned to the user, who can then explore their research topic in a more structured way, with sub-topics separated into sub-trees. For more visually oriented people, the system also generates a scatter plot of all publications, with the ability to select different levels of the tree and see the clustering process. The user sees the results in a nice UI built with Flutter, which lets them explore their topic on the Web as well as on iOS and Android.
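The tree construction can be sketched roughly like this (a minimal sketch using scikit-learn; function names and parameters are illustrative, and the LLM summarization of each inner node is left out):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(embeddings, paper_ids, k=3, min_leaf=5, depth=0):
    """Recursively cluster paper embeddings into a tree.

    Leaves hold individual papers; in the real system each inner node
    would also get an LLM-generated summary of its children.
    """
    if len(paper_ids) <= min_leaf or depth > 5:
        return {"type": "leaf_group", "papers": list(paper_ids)}
    km = KMeans(n_clusters=min(k, len(paper_ids)), n_init=10, random_state=0)
    labels = km.fit_predict(embeddings)
    children = []
    for c in range(km.n_clusters):
        mask = labels == c
        children.append(build_tree(embeddings[mask],
                                   [p for p, m in zip(paper_ids, mask) if m],
                                   k=k, min_leaf=min_leaf, depth=depth + 1))
    return {"type": "cluster", "children": children}
```

The depth cap is just a guard against degenerate data; in practice the recursion stops once clusters shrink below `min_leaf` papers.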

After the tree has been built, the user gets a chat where they can ask questions (using both text and speech) about the whole tree. This is fast because the system doesn't have to retrieve all relevant papers; it only retrieves summaries from various levels of the tree and generates the response from those. For each answer the AI agent returns, it also returns the list of summaries it used to generate the response, making the system more trustworthy.

How we built it

  • For the UI/UX, we used Flutter, a multi-platform framework. This made building for three platforms (Web, iOS, and Android) at the same time straightforward.
  • For the backend functionality, we created a dockerized FastAPI server running on our local machine, exposed to the public internet using ngrok. This structure made the separation of concerns very clear.

Frontend:

  • When the user opens the app, they type in their topic and the desired date range. When they submit the topic, the KeyBERT model in the backend expands the user query with additional keywords, which are returned to the user for validation together with the arXiv categories. The user can also see how many publications the given filters would return.
  • After the user confirms the filter, the tree-building process starts. Its progress is streamed to the frontend so the user knows which step it is currently on.
  • After the tree has been built, it is returned to the frontend and displayed in the UI, together with the scatter plot of all clustered publications. The user can navigate the tree in a UX-friendly way, seeing the summary of each inner node and the title and link of each publication. They can also hover over the scatter plot to see each publication (and even click through to its link).
  • The chat interface is also shown, allowing the user to ask questions either by typing or by sending a voice message, which is transcribed using a small English VOSK model.
  • When the AI agent returns a response, the most relevant summaries are shown, and the user can look them up in the tree using the search field to validate the agent's response.

Tree building:

  • For retrieving publications from arXiv, we used the OAI-PMH protocol, as recommended by the arXiv API documentation.
  • Embeddings of the title + abstract of each publication, and of each summary, are generated using the MiniLM-L6-v2 model and stored in a local Chroma database for quick later search.
  • Summaries were initially generated using the PRIMERA model, but since we were running it on our laptop, it was very slow. We then switched to GPT-4o-mini, which produced better results while being faster.
  • Clustering at each tree level is done with k-means. The hyperparameter k for each level is determined heuristically and should be improved in the future (or replaced with a hierarchical clustering method).
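The writeup doesn't pin down the per-level heuristic for k. One common rule of thumb, shown here purely as an example (not necessarily what TreeLit uses), is k ≈ √n clamped to a sensible range:

```python
import math

def choose_k(n_items, k_min=2, k_max=12):
    """Rule-of-thumb number of clusters for one tree level.

    Generic heuristic (k ~ sqrt(n)); the actual TreeLit heuristic
    is not specified in the writeup.
    """
    if n_items <= k_min:
        return max(1, n_items)
    return max(k_min, min(k_max, round(math.sqrt(n_items))))
```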

Runtime:

  • Each query is embedded using MiniLM-L6-v2, and cosine similarity is computed between the query embedding and all inner-node summary embeddings. The top candidates from each level are taken and given as context to GPT-4o-mini.
  • The GPT result, together with the summaries it was given, is returned to the frontend and shown to the user.
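The retrieval step above boils down to cosine similarity between the query embedding and the stored summary embeddings. Chroma handles this in the real system; a numpy sketch of the same idea (names are illustrative):

```python
import numpy as np

def top_summaries(query_emb, summary_embs, summaries, top_k=3):
    """Return the top_k summaries most similar to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    s = summary_embs / np.linalg.norm(summary_embs, axis=1, keepdims=True)
    sims = s @ q  # cosine similarity, since both sides are unit-normalized
    best = np.argsort(sims)[::-1][:top_k]
    return [(summaries[i], float(sims[i])) for i in best]
```

In TreeLit this runs per tree level, and the winning summaries are concatenated into the GPT-4o-mini prompt alongside the user's question.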

Challenges we ran into

  • Clustering method: k-means is easy and fast, but it is difficult to determine the hyperparameter k automatically for an arbitrary number of papers without inspecting them. Tuning is possible, but it slows down the system.
  • GPU + Docker: we had many issues exposing our GPU to the Docker container. We managed to make it work, but even then the PRIMERA (https://huggingface.co/allenai/PRIMERA) model was very slow, so it was essentially a waste of time.
  • The tree-building process is very slow: around 5 minutes for ~200 papers, and since we pay for GPT compute, we didn't want to go beyond that during development. We also cached the responses so they are not recomputed over and over.
  • Sharing data between the chat view and the tree view, and making a reference clicked in the chat jump to its location in the tree.
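The k-selection problem above has standard (if slower) remedies, such as picking k by silhouette score. A scikit-learn sketch under illustrative assumptions (the candidate range and toy data are made up here):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def pick_k(X, k_candidates=range(2, 8)):
    """Pick k by maximizing the silhouette score over a candidate range."""
    scores = {}
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)

# Toy data with 3 well-separated clusters; pick_k should recover k=3.
X, _ = make_blobs(n_samples=120, centers=3, cluster_std=0.5, random_state=42)
```

The trade-off is exactly the one mentioned above: each candidate k requires a full k-means fit, so this multiplies the clustering cost per tree level.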

Accomplishments that we're proud of

  • Made a system that works from beginning to end
  • Communication within the team was excellent
  • Having a clear system with clear separation of concerns
  • Putting some focus on UI/UX and not just the agent architecture; it really feels like an app we would use (Savo will, for his thesis :))
  • Attempting to use low-compute models and not just using GPT-5 for everything.

What we learned

  • Connecting Python-based AI with Flutter: we had never used this tech stack for AI agents, so it was quite interesting and useful
  • Spending some time at the beginning on requirements really saved us a lot of time later on team communication (the last hackathon was a bit messier)
  • How to expose a GPU to a Docker container
  • Found new models we hadn't seen before
  • New Flutter widgets we hadn't used before: for the scatter plot, chat, and tree view

What's next for TreeLit

  • Explore different clustering methods and how to fine-tune them
  • Improve the tree building execution speed (parallelize calls to the transformer)
  • User-specific databases (login and registration): currently there is only a test user
  • When the tree is built, some statements from across the papers (with references) could be returned so the user has something to start exploring from
  • Pull more data into an existing tree, and remove publications (or sub-trees) from it
  • There are also hyperparameters such as summary length: longer summaries mean slower tree building and slower runtime, but more detailed answers. We should explore this area and find a good balance
