There is A LOT of research out there, and finding the work you want (or need) to read is harder than ever.
Ariadne is a graph neural network (GNN) based embedding system to assist with literature search. Our demo shows one use case of this system: a paper recommendation website that dynamically adjusts to the user's interests.
How we built it
Tech Stack:
- React
- Vite
- Auth0
- FastAPI
- Hugging Face Transformers
- PyTorch + Geometric
We use the OGBN-arXiv dataset, which provides a large graph with papers as nodes and citations as edges, along with basic embeddings derived from each paper's abstract. For a more semantically aware representation, we fetch each paper's abstract from the arXiv API, re-embed it with a Hugging Face sentence-embedding pipeline built on Qwen 3 0.6B, and concatenate the result to the basic embeddings provided by the dataset.
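The feature-augmentation step can be sketched as follows. This is a minimal illustration, not our production pipeline: `embed_abstract` is a hypothetical stand-in for the Hugging Face sentence-embedding call, and the 128-dim base features match the OGBN-arXiv node features.

```python
import numpy as np

# Stand-in for the Hugging Face sentence-embedding pipeline (in the real
# system this is a Qwen 3 0.6B based model); here it is a deterministic
# placeholder so the concatenation logic is runnable on its own.
def embed_abstract(abstract: str, dim: int = 1024) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(abstract)) % (2**32))
    return rng.standard_normal(dim).astype(np.float32)

def augment_features(base_feats: np.ndarray, abstracts: list) -> np.ndarray:
    """Concatenate the dataset's base features with abstract embeddings."""
    sem = np.stack([embed_abstract(a) for a in abstracts])
    return np.concatenate([base_feats, sem], axis=1)

base = np.zeros((3, 128), dtype=np.float32)  # 3 papers, 128-dim OGBN features
abstracts = ["GNNs for citation graphs", "Contrastive learning", "Vector search"]
feats = augment_features(base, abstracts)
print(feats.shape)  # → (3, 1152)
```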
Our GNN node embedder model is a function mapping a paper (the center node) and its surrounding nodes to a newly-predicted embedding vector for the center node. It is built mainly from SAGEConv message-passing layers, trained with a contrastive loss that pulls the embeddings of linked nodes closer together.
The result is a lightweight model that predicts the embedding of a node based solely on the embeddings of its neighbourhood. We update all nodes in the graph with this new embedding scheme.
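The core update and objective can be illustrated with a small numpy sketch. The real model uses PyTorch Geometric's `SAGEConv`; the function names, weight shapes, and margin value below are our own illustrative choices, not the trained configuration.

```python
import numpy as np

def sage_layer(x, neighbors, W_self, W_neigh):
    """GraphSAGE-style update: mean-aggregate neighbour features,
    combine with self features, then L2-normalise."""
    h = np.zeros_like(x @ W_self.T)
    for i, nbrs in enumerate(neighbors):
        agg = x[nbrs].mean(axis=0) if nbrs else np.zeros(x.shape[1])
        h[i] = W_self @ x[i] + W_neigh @ agg
    return h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-8)

def contrastive_loss(z, pos_pairs, neg_pairs, margin=0.5):
    """Pull linked (cited) papers together, push unlinked papers apart."""
    pos = sum(1.0 - z[i] @ z[j] for i, j in pos_pairs)
    neg = sum(max(0.0, z[i] @ z[j] - margin) for i, j in neg_pairs)
    return (pos + neg) / (len(pos_pairs) + len(neg_pairs))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)  # 4 papers, 8-dim features
neighbors = [[1], [0, 2], [1, 3], [2]]              # citation adjacency lists
W_self, W_neigh = rng.standard_normal((2, 16, 8))
z = sage_layer(x, neighbors, W_self, W_neigh)       # new 16-dim embeddings
loss = contrastive_loss(z, pos_pairs=[(0, 1), (1, 2)], neg_pairs=[(0, 3)])
print(z.shape, float(loss) >= 0.0)
```

Because the update depends only on a node's own features and its neighbourhood, the trained layer can re-embed every node in the graph, including nodes added later.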
We then add a user node to the network. Every time the user clicks through to a paper, we connect the user node to that paper to indicate interest, and use our GNN embedder to assign and update an embedding that represents the user's current interest profile. A nearest-neighbour search over the FAISS vector index then suggests papers the user should read next.
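The recommendation step reduces to a nearest-neighbour query. The demo uses a FAISS index for this; the brute-force inner-product search below shows the same idea in plain numpy, with toy orthogonal paper embeddings and a hypothetical `already_read` set so clicked papers are never re-suggested.

```python
import numpy as np

def recommend(user_emb, paper_embs, already_read, k=2):
    """Return indices of the k highest-scoring unread papers
    by inner product with the user's interest embedding."""
    scores = paper_embs @ user_emb
    scores[list(already_read)] = -np.inf  # mask papers the user has clicked
    return np.argsort(-scores)[:k].tolist()

papers = np.eye(4, dtype=np.float32)  # 4 toy papers, orthogonal embeddings
user = np.array([0.9, 0.1, 0.8, 0.0], dtype=np.float32)
print(recommend(user, papers, already_read={0}))  # → [2, 1]
```

With unit-normalised embeddings, inner product equals cosine similarity, which is why the GNN's normalised outputs can be indexed directly.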
(see image gallery for tech flow chart)
Challenges we ran into
This was our first time applying machine learning to graph data. Fetching abstracts and keeping paper identities consistent across data sources turned out to be surprisingly complex.
It was also difficult to put together and deploy the frontend and backend under these time constraints, since most of our time had been focused on understanding and modelling the data. We are glad that everything came together nicely by the end.
What's next?
For one, we would like to expand the scope of Ariadne to larger datasets, namely OGBN-papers100M, across more diverse fields of study. We would also create the detailed graph visualizations that we did not have the time to fully implement this weekend.
Finally, we see potential in the current Ariadne system to be adapted to similar literature management applications for readers and authors alike, such as cataloging new publications or suggesting related works.
Built With
- auth0
- fastapi
- python
- pytorch
- pytorch-geometric
- react
- typescript
- vercel
- vite


