Inspiration
After countless hours of staring at Quizlet sets, trying to painstakingly discern the difference between phrases like algún día and al día, my friends and I still couldn’t retain more than half a day of content after a week of lectures in the Cornell Spanish class SPAN 2090. The problem wasn’t the class itself but rather the medium: the traditional methods we were trained to use, like flashcards and casual review, starve information from brains that were meant to explore and navigate. Even the ancient Greeks knew this, using the method of “loci” to memorize long speeches by mapping ideas onto landmarks along a familiar path, like the steps or stalls of an agora, so a speech could be recalled by mentally retracing the route. So, if spatial reasoning truly is the key to a strong memory, what if we could use AI to build such a memory palace: a route we could retrace, just like the Greeks, to learn virtually any subject?
What it does
At its core, Loci turns studying into something far more memorable. Taking in source material as raw text or PDFs, Loci extracts the important concepts and turns each one into an object that holds both the core idea and the description the user needs to know about it. Loci then places those objects into a 3D environment the user can travel through, helping people remember abstract information through the memory palace technique: the idea that people remember better when information is attached to a place or a visual metaphor. The 3D environment the user walks through is generated with a world model, and each 3D object is generated by a separate AI model to look like a metaphor representing its core idea, helping users memorize faster and build a better memory palace.
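To make the concept-to-object mapping concrete, here is a minimal sketch of the shape such a concept object might take. The field names are illustrative, not the exact production schema.

```typescript
// Hypothetical shape of a Loci concept object. Each extracted concept
// carries its core idea, the description the user needs to know, and the
// visual metaphor used to generate its 3D object.
interface Concept {
  term: string;        // the core idea, e.g. "algún día"
  description: string; // what the user should remember about it
  metaphor: string;    // metaphor prompt for the object generator
  modelUrl?: string;   // .glb asset URL, once generation finishes
}

// Render an ordered route through the palace as plain text, one stop
// per concept, for debugging or review.
function summarize(concepts: Concept[]): string {
  return concepts
    .map((c, i) => `${i + 1}. ${c.term} -> ${c.metaphor}`)
    .join("\n");
}
```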
How we built it
We built this project with Vite Plus and React for the frontend, MongoDB for the database, Auth0 for authentication, and TanStack Start (TypeScript) for the backend. The backend consists of server-side functions that take in raw text or PDF files, use K2 Think V2 to identify the important concepts and their descriptions, call Gemini to assign an object metaphor to each concept, and send a prompt describing how to construct that object to an open-source diffusion model from Hugging Face (Microsoft TRELLIS) running on a cloud GPU. To generate the memory palace layout we used Marble, then placed the objects at random points within the layout.
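The server-side flow described above can be sketched as a small pipeline. The step functions below are stand-ins for the real K2 Think V2, Gemini, and TRELLIS calls; their names and signatures are assumptions, not the actual code.

```typescript
// Hedged sketch of the backend pipeline. Each step is injected so the
// real model calls (K2 Think V2, Gemini, TRELLIS on a cloud GPU) can be
// swapped in; these names are illustrative only.
type Steps = {
  extractConcepts: (text: string) => Promise<string[]>;        // K2 Think V2
  assignMetaphor: (concept: string) => Promise<string>;        // Gemini
  generateObject: (metaphorPrompt: string) => Promise<string>; // TRELLIS -> .glb URL
};

// Run every concept through metaphor assignment and 3D generation
// in parallel, returning one record per memory anchor.
async function buildPalace(rawText: string, steps: Steps) {
  const concepts = await steps.extractConcepts(rawText);
  return Promise.all(
    concepts.map(async (concept) => {
      const metaphor = await steps.assignMetaphor(concept);
      const glbUrl = await steps.generateObject(metaphor);
      return { concept, metaphor, glbUrl };
    })
  );
}
```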
Tech Stack
- TanStack Start for the full-stack app shell and server functions
- React 19 for the UI
- TanStack Router and React Query for routing and asynchronous client state
- Three.js for the 3D scene
- Auth0 for authentication and session handling
- MongoDB Atlas for concept, room, and asset metadata
- Zod for runtime validation
- Gradio client for talking to the diffusion model endpoint hosted on RunPod
- AWS S3 SDK for uploading generated .glb files to R2/S3-compatible storage
- pdf-parse for PDF ingestion
- Vite + Vite Plus for dev/build tooling
- Vitest + Testing Library + jsdom for tests
- TypeScript end-to-end
Challenges we ran into
The biggest challenge we faced was finding the right world model and prompting it correctly to create the room environments we wanted. Our first instinct was to use Hunyuan3D, since it is highly capable and more accessible than comparable models, but it turned out to be optimized for generating isolated 3D objects rather than entire environments to roam around in. When we tried to produce coherent spaces with walls, floors, depth, and furniture, the outputs were extremely confusing, with rooms collapsing into themselves. We switched to World Labs' Marble, which is built for exactly this kind of spatial environment, and then had to learn to prompt it correctly: tuning our descriptions so that room boundaries were specified, non-anchor objects were placed relative to the walls, and the space stayed navigable with all the objects in it.

The next challenge came from rendering the output, since Marble's environments are Gaussian splats exported as .splat or .spc files. With no vertices, faces, or UV maps, these files require a renderer built specifically for Gaussian splatting. We tackled this with a two-layer approach: a custom Three.js Gaussian splatting renderer draws the background splat environment, while all the memory anchor objects load as standard .glb files composited on top. This preserves the photorealistic environment while keeping the graph of objects available for interactions.

Finally, because the world model's output contains no explicit geometry, there is no collision mesh to query, which made it hard to place objects inside the room without collisions.
We fixed this by inferring the room's navigable space: we cast a grid of vertical raycasts downward across the scene's bounding volume to approximate a floor heightmap, then built a 2D occupancy grid marking the overall placeable surface area. Object placement then became a spatial arrangement problem: each .glb anchor's bounding box is computed on load, candidate positions are sampled from the occupancy grid, and each candidate is checked against already-placed objects to prevent collisions. Objects are scaled down if needed, and the result is a room that doesn't feel overcrowded.
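The placement logic above can be sketched in simplified form. In the real pipeline the occupancy grid comes from raycasting the Three.js scene; here the grid is given directly, and anchors are reduced to square footprints in grid cells, which is an assumption for illustration.

```typescript
// true = placeable floor cell, false = blocked.
type Grid = boolean[][];
// Square anchor footprint in grid cells (a simplification of the real
// per-anchor bounding boxes).
interface Placement { row: number; col: number; size: number }

// A candidate fits if every covered cell is placeable and it does not
// overlap any already-placed anchor (AABB test in grid space).
function fits(grid: Grid, placed: Placement[], row: number, col: number, size: number): boolean {
  for (let r = row; r < row + size; r++)
    for (let c = col; c < col + size; c++)
      if (r >= grid.length || c >= grid[0].length || !grid[r][c]) return false;
  return placed.every(p =>
    row + size <= p.row || p.row + p.size <= row ||
    col + size <= p.col || p.col + p.size <= col);
}

// Scan for the first free spot; shrink the footprint (the "scale down
// if needed" step) when nothing fits at the requested size.
function placeAnchor(grid: Grid, placed: Placement[], size: number): Placement | null {
  for (let s = size; s >= 1; s--)
    for (let r = 0; r < grid.length; r++)
      for (let c = 0; c < grid[0].length; c++)
        if (fits(grid, placed, r, c, s)) return { row: r, col: c, size: s };
  return null;
}
```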
Accomplishments
The technical achievement we were proudest of was ditching third-party AI APIs for generating the 3D objects we place in our rooms, instead spinning up our own on-demand cloud GPU instance via RunPod and self-hosting our text-to-3D Hugging Face model directly. This meant that every inference call Loci made hit our own endpoint, giving us full control over the model, its parameters, and its behavior. The advantages over a standard API integration were significant: no rate limits throttling our pipeline mid-session and no per-token costs skyrocketing as our inputs scaled. We could control the exact model weights, tune inference parameters freely, and swap the model without depending on any third party. This change also brought real performance gains, dropping our inference latency once we no longer shared compute with other users and improving Loci's user experience.
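Calling a self-hosted endpoint can be sketched as below. The endpoint path and payload fields are assumptions for illustration, not the actual RunPod/Gradio interface.

```typescript
// Hypothetical request shape for the self-hosted text-to-3D endpoint;
// field names and defaults are illustrative only.
interface InferenceRequest { prompt: string; steps: number; seed?: number }

// Because the endpoint is ours, defaults like step count are fully under
// our control rather than fixed by a third-party API.
function buildRequest(metaphorPrompt: string, opts: Partial<InferenceRequest> = {}): InferenceRequest {
  return { prompt: metaphorPrompt, steps: opts.steps ?? 30, seed: opts.seed };
}

// POST the request to our own GPU instance: no rate limits, no
// per-token billing. The "/generate" route is a placeholder.
async function generateGlb(endpoint: string, req: InferenceRequest): Promise<ArrayBuffer> {
  const res = await fetch(`${endpoint}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`inference failed: ${res.status}`);
  return res.arrayBuffer();
}
```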
What we learned
The biggest lesson for us came from working hands-on with world models and understanding the kinds of environments they actually produce and the limitations they impose. With models like Marble from World Labs, it took a lot of experimentation to find the precise prompting needed to generate a 3D point cloud that truly resembled the explorable rooms we envisioned for our memory palaces. After evaluating the leading world models, we also learned that Marble would be especially effective because of its use of Gaussian splatting to represent scenes. Rather than relying solely on traditional geometry, Marble models a scene as a continuous field of 3D Gaussians, each holding its own opacity, color, and covariance. While this allows photorealistic rendering, it also means the model produces a radiance field rather than a clean, editable mesh. As a result, the environments were visually appealing but not structured for interaction or modification, requiring us to build those capabilities ourselves.
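To make the representation concrete: each splat is a 3D Gaussian, and rendering blends the contributions of many of them. A minimal sketch of evaluating one Gaussian's (unnormalized) density at a point, assuming a diagonal covariance for simplicity:

```typescript
// One splat's unnormalized density at a point. Real splats carry a full
// 3x3 covariance plus opacity and color; the diagonal covariance here is
// a simplifying assumption for illustration.
function gaussianDensity(
  point: [number, number, number],
  mean: [number, number, number],
  variance: [number, number, number], // diagonal of the covariance matrix
): number {
  let quad = 0;
  for (let i = 0; i < 3; i++) {
    const d = point[i] - mean[i];
    quad += (d * d) / variance[i];
  }
  return Math.exp(-0.5 * quad); // 1 at the mean, falling off with distance
}
```

Because the scene is a sum of such soft, overlapping densities rather than hard surfaces, there is no mesh to raycast against, which is exactly why we had to infer floors and collisions ourselves.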
What's next for Loci
There's a lot more in store for Loci. We plan to extend this work by procedurally generating environments from the real world, aided by photos of specific areas taken from several angles, so memory palaces can be dynamically constructed around any place a user prefers. While this could be integrated with World Labs' Marble, Marble takes considerable compute and time to generate its environments, so we would likely have to fine-tune a different world model to be fast enough for a feature like this. We also want to transform the Loci experience into one scene containing multiple playable rooms, each holding a memory palace for a specific subject, so users can organize their studying topics into different areas.
Built With
- amazon-web-services
- auth0
- claude
- cloudflare
- codex
- gemini
- huggingface
- k2
- mongodb
- react
- tanstack
- three.js
- typescript
- vite
- zod