Inspiration

Academic collaboration is broken. Researchers spend months trying to find the right co-authors through conferences, cold emails, and word of mouth. We wanted to fix that by building something that understands your research and finds the right minds instantly.

What it does

PAP3R is an AI-powered co-author discovery platform. You describe your research in plain text, and PAP3R returns a ranked list of scholars whose work aligns with yours, scored by a composite of semantic embedding similarity, topic overlap, and citation-graph analysis. From there you can handpick scholars into a collaboration session, chat with an LLM that has context on all of their papers, generate concrete project ideas, and automatically draft outreach emails. Every interaction updates a live knowledge graph showing how your research connects to topics, scholars, and institutions.
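
A live knowledge graph like this can be sketched as a simple adjacency structure. This is a minimal illustrative stand-in, not the actual backend code; the node names and types here are made up:

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal undirected graph; nodes are (type, name) tuples, e.g. ("topic", "NLP")."""

    def __init__(self):
        self.adj = defaultdict(set)

    def connect(self, a, b):
        # Undirected: store the edge in both directions.
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node):
        return sorted(self.adj[node])

# After a query, link the user's research to matched topics, scholars, institutions.
g = KnowledgeGraph()
g.connect(("user", "me"), ("topic", "semantic parsing"))
g.connect(("topic", "semantic parsing"), ("scholar", "C. Manning"))
g.connect(("scholar", "C. Manning"), ("institution", "Stanford"))

print(g.neighbors(("topic", "semantic parsing")))  # user and matched scholar
```

Each new query or chat turn would add edges the same way, which is what keeps the graph "live".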

How we built it

We built a FastAPI backend with five core systems working together. Actian VectorAI DB stores 500 scholar profiles and 1,393 paper embeddings as 384-dimensional vectors using the MiniLM model, and handles fast approximate nearest neighbor search with geo filtering. The similarity engine scores candidates using a weighted composite of cosine similarity (60%), Jaccard topic overlap (20%), and bibliographic coupling (20%). Supermemory manages all long-term context including per-user memory, multi-scholar chat sessions, and the knowledge graph. Modal hosts Qwen3-4B on serverless GPUs for LLM inference including match explanations, RAG over scholar papers, and project idea generation. Gemini 2.0 Flash generates personalized collaboration emails. The frontend is React and Vite deployed on Aedify with GitHub auto-deploy.
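
The weighted composite can be sketched in a few lines. The 60/20/20 weights come from the description above; the exact feature extraction and the coupling normalization are our assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(s1, s2):
    """Topic overlap: |intersection| / |union| of topic sets."""
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

def bibliographic_coupling(refs1, refs2):
    # Shared references, normalized by the union (one of several conventions).
    union = refs1 | refs2
    return len(refs1 & refs2) / len(union) if union else 0.0

def composite_score(q_vec, s_vec, q_topics, s_topics, q_refs, s_refs):
    return (0.6 * cosine(q_vec, s_vec)
            + 0.2 * jaccard(q_topics, s_topics)
            + 0.2 * bibliographic_coupling(q_refs, s_refs))
```

In practice the vector DB would return the top-k candidates by cosine first, and this rescoring would run only on that short list.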

Challenges we ran into

Getting Actian VectorAI DB running on Apple Silicon M4 was the first major blocker. The Docker image is amd64-only and requires Rosetta emulation through Docker VMM, not Apple Virtualization Framework, which took significant debugging to figure out. The search API returned payloads as None by default, requiring a separate get() call for each result. OpenAlex's author search endpoint returned very few results with topic-based filters, so we switched to citation-count-based fetching to reliably ingest 500 scholars. Wiring Supermemory as a context proxy between FastAPI and the Modal LLM endpoint required careful session ID management to prevent context bleed between users.
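
The citation-count-based fetch can be sketched against the public OpenAlex authors endpoint. The `sort=cited_by_count:desc`, `per-page`, and `page` parameters are standard OpenAlex conventions; the pagination helper itself is illustrative:

```python
from urllib.parse import urlencode

BASE = "https://api.openalex.org/authors"

def author_page_urls(total, per_page=200):
    """Build paginated OpenAlex URLs for the most-cited authors.

    Sorting by citation count avoids the sparse results we saw with
    topic-based author filters.
    """
    pages = -(-total // per_page)  # ceiling division
    return [
        f"{BASE}?{urlencode({'sort': 'cited_by_count:desc', 'per-page': per_page})}&page={p}"
        for p in range(1, pages + 1)
    ]

urls = author_page_urls(500)
# 500 scholars at 200 per page -> 3 requests
```

Each URL would then be fetched (e.g. with `requests`) and the author records upserted into the vector DB.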

Accomplishments that we're proud of

We built a fully functional end-to-end pipeline from raw OpenAlex data to semantic search to LLM-powered collaboration in under 36 hours. The similarity engine returns genuinely relevant scholars like Christopher Manning, Yann LeCun, and Michael Jordan for NLP queries without any manual curation. The hybrid memory architecture combining in-memory sliding windows for short-term context with Supermemory for long-term recall gives the chat system real persistence across sessions. The email generation feature produces contextually relevant outreach drafts tailored to each scholar's specific research topics.
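
The short-term half of that hybrid memory can be sketched as a per-session sliding window (Supermemory covers the long-term side; this class and its names are illustrative):

```python
from collections import deque

class SessionWindow:
    """Keeps the last N chat turns per session as short-term LLM context."""

    def __init__(self, max_turns=8):
        self.max_turns = max_turns
        self.sessions = {}

    def add(self, session_id, role, text):
        # deque(maxlen=...) silently evicts the oldest turn when full.
        window = self.sessions.setdefault(session_id, deque(maxlen=self.max_turns))
        window.append({"role": role, "content": text})

    def context(self, session_id):
        # Keyed by session_id, so turns never bleed across users.
        return list(self.sessions.get(session_id, []))

mem = SessionWindow(max_turns=2)
mem.add("alice", "user", "Who works on RAG?")
mem.add("alice", "assistant", "Here are three scholars...")
mem.add("alice", "user", "Draft an email to the first one.")
print(len(mem.context("alice")))  # window capped at 2
print(mem.context("bob"))         # empty: no cross-session bleed
```

On each request the window would be concatenated with whatever long-term memories Supermemory retrieves before the prompt is sent to the Modal endpoint.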

What we learned

Vector databases are fundamentally different from relational databases and require rethinking schema design entirely: payloads are flexible dicts, relationships are handled in application code, and similarity is a query-time operation, not a stored property. Supermemory's proxy architecture is a powerful pattern for adding memory to any LLM pipeline without changing the underlying model. Coordinating four people on a single codebase requires strict folder-ownership boundaries to avoid merge conflicts, especially on shared files like main.py.
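
The "similarity is a query-time operation" point can be made concrete with a brute-force version of what the vector DB does internally. This is a sketch, not the Actian API; the records and names are invented:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Records carry flexible dict payloads; nothing about "nearest" is stored.
records = [
    {"vector": [0.9, 0.1], "payload": {"name": "Scholar A", "topics": ["nlp"]}},
    {"vector": [0.1, 0.9], "payload": {"name": "Scholar B", "topics": ["vision"]}},
]

def search(query_vec, k=1):
    # Ranking happens here, at query time, against the live query vector.
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["payload"] for r in ranked[:k]]

print(search([1.0, 0.0]))  # Scholar A ranks first for an "nlp-like" query
```

A real vector DB replaces the `sorted` call with an approximate nearest-neighbor index, but the mental model is the same.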

What's next for PAP3R

  • Expanding to 10,000 scholars via continuous OpenAlex ingestion
  • NSF and NIH grant discovery alongside co-author recommendations
  • ORCID OAuth for live profile sync
  • A conference recommender showing who to talk to at NeurIPS or EMNLP based on your research
  • A team formation optimizer that matches research triads and quads rather than just pairs

Built With

  • actian-vectorai-db
  • aedify
  • all-minilm-l6-v2
  • cors-middleware
  • docker
  • fastapi
  • google-gemini-2.0-flash
  • huggingface
  • javascript
  • modal
  • networkx
  • node.js
  • openalex-api
  • postgresql
  • pydantic
  • python
  • qwen3-4b
  • react
  • sentence-transformers
  • supermemory
  • uv
  • uvicorn
  • vite
  • vllm