RAlign

The Story of RAlign: Fixing the Research Handshake

Inspiration

It started with a simple but frustrating reality that many students experience.

Every semester, thousands of students say, “I want to do research.”
At the same time, professors say, “I need capable research assistants.”

Yet somehow, these two groups rarely connect effectively.

Students send cold emails with little guidance. Professors receive hundreds of messages that often miss the mark in terms of preparation, experience, or alignment with their research. The result is a broken handshake between motivated students and researchers who genuinely need help.

We asked ourselves a simple question:

What if we could turn “I want to do research” into “I’m ready for your lab”?

That question became RAlign.

RAlign helps students discover where they truly fit in the research ecosystem while giving professors a clearer signal of which students are prepared to contribute.

What We Learned

Building RAlign taught us several important lessons about AI, systems design, and collaboration.

Data Integration

Real-world data rarely arrives in perfect formats. Working with the Nebula API required careful data transformation, validation, and normalization before it could be used in our system.

Interpretable Matching

We discovered that matching students to research opportunities cannot be reduced to a single opaque score. Instead of using a black-box model, we built a multi-factor matching system whose result can be broken down factor by factor, so it remains transparent and explainable.

AI as an Assistant

Rather than replacing the student, we designed AI to coach and support them. Tools like Gemini Live help students practice interviews and craft thoughtful outreach messages tailored to a professor’s research.

Collaborative Development

Building a full-stack system in a short time required strong team coordination. We relied on shared APIs, modular components, and disciplined version control to ensure different parts of the system worked together seamlessly.

How We Built It

RAlign is designed as a modular full-stack system with several interconnected components.

Backend

Built with FastAPI, the backend handles data ingestion, processing, and the research matching logic.

Frontend

The main user interface is built with React, allowing students to explore research opportunities, view compatibility insights, and navigate the platform easily.

Interview Preparation

We built a Next.js application powered by Gemini Live that simulates research interviews. Students can practice answering questions and receive real-time feedback.

Cold Email Generation

A dedicated microservice generates personalized outreach emails, helping students communicate with professors in a more thoughtful and research-aware way.

The Core Matching Engine

Instead of opaque AI rankings, RAlign uses a transparent weighted scoring model.

Our compatibility score is calculated as:

$$ \text{Score} = w_{skill} \cdot S_{skill} + w_{research} \cdot S_{research} + w_{experience} \cdot S_{experience} + w_{background} \cdot S_{background} $$

Here each $S$ term scores one dimension of compatibility, and each $w$ is the weight given to that dimension:

Skill Match ($S_{skill}$): overlap between student skills and lab requirements
Research Interest Match ($S_{research}$): alignment between student interests and professor focus
Experience Alignment ($S_{experience}$): relevant prior projects or coursework
Academic Fit ($S_{background}$): background preparation for the lab's research area

Because every term in the score is visible, students can see why a lab is recommended, not just that it is recommended.
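
To make the formula concrete, here is a minimal sketch of a weighted score that also returns each factor's contribution. The weight values, the `compatibility` function name, and the assumption that every factor score lies in the range 0 to 1 are illustrative choices for this example, not values taken from RAlign itself.

```python
# Hypothetical weights for illustration only; they sum to 1 so the
# total stays in [0, 1] when every factor score is in [0, 1].
WEIGHTS = {
    "skill": 0.4,
    "research": 0.3,
    "experience": 0.2,
    "background": 0.1,
}


def compatibility(factors: dict[str, float],
                  weights: dict[str, float] = WEIGHTS) -> tuple[float, dict[str, float]]:
    """Return the total score and the per-factor contributions w * S."""
    if set(factors) != set(weights):
        raise ValueError("factors and weights must use the same keys")
    contributions = {name: weights[name] * factors[name] for name in weights}
    return sum(contributions.values()), contributions


# Example: strong skill and background fit, partial interest overlap,
# no relevant experience yet.
score, parts = compatibility(
    {"skill": 1.0, "research": 0.5, "experience": 0.0, "background": 1.0}
)
print(round(score, 2), parts)
```

Returning the contributions alongside the total is what keeps the result explainable: the interface can show which factor raised or lowered a match instead of a bare number.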

Challenges We Faced

Building RAlign came with several technical and design challenges.

Data Normalization

The Nebula API provided valuable data, but its structure often differed from our internal schema. We had to design a normalization layer to ensure consistent processing.
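
A normalization layer of this kind can be sketched as a single function that maps a raw record onto one internal shape. Everything named below is hypothetical: the `LabRecord` type, the raw keys (`professor_name`, `name`, `department`, `dept`, `skills`), and the fallback rules are assumptions for illustration and do not describe the real Nebula API response or our actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class LabRecord:
    """Hypothetical internal schema for one research lab."""
    professor: str
    department: str
    skills: list[str] = field(default_factory=list)


def normalize_lab(raw: dict) -> LabRecord:
    """Validate a raw record and coerce it into a LabRecord."""
    # Accept either of two assumed key spellings for the same field.
    name = (raw.get("professor_name") or raw.get("name") or "").strip()
    if not name:
        raise ValueError("record has no professor name")

    department = (raw.get("department") or raw.get("dept") or "Unknown").strip()

    # Skills may arrive as a list or as one comma-separated string.
    skills_raw = raw.get("skills") or []
    if isinstance(skills_raw, str):
        skills_raw = skills_raw.split(",")

    # Trim, lowercase, drop blanks and duplicates, and sort for stable output.
    skills = sorted({s.strip().lower() for s in skills_raw if s and s.strip()})
    return LabRecord(name, department, skills)


print(normalize_lab({"name": " Ada Lovelace ", "dept": "CS",
                     "skills": "Python, pytorch ,python"}))
```

Rejecting records without a name while defaulting a missing department is one possible policy; which fields are required is a design decision for the matching logic downstream.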

Fuzzy Skill Matching

Students and professors often describe similar skills using different terminology. To address this, we implemented skill clustering, allowing our system to detect conceptual overlaps even when keywords differ.
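
As a rough illustration of the idea, the sketch below maps each skill term to a cluster label and then compares students and labs at the cluster level using Jaccard overlap. The cluster table is hand-written and hypothetical, and a lookup table is a simplification swapped in for this example; it is not a description of how RAlign derives its clusters.

```python
# Hypothetical, hand-written clusters: cluster label -> terms that belong to it.
SKILL_CLUSTERS = {
    "machine learning": {"machine learning", "ml", "deep learning",
                         "pytorch", "tensorflow"},
    "statistics": {"statistics", "stats", "regression"},
    "web development": {"react", "javascript", "frontend", "next.js"},
}

# Reverse index: term -> cluster label.
_TERM_TO_CLUSTER = {
    term: cluster
    for cluster, terms in SKILL_CLUSTERS.items()
    for term in terms
}


def to_clusters(skills: list[str]) -> set[str]:
    """Map each skill to its cluster; unknown skills stay as themselves."""
    cleaned = (s.strip().lower() for s in skills)
    return {_TERM_TO_CLUSTER.get(s, s) for s in cleaned if s}


def skill_match(student: list[str], lab: list[str]) -> float:
    """Jaccard overlap of the two skill sets, measured on clusters."""
    a, b = to_clusters(student), to_clusters(lab)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# No keyword is shared, but "PyTorch" and "deep learning" fall in one cluster.
print(skill_match(["PyTorch", "React"], ["deep learning", "stats"]))
```

In this example a plain keyword comparison would report no overlap at all, while the cluster-level comparison finds one shared cluster out of three.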

Interpretability vs. Complexity

More complex models might improve prediction accuracy, but they would reduce transparency. We intentionally prioritized an interpretable model so users can understand the reasoning behind each match.

Latency and User Experience

Real-time interview simulations require fast responses. We optimized API calls and response flows to maintain a smooth, conversational experience.