The Why Behind Signify

Over 50% of Deaf students reach age 18 with reading comprehension at or below a fourth-grade level. For many Deaf students, this is not about intelligence. It is about access. Most classrooms are still built around spoken and written English, even though sign language is often a Deaf student’s first language.

Live captions do not fully solve this problem. Captions present fast, academic English text and require students to continuously translate from a second language while also following slides, visuals, and classroom discussion. They also strip out key linguistic information that exists in sign languages, such as spatial structure and non-manual cues (facial expression and emphasis), which are essential for meaning. As a result, captions provide access to words, but not to language in the way Deaf students naturally process information.

At the policy level, this gap has also been shaped by legislation. In Board of Education v. Rowley (1982), a case brought on behalf of Deaf student Amy Rowley, the Supreme Court of the United States ruled that schools are only required to provide "adequate" access, not access that allows Deaf students to fully reach their potential. That standard still influences what support students receive today.

After researching existing accessibility and sign language tools, we found that most systems focus on text-to-sign or pre-recorded content, rather than live classroom instruction. We built Signify to address this gap. Signify delivers live classroom speech directly into ASL as the lesson is happening, so Deaf students can follow instruction in real time and in their own language.

What it does

Signify is an automated AI English-to-ASL converter that takes English speech from a Zoom session (it is integrated as an app in Zoom) and converts it into ASL. It also offers several modes, the most powerful being a podcast mode that converts a PDF, slideshow, or presentation into a short, natural ASL podcast between two AI agents (developed via OpenAI's API), which helps users not only understand the content better but also have more fun learning it.
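To make the podcast mode concrete, here is a minimal sketch of how lecture text could be split into alternating two-host dialogue turns before gloss conversion. In Signify the dialogue itself is generated by OpenAI's API; this only illustrates the turn structure, and the function and host names are ours, not the project's.

```javascript
// Sketch: split extracted lecture text into alternating two-host podcast turns.
// The real pipeline uses an LLM to write the dialogue; this shows the shape of
// the output only. `splitIntoTurns` and the host names are illustrative.
function splitIntoTurns(text, maxSentencesPerTurn = 2) {
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
  const hosts = ["HOST_A", "HOST_B"];
  const turns = [];
  for (let i = 0; i < sentences.length; i += maxSentencesPerTurn) {
    turns.push({
      speaker: hosts[turns.length % 2], // alternate speakers each turn
      text: sentences.slice(i, i + maxSentencesPerTurn).join(" ").trim(),
    });
  }
  return turns;
}

const turns = splitIntoTurns(
  "Photosynthesis converts light into chemical energy. It happens in chloroplasts. Plants release oxygen as a byproduct."
);
```

Each turn can then be glossed and rendered independently, which is what makes the two-agent format stream well.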

We built the app on Zoom because it lets both the instructor and the student join the same live session and access Signify simultaneously. This allows our system to capture the lecture audio in real time and deliver synchronized ASL output directly inside the classroom workflow, without requiring any additional software or setup.

How we built it

We split development into four parallel tracks: frontend & UI/UX, backend and API orchestration, system integration, and the ML / language pipeline.

Our system is implemented as an embedded app inside Zoom, which captures live classroom audio and supports optional lecture file uploads (PDFs and slide exports). Audio streams and uploaded assets are forwarded to our backend services hosted on Render.
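Conceptually, forwarding the live audio means framing the raw stream into small sequenced chunks before sending them over the wire. This is a hedged sketch of that idea; the chunk size and metadata fields are our assumptions, not Signify's actual wire format.

```javascript
// Sketch: frame raw PCM audio into fixed-size, sequence-numbered chunks for
// forwarding to the backend. CHUNK_BYTES and the chunk fields are illustrative.
const CHUNK_BYTES = 3200; // ~100 ms of 16 kHz, 16-bit mono audio

function frameAudio(pcm, sessionId) {
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += CHUNK_BYTES) {
    chunks.push({
      sessionId,
      seq: chunks.length, // sequence number so the backend can reorder
      payload: pcm.subarray(offset, offset + CHUNK_BYTES),
    });
  }
  return chunks;
}

const chunks = frameAudio(Buffer.alloc(8000), "class-101");
```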

On the backend, we perform document and content preprocessing using PDFParse and JSZip to extract and normalize structured lecture text. For live lectures, speech is first transcribed and aligned with the slide and document content to preserve instructional structure and turn boundaries.
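The alignment step above can be illustrated with a naive version: match each transcript sentence to the slide it most likely refers to via keyword overlap. Signify's actual alignment is more involved; this sketch, with names of our own choosing, only shows the core idea of preserving instructional structure.

```javascript
// Sketch: align a transcript sentence to its most likely slide using naive
// keyword overlap. Function names are illustrative, not the project's code.
function tokenize(s) {
  return new Set(s.toLowerCase().match(/[a-z]+/g) || []);
}

function alignToSlide(sentence, slides) {
  const words = tokenize(sentence);
  let best = 0, bestScore = -1;
  slides.forEach((slideText, i) => {
    // Count shared keywords between the sentence and this slide's text.
    const overlap = [...tokenize(slideText)].filter((w) => words.has(w)).length;
    if (overlap > bestScore) { bestScore = overlap; best = i; }
  });
  return best; // index of the best-matching slide
}

const slides = ["Newton's laws of motion", "Conservation of energy"];
const idx = alignToSlide("Energy is conserved in a closed system", slides);
```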

The cleaned and segmented lecture content is then passed through OpenAI APIs (using GPT-4o-mini) with a custom, ASL-optimized prompting pipeline. A major technical challenge was designing a transformation layer that converts standard English instructional material into ASL Gloss, a structured intermediate representation that reflects ASL word order and discourse structure rather than English syntax. This restructuring step is critical for accessibility and required iterative prompt engineering and constraint-based formatting to ensure consistency across dialogue turns.
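A request to that transformation layer might be built like the sketch below. The system prompt and gloss constraints shown here are illustrative assumptions; Signify's actual ASL-optimized prompt went through far more iteration than this.

```javascript
// Sketch: build a chat-completion request body for English → ASL Gloss
// conversion. The prompt text is an illustrative stand-in for the real one.
function buildGlossRequest(segment) {
  return {
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: [
          "Convert English into ASL Gloss.",
          "Use UPPERCASE gloss tokens in ASL word order (topic-comment).",
          "Mark non-manual signals in brackets, e.g. [q] for questions.",
          "Return one gloss line per input sentence; no English prose.",
        ].join("\n"),
      },
      { role: "user", content: segment },
    ],
  };
}

const req = buildGlossRequest("Are you going to the store tomorrow?");
```

Keeping the constraints in the system prompt (rather than per segment) is what makes the formatting consistent across dialogue turns.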

Each finalized ASL-Gloss segment is then dispatched to the Sign.mt generation API, which returns an ASL avatar video for that segment.

Finally, the in-browser Zoom app synchronizes the original lecture video and the generated ASL avatar stream. The client supports low-latency segment playback, buffering, rewind, and speed control, allowing Deaf students to view the instructor and the ASL rendering side-by-side while the lecture is still in progress.
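Because segments can finish rendering out of order, the client needs an in-order release buffer. This is a minimal sketch of that structure; the class and field names are ours, not the actual client code.

```javascript
// Sketch: release ASL video segments in sequence even when they arrive out of
// order from parallel generation. Names are illustrative.
class SegmentBuffer {
  constructor() {
    this.pending = new Map(); // seq -> segment, held until its turn
    this.next = 0;            // next sequence number to release
    this.released = [];       // segments handed to the player, in order
  }
  push(segment) {
    this.pending.set(segment.seq, segment);
    // Release every consecutive segment that is now available.
    while (this.pending.has(this.next)) {
      this.released.push(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next += 1;
    }
  }
}

const buf = new SegmentBuffer();
buf.push({ seq: 1, url: "seg1.mp4" }); // arrives early, held back
buf.push({ seq: 0, url: "seg0.mp4" }); // unblocks both segments
```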

Challenges we ran into

Permission issues!! We needed access to Zoom RTMS (but the Zoom team came in clutch). After working through various security issues, a major hurdle was configuring iframes. In the end, we refactored an entire open-source GitHub repository into the specific API we needed and plugged it into our application. Executing and optimizing the real-time 3D ASL avatar rendering pipeline was another major technical challenge. We addressed this by refactoring the rendering and networking layers to run asynchronously, parallelizing video generation and delivery, and introducing lightweight buffering to reduce latency, which allowed the 3D ASL visualizations to stream smoothly during live sessions.
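The "parallel generation, sequential delivery" pattern described above can be sketched as follows; `renderSegment` is a simulated stand-in for the real Sign.mt call, and the timing values just force out-of-order completion.

```javascript
// Sketch: kick off all segment renders concurrently, but await them in order
// so playback stays sequential. renderSegment simulates variable API latency.
async function renderSegment(gloss, ms) {
  await new Promise((resolve) => setTimeout(resolve, ms));
  return `video(${gloss})`;
}

async function pipeline(glosses) {
  // Start every render immediately (parallel); earlier segments are given
  // longer simulated latencies so later ones finish first.
  const jobs = glosses.map((g, i) =>
    renderSegment(g, (glosses.length - i) * 10)
  );
  const delivered = [];
  // Await in submission order: delivery stays sequential regardless of
  // which render finished first.
  for (const job of jobs) delivered.push(await job);
  return delivered;
}
```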

Shoutout (to the goats)

Thank you to Zoom, Render, and OpenAI for giving us credits. Thank you especially to the Render/Zoom team for helping us debug and giving us feedback.

What we learned

  • Shocking, little-known facts about inaccessible educational systems -- especially since we had assumed d/Deaf people could just "use live captions to read / understand"
  • Working collaboratively in a highly fast-paced environment with people we had never worked with before
  • Juggling many different parts of an application at once (ASL translation server, backend server, frontend Zoom app, Zoom RTMS, websockets, etc.)
  • Deploying code on cloud platforms like Render

What's next for Signify

  • Getting feedback from Deaf students, measuring improvements in learning, and iterating based on our findings
  • Polishing the app UI and the 3D visualization
  • Deploying it across educational institutions (starting with high schools in the Urbana and Austin areas)
  • Integrating it with more conferencing platforms (e.g., Microsoft Teams)

Built With

zoom, zoom-rtms, render, openai (gpt-4o-mini), sign.mt, pdfparse, jszip, websockets
