DyslexicAssist: Turn Any Page Into Your Page


Inspiration

DyslexicAssist grew from seeing bright students lose time, confidence, and independence at the moment of need—staring at a printed worksheet, a dense paragraph, or a menu they couldn’t easily decode or adapt. Late or missed identification (broad dyslexic traits ≈15–20%, strict diagnoses ≈3–7%) leaves a large population under-supported. I wanted a single tool that removes three recurring barriers simultaneously:

  1. Format Access – physical print vs. customizable screen.
  2. Visual Crowding – tightly packed letters/lines that slow decoding.
  3. Linguistic Complexity – sentences and vocabulary that overload working memory.

Instead of forcing users to juggle multiple apps (scanner → reader → TTS → simplifier), DyslexicAssist fuses them into one continuous capture-to-comprehension flow.


What I Learned

Product & Accessibility Design

  • Evidence matters: increased letter spacing can reduce visual crowding for some readers, so flexibility beats prescribing a single “dyslexia font.”
  • Minimal, consistent navigation (large feature tiles + bottom bar) reduces search time and cognitive load.
  • Providing optional color/phoneme cues (with non-color fallback like underline/bold) respects diverse perceptual preferences and accessibility guidelines.

Engineering & Architecture

  • Component isolation in React Native + Expo sped iteration: Scanner, Adaptive Reader, TTS Karaoke, and Summarizer each evolved independently but compose cleanly.
  • Lightweight personalization (a multi-armed bandit over discrete layout settings, gated by comprehension) is a pragmatic early alternative to heavy ML; a minimal sketch follows this list.
  • Retrieval-Augmented Generation (RAG) significantly improves trust: grounded prompts reduced hallucination risk versus naive summarization.
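To make the bandit idea concrete, here is a minimal epsilon-greedy sketch over discrete setting bundles. Epsilon-greedy, the bundle fields, and the 0.7 comprehension gate are illustrative assumptions, not the app’s exact policy:

```ts
// A minimal epsilon-greedy bandit over discrete layout-setting bundles.
// Reward is gated on comprehension: fast but poorly understood sessions score 0.
type SettingBundle = { font: string; letterSpacing: number; lineSpacing: number; tint: string };

class LayoutBandit {
  private pulls = new Map<string, number>();
  private meanReward = new Map<string, number>();

  constructor(private arms: SettingBundle[], private epsilon = 0.1) {}

  choose(): SettingBundle {
    // Explore with probability epsilon; otherwise exploit the best observed mean.
    if (Math.random() < this.epsilon || this.pulls.size === 0) {
      return this.arms[Math.floor(Math.random() * this.arms.length)];
    }
    let best = this.arms[0];
    let bestMean = this.meanReward.get(JSON.stringify(best)) ?? 0;
    for (const arm of this.arms) {
      const mean = this.meanReward.get(JSON.stringify(arm)) ?? 0;
      if (mean > bestMean) { best = arm; bestMean = mean; }
    }
    return best;
  }

  update(arm: SettingBundle, wcpm: number, quizScore: number): void {
    // Comprehension gate (assumed threshold): reward counts only when the quiz clears a bar.
    const reward = quizScore >= 0.7 ? wcpm : 0;
    const key = JSON.stringify(arm);
    const n = (this.pulls.get(key) ?? 0) + 1;
    const mean = this.meanReward.get(key) ?? 0;
    this.pulls.set(key, n);
    this.meanReward.set(key, mean + (reward - mean) / n); // incremental mean
  }
}
```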

Process & Iteration

  • Early scope trimming (e.g., static AR snapshot before continuous tracking) preserved reliability.
  • Instrumenting words-correct-per-minute (WCPM) and comprehension first gave objective signals to judge whether UI tweaks actually helped (metric sketched after this list).
  • Rapid cycles: implement → micro user test → measure WCPM / replay counts → adjust; far more productive than speculative polishing.
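The core metric itself is simple; a minimal sketch, with hypothetical session field names:

```ts
// Words-correct-per-minute for one reading session (field names are assumptions).
interface ReadingSession {
  wordsRead: number;  // total words attempted
  wordErrors: number; // misread or skipped words
  elapsedMs: number;  // active reading time
}

function wcpm(s: ReadingSession): number {
  const minutes = s.elapsedMs / 60_000;
  return minutes > 0 ? (s.wordsRead - s.wordErrors) / minutes : 0;
}
```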

How I Built It

Tech Stack Overview

  • Frontend: React Native + Expo (cross-platform, hot reload).
  • Vision Layer: Google Vision Document Text Detection → structured blocks/paragraphs.
  • Post-Processing: Perspective (homography) correction, hyphen merge, and line/paragraph reconstruction into a canonical text string (reconstruction sketched after this list).
  • Adaptive Layout Engine: Applies the user-selected font (e.g., Lexend), letter & line spacing, and background tint; computes new line wraps (style mapping sketched below).
  • Smart Scanner (AR Re-flow): Captures image → OCR → reflow → overlays accessible text back onto the page region.
  • TTS Karaoke: TTS audio + word timing metadata → synchronized word (and optional phoneme) highlighting; tap-to-repeat & adjustable speed (highlight lookup sketched below).
  • Simplify & Summarize: Retrieval chunking (semantic embeddings) feeds a grounded prompt to OpenAI gpt-4o-mini to produce a grade-level rewrite plus a concise bullet summary; an entity/number diff check guards fidelity (sketched below).
  • Personalization Metrics: Session WCPM, comprehension quiz score, replay counts, and setting-combination ID are stored locally; the bandit recommends better setting bundles over time.
  • Storage: A local async store caches user settings, recent scans, and simplified variants for offline continuity.
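The line/paragraph reconstruction step can be sketched as follows. This naive version merges every end-of-line hyphen, so genuine hyphenated compounds would need extra handling (a dictionary lookup, for instance):

```ts
// Rebuild a canonical text string from OCR line fragments:
// merge end-of-line hyphenations and join remaining lines with spaces.
function reconstructParagraph(ocrLines: string[]): string {
  let text = '';
  for (const raw of ocrLines) {
    const line = raw.trim();
    if (!line) continue;
    if (text.endsWith('-')) {
      // "deco-" + "ding" -> "decoding": drop the hyphen, join directly.
      text = text.slice(0, -1) + line;
    } else {
      text = text ? text + ' ' + line : line;
    }
  }
  return text;
}

// reconstructParagraph(['Visual crow-', 'ding slows deco-', 'ding speed.'])
// -> 'Visual crowding slows decoding speed.'
```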
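The adaptive layout engine largely reduces to mapping user settings onto React Native text styles; a sketch with hypothetical setting names (note that RN’s letterSpacing takes absolute units, hence the em-to-pixel conversion):

```ts
import type { TextStyle } from 'react-native';

// Hypothetical settings shape; values mirror the adjustable parameters above.
interface ReaderSettings {
  fontFamily: string;      // e.g. 'Lexend'
  fontSize: number;
  letterSpacingEm: number; // extra tracking, in em
  lineHeightMult: number;  // line height as a multiple of font size
  backgroundTint: string;  // e.g. '#FFF8E7'
}

function readerTextStyle(s: ReaderSettings): TextStyle {
  return {
    fontFamily: s.fontFamily,
    fontSize: s.fontSize,
    letterSpacing: s.letterSpacingEm * s.fontSize, // RN expects absolute units
    lineHeight: s.lineHeightMult * s.fontSize,
    backgroundColor: s.backgroundTint,
  };
}
```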
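Karaoke highlighting boils down to finding the active word for the current playback position; a binary-search sketch over assumed word-timing metadata:

```ts
// Per-word timing metadata from the TTS engine (shape is an assumption).
interface WordTiming { word: string; startMs: number; endMs: number }

// Binary search over start times: returns the index of the word to highlight,
// or -1 when playback sits before the first word or inside a silent gap.
function activeWordIndex(timings: WordTiming[], playbackMs: number): number {
  let lo = 0, hi = timings.length - 1, ans = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timings[mid].startMs <= playbackMs) { ans = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  return ans !== -1 && playbackMs <= timings[ans].endMs ? ans : -1;
}
```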
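And the entity/number fidelity check can be approximated at the regex level; the production check could use proper NER, so treat this as an illustrative heuristic:

```ts
// Fidelity check for simplified text: numbers and capitalized entities in the
// source should survive the rewrite; returns the tokens that went missing.
function fidelityDiff(source: string, simplified: string): string[] {
  const pattern = /\b\d[\d,.]*\b|\b[A-Z][a-zA-Z]+\b/g;
  const sourceTokens = new Set(source.match(pattern) ?? []);
  const simplifiedTokens = new Set(simplified.match(pattern) ?? []);
  return [...sourceTokens].filter(t => !simplifiedTokens.has(t));
}

// fidelityDiff('Apollo 11 landed in 1969.', 'A rocket landed long ago.')
// -> ['Apollo', '11', '1969']  (flag for review before showing the rewrite)
```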

Key Data Flow
Camera Capture → OCR & Cleanup → Canonical Text → (a) Adaptive Reader (live metrics) (b) Summarizer (RAG) (c) TTS Alignment → Unified Output / Dashboard
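In code terms, that fan-out might be orchestrated as below; every stage function is a hypothetical stub standing in for the real service calls:

```ts
// Stage signatures are hypothetical stubs standing in for the real services.
declare function detectDocumentText(photo: Blob): Promise<{ lines: string[] }>;
declare function reconstructParagraph(lines: string[]): string;
declare function layoutForReader(text: string): Promise<unknown>;
declare function summarizeGrounded(text: string): Promise<unknown>;
declare function alignTts(text: string): Promise<unknown>;

// Canonical text fans out to the three consumers in parallel.
async function processCapture(photo: Blob) {
  const ocr = await detectDocumentText(photo);       // Google Vision OCR
  const canonical = reconstructParagraph(ocr.lines); // cleanup + hyphen merge
  const [reader, summary, timings] = await Promise.all([
    layoutForReader(canonical),   // (a) Adaptive Reader (live metrics)
    summarizeGrounded(canonical), // (b) Summarizer (RAG)
    alignTts(canonical),          // (c) TTS alignment for karaoke
  ]);
  return { reader, summary, timings };
}
```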


Challenges I Faced

  • Challenge: The debated “one magic dyslexia font” myth.
    Impact/Risk: Could mislead design and limit personalization.
    Resolution/Lesson: Implemented an adjustable parameter set (font, spacing, tint) plus data-driven recommendations.
  • Challenge: OCR variability (lighting, perspective, hyphenation).
    Impact/Risk: Fragmented text → poor summaries and timing.
    Resolution/Lesson: Added homography correction, artifact cleanup, confidence gating, and line reconstruction.
  • Challenge: Latency (OCR + LLM) in a single flow.
    Impact/Risk: User drop-off during “processing.”
    Resolution/Lesson: Parallelized preprocessing; showed progressive status; cached previous layout and settings.
  • Challenge: Avoiding hallucinated simplifications.
    Impact/Risk: Inaccurate assistance undermines trust.
    Resolution/Lesson: Adopted RAG grounding, an entity/number diff check, and low-temperature prompts.
  • Challenge: Balancing speed vs. comprehension.
    Impact/Risk: Optimizing for speed alone.
    Resolution/Lesson: Coupled WCPM logging with a post-read comprehension quiz; the bandit discards high-speed / low-comprehension arms.
  • Challenge: Feature creep (continuous AR tracking, gaze metrics).
    Impact/Risk: Diluted polish within the 1-week window.
    Resolution/Lesson: Enforced the MVP: static-snapshot AR first; backlogged future enhancements.
  • Challenge: Highlight accessibility (color dependence).
    Impact/Risk: Excluding color-blind or contrast-sensitive users.
    Resolution/Lesson: Dual signals (color plus underline/bold); optional phoneme-coloring toggle.
  • Challenge: Cost and offline concerns.
    Impact/Risk: Reliance on cloud APIs for core flows.
    Resolution/Lesson: Designed a modular abstraction to later swap in on-device OCR / distilled summarizer models.

Built With

  • React Native + Expo
  • Google Vision Document Text Detection
  • OpenAI gpt-4o-mini
  • Local async storage
