The on-prem agentic operating layer for the emergency department. LA Hacks 2026 · UCLA · Track: Catalyst for Care
Five specialist AI agents on a $4K box in the hospital, plus a patient-side companion in your pocket. Records never leave the building. Records never leave the patient's phone. Every claim has a receipt. Every receipt is doctor-correctable.
US emergency departments are the most chaotic, highest-stakes coordination environments in healthcare — and the numbers aren't opinion:
| Metric | Reality |
|---|---|
| Doctor time on documentation | ~2 hours of every 4-hour shift |
| Median ED length of stay | 4–6 hours; 12+ at high-volume hospitals |
| ED boarding (waiting for an inpatient bed) | 36% of admissions board >4 hours |
| Documentation-related burnout in EM physicians | Highest of any specialty |
| Annual US healthcare spend on coordination failures | ~$200B |
| Annual US prior-auth waste | ~$25B |
These aren't separate problems. They're the same problem from different angles: the ED has rich data and brilliant clinicians, but the data doesn't reach the clinicians at the moment they need it.
Existing "ambient AI" scribes (Abridge, Suki, Nuance DAX) all ship encounter audio to AWS — unusable for hospitals that legally or culturally cannot send PHI to the cloud. Epic and Cerner ship AI on a 7-year roadmap. Until 2025, sustained 70B-class inference cost $30K+ in GPUs. The ASUS Ascent GX10 at $4K is genuinely new hardware economics, and it opened the window for ATLAS.
ATLAS targets four ED pain points simultaneously, on one substrate:
- Documentation burden. Scribe writes the H&P note from live encounter audio, structured and ready to sign — saving ~2 hours of every 4-hour shift. Burnout reduction is real and measurable; staff retention ROI is the line CIOs care about.
- Inter-department coordination. Quartermaster watches imaging, labs, and pharmacy queues, predicts bottlenecks from historical wait times, and pings the right department before a patient is stuck waiting.
- Discharge prior auth. Advocate builds the PA packet during the visit so the patient leaves the ED with it already submitted. Standard wait of 18 days collapses to ~6 hours. $25B/year of industry waste, addressed visit-by-visit.
- Handoff quality. Conductor turns the full encounter context into a structured SBAR handoff at shift change — eliminating the verbal-handoff patient-safety hazard.
And underneath all four: PHI never leaves the building. Cloud AI products ship your body to Amazon's servers. ATLAS runs entirely on a $4K box on the hospital floor, with zero outbound calls during inference. The same engine extends to ICU coordination, OR scheduling, and ambulatory clinic — the ED is the slice that makes the demo undeniable.
On the patient side, the Companion app mirrors this guarantee: encounter audio, paper records, and insurance documents never leave the patient's phone. The hospital and the patient hold parallel ledgers of the same encounter, and neither ledger ever crosses the firewall to the cloud.
The GX10 is the heart of ATLAS. 128 GB unified memory holds Qwen3-32B (heavy reasoning), Qwen3-8B (high-frequency coordination), Whisper-large-v3 (STT), and Qwen3-Embedding-8B simultaneously, with petaflop-scale FP16/FP8 throughput enabling 5 concurrent agent streams.
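A back-of-envelope check (weights only, roughly 2 bytes per FP16 parameter; parameter counts are nominal and KV-cache/activation overhead is extra) shows why the four models fit at once:

```python
# Rough FP16 weight budget on the GX10's 128 GB unified memory.
weights_gb = {
    "Qwen3-32B":           32e9 * 2 / 1e9,   # ~64 GB
    "Qwen3-8B":             8e9 * 2 / 1e9,   # ~16 GB
    "Qwen3-Embedding-8B":   8e9 * 2 / 1e9,   # ~16 GB
    "Whisper-large-v3":   1.55e9 * 2 / 1e9,  # ~3 GB
}
total_gb = sum(weights_gb.values())          # ~99 GB of weights
print(f"weights ~{total_gb:.0f} GB, headroom ~{128 - total_gb:.0f} GB for KV cache and activations")
```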
Why on-prem matters — the HIPAA story:
- HIPAA's Privacy Rule and Security Rule treat encounter audio, vitals, chart text, lab values, and imaging as Protected Health Information (PHI). Every cloud transmission is a Business Associate Agreement, an audit surface, and a breach-disclosure obligation.
- ATLAS runs entirely on the GX10 on the hospital LAN. Zero outbound calls during inference. No BAAs to negotiate, no audit surface to defend, no cloud breach pathway.
- State-level legislation banning cloud transmission of certain PHI categories is proliferating in 2026; on-prem is the only future-proof posture for many systems.
- Network-resilient by design: ED internet is famously flaky, and an ambulance bay can't depend on a working uplink. ATLAS doesn't.
Inference runtime: vLLM primary (excellent Qwen3 + Blackwell ARM CUDA
support), Ollama backup for bulletproof demo recovery. The orchestrator
abstracts model backends behind a single ATLAS_LLM_BACKEND flag —
vllm, ollama, or mock — with zero code or prompt changes between them.
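A minimal sketch of what that switch can look like, assuming vLLM and Ollama are both exposed through their OpenAI-compatible endpoints; the ports, helper names, and default model id are illustrative, not the actual orchestrator code:

```python
# Illustrative ATLAS_LLM_BACKEND switch; names and ports are placeholders.
import os
from openai import OpenAI  # vLLM and Ollama both speak the OpenAI-compatible API

def make_llm_client() -> OpenAI | None:
    backend = os.environ.get("ATLAS_LLM_BACKEND", "mock")
    if backend == "vllm":
        return OpenAI(base_url="http://localhost:8001/v1", api_key="unused")
    if backend == "ollama":
        return OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    return None  # mock: no client at all, canned responses for CI

def chat(messages: list[dict], model: str = "qwen3:32b") -> str:
    # Model ids differ per backend (Ollama vs. Hugging Face naming); this default is illustrative.
    client = make_llm_client()
    if client is None:
        return "[mock] deterministic canned response for CI"
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```

Because the prompts and structured outputs never change, swapping backends is purely an environment-variable decision.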
We use Fetch.ai's stack at two layers, satisfying the full challenge spec:
Layer 1 — ATLAS via Agentverse uAgent + OmegaClaw Telegram bot (PHI-side):
- One uAgent registered: `atlas-companion` (bridge sketched below)
- OmegaClaw is the attending physician's chat front-end via Telegram
- Attending texts: "Bed 4 status?" → ASI:One reasons over the intent → routes to the `atlas-companion` uAgent → the uAgent calls our internal orchestrator on the GX10 → returns a synthesized, PHI-redacted status line
- PHI never touches ASI:One. ASI:One only sees the routing intent and the natural-language response after our local agents have prepared it. Patient identifiers, vitals, and chart text stay on-prem.
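A minimal sketch of that bridge, assuming the Fetch.ai `uagents` Python library; the message models, orchestrator URL, and response field are placeholders rather than the shipped code:

```python
# Hypothetical atlas-companion bridge; the orchestrator URL and field names are assumptions.
import httpx
from uagents import Agent, Context, Model

class StatusQuery(Model):
    question: str          # e.g. "Bed 4 status?", routed here by ASI:One

class StatusReply(Model):
    answer: str            # PHI-redacted natural-language status line

companion = Agent(name="atlas-companion", seed="atlas-companion-demo-seed")

ORCHESTRATOR_URL = "http://gx10.local:8000/api/status"  # stays on the hospital LAN

@companion.on_message(model=StatusQuery, replies=StatusReply)
async def handle_status(ctx: Context, sender: str, msg: StatusQuery):
    # The uAgent only relays intent; the GX10 orchestrator does all the PHI work locally.
    async with httpx.AsyncClient() as client:
        r = await client.post(ORCHESTRATOR_URL, json={"question": msg.question})
    redacted = r.json()["redacted_summary"]  # already stripped of identifiers on-prem
    await ctx.send(sender, StatusReply(answer=redacted))

if __name__ == "__main__":
    companion.run()
```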
Layer 2 — Pulse via ASI:One direct (cloud literature side, no PHI):
- Four specialist Agentverse uAgents: `pubmed-fetcher`, `fda-alerts`, `guidelines-watch`, `differential-educator`
- ASI:One reasons over the doctor's literature query → routes to the right specialist → returns a curated answer with citations
- Zero patient data ever flows here — strict architectural firewall (separate routes, separate sessions, separate auth scopes, no PHI fields in the request schema)
Registered Agentverse profiles:
The Companion app is a Kotlin Android app powered by the ZETIC Melange SDK, which dispatches inference automatically to NPU hardware (Qualcomm HTP/DSP, Google Tensor, MediaTek APU) without per-vendor code.
| Role | Model | Notes |
|---|---|---|
| Summarization + document extraction + insurance chat | Gemma 3 4B Instruct (Melange-supported) | All three text-generation features |
| LLM fallback | LiquidAI LFM2.5 1.2B Instruct (Melange-supported) | Drop-in if Gemma 3 4B can't sustain TTFT |
| STT (encounter audio) | Whisper-tiny (ONNX) | Multilingual ambient encounter audio |
| Embeddings (RAG) | all-MiniLM-L6-v2 (ONNX) | 384-dim, ~25 MB |
| OCR | Google ML Kit | Native Android, on-device |
| Triage acuity classifier | Distilled BERT-class (ONNX) | Sub-second on-device |
| Layer | Tech |
|---|---|
| Orchestrator + agents | FastAPI + Python, premise/evidence-binding contract enforced at the orchestrator |
| Database | Postgres + TimescaleDB + pgvector (encounters, orders, results, PA policies, agent_traces audit log, clinical taxonomy) |
| Speech-to-text | Whisper-large-v3 on the GX10 |
| Heavy reasoning agents | Qwen3-32B, FP16 (Scribe note structuring, Advocate, Conductor, Reality Check) |
| High-frequency agent | Qwen3-8B, FP16 (Quartermaster) |
| Web dashboard | React + Vite + shadcn/ui ("Show Our Work" UI) |
| Mobile companion | Kotlin + Jetpack Compose + ZETIC Melange |
| Cloud literature (Pulse) | Fetch.ai ASI:One + Agentverse uAgents |
All clinical grounding (HPO, ICD-10-CM, CPT, RxNorm, LOINC, curated PA policies) is loaded locally — no external API calls during inference.
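As an illustration of what "loaded locally" means in practice, a nearest-code lookup against a pgvector column might look like this; the `clinical_taxonomy` table, its columns, and the query shape are assumptions for illustration:

```python
# Local terminology grounding sketch; table and column names are hypothetical.
import psycopg

def nearest_icd10(conn: psycopg.Connection, query_embedding: list[float], k: int = 5):
    # The query embedding is computed on the GX10; nothing leaves the box.
    # pgvector accepts the '[x1,x2,...]' literal form; <=> is cosine distance.
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT code, description
            FROM clinical_taxonomy
            WHERE system = 'ICD-10-CM'
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (vec_literal, k),
        )
        return cur.fetchall()
```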
ATLAS isn't a single chatbot wearing a healthcare wig. It's five specialist agents that genuinely cooperate, each scoped to a real ED workflow.
Scribe. Whisper-large-v3 streams the encounter audio; Qwen3-32B (non-thinking mode for speed) structures it into a SOAP note, extracts orders, drafts Rx, and emits structured facts. Every fact must reference a transcript timestamp — hallucinations are caught by the contract, not by hope.
Quartermaster. Qwen3-8B watches the order/results event bus, cross-references department capacity and historical wait times, and surfaces predicted bottlenecks before the patient is stuck. "Last 3 chest-pain CT-PEs at this hour waited >60 min — proactively escalating to radiology."
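A rough sketch of the kind of heuristic behind that escalation; the threshold, field names, and escalation hook are assumptions, not the shipped logic:

```python
# Illustrative Quartermaster bottleneck heuristic over TimescaleDB history.
from statistics import median

def predict_wait_minutes(historical_waits: list[float]) -> float:
    """Median wait for this order type at this hour, from historical data."""
    return median(historical_waits) if historical_waits else 0.0

def maybe_escalate(order_type: str, hour: int, historical_waits: list[float],
                   threshold_min: float = 60.0) -> str | None:
    predicted = predict_wait_minutes(historical_waits)
    if predicted > threshold_min:
        # Quartermaster would push this message to the department queue via the event bus.
        return (f"Predicted {order_type} wait at {hour:02d}:00 is ~{predicted:.0f} min "
                f"(> {threshold_min:.0f} min); escalating proactively.")
    return None
```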
Advocate. Qwen3-32B in thinking mode matches chart values to insurer policy criteria (BCBS / Aetna / Cigna / UHC), assembles the PA packet, computes an approval probability, and exposes the full premise chain. The doctor can click any premise, mark it "incorrect," and Advocate regenerates live in under 3 seconds. This stops the "patient leaves the ED, then waits 18 days for the pharmacy" failure mode.
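A sketch of the correct-a-premise loop as a FastAPI endpoint; the route, payload fields, and regeneration helper are illustrative placeholders, not the actual Advocate code:

```python
# Hypothetical premise-correction endpoint for Advocate.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PremiseCorrection(BaseModel):
    encounter_id: str
    premise_id: str          # which link in the premise_chain the doctor flagged
    corrected_text: str      # the doctor's replacement premise

async def regenerate_pa_packet(encounter_id: str, pinned_premises: dict) -> dict:
    # Placeholder: the real orchestrator would re-run Qwen3-32B with the pinned
    # premises injected into the Advocate prompt and re-validate the contract.
    return {"encounter_id": encounter_id, "pinned": pinned_premises}

@app.post("/api/advocate/correct-premise")
async def correct_premise(body: PremiseCorrection):
    # Pin the corrected premise, then re-run only the Advocate chain so the
    # PA packet and approval probability update within a few seconds.
    packet = await regenerate_pa_packet(
        encounter_id=body.encounter_id,
        pinned_premises={body.premise_id: body.corrected_text},
    )
    return {"pa_packet": packet, "regenerated": True}
```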
Conductor. Qwen3-32B turns the full encounter context into a structured SBAR handoff at shift change or admission, plus discharge instructions and follow-up orders. The verbal-handoff safety hazard goes away.
Reality Check. Qwen3-32B in thinking mode reviews every output of the other four agents, flags hallucinations, surfaces missing evidence, catches contradictions, and overrides confidence scores. If Scribe's diagnosis disagrees with Advocate's policy match, Reality Check pauses the chain and asks the human. This is the agent compliance teams care about.
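One Reality Check rule sketched as plain code; the field names and pause mechanism are assumptions for illustration:

```python
# Illustrative cross-agent consistency rule: do Scribe's diagnoses overlap
# with the diagnosis criteria Advocate matched against the insurer policy?
def check_dx_policy_agreement(scribe_dx_codes: set[str],
                              advocate_policy_dx_codes: set[str]) -> dict:
    overlap = scribe_dx_codes & advocate_policy_dx_codes
    if not overlap:
        return {
            "status": "paused",
            "flag": "Scribe diagnoses and Advocate policy criteria do not overlap",
            "action": "ask_human",   # surfaces as a 🚩 Reality Check flag in the dashboard
        }
    return {"status": "ok", "matched_codes": sorted(overlap)}
```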
Every claim from every agent is a JSON object with mandatory
evidence_refs, policy_refs, premise_chain, and confidence. The
orchestrator rejects any output missing them. The dashboard surfaces
all of it: 📋 Evidence · 🧠 Premise chain · ❓ What we don't know ·
🚩 Reality Check flags · ⏮️ Replay · ✏️ Correct premise. Compliance gets
an audit trail. Joint Commission gets reproducibility. Doctors get the
ability to challenge any output and watch it correct itself live.
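A sketch of that contract as a Pydantic model plus the orchestrator-side gate; the top-level field names follow the text above, while the inner shapes are assumptions:

```python
# Claim contract sketch: mandatory evidence_refs, policy_refs, premise_chain, confidence.
from pydantic import BaseModel, Field, ValidationError

class EvidenceRef(BaseModel):
    source: str              # e.g. "transcript", "lab", "policy"
    locator: str             # e.g. a transcript timestamp "00:04:12" or a result id

class Claim(BaseModel):
    statement: str
    evidence_refs: list[EvidenceRef] = Field(min_length=1)
    policy_refs: list[str]
    premise_chain: list[str] = Field(min_length=1)
    confidence: float = Field(ge=0.0, le=1.0)

def accept_or_reject(raw_agent_output: dict) -> Claim:
    try:
        return Claim.model_validate(raw_agent_output)
    except ValidationError as err:
        # The orchestrator refuses the output and hands it back to the agent.
        raise ValueError(f"Claim rejected: contract violation: {err}") from err
```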
Pulse is a separate product surface that lives next to ATLAS but never touches patient data. It's the doctor's literature companion, powered by Fetch.ai's ASI:One.
A dedicated tab in the doctor's web dashboard, plus a standalone Telegram
bot. The chat UI looks like a clinical-grade ChatGPT — but every answer
arrives with explicit citations from PubMed / FDA / published guidelines,
and the network tab shows traffic on /pulse/*, never /api/*. The
channel separation is the firewall's UX surface.
- PubMed search with structured summary — "Summarize the latest CGRP biologic guidelines for cluster headache (2025)" routes via ASI:One to the `pubmed-fetcher` Agentverse uAgent
- FDA drug recall + safety alert feed — the `fda-alerts` agent surfaces active recalls and black-box warnings relevant to a query
- "What's new this week" digests — `guidelines-watch` curates ACC, AHA, NEJM, Lancet updates by specialty
- Hypothetical differential education — `differential-educator` answers "what's on the differential for X presentation in Y demographic" in general terms, never tied to a real patient
- Drug interaction explainer — built on public pharmacology, not patient meds
- Never receives patient identifiers, vitals, lab values, or chart text
- Never queries about specific patients ("the patient in bed 4")
- Never shares session state, cookies, or auth scopes with ATLAS
- The Pulse request schema has no PHI fields — even a malicious caller cannot smuggle PHI in
- ATLAS runs on localhost; Pulse is the only outbound surface
If a doctor asks a "patient-flavored" question, the UI nudges them to re-phrase it as a hypothetical: "Pulse can't see your patient. Want to ask about general management of [condition]?" Two-tier privacy (PHI on-prem, public literature in the cloud) is exactly how hospital compliance teams think about clinical AI.
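A sketch of what a PHI-free request schema with a hard "extra fields forbidden" rule can look like; the route and field names are illustrative, not the shipped Pulse service:

```python
# Pulse-side schema sketch: the firewall is structural, not a filter.
from fastapi import FastAPI
from pydantic import BaseModel, ConfigDict

pulse = FastAPI()  # runs as its own process on :8002; only /pulse/* routes exist here

class LiteratureQuery(BaseModel):
    model_config = ConfigDict(extra="forbid")   # unknown fields are rejected, so PHI can't be smuggled in
    question: str                               # general clinical question only
    specialty: str | None = None                # e.g. "neurology"

@pulse.post("/pulse/literature")
async def literature(q: LiteratureQuery):
    # Forwarded to ASI:One / the Agentverse specialists; never touches /api/* or the local DB.
    return {"routed_to": "pubmed-fetcher", "question": q.question}
```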
The Kotlin Android Companion is the patient-side dual of ATLAS Advocate: where Advocate handles prior auth for the doctor at discharge, the Companion app helps the patient navigate everything that lands on their side of that same insurance flow. Two main features anchor the demo.
Patient taps record at the start of the visit (with explicit on-device consent prompt). The full pipeline runs locally:
- Whisper-tiny streams encounter audio to a transcript on the phone
- The on-device LLM (Gemma 3 4B Instruct via ZETIC Melange) produces a structured `patient_summary` JSON: chief complaint, plain-English summary, diagnoses, medications with dose + instructions, follow-ups (shape sketched after this list)
- The summary view renders for the patient — "You have cluster headache. Start Emgality 240mg starter pack. Follow up with neurology in 7 days."
- Audio never leaves the phone. Even ATLAS doesn't see the patient-side recording. The hospital ledger and the patient ledger are intentionally parallel.
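The `patient_summary` shape, sketched in Python for readability (the Companion app itself is Kotlin); sub-fields beyond those listed above are assumptions:

```python
# Hypothetical patient_summary schema produced by the on-device LLM.
from pydantic import BaseModel

class MedicationInstruction(BaseModel):
    name: str                # e.g. "Emgality"
    dose: str                # e.g. "240 mg starter pack"
    instructions: str        # plain-English dosing instructions

class PatientSummary(BaseModel):
    chief_complaint: str
    plain_english_summary: str
    diagnoses: list[str]
    medications: list[MedicationInstruction]
    follow_ups: list[str]    # e.g. "Neurology in 7 days"
```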
Patient chats in plain English: "Is Emgality covered? What's my copay? What does this denial letter mean?" The pipeline (retrieval step sketched after the list):
- The patient's Summary of Benefits and Coverage (SBC), Explanations of Benefits (EOB), denial letters, and current encounter summary are chunked and embedded with all-MiniLM-L6-v2 (384-dim ONNX)
- Query embedding → brute-force cosine retrieval over a Kotlin list of ~hundreds of chunks (microseconds; no ChromaDB — see §10.7 of the System Overview for why ChromaDB doesn't fit on Android)
- Top-k chunks → on-device LLM produces an answer with mandatory chunk citations inline
- Citation-or-refuse contract: if the retrieved chunks don't actually support the answer, the assistant refuses rather than fabricates — "I can't find that in your documents — call your insurer."
- Every reply ends with a fixed disclaimer: "Educational only; verify with your insurer."
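The retrieval step sketched in Python for readability (the app implements it in Kotlin); the similarity floor and `k` are assumptions:

```python
# Brute-force cosine retrieval over a few hundred 384-dim chunks (microseconds),
# plus the citation-or-refuse gate when nothing in the documents supports an answer.
import numpy as np

def top_k_chunks(query_emb: np.ndarray, chunk_embs: np.ndarray,
                 chunks: list[str], k: int = 4, floor: float = 0.35):
    sims = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    order = np.argsort(-sims)[:k]
    hits = [(int(i), chunks[i], float(sims[i])) for i in order if sims[i] >= floor]
    if not hits:
        # Citation-or-refuse: the assistant declines instead of fabricating.
        return None
    return hits  # passed to the on-device LLM, which must cite these chunk ids inline
```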
The Companion app also includes a document vault (camera + ML Kit OCR for paper reports, structured field extraction by the on-device LLM) and a light intake screen (distilled BERT acuity classifier; pings the GX10 with a minimal intake bundle so ATLAS knows the patient is en route).
backend/ FastAPI orchestrator + 5 agents (Scribe / Quartermaster / Advocate / Conductor / Reality Check)
backend/db Postgres + TimescaleDB + pgvector schema
backend/pulse Fetch.ai ASI:One literature service (PHI-firewalled)
backend/scripts seed, prewarm, smoke, record_replay, eval_rc
web/ React + shadcn ED dashboard ("Show Our Work" UI)
mobile/ Kotlin Patient Companion app (ZETIC Melange on-device)
# 0. db
docker compose up -d db
# 1. backend
cd backend
pip install -r requirements.txt
cp .env.example .env # then fill ASI_ONE_API_KEY, AGENTVERSE_API_KEY, TELEGRAM_BOT_TOKEN
python scripts/seed.py
# 2. orchestrator on :8000 (Vite dev proxy maps /api → :8000)
ATLAS_LLM_BACKEND=ollama uvicorn orchestrator:app --reload --port 8000
# 3. Pulse on :8002 (separate process — that IS the firewall surface)
# Pulse uses ASI:One; ATLAS_LLM_BACKEND is ignored here.
uvicorn pulse.server:app --reload --port 8002
# 4. web on :5173
cd ../web && npm i && npm run dev

ATLAS_LLM_BACKEND selects the runtime: vllm (production, GX10), ollama (laptop dev / demo backup), or mock (CI). The orchestrator contract is identical across backends — agent prompts and structured outputs are unchanged.
✅ Decision support for clinicians · workflow automation · documentation assistance · coordination orchestration · administrative form generation
❌ Not a medical device · not autonomous (no auto-execution without clinician sign-off) · not a replacement for clinical judgment · not a PHI cloud relay
ATLAS sits in the Clinical Decision Support (CDS) safe harbor under the 21st Century Cures Act: used by healthcare professionals, evidence-based, traceable, and structurally reviewable via "Show Our Work." We're the workflow layer, not the medicine.