ThirdParty is a relationship mirror and guided journaling app for TreeHacks 2026.
- Next.js 14 App Router
- TypeScript
- Anthropic SDK + strict JSON validation with zod
- Local JSON storage in data/
Quick start
Install dependencies
npm install
Environment
cp .env.example .env.local
Edit .env.local and set at least:
OPENAI_API_KEY=sk-... (required for Voice tab transcription + diarization)
ANTHROPIC_API_KEY=... (for mediator/reflections; optional)
Never commit .env.local or paste keys into the repo.
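To fail fast when the environment step was skipped, a startup check could look like the sketch below. It is a minimal illustration, not the app's actual validation: the name `checkEnv` and the report shape are hypothetical, and the environment is passed in as a plain record (for example `process.env` in Node) so the sketch has no dependencies.

```typescript
// Hypothetical startup check for the keys listed in the Environment step.
// Plain TypeScript with no dependencies; the app itself may validate differently.

type EnvLike = Record<string, string | undefined>;

interface EnvReport {
  missingRequired: string[]; // keys the Voice tab cannot work without
  missingOptional: string[]; // keys that only enable extra features
}

// OPENAI_API_KEY: Voice tab transcription + diarization (required).
// ANTHROPIC_API_KEY: mediator/reflections (optional).
const REQUIRED_KEYS: readonly string[] = ["OPENAI_API_KEY"];
const OPTIONAL_KEYS: readonly string[] = ["ANTHROPIC_API_KEY"];

function checkEnv(env: EnvLike): EnvReport {
  // Unset, empty, and whitespace-only values all count as missing.
  const isMissing = (key: string): boolean => !env[key]?.trim();
  return {
    missingRequired: REQUIRED_KEYS.filter(isMissing),
    missingOptional: OPTIONAL_KEYS.filter(isMissing),
  };
}
```

Next.js loads .env.local into `process.env`, so `checkEnv(process.env).missingRequired` would list `OPENAI_API_KEY` if it was never set there.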
Run the app
npm run dev
Open http://localhost:3003. Use the Voice tab to upload or record and transcribe with speaker labels.
Optional: real speaker IDs (ECAPA-TDNN)
For persistent “who is this voice?” across sessions (not just placeholder labels):
- Install Python 3 and pip, then run the embedder in a second terminal:
npm run embedder
Or manually:
cd services/speaker_embedder
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 5000
- In .env.local add (or uncomment):
SPEAKER_EMBEDDER_URL=http://localhost:5000/embed
- Restart npm run dev. The Voice page will show “ECAPA-TDNN” when the embedder is reachable.
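The Voice page only reports “ECAPA-TDNN” when the embedder answers. A reachability probe along those lines is sketched below. `isEmbedderReachable` is a hypothetical name, and the sketch assumes that any HTTP response from `SPEAKER_EMBEDDER_URL` (even a 405 if the route only accepts POST) means the service is up; the app's real check may differ.

```typescript
// Hypothetical reachability probe for the speaker embedder.
// Assumption: any HTTP response means the service is up (a GET on a
// POST-only route such as /embed may legitimately return 405).

type ProbeFetch = (
  url: string,
  init?: { method?: string; signal?: AbortSignal },
) => Promise<{ status: number }>;

async function isEmbedderReachable(
  url: string | undefined, // e.g. the value of SPEAKER_EMBEDDER_URL
  fetchFn: ProbeFetch,
  timeoutMs = 2000,
): Promise<boolean> {
  if (!url) return false; // embedder not configured
  const controller = new AbortController();
  // Abort slow requests so a hung service does not block the caller.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    await fetchFn(url, { method: "GET", signal: controller.signal });
    return true;
  } catch {
    return false; // connection refused, DNS failure, or aborted by the timeout
  } finally {
    clearTimeout(timer);
  }
}
```

In Node 18+ or a browser, the global `fetch` can be passed as `fetchFn`; injecting it keeps the probe testable without a running embedder.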
Optional: audio conversion (webm/mp3 → WAV)
The app uses ffmpeg-static (installed with npm install) to convert uploads to 16 kHz mono WAV for best embedder results, so no separate ffmpeg install is needed. If conversion fails (e.g. an unsupported format), transcription still runs, but speaker IDs may be less accurate because the embedder is not used.
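The conversion described above maps to a handful of standard ffmpeg flags. The helper below only builds the argument list; `wavConversionArgs` is a hypothetical name, and the app may pass different options to the ffmpeg-static binary.

```typescript
// Hypothetical helper: ffmpeg arguments for "any upload → 16 kHz mono WAV".
// These are standard ffmpeg options; the app's exact invocation may differ.
function wavConversionArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-y",                // overwrite outputPath if it already exists
    "-i", inputPath,     // source upload (webm, mp3, ...)
    "-ac", "1",          // downmix to a single (mono) channel
    "-ar", "16000",      // resample to 16 kHz
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM samples
    outputPath,          // a .wav extension selects the WAV container
  ];
}
```

In Node, the array could be handed to `child_process.spawn` together with the binary path that `ffmpeg-static` provides as its default export.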
Same as Quick start above: npm install → copy .env.example to .env.local and set keys → npm run dev → optional npm run embedder + SPEAKER_EMBEDDER_URL.
The app now includes a conversation-awareness detector and recording pipeline:
POST /api/conversationAwareness/listen
- body: { "listeningEnabled": true | false }
GET /api/conversationAwareness/state
- returns detector state, recent sessions, and recent events
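A minimal client for the listen and state endpoints might look like the following. Only the paths and the `listeningEnabled` body come from this README; the function names are hypothetical, `fetchFn` is injected so the sketch can be exercised without a running server, and the state response is typed as `unknown` because its schema is not documented here.

```typescript
// Hypothetical client for the listen and state endpoints. Paths and the
// listeningEnabled body come from the README; everything else is assumed.

type JsonFetch = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Turn detector-driven listening on or off.
async function setListening(baseUrl: string, enabled: boolean, fetchFn: JsonFetch): Promise<void> {
  const res = await fetchFn(`${baseUrl}/api/conversationAwareness/listen`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ listeningEnabled: enabled }),
  });
  if (!res.ok) throw new Error(`listen request failed: HTTP ${res.status}`);
}

// Fetch detector state plus recent sessions and events (schema left as unknown).
async function getAwarenessState(baseUrl: string, fetchFn: JsonFetch): Promise<unknown> {
  const res = await fetchFn(`${baseUrl}/api/conversationAwareness/state`);
  if (!res.ok) throw new Error(`state request failed: HTTP ${res.status}`);
  return res.json();
}
```

Against the dev server, this would be called as `setListening("http://localhost:3003", true, fetch)`.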
POST /api/conversationAwareness/ingestSignal
- body: { "source": "microphone" | "meta_glasses" | "phone_camera", "audioLevel": 0..1, "presenceScore": 0..1, "speakerHints": [{ personTag, speakingScore }] }
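Before posting to ingestSignal, a client could check the payload against the documented shape. This is a hypothetical sketch (`validateIngestSignal` is not part of the app); it additionally assumes that `speakingScore` lies in 0..1, which the README does not state, and the server route may apply stricter or looser rules.

```typescript
// Hypothetical client-side check mirroring the documented ingestSignal body.

type SignalSource = "microphone" | "meta_glasses" | "phone_camera";

interface SpeakerHint {
  personTag: string;     // consented person tag
  speakingScore: number; // assumed to be 0..1 (not stated in the README)
}

interface IngestSignalBody {
  source: SignalSource;
  audioLevel: number;    // 0..1
  presenceScore: number; // 0..1
  speakerHints: SpeakerHint[];
}

const SOURCES: readonly SignalSource[] = ["microphone", "meta_glasses", "phone_camera"];

const inUnitRange = (x: number): boolean => Number.isFinite(x) && x >= 0 && x <= 1;

// Returns a list of problems; an empty list means the body matches the documented shape.
function validateIngestSignal(body: IngestSignalBody): string[] {
  const errors: string[] = [];
  // Runtime check, because JSON from a device is not guaranteed to match the type.
  if (!SOURCES.includes(body.source)) errors.push(`unknown source: ${body.source}`);
  if (!inUnitRange(body.audioLevel)) errors.push("audioLevel must be in [0, 1]");
  if (!inUnitRange(body.presenceScore)) errors.push("presenceScore must be in [0, 1]");
  body.speakerHints.forEach((hint, i) => {
    if (!hint.personTag) errors.push(`speakerHints[${i}].personTag is empty`);
    if (!inUnitRange(hint.speakingScore)) errors.push(`speakerHints[${i}].speakingScore must be in [0, 1]`);
  });
  return errors;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to show every issue in a debugging panel at once.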
POST /api/conversationAwareness/uploadClip
- body: { "sessionId": "...", "audioBase64": "...", "mimeType": "audio/webm" }
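Recorded audio has to be base64-encoded for the uploadClip body. The helpers below are a hypothetical sketch: they chunk the bytes before calling `btoa` so `String.fromCharCode` never receives an overly long argument list, and they default `mimeType` to `audio/webm` as in the documented example.

```typescript
// Hypothetical helpers for building the uploadClip request body.

interface UploadClipBody {
  sessionId: string;
  audioBase64: string;
  mimeType: string;
}

// Base64-encode raw bytes. Work in chunks so String.fromCharCode is never
// called with an argument list large enough to overflow the call stack.
function bytesToBase64(bytes: Uint8Array): string {
  const CHUNK = 0x8000; // 32 KiB per call
  let binary = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...Array.from(bytes.subarray(i, i + CHUNK)));
  }
  return btoa(binary); // btoa is global in browsers and Node 16+
}

function buildUploadClipBody(sessionId: string, audio: Uint8Array, mimeType = "audio/webm"): UploadClipBody {
  return { sessionId, audioBase64: bytesToBase64(audio), mimeType };
}
```

In a browser, the bytes could come from a MediaRecorder blob via `new Uint8Array(await blob.arrayBuffer())`.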
POST /api/metaGlasses/ingest
- body: { "deviceId": "...", "audioLevel": 0..1, "speakerHints": [{ personTag, speakingScore }] }
- Facial recognition is not implemented.
- Identity is based on consented person tags and speaker hints only.
- Raw captured audio is stored locally in data/awareness/clips and is not shared by the shared-session flow.
- Phone camera mode computes co-presence and motion scores only. It does not identify people and does not persist video frames.
- Go to /timeline.
- Tap the gear icon to open /settings.
- Start listening to activate microphone monitoring, optional phone camera co-presence monitoring, and detector-triggered recording.
- Use the Meta glasses signal panel to ingest device-side speaker hints.
Three pipelines:
OpenAI + speaker memory (recommended)
OpenAI gpt-4o-transcribe-diarize handles transcription + diarization (speaker turns). Speaker embeddings + clustering (cosine similarity, centroid updates) then build a persistent “who is this voice?” across sessions. No Azure Speaker Recognition is involved, and new speakers are discovered open-world. See docs/voice-pipeline.md.
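The clustering step can be pictured as nearest-centroid assignment with a similarity threshold. The sketch below is an illustration under stated assumptions, not the app's implementation: the 0.7 threshold, the running-mean centroid update, and the `speaker_N` IDs are all hypothetical; docs/voice-pipeline.md describes the real pipeline.

```typescript
// Hypothetical sketch of "cosine similarity + centroid updates" for speaker memory.
// Threshold, update rule, and ID format are assumptions, not the app's parameters.

interface SpeakerProfile {
  id: string;
  centroid: number[]; // mean of all embeddings assigned to this speaker
  count: number;      // number of embeddings absorbed so far
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Assign an embedding to the most similar known speaker, or open a new one.
function assignSpeaker(profiles: SpeakerProfile[], embedding: number[], threshold = 0.7): SpeakerProfile {
  let best: SpeakerProfile | undefined;
  let bestSim = -Infinity;
  for (const profile of profiles) {
    const sim = cosineSimilarity(profile.centroid, embedding);
    if (sim > bestSim) {
      bestSim = sim;
      best = profile;
    }
  }
  if (best !== undefined && bestSim >= threshold) {
    const match = best;
    match.count += 1;
    const n = match.count;
    // Running mean: new centroid = old centroid + (x - old centroid) / n.
    match.centroid = match.centroid.map((c, i) => c + (embedding[i] - c) / n);
    return match;
  }
  // Nothing similar enough: this is a previously unseen voice.
  const created: SpeakerProfile = { id: `speaker_${profiles.length + 1}`, centroid: [...embedding], count: 1 };
  profiles.push(created);
  return created;
}
```

The threshold is what makes discovery open-world: an embedding that matches no existing centroid well enough creates a new profile instead of being forced onto an enrolled speaker.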
Pyannote diarization + speaker memory (optional)
Run the local pyannote diarization service and set:
VOICE_DIARIZATION_BACKEND=pyannote
PYANNOTE_DIARIZER_URL=http://localhost:5010/diarize
This uses pyannote for speaker-turn detection and keeps the same speaker clustering and persistent-profile pipeline.
Google + Azure (optional)
Google Speech-to-Text for diarization; Azure Speaker Recognition to identify enrolled speakers only.
See Quick start above. In short:
- Set OPENAI_API_KEY in .env.local (never commit it).
- Real speaker IDs (optional): Run npm run embedder in a second terminal (or run the Python service manually; see Quick start). Set SPEAKER_EMBEDDER_URL=http://localhost:5000/embed in .env.local. Details: docs/speaker-embedding-analysis.md.
- Audio conversion: The app uses ffmpeg-static (installed with npm) to convert uploads to 16 kHz mono WAV; no separate ffmpeg install needed.
- In a second terminal run npm run diarizer, or follow services/pyannote/README.md.
- In .env.local set:
VOICE_DIARIZATION_BACKEND=pyannote
PYANNOTE_DIARIZER_URL=http://localhost:5010/diarize
- Optionally keep OPENAI_API_KEY set as a fallback in case the local pyannote service is unavailable.
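The backend choice and fallback described above can be summarized as a small decision function. `chooseDiarizationBackend` is hypothetical, and how the app actually detects that the pyannote service is unavailable (for example, a failed request at transcription time) is an assumption left to the caller via `pyannoteReachable`.

```typescript
// Hypothetical sketch: pick pyannote when it is configured and reachable,
// otherwise fall back to OpenAI if OPENAI_API_KEY is set.

type EnvLike = Record<string, string | undefined>;
type DiarizationBackend = "pyannote" | "openai";

function chooseDiarizationBackend(env: EnvLike, pyannoteReachable: boolean): DiarizationBackend {
  const wantsPyannote =
    env.VOICE_DIARIZATION_BACKEND === "pyannote" && Boolean(env.PYANNOTE_DIARIZER_URL);
  if (wantsPyannote && pyannoteReachable) return "pyannote";
  // Either pyannote was not requested (OpenAI is the default) or it is down.
  if (env.OPENAI_API_KEY) return "openai";
  throw new Error(
    wantsPyannote
      ? "pyannote service unavailable and no OPENAI_API_KEY set for fallback"
      : "no diarization backend configured: set OPENAI_API_KEY or VOICE_DIARIZATION_BACKEND=pyannote",
  );
}
```

Failing loudly when neither backend is usable is a deliberate choice in this sketch; silently skipping diarization would make missing speaker labels harder to diagnose.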
Google Cloud
- Create a project and enable the Speech-to-Text API.
- Create a service account, download a JSON key, and set in .env.local:
GOOGLE_APPLICATION_CREDENTIALS=/absolute/path/to/your-key.json
- Or use gcloud auth application-default login and set GOOGLE_CLOUD_PROJECT=your-project-id.
Azure
- Create a Speech resource and in .env.local set:
AZURE_SPEECH_KEY=your-key
AZURE_SPEECH_REGION=westus (or your region)
Copy .env.example to .env.local and fill in the keys.
- Voice tab: Choose “OpenAI + speaker memory” (default) or “Google + Azure”. Upload or record → “Transcribe & identify”. With OpenAI, segments get stable speaker IDs over time, and you can name speakers via PATCH /api/voice/speakers. With Google + Azure, enroll people in People → person → “Enroll voice”, then transcribe to match speakers against those enrolled voices.
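Naming a speaker from a script could look like the sketch below. Only the method and path come from this README; the function name `nameSpeaker` and the body fields `speakerId` and `name` are assumptions, so check the route handler for the real field names before relying on it.

```typescript
// Hypothetical client for naming a speaker via PATCH /api/voice/speakers.
// The body fields speakerId and name are assumptions, not a documented schema.

type PatchFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

async function nameSpeaker(baseUrl: string, speakerId: string, name: string, fetchFn: PatchFetch): Promise<void> {
  const trimmed = name.trim();
  if (!trimmed) throw new Error("speaker name must not be empty");
  const res = await fetchFn(`${baseUrl}/api/voice/speakers`, {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ speakerId, name: trimmed }), // hypothetical body shape
  });
  if (!res.ok) throw new Error(`naming speaker failed: HTTP ${res.status}`);
}
```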
TreeHacks 2026 project