Find your Bollywood celebrity doppelgänger using AI-powered facial recognition and vector search!
Upload a photo of yourself, and the app finds which Bollywood celebrities you resemble most. It compares your facial features against 12,000+ celebrity images and returns your top matches.
Key Features:
- 📸 Upload any photo with a visible face
- 🎯 Get top 3 celebrity matches with similarity scores
- 👨/👩 Filter results by gender (Male/Female)
- ⚡ Real-time results powered by vector search
- 🎬 100 Bollywood celebrities including Shah Rukh Khan, Deepika Padukone, Ranveer Singh, and more!
┌─────────────────────────────────────────────────────────────────────────────┐
│ USER UPLOADS PHOTO │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 1: FACE DETECTION & EMBEDDING (Local - Your Machine) │
│ ───────────────────────────────────────────────────────────────────────── │
│ • InsightFace AI model detects the face in your photo │
│ • Locates facial landmarks to align the face │
│ • Converts face into a 512-dimensional vector (embedding) │
│ • This vector is a mathematical representation of your facial features │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 2: VECTOR SIMILARITY SEARCH (Couchbase Capella) │
│ ───────────────────────────────────────────────────────────────────────── │
│ • Your face embedding is sent to Couchbase │
│ • Vector Search finds the most similar celebrity embeddings │
│ • Uses dot product similarity to calculate match scores │
│ • Returns top matches ranked by similarity │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 3: DISPLAY RESULTS │
│ ───────────────────────────────────────────────────────────────────────── │
│ • Shows your photo alongside top 3 celebrity matches │
│ • Displays similarity percentage for each match │
│ • Celebrity name and gender badge shown │
└─────────────────────────────────────────────────────────────────────────────┘
A face embedding is a numerical representation of facial features. The model is trained so that the embedding reflects characteristics such as:
- Face shape and structure
- Eye position, size, and shape
- Nose characteristics
- Mouth and lip features
- Jawline contours
- Overall facial proportions
These features are encoded into a 512-number array (vector). Similar-looking faces have similar vectors, which allows us to find matches using mathematical distance calculations.
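To make "similar faces have similar vectors" concrete, here is a minimal sketch that compares toy vectors with cosine similarity. The 4-dimensional vectors and the variable names are made up for illustration; real embeddings in this app have 512 values.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional stand-ins for real 512-dimensional embeddings.
query = [0.9, 0.1, 0.3, 0.2]
lookalike = [0.8, 0.2, 0.4, 0.1]   # points in nearly the same direction
stranger = [-0.5, 0.9, -0.2, 0.7]  # points in a very different direction

# The closer vector scores higher than the distant one.
print(cosine_similarity(query, lookalike) > cosine_similarity(query, stranger))
```

A score near 1 means the two vectors point in almost the same direction; a score near 0 or below means they have little in common.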
The match percentage represents how similar your facial features are to a celebrity:
- 90%+ = Very strong resemblance
- 70-90% = Notable similarity
- 50-70% = Some shared features
- Below 50% = Minimal similarity
The score is the dot product between your face embedding and each celebrity embedding. When the embeddings are L2-normalized (unit length), the dot product equals cosine similarity.
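Dot product and cosine similarity coincide only for unit-length vectors, so normalization matters. The sketch below normalizes two vectors, scores them, and maps the score onto the bands listed above. How the app converts a raw score to a percentage is not documented here, so `to_percentage` (clamp to 0..1, then scale to 0..100) and the exact band boundaries are assumptions, as are all function names.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def to_percentage(score: float) -> float:
    # Assumption: clamp the score to [0, 1] and scale to a 0-100 percentage.
    return round(max(0.0, min(1.0, score)) * 100, 1)


def describe(percentage: float) -> str:
    # Assumption: each band includes its lower bound.
    if percentage >= 90:
        return "Very strong resemblance"
    if percentage >= 70:
        return "Notable similarity"
    if percentage >= 50:
        return "Some shared features"
    return "Minimal similarity"


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
pct = to_percentage(dot(a, b))  # dot of unit vectors == cosine similarity
print(pct, describe(pct))
```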
Couchbase Capella is the cloud-hosted NoSQL database that stores all celebrity data.
Each celebrity image is stored as a JSON document:
{
"type": "celebrity_face",
"celebrity_id": 4,
"celebrity_name": "Shah Rukh Khan",
"filename": "0004_srk_42.jpg",
"gender": "male",
"embedding": [0.023, -0.045, 0.089, ... ] // 512 numbers
}
Couchbase's vector search capability enables:
- Index: `celebrity_face_index` (indexes the `embedding` field)
- Dimensions: 512 (matching InsightFace output)
- Similarity Metric: Dot Product
- Optimized for: Recall (finding best matches)
The vector search performs Approximate Nearest Neighbor (ANN) search to efficiently find similar faces among 12,000+ embeddings in milliseconds.
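For intuition, the exact search that ANN approximates can be written as a brute-force scan: score every document, optionally filter by gender, and keep the best few. This is only a reference sketch in plain Python, not how Couchbase executes the query. The field names follow the sample document above; `top_matches`, the placeholder celebrity names, and the 3-dimensional vectors are hypothetical.

```python
import heapq


def top_matches(query, documents, k=3, gender=None):
    """Return the k highest-scoring (score, celebrity_name) pairs."""
    candidates = (
        doc for doc in documents
        if gender is None or doc["gender"] == gender
    )
    scored = (
        (sum(q * e for q, e in zip(query, doc["embedding"])),
         doc["celebrity_name"])
        for doc in candidates
    )
    # nlargest returns results sorted from best to worst score.
    return heapq.nlargest(k, scored)


# Toy unit-length vectors standing in for 512-dimensional embeddings.
documents = [
    {"celebrity_name": "Celebrity A", "gender": "male", "embedding": [1.0, 0.0, 0.0]},
    {"celebrity_name": "Celebrity B", "gender": "female", "embedding": [0.8, 0.6, 0.0]},
    {"celebrity_name": "Celebrity C", "gender": "male", "embedding": [0.0, 1.0, 0.0]},
    {"celebrity_name": "Celebrity D", "gender": "female", "embedding": [0.6, 0.8, 0.0]},
]
query = [1.0, 0.0, 0.0]

print([name for _, name in top_matches(query, documents)])
print([name for _, name in top_matches(query, documents, gender="female")])
```

A full scan costs one dot product per stored embedding on every request. An ANN index trades a small amount of recall for a much smaller search cost, which is why the "Optimized for: Recall" setting is relevant.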
Organized data structure:
Bucket: celebrities
└── Scope: scope
└── Collection: celeb (12,094 documents)
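A small sketch of how a document for this collection might be assembled and checked before upload. The helper names are hypothetical, and the accepted gender labels (`male`, `female`) are an assumption based on the sample document and the UI filter.

```python
EMBEDDING_DIMS = 512  # matches the index dimensions listed above


def make_celebrity_document(celebrity_id: int, celebrity_name: str,
                            filename: str, gender: str,
                            embedding: list[float]) -> dict:
    """Build one document in the shape of the sample JSON."""
    if gender not in ("male", "female"):
        raise ValueError(f"unexpected gender label: {gender!r}")
    if len(embedding) != EMBEDDING_DIMS:
        raise ValueError(
            f"expected {EMBEDDING_DIMS} values, got {len(embedding)}"
        )
    return {
        "type": "celebrity_face",
        "celebrity_id": celebrity_id,
        "celebrity_name": celebrity_name,
        "filename": filename,
        "gender": gender,
        "embedding": list(embedding),
    }


def keyspace(bucket: str = "celebrities", scope: str = "scope",
             collection: str = "celeb") -> str:
    """Return the backtick-escaped bucket.scope.collection path."""
    return ".".join(f"`{part}`" for part in (bucket, scope, collection))


doc = make_celebrity_document(4, "Shah Rukh Khan", "0004_srk_42.jpg",
                              "male", [0.0] * EMBEDDING_DIMS)
print(keyspace(), len(doc["embedding"]))
```

Rejecting wrong-length vectors early is useful because the index is defined for exactly 512 dimensions.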
| Component | Technology | Purpose |
|---|---|---|
| Frontend | HTML, CSS, JavaScript | User interface |
| Backend | FastAPI (Python) | REST API server |
| Face AI | InsightFace (buffalo_l) | Face detection & embeddings |
| Database | Couchbase Capella | Document & vector storage |
| Search | Couchbase Vector Search | Similarity matching |
| Runtime | Apple M4 Mac | Local embedding generation |
Bollywood Celebrity Faces Dataset
| Metric | Value |
|---|---|
| Total Images | 12,094 |
| Celebrities | 100 |
| Male Celebrities | 49 (~5,085 images) |
| Female Celebrities | 51 (~6,874 images) |
| Embedding Dimensions | 512 |
Sample Celebrities:
- Shah Rukh Khan, Salman Khan, Aamir Khan
- Deepika Padukone, Priyanka Chopra, Alia Bhatt
- Ranveer Singh, Hrithik Roshan, Ranbir Kapoor
- Kareena Kapoor, Katrina Kaif, Anushka Sharma
- ...and 88 more!
┌──────────────────────────────────────────────────────────────────────────┐
│ CLIENT (Browser) │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ • Upload photo │ │
│ │ • Select gender filter │ │
│ │ • View results │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
│
▼ HTTP POST /api/find-lookalike
┌──────────────────────────────────────────────────────────────────────────┐
│ FASTAPI SERVER (Local) │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 1. Receive uploaded image │ │
│ │ 2. Pass to InsightFace model │ │
│ │ 3. Get 512-dim face embedding │ │
│ │ 4. Query Couchbase vector search │ │
│ │ 5. Return matched celebrities │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────────────┐
│ INSIGHTFACE (Local) │ │ COUCHBASE CAPELLA (Cloud) │
│ ┌───────────────────┐ │ │ ┌───────────────────────────┐ │
│ │ buffalo_l model │ │ │ │ Vector Search Index │ │
│ │ Face detection │ │ │ │ 12,094 celebrity docs │ │
│ │ 512-dim embedding │ │ │ │ Dot product similarity │ │
│ └───────────────────┘ │ │ └───────────────────────────┘ │
└─────────────────────────┘ └─────────────────────────────────┘
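The five server steps in the diagram can be sketched as one function with the two dependencies passed in as callables. This is an illustration of the flow, not the code in `app/main.py`: `find_lookalike`, `embed`, and `search` are hypothetical names, and in the real app they would wrap InsightFace and the Couchbase query.

```python
from typing import Callable, Optional, Sequence

EMBEDDING_DIMS = 512


def find_lookalike(
    image_bytes: bytes,
    gender: Optional[str],
    embed: Callable[[bytes], Optional[Sequence[float]]],
    search: Callable[[Sequence[float], Optional[str], int], list],
    top_k: int = 3,
) -> list:
    """Run the upload -> embedding -> vector search flow."""
    # Step 1: receive the uploaded image.
    if not image_bytes:
        raise ValueError("empty upload")
    # Steps 2-3: the face model runs locally and returns an embedding,
    # or None when no face is found (an assumed convention).
    embedding = embed(image_bytes)
    if embedding is None:
        raise ValueError("no face detected")
    if len(embedding) != EMBEDDING_DIMS:
        raise ValueError("unexpected embedding size")
    # Steps 4-5: only the embedding and the filter reach the database.
    return search(embedding, gender, top_k)


# Stubs standing in for InsightFace and Couchbase.
def fake_embed(data: bytes) -> list:
    return [0.0] * EMBEDDING_DIMS


def fake_search(embedding, gender, k) -> list:
    return [{"celebrity_name": "Celebrity A", "score": 0.9}][:k]


print(find_lookalike(b"raw image bytes", "male", fake_embed, fake_search))
```

Passing the model and the database in as parameters keeps the flow testable without downloading model weights or connecting to Capella.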
- ✅ Your photo never leaves your machine for AI processing
- ✅ Only the numerical embedding (512 numbers) is sent to the cloud
- ✅ Embeddings are numeric vectors, not viewable images (they are still biometric data, so handle them with care)
- ✅ No photos are stored on servers
# 1. Activate environment
source venv/bin/activate
# 2. Start the server
uvicorn app.main:app --reload --port 8000
# 3. Open browser
open http://localhost:8000
celebapp/
├── app/
│ ├── main.py # FastAPI application
│ ├── embedding.py # InsightFace face embedding
│ └── couchbase_client.py # Couchbase vector search
├── static/
│ └── index.html # Web UI
├── scripts/
│ ├── process_all_bollywood.py # Dataset processing
│ └── add_bollywood_gender.py # Gender labeling
├── data/
│ ├── bollywood_full_processed/ # Celebrity images
│ └── bollywood_full_embeddings/ # Pre-computed embeddings
├── requirements.txt
└── README.md
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/api/find-lookalike` | POST | Find celebrity matches |
| `/api/celebrity-image/{filename}` | GET | Get celebrity image |
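As a small usage sketch, a client can build the image URL for a match from the `filename` field of a result. The base URL assumes the local server started on port 8000 as described above, and `celebrity_image_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumes the local uvicorn server


def celebrity_image_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the URL for /api/celebrity-image/{filename}."""
    # Percent-encode the filename so spaces or slashes cannot alter the path.
    return f"{base_url}/api/celebrity-image/{quote(filename, safe='')}"


print(celebrity_image_url("0004_srk_42.jpg"))
```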
- Couchbase Capella - Cloud database with vector search
- InsightFace - State-of-the-art face recognition
- FastAPI - Modern Python web framework
- Apple Silicon - Optimized for M-series Macs
Made with ❤️ using Couchbase Vector Search