ADREI (AI-Driven Real Estate Intelligence) is an advanced property analysis platform that uses a sophisticated multi-agent AI framework with Bright Data APIs for web scraping, Google Gemini AI for intelligent data extraction and analysis, and Mapbox for location-based filtering.
- Intelligent Query Caching:
  - AI detects similar queries (e.g., "apartments near UC Berkeley" = "housing near Berkeley campus")
  - Automatically reuses previously scraped data
  - Skips Stage 1 & 2 entirely for duplicate searches
  - Saves API costs and time!
- Five-Stage Processing (Stages 0-4):
  - Stage 0: Check query cache with AI similarity detection
  - Stage 1: Quick scan for address, price, and URL (if not cached)
  - Stage 2: Parallel deep scrape that sends all requests to Bright Data simultaneously (6-8x faster than sequential scraping), followed by AI detail extraction (bedrooms, bathrooms, amenities, utilities, contact info) stored in metadata
  - Stage 3: Distance-based filtering with AI-powered location understanding + Mapbox geocoding
  - Stage 4: AI-powered property management quality analysis & scoring (0-100)
- Intelligent Search: Uses Bright Data's SERP API to search Google for apartment listings
- Multi-Site Scraping: Automatically visits and scrapes multiple apartment listing websites
- AI-Powered Parsing: Leverages Google Gemini to extract structured data from diverse HTML layouts
- Location Intelligence:
  - AI corrects spelling mistakes and typos automatically
  - Expands abbreviations (e.g., "MIT" → full name with location)
  - Calculates walking distances from any location (universities, workplaces, etc.)
  - Falls back to straight-line distance if route calculation fails
- Smart Caching:
  - Avoids re-scraping already visited apartments
  - Caches geocoded coordinates for all apartments
  - Caches distance calculations between apartment + target location pairs
  - Caches management analysis results permanently
  - Up to 90% API cost savings on repeat queries!
- Unique IDs: Each apartment gets a deterministic ID for linking data from multiple sources
- Price Filtering: Filter by price (e.g., "under $2000")
- Rich Data Storage: Saves each listing's full page content (raw Markdown) and comprehensive metadata for future reference
  - Basic info: address, price, URL
  - Apartment details: bedrooms, bathrooms, square feet, deposit, lease length
  - Amenities: pool, gym, parking, etc.
  - Utilities: water, electricity, internet (included or tenant pays)
  - Contact info: phone, email, office hours
  - Pet policy and parking details
- Management Analysis:
  - AI-powered reputation research
  - Quality scoring (0-100 scale)
  - Strengths, concerns, and recommendations
  - Permanently cached to avoid re-analysis
- Detailed Apartment Viewer:
  - Query "look into apartment 3" to see ALL details for a specific apartment
  - Shows bedrooms, bathrooms, amenities, utilities, contact info
  - Displays comprehensive scores from all research agents
  - Shows management analysis with key strengths and concerns
- 4 Specialized Research Agents (Micro-Location Analysis):
  - Security Agent: Street lighting, crime, foot traffic, security features (5 sub-metrics)
  - Accessibility Agent: Sidewalks, healthcare, transit, wheelchair access (5 sub-metrics)
  - Pet Friendliness Agent: Parks, vets, pet stores, walking quality (5 sub-metrics)
  - Lifestyle Agent: Dining, nightlife, entertainment, shopping, fitness (5 sub-metrics)
  - Each agent analyzes the apartment's specific street plus a 2-3 block radius (not city-wide!)
  - Provides overall score + multiple sub-metrics + key findings + detailed reasoning
  - Answers ANY question: "is it safe?", "good for pets?", "wheelchair accessible?", "good nightlife?"
  - Cached permanently (instant results on re-query)
- High Performance:
  - Parallel processing in Stage 2 (all apartments scraped simultaneously)
  - 6-8x faster than sequential scraping
  - Bright Data cloud handles concurrent requests efficiently
STAGE 0: INTELLIGENT QUERY CACHE
User Query → Gemini AI (Normalize Query) → Check Query Cache →
Compare with Previous Queries → [MATCH?] → Load Cached Apartments
[NO MATCH] → Stage 1
STAGE 1: DISCOVERY
SERP API (Google Search) → Filter Relevant Sites →
Web Unlocker (Scrape) → Gemini AI (Extract Address/Price/URL) →
Price Filter
STAGE 2: PARALLEL COLLECTION
Batch All Apartments → Parallel Requests to Bright Data Cloud →
Wait for All Responses → Save Raw Markdown + Metadata → Generate Unique IDs →
Save Query to Cache
STAGE 3: LOCATION FILTERING (Optional)
User Distance Query → Gemini AI (Interpret & Correct Location) →
Mapbox Geocoding (Target Location) → Mapbox Geocoding (Each Apartment) →
Calculate Walking Distances → Filter by Radius → Save Distance Data
STAGE 4: MANAGEMENT ANALYSIS (Optional)
User Details Query → Load Apartment Markdown → Gemini AI (Extract Management Name) →
Gemini AI (Analyze Reputation & Quality) → Score 0-100 →
Save Analysis to Metadata → Display Results
- Bright Data Account (https://brightdata.com)
  - API Key
  - SERP Zone configured
  - Web Unlocker Zone configured
- Google AI Studio Account (https://ai.google.dev/)
  - Google API Key for Gemini
- Mapbox Account (https://www.mapbox.com/)
  - Mapbox Access Token (for geocoding and distance calculations)
- Node.js (v16 or higher)
cd "C:\Studies\Project\CAL HACKS 12.0\Room Hunt"
npm install
This will install:
- dotenv - Environment variable management
- node-fetch - HTTP client for API calls
- @google/generative-ai - Google Gemini AI SDK
- @mapbox/mapbox-sdk - Mapbox Geocoding & Directions API
- readline - CLI interface
Create a .env file in the project root with the following:
# Bright Data API Configuration
BRIGHTDATA_API_KEY="your_brightdata_api_key_here"
BRIGHTDATA_SERP_ZONE="your_serp_zone_id_here"
BRIGHTDATA_UNLOCKER_ZONE_ID="your_unlocker_zone_id_here"
# Google AI Configuration
GOOGLE_API_KEY="your_google_api_key_here"
# Mapbox Configuration (Optional - hardcoded token included, but you can override)
MAPBOX_ACCESS_TOKEN="your_mapbox_access_token_here"
Bright Data:
- Log in to your Bright Data dashboard
- Go to Zones → Find your SERP Zone → Copy the Zone ID
- Go to Zones → Find your Web Unlocker Zone → Copy the Zone ID
- Go to Settings → API → Copy your API Key
Google Gemini:
- Visit https://ai.google.dev/
- Click Get API Key
- Create a new project or select existing
- Copy your API key
Mapbox:
- Visit https://www.mapbox.com/
- Sign up for a free account
- Go to Account → Access tokens
- Copy your default public token (or create a new one)
- Note: Free tier includes 100,000 requests/month
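Before starting the app, it can help to confirm that the variables from the .env template are actually set. The following is a minimal sketch rather than code from this project: `findMissingEnvVars` is a hypothetical helper, and it assumes the three Bright Data variables and `GOOGLE_API_KEY` are required while `MAPBOX_ACCESS_TOKEN` stays optional.

```javascript
// Hypothetical helper: lists required variables that are missing or blank.
// The variable names come from the .env template above; treating exactly
// these four as required is an assumption.
const REQUIRED_ENV_VARS = [
  'BRIGHTDATA_API_KEY',
  'BRIGHTDATA_SERP_ZONE',
  'BRIGHTDATA_UNLOCKER_ZONE_ID',
  'GOOGLE_API_KEY',
];

function findMissingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

const missingVars = findMissingEnvVars();
if (missingVars.length > 0) {
  console.warn(`Missing environment variables: ${missingVars.join(', ')}`);
}
```

In a real run, the .env file would need to be loaded first (for example with `require('dotenv').config()`) so that `process.env` contains the values.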
Test Bright Data connectivity:
npm test
This runs the demo script to verify your API credentials work.
Start the ADREI web server:
npm run web
Then open your browser to: http://localhost:3000
Server file: web-server.js
Features:
- Professional white-themed UI with intelligent chat assistant
- Dynamic apartment cards with comprehensive property data
- Interactive ADREI Assistant (same intelligence as CLI)
- Real-time updates via Server-Sent Events
- Detailed property viewer with full analysis
- Direct links to original listings
npm start
or
node apartment-finder.js
The application understands several types of queries:
Enter your query: apartments for rent near Chico State
Enter your query: 2 bedroom apartments in San Francisco under $3000
Enter your query: studio apartments near UCLA under $2000
Enter your query: affordable housing in Seattle
Enter your query: within 1 mile of UC Berkeley
Enter your query: within 1 mile of UC Berkley
(AI auto-corrects: "Berkley" → "Berkeley")
Enter your query: less than 2 miles from chico state
(AI expands to "California State University Chico, Chico, California")
Enter your query: within 0.5 miles of Stanford
Enter your query: under 3 kilometers from MIT
(AI expands to "Massachusetts Institute of Technology, Cambridge, Massachusetts")
The AI handles typos and abbreviations automatically!
Enter your query: give me details
Enter your query: show me a report
Enter your query: provide information
Enter your query: analyze management
Enter your query: show reviews
Enter your query: quality score
AI analyzes property management reputation and scores 0-100!
Enter your query: look into apartment 3
Enter your query: show me apartment 1
Enter your query: details for apartment 5
Enter your query: apartment 2 details
Shows ALL info: beds/baths, amenities, utilities, contact, management analysis, research scores!
Enter your query: is it safe?
→ Security Agent researches crime, police presence, street safety for EACH apartment
Enter your query: good for pets?
→ Pet Friendliness Agent researches dog parks, vets, pet stores nearby
Enter your query: wheelchair accessible?
→ Accessibility Agent researches sidewalks, transit, healthcare access
Enter your query: good nightlife?
→ Lifestyle Agent researches dining, entertainment, nightlife
Enter your query: are there homeless in this area?
→ Security Agent analyzes homelessness issues for each specific street
Each agent scores 0-100 + provides detailed findings for EACH apartment's street!
- Query Normalization: Gemini AI extracts search intent (location, price, bedrooms, keywords)
- Similarity Detection: Compares the normalized intent with previous queries to find matches
  - "apartments near UC Berkeley" matches "housing near Berkeley campus"
  - "rentals in Berkeley under $2000" matches "apartments in Berkeley under $2500"
- Cache Hit: If a similar query is found, loads its apartments instantly (skips Stage 1 & 2!)
- Cache Miss: Proceeds to Stage 1 & 2
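The cache lookup described above can be illustrated with a small deterministic stand-in. This is only a sketch under stated assumptions: in ADREI the similarity decision is made by Gemini, whereas here two intents are compared with a simple rule. The intent shape (`location`, `maxPrice`, `bedrooms`), the cache entry shape, and the functions `intentsMatch` and `findCachedQuery` are all hypothetical.

```javascript
// Hypothetical intent shape: { location, maxPrice, bedrooms }.
// In the real pipeline Gemini produces the intent; here it is written by hand.

function normalizeText(value) {
  return String(value || '').toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// Assumed rule: two intents are similar when they name the same location and
// do not disagree on bedroom count. Price is ignored in this sketch, mirroring
// the README example where "under $2000" matches "under $2500".
function intentsMatch(a, b) {
  if (normalizeText(a.location) !== normalizeText(b.location)) return false;
  if (a.bedrooms != null && b.bedrooms != null && a.bedrooms !== b.bedrooms) return false;
  return true;
}

// Returns the first cached entry whose intent matches, or null on a cache miss.
function findCachedQuery(cacheEntries, intent) {
  return cacheEntries.find((entry) => intentsMatch(entry.intent, intent)) || null;
}

const queryCache = [
  { intent: { location: 'Berkeley, CA', maxPrice: 2000, bedrooms: null }, apartmentIds: ['apt_8f3a9c2b'] },
];
const cacheHit = findCachedQuery(queryCache, { location: 'berkeley ca', maxPrice: 2500, bedrooms: null });
console.log(cacheHit ? cacheHit.apartmentIds : 'cache miss');
```

On a hit, the stored apartment IDs would be loaded instead of running Stage 1 and Stage 2; on a miss, the pipeline continues as usual.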
- Search Phase: Searches Google for relevant apartment listing websites
- Filtering Phase: Filters results to apartment-focused sites (excludes Zillow, social media, etc.)
- Quick Scan: Scrapes top sites and extracts basic info:
  - Address
  - Price
  - Listing URL
- Price Filtering: Filters by price if specified (e.g., "under $2000")
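The price filtering step can be sketched as follows. This is an illustrative example, not the project's implementation: `parseMaxPrice`, `parseListingPrice`, and `filterByPrice` are hypothetical names, and the sketch assumes prices arrive as strings such as "$2,500/month". A dollar sign is required in the query so that distance phrases like "less than 2 miles" are not mistaken for prices.

```javascript
// Extracts a maximum price from phrases such as "under $2000" or "below $2,500".
// Returns null when the query contains no such phrase.
function parseMaxPrice(query) {
  const match = /(?:under|below|less than)\s*\$\s*(\d[\d,]*)/i.exec(query);
  return match ? Number(match[1].replace(/,/g, '')) : null;
}

// Converts a listing price such as "$2,500/month" to the number 2500.
// Returns null when the text contains no digits.
function parseListingPrice(priceText) {
  const match = /\d[\d,]*(?:\.\d+)?/.exec(String(priceText));
  return match ? Number(match[0].replace(/,/g, '')) : null;
}

// Keeps only apartments priced strictly under the limit named in the query.
// Apartments with an unparseable price are dropped when a limit is present.
function filterByPrice(apartments, query) {
  const maxPrice = parseMaxPrice(query);
  if (maxPrice === null) return apartments; // no price constraint in the query
  return apartments.filter((apt) => {
    const price = parseListingPrice(apt.price);
    return price !== null && price < maxPrice;
  });
}

const listings = [
  { address: '123 Main St, Berkeley, CA 94704', price: '$2,500/month' },
  { address: '456 Oak Ave, Berkeley, CA 94705', price: '$2,800/month' },
];
console.log(filterByPrice(listings, 'apartments in Berkeley under $2600').map((apt) => apt.address));
```

Whether listings without a parseable price should be dropped or kept is a design choice; this sketch drops them so that the result never contains an apartment of unknown cost.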
- Parallel Deep Scrape:
  - Checks cache for previously scraped apartments
  - Batches all uncached apartments
  - Sends parallel requests to Bright Data (all at once!)
  - Waits for all responses to complete
  - Typically 6-8x faster than sequential scraping
- Data Storage: Saves raw Markdown and metadata with unique apartment ID
- Query Caching: Saves this query's intent and results for future similar searches
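The parallel collection step can be sketched with `Promise.allSettled`, which starts every request before waiting on any of them. This is a minimal illustration under assumptions: `scrapeAllInParallel` is a hypothetical function, `scrapeOne` stands in for whatever call wraps the Bright Data Web Unlocker request, and `isCached` stands in for the cache check.

```javascript
// scrapeOne: hypothetical async function (url) => markdown string.
// isCached: hypothetical predicate (apartment) => boolean.
async function scrapeAllInParallel(apartments, isCached, scrapeOne) {
  const uncached = apartments.filter((apt) => !isCached(apt));

  // Map creates all promises up front, so the requests run concurrently.
  // allSettled waits for every request and never rejects on a single failure.
  const settled = await Promise.allSettled(uncached.map((apt) => scrapeOne(apt.url)));

  const scraped = [];
  const failed = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      scraped.push({ ...uncached[i], markdown: result.value });
    } else {
      failed.push({ ...uncached[i], error: String(result.reason) });
    }
  });
  return { scraped, failed, skipped: apartments.length - uncached.length };
}

// Demo with a fake scraper instead of a real network call.
const fakeScrape = async (url) => `# Listing from ${url}`;
scrapeAllInParallel(
  [{ url: 'https://example.com/a' }, { url: 'https://example.com/b', cached: true }],
  (apt) => apt.cached === true,
  fakeScrape
).then((summary) => console.log(summary.scraped.length, summary.skipped));
```

Using `Promise.allSettled` rather than `Promise.all` means one failed listing does not discard the results of the others, which seems a reasonable choice when each request consumes credits.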
- Parse Query: Extracts distance (e.g., "1 mile") and target location (e.g., "UC Berkley")
- AI Interpretation: Gemini AI corrects typos and clarifies location (e.g., "UC Berkley" → "University of California Berkeley, Berkeley, California")
- Geocode Target: Converts corrected location to GPS coordinates using Mapbox
- Geocode Apartments: Converts each apartment address to coordinates
- Calculate Distances: Uses Mapbox Directions API to calculate walking distance (falls back to straight-line if needed)
- Filter Results: Shows only apartments within specified radius
- Save Distance Data: Stores distance info with same apartment ID for data linking
- Display: Shows filtered results with distance and walking time
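The straight-line fallback mentioned above can be sketched with the haversine formula. This is an assumption about how such a fallback might look, not the project's actual code: `haversineMiles` and `filterByRadius` are hypothetical names, and each apartment is assumed to carry already-geocoded `coords` of the form `{ lat, lng }`.

```javascript
const EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle (straight-line) distance between two { lat, lng } points, in miles.
function haversineMiles(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

// Attaches a distance to each apartment and keeps those within the radius.
function filterByRadius(apartments, target, radiusMiles) {
  return apartments
    .map((apt) => ({ ...apt, distanceMiles: haversineMiles(apt.coords, target) }))
    .filter((apt) => apt.distanceMiles <= radiusMiles);
}

// Illustrative coordinates only; real values would come from Mapbox geocoding.
const target = { lat: 37.8719, lng: -122.2585 };
const nearby = filterByRadius(
  [
    { address: 'A', coords: { lat: 37.8700, lng: -122.2600 } },
    { address: 'B', coords: { lat: 37.8000, lng: -122.2700 } },
  ],
  target,
  1
);
console.log(nearby.map((apt) => apt.address));
```

Straight-line distance is always less than or equal to the walking route, so a fallback like this can let through apartments that a route-based check would exclude.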
- Detect Query: Recognizes keywords like "details", "report", "summary", "analysis", "reviews", etc.
- Load Apartments: Loads current filtered/scraped apartment list from apartments_index.json
- Check Cache: For each apartment, checks if management analysis already exists in metadata
- Extract Management Name: Uses Gemini AI to identify property management company from markdown
- AI Analysis: Gemini AI researches and evaluates management on multiple factors:
  - Reputation & trustworthiness
  - Responsiveness to tenant issues
  - Property maintenance quality
  - Lease terms fairness
  - Communication quality
  - Overall tenant satisfaction
- Generate Score: AI assigns quality score (0-100) with rating (Excellent/Good/Fair/Below Average/Poor)
- Create Report: Generates summary, strengths, concerns, and recommendation
- Save & Cache: Permanently stores analysis in apartment metadata (never re-analyzed)
- Display: Shows all apartments with management scores and reports
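The check-cache-then-analyze flow above can be sketched as a small wrapper. This is illustrative only: `getManagementAnalysis` is a hypothetical function, `analyze` stands in for the Gemini-backed analysis call, and the `management_analysis` field name is taken from the metadata description in the project structure.

```javascript
// Returns the stored analysis when present; otherwise runs analyze() once
// and stores the result on the metadata object. Writing the metadata back
// to disk is left to the caller in this sketch.
async function getManagementAnalysis(metadata, analyze) {
  if (metadata.management_analysis) {
    return { analysis: metadata.management_analysis, fromCache: true };
  }
  const analysis = await analyze(metadata);
  metadata.management_analysis = analysis;
  return { analysis, fromCache: false };
}

// Demo with a fake analyzer instead of a real AI call.
const fakeAnalyze = async () => ({ score: 82, rating: 'Good' });
const apartmentMeta = { address: '123 Main St, Berkeley, CA 94704' };

getManagementAnalysis(apartmentMeta, fakeAnalyze)
  .then(() => getManagementAnalysis(apartmentMeta, fakeAnalyze))
  .then((second) => console.log(second.fromCache));
```

Because the result is stored permanently, a stale or poor analysis would persist until the metadata file is edited or removed by hand.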
================================================================================
FINAL RESULTS
================================================================================
All data saved to: ./apartment_data/
Each apartment has:
- Raw Markdown file (.md) - cleaner than HTML!
- Metadata file (.json) with address, price, URL, analysis
Summary of apartments:
1. [NEW] 123 Main St, Berkeley, CA 94704
   ID: apt_8f3a9c2b
   Price: $2,500/month
   URL: https://...
2. [CACHED] 456 Oak Ave, Berkeley, CA 94705
   ID: apt_1d4e5f6a
   Price: $2,800/month
   URL: https://...
================================================================================
STAGE 3: DISTANCE FILTERING
================================================================================
[Filter] Distance: 1 mile
[Filter] Location: UC Berkeley
[1/10] 123 Main St, Berkeley, CA 94704
   Distance: 0.45 miles
   Walking time: 9 min walk
   Within range!
[2/10] 456 Oak Ave, Berkeley, CA 94705
   Distance: 1.2 miles
   Walking time: 24 min walk
   Out of range (1.2 miles > 1 mile)
================================================================================
FINAL RESULTS
================================================================================
Summary of apartments:
1. [NEW] 123 Main St, Berkeley, CA 94704
   ID: apt_8f3a9c2b
   Price: $2,500/month
   Distance: 0.45 miles
   Walking: 9 min walk
   URL: https://...
================================================================================
STAGE 4: AI-POWERED MANAGEMENT ANALYSIS
================================================================================
[AI Agent] Analyzing 2 property managements...
This may take a few moments...
================================================================================
[1/2] 123 Main St, Berkeley, CA 94704
Price: $2,500/month
Management: Berkeley Property Management LLC
[AI Agent] Analyzing management: "Berkeley Property Management LLC"
[AI Agent] Score: 82/100 (Good)
[Storage] Saved management analysis (Score: 82/100)
================================================================================
[Stage 4 Summary]
Total properties: 2
From cache: 0
Newly analyzed: 2
Skipped: 0
================================================================================
FINAL RESULTS
================================================================================
Summary of apartments:
1. [NEW] 123 Main St, Berkeley, CA 94704
   ID: apt_8f3a9c2b
   Price: $2,500/month
   Distance: 0.45 miles
   Walking: 9 min walk
   Management: Berkeley Property Management LLC
   Quality Score: 82/100 (Good)
   Well-established company with responsive maintenance team. Generally positive tenant reviews with some concerns about lease renewal processes.
   Strengths: Quick maintenance response, Professional staff
   Concerns: Lease renewal fees, Limited parking
   URL: https://...
Room Hunt/
├── apartment-finder.js # Main orchestration & CLI
├── api-services.js # API integrations: Bright Data, Gemini, Mapbox
├── cache-manager.js # Data storage & caching logic
├── brightdata_api_demo.js # Reference implementation
├── package.json # Node.js dependencies
├── .env # API keys (YOU CREATE THIS)
├── README.md # This file
├── MODULE_STRUCTURE.md # Module organization guide
├── SETUP_CHECKLIST.md # Setup guide
├── PROJECT_SUMMARY.md # Technical documentation
├── STAGE3_DISTANCE_FILTERING.md # Distance filtering guide
├── STAGE4_MANAGEMENT_ANALYSIS.md # Management analysis guide
├── INTELLIGENT_QUERY_CACHING.md # Query caching details
└── apartment_data/ # Generated data directory
    ├── query_cache.json # Cache of previous searches
    ├── apartments_index.json # Master registry of all apartments
    ├── apt_8f3a9c2b.md # Raw Markdown for apartment
    ├── apt_8f3a9c2b.json # Metadata (address, price, URL, management_analysis)
    ├── apt_8f3a9c2b__distance.json # Distance data (if filtered)
    ├── apt_1d4e5f6a.md
    ├── apt_1d4e5f6a.json
    └── ...
The codebase has been split into 3 focused modules for better maintainability:
- api-services.js (440 lines)
  - All external API integrations (Bright Data, Gemini AI, Mapbox)
  - Exports: CONFIG, model, API functions
- cache-manager.js (265 lines)
  - Data storage, caching, and file management
  - Handles query cache, URL cache, and file operations
- apartment-finder.js (653 lines)
  - Main orchestration logic and CLI interface
  - Imports from both modules above
Benefits:
- Clear separation of concerns
- Easy to test individual components
- Better maintainability (vs. 1,358-line monolith)
- Faster to locate specific functionality
See MODULE_STRUCTURE.md for detailed documentation!
- query_cache.json: Stores previous search queries with normalized intents and apartment IDs (enables intelligent query reuse)
- apartments_index.json: Master list of all scraped apartments with their unique IDs
- apt_XXXXXXXX.md: Complete raw Markdown from the apartment listing page (cleaner than HTML!)
- apt_XXXXXXXX.json: Extracted metadata including:
  - Basic info: address, price, URL, scrape timestamp
  - Distance data: coordinates, walking distance/time (if Stage 3 used)
  - Management analysis: company name, score, rating, summary, strengths, concerns (if Stage 4 used)
- apt_XXXXXXXX__distance.json: (Legacy) Distance calculations (now stored in main JSON)
In apartment-finder.js, modify the searchGoogle() call:
const serpResults = await searchGoogle(enhancedQuery, 10); // Change 10 to desired number
In apartment-finder.js, adjust the slice:
const topResults = filteredResults.slice(0, 5); // Change 5 to desired number
In apartment-finder.js, modify parseHtmlWithAI():
const maxLength = 30000; // Change to adjust token usage
Solution: Ensure your .env file exists and has correct variable names
Solution:
- Verify your SERP Zone is active in Bright Data dashboard
- Check your API key has sufficient credits
- Ensure BRIGHTDATA_SERP_ZONE is correctly set
Solution:
- Verify your Web Unlocker Zone is active
- Check you have credits available
Solution:
- Verify GOOGLE_API_KEY is correct
- Ensure Generative Language API is enabled
- Check you haven't exceeded API quota
Solution:
- Some sites are heavily JavaScript-rendered
- Try different search queries
- The site structure may not have clear listing information
Old Sequential Approach:
- 10 apartments × (3-5 sec each + 3 sec delay) = 60-80 seconds
- 50 apartments = 5+ minutes
New Parallel Approach:
- 10 apartments scraped simultaneously = ~5-10 seconds
- 50 apartments scraped simultaneously = ~10-15 seconds
- 6-8x faster overall!
How it works:
- Filter out cached apartments (instant)
- Send all uncached apartments to Bright Data at once
- Bright Data cloud processes them in parallel
- Save all results when complete
- Cost Awareness:
  - Each SERP request and Web Unlocker request consumes Bright Data credits
  - Parallel processing doesn't increase cost: same number of requests, just faster!
  - Mapbox Geocoding API: Free tier includes 100,000 requests/month
  - Google Gemini: Generous free tier for testing
- Rate Limits: Be mindful of API rate limits on Bright Data, Google AI, and Mapbox
- HTML Complexity: Very JavaScript-heavy sites might not return full content in initial HTML
- Token Limits: Large HTML files are truncated before sending to Gemini (100,000 chars limit)
- Caching: Already-scraped apartments are skipped to save API costs
- Unique IDs: SHA256-based deterministic IDs allow linking data from multiple sources
- Distance Calculations: Uses walking routes via Mapbox Directions API for real-world distances
- Ethical Scraping: Respect website terms of service and robots.txt
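The deterministic ID idea from the notes above can be sketched with Node's built-in crypto module. This is an assumption-laden illustration: the README only says the IDs are SHA256-based and shows the `apt_` prefix with eight hex characters, so hashing a normalized address (rather than, say, the listing URL) and the name `apartmentId` are hypothetical choices.

```javascript
const { createHash } = require('node:crypto');

// Builds an ID of the form apt_xxxxxxxx from the first 8 hex characters of
// the SHA-256 digest. Hashing the normalized address is an assumption; the
// project may hash a different field.
function apartmentId(address) {
  const normalized = String(address).toLowerCase().replace(/\s+/g, ' ').trim();
  const digest = createHash('sha256').update(normalized).digest('hex');
  return `apt_${digest.slice(0, 8)}`;
}

// The same address always yields the same ID, regardless of spacing or case.
console.log(apartmentId('123 Main St, Berkeley, CA 94704'));
console.log(apartmentId('  123 main st,  Berkeley, CA 94704 '));
```

Eight hex characters give about 4.3 billion possible IDs, which keeps file names short; collisions are unlikely at the scale of a single search cache but are not impossible.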
| Feature | JavaScript | Python (Deprecated) |
|---|---|---|
| Setup | Simple npm install | Complex LangChain versions |
| Dependencies | 3 packages | 5+ packages with conflicts |
| API Compatibility | Stable | Breaking changes |
| Error Handling | Straightforward | Import errors |
| Performance | Fast async/await | Similar |
- Add caching to avoid re-scraping same URLs (implemented)
- Add filtering options (price range) (implemented)
- Add support for location-based radius searches (implemented with Mapbox)
- Generate unique IDs for data linking (implemented)
- Export results to CSV/Excel file
- Implement pagination support to get more listings from each site
- Add support for Bright Data Data Collector API for specific sites
- Implement parallel scraping for faster execution (implemented)
- Add more filters (bedrooms, bathrooms, amenities)
- Add email/SMS alerts for new matching listings
- Add transit time calculations (bus, subway, etc.)
ADREI (AI-Driven Real Estate Intelligence) was developed for CAL HACKS 12.0.
This project is for educational and demonstration purposes.
- Bright Data for providing powerful web scraping infrastructure
- Google for Gemini AI API
- Mapbox for geocoding and directions APIs
- Node.js community for excellent async patterns