AmolBhalerao8/CalHacks12

ADREI - AI-Driven Real Estate Intelligence System

ADREI (AI-Driven Real Estate Intelligence) is an advanced property analysis platform that uses a sophisticated multi-agent AI framework with Bright Data APIs for web scraping, Google Gemini AI for intelligent data extraction and analysis, and Mapbox for location-based filtering.

🌟 Features

  • Intelligent Query Caching:
    • AI detects similar queries (e.g., "apartments near UC Berkeley" = "housing near Berkeley campus")
    • Automatically reuses previously scraped data
    • Skips Stage 1 & 2 entirely for duplicate searches
    • Saves API costs and time!
  • Five-Stage Processing (Stages 0-4):
    • Stage 0: Check query cache with AI similarity detection
    • Stage 1: Quick scan for address, price, and URL (if not cached)
    • Stage 2: Parallel deep scrape. Sends all requests to Bright Data simultaneously (6-8x faster!) and runs AI detail extraction (bedrooms, bathrooms, amenities, utilities, contact info), with the extracted details stored in metadata
    • Stage 3: Distance-based filtering with AI-powered location understanding + Mapbox geocoding
    • Stage 4: AI-powered property management quality analysis & scoring (0-100)
  • Intelligent Search: Uses Bright Data's SERP API to search Google for apartment listings
  • Multi-Site Scraping: Automatically visits and scrapes multiple apartment listing websites
  • AI-Powered Parsing: Leverages Google Gemini to extract structured data from diverse HTML layouts
  • Location Intelligence:
    • AI corrects spelling mistakes and typos automatically
    • Expands abbreviations (e.g., "MIT" → full name with location)
    • Calculate walking distances from any location (universities, workplaces, etc.)
    • Fallback to straight-line distance if route calculation fails
  • Smart Caching:
    • Avoids re-scraping already visited apartments
    • Caches geocoded coordinates for all apartments
    • Caches distance calculations between apartment + target location pairs
    • Caches management analysis results permanently
    • Up to 90% API cost savings on repeat queries!
  • Unique IDs: Each apartment gets a deterministic ID for linking data from multiple sources
  • Price Filtering: Filter by price (e.g., "under $2000")
  • Rich Data Storage: Saves the complete page content (as raw Markdown) and comprehensive metadata for future reference
    • Basic info: address, price, URL
    • Apartment details: bedrooms, bathrooms, square feet, deposit, lease length
    • Amenities: pool, gym, parking, etc.
    • Utilities: water, electricity, internet (included or tenant pays)
    • Contact info: phone, email, office hours
    • Pet policy and parking details
  • Management Analysis:
    • AI-powered reputation research
    • Quality scoring (0-100 scale)
    • Strengths, concerns, and recommendations
    • Permanently cached to avoid re-analysis
  • 🆕 Detailed Apartment Viewer:
    • Query "look into apartment 3" to see ALL details for a specific apartment
    • Shows bedrooms, bathrooms, amenities, utilities, contact info
    • Displays comprehensive scores from all research agents
    • Shows management analysis with key strengths and concerns
  • 🆕 4 Specialized Research Agents (Micro-Location Analysis):
    • Security Agent: Street lighting, crime, foot traffic, security features (5 sub-metrics)
    • Accessibility Agent: Sidewalks, healthcare, transit, wheelchair access (5 sub-metrics)
    • Pet Friendliness Agent: Parks, vets, pet stores, walking quality (5 sub-metrics)
    • Lifestyle Agent: Dining, nightlife, entertainment, shopping, fitness (5 sub-metrics)
    • Each agent analyzes specific street + 2-3 block radius (not city-wide!)
    • Provides overall score + multiple sub-metrics + key findings + detailed reasoning
    • Answers ANY question: "is it safe?", "good for pets?", "wheelchair accessible?", "good nightlife?"
    • Cached permanently (instant results on re-query)
  • High Performance:
    • Parallel processing in Stage 2 (all apartments scraped simultaneously)
    • 6-8x faster than sequential scraping
    • Bright Data cloud handles concurrent requests efficiently

🏗️ Architecture

                        STAGE 0: INTELLIGENT QUERY CACHE
User Query → Gemini AI (Normalize Query) → Check Query Cache →
Compare with Previous Queries → [MATCH?] → Load Cached Apartments
                                    ↓ [NO MATCH]
                                    
                        STAGE 1: DISCOVERY
SERP API (Google Search) → Filter Relevant Sites →
Web Unlocker (Scrape) → Gemini AI (Extract Address/Price/URL) →
Price Filter

                        STAGE 2: PARALLEL COLLECTION
Batch All Apartments → Parallel Requests to Bright Data Cloud →
Wait for All Responses → Save Raw Markdown + Metadata → Generate Unique IDs →
Save Query to Cache

                        STAGE 3: LOCATION FILTERING (Optional)
User Distance Query → Gemini AI (Interpret & Correct Location) →
Mapbox Geocoding (Target Location) → Mapbox Geocoding (Each Apartment) →
Calculate Walking Distances → Filter by Radius → Save Distance Data

                        STAGE 4: MANAGEMENT ANALYSIS (Optional)
User Details Query → Load Apartment Markdown → Gemini AI (Extract Management Name) →
Gemini AI (Analyze Reputation & Quality) → Score 0-100 →
Save Analysis to Metadata → Display Results

📋 Prerequisites

Required Accounts & API Keys

  1. Bright Data Account (https://brightdata.com)

    • API Key
    • SERP Zone configured
    • Web Unlocker Zone configured
  2. Google AI Studio Account (https://ai.google.dev/)

    • Google API Key for Gemini
  3. Mapbox Account (https://www.mapbox.com/)

    • Mapbox Access Token (for geocoding and distance calculations)
  4. Node.js (v16 or higher)

🚀 Setup Instructions

1. Navigate to Project

cd "C:\Studies\Project\CAL HACKS 12.0\Room Hunt"

2. Install Dependencies

npm install

This will install:

  • dotenv - Environment variable management
  • node-fetch - HTTP client for API calls
  • @google/generative-ai - Google Gemini AI SDK
  • @mapbox/mapbox-sdk - Mapbox Geocoding & Directions API
  • readline - CLI interface

3. Configure Environment Variables

Create a .env file in the project root with the following:

# Bright Data API Configuration
BRIGHTDATA_API_KEY="your_brightdata_api_key_here"
BRIGHTDATA_SERP_ZONE="your_serp_zone_id_here"
BRIGHTDATA_UNLOCKER_ZONE_ID="your_unlocker_zone_id_here"

# Google AI Configuration
GOOGLE_API_KEY="your_google_api_key_here"

# Mapbox Configuration (Optional - hardcoded token included, but you can override)
MAPBOX_ACCESS_TOKEN="your_mapbox_access_token_here"
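
A quick way to catch a misnamed or empty variable before running a search is a small startup check. The sketch below is illustrative only: the variable names come from the template above, but the `findMissingEnv` helper is hypothetical and is not taken from the project source. It assumes the `.env` file has already been loaded into `process.env` (for example by dotenv).

```javascript
// Hypothetical helper: report which required variables are unset or empty.
// MAPBOX_ACCESS_TOKEN is left out because the template marks it optional.
const REQUIRED_VARS = [
  "BRIGHTDATA_API_KEY",
  "BRIGHTDATA_SERP_ZONE",
  "BRIGHTDATA_UNLOCKER_ZONE_ID",
  "GOOGLE_API_KEY",
];

function findMissingEnv(env) {
  // Treat undefined and whitespace-only values as missing.
  return REQUIRED_VARS.filter(
    (name) => typeof env[name] !== "string" || env[name].trim() === ""
  );
}

const missingVars = findMissingEnv(process.env);
if (missingVars.length > 0) {
  console.error(`Missing environment variables: ${missingVars.join(", ")}`);
}
```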

How to Get Your API Keys:

Bright Data:

  1. Log in to your Bright Data dashboard
  2. Go to Zones → Find your SERP Zone → Copy the Zone ID
  3. Go to Zones → Find your Web Unlocker Zone → Copy the Zone ID
  4. Go to Settings → API → Copy your API Key

Google Gemini:

  1. Visit https://ai.google.dev/
  2. Click Get API Key
  3. Create a new project or select existing
  4. Copy your API key

Mapbox:

  1. Visit https://www.mapbox.com/
  2. Sign up for a free account
  3. Go to Account → Access tokens
  4. Copy your default public token (or create a new one)
  5. Note: Free tier includes 100,000 requests/month

4. Test the Setup (Optional)

Test Bright Data connectivity:

npm test

This runs the demo script to verify your API credentials work.

💻 Usage

Option 1: Web Interface - ADREI Web UI (NEW! 🎉)

Start the ADREI web server:

npm run web

Then open your browser to: http://localhost:3000

Server file: web-server.js

Features:

  • 🎨 Professional white-themed UI with intelligent chat assistant
  • 🏠 Dynamic apartment cards with comprehensive property data
  • 💬 Interactive ADREI Assistant (same intelligence as CLI)
  • 📊 Real-time updates via Server-Sent Events
  • 🔍 Detailed property viewer with full analysis
  • 🔗 Direct links to original listings

Option 2: Command Line Interface (CLI)

npm start

or

node apartment-finder.js

Example Queries

The application understands several types of queries, grouped below by stage:

Stage 1 & 2: Initial Apartment Search

Enter your query: apartments for rent near Chico State

Enter your query: 2 bedroom apartments in San Francisco under $3000

Enter your query: studio apartments near UCLA under $2000

Enter your query: affordable housing in Seattle

Stage 3: Distance Filtering (After Initial Search)

Enter your query: within 1 mile of UC Berkeley

Enter your query: within 1 mile of UC Berkley
(AI auto-corrects: "Berkley" → "Berkeley")

Enter your query: less than 2 miles from chico state
(AI expands: → "California State University Chico, Chico, California")

Enter your query: within 0.5 miles of Stanford

Enter your query: under 3 kilometers from MIT
(AI expands: → "Massachusetts Institute of Technology, Cambridge, Massachusetts")

✨ The AI handles typos and abbreviations automatically!
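
To make the shape of these distance queries concrete, here is a minimal sketch of how the distance and target location could be pulled out of the text before the location is handed to the AI for correction. The `parseDistanceQuery` function and its regular expression are assumptions written for illustration; the project may parse queries differently.

```javascript
const MILES_PER_KM = 0.621371;

// Hypothetical parser for queries such as "within 1 mile of UC Berkeley"
// or "under 3 kilometers from MIT". Returns null for anything else.
function parseDistanceQuery(query) {
  const match = query.match(
    /(?:within|less than|under)\s+(\d+(?:\.\d+)?)\s*(miles?|mi|kilometers?|km)\s+(?:of|from)\s+(.+)/i
  );
  if (!match) return null;

  const value = parseFloat(match[1]);
  const isKilometers = /^k/i.test(match[2]);
  return {
    // Normalize to miles so later filtering uses a single unit.
    miles: isKilometers ? value * MILES_PER_KM : value,
    // Raw location text, typos included; correction happens downstream.
    location: match[3].trim(),
  };
}

console.log(parseDistanceQuery("within 0.5 miles of Stanford"));
```

Price phrases such as "under $2000" do not match, because the pattern requires a number directly after the leading keyword.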

Stage 4: Management Quality Analysis (After Filtering)

Enter your query: give me details

Enter your query: show me a report

Enter your query: provide information

Enter your query: analyze management

Enter your query: show reviews

Enter your query: quality score

✨ AI analyzes property management reputation and scores 0-100!

🆕 Detailed Apartment Viewer (View Single Apartment)

Enter your query: look into apartment 3

Enter your query: show me apartment 1

Enter your query: details for apartment 5

Enter your query: apartment 2 details

✨ Shows ALL info: beds/baths, amenities, utilities, contact, management analysis, research scores!
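
All four example phrasings contain the word "apartment" followed by a number, so a simple pattern is enough to illustrate the idea. This is a sketch under that assumption; `parseApartmentNumber` is a hypothetical name, and the real application may rely on the AI rather than a regular expression to detect these queries.

```javascript
// Hypothetical helper: extract the 1-based apartment number from a viewer
// query, or return null when the query does not name a specific apartment.
function parseApartmentNumber(query) {
  const match = query.match(/\bapartment\s+#?(\d+)\b/i);
  return match ? parseInt(match[1], 10) : null;
}

for (const q of ["look into apartment 3", "apartment 2 details", "apartments near UCLA"]) {
  console.log(q, "->", parseApartmentNumber(q));
}
```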

🆕 Specialized Research Agents (Micro-Location Analysis)

Enter your query: is it safe?
→ Security Agent researches crime, police presence, street safety for EACH apartment

Enter your query: good for pets?
→ Pet Friendliness Agent researches dog parks, vets, pet stores nearby

Enter your query: wheelchair accessible?
→ Accessibility Agent researches sidewalks, transit, healthcare access

Enter your query: good nightlife?
→ Lifestyle Agent researches dining, entertainment, nightlife

Enter your query: are there homeless in this area?
→ Security Agent analyzes homelessness issues for each specific street

✨ Each agent scores 0-100 + provides detailed findings for EACH apartment's street!

What Happens:

Stage 0: Intelligent Query Cache Check

  1. Query Normalization: Gemini AI extracts search intent (location, price, bedrooms, keywords)
  2. Similarity Detection: Compares with previous queries to find matches
    • "apartments near UC Berkeley" matches "housing near Berkeley campus"
    • "rentals in Berkeley under $2000" matches "apartments in Berkeley under $2500"
  3. Cache Hit: If similar query found, loads apartments instantly (skips Stage 1 & 2!)
  4. Cache Miss: Proceeds to Stage 1 & 2
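
The hit-or-miss decision above can be sketched as a lookup over stored intents. In this illustration the similarity check is passed in as a function, since the real check is performed by Gemini; the entry shape (`intent`, `apartmentIds`), the function names, and the naive `sameLocation` comparison are all assumptions made for the example.

```javascript
// Hypothetical cache lookup: return the apartment IDs of the first stored
// query whose intent is judged similar, or null on a cache miss.
function findCachedQuery(cacheEntries, intent, isSimilar) {
  const hit = cacheEntries.find((entry) => isSimilar(entry.intent, intent));
  return hit ? hit.apartmentIds : null;
}

// Naive stand-in for the AI similarity check: same location, ignoring case.
function sameLocation(a, b) {
  return a.location.trim().toLowerCase() === b.location.trim().toLowerCase();
}

const cache = [
  { intent: { location: "Berkeley", maxPrice: 2000 }, apartmentIds: ["apt_8f3a9c2b"] },
];
console.log(findCachedQuery(cache, { location: "berkeley", maxPrice: 2500 }, sameLocation));
console.log(findCachedQuery(cache, { location: "Seattle", maxPrice: null }, sameLocation));
```

Keeping the comparison injectable means the cache logic can be exercised in tests without calling the AI.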

Stage 1 & 2: Initial Search (If Not Cached)

  1. Search Phase: Searches Google for relevant apartment listing websites
  2. Filtering Phase: Filters results to apartment-focused sites (excludes Zillow, social media, etc.)
  3. Quick Scan: Scrapes top sites and extracts basic info:
    • Address
    • Price
    • Listing URL
  4. Price Filtering: Filters by price if specified (e.g., "under $2000")
  5. Parallel Deep Scrape:
    • Checks cache for previously scraped apartments
    • Batches all uncached apartments
    • Sends parallel requests to Bright Data (all at once!)
    • Waits for all responses simultaneously
    • Typically 6-8x faster than sequential scraping
  6. Data Storage: Saves the raw page content (stored as Markdown) and metadata under a unique apartment ID
  7. Query Caching: Saves this query's intent and results for future similar searches
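
The price-filtering step (step 4) can be illustrated with a short sketch. It assumes listings carry a display price string such as "$2,500/month", as in the sample output; the function names are hypothetical, and treating "under" as an inclusive bound is a choice made here, not something the project documents.

```javascript
// Pull the first dollar amount out of a string: "$2,500/month" -> 2500.
function parseDollarAmount(text) {
  const match = String(text).match(/\$\s?(\d[\d,]*)/);
  return match ? parseInt(match[1].replace(/,/g, ""), 10) : null;
}

// Extract a price ceiling from a query such as "... under $2000".
function parseMaxPrice(query) {
  const match = query.match(/under\s+(\$\s?\d[\d,]*)/i);
  return match ? parseDollarAmount(match[1]) : null;
}

// Keep listings at or below the ceiling; with no ceiling, keep everything.
// Listings whose price cannot be parsed are dropped when a ceiling is set.
function filterByPrice(listings, maxPrice) {
  if (maxPrice === null) return listings;
  return listings.filter((listing) => {
    const price = parseDollarAmount(listing.price);
    return price !== null && price <= maxPrice;
  });
}

const listings = [
  { address: "123 Main St", price: "$2,500/month" },
  { address: "456 Oak Ave", price: "$1,800/month" },
];
console.log(filterByPrice(listings, parseMaxPrice("studio apartments near UCLA under $2000")));
```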

Stage 3: Distance Filtering

  1. Parse Query: Extracts distance (e.g., "1 mile") and target location (e.g., "UC Berkley")
  2. AI Interpretation: Gemini AI corrects typos and clarifies location (e.g., "UC Berkley" → "University of California Berkeley, Berkeley, California")
  3. Geocode Target: Converts corrected location to GPS coordinates using Mapbox
  4. Geocode Apartments: Converts each apartment address to coordinates
  5. Calculate Distances: Uses Mapbox Directions API to calculate walking distance (falls back to straight-line if needed)
  6. Filter Results: Shows only apartments within specified radius
  7. Save Distance Data: Stores distance info with same apartment ID for data linking
  8. Display: Shows filtered results with distance and walking time
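
The straight-line fallback mentioned in step 5 is commonly computed with the haversine formula. The sketch below shows that formula together with a radius filter; it is a stand-in for illustration, not the project's actual code, and the `coords` field with `lat`/`lng` properties is an assumed shape for geocoded apartments.

```javascript
const EARTH_RADIUS_MILES = 3958.8;

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle (haversine) distance in miles between two { lat, lng } points.
function straightLineMiles(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

// Attach a distance to each apartment and keep those inside the radius.
function withinRadius(apartments, target, radiusMiles) {
  return apartments
    .map((apt) => ({ ...apt, distanceMiles: straightLineMiles(apt.coords, target) }))
    .filter((apt) => apt.distanceMiles <= radiusMiles);
}
```

Straight-line distance is never longer than the walking route, so a fallback like this can admit apartments that a route-based check would exclude.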

Stage 4: Management Quality Analysis

  1. Detect Query: Recognizes keywords like "details", "report", "summary", "analysis", "reviews", etc.
  2. Load Apartments: Loads current filtered/scraped apartment list from apartments_index.json
  3. Check Cache: For each apartment, checks if management analysis already exists in metadata
  4. Extract Management Name: Uses Gemini AI to identify property management company from markdown
  5. AI Analysis: Gemini AI researches and evaluates management on multiple factors:
    • Reputation & trustworthiness
    • Responsiveness to tenant issues
    • Property maintenance quality
    • Lease terms fairness
    • Communication quality
    • Overall tenant satisfaction
  6. Generate Score: AI assigns quality score (0-100) with rating (Excellent/Good/Fair/Below Average/Poor)
  7. Create Report: Generates summary, strengths, concerns, and recommendation
  8. Save & Cache: Permanently stores analysis in apartment metadata (never re-analyzed)
  9. Display: Shows all apartments with management scores and reports
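
Steps 3 and 8 together form a check-then-store cache around the AI call. A minimal sketch of that pattern follows. The `management_analysis` field name is taken from the project structure listing; the function name, the injected `analyze` callback, and the return shape are assumptions, and writing the metadata back to its JSON file is left to the caller.

```javascript
// Hypothetical wrapper: reuse a stored analysis when present, otherwise run
// the (expensive) analysis once and attach it to the apartment metadata.
async function getManagementAnalysis(metadata, analyze) {
  if (metadata.management_analysis) {
    return { analysis: metadata.management_analysis, fromCache: true };
  }
  const analysis = await analyze(metadata);
  // Mutates the in-memory metadata; the caller persists it to disk.
  metadata.management_analysis = analysis;
  return { analysis, fromCache: false };
}

// Usage with a fake analyzer standing in for the Gemini call.
const fakeAnalyze = async () => ({ score: 82, rating: "Good" });
const apartment = { address: "123 Main St, Berkeley, CA 94704" };
getManagementAnalysis(apartment, fakeAnalyze)
  .then(() => getManagementAnalysis(apartment, fakeAnalyze))
  .then((second) => console.log("second call from cache:", second.fromCache));
```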

📊 Output Format

Stage 1 & 2: Initial Search Results

================================================================================
FINAL RESULTS
================================================================================

📁 All data saved to: ./apartment_data/

Each apartment has:
  - Raw Markdown file (.md) - cleaner than HTML!
  - Metadata file (.json) with address, price, URL, analysis

Summary of apartments:

1. [NEW] 123 Main St, Berkeley, CA 94704
   🆔 ID: apt_8f3a9c2b
   💰 Price: $2,500/month
   🔗 URL: https://...

2. [CACHED] 456 Oak Ave, Berkeley, CA 94705
   🆔 ID: apt_1d4e5f6a
   💰 Price: $2,800/month
   🔗 URL: https://...

Stage 3: Distance Filtering Results

================================================================================
STAGE 3: DISTANCE FILTERING
================================================================================

[Filter] Distance: 1 mile
[Filter] Location: UC Berkeley

[1/10] 123 Main St, Berkeley, CA 94704
  📍 Distance: 0.45 miles
  ⏱️  Walking time: 9 min walk
  ✅ Within range!

[2/10] 456 Oak Ave, Berkeley, CA 94705
  📍 Distance: 1.2 miles
  ⏱️  Walking time: 24 min walk
  ✗ Out of range (1.2 miles > 1 mile)

================================================================================
FINAL RESULTS
================================================================================

Summary of apartments:

1. [NEW] 123 Main St, Berkeley, CA 94704
   🆔 ID: apt_8f3a9c2b
   💰 Price: $2,500/month
   📍 Distance: 0.45 miles
   ⏱️  Walking: 9 min walk
   🔗 URL: https://...

Stage 4: Management Analysis Results

================================================================================
STAGE 4: AI-POWERED MANAGEMENT ANALYSIS
================================================================================

[AI Agent] Analyzing 2 property managements...
This may take a few moments...

================================================================================
[1/2] 123 Main St, Berkeley, CA 94704
Price: $2,500/month
   Management: Berkeley Property Management LLC

[AI Agent] Analyzing management: "Berkeley Property Management LLC"
[AI Agent] ✓ Score: 82/100 (Good)
[Storage] ✅ Saved management analysis (Score: 82/100)

================================================================================
[Stage 4 Summary]
  ✓ Total properties: 2
  💾 From cache: 0
  🆕 Newly analyzed: 2
  ⚠️  Skipped: 0

================================================================================
FINAL RESULTS
================================================================================

Summary of apartments:

1. [NEW] 123 Main St, Berkeley, CA 94704
   🆔 ID: apt_8f3a9c2b
   💰 Price: $2,500/month
   📍 Distance: 0.45 miles
   ⏱️  Walking: 9 min walk
   🏢 Management: Berkeley Property Management LLC
   📊 Quality Score: 82/100 (Good)
   📝 Well-established company with responsive maintenance team. Generally positive
       tenant reviews with some concerns about lease renewal processes.
      ✅ Strengths: Quick maintenance response, Professional staff
      ⚠️  Concerns: Lease renewal fees, Limited parking
   🔗 URL: https://...

📁 Project Structure

Room Hunt/
├── apartment-finder.js        # Main orchestration & CLI
├── api-services.js            # API integrations: Bright Data, Gemini, Mapbox 🆕
├── cache-manager.js           # Data storage & caching logic 🆕
├── brightdata_api_demo.js     # Reference implementation
├── package.json               # Node.js dependencies
├── .env                       # API keys (YOU CREATE THIS)
├── README.md                  # This file
├── MODULE_STRUCTURE.md        # Module organization guide 🆕
├── SETUP_CHECKLIST.md         # Setup guide
├── PROJECT_SUMMARY.md         # Technical documentation
├── STAGE3_DISTANCE_FILTERING.md  # Distance filtering guide
├── STAGE4_MANAGEMENT_ANALYSIS.md # Management analysis guide 🆕
├── INTELLIGENT_QUERY_CACHING.md  # Query caching details
└── apartment_data/            # Generated data directory
    ├── query_cache.json       # Cache of previous searches
    ├── apartments_index.json  # Master registry of all apartments
    ├── apt_8f3a9c2b.md        # Raw Markdown for apartment 🆕
    ├── apt_8f3a9c2b.json      # Metadata (address, price, URL, management_analysis) 🆕
    ├── apt_8f3a9c2b__distance.json  # Distance data (if filtered)
    ├── apt_1d4e5f6a.md
    ├── apt_1d4e5f6a.json
    └── ...

🏗️ Modular Architecture (NEW!)

The codebase has been split into 3 focused modules for better maintainability:

  1. api-services.js (440 lines)

    • All external API integrations (Bright Data, Gemini AI, Mapbox)
    • Exports: CONFIG, model, API functions
  2. cache-manager.js (265 lines)

    • Data storage, caching, and file management
    • Handles query cache, URL cache, and file operations
  3. apartment-finder.js (653 lines)

    • Main orchestration logic and CLI interface
    • Imports from both modules above

Benefits:

  • ✅ Clear separation of concerns
  • ✅ Easy to test individual components
  • ✅ Better maintainability (vs. 1358-line monolith)
  • ✅ Faster to locate specific functionality

See MODULE_STRUCTURE.md for detailed documentation!

Data Files Explained

  • query_cache.json: Stores previous search queries with normalized intents and apartment IDs (enables intelligent query reuse)
  • apartments_index.json: Master list of all scraped apartments with their unique IDs
  • apt_XXXXXXXX.md: Complete raw Markdown from the apartment listing page (cleaner than HTML!)
  • apt_XXXXXXXX.json: Extracted metadata including:
    • Basic info: address, price, URL, scrape timestamp
    • Distance data: coordinates, walking distance/time (if Stage 3 used)
    • Management analysis: company name, score, rating, summary, strengths, concerns (if Stage 4 used)
  • apt_XXXXXXXX__distance.json: (Legacy) Distance calculations (now stored in main JSON)

🔧 Configuration

Adjusting Search Results

In apartment-finder.js, modify the searchGoogle() call:

const serpResults = await searchGoogle(enhancedQuery, 10);  // Change 10 to desired number

Limiting Sites to Scrape

In apartment-finder.js, adjust the slice:

const topResults = filteredResults.slice(0, 5);  // Change 5 to desired number

HTML Truncation Limit

In apartment-finder.js, modify parseHtmlWithAI():

const maxLength = 30000;  // Change to adjust token usage

🛠️ Troubleshooting

Issue: "Missing Bright Data API keys"

Solution: Ensure your .env file exists and has correct variable names

Issue: "SERP API returned 4xx/5xx"

Solution:

  • Verify your SERP Zone is active in Bright Data dashboard
  • Check your API key has sufficient credits
  • Ensure BRIGHTDATA_SERP_ZONE is correctly set

Issue: "Web Unlocker returned error"

Solution:

  • Verify your Web Unlocker Zone is active
  • Check you have credits available

Issue: "Gemini AI ERROR"

Solution:

  • Verify GOOGLE_API_KEY is correct
  • Ensure Generative Language API is enabled
  • Check you haven't exceeded API quota

Issue: "No listings extracted"

Solution:

  • Some sites are heavily JavaScript-rendered
  • Try different search queries
  • The site structure may not have clear listing information

⚡ Performance

Parallel Scraping (Stage 2)

Old Sequential Approach:

  • 10 apartments × (3-5 sec each + 3 sec delay) = 60-80 seconds
  • 50 apartments = 5+ minutes 😴

New Parallel Approach:

  • 10 apartments scraped simultaneously = ~5-10 seconds ⚡
  • 50 apartments scraped simultaneously = ~10-15 seconds 🚀
  • 6-8x faster overall!

How it works:

  1. Filter out cached apartments (instant)
  2. Send all uncached apartments to Bright Data at once
  3. Bright Data cloud processes them in parallel
  4. Save all results when complete
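
These four steps map naturally onto `Promise.allSettled`, which starts every request at once and waits for all of them without letting one failure reject the whole batch. The sketch below assumes a `scrapeOne(url)` function that returns a promise (in the real project this would wrap the Bright Data Web Unlocker call) and a `Set` of already-cached URLs; both names are hypothetical.

```javascript
// Scrape every uncached apartment concurrently and split the outcomes into
// successes and failures so one bad page does not sink the batch.
async function scrapeAll(apartments, cachedUrls, scrapeOne) {
  // Step 1: skip anything already scraped.
  const uncached = apartments.filter((apt) => !cachedUrls.has(apt.url));

  // Steps 2-3: fire all requests at once and wait for every one to settle.
  const settled = await Promise.allSettled(uncached.map((apt) => scrapeOne(apt.url)));

  // Step 4: collect results, keeping failures for logging or retry.
  const results = [];
  const failures = [];
  settled.forEach((outcome, i) => {
    const url = uncached[i].url;
    if (outcome.status === "fulfilled") {
      results.push({ url, content: outcome.value });
    } else {
      failures.push({ url, reason: String(outcome.reason) });
    }
  });
  return { results, failures };
}
```

With a very large batch it may be worth capping concurrency to stay inside API rate limits; this sketch sends everything at once, as the steps above describe.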

📝 Notes

  • Cost Awareness:
    • Each SERP request and Web Unlocker request consumes Bright Data credits
    • Parallel processing doesn't increase cost - same # of requests, just faster!
    • Mapbox Geocoding API: Free tier includes 100,000 requests/month
    • Google Gemini: Generous free tier for testing
  • Rate Limits: Be mindful of API rate limits on Bright Data, Google AI, and Mapbox
  • HTML Complexity: Very JavaScript-heavy sites might not return full content in initial HTML
  • Token Limits: Large HTML files are truncated before sending to Gemini (the limit is the maxLength value in parseHtmlWithAI(); see Configuration)
  • Caching: Already-scraped apartments are skipped to save API costs
  • Unique IDs: SHA256-based deterministic IDs allow linking data from multiple sources
  • Distance Calculations: Uses walking routes via Mapbox Directions API for real-world distances
  • Ethical Scraping: Respect website terms of service and robots.txt
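
As an illustration of the deterministic IDs mentioned above, the sketch below hashes a normalized address with SHA-256 and keeps the first 8 hex characters, matching the `apt_XXXXXXXX` shape of the sample IDs. Which fields the project actually hashes, and how it normalizes them, is not stated in this README, so the input and normalization here are assumptions.

```javascript
const { createHash } = require("crypto");

// Hypothetical ID scheme: the same address always yields the same ID,
// regardless of letter case or extra whitespace.
function apartmentId(address) {
  const normalized = address.trim().toLowerCase().replace(/\s+/g, " ");
  const digest = createHash("sha256").update(normalized).digest("hex");
  return `apt_${digest.slice(0, 8)}`;
}

console.log(apartmentId("123 Main St, Berkeley, CA 94704"));
console.log(apartmentId("  123 main st,  Berkeley, CA 94704 "));
```

Both calls print the same ID. Eight hex characters give about 4.3 billion possible values, which is ample for a local index but not collision-proof at very large scale.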

🔄 Comparison: JavaScript vs Python

Feature             JavaScript              Python (Deprecated)
Setup               ✅ Simple npm install    ❌ Complex LangChain versions
Dependencies        ✅ 3 packages            ❌ 5+ packages with conflicts
API Compatibility   ✅ Stable                ❌ Breaking changes
Error Handling      ✅ Straightforward       ❌ Import errors
Performance         ✅ Fast async/await      ✅ Similar

🚀 Future Enhancements

  • Add caching to avoid re-scraping same URLs ✅ Implemented
  • Add filtering options (price range) ✅ Implemented
  • Add support for location-based radius searches ✅ Implemented with Mapbox
  • Generate unique IDs for data linking ✅ Implemented
  • Export results to CSV/Excel file
  • Implement pagination support to get more listings from each site
  • Add support for Bright Data Data Collector API for specific sites
  • Implement parallel scraping for faster execution ✅ Implemented
  • Add more filters (bedrooms, bathrooms, amenities)
  • Add email/SMS alerts for new matching listings
  • Add transit time calculations (bus, subway, etc.)

📄 License

ADREI (AI-Driven Real Estate Intelligence System) was developed for CAL HACKS 12.0.
This project is for educational and demonstration purposes.

🙏 Acknowledgments

  • Bright Data for providing powerful web scraping infrastructure
  • Google for Gemini AI API
  • Mapbox for geocoding and directions APIs
  • Node.js community for excellent async patterns
