💡 Inspiration

Every year, millions of non-residents, expats, and global investors face the same nightmare: navigating the maze of cross-border tax laws, compliance requirements, and regulatory frameworks across multiple countries.

The problem is real:

  • A simple question like "Can I buy property in the US as an Indian citizen?" requires understanding FIRPTA withholding, FBAR reporting, DTAA (tax treaty) provisions, and state-specific regulations
  • Information is scattered across 100+ page government PDFs, regulator websites, and statutory documents
  • Generic AI chatbots hallucinate legal advice, which is dangerous when dealing with tax authorities
  • Existing solutions are either expensive (tax attorneys at $500/hr) or unreliable (uncited AI responses)

We built GlobalGate to solve this: an evidence-driven compliance advisor that delivers accurate, cited answers from verified government sources only, personalized to your exact situation.

🎯 What It Does

GlobalGate is a production-grade, evidence-driven compliance and investment intelligence platform that:

  1. Personalizes to your profile - Captures citizenship, residency, investor type, and investment interests during onboarding
  2. Searches verified sources only - Queries a knowledge base of official IRS publications, regulator documents, and statutory texts
  3. Delivers strictly cited answers - Every response is grounded in retrieved evidence with mandatory citations, trust rankings, and source links
  4. Never hallucinates - If the knowledge base returns no supporting evidence, it says so clearly rather than making things up

What Makes Us Different

| Traditional AI | GlobalGate |
| --- | --- |
| Answers from training data | Answers from retrieved evidence only |
| Citations often hallucinated | Citations validated against source IDs |
| No provenance tracking | Full audit trail (URL, country, trust rank) |
| One-size-fits-all | Personalized by citizenship & residency |

Example query: "What withholding tax applies if I sell US real estate as an Indian citizen?"

GlobalGate response: A detailed breakdown of the 15% FIRPTA withholding requirement, Form 8288 filing obligations, and treaty benefits, with every bullet point cited from IRS Publication 515 and accompanied by its chunk ID, relevance score, and direct link.

🛠️ How We Built It

Frontend (Next.js 15 + TypeScript + Tailwind)

  • Multi-step onboarding - 5-screen flow capturing persona (citizenship, residency, investor type) and investment preferences
  • Real-time chat interface - Clean UI with auto-resizing textarea and streaming-style responses
  • Collapsible citations panel - Right-side drawer showing all sources with drill-down to individual document details
  • Zustand state management - Persists user context across sessions with localStorage
  • Smart country selection - Auto-selects citizenship/residency countries with visual badges

Backend (FastAPI + MongoDB + RAG Pipeline)

1. Knowledge Ingestion (Evidence First)

# Every document ingested with full provenance
{
    "url": "https://www.irs.gov/pub/irs-pdf/p515.pdf",
    "source_type": "tax_authority",  # regulator | statute | tax_authority
    "trust_rank": 1,                  # 1 = highest trust
    "country": "United States",
    "asset_class": ["tax", "compliance", "reporting"]
}
  • Ingests HTML & PDF documents from official sources
  • Extracts clean text with deterministic chunking
  • Embeds chunks using SentenceTransformers (MiniLM) for fast local inference
  • Stores full provenance for audit trails
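
The ingestion steps above can be sketched roughly as follows. The write-up does not describe the actual chunking scheme, so this is only a minimal sketch under assumptions: fixed-size character windows with overlap stand in for whatever strategy the project uses, chunk IDs are assumed to be a truncated SHA-256 of URL, position, and text, and `chunk_text`, `chunk_id`, and `build_records` are hypothetical names. Embedding is left out so the example has no external dependencies.

```python
import hashlib


def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size windows (assumed scheme, not the project's)."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    normalized = " ".join(text.split())  # collapse whitespace so reruns give identical chunks
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(normalized), step):
        chunks.append(normalized[start:start + max_chars])
        if start + max_chars >= len(normalized):
            break  # the window already reached the end of the text
    return chunks


def chunk_id(url: str, index: int, chunk: str) -> str:
    """Derive an ID from URL, position, and content: the same input always gives the same ID."""
    digest = hashlib.sha256(f"{url}\n{index}\n{chunk}".encode("utf-8")).hexdigest()
    return digest[:16]


def build_records(url: str, text: str, provenance: dict) -> list[dict]:
    """Attach the document's provenance fields to every chunk for later citation."""
    return [
        {"chunk_id": chunk_id(url, i, c), "chunk_index": i, "text": c, "url": url, **provenance}
        for i, c in enumerate(chunk_text(text))
    ]
```

Because the ID depends only on the inputs, re-ingesting an unchanged document reproduces the same IDs, which is what lets a citation keep pointing at the same chunk.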

2. Evidence-Based Querying

# Query includes user context for personalized retrieval
payload = {
    "persona": {"citizenship": "India", "residency": "India", "investor_type": "individual"},
    "countries": ["United States", "India"],
    "asset_class_any": ["real_estate", "tax", "compliance"],
    "source_type_any": ["tax_authority", "regulator", "statute"],
    "trust_rank_lte": 5,
    "strict_citations": True
}
  • Semantic similarity search via MongoDB Atlas Vector Search (with cosine fallback)
  • Multi-dimensional filtering: country, asset class, source type, trust rank
  • Returns ranked evidence chunks, not hallucinated answers
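
To illustrate the cosine fallback path, here is a minimal in-memory sketch of filtered similarity search. It assumes each stored chunk carries the provenance fields shown in the ingestion record plus an `embedding` list (a hypothetical field name), and that the query has already been embedded. `cosine`, `matches`, and `cosine_fallback_search` are illustrative names, not the project's actual functions. The Atlas Vector Search path is not shown, since it needs a live cluster.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches(chunk: dict, payload: dict) -> bool:
    """Apply the payload's metadata filters to one stored chunk."""
    return (
        chunk["country"] in payload["countries"]
        and chunk["source_type"] in payload["source_type_any"]
        and chunk["trust_rank"] <= payload["trust_rank_lte"]
        and bool(set(chunk["asset_class"]) & set(payload["asset_class_any"]))
    )


def cosine_fallback_search(query_vec, chunks, payload, top_k=5):
    """Filter on metadata first, then rank the survivors by similarity to the query."""
    candidates = [c for c in chunks if matches(c, payload)]
    scored = [(cosine(query_vec, c["embedding"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"score": s, **c} for s, c in scored[:top_k]]
```

Filtering before scoring keeps out-of-scope documents from ever reaching the answer composer, however similar their text is to the query.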

3. Answer Composition (Deterministic + LLM)

Two modes that both enforce citation rules:

| Mode | Use Case | Hallucination Risk |
| --- | --- | --- |
| Deterministic | High-compliance, audit-required | Zero |
| LLM-assisted | Better reasoning, structured output | Controlled* |

*LLM mode validates every citation against the retrieved evidence IDs. If any citation is invalid, the response is rejected and the system falls back to deterministic mode.

Tech Stack Decisions

| Layer | Technology | Why |
| --- | --- | --- |
| Frontend | Next.js 15, TypeScript, Tailwind | Fast, type-safe, great DX |
| State | Zustand + localStorage | Simple, persistent, no boilerplate |
| API | FastAPI (Python 3.11) | Async, typed, production-ready |
| Database | MongoDB | Flexible schema for regulatory data |
| Vector Search | Atlas Vector Search + Cosine fallback | Native filtering, no separate vector DB |
| Embeddings | SentenceTransformers (MiniLM) | Lightweight, deterministic, local |
| LLM | OpenAI (optional) | Only for structured reasoning, strictly controlled |

🚧 Challenges We Faced

1. Document Ingestion Quality

Government PDFs are notoriously difficult: tables break, formatting is inconsistent, and the legal language is dense. We built custom chunking strategies that preserve context across section boundaries while maintaining deterministic chunk IDs for citation tracking.

2. Citation Integrity

LLMs love to hallucinate citations. Our solution:

# Every LLM output is validated against the retrieved evidence IDs
for citation_id in llm_response.citations:
    if citation_id not in retrieved_evidence_ids:
        # The caller catches this error and falls back to deterministic mode
        raise CitationValidationError(citation_id)
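
The validation loop above shows only the rejection step. As a rough illustration of how rejection and fallback might fit together, the sketch below wraps both in one function. The answer shape (`bullets`, each with `text` and `citations`), the `compose_deterministic` and `compose_answer` helpers, and the rule that a bullet with no citation is also rejected are all assumptions made for the example. Only the "unknown ID means fallback" behavior comes from the write-up.

```python
class CitationValidationError(Exception):
    """Raised when an answer cites nothing or cites an ID that was not retrieved."""


def validate_citations(cited_ids, retrieved_evidence_ids):
    """Reject a bullet with no citation (assumed rule) or with an unknown citation ID."""
    if not cited_ids:
        raise CitationValidationError("bullet has no citation")
    unknown = [cid for cid in cited_ids if cid not in retrieved_evidence_ids]
    if unknown:
        raise CitationValidationError(f"unknown citation IDs: {unknown}")


def compose_deterministic(evidence):
    """Quote each evidence chunk verbatim under its own ID; no model is involved."""
    return {
        "mode": "deterministic",
        "bullets": [{"text": e["text"], "citations": [e["chunk_id"]]} for e in evidence],
    }


def compose_answer(evidence, llm_compose=None):
    """Use the LLM draft only if every citation validates; otherwise fall back."""
    if llm_compose is None:
        return compose_deterministic(evidence)
    retrieved_ids = {e["chunk_id"] for e in evidence}
    try:
        draft = llm_compose(evidence)
        for bullet in draft["bullets"]:
            validate_citations(bullet["citations"], retrieved_ids)
    except CitationValidationError:
        return compose_deterministic(evidence)
    return {"mode": "llm", "bullets": draft["bullets"]}
```

Passing the LLM composer in as a callable keeps the sketch testable without a model: a stub that returns a made-up citation ID should always produce a deterministic answer.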

3. Multi-Jurisdiction Queries

A question about non-resident real estate investment needs documents from both the investor's home country (for an Indian citizen, FEMA/RBI regulations) and the target country (for the US, IRS publications). We solved this by always including the citizenship, residency, and selected countries in every search.
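
The country scoping described here could look something like the sketch below. `search_countries` is a hypothetical helper name, and the persona keys are taken from the query payload shown earlier. The sketch assumes the result feeds the payload's `countries` filter.

```python
def search_countries(persona: dict, selected: list[str]) -> list[str]:
    """Merge citizenship, residency, and selected countries, de-duplicated in order."""
    ordered = [persona.get("citizenship"), persona.get("residency"), *selected]
    seen = set()
    result = []
    for country in ordered:
        # Skip missing persona fields and countries already included
        if country and country not in seen:
            seen.add(country)
            result.append(country)
    return result


persona = {"citizenship": "India", "residency": "India", "investor_type": "individual"}
countries = search_countries(persona, ["United States"])
```

With this persona, an Indian citizen resident in India asking about US property searches both Indian and US sources, even though only the United States was selected explicitly.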

4. Balancing UX and Compliance

Legal information is complex. We iterated on how to present:

  • Sectioned answers (Tax Obligations, Regulatory Requirements, Legal Restrictions)
  • Bullet points with inline citation references
  • Limitations disclaimer
  • Collapsible source panel (not inline) to keep chat readable

🎓 What We Learned

  • Evidence-first > Model-first - For compliance domains, retrieval from authoritative sources beats model fine-tuning. You can't fine-tune away hallucinations.

  • User context changes everything - The same question has completely different answers based on citizenship and residency. Personalization isn't optional.

  • Trust signals matter - Users need to see WHERE information comes from. Source type, trust rank, and relevance score build confidence.

  • Fallbacks are essential - MongoDB Atlas not available? Cosine fallback. LLM fails validation? Deterministic composer. Always have a path to a correct answer.

  • Government docs are surprisingly good - IRS publications, while dense, are comprehensive, authoritative, and free. They just need better interfaces.

🚀 What's Next

  • [ ] Document upload - Users upload their own tax documents for personalized analysis
  • [ ] Comparison mode - "Compare tax implications: US vs Singapore real estate"
  • [ ] Calculation engine - Compute actual withholding amounts, not just rates
  • [ ] Multi-language - Japanese, Mandarin, Spanish interfaces
  • [ ] Professional API - Access for tax consultants and wealth managers

🏆 Why GlobalGate Matters

Cross-border investing shouldn't require a $500/hour tax attorney for basic questions.

Current alternatives fail:

  • Google: Overwhelming, conflicting, outdated results
  • ChatGPT: Confident but uncited, potentially wrong
  • Tax attorneys: Accurate but expensive and slow

GlobalGate is the middle ground: accurate, cited, personalized, and accessible.

We're not replacing tax attorneys. We're giving everyone the same quality of preliminary research that used to require expensive professionals, so you know what questions to ask before you pay for the consultation.

Evidence-driven. Citation-mandatory. Compliance-safe.
