The FDE: The AI Forward Deployed Engineer

Inspiration

Every enterprise client takes 2 weeks and 40+ hours of engineer time to onboard. An FDE manually scrapes their legacy portal, deciphers cryptic column names like cust_lvl_v2 and acct_bal, maps them to the platform's schema, and writes transformation scripts. Multiply that by dozens of clients, and you have the most expensive, repetitive bottleneck in enterprise software.

We asked: What if we could clone the best FDE? An agent that scrapes legacy portals, reasons about cryptic schemas, remembers past mappings across clients, and picks up the phone and calls you when it's stuck.

By the third client, the phone never rings. That's continual learning.

What It Does

The FDE is an autonomous agent that fully automates client data onboarding — and gets smarter with every client it onboards.

  1. Scrapes — Logs into legacy client portals (yes, the ones with Comic Sans and "Best viewed in IE 6.0") and extracts raw CSV data using an AGI Inc autonomous browser agent. The dashboard shows a live viewport of the browser navigating in real-time.

  2. Reasons — Runs a three-tier cascade: first checks ChromaDB vector memory for past mappings (free, instant), then researches unknown terms via You.com ("What does cust_lvl_v2 mean in CRM data?"), and only then sends enriched context to Gemini for structured JSON mapping with per-field confidence scores. Columns matched from memory skip the LLM entirely — making the agent faster and cheaper over time.

  3. Asks for Help — When confidence is low, the agent calls your phone via Plivo. Not a Slack notification — an actual phone call. It explains the ambiguity, you press 1 (yes) or 2 (no), and that confirmation gets stored as a high-confidence learning in memory forever.

  4. Deploys — Transforms the mapped data through a schema-aware type coercion engine (handling 22 boolean variants, 4 date formats, currency stripping) and deploys it to Google Sheets via Composio.

  5. Remembers — Every successful mapping is embedded into ChromaDB as an enriched document ("cust_id maps to customer_id"). The next client's customer_id_v2 gets matched via cosine similarity — no human needed.
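
The cascade in step 2 can be sketched as a single decision function (a minimal illustration; `memory_lookup`, `research`, and `llm` are stand-ins for the real modules, and the 0.35 threshold mirrors the memory distance threshold described later):

```python
def map_column(column, memory_lookup, research, llm, threshold=0.35):
    """Three-tier cascade: memory first, then research-enriched LLM."""
    # Tier 1: vector memory. Free and instant; a confident hit skips the LLM.
    hit = memory_lookup(column)
    if hit is not None and hit["distance"] <= threshold:
        return {"target": hit["target"], "source": "memory", "confidence": "HIGH"}

    # Tier 2: research unknown jargon to enrich the prompt.
    context = research(f"What does '{column}' typically mean in CRM data?")

    # Tier 3: the LLM reasons over the enriched context.
    return llm(column, context)
```

The ordering is what makes the agent cheaper over time: each tier is only invoked when the cheaper one above it fails.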

The Learning Curve (Proven in Code)

                Phase 1: Novice    Phase 2: Intermediate       Phase 3: Expert
  Client        Acme Corp          Globex Inc                  Initech Ltd
  Memory Hits   0                  4+                          5+
  Human Calls   1                  0-1                         0
  Learning      All AI-mapped      Reuses Phase 1 knowledge    Full memory-driven

Our integration tests assert this curve directly: summary_b["from_memory"] > summary_a["from_memory"] and summary_c["human_confirmed"] <= summary_a["human_confirmed"].

Architecture

Six modular components coordinated by the FDEAgent orchestrator:

[Architecture diagram]

How We Built It

Gemini — The Brain (src/brain.py)

Gemini doesn't just receive text and return text. We use structured JSON output with response_mime_type="application/json" and a full response_schema enforcing the exact mapping structure we expect. Temperature is set to 0.1 for deterministic reasoning. The system instruction includes calibrated confidence thresholds — HIGH when the mapping is obvious, MEDIUM when plausible, LOW when uncertain.

The key insight: the Brain receives enriched context. Before Gemini sees a column, we inject You.com research results and ChromaDB memory hits into the prompt. The LLM reasons with evidence, not just its training data.

response = self._client.models.generate_content(
    model=Config.GEMINI_MODEL,
    config=types.GenerateContentConfig(
        system_instruction=SYSTEM_INSTRUCTION,
        response_mime_type="application/json",
        response_schema=MAPPING_SCHEMA,
        temperature=0.1,
    ),
    contents=prompt,
)

AGI Inc — The Hands (src/browser.py)

Full browser session lifecycle: creates a named session via POST /sessions, sends natural language navigation instructions ("Log in and download the customer CSV"), and polls for results with a 30-step retry loop. CSV extraction uses a content heuristic that validates header structure, field count consistency, and field lengths. The dashboard shows a live VNC stream of the browser navigating the portal in real-time.
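
The CSV content heuristic can be sketched like this (an illustrative stand-in for the real extraction check, not the actual src/browser.py code):

```python
import csv
import io

def looks_like_csv(text, min_rows=2, max_field_len=500):
    """Heuristic: valid header, consistent field counts, sane field lengths."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < min_rows:
        return False
    header = rows[0]
    # Header must have at least two non-empty, reasonably short column names.
    if len(header) < 2 or not all(0 < len(h.strip()) <= 64 for h in header):
        return False
    # Every data row must match the header's field count...
    if any(len(r) != len(header) for r in rows[1:] if r):
        return False
    # ...and no field should be absurdly long (likely HTML, not data).
    return all(len(f) <= max_field_len for r in rows[1:] for f in r)
```

A check like this filters out login pages and error HTML that the browser agent might return instead of the real export.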

You.com — The Researcher (src/research.py)

Domain-aware semantic search. When the Brain encounters cust_lvl_v2, the Researcher queries: "What does 'cust_lvl_v2' typically mean in CRM data?" and returns the top 5 snippets. Results are cached in-memory to prevent redundant API calls. The research context is injected into Gemini's prompt as a dedicated RESEARCH CONTEXT section, giving the LLM real-world grounding.
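
The cache is simple in spirit; roughly (a sketch, with a hypothetical `search_fn` standing in for the You.com client):

```python
class Researcher:
    """Domain-aware lookup with an in-memory cache to avoid repeat API calls."""

    def __init__(self, search_fn):
        self._search = search_fn   # e.g. a You.com client call
        self._cache = {}

    def lookup(self, column):
        query = f"What does '{column}' typically mean in CRM data?"
        if query not in self._cache:                  # cache miss: call the API once
            self._cache[query] = self._search(query)  # top snippets
        return self._cache[query]
```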

Plivo — The Teacher (src/teacher.py + server/webhooks.py)

A thread-safe webhook-driven voice loop. When the agent encounters a low-confidence mapping, it creates a Plivo call with the question embedded as query params. The answer webhook generates Plivo XML with a GetInput element for DTMF digit collection (5-second timeout). The input webhook captures the keypress (1=yes, 2=no), writes to a shared _pending_responses dict protected by threading.Lock, and the agent polls with 1-second intervals until the human responds. Confirmed mappings are stored as high-confidence learnings.
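
The async/sync bridge boils down to a locked dict plus a polling loop, roughly (a simplified sketch; the real webhook also parses Plivo's request payload and generates the response XML):

```python
import threading
import time

_pending_responses = {}
_lock = threading.Lock()

def webhook_received(call_id, digit):
    """Called on the Flask webhook thread when the human presses a key."""
    with _lock:
        _pending_responses[call_id] = (digit == "1")   # 1 = yes, 2 = no

def wait_for_human(call_id, timeout=60.0, interval=1.0):
    """Agent side: poll the shared dict until the webhook fills it in."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with _lock:
            if call_id in _pending_responses:
                return _pending_responses.pop(call_id)
        time.sleep(interval)
    return None   # timed out; the caller can fall back to auto-accept
```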

Composio — The Deployer (src/tools.py)

Two-step deployment: CREATE a new Google Sheet via Composio, then BATCH_UPDATE with the fully transformed dataset. But before deployment, the data passes through a schema-aware type coercion pipeline: booleans are normalized from 22 string variants (Y/N/yes/no/true/false/active/inactive/1/0...), dates from 4 formats to ISO 8601, numbers are stripped of $ and commas. Each transformation tracks per-row validation warnings.
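
An abbreviated sketch of the coercion logic (illustrative only; the real pipeline covers all 22 boolean variants, 4 date formats, and per-row validation warnings, but this shows the shape):

```python
from datetime import datetime

TRUTHY = {"y", "yes", "true", "t", "1", "active", "enabled", "on"}
FALSY = {"n", "no", "false", "f", "0", "inactive", "disabled", "off"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y", "%Y/%m/%d")

def coerce(value, target_type):
    """Coerce a raw string into the schema's expected type."""
    v = str(value).strip()
    if target_type == "boolean":
        if v.lower() in TRUTHY:
            return True
        if v.lower() in FALSY:
            return False
        raise ValueError(f"unrecognized boolean: {value!r}")
    if target_type == "number":
        # Strip currency noise before parsing.
        return float(v.replace("$", "").replace(",", ""))
    if target_type == "date":
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(v, fmt).date().isoformat()  # ISO 8601
            except ValueError:
                pass
        raise ValueError(f"unrecognized date: {value!r}")
    return v
```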

ChromaDB — The Memory (src/memory.py)

This is the heart of the continual learning system. Two key innovations:

  1. Enriched document embedding: Instead of storing bare column names (which embed poorly), we store "cust_id maps to customer_id". This gives the embedding model semantic bridging context, dramatically improving cosine similarity for short abbreviated text.

  2. Usage-based trust: Mappings with fewer than 2 successful uses get a +0.01 distance penalty. The agent requires corroboration before fully trusting a learned pattern — preventing garbage-in-garbage-out from a single bad mapping.

# Enriched storage for better semantic matching
doc_text = f"{source_column} maps to {target_field}"

# Usage-based trust penalty
if uses < 2:
    adjusted_distance += 0.01

Distance threshold: 0.35 cosine distance with HNSW indexing for sub-millisecond lookup.
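
Put together, the recall decision is roughly (an illustrative sketch; `accept_recall` is not the actual function name in src/memory.py):

```python
DISTANCE_THRESHOLD = 0.35
TRUST_PENALTY = 0.01   # applied until a mapping has 2+ successful uses

def accept_recall(distance, uses):
    """Decide whether a memory hit is trusted enough to skip the LLM."""
    adjusted = distance + (TRUST_PENALTY if uses < 2 else 0.0)
    return adjusted <= DISTANCE_THRESHOLD
```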

Real-Time Dashboard (server/)

A Flask + SSE streaming dashboard with:

  • Pub-sub event bus with thread-safe subscriber queues and 15-second keepalive heartbeats
  • Event history replay for late-joining browser tabs
  • Live pipeline step visualization (Scrape → Analyze → Call → Learn → Deploy)
  • Embedded browser viewport showing AGI agent navigation
  • Phone call overlay with ringing animation
  • Memory Bank panel showing STORED/RECALLED states in real-time
  • Results tab with 3-phase side-by-side comparison showing the learning curve
  • Setup tab for configuring target schema and client portal URLs
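
The event bus behind the SSE stream is conceptually small; a sketch (class and function names are illustrative, and keepalives and Flask wiring are omitted):

```python
import json
import queue
import threading

class EventBus:
    """Thread-safe pub-sub with history replay for late-joining SSE clients."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []
        self._history = []

    def publish(self, event):
        with self._lock:
            self._history.append(event)
            for q in self._subscribers:
                q.put(event)

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            for event in self._history:   # replay so a new tab catches up
                q.put(event)
            self._subscribers.append(q)
        return q

def sse_format(event):
    """Serialize an event as a Server-Sent Events frame."""
    return f"data: {json.dumps(event)}\n\n"
```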

Challenges We Ran Into

  • The Short-Text Embedding Problem: Column names like cust_id are too short for standard embeddings (all-MiniLM-L6-v2) to differentiate meaningfully. Bare string similarity gave poor results. We solved this by storing enriched documents ("cust_id maps to customer_id") and querying with "cust_id maps to" — giving the embedding model enough semantic signal to work with.

  • Confidence Calibration: Getting the LLM to reliably score its own confidence was iterative. We landed on a system instruction with explicit calibration rules and temperature=0.1, paired with structured JSON output to eliminate parsing ambiguity.

  • Thread-Safe Voice Loop: Plivo webhooks arrive asynchronously on a Flask server while the agent polls for responses synchronously. We used a shared dict with threading.Lock and a polling loop with 1-second intervals and 60-second timeout to bridge the async/sync gap.

  • Data Ambiguity: "Active" could mean boolean 1/0, Y/N, or a status string. The Research module (You.com) resolves this by looking up real-world context before the LLM reasons — grounding decisions in evidence rather than guessing.

Accomplishments We're Proud Of

  • Proven Continual Learning: Not a claim — it's verified by integration tests that assert decreasing human calls and increasing memory hits across 3 sequentially onboarded clients. The agent gets faster, cheaper, and more autonomous with every client.

  • Voice-First Escalation: When the agent is uncertain, it calls your phone, explains the problem, and learns from a single keypress. By Phase 3, the phone never rings — the agent has learned enough to handle everything autonomously.

  • End-to-End Autonomy: From a raw portal URL to a populated Google Sheet — scraping, reasoning, researching, calling, learning, transforming, and deploying — with zero client-specific code.

  • Modular, Extensible Architecture: Each component is independently swappable. The system degrades gracefully — if Gemini is down, the agent still has its vector memory; if Plivo fails, it auto-accepts at medium confidence and flags for review. The system never stops; it just becomes more cautious.

  • 201 Automated Tests: Covering ChromaDB similarity search, type coercion edge cases (22 boolean variants), thread-safe webhook handling, and a full end-to-end integration test that runs the complete 3-client pipeline and verifies the learning curve.

What We Learned

  • Memory beats fine-tuning for this problem. We chose retrieval-augmented memory over model fine-tuning because it works from the first example (not thousands), it's explainable (you can see which past mapping influenced the decision), and it's auditable (you can inspect and correct the memory). This is continual learning in the practical sense — the system improves with every interaction.

  • Context transforms LLM accuracy. Gemini alone gets ~70% of mappings right. Gemini + You.com research + ChromaDB memory gets ~95%. The three-tier cascade (Memory → Research → LLM) is the core architectural insight.

  • The phone call is the trust builder. AI agents that silently make decisions don't earn trust. An agent that calls you when it's unsure — and then never asks again — earns trust through transparency and demonstrated learning.

What's Next

  • Complex Transformations: Handling data cleaning logic (formatting dates, parsing addresses, splitting full names) using LLM-generated transformation code.
  • One-to-Many Mappings: Supporting schemas where a single source column maps to multiple target fields (e.g., full_name → first_name + last_name).
  • Broader Integrations: Salesforce, HubSpot, and SQL databases via Composio's expanding toolset.
  • Multi-Modal Inputs: Allowing the agent to read PDF contracts or data dictionaries to understand schemas before scraping.
  • Self-Correction: Detecting deployment errors and attempting fixes automatically, closing the feedback loop.

Built With

python gemini agi-inc you-com plivo composio chromadb flask remotion pytest
