team 1 - seconds.ai

💡 Inspiration

Every day, millions of people get quietly wronged — a hidden subscription fee, a data breach, a defective product, a deceptive "free trial." Most never realize they have legal standing. And the attorneys who specialize in consumer protection usually find out months later, after a class has already been certified and the moment has passed.

The whole system runs on lag. We wanted to turn that lag into seconds.

⚖️ What it does

seconds.ai is an autonomous pipeline that reads public discourse the way a plaintiff attorney wishes they could — continuously, and at scale:

Listens — agents search Reddit and the open web for consumer complaints in real time.
Detects patterns — a single angry post isn't a lawsuit; a cluster is. We group complaints by (company × harm) and count distinct complainants — the legal test of numerosity.
Ranks — a Pioneer-tuned model scores each complaint for legal relevance and tags the likely statute.
Delivers — qualified, cited leads go to the right law firm, every claim traceable back to its source post.

🔧 How we built it

Ingestion — Guild.ai agents run on a schedule and pull complaints via Firecrawl (Reddit + the open web), with a Reddit RSS feed as a fallback source. A lightweight enrichment pass tags company, complaint type, dollar-harm, and a legal-signal score.
Storage — ClickHouse Cloud holds everything with full provenance: raw_posts, leads, rankings, and an ingest_runs log. Every row traces back to a source URL and the run that surfaced it.
The product view — a ClickHouse view, case_signals, rolls individual complaints into class-action candidates. For each (company, complaint_type) it computes distinct complainants, 7/30-day velocity, an evidence trail, and a viability score:

$$\text{case_score} = 0.5 \cdot \min\left(\frac{n_{\text{complainants}}}{10},\ 1\right) + 0.3 \cdot \bar{s}{\text{signal}} + 0.2 \cdot \min\left(\frac{n{\text{7d}}}{5},\ 1\right)$$

Handoff — a FastAPI service exposes /leads, /cases, a JSONL export for fine-tuning, and a /rank write-back. Pioneer reads a posts view and writes scores into a rankings table; those scores roll straight back up into case_signals.

🧗 Challenges we ran into

Reddit fought back. In 2026, Reddit blocks its public JSON API from datacenter IPs (403). We found the RSS/Atom feeds still respond, then moved to Firecrawl's managed proxy for reliable Reddit and open-web coverage.
Agents live in a sandbox. Guild coded agents can't make raw fetch() calls — by design, they reach the outside world only through sanctioned connector tools. That sent us back to rebuild the agent around Guild's connectors instead of direct HTTP.
Defining "a case." Our first scoring counted posts — but 50 posts from one person isn't a class. We rebuilt the aggregation around distinct complainants (uniqExact(author)); numerosity is the whole game.
ClickHouse Cloud quirks. Engines silently map to their Shared* variants, FINAL is required to dedup ReplacingMergeTree reads, and the table alias must come before FINAL in a join — small things that cost real debugging time.

📚 What we learned

A class action's legal tests — numerosity, commonality, a deep-pocket defendant, a statute — map almost one-to-one onto a real-time aggregation query. ClickHouse turned out to be the perfect engine for exactly that.
The right division of labor is a recall-biased heuristic pre-score feeding a learned ranker: cast a wide net, then let the model sharpen it.
Modern agent platforms sandbox the network for safety — you build with their connectors, not around them.