💡 Inspiration

Every day, millions of people get quietly wronged — a hidden subscription fee, a data breach, a defective product, a deceptive "free trial." Most never realize they have legal standing. And the attorneys who specialize in consumer protection usually find out months later, after a class has already been certified and the moment has passed.

The whole system runs on lag. We wanted to turn that lag into seconds.

⚖️ What it does

seconds.ai is an autonomous pipeline that reads public discourse the way a plaintiff attorney wishes they could — continuously, and at scale:

  1. Listens — agents search Reddit and the open web for consumer complaints in real time.
  2. Detects patterns — a single angry post isn't a lawsuit; a cluster is. We group complaints by (company × harm) and count distinct complainants — the legal test of numerosity.
  3. Ranks — a Pioneer-tuned model scores each complaint for legal relevance and tags the likely statute.
  4. Delivers — qualified, cited leads go to the right law firm, every claim traceable back to its source post.
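The pattern-detection step can be illustrated with a minimal Python sketch. The field names (`company`, `harm`, `author`) and the sample posts are hypothetical, and the real pipeline performs this aggregation in ClickHouse rather than in application code; the sketch only shows why counting distinct authors differs from counting posts.

```python
from collections import defaultdict


def count_distinct_complainants(posts):
    """Group posts by (company, harm) and count distinct authors per group."""
    authors_by_group = defaultdict(set)
    for post in posts:
        key = (post["company"], post["harm"])
        authors_by_group[key].add(post["author"])
    return {key: len(authors) for key, authors in authors_by_group.items()}


# Hypothetical sample: three posts, two of them from the same author.
posts = [
    {"company": "AcmeStream", "harm": "hidden_fee", "author": "user_a"},
    {"company": "AcmeStream", "harm": "hidden_fee", "author": "user_a"},
    {"company": "AcmeStream", "harm": "hidden_fee", "author": "user_b"},
]
clusters = count_distinct_complainants(posts)
print(clusters)
```

Here the cluster holds three posts but only two complainants, and it is the second number that speaks to numerosity.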

🔧 How we built it

  • Ingestion — Guild.ai agents run on a schedule and pull complaints via Firecrawl (Reddit + the open web), with a Reddit RSS feed as a fallback source. A lightweight enrichment pass tags company, complaint type, dollar-harm, and a legal-signal score.
  • Storage — ClickHouse Cloud holds everything with full provenance: raw_posts, leads, rankings, and an ingest_runs log. Every row traces back to a source URL and the run that surfaced it.
  • The product view — a ClickHouse view, case_signals, rolls individual complaints into class-action candidates. For each (company, complaint_type) it computes distinct complainants, 7/30-day velocity, an evidence trail, and a viability score:

$$\text{case\_score} = 0.5 \cdot \min\left(\frac{n_{\text{complainants}}}{10},\ 1\right) + 0.3 \cdot \bar{s}_{\text{signal}} + 0.2 \cdot \min\left(\frac{n_{\text{7d}}}{5},\ 1\right)$$

  • Handoff — a FastAPI service exposes /leads, /cases, a JSONL export for fine-tuning, and a /rank write-back. Pioneer reads a posts view and writes scores into a rankings table; those scores roll straight back up into case_signals.
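As a worked example of the viability score above, here is a small Python sketch of the same arithmetic. It assumes the mean legal-signal score lies in [0, 1], which the writeup does not state explicitly; the function and argument names are illustrative, and the production computation lives in the case_signals view rather than in Python.

```python
def case_score(n_complainants, mean_signal, n_7d):
    """Viability score: weighted numerosity, mean signal, and 7-day velocity.

    Assumes mean_signal is already scaled to [0, 1].
    """
    numerosity = min(n_complainants / 10, 1.0)  # saturates at 10 complainants
    velocity = min(n_7d / 5, 1.0)               # saturates at 5 posts in 7 days
    return 0.5 * numerosity + 0.3 * mean_signal + 0.2 * velocity


# Hypothetical cluster: 4 distinct complainants, mean signal 0.5, 2 posts this week.
score = case_score(n_complainants=4, mean_signal=0.5, n_7d=2)
print(round(score, 2))
```

For this cluster the three terms contribute 0.20, 0.15, and 0.08, for a score of about 0.43. Because both count terms are capped at 1, a cluster with hundreds of complainants cannot outscore one with ten on numerosity alone.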

🧗 Challenges we ran into

  • Reddit fought back. In 2026, requests to Reddit's public JSON API from datacenter IPs are rejected with a 403. We found the RSS/Atom feeds still respond, then moved to Firecrawl's managed proxy for reliable Reddit and open-web coverage.
  • Agents live in a sandbox. Guild coded agents can't make raw fetch() calls — by design, they reach the outside world only through sanctioned connector tools. That sent us back to rebuild the agent around Guild's connectors instead of direct HTTP.
  • Defining "a case." Our first scoring counted posts — but 50 posts from one person isn't a class. We rebuilt the aggregation around distinct complainants (uniqExact(author)); numerosity is the whole game.
  • ClickHouse Cloud quirks. Engines silently map to their Shared* variants, FINAL is required to dedup ReplacingMergeTree reads, and the table alias must come before FINAL in a join — small things that cost real debugging time.
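The RSS/Atom fallback from the first challenge can be sketched with the Python standard library alone. This is a rough illustration, assuming a standard Atom document; the sample feed below is hand-written rather than captured from Reddit, and fetching is left out because the agents must go through their platform's connectors.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample in standard Atom layout (not real Reddit output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Charged after cancelling free trial</title>
    <author><name>/u/example_user</name></author>
    <link href="https://example.com/post/1"/>
  </entry>
</feed>"""


def parse_atom_entries(xml_text):
    """Extract title, author name, and link URL from each Atom <entry>."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "author": entry.findtext("atom:author/atom:name", default="", namespaces=ATOM_NS),
            "url": link.get("href", "") if link is not None else "",
        })
    return entries


entries = parse_atom_entries(SAMPLE_FEED)
print(entries)
```

Keeping the author and URL on every parsed entry is what later allows distinct-complainant counting and source-level provenance.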

📚 What we learned

  • A class action's legal tests — numerosity, commonality, a deep-pocket defendant, a statute — map almost one-to-one onto a real-time aggregation query. ClickHouse turned out to be the perfect engine for exactly that.
  • The right division of labor is a recall-biased heuristic pre-score feeding a learned ranker: cast a wide net, then let the model sharpen it.
  • Modern agent platforms sandbox the network for safety — you build with their connectors, not around them.
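The division of labor described above (a recall-biased pre-score feeding a learned ranker) can be sketched as a two-stage filter. Everything here is hypothetical: the keyword list, the threshold, and the `stand_in_ranker` placeholder, which only stands in for the Pioneer-tuned model so the example runs.

```python
# Hypothetical keyword list for the cheap first stage.
KEYWORDS = ("charged", "refund", "breach", "cancel", "defective")


def heuristic_prescore(text):
    """Cheap score in [0, 1]: fraction of the keywords found in the text."""
    lowered = text.lower()
    hits = sum(1 for kw in KEYWORDS if kw in lowered)
    return hits / len(KEYWORDS)


def rank_candidates(texts, ranker, prescore_threshold=0.2):
    """Stage 1 keeps anything with at least one keyword hit (wide net);
    stage 2 orders the survivors by the ranker's score, highest first."""
    candidates = [t for t in texts if heuristic_prescore(t) >= prescore_threshold]
    return sorted(candidates, key=ranker, reverse=True)


def stand_in_ranker(text):
    """Placeholder for a learned model: favors posts that name a dollar amount."""
    return 1.0 if "$" in text else 0.5


texts = [
    "They charged me $40 after I tried to cancel and refused a refund",
    "Love this product, works great",
    "Got charged twice this month",
]
ranked = rank_candidates(texts, stand_in_ranker)
print(ranked)
```

The low threshold is deliberate: the first stage should rarely drop a real complaint, leaving precision to the ranker.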

🚀 What's next

  • An LLM entity-extractor for sharper company/issue attribution.
  • The full Guild connector-based autonomous loop, end to end.
  • Composio email to firms + Senso-published citation pages per case.
  • Reddit OAuth and more sources (news, forums, complaint boards).

Built With

  • claude
  • clickhouse
  • guild