💡 Inspiration
Every day, millions of people get quietly wronged — a hidden subscription fee, a data breach, a defective product, a deceptive "free trial." Most never realize they have legal standing. And the attorneys who specialize in consumer protection usually find out months later, after a class has already been certified and the moment has passed.
The whole system runs on lag. We wanted to turn that lag into seconds.
⚖️ What it does
seconds.ai is an autonomous pipeline that reads public discourse the way a plaintiff attorney wishes they could — continuously, and at scale:
- Listens — agents search Reddit and the open web for consumer complaints in real time.
- Detects patterns — a single angry post isn't a lawsuit; a cluster is. We group complaints by (company × harm) and count distinct complainants, our proxy for the legal test of numerosity.
- Ranks — a Pioneer-tuned model scores each complaint for legal relevance and tags the likely statute.
- Delivers — qualified, cited leads go to the right law firm, every claim traceable back to its source post.
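The pattern-detection step can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the `Complaint` fields and the sample data are assumptions made up for the example. It shows why repeat posts from one author do not inflate a cluster.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Complaint:
    author: str   # post author handle (hypothetical field name)
    company: str  # company named in the complaint
    harm: str     # complaint type, e.g. "hidden_fee"


def count_complainants(complaints):
    """Map each (company, harm) pair to its number of distinct authors."""
    authors = defaultdict(set)
    for c in complaints:
        authors[(c.company, c.harm)].add(c.author)
    return {key: len(names) for key, names in authors.items()}


# Made-up sample posts; "AcmeStream" is a fictional company.
posts = [
    Complaint("alice", "AcmeStream", "hidden_fee"),
    Complaint("alice", "AcmeStream", "hidden_fee"),  # repeat poster, counted once
    Complaint("bob", "AcmeStream", "hidden_fee"),
    Complaint("carol", "AcmeStream", "data_breach"),
]
counts = count_complainants(posts)
print(counts)
```

Here the three `hidden_fee` posts collapse to two complainants, because the cluster is keyed on who complained rather than how often.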
🔧 How we built it
- Ingestion — Guild.ai agents run on a schedule and pull complaints via Firecrawl (Reddit + the open web), with a Reddit RSS feed as a fallback source. A lightweight enrichment pass tags company, complaint type, dollar-harm, and a legal-signal score.
- Storage — ClickHouse Cloud holds everything with full provenance: `raw_posts`, `leads`, `rankings`, and an `ingest_runs` log. Every row traces back to a source URL and the run that surfaced it.
- The product view — a ClickHouse view, `case_signals`, rolls individual complaints into class-action candidates. For each `(company, complaint_type)` it computes distinct complainants, 7/30-day velocity, an evidence trail, and a viability score:
$$\text{case\_score} = 0.5 \cdot \min\left(\frac{n_{\text{complainants}}}{10},\ 1\right) + 0.3 \cdot \bar{s}_{\text{signal}} + 0.2 \cdot \min\left(\frac{n_{\text{7d}}}{5},\ 1\right)$$
- Handoff — a FastAPI service exposes `/leads`, `/cases`, a JSONL export for fine-tuning, and a `/rank` write-back. Pioneer reads a `posts` view and writes scores into a `rankings` table; those scores roll straight back up into `case_signals`.
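As a worked example of the viability score from the product-view bullet, the same arithmetic can be written as a small Python function. The real computation lives in the ClickHouse view; this sketch only mirrors the formula. The parameter names are chosen for illustration, and it assumes the mean signal score is already scaled to [0, 1].

```python
def case_score(n_complainants: int, mean_signal: float, n_7d: int) -> float:
    """Viability score: 50% numerosity, 30% mean legal signal, 20% 7-day velocity.

    Assumes mean_signal is already in [0, 1]; it is not clamped here.
    """
    numerosity = min(n_complainants / 10, 1.0)  # saturates at 10 distinct complainants
    velocity = min(n_7d / 5, 1.0)               # saturates at 5 complaints in 7 days
    return 0.5 * numerosity + 0.3 * mean_signal + 0.2 * velocity


# A small cluster: 4 complainants, mean signal 0.5, 2 complaints in the last week.
small = case_score(4, 0.5, 2)
# A saturated cluster: both capped terms hit their ceiling.
saturated = case_score(25, 1.0, 9)
print(round(small, 2), round(saturated, 2))
```

For the small cluster the terms are 0.5 · 0.4 + 0.3 · 0.5 + 0.2 · 0.4 = 0.43. The caps mean a candidate with 25 complainants scores no higher on numerosity than one with 10, so the score rewards reaching a threshold rather than raw volume.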
🧗 Challenges we ran into
- Reddit fought back. In 2026, Reddit blocks its public JSON API from datacenter IPs (`403`). We found the RSS/Atom feeds still respond, then moved to Firecrawl's managed proxy for reliable Reddit and open-web coverage.
- Agents live in a sandbox. Guild coded agents can't make raw `fetch()` calls — by design, they reach the outside world only through sanctioned connector tools. That sent us back to rebuild the agent around Guild's connectors instead of direct HTTP.
- Defining "a case." Our first scoring counted posts — but 50 posts from one person isn't a class. We rebuilt the aggregation around distinct complainants (`uniqExact(author)`); numerosity is the whole game.
- ClickHouse Cloud quirks. Engines silently map to their `Shared*` variants, `FINAL` is required to dedup `ReplacingMergeTree` reads, and the table alias must come before `FINAL` in a join — small things that cost real debugging time.
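For the RSS/Atom fallback, parsing a feed into provenance-carrying rows needs nothing beyond the standard library. The sketch below is an assumption-heavy illustration: the sample feed is hand-written in Atom 1.0 shape and is not a captured Reddit response, and the output keys (`author`, `title`, `source_url`) are hypothetical rather than the project's actual schema.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample in Atom 1.0 shape; not a real Reddit response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author><name>/u/alice</name></author>
    <title>Charged after cancelling my free trial</title>
    <link href="https://example.com/post/1"/>
  </entry>
  <entry>
    <author><name>/u/bob</name></author>
    <title>Same hidden fee on my account</title>
    <link href="https://example.com/post/2"/>
  </entry>
</feed>
"""


def parse_entries(feed_xml: str) -> list[dict]:
    """Turn Atom entries into rows that keep the author and the source URL."""
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        rows.append({
            "author": entry.findtext("atom:author/atom:name", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "source_url": link.get("href", "") if link is not None else "",
        })
    return rows


rows = parse_entries(SAMPLE_FEED)
for row in rows:
    print(row["author"], row["source_url"])
```

Keeping `author` and `source_url` on every row is what later makes distinct-complainant counts and per-claim citations possible.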
📚 What we learned
- A class action's legal tests — numerosity, commonality, a deep-pocket defendant, a statute — map almost one-to-one onto a real-time aggregation query. ClickHouse turned out to be the perfect engine for exactly that.
- The right division of labor is a recall-biased heuristic pre-score feeding a learned ranker: cast a wide net, then let the model sharpen it.
- Modern agent platforms sandbox the network for safety — you build with their connectors, not around them.
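The recall-biased division of labor can be sketched as a two-stage function: a cheap heuristic with a deliberately low threshold, followed by a ranker that orders whatever survives. Everything here is illustrative. The trigger phrases, the threshold, and `stand_in_ranker` are assumptions; the stand-in only takes the place of the learned model and says nothing about how Pioneer scores posts.

```python
from typing import Callable

# Hypothetical trigger phrases; a real list would be far longer.
TRIGGERS = ("charged", "refund", "breach", "cancel", "defective", "free trial")


def heuristic_score(text: str) -> float:
    """Cheap pre-score: fraction of trigger phrases present, in [0, 1]."""
    lowered = text.lower()
    hits = sum(phrase in lowered for phrase in TRIGGERS)
    return hits / len(TRIGGERS)


def two_stage_rank(
    posts: list[str],
    ranker: Callable[[str], float],
    min_pre_score: float = 0.1,  # low on purpose: one trigger hit is enough to pass
) -> list[tuple[str, float]]:
    """Filter with the heuristic, then sort survivors by the ranker's score."""
    candidates = [p for p in posts if heuristic_score(p) >= min_pre_score]
    scored = [(p, ranker(p)) for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def stand_in_ranker(text: str) -> float:
    """Placeholder for a learned model: heuristic plus a bonus for a dollar amount."""
    return min(1.0, heuristic_score(text) + (0.5 if "$" in text else 0.0))


sample_posts = [
    "Great weather today",
    "Still waiting on a refund for a defective blender",
    "They charged me $40 after I tried to cancel the free trial",
]
ranked = two_stage_rank(sample_posts, stand_in_ranker)
for text, score in ranked:
    print(round(score, 2), text)
```

The first stage only has to avoid dropping real complaints. Precision is the ranker's job, so the threshold can stay low without flooding the final lead list.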
🚀 What's next
- An LLM entity-extractor for sharper company/issue attribution.
- The full Guild connector-based autonomous loop, end to end.
- Composio email to firms + Senso-published citation pages per case.
- Reddit OAuth and more sources (news, forums, complaint boards).
Built With
- claude
- clickhouse
- guild