ModRadar

Pattern detection layer for Reddit mods. Catches the spam edits, mod collisions, and coordinated link campaigns that single item review keeps missing.

Inspiration

The seed was an ACM CHI 2026 paper, "Understanding How Reddit Moderators Use the Modqueue." The authors surveyed 110 mods across 408 subreddits and the numbers were rough: 74.5% had experienced collisions where two mods unknowingly acted on the same item, and 84% routinely left the queue mid review to fetch context (thread, user page, modlog). One mod (P019 in the paper) called the missing tool a "firefighter radar," something that could show patterns and clumps of alarms before you waste time on each one in isolation.

Around the same time I was reading an r/ModSupport thread (~95 upvotes, admin acknowledged) about a spam technique where the author posts something benign, waits a few days, then edits in an affiliate or phishing link. AutoModerator has no time window operator, so there is no way to write a rule like "flag edits more than 24 hours after submission." Bardfinn confirmed the gap in the same thread. spamlinkflagger handles comments but the maintainer publicly said post edit coverage was out of scope.

Then in March 2026, Reddit banned bulk ban bots that worked on community association. Mods who relied on those workflows had a behavioral defense gap with no replacement.

Three real problems, all with evidence, none with a Devvit app addressing them. That felt like a project.

What it does

ModRadar is a Devvit Web app that installs on a subreddit and runs four things in parallel.

Edit Radar. Snapshots every post and comment body on submission (30 day TTL in Devvit Redis). On every edit it diffs the URL set, scores any newly added URLs, and writes an alert if the score crosses your threshold. It calls reddit.remove() only if you opt in by setting an auto remove threshold. Scoring blends shortener resolution (cached), suspicious TLDs (.xyz, .top, .click), suspicious shape (long SLDs, digit runs, double dash), prior reports per domain, and Google Safe Browsing if you provide a key. A per subreddit editWindowHours setting addresses the AutoMod gap directly: "only flag edits older than 24 hours" is one number in a form.
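The diff and edit window steps could look roughly like the sketch below. The function names, the URL regex, and the trailing punctuation rule are assumptions for illustration, not the actual diff-engine.ts code; only the TLD list comes from the description above.

```typescript
// Hypothetical helpers; the real diff-engine.ts may be shaped differently.
const URL_PATTERN = /https?:\/\/[^\s<>()\[\]"']+/gi;

/** Extract a de-duplicated set of URLs from a post or comment body. */
function extractUrls(body: string): Set<string> {
  const matches = body.match(URL_PATTERN) ?? [];
  // Strip punctuation that commonly trails a URL in prose.
  return new Set(matches.map((u) => u.replace(/[.,;:!?]+$/, "")));
}

/** URLs present after the edit that were absent from the snapshot. */
function addedUrls(before: string, after: string): string[] {
  const old = extractUrls(before);
  return Array.from(extractUrls(after)).filter((u) => !old.has(u));
}

/** True when the edit landed at least `editWindowHours` after submission. */
function isLateEdit(createdAtMs: number, editedAtMs: number, editWindowHours: number): boolean {
  return editedAtMs - createdAtMs >= editWindowHours * 60 * 60 * 1000;
}

// TLD list taken from the writeup; everything else here is illustrative.
const SUSPICIOUS_TLDS = new Set(["xyz", "top", "click"]);

/** True when the URL's host ends in one of the listed TLDs. */
function hasSuspiciousTld(url: string): boolean {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  if (!match) return false;
  const host = match[1].toLowerCase();
  return SUSPICIOUS_TLDS.has(host.slice(host.lastIndexOf(".") + 1));
}
```

With editWindowHours set to 24, an edit made 25 hours after submission that adds a .xyz link would pass both checks and move on to scoring.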

Collision Shield. When a mod clicks "ModRadar: review lock" on a queue item, ModRadar acquires a 5 minute Redis lock and broadcasts on a Devvit Realtime channel. Another mod clicking the same item sees a collision toast naming the first mod and when they started. The custom post dashboard shows all active locks with a pulsing green dot, updated live, no polling. The dashboard heartbeats locks every 60 seconds for the owning mod so a long review does not expire. Closing the tab releases the lock via navigator.sendBeacon. A scheduler tick cleans up orphans every 10 minutes.
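The lock lifecycle (acquire, heartbeat, release, orphan sweep) can be sketched with an in-memory table standing in for Redis. The class and method names are hypothetical, and in the real app the acquire step would presumably be an atomic set-if-absent with an expiry rather than a Map lookup.

```typescript
// In-memory stand-in for the Redis lock; all names are illustrative.
type ReviewLock = { modId: string; startedAtMs: number; expiresAtMs: number };

const LOCK_TTL_MS = 5 * 60 * 1000; // 5 minute lock, as described above

class LockTable {
  private locks = new Map<string, ReviewLock>();

  /** Returns the current holder on collision, or null when the lock was acquired. */
  acquire(itemId: string, modId: string, nowMs: number): ReviewLock | null {
    const held = this.locks.get(itemId);
    const live = held !== undefined && held.expiresAtMs > nowMs;
    if (held && live && held.modId !== modId) return held;
    // Re-acquiring your own live lock keeps the original start time.
    const startedAtMs = held && live ? held.startedAtMs : nowMs;
    this.locks.set(itemId, { modId, startedAtMs, expiresAtMs: nowMs + LOCK_TTL_MS });
    return null;
  }

  /** Heartbeat: extend the TTL, but only for the owning mod. */
  heartbeat(itemId: string, modId: string, nowMs: number): boolean {
    const held = this.locks.get(itemId);
    if (!held || held.modId !== modId || held.expiresAtMs <= nowMs) return false;
    held.expiresAtMs = nowMs + LOCK_TTL_MS;
    return true;
  }

  /** Explicit release, e.g. when the dashboard tab closes. */
  release(itemId: string, modId: string): boolean {
    const held = this.locks.get(itemId);
    if (!held || held.modId !== modId) return false;
    return this.locks.delete(itemId);
  }

  /** Scheduler-style sweep; returns how many expired locks were removed. */
  sweep(nowMs: number): number {
    let removed = 0;
    this.locks.forEach((lock, itemId) => {
      if (lock.expiresAtMs <= nowMs) {
        this.locks.delete(itemId);
        removed++;
      }
    });
    return removed;
  }
}
```

The value returned on collision carries the first mod's ID and start time, which is the information the collision toast needs.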

Cluster Radar. A scheduler job on a */5 cron (every 5 minutes) runs three clustering passes over the last 24 hours of snapshots: shared domain, shared author, and time window burst (a sliding 10 minute window where items share a domain). Each cluster gets a risk score blending group size, member URL risk hints, and time density. The dashboard shows clusters as cards sorted by risk, color coded, with "Remove all" and "Dismiss" buttons. One click bulk action goes through reddit.remove and lands in the modlog under the app account.
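A minimal sketch of the time window burst pass, assuming each snapshot has been reduced to an ID, a domain, and a creation timestamp. The type and function names are hypothetical, and keeping only the densest window per domain is a simplification chosen for this example; the real pass may report bursts differently.

```typescript
type Item = { id: string; domain: string; createdAtMs: number };
type Burst = { domain: string; itemIds: string[] };

const WINDOW_MS = 10 * 60 * 1000; // sliding 10 minute window

/** For each domain, find the densest 10 minute window and keep it if large enough. */
function findBursts(items: Item[], minGroupSize: number): Burst[] {
  // Group items by domain first.
  const byDomain = new Map<string, Item[]>();
  for (const item of items) {
    const group = byDomain.get(item.domain) ?? [];
    group.push(item);
    byDomain.set(item.domain, group);
  }

  const bursts: Burst[] = [];
  byDomain.forEach((group, domain) => {
    group.sort((a, b) => a.createdAtMs - b.createdAtMs);
    let start = 0;
    let best: Item[] = [];
    // Two-pointer sweep: advance `start` until the window spans at most WINDOW_MS.
    for (let end = 0; end < group.length; end++) {
      while (group[end].createdAtMs - group[start].createdAtMs > WINDOW_MS) start++;
      if (end - start + 1 > best.length) best = group.slice(start, end + 1);
    }
    if (best.length >= minGroupSize) {
      bursts.push({ domain, itemIds: best.map((i) => i.id) });
    }
  });
  return bursts;
}
```

Sorting once per domain and sweeping with two pointers keeps the pass at O(n log n), which matters when the whole scan has to fit inside a single scheduler tick.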

Agent Layer. Heuristics handle the obvious cases at near zero cost and zero latency, but they fall apart on borderline scores in [0.3, 0.7]. That is the band where false positives erode mod trust and false negatives let spam through. So I added an LLM adjudicator that only fires inside that band, using Gemini 3.5 Flash via LangChain v1. It reads the body before and after the edit, the heuristic signals, and the added URLs, then returns a structured JSON verdict (spam, legit, unclear) with reasons. A legit verdict at high confidence suppresses the alert. A spam verdict at high confidence promotes it. The agent can never push a score past the auto remove threshold on its own, so a moderator is always the one who decided "yes, auto remove at 0.7." A separate narrator (Sonnet 4.6) reads cluster contents and writes a one to two sentence summary plus a campaign type tag (crypto_scam, affiliate_spam, engagement_farming) and a recommended action. The dashboard renders these on the cluster card so a mod sees "3 alt accounts <30d old promoting a fake Coinbase clone via bit.ly redirects, posted in 8 min" instead of just "3 items linking to coinbase-secure.xyz."
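The gating rules above can be written as a small pure function. This is a sketch under assumptions: the 0.8 confidence cutoff, the 0.15 promotion step, and the choice to let a confident spam verdict force an alert are illustrative values, not the ones ModRadar ships with. Only the [0.3, 0.7] band and the cap below the auto remove threshold come from the description.

```typescript
type Verdict = { label: "spam" | "legit" | "unclear"; confidence: number };
type Decision = { score: number; alert: boolean; autoRemove: boolean };

const BAND_LOW = 0.3;
const BAND_HIGH = 0.7;
const HIGH_CONFIDENCE = 0.8; // assumed cutoff
const PROMOTE_STEP = 0.15;   // assumed score bump for a confident spam verdict

function inBorderlineBand(score: number): boolean {
  return score >= BAND_LOW && score <= BAND_HIGH;
}

/** Combine the heuristic score with an optional agent verdict (null = agent skipped or failed). */
function applyVerdict(
  heuristicScore: number,
  verdict: Verdict | null,
  alertThreshold: number,
  autoRemoveThreshold: number,
): Decision {
  let score = heuristicScore;
  let suppressed = false;
  let promoted = false;

  if (verdict && inBorderlineBand(heuristicScore) && verdict.confidence >= HIGH_CONFIDENCE) {
    if (verdict.label === "legit") suppressed = true;
    if (verdict.label === "spam") {
      promoted = true;
      // Cap the bump strictly below the auto remove threshold.
      const cap = autoRemoveThreshold - 0.01;
      score = Math.max(heuristicScore, Math.min(heuristicScore + PROMOTE_STEP, cap));
    }
  }

  return {
    score,
    alert: !suppressed && (promoted || score >= alertThreshold),
    // Auto remove depends on the heuristic score alone, never on the agent's bump.
    autoRemove: heuristicScore >= autoRemoveThreshold,
  };
}
```

Because autoRemove reads only heuristicScore, a null verdict or a hallucinated one can change whether a mod sees an alert but can never trigger a removal.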

Everything is per subreddit. Settings include three module toggles, agentMode of off | borderline | always, edit window in hours, the alert and auto remove thresholds, and the cluster minimum group size.
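As a rough illustration, the settings could be modeled like this. Field names other than agentMode and editWindowHours are hypothetical, and the validation bounds (scores in [0, 1], a minimum group size of 2) are assumptions; the project itself validates with Zod, which is left out here to keep the sketch dependency free.

```typescript
type AgentMode = "off" | "borderline" | "always";

interface ModRadarSettings {
  editRadarEnabled: boolean;       // three module toggles (names assumed)
  collisionShieldEnabled: boolean;
  clusterRadarEnabled: boolean;
  agentMode: AgentMode;
  editWindowHours: number;
  alertThreshold: number;
  autoRemoveThreshold: number | null; // null means auto remove is not opted in
  clusterMinGroupSize: number;
}

/** Returns the names of invalid fields; an empty array means the settings are usable. */
function validateSettings(s: ModRadarSettings): string[] {
  const errors: string[] = [];
  const inUnitRange = (n: number) => n >= 0 && n <= 1;

  if (["off", "borderline", "always"].indexOf(s.agentMode) === -1) errors.push("agentMode");
  if (!(s.editWindowHours >= 0)) errors.push("editWindowHours");
  if (!inUnitRange(s.alertThreshold)) errors.push("alertThreshold");
  if (s.autoRemoveThreshold !== null && !inUnitRange(s.autoRemoveThreshold)) {
    errors.push("autoRemoveThreshold");
  }
  if (!Number.isInteger(s.clusterMinGroupSize) || s.clusterMinGroupSize < 2) {
    errors.push("clusterMinGroupSize");
  }
  return errors;
}
```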

How we built it

Devvit Web with Hono on the server, plain TypeScript and Vite on the client, Devvit Redis for state, Devvit Realtime for live channels, Devvit Scheduler for cron jobs. LLM calls through LangChain v1 (langchain + @langchain/google-genai + @langchain/core) for the agent layer. Zod for I/O validation throughout. Vitest for tests.

Build order tracked the dependency graph:

src/core/diff-engine.ts first, because everything downstream needs URL extraction, body hashing, set diffs, and edit window arithmetic. Pure functions, 17 vitest assertions.
src/core/redis-schema.ts second, defining every key namespace upfront. All keys prefixed mr:{subredditId}: for per sub isolation, with mr:cache:* for content derived caches that are sub agnostic.
Edit Radar end to end. Triggers wire to handleSubmit and handleUpdate, which snapshot/diff/score and conditionally write alerts.
Collision Shield using Devvit Realtime. Channel pattern modradar-{subredditId}-reviewing, with review-started | review-extended | review-ended event types.
Cluster Radar. The scheduler scans up to 500 recent items from a sorted set, runs three passes, and writes results to Redis. Reddit API enrichment happens only for items that survive clustering, dedup'd across overlapping clusters. That kept the scan inside Devvit's 30 second request budget even on large queues.
The custom post dashboard with three panels and realtime subscriptions, plus a settings form covering every per sub knob.
Agent Layer last. LangChain v1's createAgent with responseFormat: zodSchema was the right shape since it uses the provider's native structured output and avoids a tool call round trip. Two model factories, two prompt files, a small middleware for timing and a daily budget guard, two cache layers in Redis. Only the plumbing is tested (5 prompt tests + 8 Zod schema tests), since LLM verdicts are non deterministic. Agent calls return null on any failure and the heuristic verdict stays authoritative.
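The key and channel naming conventions from the build steps above could be centralized in a few helpers. The prefixes and the channel pattern are taken from the description; the helper names and the example key suffixes (snapshot, shortener) are made up for illustration, and {subredditId} is read here as a placeholder for an ID such as t5_abc.

```typescript
/** Per sub key: mr:{subredditId}:<parts...> */
function subKey(subredditId: string, ...parts: string[]): string {
  return ["mr", subredditId, ...parts].join(":");
}

/** Sub agnostic cache key: mr:cache:<parts...> */
function cacheKey(...parts: string[]): string {
  return ["mr", "cache", ...parts].join(":");
}

/** Realtime channel for Collision Shield events in one subreddit. */
function reviewChannel(subredditId: string): string {
  return `modradar-${subredditId}-reviewing`;
}

// Event types broadcast on the review channel.
type ReviewEvent = "review-started" | "review-extended" | "review-ended";

// Example payload shape (hypothetical fields beyond `type`).
const example: { type: ReviewEvent; itemId: string } = {
  type: "review-started",
  itemId: "t3_x",
};
```

Defining every namespace in one module makes per sub isolation easy to audit: any key that does not go through subKey or cacheKey stands out in review.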

Each module ended with the same gate: npm run type-check && npm run lint && npm test && npm run build. Nothing merged that broke any of the four.

Accomplishments that we're proud of

The whole thing is one Devvit app. No external backend, no third party DB, no auth layer. Runs entirely inside Devvit's hosted runtime and stores everything in Devvit Redis. A sub installs ModRadar, picks a sensitivity, and the rest is automatic.

Test coverage on the pure modules is honest. 41 vitest assertions across four suites covering diff engine, clustering, agent prompts, and agent schemas. Type check, lint, and build all clean. Failure paths in the Agent Layer (Module 4) are explicit: no API key, mode off, score out of band, timeout, Zod failure, and budget exceeded all return null and let the heuristic verdict stand.

Realtime collision detection was the demo moment that surprised me. Two browsers on two test accounts, one clicks the review lock menu, the other sees a pulsing dot appear in under a second. No polling, no refresh, no reload. The CHI paper called this "subtle and unreliable" in current tooling. Pulsing badges across all open dashboards in real time is the opposite of subtle.

Policy compliance got designed in, not retrofitted. No PII at rest (snapshots hold t2_* opaque IDs, not usernames). No cross subreddit user tracking (clustering is in sub only; the one cross cutting signal is per domain). TTLs on everything transient (snapshots 30d, alerts 30d, clusters 1h, locks 5m). onPostDelete and onCommentDelete evict snapshot + editlog + alert keys for the affected item. The agent gets only the body, added URLs, account age, signal tags, and heuristic score. No usernames, no mod identities, no subreddit name. The March 2026 ban bot policy line is held by design, not by accident.

What's next for ModRadar

The reporter correlation clustering pass (Pass D) is the most requested feature from the CHI paper that I did not ship in time. It needs onPostReport and onCommentReport trigger ingestion, a persistent per item reporter set in Redis, and a fourth pass grouping items by overlapping reporters. That detects brigaded reporting and rounds out the "firefighter radar" idea properly.

Account age signal in the agent input. The agent currently gets authorAgeDays: null because the trigger payload exposes only id and name. A separate reddit.getUserById(authorId) call would populate createdAt. The prompt already accepts it, the field is just unfilled.

Snooze and mute. Dismissed clusters come back on the next scan if the underlying items are still in window. A "dismiss this cluster + snooze this domain for 24 hours" action would cut repeated work.

Test mocking for url-scorer.ts and cluster-radar.ts. Both have side effects on Redis, Reddit API, and fetch, so they are intentionally not unit tested today. A mocking layer would unlock another 20+ assertions and let me regression test the scoring weights when they change.

Submission polish: app icon, listing copy on developers.reddit.com, a 60 second demo video walking through the four scenarios (install, spam edit alert, collision, cluster bulk action), and the Devpost writeup itself. Bumping package.json from 0.0.0 to 1.0.0 and fixing two scaffold lint warnings in nuke.ts before devvit publish runs the production gate.

Longer term, the agent layer's narration is the part I most want to push on. Right now it summarizes individual clusters. The natural next step is cross cluster pattern memory: if the same coinbase-secure.xyz shape shows up across three subs running ModRadar over a week, the agent narration can flag it as a known campaign rather than treating each instance as new. The hard part is doing that without violating the no cross sub user tracking guarantee. Domain reputation is already cross sub and policy safe; cluster narration metadata might be allowed at the same level if scoped to domains, not authors. Worth thinking through carefully before shipping.

Updates

Devesh . started this project — May 27, 2026 05:08 PM EDT
