
Errors with threaded replies#326

Merged
tonyalaribe merged 139 commits into master from errors-chp
Feb 28, 2026

Conversation

@dawkaka
Contributor

@dawkaka dawkaka commented Feb 24, 2026

Summary

Two major features: Error Tracking with Sentry-style fingerprinting and spike detection, and Log Pattern Detection using the DRAIN algorithm with baseline volume tracking. Both integrate with a new threaded notification system for Slack, Discord, Email, and PagerDuty.

Error Tracking System

  • Fingerprinting: Normalizes stack traces per runtime (Go, Node.js, Python, Java, PHP, .NET) into canonical module|function format; normalizes messages by replacing UUIDs, IPs, emails, timestamps with placeholders; uses XXHash for deterministic fingerprints
  • State machine: New → Escalating → Ongoing → Resolved (with Regressed for reactivation)
  • Baseline learning: Median + MAD (Median Absolute Deviation) — robust against outliers, establishes after 24h of data
  • Spike detection: z-score threshold >3σ AND ≥5 absolute increase above mean (avoids noise on low-volume errors)
  • Subscriptions: Per-error subscribe with configurable notifyEveryMinutes throttle (default 30min)
  • Assignment: Errors can be assigned to team members with notification on assignment
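The fingerprinting pipeline described above (normalize the message, join the components, hash) can be sketched language-agnostically. This Python sketch is illustrative only: the real implementation is Haskell and uses XXHash, so sha256, these regexes, and the placeholder tokens are stand-ins.

```python
import hashlib
import re

# Placeholder substitutions approximating the message normalization the PR
# describes; the real patterns and placeholder tokens live in the Haskell code.
NORMALIZERS = [
    (re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"), "{uuid}"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "{ip}"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "{email}"),
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\S*"), "{timestamp}"),
]

def normalize_message(msg: str) -> str:
    for pattern, placeholder in NORMALIZERS:
        msg = pattern.sub(placeholder, msg)
    return msg

def fingerprint(project_id: str, service: str, error_type: str, message: str) -> str:
    # The real implementation hashes normalized stack frames with XXHash;
    # sha256 stands in here purely for illustration.
    combined = "|".join(p for p in (project_id, service, error_type, normalize_message(message)) if p)
    return hashlib.sha256(combined.encode()).hexdigest()[:16]
```

Two messages that differ only in a UUID then map to the same fingerprint, which is what makes grouping deterministic.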

Log Pattern Detection

  • DRAIN algorithm: 3-level tree (root → token count → first token → log groups) with similarity threshold 0.5 and LRU eviction at 50 groups per leaf
  • Baseline tracking: LearningEstablished state; hourly volume mean + MAD stored in log_pattern_hourly_stats
  • Spike detection: Rate-change alerts using baseline comparison (>3σ + absolute threshold)
  • Source fields: Patterns extracted from body, summary, url_path (with url_path noise suppression)
  • State machine: New → Acknowledged → Ignored; auto-acknowledges stale patterns after 30 days
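As a rough illustration of the clustering step, here is a minimal single-leaf sketch in Python. The names, the position-wise similarity, and the wildcard token are assumptions; the real Haskell implementation adds the 3-level routing tree and the 50-group LRU eviction described above.

```python
def token_similarity(tokens_a, tokens_b):
    # DRAIN compares logs position by position; similarity is the
    # fraction of positions whose tokens are identical.
    if len(tokens_a) != len(tokens_b):
        return 0.0
    same = sum(1 for a, b in zip(tokens_a, tokens_b) if a == b)
    return same / len(tokens_a)

def merge_template(template, tokens):
    # Positions that disagree become the wildcard "<*>".
    return [t if t == u else "<*>" for t, u in zip(template, tokens)]

class DrainLeaf:
    """One leaf group list; real DRAIN first routes through token-count
    and first-token tree levels before reaching a leaf like this."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.templates = []

    def add(self, line):
        tokens = line.split()
        for i, tpl in enumerate(self.templates):
            if token_similarity(tpl, tokens) >= self.threshold:
                self.templates[i] = merge_template(tpl, tokens)
                return i
        self.templates.append(tokens)
        return len(self.templates) - 1
```

With the 0.5 threshold from the description, two log lines differing only in a host name and a duration fall into one template with wildcards at the variable positions.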

Notification System

  • Slack threading: Sends first alert, stores thread_ts, subsequent updates reply in same thread
  • Discord threading: Stores message_id for reply-to references
  • Email templates: New templates for log pattern detection, log pattern rate changes, runtime errors, escalating errors, regressed errors, and error spikes
  • Throttling: Respects per-subscription frequency to prevent alert fatigue
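The threading flow (send the first alert, persist the thread id, reply into the thread afterwards) can be sketched as follows. `post_slack_message` is a hypothetical callable standing in for the real Slack API call, not an actual client method.

```python
def notify_with_thread(error, post_slack_message):
    """First alert starts a thread; later alerts reply into it.
    `post_slack_message` is a hypothetical stand-in that returns the
    posted message's thread timestamp."""
    if error.get("slack_thread_ts") is None:
        ts = post_slack_message(text=f"New error: {error['title']}", thread_ts=None)
        # In the real system this is persisted to the errors row
        # (the slack_thread_ts column from migration 0035).
        error["slack_thread_ts"] = ts
    else:
        post_slack_message(
            text=f"Update: {error['title']} occurred again",
            thread_ts=error["slack_thread_ts"],
        )
    return error["slack_thread_ts"]
```

The first call posts with no thread reference; every subsequent call for the same error replies with the stored timestamp, keeping one channel message per error.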

Background Jobs

  • ErrorBaselineCalculation: Calculate baseline stats from error_events hourly buckets
  • ErrorSpikeDetection: Detect spikes, create issues, send alerts
  • NewErrorDetected: Create initial issue for new error fingerprint
  • ErrorAssigned: Notify on error assignment
  • FiveMinuteLogPatternExtraction: Run DRAIN extraction from recent logs
  • LogPatternPeriodicProcessing: Create baseline issues for new patterns
  • LogPatternHourlyProcessing: Update hourly stats & detect rate spikes
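As a sketch of what ErrorBaselineCalculation and ErrorSpikeDetection compute, here is the Median + MAD baseline and the combined z-score / absolute-increase rule in Python. The 1.4826 MAD-to-sigma scaling is an assumption for illustration; the Haskell code may scale differently.

```python
import statistics

def baseline(hourly_counts):
    # Median + MAD: robust to outlier hours, as the PR description notes.
    med = statistics.median(hourly_counts)
    mad = statistics.median(abs(c - med) for c in hourly_counts)
    return med, mad

def is_spike(current, med, mad, z_threshold=3.0, min_increase=5):
    # 1.4826 scales MAD to a stddev-equivalent under normality (an
    # assumption here, not taken from the Haskell implementation).
    sigma = 1.4826 * mad if mad > 0 else 1.0
    z = (current - med) / sigma
    # Both conditions must hold, which avoids noisy alerts on
    # low-volume errors where a +3 jump can exceed 3 sigma.
    return z > z_threshold and (current - med) >= min_increase
```

The absolute-increase guard is what makes the rule quiet on low-volume errors: a count of 13 against a median of 11 clears neither test, while 30 clears both.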

Database Migrations

  • 0033_error.sql: apis.errors and apis.error_events tables with fingerprint indexes, triggers, and error_escalating/error_regressed issue types
  • 0034_error_notification_columns.sql: subscribed, notify_every_minutes, last_notified_at on errors
  • 0035_error_thread_ids.sql: slack_thread_ts and discord_message_id on errors for threaded notifications

Note: Log pattern migrations (log_patterns, log_pattern_hourly_stats, issue column extensions) were already merged via log-patterns-chp PR as 0032_log_patterns.sql.

@claude

claude bot commented Feb 28, 2026

Code Review

Great PR overall — the error tracking system is well-designed and the threaded notification changes are clean. The refactoring wins (DRY buildDrainTree, WrappedEnumSC JSON instances, BaselineState extraction) are all solid improvements. A few things worth addressing:


🔴 ErrorPattern mixed snake_case fields — entity column mismatch risk

-- ErrorPatterns.hs
, occurrences_1m :: Int -- snake_case fields match DB column names exactly
, occurrences_5m :: Int

The record is derived with FieldModifiers '[CamelToSnake]. When quietSnake processes a field that already contains underscores (e.g. occurrences_1m), the output depends on the casing library — it may pass through unchanged, but it's inconsistent with the rest of the codebase which uses camelCase and lets the modifier do the work. Safer to rename to occurrences1m / occurrences5m etc. and let CamelToSnake produce occurrences_1m / occurrences_5m — that's the established pattern everywhere else.


🟡 goStdlibPrefixes — use Set for O(log n) lookups

-- ErrorFingerprint.hs
goStdlibPrefixes :: [Text]
goStdlibPrefixes = ["runtime.", "syscall.", "net.", ...]  -- 55 entries

nonePrefix :: [Text] -> Text -> Bool
nonePrefix needles txt = all (not . (`T.isPrefixOf` txt)) needles

Every Go stack frame iterates all 55 prefixes. Data.Set (already in deps) drops this to O(log n):

import Data.Set qualified as S

goStdlibPrefixes :: S.Set Text
goStdlibPrefixes = S.fromList ["runtime.", ...]

isGoStdlib :: Text -> Bool
isGoStdlib line = case S.lookupLE line goStdlibPrefixes of
  Just prefix -> prefix `T.isPrefixOf` line
  Nothing     -> False

(Note: `any` over a Set would still be a linear scan; the O(log n) win comes from `S.lookupLE`, which finds the only candidate prefix in one ordered lookup.)

🟡 parseStackFrame — runtime as Text is fragile

parseStackFrame runtime line
  | runtime == "go"                    = parseGoFrame trimmed
  | runtime `elem` ["nodejs", "webjs"] = parseJsFrame trimmed

A typo silently falls through to parseGenericFrame. The existing RequestDumps.SDKTypes enum (already imported in ErrorPatterns.hs) or a local ADT would give compile-time safety.


🟡 computeErrorFingerprint — service-less errors collide across services

fingerprintComponents =
  if | not (T.null normalizedStack) ->
       [projectIdText, fromMaybe "" mService, normalizedType, normalizedStack]
  ...
combined = T.intercalate "|" $ filter (not . T.null) fingerprintComponents

When mService = Nothing, the empty string is filtered out. Two errors from different services where both lack a service name will share a fingerprint. May be intentional but deserves a comment, since it affects how multi-service errors are grouped.
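A tiny demonstration of the collapse being described, with `combine` as a hypothetical Python stand-in for the Haskell expression:

```python
def combine(components):
    # Mirrors `T.intercalate "|" $ filter (not . T.null) ...`:
    # empty components vanish entirely from the joined key.
    return "|".join(c for c in components if c)

# Two errors from *different* deployed services, both reporting no
# service name, produce identical keys because the distinguishing
# component is simply absent:
key_a = combine(["proj-1", "", "TypeError", "frame1|frame2"])
key_b = combine(["proj-1", "", "TypeError", "frame1|frame2"])

# Joining without the filter would at least keep the slot visible:
positional = "|".join(["proj-1", "", "TypeError", "frame1|frame2"])
```

Whether the shared fingerprint is desirable depends on how multi-service grouping should behave, which is why the comment is worth adding either way.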


🟡 bulkCalculateAndUpdateBaselines — 6-argument parameter order is hard to audit

(pid, now, now, pid, now, pid)

Three pids and three nows in the params tuple make it easy to get the order wrong when the SQL CTE changes. A comment mapping each ? position to its value would help maintainability.


🟢 Good things in this PR

  • encodeEnumSC/decodeEnumSC extracted — eliminates repetition across all WrappedEnumSC instances
  • WrappedEnumSC gains ToJSON/FromJSON — removes the separate CustomJSON '[ConstructorTagModifier ...] deriving for NotificationChannel, QueryLibType, MonitorStatus, SeverityLevel, SpanStatus, SpanKind, LogPatternState etc. Nice sweep.
  • buildDrainTree — clean extraction removing two near-identical V.foldl' patterns
  • batchUpsertErrorPatterns — correct single-round-trip unnest approach; the HM.insertWith grouping that avoids the ON CONFLICT DO UPDATE "cannot affect row a second time" error is the right call
  • Unsafe.fromJust err.hash → hash :: Text — removes a footgun
  • Replacing _ <- ...; pass with void $ ... throughout is a nice cleanup

🔵 Minor

  • Models.Apis.ErrorPatterns still imports RequestDumps only for SDKTypes. Now that ATError lives in ErrorPatterns, consider moving SDKTypes to a more foundational module (or noting the coupling).
  • runtimeErrorVariantEmail takes three always-paired string args (heading, subjectPrefix, intro) — a small record could enforce the pairing, though it's readable as-is.
  • 0034_error_notification_columns.sql and 0035_error_thread_ids.sql are mentioned in the PR description but their columns appear to be included inline in 0033_error.sql. The .cabal only registers 0033. If they're real separate files coming later, they'll need registering there too.

@claude

claude bot commented Feb 28, 2026

Revision history for monoscope

0.1.0.0 -- YYYY-mm-dd

  • First version. Released on an unsuspecting world.

@claude

claude bot commented Feb 28, 2026

Code Review — PR #326 (Error Tracking + Threaded Notifications)

Overall this is a solid, well-structured PR. The test suite is comprehensive (20 scenarios covering the full state machine, concurrency, cross-project isolation, and thread-ID persistence), the SQL uses efficient batch patterns throughout, and the WrappedEnumSC consolidation reduces boilerplate across multiple types. A few things worth addressing:


🐛 Potential Bug — Duplicate Notifications

BackgroundJobs.hs: notifyErrorSubscriptions

The JOIN:

JOIN apis.issues i ON i.project_id = e.project_id AND i.target_hash = e.hash AND i.issue_type = ?

can return multiple rows per error pattern when more than one RuntimeException issue exists for the same target_hash (e.g. original issue + a later regressed issue). This means forConcurrently fires duplicate notifications and updateErrorPatternThreadIdsAndNotifiedAt races with itself on the same row. Fix with DISTINCT ON (e.id) / ORDER BY e.id, i.created_at DESC:

SELECT DISTINCT ON (e.id)
  e.id, e.error_data, e.state, i.id, i.title,
  e.slack_thread_ts, e.discord_message_id
FROM apis.error_patterns e
JOIN apis.issues i ON i.project_id = e.project_id AND i.target_hash = e.hash
WHERE ...
ORDER BY e.id, i.created_at DESC

⚡ Performance — Sequential Spike Detection

BackgroundJobs.hs: detectErrorSpikes

forM_ is sequential. Each iteration does DB writes and potentially sends notifications. For projects with many error patterns this is a bottleneck — same pattern as notifyErrorSubscriptions which correctly uses forConcurrently. Switch to:

forConcurrently_ errRates \errRate -> ...

🔍 Design — 7 Positional Text Args

Pkg/ErrorFingerprint.hs line 565

computeErrorFingerprint :: Text -> Maybe Text -> Maybe Text -> Text -> Text -> Text -> Text -> Text
computeErrorFingerprint projectIdText mService spanName runtime exceptionType message stackTrace = 

Seven positional Text / Maybe Text arguments is a silent footgun — callers can swap runtime/exceptionType/message without the compiler noticing. A small input record (or even a type alias tuple) would make call sites self-documenting and safer.


🧹 Style — Named Thread-ID Pair

The (Maybe Text, Maybe Text) tuple representing (slackThreadTs, discordMessageId) is threaded through sendAlertToChannels, createAndNotifyErrorIssue, processNewError, and notifyErrorSubscriptions. A type alias:

type ThreadIds = (Maybe Text, Maybe Text)

or a small record would make all those signatures easier to read without adding code.


🧹 Succinctness — cleanPythonFunction

Pkg/ErrorFingerprint.hs line 317

cleanPythonFunction = flip (foldr (uncurry T.replace)) ([("<lambda>", "lambda"), ("<listcomp>", "listcomp"), ("<dictcomp>", "dictcomp")] :: [(Text, Text)])

The double-flip/foldr encoding is harder to read than the equivalent eta-expanded form which is the same length:

cleanPythonFunction func = foldr (\(f, t) -> T.replace f t) func [("<lambda>", "lambda"), ("<listcomp>", "listcomp"), ("<dictcomp>", "dictcomp")]

✅ Good Stuff

  • Batch SQL throughout: unnest arrays in batchUpsertErrorPatterns and upsertErrorPatternHourlyStats avoid N+1 round-trips entirely.
  • bulkCalculateAndUpdateBaselines: single CTE with PERCENTILE_CONT + MAD is exactly right — no per-row Haskell computation needed.
  • {-# NOINLINE goStdlibPrefixes #-}: correct to prevent repeated allocation of the list CAF.
  • truncateHour in Haskell: avoids date_trunc per row in SQL (already applied in upsertErrorPatternHourlyStats).
  • ~(subj, html) lazy irrefutable in notifyErrorSubscriptions: avoids building email content when only Slack/Discord is configured.
  • WrappedEnumSC JSON instances: adding ToJSON/FromJSON to the newtype derivation eliminates the separate DAE.CustomJSON derivation across ErrorState, BaselineState, LogPatternState, MonitorStatus, etc. — clean reduction.
  • Test coverage: the 20-scenario ErrorPatternsSpec is thorough. Especially good: concurrency test (5e), cross-project isolation (test 9), decay chain (7a), and thread-ID reuse (test 6).






@claude

claude bot commented Feb 28, 2026

PR Review part 1 of 2.

Bugs:
  • createOtelTraceWithExceptionAtTime has broken UUID encoding: UUID.toText returns 36 chars with dashes, so T.replicate (32-36) returns an empty string and B16.decode fails silently — traceIdBS/spanIdBS are always empty bytes. Fix: strip dashes with T.filter before padding.
  • insertErrorQueryAndParams in Anomalies.hs is dead code: processProjectErrors now delegates to batchUpsertErrorPatterns and nothing calls insertErrorQueryAndParams. Remove it.

Performance:
  • goStdlibPrefixes is a 50-entry [Text] list scanned per Go stack frame. Use HashSet Text and check only the package segment via fst (splitDotted funcPart) for O(1) lookup.

@claude

claude bot commented Feb 28, 2026

PR Review part 2 of 2.

Exports:
  • sendSlackAlertWith/sendDiscordAlertWith are exported but documented as Internal — either remove them from the export list or drop the doc comment.

Style:
  • errorSubscriptionAction has 3 consecutive single-binding lets (isActive, notifyEvery, actionUrl) — combine them into one let block.
  • createAndNotifyErrorIssue uses Effectful.Reader.Static.ask when ask is imported unqualified — use ask @Config.AuthContext.
  • anomalyDetailCore uses the ampersand-LambdaCase pattern only once — case issue.issueType of matches the rest of the file.
  • Multiple new sites use whenJust ... Relude.when — for_ projectM is cleaner and avoids the qualified import.

Good:
  • void $ replaces _ <- ...; pass throughout
  • WrappedEnumSC gains ToJSON/FromJSON directly, eliminating ~6 duplicate via CustomJSON derivations
  • colIdxMap via zip cols [0..] removes sync-drift bugs
  • the Permissions ordering fix with Ord makes >= PEdit semantically correct
  • sortBy before groupBy in processIssuesEnhancement is the right defensive fix
  • buildDrainTree removes the duplicated fold pattern
  • batchUpsertErrorPatterns hash-grouping correctly avoids the ON CONFLICT duplicate-row error


Add DISTINCT ON to notifyErrorSubscriptions to prevent duplicate
notifications when multiple issues exist for the same error pattern.
Switch detectErrorSpikes from sequential forM_ to forConcurrently.
Remove redundant CustomJSON derivations on IssueType/RateChangeDirection.
@claude

claude bot commented Feb 28, 2026

PR Review: Errors with threaded replies

Good overall structure. The fingerprinting algorithm is well-documented with doctests, the single-CTE baseline calculation is efficient, and the WrappedEnumSC JSON/DAE.CustomJSON consolidation is a clean DRY win. A few things worth addressing:


Potential bugs

Race condition in detectErrorSpikes (BackgroundJobs.hs)
The check-then-act for escalation is not atomic. Two concurrent ErrorSpikeDetection jobs for the same project can both see state != ESEscalating and create duplicate issues. Fix with a conditional DB update, proceeding only if rowsAffected > 0:

UPDATE apis.error_patterns SET state = 'escalating' WHERE id = ? AND state != 'escalating'

Permissions constructor order changed silently
Adding Ord derives ordering from constructor position (PView < PEdit < PAdmin). The string-based WrappedEnumSC serialisation is unaffected, but if any code compares permissions for authorisation via Ord, this is a silent semantic change. Worth a note.


Succinctness / package usage

Unbounded forConcurrently in detectErrorSpikes and notifyErrorSubscriptions
A project with hundreds of active errors will launch hundreds of concurrent alert-sending coroutines. Consider pooledForConcurrentlyN (from unliftio) or chunking to bound concurrency.

cleanPythonFunction opaque pointfree (Pkg/ErrorFingerprint.hs)
The flip + uncurry hides intent without saving lines. Prefer an explicit lambda: foldr (\(old, new) -> T.replace old new) f [...].

Dead branch in extractGenericLineNumber
The second case branch (_ -> Nothing) is unreachable since T.splitOn always returns at least 1 element. Drop it.

occurrences_1m / occurrences_5m snake_case in a camelCase record
The ErrorPattern record uses snake_case for number-suffixed fields while all other fields are camelCase. Functionally correct (CamelToSnake is a no-op on already-snake fields) but inconsistent. A brief comment explaining why (CamelToSnake cannot produce _1m from camelCase) would clarify intent.


Minor

  • RequestDumps.ATError -- if the old type still exists alongside the new ErrorPatterns.ATError, worth removing to avoid confusion. Telemetry.hs was migrated but worth confirming nothing else imports the old definition.
  • batchUpsertErrorPatterns insertWith semantics -- HM.insertWith f new old keeps the new error and sums counts. Correct behaviour (dedup-by-hash, latest data wins), but a one-line comment would help.
  • PR description mentions separate 0034/0035 migrations that do not exist as separate files -- they are correctly consolidated into 0033. The description is just inaccurate.

Good changes worth calling out

  • Removing the _ <- ...; pass antipattern with void throughout
  • WrappedEnumSC now derives AE.ToJSON/AE.FromJSON, eliminating all the DAE.CustomJSON/DAE.ConstructorTagModifier/DAE.StripPrefix/DAE.CamelToSnake boilerplate on SeverityLevel, SpanStatus, SpanKind
  • buildDrainTree is a clean higher-order generalisation that reduces duplication across processBatch and fetchLogPatterns
  • {-# NOINLINE goStdlibPrefixes #-} + CAF is correct for the O(log n) Set.lookupLE prefix check

looksLikeJson used isInfixOf for "{" which matched replacement tokens
like {integer}, {YYYY-MM-DDThh:mm:ss.sTZD}, causing tokenizeJsonLike
to split them into separate tokens and break drain clustering.
@claude

claude bot commented Feb 28, 2026

Code Review

Great feature addition — the error tracking state machine, log pattern detection, and notification threading all fit well into the existing architecture. Several cleanups here are excellent. Here are my observations:


Highlights

  • WrappedEnumSC unification across 7+ types (MonitorStatus, SeverityLevel, LogPatternState, NotificationChannel, QueryLibType, etc.) — replacing 2-line derive chains with a single derive is exactly the kind of GHC extension leverage we want.
  • void $ cleanup replacing _ <- ...; pass throughout is a welcome hygiene pass.
  • Drain.buildDrainTree abstraction eliminates the local processNewLog helper in two call sites cleanly.
  • pg-entity _select/_selectWhere in place of manual column lists in Anomalies.hs and Issues.hs is the right direction.
  • unnest batch upserts in batchUpsertErrorPatterns and upsertErrorPatternHourlyStats — single round-trip instead of N+1, good.
  • V.groupBy sort-before-group fix in processIssuesEnhancement (and same fix for errors in processOneMinuteErrors) was a real silent correctness bug. Good catch.
  • Removing QueryMonitorsTriggered and inlining monitor notification logic into evaluateQueryMonitor — simpler, removes dead code.

Issues / suggestions

1. Non-idiomatic snake_case field names in ErrorPattern

  , occurrences_1m :: Int -- snake_case fields match DB column names exactly
  , occurrences_5m :: Int
  , occurrences_1h :: Int
  , occurrences_24h :: Int

Haskell record fields should be camelCase. With FieldModifiers '[CamelToSnake], a field occurrences1m would map to column occurrences1m (no uppercase to convert), not occurrences_1m. For fields where the column name can't be derived by CamelToSnake, use pg-entity's field quasi-quoter with an explicit column name override rather than putting underscores in Haskell identifiers. Consider occurrencesLast1m / occurrencesLast5m etc.

2. Z-score recomputed in createErrorSpikeIssue

detectErrorSpikes already computes the z-score via detectSpikeOrDrop, but createErrorSpikeIssue recomputes it from scratch:

let zScore = if baselineStddev > 0 then (currentRate - baselineMean) / baselineStddev else 0

Pass the already-computed spike.zScore from the call site instead.

3. Redundant drainContent helper in processBatch

  tokenize = tok . drainContent
  drainContent = \case SeedPattern c -> c; NewEvent _ c -> c

Used exactly once — inline it: tokenize = tok . \case SeedPattern c -> c; NewEvent _ c -> c.

4. Destructive migration without data migration (0033_error.sql)

DROP TABLE IF EXISTS apis.errors CASCADE;

This drops apis.errors with no migration of existing rows to apis.error_patterns. The PR description doesn't mention this — please confirm apis.errors has no production data worth preserving.

5. bool 0 1 for Int conversion

addError acc e = HM.insertWith addCounts e.hash (1 :: Int, bool 0 1 (isJust e.userId)) acc

fromEnum (isJust e.userId) is more idiomatic for converting Bool to Int.

6. Lazy irrefutable pattern ~(subj, html) in sendAlertToChannels

~(subj, html) = case alertType of ...

Smart optimization to avoid computing email content when NEmail is not in the channel list. A short comment would help maintainability since irrefutable patterns are uncommon in application code.

7. selectIssueByHash may return the wrong issue in ErrorAssigned job

issueM <- Issues.selectIssueByHash pid err.hash

If multiple issues exist for the same hash (e.g., after a regression creates a second issue), this returns an arbitrary match. Ensure the underlying query orders by created_at DESC so the assignee email links to the most recent issue.

8. addCounts argument order

addCounts (c1, u1) (c2, u2) = (c1 + c2, u1 + u2)

With HM.insertWith f key newVal map the signature is f newValue oldValue. Since summation is symmetric this is harmless, but a comment would clarify intent.


Summary

The design is solid and the code is well-structured. Items worth addressing before merge:

  • [Confirm] Intent of dropping apis.errors without data migration
  • [Style] Snake_case field names in ErrorPattern record
  • [Correctness] selectIssueByHash ordering for the assignee email link
  • [Minor] Z-score duplication, drainContent inline, fromEnum over bool 0 1

The test was passing Nothing for service and omitting the actual
service name used during OTEL ingestion (test-service), causing
the computed fingerprint hash to not match the DB.
- createErrorSpikeIssue now receives the already-computed z-score from
  detectSpikeOrDrop instead of recomputing it from mean/stddev
- Inline single-use drainContent helper into tokenize composition
@tonyalaribe tonyalaribe merged commit 7ce9230 into master Feb 28, 2026
7 checks passed
@tonyalaribe tonyalaribe deleted the errors-chp branch February 28, 2026 20:13
@claude

claude bot commented Feb 28, 2026

Code Review — Error Tracking + Log Pattern + Threaded Notifications

Overall this is a well-structured, well-tested PR. The fingerprinting, baseline/spike logic, and threading support are all coherent. Notes below focus on succinctness, correctness, and performance.


Bugs / Correctness

batchUpsertErrorPatterns — non-obvious which occurrence wins per hash

HM.insertWith f key new old calls f new old, so in the lambda (\(a, n) (_, k) -> (a, n + k)) the kept a is the incoming ATError: the grouping keeps the most-recent occurrence per hash and sums the counts. That is likely the right behaviour (recent_trace_id then carries the latest trace ID), but the argument order is easy to misread in review — a short comment would prevent someone "fixing" it into keeping the first occurrence:

-- grouped in Models/Apis/ErrorPatterns.hs
-- insertWith applies (f new old); newE is the incoming error, so latest wins
HM.insertWith (\(newE, n) (_, k) -> (newE, n + k)) e.hash (e, 1) m

isGoStdlib — undocumented invariant on the Set

The lookupLE trick is correct today because every entry in goStdlibPrefixes ends with . or /. If someone adds a bare entry like "runtime" (without the dot), lookupLE "runtimex.foo" would return "runtime", and isPrefixOf would produce a false positive. Add a comment asserting the invariant:

-- INVARIANT: every entry must end with '.' or '/' so the lookupLE prefix
-- check in isGoStdlib cannot produce false positives against sibling packages.
goStdlibPrefixes :: S.Set Text
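The invariant, and the false positive it prevents, can be demonstrated with a sorted list and bisect, a Python stand-in for Set.lookupLE:

```python
import bisect

def lookup_le(sorted_prefixes, key):
    # Greatest element <= key, like Data.Set.lookupLE.
    i = bisect.bisect_right(sorted_prefixes, key)
    return sorted_prefixes[i - 1] if i > 0 else None

def is_stdlib(prefixes, symbol):
    p = lookup_le(prefixes, symbol)
    return p is not None and symbol.startswith(p)

good = sorted(["net.", "runtime.", "syscall."])
bad = sorted(["net.", "runtime", "syscall."])  # invariant violated: bare "runtime"
```

With the '.' terminator, "runtimex.foo" sorts after "runtime." but does not start with it, so the check correctly rejects it; drop the dot and the sibling package is misclassified as stdlib.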

Succinctness / Style

cleanPythonFunctionflip (foldr (uncurry T.replace)) is opaque

The point-free form with flip foldr is harder to read than needed. A direct fold is shorter and clearer:

cleanPythonFunction f = foldr (\(a, b) -> T.replace a b) f
  [("<lambda>", "lambda"), ("<listcomp>", "listcomp"), ("<dictcomp>", "dictcomp")]

Unnecessary lazy pattern in notifyErrorSubscriptions

~(subj, html) = case alertType of ...

In a let binding the pattern match is already lazy and irrefutable, so the ~ adds nothing here: plain let (subj, html) = ... behaves identically and is less surprising to readers.


extractGenericLineNumber — unreachable branch

case parts of
  _ : rest -> listToMaybe $ mapMaybe readText rest
  _         -> Nothing   -- unreachable: T.splitOn always returns >= 1 element

T.splitOn on a non-empty string always returns a non-empty list, so the wildcard is dead. Remove it.


deduplicateByHash — document which occurrence is kept

HM.insert overwrites, so the last vector element per hash wins. The comment says "deduplication" but doesn't state this. Worth noting for callers who care about recency vs. first-seen.


Performance

getErrorPatternsWithCurrentRates — no pagination + unbounded concurrency

getErrorPatternsWithCurrentRates fetches all non-resolved, non-ignored errors for a project in one shot, then detectErrorSpikes fans them all out via forConcurrently. For a project with thousands of errors this results in an unbounded concurrent DB write fan-out. Consider capping concurrency (pooledForConcurrently or chunked batching), or pushing spike detection into a single SQL UPDATE so it's one round-trip.
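A bounded fan-out can be sketched with Python's ThreadPoolExecutor as a stand-in for pooledForConcurrentlyN; handle_one and the worker cap are hypothetical names for this illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def process_spikes_bounded(error_rates, handle_one, max_workers=8):
    """Cap concurrent DB-touching work at `max_workers`, the shape of
    the pooled-concurrency suggestion above. `handle_one` stands in for
    the per-error work (update state, insert issue, notify)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order while never running more than
        # max_workers tasks at once.
        return list(pool.map(handle_one, error_rates))
```

The same shape works for notifyErrorSubscriptions: the pool bounds connection usage regardless of how many error patterns a project has.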


bulkCalculateAndUpdateBaselines — positional parameters are hard to audit

The tuple (pid, now, now, pid, now, pid) has six positional bindings spread across two CTEs and the final UPDATE. A comment counting them, or restructuring with a values CTE (SELECT ? AS pid, ?::timestamptz AS cutoff) to bind them once, would reduce the risk of mis-ordering on future edits.


Minor Notes

  • ErrorPattern mixes occurrences_1m/occurrences_5m/occurrences_1h/occurrences_24h (snake_case) with all other camelCase fields. The inline comment explains it, but worth considering aliasing to camelCase + explicit column mapping long-term.

  • processNewError silently skips ESOngoing/ESEscalating states (alertType = Nothing). That's intentional per the state machine, but a Log.logDebug would help during incident triage.

  • The ErrorAssigned job constructor appears in the BgJobs ADT — confirm its handler is wired up in processBackgroundJob (not visible in the diff summary).

  • SQL in updateErrorPatternSubscription passes subscribed as both a value and a condition (CASE WHEN ? AND NOT subscribed THEN NULL ...). This relies on postgresql-simple binding Bool as boolean; worth a targeted integration test if not already covered.


Overall the test coverage in ErrorPatternsSpec.hs is thorough and the fingerprinting algorithm is well-designed. Main asks: document the insertWith grouping semantics in batchUpsertErrorPatterns, document the isGoStdlib invariant, and address the unbounded concurrency in spike detection.

@claude

claude bot commented Feb 28, 2026

Code Review

This is a substantial, well-structured PR. The test coverage (20 cases in ErrorPatternsSpec.hs) is thorough, the SQL is efficient (CTE baselines, unnest batch upserts), and the state machine design is sound. A few things worth tightening:


1. {-# NOINLINE #-} on a top-level CAF is redundant

goStdlibPrefixes is a top-level constant — GHC already memoises it as a CAF without any pragma. The NOINLINE pragma is a no-op here (it matters on functions, not top-level values). Safe to remove.


2. cleanPythonFunction — opaque point-free

Current:

cleanPythonFunction = flip (foldr (uncurry T.replace)) ([("<lambda>", "lambda"), ("<listcomp>", "listcomp"), ("<dictcomp>", "dictcomp")] :: [(Text, Text)])

flip (foldr (uncurry T.replace)) is hard to parse at a glance. A direct fold is clearer and the same length:

cleanPythonFunction func = foldr (\(o, n) -> T.replace o n) func [("<lambda>", "lambda"), ("<listcomp>", "listcomp"), ("<dictcomp>", "dictcomp")]

3. truncateHour — exposes UTCTime internals unnecessarily

Utils.hs already imports utcTimeToPOSIXSeconds. The current decomposition reaches into DiffTime internals:

-- current
truncateHour (UTCTime day dt) = UTCTime day (secondsToDiffTime $ floor dt `div` 3600 * 3600)

Using the existing POSIX import (just add posixSecondsToUTCTime) keeps UTCTime opaque and avoids the pattern match:

truncateHour = posixSecondsToUTCTime . fromIntegral . (* 3600) . (`div` 3600) . (floor :: NominalDiffTime -> Int) . utcTimeToPOSIXSeconds

4. detectSpikeOrDropInt64 parameter is immediately fromIntegral-ed

detectSpikeOrDrop :: Double -> Double -> Double -> Double -> Int64 -> Maybe Issues.SpikeResult
-- body: rate = fromIntegral currentCount

Both call sites pass fromIntegral x where x :: Int. Changing the last parameter to Int removes the fromIntegral at every call site and makes the signature more natural.


5. forConcurrently in detectErrorSpikes — unbounded DB concurrency

void $ forConcurrently errorsWithRates \errRate -> do
  ErrorPatterns.updateErrorPatternState ...
  Issues.insertIssue ...

For a project with many error patterns this fires one DB-touching thread per pattern simultaneously. The logsPatternExtraction pipeline already uses Ki-based bounded concurrency. Consider a similar cap here to avoid connection pool exhaustion on busy projects.


6. notifyErrorSubscriptions embeds a complex SQL query in BackgroundJobs.hs

The DISTINCT ON join across error_patterns and issues belongs in Models.Apis.ErrorPatterns alongside the other query functions, not in the job runner. This would also make it testable in isolation.


Minor nits

  • Relude.when (V.length spansWithErrors > 0) should be unless (V.null spansWithErrors) — more idiomatic and signals intent directly
  • Relude.when is used qualified throughout BackgroundJobs.hs even though Relude is imported unqualified — bare when suffices unless there is an actual ambiguity worth documenting
  • parseJsFrame strips line twice (T.strip $ T.drop 3 $ T.strip line); the outer strip is only needed if "at" may be followed by multiple spaces — worth a comment if intentional

Positives

  • Batch SQL patterns (unnest arrays, single-CTE baseline calc) are the right approach throughout
  • isGoStdlib O(log n) prefix check via S.lookupLE is clever and correct
  • batchUpsertErrorPatterns grouping by hash before ON CONFLICT correctly avoids the "cannot affect row a second time" error
  • Test coverage is comprehensive: state machine cycle, idempotency, flat-baseline edge case (MAD floor), cross-project isolation, and thread-ID threading are all covered

@claude

claude bot commented Feb 28, 2026

Code Review - PR 326: Errors with threaded replies

Big feature addition overall. Architecture is sound and the batch-upsert and threading designs are well thought out. Some targeted notes below.


Correctness / Potential Bugs

batchUpsertErrorPatterns - grouping semantics undocumented

insertWith f key new old means (a, n) is the new (incoming) value, (_, k) is the existing one in the lambda \(a, n) (_, k) -> (a, n + k). So this keeps the last-seen error data and sums counts. That is probably the right call (most-recent error wins for message/stack), but the intent is non-obvious from the lambda alone. A brief comment would prevent future surprises.

ErrorState missing Ord

The state machine progresses New -> Escalating -> Ongoing -> Resolved. Without a derived Ord, you cannot write guards like state >= ESEscalating and instead need exhaustive case matches scattered through BackgroundJobs. Since the declaration order already encodes the progression, adding deriving stock (… Ord, Enum, Bounded) costs nothing and makes conditions like state `elem` [ESNew, ESEscalating, ESOngoing, ESRegressed] replaceable with state < ESResolved.


Conciseness / GHC-extension opportunities

normalizeFunction - three-let chain vs. composition

Current (in normalizeStackTrace where clause):

normalizeFunction func =
  let noAddr     = unwords $ filter (not . ("0x" `T.isPrefixOf`)) $ words func
      noGenerics = T.takeWhile (/= '<') noAddr
      noParams   = T.takeWhile (/= '(') noGenerics
   in T.strip noParams

With point-free composition (no intermediate names needed):

normalizeFunction =
  T.strip . T.takeWhile (/= '(') . T.takeWhile (/= '<')
  . unwords . filter (not . T.isPrefixOf "0x") . words

cleanPythonFunction - flip (foldr ...) is opaque

-- current
cleanPythonFunction = flip (foldr (uncurry T.replace)) ([("<lambda>", "lambda"), ...] :: [(Text, Text)])
-- clearer
cleanPythonFunction f =
  foldl' (\t (k, v) -> T.replace k v t) f
    [("<lambda>", "lambda"), ("<listcomp>", "listcomp"), ("<dictcomp>", "dictcomp")]

extractJsModule - operator section replaces the lambda

-- current
asum $ map (\ext -> T.stripSuffix ext baseName) [".js", ".ts", ".mjs", ".cjs"]
-- simpler
asum $ map (`T.stripSuffix` baseName) [".js", ".ts", ".mjs", ".cjs"]

getErrorPatterns - duplicated ORDER BY / LIMIT / OFFSET suffix

The suffix " ORDER BY updated_at DESC LIMIT ? OFFSET ?" is copy-pasted in both branches of the case mstate of. Factor it into a where binding.


Test Coverage

Pkg.ErrorFingerprint - doctest examples are not wired up

The module has rich >>> examples covering Go stdlib detection, Python line-number exclusion, Java generics, etc., but there is no doctest suite in monoscope.cabal. Either add a doctest suite or migrate the examples to an ErrorFingerprintSpec. This is the most critical correctness surface in the PR and currently has zero automated test coverage.

Background jobs lack unit tests

processNewError, detectErrorSpikes, calculateErrorBaselines, and notifyErrorSubscriptions are new and complex but only reachable via the full integration flow. Targeted tests for the spike-detection logic (z-score threshold, BSLearning skip) and state-machine transitions would give much faster feedback on regressions.


Minor

  • updateErrorPatternThreadIds' (Bool flag): the two public wrappers are the right API; the private helper could accept Maybe UTCTime instead of Bool + UTCTime to avoid passing now twice, but this is a nit.
  • BackgroundJobs.hs is growing. No action needed now, but calculateErrorBaselines, detectErrorSpikes, and processNewError could eventually live in a dedicated Background.ErrorJobs module.

What is good

  • Batch unnest upsert in batchUpsertErrorPatterns / upsertErrorPatternHourlyStats correctly avoids N+1 and the ON CONFLICT DO UPDATE duplicate-row pitfall via the HashMap grouping step.
  • isGoStdlib via Set.lookupLE - O(log n) prefix matching instead of a linear scan; well commented.
  • WrappedEnumSC ToJSON/FromJSON consolidation in DeriveUtils eliminates a class of CustomJSON boilerplate across the codebase.
  • BaselineState centralised in DeriveUtils is the right call given it is now shared across LogPatterns and ErrorPatterns.
  • Threaded Slack/Discord design is clean: sendNotificationWithReply returns Maybe Text and thread IDs are stored separately, so retries do not duplicate threads.
