Log patterns: Log patterns related code#300
Conversation
```haskell
-- | Get a pattern by ID
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
```
Use the `_selectWhere` pattern instead of enumerating the fields one by one.
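For illustration, a sketch of what that could look like, assuming the project's `_selectWhere` helper and `DAT.field` quasiquoter work as in the other model modules (this is not standalone-runnable code):

```haskell
-- Sketch only: assumes the codebase's _selectWhere helper and PG.query wrapper.
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
getLogPatternById lpid =
  listToMaybe <$> PG.query (_selectWhere @LogPattern [[DAT.field| id |]]) (Only lpid)
```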
```haskell
Issues.LogPattern ->
  "Describe this log pattern issue and its implications.\n"
    <> "Title: "
    <> issue.title
    <> "\n"
    <> "Service: "
    <> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
  "Describe this log pattern rate change and its implications.\n"
    <> "Title: "
    <> issue.title
    <> "\n"
    <> "Service: "
    <> fromMaybe "unknown-service" issue.service
```
Please use `[text|]` or another quasiquoter, so it's easier to visually see the shape of this message without the Haskell semigroup noise.
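A sketch of that rewrite, assuming `[text|]` is NeatInterpolation's quasiquoter (which only interpolates plain `$variable` bindings, so the two fields are bound locally first):

```haskell
-- Sketch: $title and $service are local Text bindings for interpolation.
let title = issue.title
    service = fromMaybe "unknown-service" issue.service
 in [text|
      Describe this log pattern issue and its implications.
      Title: $title
      Service: $service
    |]
```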
src/Models/Apis/LogPatterns.hs
```sql
SELECT
  lp.id,
  lp.project_id,
  lp.log_pattern,
  lp.pattern_hash,
  lp.baseline_state,
  lp.baseline_volume_hourly_mean,
  lp.baseline_volume_hourly_stddev,
  COALESCE(counts.current_count, 0)::INT AS current_hour_count
FROM apis.log_patterns lp
LEFT JOIN (
  SELECT log_pattern, COUNT(*) AS current_count
  FROM otel_logs_and_spans
  WHERE project_id = ?
    AND timestamp >= date_trunc('hour', NOW())
    AND log_pattern IS NOT NULL
  GROUP BY log_pattern
) counts ON counts.log_pattern = lp.log_pattern
WHERE lp.project_id = ?
  AND lp.state != 'ignored' AND lp.baseline_state = 'established'
```
How is log_pattern supposed to join on otel_logs_and_spans when they're not in the same database?
Or is log_patterns supposed to be a timeseries table in timefusion as well? If that's the case, then you can't run queries on timeseries tables that don't depend on a timestamp range.
Oh, and you never join two time series tables. Some databases might support the join operation, but the performance is always horrible in that case.
src/Models/Apis/RequestDumps.hs
```haskell
[text|
  SELECT lp.log_pattern, count(*) as p_count
  FROM apis.log_patterns lp
  INNER JOIN otel_logs_and_spans ols
    ON lp.log_pattern = ols.log_pattern AND lp.project_id::text = ols.project_id
  WHERE lp.project_id = ?
    AND lp.state != 'ignored'
    AND ${whereCondition}
  GROUP BY lp.log_pattern
  ORDER BY p_count DESC
  OFFSET ? LIMIT 15
|]
```
Same question here: how is log_pattern joining on otel_logs_and_spans?
otel_logs_and_spans has a log_pattern column
Is log_pattern going to be in timefusion or postgres? If it's in timefusion, you can't do joins. And you can't query it without a time range being part of the query.
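One way to avoid the cross-database join would be the following hypothetical sketch (not from the PR): read the pattern list from Postgres first, then query the timeseries store with an explicit timestamp range plus an `ANY` filter. Here `pid`, `rangeStart`, and `rangeEnd` are assumed bindings, and `PGArray` is postgresql-simple's array wrapper:

```haskell
-- Hypothetical sketch: split the join into two queries so the timeseries
-- store is only ever queried with a timestamp range.
patternRows <- PG.query
  [sql| SELECT log_pattern FROM apis.log_patterns
        WHERE project_id = ? AND state != 'ignored' |]
  (Only pid)
let patterns = map fromOnly patternRows  -- [Text]
counts <- PG.query
  [sql| SELECT log_pattern, COUNT(*)
        FROM otel_logs_and_spans
        WHERE project_id = ? AND timestamp >= ? AND timestamp < ?
          AND log_pattern = ANY(?) |]
  (pid, rangeStart, rangeEnd, PGArray patterns)
```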
src/Models/Apis/LogPatterns.hs
```haskell
-- | Get pattern stats from otel_logs_and_spans
-- Returns median and MAD (Median Absolute Deviation) for robust baseline calculation
getPatternStats :: DB es => Projects.ProjectId -> Text -> Int -> Eff es (Maybe PatternStats)
```
Why are you not using our widgets, or at least KQL, for stats and numbers? Isn't this for display?
This is for detecting spikes.
```haskell
-- | Calculate baselines for log patterns
-- Uses hourly counts from otel_logs_and_spans over the last 7 days
calculateLogPatternBaselines :: Projects.ProjectId -> ATBackgroundCtx ()
```
I don't want to support magic alerts. If we're detecting spikes, we should implement an alert system that our users can enable on any metric as well, so it's the same code and logic for all cases, not magic logic we run in the background.
src/Models/Apis/Issues.hs
```haskell
[text|
  SELECT id, created_at, updated_at, project_id, issue_type::text, endpoint_hash, acknowledged_at, acknowledged_by, archived_at, title, service, critical,
    CASE WHEN critical THEN 'critical' ELSE 'info' END, affected_requests, affected_clients, NULL::double precision,
    CASE WHEN critical THEN 'critical' ELSE 'info' END, 0::int, 0::int, NULL::double precision,
```
Why did you set these to 0?
Those are not part of the new issues table.
Why are we adding it to the query?
```haskell
Issues.LogPattern ->
  "Generate a concise title for this log pattern issue.\n"
    <> "Title: "
    <> issue.title
    <> "\n"
    <> "Service: "
    <> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
  "Generate a concise title for this log pattern rate change.\n"
    <> "Title: "
    <> issue.title
    <> "\n"
    <> "Service: "
    <> fromMaybe "unknown-service" issue.service
```
Use the text quasiquotes here.
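A sketch of the same rewrite for the title prompt, again assuming a NeatInterpolation-style `[text|]` that interpolates local `$variable` bindings:

```haskell
-- Sketch: $title and $service are local Text bindings for interpolation.
let title = issue.title
    service = fromMaybe "unknown-service" issue.service
 in [text|
      Generate a concise title for this log pattern issue.
      Title: $title
      Service: $service
    |]
```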
@@ -0,0 +1,68 @@

```sql
BEGIN;

CREATE TABLE IF NOT EXISTS apis.log_patterns (
```
We will support patterns on different fields not just the default body/message field. This table doesn't seem aware of this expectation.
```sql
CREATE INDEX IF NOT EXISTS idx_log_patterns_last_seen ON apis.log_patterns(project_id, last_seen_at DESC);
CREATE INDEX IF NOT EXISTS idx_log_patterns_service ON apis.log_patterns(project_id, service_name);

CREATE OR REPLACE FUNCTION apis.new_log_pattern_proc() RETURNS trigger AS $$
```
Again, the question remains: is this a timeseries table or a regular table? If it's a timeseries table (to be on timefusion), then it won't be in the same database as the background jobs, and hence would be unable to queue jobs from within the DB.
It's a regular table.
Rename patterns function from 15mins to 5mins.
Force-pushed 2f2da2d to 5a609a9.
Restore the original 0030_replay_event_file_count and 0031_slack_bot_token migrations from master and move log patterns changes into a new 0032 migration.
PR Review: Log Patterns

Good overall design — …

Potential Bug: …
PR Review: Log Patterns Feature

Good overall structure. The Drain algorithm integration, hourly stats table, and MAD-based spike detection are well thought-out. Several concrete improvements below, prioritised by impact.

Bugs / Correctness

On a non-matching …:

```haskell
-- current (buggy):
else before <> "@" <> replaceEmails (T.drop 1 after)
-- fix: don't re-scan `before`:
else T.reverse prefixRev <> T.singleton '@' <> replaceEmails (T.drop 1 after)
```

PostgreSQL does correlate both unnests pair-wise here, but the behaviour is surprising. Prefer an explicit lateral join for future readers:

```sql
FROM unnest(?::text[], ?::text[]) AS t(sf, ph)
WHERE source_field = t.sf AND pattern_hash = t.ph
```

Code Duplication (contradicts "less lines" priority)

Both have identical ON CONFLICT clauses (~20 lines each). Delegate instead:

```haskell
upsertLogPattern up = upsertLogPatternBatch [up]
upsertHourlyStat pid sf h t c = upsertHourlyStatBatch [(pid, sf, h, t, c)]
```

Hardcoded Enum Strings in SQL (type-safety regression)

```haskell
[sql| WHERE state = 'acknowledged' ... |] -- should be a typed parameter
[sql| WHERE state = 'new' ... |]          -- should be a typed parameter
```

```haskell
pruneStalePatterns pid now staleDays =
  PG.execute
    [sql| DELETE FROM apis.log_patterns
          WHERE project_id = ? AND last_seen_at < ?::timestamptz - INTERVAL '1 day' * ?
            AND state = ? |]
    (pid, now, staleDays, LPSAcknowledged)
```

Same treatment for …

Opaque 6-tuple in …
Code Review: PR #300 -- Log Patterns Pipeline

Good overall structure. Here are the key findings:

Bugs

1. ON CONFLICT deduplication collapses unrelated issues when … The new unique index has no guard for empty hashes. For …:

```sql
CREATE UNIQUE INDEX ... WHERE acknowledged_at IS NULL AND archived_at IS NULL AND target_hash != '';
```

2. The old …

Correctness / Logic

3. Spike projection is 4x too aggressive for the first 15 minutes of every hour:

```haskell
-- current:
let minutesIntoHour = max 15 $ todMin (timeToTimeOfDay $ utctDayTime now)
-- suggested (elapsed seconds instead of todMin):
let totalSecs = round (realToFrac (utctDayTime now) :: Double) :: Int
    minutesIntoHour = max 15 $ (totalSecs `mod` 3600) `div` 60
```

Code Conciseness / Best Practices

4. …:

```haskell
import Control.Exception.Safe (catchSync)

tryLog :: Text -> ATBackgroundCtx () -> ATBackgroundCtx ()
tryLog label = (`catchSync` \(e :: SomeException) ->
  Log.logAttention ("LogPattern pipeline step failed: " <> label) (show e))
```

5. Six …

6.

```haskell
-- Current:
let allUps = urlUpserts ++ excUpserts
void $ LogPatterns.upsertLogPatternBatch (map fst allUps)
void $ LogPatterns.upsertHourlyStatBatch (map snd allUps)
-- Better:
let (ups, hss) = unzip (urlUpserts ++ excUpserts)
void $ LogPatterns.upsertLogPatternBatch ups
void $ LogPatterns.upsertHourlyStatBatch hss
```

7. …:

```haskell
let tokens = (if isSummary then Drain.generateSummaryDrainTokens else Drain.generateDrainTokens) content
```

8. Deriving … Good consolidation. Confirm the first constructor in each type is the intended …

Test Coverage Gaps

9. Spike detection test has non-deterministic branching. The test branches based on whether an established pattern already exists in the DB, seeding one manually only if not found. This covers different code paths depending on DB state. Make seed setup unconditional for determinism.

10. The test checks that …

11. Missing boundary test for … Only verifies a stat at -100h is pruned. No test that a stat at …

Info
PR Review: Log Pattern Processing

Good overall architecture — the extraction→baseline→spike→new-pattern pipeline is clearly structured, and the refactoring of …

Potential Bug: …
- Use elapsed seconds for spike projection instead of todMin indirection
- Remove redundant async exception re-throw in tryLog (UnliftIO.catch handles it)
- Use lens tuple accessors in updateBaselineBatch for readability
- Use unzip in extractFieldPatterns instead of map fst/snd
- Replace bool with explicit if in processNewLog for clarity
Code Review

Good overall implementation of the log pattern pipeline. A few issues worth addressing, ordered by severity.

Bugs / Correctness

1. Before, there was an explicit alias …

2. The handler receives …

3.

```haskell
Issues.LogPatternRateChange -> ("bg-fillWarning-strong", "trending-up", "RATE CHANGE")
```

The issue can represent a spike or a drop, but the icon is hardcoded to trending-up.

Code Conciseness / DRY

4. The entire SQL body is copy-pasted between the two. Since …:

```haskell
upsertLogPattern :: DB es => UpsertPattern -> Eff es Int64
upsertLogPattern = upsertLogPatternBatch . pure
```

5.

```haskell
let srcFields = V.map (view _1) rows
    hashes = V.map (view _2) rows
    ...
    samples = V.map (view _6) rows
```

Pulling in …:

```haskell
let srcFields = V.map (\(f,_,_,_,_,_) -> f) rows
```

Better still: define a …

6. Three identical …:

```haskell
withIssueDataH :: (AE.FromJSON a, Applicative m) => Issue -> (a -> m ()) -> m ()
withIssueDataH issue f = case AE.fromJSON (getAeson issue.issueData) of
  AE.Success d -> f d
  _ -> pure ()
```

Usage:

```haskell
Issues.LogPattern -> withIssueDataH @Issues.LogPatternData issue \d -> ...
Issues.LogPatternRateChange -> withIssueDataH @Issues.LogPatternRateChangeData issue \d -> ...
```

SQL Type Safety

7. Literal state strings:

```haskell
-- pruneStalePatterns
AND state = 'acknowledged'
-- autoAcknowledgeStaleNewPatterns
WHERE state = 'new'
SET state = 'acknowledged'
```

The rest of the module uses:

```haskell
AND state = ? -- bind LPSAcknowledged / LPSNew
```

8. Parallel unnest:

```sql
WHERE (source_field, pattern_hash) IN (SELECT unnest(?::text[]), unnest(?::text[]))
```

This works on Postgres >= 9.4, but the safer canonical idiom is:

```sql
WHERE (source_field, pattern_hash) IN (
  SELECT * FROM unnest(?::text[], ?::text[]) AS t(source_field, pattern_hash)
)
```

Minor

Overall the …
Code Review: Log Patterns Pipeline

Great feature overall — the architecture is solid and the new …

Bugs / Correctness

1. …

2. …

3. The …

Code Duplication

4. Copy-pasted SQL in upsert helpers. …

Packages / Extensions Already Available

5. Lens-based 6-tuple access in … Using …

6. Redundant …

Design / Clarity

7. Deprecated …

8. …

9. Retry-induced double-counting in hourly stats. The …

10. Omitting the conflict target silently suppresses any constraint violation during tests. Use …

Minor

11. Operator precedence in …

12. If this runs criterion/tasty-bench it will inflate unit test runtimes. Benchmarks should be a separate cabal stanza.
…ate suite

- upsertLogPattern now delegates to upsertLogPatternBatch via pure
- Replace repeated case AE.fromJSON/getAeson/pass blocks with withIssueDataH
- Move UtilsBenchSpec out of unit tests into dedicated bench suite
- Pass scheduledTime to detectLogPatternSpikes for accurate partial-hour projection
PR Review: Log Patterns Feature

Overall a solid, well-structured PR with good use of GHC extensions and batch DB operations. A few suggestions for improvement.

Positives

…

Suggestions

1.

```haskell
let srcFields = V.map (view _1) rows
    hashes = V.map (view _2) rows
    states = V.map (view _3) rows
    means = V.map (view _4) rows
    mads = V.map (view _5) rows
    samples = V.map (view _6) rows
```

This 6-tuple is hard to read and maintain. A small named record:

```haskell
data BaselineUpdate = BaselineUpdate
  { sourceField :: Text, patternHash :: Text, state :: BaselineState
  , mean :: Double, mad :: Double, samples :: Int }
```

…would let you use …

2.

```haskell
HM.HashMap Text (Maybe Text, Maybe Text, Maybe Text)
-- traceId? serviceName? level?
```

Anonymous 3-tuple of …

3. The function is ~100 lines mixing: SQL building, Drain tree construction, aggregation, and sparkline computation. The on-the-fly branch should be extracted as …

4. Hourly stats are not idempotent on job retry:

```haskell
-- Note: SUM semantics mean job retries double-count.
```

This is acknowledged but worth addressing. A double-counted hourly bucket inflates the MAD baseline, making the MAD effectively 0 for that hour, which lowers the noise floor and could either suppress or trigger false spike alerts. Since MAD is robust to occasional outliers, this is probably fine for now — but it's worth a follow-up issue, especially once this runs at scale.

5.

```haskell
, endpointHash :: Text -- Deprecated: always == targetHash (set by mkIssue). Drop column once ...
```

No follow-up issue is referenced. This will persist indefinitely without one. A …

6.

```haskell
T.foldl' (\acc c -> acc * 10 + (fromEnum c - 48)) 0 t <= (255 :: Int)
```

This works but …

7. Benchmark suite uses:

```haskell
elapsed `shouldSatisfy` (< 1.0) -- 2k rows under 1 second
```

This is brittle under CI load variance. Fine as a smoke test but consider …

8. The …

Minor

…
Code Review — PR #300: Log Patterns

Solid feature work overall — the pipeline design (extract → baseline → spike/drop → new pattern) is well-structured, the SQL is correct, and the doctest coverage on …

Bug / Security

```haskell
-- LogPatterns.hs
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
getLogPatternById lpid = listToMaybe <$> PG.query (_selectWhere @LogPattern [[DAT.field| id |]]) (Only lpid)

getLogPatternsByIds :: DB es => V.Vector LogPatternId -> Eff es (V.Vector LogPattern)
getLogPatternsByIds ids
  | V.null ids = pure V.empty
  | otherwise = V.fromList <$> PG.query (_select @LogPattern <> " WHERE id = ANY(?)") (Only ids)
```

Every other function in this module takes …

Correctness / Subtle Bug

```haskell
forM_ prepared \(filteredIds, patternHash, _, _) -> do
  let hashTag = "pat:" <> patternHash
  void $ withTimefusion tfEnabled $ PG.execute [sql|UPDATE otel_logs_and_spans SET hashes = array_append(hashes, ?) WHERE project_id = ? AND timestamp >= ? AND id = ANY(?::uuid[]) AND NOT (hashes @> ARRAY[?])|] (hashTag, pid, since, filteredIds, hashTag)
```

For a project with many patterns this is O(patterns) queries against …

Code Conciseness

6-tuple:

```haskell
-- updateBaselineBatch — requires Control.Lens import just for this
let srcFields = V.map (view _1) rows
    hashes = V.map (view _2) rows
    states = V.map (view _3) rows
    means = V.map (view _4) rows
    mads = V.map (view _5) rows
    samples = V.map (view _6) rows
```

With …

```haskell
replaceAllFormats !input = … go Nothing (replacePrePass input)
  where
    monthSet :: HS.HashSet Text
    monthSet = HS.fromList ["Jan","Feb",…]
```

This is a constant with no free variables. GHC may float it as a CAF, but it's not guaranteed — move it to a top-level binding to make the intent explicit and avoid any risk of repeated allocation on every call to …

```haskell
sourceFieldLabel f = fromMaybe f $ lookup f knownPatternFields
-- Already concise. But knownPatternFields is a list — for 4 entries a Map or plain list is fine;
-- just making sure it's not accidentally O(n) in a hot path (it isn't, 4 entries, no concern).
```

OK as-is.

```haskell
-- current:
let issueWorthy = V.fromList $ filter (\lp -> lp.sourceField /= "url_path") newPatterns
issueIds <- V.forM issueWorthy \lp -> …

-- simpler:
let issueWorthy = filter (\lp -> lp.sourceField /= "url_path") newPatterns
issueIds <- forM issueWorthy \lp -> …
unless (null issueIds) $ …
```

Design / Maintainability

```haskell
data LogPatternState = LPSNew | LPSAcknowledged | LPSIgnored
```

Nothing in this PR transitions a pattern to …

```haskell
, endpointHash :: Text -- Deprecated: always == targetHash. Drop column once selectIssues/selectIssueByHash are migrated.
```

The migration backfills and the comment acknowledges the debt. It's fine for now, but worth tracking as a follow-up since it keeps two fields in sync by convention rather than by code.

Minor

…

Overall the logic is sound. The three most important things to fix before merging: the missing …
Code Review — Log Patterns (#300)

Overall this is a well-structured addition. The baseline/MAD spike detection, the Drain integration, and the migration are solid. A few issues worth addressing before merge:

1. Replace the 6-tuple in …
Saving and presenting log patterns in the UI, and log pattern anomaly detection.
Closes #
How to test