[Uptime] Use scripted metric for snapshot calculation #58247
andrewvc merged 7 commits into elastic:7.6
Pinging @elastic/uptime (Team:uptime)
justinkambic left a comment:
I had a few questions and suggestions for cleaning, naming, commenting, but the base code looks good to me. I also still need to finish a functional review.
return state;
`,
reduce_script: `
// Use a treemap since it's later traversable in sorted order
"it's later traversable in sorted order"
I'm not familiar with the TreeMap class, am I understanding correctly that it is self-balancing? Meaning as keys are inserted, it handles the sort based on the comparison function you provide to merge below?
I.e. if I have a map with keys 1, 4, 5 and I insert 3, then traverse the entrySet, it will iterate like 1 3 4 5?
If that's correct, it might be good to expand this comment a little, since we are writing Java in a TypeScript file; it's reasonable that someone viewing this code might not be able to understand it easily.
Exactly, it will maintain the keys in order. Merge doesn't have anything to do with the sorting; I've added a comment below that explains that. Merge just updates the value if we have a more recent check from the same location.
The order of the TreeMap uses the built-in compareTo implementation of Java's String class.
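To make the exchange above concrete, here is a minimal Java sketch of both behaviours: a TreeMap (a red-black tree, so it is self-balancing) iterates its keys in String.compareTo order regardless of insertion order, and merge only decides which value survives for a key that already exists. The value format (a fixed-width timestamp followed by a status letter) and the helper name latest are assumptions made for illustration; they are not taken from the PR.

```java
import java.util.Map;
import java.util.TreeMap;

final class TreeMapMergeSketch {
    // Hypothetical "keep the more recent check" rule. It assumes values start with a
    // fixed-width timestamp, so plain string comparison orders them by time.
    static String latest(String oldVal, String newVal) {
        return newVal.compareTo(oldVal) > 0 ? newVal : oldVal;
    }

    static TreeMap<String, String> demo() {
        TreeMap<String, String> checks = new TreeMap<>();
        // Inserted out of order on purpose; the TreeMap keeps the keys sorted.
        checks.merge("5", "1000u", TreeMapMergeSketch::latest);
        checks.merge("1", "1000u", TreeMapMergeSketch::latest);
        checks.merge("4", "1000d", TreeMapMergeSketch::latest);
        checks.merge("3", "1000u", TreeMapMergeSketch::latest);
        // Existing key, later timestamp: the value is replaced, the key order is unaffected.
        checks.merge("4", "2000u", TreeMapMergeSketch::latest);
        // Existing key, older timestamp: the stored value is kept.
        checks.merge("1", "0500d", TreeMapMergeSketch::latest);
        return checks;
    }

    public static void main(String[] args) {
        // Iterates in key order: 1, 3, 4, 5.
        for (Map.Entry<String, String> e : demo().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The remapping function passed to merge is never consulted for ordering; it only runs when the key is already present.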
// Parse the length delimited id/location strings described in the map section
int colonIndex = idLoc.indexOf(":");
int idEnd = Integer.parseInt(idLoc.substring(0, colonIndex), 16) + colonIndex + 1;
Exactly, since we hex encode the numbers for density
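The parsing lines above imply a key of the form "<hex length of id>:<id><location>". A rough Java sketch of a round trip through that format follows. The decode method mirrors the lines shown in the diff; the encode method and the id extraction are assumptions, since the map section that builds these strings is not shown here, and the class name IdLocCodec is hypothetical.

```java
final class IdLocCodec {
    // Assumed encoder: hex-encoded id length, a colon, then id and location back to back.
    static String encode(String id, String loc) {
        return Integer.toHexString(id.length()) + ":" + id + loc;
    }

    // Returns {id, location}. The first colon always ends the length prefix,
    // because hex digits never contain a colon, so colons inside the id are safe.
    static String[] decode(String idLoc) {
        int colonIndex = idLoc.indexOf(":");
        int idEnd = Integer.parseInt(idLoc.substring(0, colonIndex), 16) + colonIndex + 1;
        String id = idLoc.substring(colonIndex + 1, idEnd); // assumed; not shown in the diff
        String loc = idLoc.substring(idEnd);
        return new String[] {id, loc};
    }

    public static void main(String[] args) {
        String encoded = encode("my-monitor:8080", "us-east-1");
        String[] parts = decode(encoded);
        System.out.println(encoded + " => id=" + parts[0] + ", loc=" + parts[1]);
    }
}
```

A length prefix avoids having to escape a separator inside the id, and hex keeps the prefix shorter than decimal once the length reaches 10 or more.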
...cy/plugins/uptime/server/lib/adapters/monitor_states/elasticsearch_monitor_states_adapter.ts
String loc = idLoc.substring(idEnd, idLoc.length());
String status = timeStatus.substring(timeStatus.length() - 1);

locTotals.compute(loc, (k,v) -> {
A comment heading this block would be helpful to a JavaScript developer 😅.
My understanding is we are updating the value for key loc, and the output of the provided function determines the new value. If the value was null, we create a new HashMap, then we increment appropriate values based on the documents we iterate over.
Yes, that's correct. I'll add a comment.
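As an illustration of the compute pattern just described, here is a small Java sketch that tallies statuses per location. The body of the lambda in the PR is not shown, so the inner map layout, the single-letter statuses "u" and "d", and the names LocTotalsSketch and tally are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class LocTotalsSketch {
    // Each check is a hypothetical {location, status} pair.
    static Map<String, Map<String, Integer>> tally(String[][] checks) {
        Map<String, Map<String, Integer>> locTotals = new HashMap<>();
        for (String[] check : checks) {
            String loc = check[0];
            String status = check[1];
            // compute passes the current value for the key (null if absent) to the
            // function and stores whatever the function returns as the new value.
            locTotals.compute(loc, (k, v) -> {
                Map<String, Integer> totals = v;
                if (totals == null) {
                    totals = new HashMap<String, Integer>();
                }
                totals.merge(status, 1, Integer::sum); // increment this status's count
                return totals;
            });
        }
        return locTotals;
    }

    public static void main(String[] args) {
        String[][] checks = {
            {"us-east", "u"}, {"us-east", "d"}, {"eu-west", "u"}, {"us-east", "u"}
        };
        System.out.println(tally(checks));
    }
}
```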
...cy/plugins/uptime/server/lib/adapters/monitor_states/elasticsearch_monitor_states_adapter.ts
counts[leastCommonStatus] = await slowStatusCount(context, leastCommonStatus);
counts[mostCommonStatus] = counts.total - counts[leastCommonStatus];
}
const counts = await statusCount(context);
Do you think it'd be better to name this function getStatusCount?
I'm not sure get carries any particular meaning, at least in my head, unless there's something to juxtapose it against.
};
};

const slowStatusCount = async (context: QueryContext, status: string): Promise<number> => {
So now rather than having a fast/slow count, we're able to just have one counter (slower, but still fast, and always accurate), right?
Referenced by later commits: elastic#58389, elastic#58415.
Summary
Fixes #58079
This is an improved version of #58078
Note, this is a bugfix targeting 7.6.1. I've decided to open this PR directly against 7.6 in the interest of time. We can forward-port this to 7.x / master later.
This patch improves the handling of timespans with snapshot counts. This feature originally worked, but suffered a regression when we increased the default timespan in the query context to 5m. This means that without this patch the counts you get are the maximum total number of monitors that were down over the past 5m, which is not really that useful.
We now use a scripted metric to always count precisely the number of up/down monitors. On my box this could process 400k summary docs in ~600ms. This should scale as shards are added.
I attempted to keep memory usage relatively low by using simple maps of strings.