Commit 47ccede
# Backport
This will backport the following commits from `main` to `8.x`:
- [[Streams] Partitioning improvements (#209095)](https://github.com/elastic/kibana/pull/209095)
<!--- Backport version: 9.4.3 -->
### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
<!--BACKPORT [{"author":{"name":"Kerry
Gallagher","email":"kerry.gallagher@elastic.co"},"sourceCommit":{"committedDate":"2025-02-07T13:07:19Z","message":"[Streams]
Partitioning improvements (#209095)\n\n## Summary \r\n\r\nThis issue
predominantly tries to improve the situation around fetching\r\nand
showing samples. Some of the discussion can be seen
here:\r\nhttps://github.com/elastic/streams-program/issues/37#issuecomment-2605288052\r\n\r\nWe
have several issues - runtime fields are expensive (but needed
if\r\nfields aren't mapped), we are susceptible to timeouts depending
on\r\namount of data and timerange, getting exact document counts (for
match /\r\nnot matched counts) is expensive etc.\r\n\r\nAfter speaking
with Joe we decided it might be worth trying out async\r\nsearch, as
this alleviates some of these issues. E.g. the ability to\r\nload and
show partial results without trying to communicate this through\r\nour
API, or have to provide a potentially confusing UI around timeouts
/\r\nrunning to exhaustion options / toggles.\r\n\r\nRealistically we
only fetch 100 examples, but we might need to scan many\r\ndocuments to
gather that set of documents, I'm not 100% sure how often\r\nwe'll
actually hit partial results here, but it seems more robust
than\r\nworrying about timeouts.\r\n\r\nFor the matching counts I just
couldn't see a way to get an accurate\r\ncount without something
expensive (e.g. `track_total_hits`) so I've\r\ntried to use an
\"approximate match rate\" based on a random sample, that\r\nrandom
sample is then filtered to the condition to see what
approximate\r\npercent matched. One note: aggregations don't seem to
return partial\r\nresults (which makes sense I guess), you get the
interval polling\r\nrequests, but won't get a result until the end. I
did wonder if you\r\ncould do something smart with `track_total_hits`
and aggs to \"stream\"\r\npartial counts, I found a Slack thread saying
don't do this 😅\r\n\r\n⚠️ ~I'm not 100% sure what I'm missing here but I
have seen the filter\r\nsub aggregation come back with a doc_count that
is higher than the\r\nrandom sample.~\r\n\r\n~[From
the\r\ndocs](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-random-sampler-aggregation.html#random-sampler-inner-workings)\r\nI
understand \"If a query is provided, a document is returned if it
is\r\nmatched by the query and if the document is in the random
sampling. The\r\nsampling is not done over the matched documents.\" but
I don't see why\r\nthat affects the sub aggregation under the random
sample.~\r\n\r\n\r\n\r\n~I
hit this when playing with the `probability` setting, not sure if
I'm\r\nmissing something
stupid.~\r\n\r\n\r\n[Solved](https://github.com/elastic/kibana/pull/209095#discussion_r1940567855)\r\n\r\nOverall,
this does seem to work well. I've used this against ~250k
and\r\n~2.5million documents, and whilst (depending on time range /
runtime\r\nfields) it can still be slow, it seems to provide a better
experience\r\nthan hitting our API and holding the open connection.
Obviously it comes\r\nwith the downsides of sitting on the client (not
really sure it's a con,\r\nthese are platform services) and not using
the standard\r\n`streamsRepositoryClient`.\r\n\r\n## Other
changes\r\n\r\n- The core changes here are in the `use_async_sample`
hook, and where\r\nthat's consumed.\r\n\r\n- Runtime fields are not
generated for fields that are mapped.\r\n\r\n- I've also refactored the
routing index page so that components / hooks\r\nlive in their own files
(this makes the diff look bigger than it is)\r\n\r\n- Refactored some
logic around preview panel / preview panel\r\nillustration so that the
two branches of logic / conditionals now become\r\none.\r\n\r\n##
Followups\r\n\r\n- I haven't changed enrichment to use this or removed
the actual API\r\nroute as I figured this would need discussion first to
see if we want to\r\nuse
this.","sha":"97d0c1b2aeee10bdadede71b05691f8857c5fc2f","branchLabelMapping":{"^v9.1.0$":"main","^v8.19.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport:version","Feature:Streams","v9.1.0","v8.19.0"],"title":"[Streams]
Partitioning improvements
","number":209095,"url":"https://github.com/elastic/kibana/pull/209095","mergeCommit":{"sha":"97d0c1b2aeee10bdadede71b05691f8857c5fc2f"}},
"sourceBranch":"main","suggestedTargetBranches":["8.x"],
"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/209095","number":209095,"mergeCommit":{"sha":"97d0c1b2aeee10bdadede71b05691f8857c5fc2f"}},
{"branch":"8.x","label":"v8.19.0","branchLabelMappingKey":"^v8.19.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->
Co-authored-by: Kerry Gallagher <kerry.gallagher@elastic.co>
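
The "approximate match rate" and the "runtime fields are not generated for fields that are mapped" points in the summary above can be illustrated with a minimal sketch. It is not the code from the `use_async_sample` hook: every function, type, and aggregation name below (`buildRuntimeMappings`, `buildMatchRateBody`, `approximateMatchRate`, `sample`, `all_docs`, `matching_docs`) is hypothetical. It assumes the condition has already been converted to an Elasticsearch query, and it compares two sibling `filter` sub-aggregations under the same `random_sampler` so that both counts are on the same scale.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the PR's code.
type QueryDsl = Record<string, unknown>;

interface FieldRef {
  name: string;
  type: "keyword" | "long" | "double" | "boolean" | "date" | "ip";
}

// Declare runtime fields only for fields that have no mapping,
// since mapped fields can be queried directly.
function buildRuntimeMappings(
  fields: FieldRef[],
  mappedFields: Set<string>
): Record<string, { type: string }> {
  const runtime: Record<string, { type: string }> = {};
  for (const field of fields) {
    if (!mappedFields.has(field.name)) {
      runtime[field.name] = { type: field.type };
    }
  }
  return runtime;
}

// Build a search body that samples documents at random and counts
// how many of the sampled documents satisfy the condition.
function buildMatchRateBody(
  condition: QueryDsl,
  fields: FieldRef[],
  mappedFields: Set<string>,
  probability: number
) {
  return {
    size: 0, // only the aggregation result is needed here
    track_total_hits: false, // avoid the expensive exact count
    runtime_mappings: buildRuntimeMappings(fields, mappedFields),
    aggs: {
      sample: {
        random_sampler: { probability },
        aggs: {
          all_docs: { filter: { match_all: {} } },
          matching_docs: { filter: condition },
        },
      },
    },
  };
}

// Ratio of matched to sampled documents, or undefined for an empty sample.
function approximateMatchRate(
  sampledCount: number,
  matchedCount: number
): number | undefined {
  if (sampledCount <= 0) return undefined;
  return Math.min(1, matchedCount / sampledCount);
}
```

With these names, the two inputs to `approximateMatchRate` would be read from `aggregations.sample.all_docs.doc_count` and `aggregations.sample.matching_docs.doc_count` once the search has completed, since the summary notes that aggregations do not seem to return partial results.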
1 parent d931871 commit 47ccede
15 files changed
Lines changed: 1365 additions & 849 deletions
File tree
- x-pack/solutions/observability
- packages/kbn-streams-schema/src/helpers
- plugins
- streams_app/public
- components/stream_detail_routing
- hooks
- hooks/queries
- streams/server/routes/streams/management
Lines changed: 2 additions & 0 deletions
Lines changed: 6 additions & 3 deletions
Lines changed: 171 additions & 0 deletions