ESQL: INLINESTATS docs#134480
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/core-docs (Team:Docs) |
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
bpintea
left a comment
There was a problem hiding this comment.
Thanks Andrei for taking on the docs!
| The command is identical to [`STATS`](/reference/query-languages/esql/commands/stats-by.md) except that it does not reduce | ||
| the number of columns in the output table. |
There was a problem hiding this comment.
Alternative / optional:
| The command is identical to [`STATS`](/reference/query-languages/esql/commands/stats-by.md) except that it does not reduce | |
| the number of columns in the output table. | |
| The command is identical to [`STATS`](/reference/query-languages/esql/commands/stats-by.md) except that it preserves all the columns from the input table. |
| INLINE STATS [column1 =] expression1 [WHERE boolean_expression1][, | ||
| ..., | ||
| [columnN =] expressionN [WHERE boolean_expressionN]] | ||
| [BY grouping_expression1[, ..., grouping_expressionN]] |
There was a problem hiding this comment.
This is inherited, but [BY grouping_expression1[, ..., grouping_expressionN]] isn't as "detailed" as it could/should be:
[BY [grouping_name1 =] grouping_expression1[,
...,
[grouping_nameN = ] grouping_expressionN]]
This is relevant below [§].
There was a problem hiding this comment.
Indeed, you are right. I've changed it.
|
|
||
| `grouping_expressionX` | ||
| : An expression that outputs the values to group by. | ||
| If its name coincides with one of the computed columns, that column will be ignored. |
There was a problem hiding this comment.
[§] In this case, the "name collision" might not be clear, since, as given, that's en expression.
Also, for INLINE STATS we have the non-computed columns.
| If its name coincides with one of the computed columns, that column will be ignored. | |
| If its name coincides with one of the existing or computed columns, that column will be overridden by this one. |
There was a problem hiding this comment.
Another option: If the name matches an existing or computed column, this new column will replace it.
There was a problem hiding this comment.
Thank you both. I've used @bpintea's suggestion.
| :::{include} ../examples/inlinestats.csv-spec/avg-salaries-where.md | ||
| ::: | ||
|
|
||
| Specifying the output column name is optional. If not specified, the new column |
There was a problem hiding this comment.
Optional: I'd drop this line. It's specified already in the synopsis and unlike for STATS, we don't add an example here.
|
|
||
| **Limitations** | ||
|
|
||
| - [`CATEGORIZE`](/reference/query-languages/esql/functions-operators/grouping-functions.md#esql-categorize) grouping function is not |
There was a problem hiding this comment.
| - [`CATEGORIZE`](/reference/query-languages/esql/functions-operators/grouping-functions.md#esql-categorize) grouping function is not | |
| - The [`CATEGORIZE`](/reference/query-languages/esql/functions-operators/grouping-functions.md#esql-categorize) grouping function is not |
|
|
||
|
|
||
| The [`STATS`](/reference/query-languages/esql/commands/stats-by.md) command supports these aggregate functions: | ||
| The [`STATS`](/reference/query-languages/esql/commands/stats-by.md) and [`INLINE STATS`](/reference/query-languages/esql/commands/inlinestats-by.md) command supports these aggregate functions: |
There was a problem hiding this comment.
| The [`STATS`](/reference/query-languages/esql/commands/stats-by.md) and [`INLINE STATS`](/reference/query-languages/esql/commands/inlinestats-by.md) command supports these aggregate functions: | |
| The [`STATS`](/reference/query-languages/esql/commands/stats-by.md) and [`INLINE STATS`](/reference/query-languages/esql/commands/inlinestats-by.md) commands support these aggregate functions: |
leemthompo
left a comment
There was a problem hiding this comment.
Looks good, few (mainly language clarification) suggestions and couple questions from me :)
docs/redirects.yml
Outdated
| - to: 'reference/query-languages/esql/commands/inlinestats-by.md' | ||
| anchors: {'esql-inlinestats-by'} |
There was a problem hiding this comment.
Don't think you need this redirect, this was only required when we broke out the commands into standalone sub-pages, but this is a brand new page
|
|
||
| `grouping_expressionX` | ||
| : An expression that outputs the values to group by. | ||
| If its name coincides with one of the computed columns, that column will be ignored. |
There was a problem hiding this comment.
Another option: If the name matches an existing or computed column, this new column will replace it.
| If its name coincides with one of the computed columns, that column will be ignored. | ||
|
|
||
| `boolean_expressionX` | ||
| : The condition that must be met for a row to be included in the evaluation of `expressionX`. |
There was a problem hiding this comment.
| : The condition that must be met for a row to be included in the evaluation of `expressionX`. | |
| : The condition that determines which rows are included when evaluating `expressionX`. |
| The `INLINE STATS` processing command groups rows according to a common value | ||
| (what comes after `BY`) and calculates one or more aggregated values over the | ||
| grouped rows. The output table contains the same number of rows as the input | ||
| table and the command just adds new columns or overrides any existent ones with | ||
| the same name to the result. The resulting calculated values are matched to the | ||
| input rows according to the common value(s) (also known as grouping key(s)). |
There was a problem hiding this comment.
| The `INLINE STATS` processing command groups rows according to a common value | |
| (what comes after `BY`) and calculates one or more aggregated values over the | |
| grouped rows. The output table contains the same number of rows as the input | |
| table and the command just adds new columns or overrides any existent ones with | |
| the same name to the result. The resulting calculated values are matched to the | |
| input rows according to the common value(s) (also known as grouping key(s)). | |
| The `INLINE STATS` processing command groups rows according to a common value | |
| (also known as the grouping key), specified after `BY`, and calculates one or more | |
| aggregated values over the grouped rows. The output table contains the same | |
| number of rows as the input table. The command only adds new columns or overrides existing columns with the same name as the result. |
suggestion for concision
| In case there are overlapping column names between the newly added columns and the | ||
| existing ones, besides overriding the existing columns, there can be a change in | ||
| the column order. The new columns are added/moved so that they appear in the order | ||
| they are defined in the `INLINE STATS` command. |
There was a problem hiding this comment.
| In case there are overlapping column names between the newly added columns and the | |
| existing ones, besides overriding the existing columns, there can be a change in | |
| the column order. The new columns are added/moved so that they appear in the order | |
| they are defined in the `INLINE STATS` command. | |
| When using a `BY` clause, columns are reordered to match the structure of the | |
| `INLINE STATS` command - calculated columns appear first, followed by grouping | |
| columns. |
I might (probably) have misunderstood, but are naming collisions + overrides separate to column reordering? If so my suggestion tries to clarify that, otherwise, just ignore :)
There was a problem hiding this comment.
In my mind, the original wording explains inline stats behavior better. But there is a small change I need to make to the original text still.
besides overriding the existing columns, there can be
to
besides overriding the existing columns values, there can be
There was a problem hiding this comment.
I'd simplify the first sentence anyway for readability, maybe something like:
- In case there are overlapping column names between the newly added columns and the existing ones, besides overriding the existing columns, there can be a change in the column order.
+ If column names overlap, existing column values may be overridden and column order may change.There was a problem hiding this comment.
Yep, much better. Changed.
| Calculating a statistic and grouping by the values of another column; note also | ||
| that `languages` column is moved as the last column in the output since it is | ||
| used as grouping key (the `KEEP` command before `INLINE STATS` had `languages` | ||
| set as the second column): |
There was a problem hiding this comment.
| Calculating a statistic and grouping by the values of another column; note also | |
| that `languages` column is moved as the last column in the output since it is | |
| used as grouping key (the `KEEP` command before `INLINE STATS` had `languages` | |
| set as the second column): | |
| The following example shows how to calculate a statistic on one column and group | |
| by the values of another column. | |
| :::{note} | |
| The `languages` column moves to the last position in the output table because it is | |
| the grouping key. | |
| ::: |
There was a problem hiding this comment.
attempt to make the language flow clearer, take what you need if feel it isn't quite accurate!
| Omitting `BY` calculates the aggregation applied over the entire dataset, the | ||
| order of the existent columns is preserved and a new column with the calculated | ||
| maximum salary value is added as the last column: |
There was a problem hiding this comment.
| Omitting `BY` calculates the aggregation applied over the entire dataset, the | |
| order of the existent columns is preserved and a new column with the calculated | |
| maximum salary value is added as the last column: | |
| The following example shows how to calculate an aggregation over the entire dataset | |
| by omitting `BY`. The order of the existing columns is preserved and a new column | |
| with the calculated maximum salary value is added as the last column: |
| :::{include} ../examples/inlinestats.csv-spec/max-salary-without-by.md | ||
| ::: | ||
|
|
||
| It’s possible to calculate multiple values in more complex queries: |
There was a problem hiding this comment.
| It’s possible to calculate multiple values in more complex queries: | |
| The following example shows how to calculate multiple aggregations with multiple grouping keys: |
| :::{include} ../examples/inlinestats.csv-spec/multi-agg-multi-grouping.md | ||
| ::: | ||
|
|
||
| To filter the rows that go into an aggregation, use the `WHERE` clause: |
There was a problem hiding this comment.
| To filter the rows that go into an aggregation, use the `WHERE` clause: | |
| The following example shows how to filter which rows are used for each aggregation, using the `WHERE` clause: |
| {{esql}} only supports the UTC timezone. | ||
|
|
||
|
|
||
| ## INLINE STATS limitations [esql-limitations-inlinestats] |
There was a problem hiding this comment.
Do we need to duplicate limitations here? No strong opinions from me.
There was a problem hiding this comment.
I have a preference for doing it like this, having some experience with users interaction for ES EQL and ES SQL projects where a separate Limitations page helped Support reason much better about some specific things not being available in those projects.
But, if there are other principles behind documentation for ES|QL, I can definitely follow them. Just le me know.
| and calculates one or more aggregated values over the grouped rows. The results | ||
| are appended as new columns to the input rows. | ||
|
|
||
| The command is identical to [`STATS`](/reference/query-languages/esql/commands/stats-by.md) except that it does not reduce |
There was a problem hiding this comment.
Should we mention (somewhere) that the STATS will break down MVs fields it groups by, by individual SVs, while INLINE STATS links back to the MV field?
I'm referring to the behaviour noted in this test (which, IMO is correct, answering to the question there).
Though maybe that's intuitive.
There was a problem hiding this comment.
Good point. I left this with no MV specification because I regarded that test result something to discuss further.
I added a mention in the csv-spec file about docs update when/if we decide on the MV behavior.
…search into inlinestats_docs
…inlinestats_docs
Fixes #124718