Skip to content

feat(api) Add support for column list in aggregate functions#408

Merged
markstory merged 2 commits intomasterfrom
aggregation-schema
Aug 23, 2019
Merged

feat(api) Add support for column list in aggregate functions#408
markstory merged 2 commits intomasterfrom
aggregation-schema

Conversation

@markstory
Copy link
Member

We have a use-case in sentry where we want to use argMax() which requires multiple columns to work. While this function is allowed in selected_columns it is actually an aggregate value and should probably be in aggregations. This expands the schema and query builder to allow lists of columns in aggregations functions.

This may all be a bad idea as I'm a snuba noob.

Refs getsentry/sentry#14321

@tkaemming tkaemming requested a review from a team August 12, 2019 17:17
Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm my understanding of this PR is right.

you want to ask Snuba

....[argMax, [(a,b,c), d]]

and, as of now, a,b,c columns would not be processed because they are one parameter of your function but the tuple does not get decomposed into columns.
Is that correct ?

snuba/util.py Outdated
return complex_column_expr(dataset, column_name, body)
elif isinstance(column_name, str) and QUOTED_LITERAL_RE.match(column_name):
return escape_literal(column_name[1:-1])
elif isinstance(column_name, (list, tuple)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this case into complex_column_expr since this method is meant to deal with one individual column.

@markstory
Copy link
Member Author

Just to confirm my understanding of this PR is right.

you want to ask Snuba

....[argMax, [(a,b,c), d]]

and, as of now, a,b,c columns would not be processed because they are one parameter of your function but the tuple does not get decomposed into columns.
Is that correct ?

I want to be able to ask for ['argMax', ['id', 'timestamp'], 'latest_event'] in the aggregations list, but the current jsonschema does not allow that as ['id', 'timestamp'] is not a single column. I am hoping to enable this use case but not attempting to enable nested function calls as I don't currently have a use-case for them.

@fpacifici
Copy link
Contributor

Did you try this way ?

['argMax', [['id', 'timestamp'], 'latest_event']]

I suspect that should be trasformed into argaMax(('id', 'timestamp'), 'latest_event')

@markstory
Copy link
Member Author

markstory commented Aug 14, 2019

I didn't try argMax((id, timestamp), latest_event)as isn't going to do the right thing in clickhouse. The sql I would like to end up with is argMax(id, timestamp) AS latest_event. Right now I'm able to do that by passing the aggregate function with arguments as a string via ['argMax(id, timestamp)', '', 'latest_event'] but I thought that was kludgey and wanted to try and improve the aggregations parsing in snuba.

@fpacifici
Copy link
Contributor

I see.
@tkaemming do you have any context on why aggregates field is not processed like groupby or select by calling column_expr on every statement ?

groupby and selected columns are processed this way:
https://github.com/getsentry/snuba/blob/master/snuba/api.py#L211
Which resolves columns like ["funciton", ["p1", "p2"], "alias"] as a function.

While aggregations are processed differently:
https://github.com/getsentry/snuba/blob/master/snuba/api.py#L204-L206

By expecting that an aggregate function will take one parameter only.

Do you know if this is intentional ?

@tkaemming
Copy link
Contributor

@tkaemming do you have any context on why aggregates field is not processed like groupby or select by calling column_expr on every statement ?

I'm not sure to be honest — this (specifically the inconsistencies between select_columns, groupby, and aggregations) is something that has seemed strange to me for a while, but I haven't had a chance to read into it. I expect that I will be in the next couple of days, though…

@fpacifici
Copy link
Contributor

So, while we figure this out, @markstory , could you please move the logic that manages this use case into complex_column_expr. We could at least try to be consistent with what column_expr does (manage one column) and what complex_column_expr does.

@markstory
Copy link
Member Author

@fpacifici I've had to retain the top-level condition in column_expr, but by delegating to complex_column_expr I've been able to remove a bit of code. Is this what you had in mind, or were you hoping to see aggregations covered by the existing cases that column_expr has?

if is_function(column_name, 0):
return complex_column_expr(dataset, column_name, body)
elif isinstance(column_name, (list, tuple)) and aggregate:
return complex_column_expr(dataset, [aggregate, column_name, alias], body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it work if you just call this directly from api.py skipping column_expr ? Ideally this would work like for line 149 for functions, but for some reason we are already unpacking the expression before calling column_expr for aggregates so we cannot do that.

Copy link
Member Author

@markstory markstory Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going directly to complex_column_expr from api.py ends up breaking aggregation handling for things like ['topK(4)', 'issue', 'aggregate'] as the topK(4) pattern isn't recognized as a function, until it gets into function_expr which isn't in use by complex_column_expr.

We have a use-case in sentry where we want to use `argMax()` which
requires multiple columns to work. While this function is allowed in
`selected_columns` it is actually an aggregate value and should probably
be in `aggregations`. This expands the schema and query builder to allow
lists of columns in aggregations functions.

Refs getsentry/sentry#14321
@markstory markstory merged commit 145f168 into master Aug 23, 2019
@markstory markstory deleted the aggregation-schema branch August 23, 2019 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants