Aggregation overhaul — return to Profunctor and semi-aggregations #235
Merged: shane-circuithub merged 1 commit into master on Jun 18, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would be something like:

```haskell
-- Old Rel8: the division can only be applied after `aggregate` has run.
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum foo.x, count foo.x)
```

The key thing is that the mapping can only happen after `aggregate` is called. With the `Profunctor`-based `Aggregator`, by contrast, this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is compute the average, but if you're doing a complicated aggregation with several things happening at once, then you might need several unrelated post-processing steps after the `aggregate`. We really want a way to bundle the post-mapping with the aggregation itself and have that as a single composable unit.

Another example is the `listAggExpr` function. The only reason Rel8 exports it is that it can't be expressed directly in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be: it's just `(id $*) <$> listAgg`, so it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of `sum` having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`; you could just write `sum (.x)`.
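The two ideas above (post-mapping the result of an aggregation, and pre-mapping its input with `lmap`) can be illustrated without a database. What follows is a minimal toy sketch in plain Haskell, assuming an aggregator can be modelled as a function from all input rows to a single result; it ignores SQL entirely, and every name in it (`Agg`, `lmapAgg`, `sumAgg`, `countAgg`, `avgAgg`, `sumOnAgg`) is hypothetical rather than part of Rel8 or Opaleye.

```haskell
-- Toy model only: these names are hypothetical and are NOT Rel8's API.
-- Assumption: an aggregator is a plain function from all rows to one result.
newtype Agg i o = Agg { runAgg :: [i] -> o }

-- Post-mapping: transform the result of an aggregation.
instance Functor (Agg i) where
  fmap f (Agg g) = Agg (f . g)

-- Run two aggregations over the same rows and combine their results.
instance Applicative (Agg i) where
  pure x = Agg (const x)
  Agg f <*> Agg g = Agg (\rows -> f rows (g rows))

-- Pre-mapping on the input, mirroring `lmap` from Data.Profunctor.
lmapAgg :: (j -> i) -> Agg i o -> Agg j o
lmapAgg f (Agg g) = Agg (g . map f)

sumAgg :: Num a => Agg a a
sumAgg = Agg sum

countAgg :: Agg a Int
countAgg = Agg length

-- sum(x) / count(x) as one composable unit.
-- On zero rows this toy version yields NaN (0 / 0).
avgAgg :: Agg Double Double
avgAgg = (/) <$> sumAgg <*> fmap fromIntegral countAgg

-- A `sumOn`-style helper: `sumAgg` with the `lmap` bundled in.
sumOnAgg :: Num a => (i -> a) -> Agg i a
sumOnAgg f = lmapAgg f sumAgg
```

For example, `runAgg (lmapAgg snd avgAgg) [("a", 2), ("b", 4)]` evaluates to `3.0`. Because `avgAgg` already carries its division, it can itself be combined further with `<$>` and `<*>`, exactly like `sumAgg`.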
However, there are many ways to compose `Aggregator`s. For example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather write `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation: "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggregate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum`, and we really want some way to bundle the two together and say "return the identity value if there are zero rows".

Rel8 now has this ability: it has both `Aggregator` and `Aggregator1`, the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when it encounters zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or` and `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively), while `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic: `sum` will give you an `Aggregator'` that can be used as either an `Aggregator` or an `Aggregator1` without needing to convert explicitly between them.
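The difference between "full" and "semi" aggregation can be sketched as a small pure-Haskell model. It assumes, purely for illustration, that a full aggregator is a function over a possibly empty list of rows (so applying it to `[]` produces its identity value) and that a semi aggregator is a function over a `NonEmpty` list. `FullAgg`, `SemiAgg`, `toSemi`, `aggregateFull` and `aggregateSemi` are hypothetical names standing in for the roles of Rel8's `Aggregator`, `Aggregator1`, `aggregate` and `aggregate1`; they are not its actual types or signatures.

```haskell
-- Toy model only: these names are hypothetical and are NOT Rel8's API.
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Full aggregator: defined on any number of rows, so running it on
-- zero rows yields its identity value.
newtype FullAgg i o = FullAgg ([i] -> o)

-- Semi aggregator: only defined on one or more rows (like max or min).
newtype SemiAgg i o = SemiAgg (NonEmpty i -> o)

-- Every full aggregator can also be used where a semi one is expected.
toSemi :: FullAgg i o -> SemiAgg i o
toSemi (FullAgg f) = SemiAgg (f . NE.toList)

-- Plays the role of `aggregate`: always exactly one result row.
aggregateFull :: FullAgg i o -> [i] -> [o]
aggregateFull (FullAgg f) rows = [f rows]

-- Plays the role of `aggregate1`: no result row when there are no input rows.
aggregateSemi :: SemiAgg i o -> [i] -> [o]
aggregateSemi (SemiAgg f) rows = maybe [] (\ne -> [f ne]) (nonEmpty rows)

-- Full aggregators, with their identity values.
sumAgg :: Num a => FullAgg a a
sumAgg = FullAgg sum        -- identity 0

countAgg :: FullAgg a Int
countAgg = FullAgg length   -- identity 0

andAgg :: FullAgg Bool Bool
andAgg = FullAgg and        -- identity True

orAgg :: FullAgg Bool Bool
orAgg = FullAgg or          -- identity False

listAgg :: FullAgg a [a]
listAgg = FullAgg id        -- identity []

-- A semi aggregator: there is no sensible maximum of zero rows.
maxAgg :: Ord a => SemiAgg a a
maxAgg = SemiAgg maximum
```

Here `aggregateFull sumAgg []` gives `[0]`, whereas `aggregateSemi (toSemi sumAgg) []` gives `[]`; the latter corresponds to the old behaviour that had to be patched up with `optional` and `fromMaybeTable 0`.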
`aggregate1` likewise accepts either an `Aggregator` or an `Aggregator1` (though it ignores the identity value of the former).

Aggregation in Rel8 now supports more of the features that PostgreSQL supports. Three new combinators are introduced: `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate`, and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` when no rows satisfy the given predicate. There is also `filterWhereOptional`, which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, which is mainly useful for non-commutative aggregation functions like `listAgg`.
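The intended semantics of `FILTER (WHERE ...)`, `DISTINCT` and `ORDER BY` inside an aggregate can be approximated with a small pure-Haskell model. It assumes an aggregator is just a function over its input rows (with a `NonEmpty` variant for aggregators that lack an identity value), and that the optional result of a filtered semi aggregator can be modelled with `Maybe`. `filterWhereAgg`, `filterWhereOptionalAgg`, `distinctAgg`, `orderAggBy` and the other names are hypothetical stand-ins, not the signatures of Rel8's `filterWhere`, `filterWhereOptional`, `distinctAggregate` or `orderAggregateBy`.

```haskell
-- Toy model only: these names are hypothetical and are NOT Rel8's API.
import Data.List (nub, sortOn)
import Data.List.NonEmpty (NonEmpty, nonEmpty)

-- Aggregator with an identity value (any number of rows).
newtype FullAgg i o = FullAgg { runFull :: [i] -> o }

-- Aggregator without an identity value (one or more rows).
newtype SemiAgg i o = SemiAgg (NonEmpty i -> o)

sumAgg :: Num a => FullAgg a a
sumAgg = FullAgg sum

countAgg :: FullAgg a Int
countAgg = FullAgg length

listAgg :: FullAgg a [a]
listAgg = FullAgg id

maxAgg :: Ord a => SemiAgg a a
maxAgg = SemiAgg maximum

-- agg(x) FILTER (WHERE p(x)): only rows satisfying p reach the aggregator,
-- so if none do, a full aggregator yields its identity value.
filterWhereAgg :: (i -> Bool) -> FullAgg i o -> FullAgg i o
filterWhereAgg p (FullAgg f) = FullAgg (f . filter p)

-- A semi aggregator has no identity value, so the filtered result is
-- optional (assumption: modelled here with Maybe).
filterWhereOptionalAgg :: (i -> Bool) -> SemiAgg i o -> FullAgg i (Maybe o)
filterWhereOptionalAgg p (SemiAgg f) = FullAgg (fmap f . nonEmpty . filter p)

-- agg(DISTINCT x): duplicate inputs are removed before aggregating.
distinctAgg :: Eq i => FullAgg i o -> FullAgg i o
distinctAgg (FullAgg f) = FullAgg (f . nub)

-- agg(x ORDER BY k): inputs are sorted first; this only changes the
-- result of order-sensitive aggregators such as listAgg.
orderAggBy :: Ord k => (i -> k) -> FullAgg i o -> FullAgg i o
orderAggBy key (FullAgg f) = FullAgg (f . sortOn key)
```

For instance, `runFull (filterWhereAgg (> 10) sumAgg) [1, 2, 3]` yields the identity value `0`, and `runFull (orderAggBy negate listAgg) [2, 3, 1]` yields `[3, 2, 1]`.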