Hi folks,
My colleague @tstat and I were playing around with the library for the first time and we ran into a performance issue with a generated query that may or may not be totally expected. I understand that translating monadic syntax to SQL queries is perhaps going to be unavoidably "for-loopy" at times, so please feel free to close this issue as a #wontfix, although if that's the case I think we could put together some documentation that mention some known performance gotchas so that users won't necessarily have to examine plan output themselves.
Anyways, the query we were interested in writing with rel8 was a simple left-join. The schema is unimportant here, but let me know if it would be helpful to provide a repo that reproduces this issue exactly. (I think it will be clear enough from the following high-level description).
We have some students table, and students have first and last names, and we're interested in pairing students whose first name maybe matches the last name of another student:
select *
from students s1
left join students s2 on s1.first = s2.last;
This generates the following plan:
Hash Left Join (cost=2.35..4.78 rows=60 width=114)
Hash Cond: (s1.first = s2.last)
-> Seq Scan on students s1 (cost=0.00..1.60 rows=60 width=57)
-> Hash (cost=1.60..1.60 rows=60 width=57)
-> Seq Scan on students s2 (cost=0.00..1.60 rows=60 width=57)
In rel8, we translated this as:
s1 <- each studentsSchema
s2 <- optional do
s2 <- each studentsSchema
where_ (first s1 ==. last s2)
pure s2
pure (s1, s2)
which generated a messier version of the following query:
select *
from students s1
cross join lateral (
select *
from (select 0) t1
left outer join (
select *
from students s2
where s1.first = s2.last
) t2 on true
) t2;
Unfortunately, Postgres' optimizer does not seem to consider a hash join in this case, and falls back to a much more expensive nested loop join:
Nested Loop (cost=0.00..109.00 rows=60 width=118)
-> Seq Scan on students s1 (cost=0.00..1.60 rows=60 width=57)
-> Nested Loop Left Join (cost=0.00..1.77 rows=1 width=61)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Seq Scan on students s2 (cost=0.00..1.75 rows=1 width=57)
Filter: (s1.first = last)
Curiously, this is not merely due to the existence of a lateral join with a pushed-in where-clause. We tried rewriting the original SQL in this style:
select *
from students s1
left join lateral (
select *
from students s2
where s1.first = s2.last
) t2 on true;
and in this case Postgres did again consider the hash join:
Hash Left Join (cost=2.35..4.78 rows=60 width=114)
Hash Cond: (s1.first = s2.last)
-> Seq Scan on students s1 (cost=0.00..1.60 rows=60 width=57)
-> Hash (cost=1.60..1.60 rows=60 width=57)
-> Seq Scan on students s2 (cost=0.00..1.60 rows=60 width=57)
Hi folks,
My colleague @tstat and I were playing around with the library for the first time and we ran into a performance issue with a generated query that may or may not be totally expected. I understand that translating monadic syntax to SQL queries is perhaps going to be unavoidably "for-loopy" at times, so please feel free to close this issue as a #wontfix, although if that's the case I think we could put together some documentation that mention some known performance gotchas so that users won't necessarily have to examine plan output themselves.
Anyways, the query we were interested in writing with
rel8was a simple left-join. The schema is unimportant here, but let me know if it would be helpful to provide a repo that reproduces this issue exactly. (I think it will be clear enough from the following high-level description).We have some
studentstable, and students havefirstandlastnames, and we're interested in pairing students whose first name maybe matches the last name of another student:This generates the following plan:
In
rel8, we translated this as:which generated a messier version of the following query:
Unfortunately, Postgres' optimizer does not seem to consider a hash join in this case, and falls back to a much more expensive nested loop join:
Curiously, this is not merely due to the existence of a lateral join with a pushed-in where-clause. We tried rewriting the original SQL in this style:
and in this case Postgres did again consider the hash join: