Generated left-join is too complicated for Postgres to plan well

Hi folks, 

My colleague @tstat and I were playing around with the library for the first time and we ran into a performance issue with a generated query that may or may not be totally expected. I understand that translating monadic syntax to SQL queries is perhaps going to be unavoidably "for-loopy" at times, so please feel free to close this issue as a #wontfix, although if that's the case I think we could put together some documentation that mention some known performance gotchas so that users won't necessarily have to examine plan output themselves.

Anyways, the query we were interested in writing with `rel8` was a simple left-join. The schema is unimportant here, but let me know if it would be helpful to provide a repo that reproduces this issue exactly. (I think it will be clear enough from the following high-level description).

We have some `students` table, and students have `first` and `last` names, and we're interested in pairing students whose first name maybe matches the last name of another student:

```sql
select *
from students s1
left join students s2 on s1.first = s2.last;
```

This generates the following plan:

```
 Hash Left Join  (cost=2.35..4.78 rows=60 width=114)
   Hash Cond: (s1.first = s2.last)
   ->  Seq Scan on students s1  (cost=0.00..1.60 rows=60 width=57)
   ->  Hash  (cost=1.60..1.60 rows=60 width=57)
         ->  Seq Scan on students s2  (cost=0.00..1.60 rows=60 width=57)
```

In `rel8`, we translated this as:

```haskell
s1 <- each studentsSchema
s2 <- optional do
  s2 <- each studentsSchema
  where_ (first s1 ==. last s2)
  pure s2
pure (s1, s2)
```

which generated a messier version of the following query:

```sql
select *
from students s1
cross join lateral (
  select *
  from (select 0) t1
  left outer join (
    select *
    from students s2
    where s1.first = s2.last
  ) t2 on true
) t2;
```

Unfortunately, Postgres' optimizer does not seem to consider a hash join in this case, and falls back to a much more expensive nested loop join:

```
 Nested Loop  (cost=0.00..109.00 rows=60 width=118)
   ->  Seq Scan on students s1  (cost=0.00..1.60 rows=60 width=57)
   ->  Nested Loop Left Join  (cost=0.00..1.77 rows=1 width=61)
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
         ->  Seq Scan on students s2  (cost=0.00..1.75 rows=1 width=57)
               Filter: (s1.first = last)
```

Curiously, this is _not_ merely due to the existence of a lateral join with a pushed-in where-clause. We tried rewriting the original SQL in this style:

```sql
select *
from students s1
left join lateral (
  select *
  from students s2
  where s1.first = s2.last
) t2 on true;
```

and in this case Postgres did again consider the hash join:

```
 Hash Left Join  (cost=2.35..4.78 rows=60 width=114)
   Hash Cond: (s1.first = s2.last)
   ->  Seq Scan on students s1  (cost=0.00..1.60 rows=60 width=57)
   ->  Hash  (cost=1.60..1.60 rows=60 width=57)
         ->  Seq Scan on students s2  (cost=0.00..1.60 rows=60 width=57)
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generated left-join is too complicated for Postgres to plan well #72

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Generated left-join is too complicated for Postgres to plan well #72

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions