Report multiple errors, not just the first one #13676

@eliaperantoni

Is your feature request related to a problem or challenge?

In the following query there are 4 distinct errors:

WITH users AS (
	SELECT 1 AS id, 'John' AS name
)
SELECT 'id:' + idd, name FROM userss GROUP BY id;
  1. userss doesn't exist
  2. idd doesn't exist
  3. Can't add a string to a number
  4. name is missing from GROUP BY

DataFusion currently reports only one of those errors when you try to execute the query. After you fix it, you can try again and get the next error.

This can be frustrating for the end user because it requires many iterations of a (possibly expensive and slow) parsing and planning step. Furthermore, reporting multiple errors would make it possible to build tooling such as an LSP server on top of DataFusion.

The desired feature is for DataFusion to report as many errors as possible in one go.

Describe the solution you'd like

The world of programming languages does this quite well, I think. Take rustc for example: you can get tens of errors in one go and fix them all before invoking an expensive compilation again.

I think we should take inspiration from the way these compilers do it, e.g. panic mode and synchronization. For an introduction, see https://craftinginterpreters.com/parsing-expressions.html#panic-mode-error-recovery.

The way it could work is: when parsing or planning the SelectItems in a Select, we catch any error coming from one of the SelectItems, store it in a local variable, and proceed with the next one. Then, if there were any errors, we return all of them together. We could add a DataFusionError::Many(Vec<DataFusionError>) variant to represent this.
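
To make the idea concrete, here is a minimal sketch of the "catch, store, continue" loop. It does not use DataFusion's real types: `PlanError`, `plan_select_item`, and `plan_select_items` are hypothetical stand-ins, and the `Many` variant only mirrors the proposed DataFusionError::Many. A select item is reduced to a bare column name that either exists or doesn't.

```rust
/// Hypothetical stand-in for DataFusionError, including the proposed `Many` variant.
#[derive(Debug, PartialEq)]
enum PlanError {
    Plan(String),
    Many(Vec<PlanError>),
}

/// Hypothetical planner for a single select item: it succeeds only if the
/// item names a known column.
fn plan_select_item(item: &str, known_columns: &[&str]) -> Result<String, PlanError> {
    if known_columns.contains(&item) {
        Ok(item.to_string())
    } else {
        Err(PlanError::Plan(format!("column '{}' doesn't exist", item)))
    }
}

/// Plans every select item, storing errors instead of returning on the first one.
fn plan_select_items(items: &[&str], known_columns: &[&str]) -> Result<Vec<String>, PlanError> {
    let mut planned = Vec::new();
    let mut errors = Vec::new();
    for item in items {
        match plan_select_item(item, known_columns) {
            Ok(expr) => planned.push(expr),
            // Keep the error for later and move on to the next item.
            Err(e) => errors.push(e),
        }
    }
    if errors.is_empty() {
        Ok(planned)
    } else {
        Err(PlanError::Many(errors))
    }
}
```

With known columns `id` and `name`, planning the items `idd`, `name`, `agee` would return a single `Many` error holding one entry for `idd` and one for `agee`, rather than stopping at `idd`.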

The same idea of "storing the error for later, synchronising to the next safe point, and continuing" could also be applied across the different parts of a query (the CTEs, the SELECT, the WHERE, the ORDER BY, etc.): after any error in the CTEs, we can continue with the SELECT and collect the errors there, then move on to the WHERE, and so on. It could likewise be applied when analysing different Statements.
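
A rough sketch of what synchronising at clause boundaries might look like, again with made-up types rather than DataFusion's: `Query`, `Catalog`, and the `check_*` functions are all hypothetical. It also assumes, as a simplification, that column names can still be checked when the table name is wrong. Each clause is checked independently, so an error in one does not prevent the next from being checked.

```rust
/// Hypothetical, heavily simplified query: just names, no real SQL AST.
struct Query {
    from: &'static str,
    select: Vec<&'static str>,
    group_by: Vec<&'static str>,
}

/// Hypothetical catalog, assumed to be known up front.
struct Catalog {
    tables: Vec<&'static str>,
    columns: Vec<&'static str>,
}

fn check_from(q: &Query, c: &Catalog) -> Result<(), String> {
    if c.tables.contains(&q.from) {
        Ok(())
    } else {
        Err(format!("table '{}' doesn't exist", q.from))
    }
}

/// Reports the first selected column that is not in the catalog.
fn check_select(q: &Query, c: &Catalog) -> Result<(), String> {
    match q.select.iter().find(|col| !c.columns.contains(*col)) {
        Some(col) => Err(format!("column '{}' doesn't exist", col)),
        None => Ok(()),
    }
}

/// Reports the first known selected column that is absent from GROUP BY
/// (aggregates are ignored in this sketch).
fn check_group_by(q: &Query, c: &Catalog) -> Result<(), String> {
    let missing = q
        .select
        .iter()
        .find(|col| c.columns.contains(*col) && !q.group_by.contains(*col));
    match missing {
        Some(col) => Err(format!("column '{}' is missing from GROUP BY", col)),
        None => Ok(()),
    }
}

/// Runs every clause check; each clause boundary acts as a synchronisation
/// point, so an earlier failure never skips a later check.
fn check_query(q: &Query, c: &Catalog) -> Vec<String> {
    let mut errors = Vec::new();
    for result in [check_from(q, c), check_select(q, c), check_group_by(q, c)] {
        if let Err(e) = result {
            errors.push(e);
        }
    }
    errors
}
```

For a query shaped like the example at the top (FROM `userss`, selecting `idd` and `name`, GROUP BY `id`), `check_query` would return three messages in one pass: the missing table, the missing column, and `name` missing from GROUP BY. The type error from the example is left out because this sketch has no notion of types.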

Describe alternatives you've considered

No response

Additional context

This is related to issue #13662 and my PR about diagnostics #13664. I'd be open to working on this issue too if contributions are welcome.

Labels: enhancement (New feature or request)