[META] Error Handling Enhancements #5261

This is a meta issue collecting a few different ideas from different sources.

The core problem is that misbehaving queries are typically hard to debug and require a lot of knowledge of PPL/SQL engine behavior. There's room for improvement.

After discussing the concept with @anasalkouz, there are three main problem classes we want to address:

1. Something went wrong (execution error)
2. 0 results are returned, why?
3. The query is slow, why?

So, compiling existing issues for the backend:
- https://github.com/opensearch-project/sql/issues/4919 is the backend for the first one. It introduces a general reporting interface that can integrate with our current exceptions and add context, so we can incrementally drill into error classes over time.
- https://github.com/opensearch-project/sql/issues/4343 can provide the foundation for the latter two by exposing analyze metrics for the results returned and the timings at each stage.
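
To make the "integrate with our current exceptions and add context" idea concrete, here is a rough Python sketch of one way such a wrapper could accumulate context as an error bubbles up. It is not the design from #4919: `ReportedError`, `stage`, and the stand-in `SemanticCheckException` class are all hypothetical names made up for illustration.

```python
from contextlib import contextmanager


class ReportedError(Exception):
    """Hypothetical wrapper: an existing exception plus accumulated context."""

    def __init__(self, cause):
        super().__init__(str(cause))
        self.cause = cause
        self.location = []  # outermost stage first
        self.context = {}

    def to_dict(self):
        return {
            "type": type(self.cause).__name__,
            "details": str(self.cause),
            "location": self.location,
            "context": self.context,
        }


@contextmanager
def stage(description, **context):
    """Annotate any error escaping this block with where it happened."""
    try:
        yield
    except ReportedError as err:
        # Already wrapped by an inner stage: prepend this outer stage.
        err.location.insert(0, description)
        for key, value in context.items():
            err.context.setdefault(key, value)
        raise
    except Exception as exc:
        # First stage to see the raw exception: wrap it, keep the cause.
        err = ReportedError(exc)
        err.location.append(description)
        err.context.update(context)
        raise err from exc


class SemanticCheckException(Exception):
    """Stand-in for an existing engine exception."""


def run(query):
    with stage("while planning the query", query=query):
        with stage("while resolving fields in the index mapping",
                   index_pattern="logs-*"):
            raise SemanticCheckException("Failed to resolve field 'foo'")


try:
    run("source=logs-* | fields foo")
except ReportedError as err:
    report = err.to_dict()
```

The existing exception type is untouched. Each layer only adds what it knows, which is what would let error classes be enriched incrementally.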

Once those have reasonably complete implementations, errors will be in structured reports like this:

```json
{
  "status": 400,
  "error": {
    "type": "SemanticCheckException",
    "code": "FIELD_NOT_FOUND",
    "reason": "Invalid Query",
    "details": "Failed to resolve field 'foo'",
    "location": [
      "while planning the query",
      "while resolving fields in the index mapping"
    ],
    "context": {
      "index_pattern": "logs-*",
      "position": {"line": 1, "column": 25},
      "query": "source=logs-* | fields foo",
      "query_id": "b6627794-3939-4ac4-8c5b-821ccc400f4f"
    },
    "suggestion": "Did you mean: 'foobar'"
  }
}
```

Frontends can do whatever they want with this: render specific fixed pieces of context (e.g. `position` points to a spot in the query, so highlight it?), render details/locations/suggestions, throw it in an LLM, etc. These responses also help with oncall debugging, whether they arrive directly or via HAR files. https://github.com/opensearch-project/sql/issues/4919#issuecomment-3863005661 shows me doing this quickly for the SQL CLI based on a proof-of-concept implementation.
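
As a rough illustration of the "render specific pieces" option, a plain-text renderer for a CLI-style client might look like the Python sketch below. `render_error` is a hypothetical helper, not the PoC linked above. It assumes `position.line` and `position.column` are 1-based, which the report format does not confirm, so the caret placement is only indicative.

```python
SAMPLE = {
    "status": 400,
    "error": {
        "type": "SemanticCheckException",
        "code": "FIELD_NOT_FOUND",
        "reason": "Invalid Query",
        "details": "Failed to resolve field 'foo'",
        "location": [
            "while planning the query",
            "while resolving fields in the index mapping",
        ],
        "context": {
            "position": {"line": 1, "column": 25},
            "query": "source=logs-* | fields foo",
        },
        "suggestion": "Did you mean: 'foobar'",
    },
}


def render_error(response: dict) -> str:
    """Render a structured error report as a few lines of plain text."""
    error = response["error"]
    out = [f"{error['reason']}: {error['details']}"]

    # Optional pieces are rendered only when present, so older or
    # sparser reports still produce something readable.
    context = error.get("context", {})
    query = context.get("query")
    position = context.get("position")
    if query and position:
        query_lines = query.splitlines()
        row = position["line"] - 1  # assumption: 1-based line numbers
        if 0 <= row < len(query_lines):
            out.append(query_lines[row])
            # assumption: 1-based column numbers
            out.append(" " * (position["column"] - 1) + "^")

    out.extend(f"  {where}" for where in error.get("location", []))
    if error.get("suggestion"):
        out.append(error["suggestion"])
    return "\n".join(out)


print(render_error(SAMPLE))
```

Treating every field except `reason` and `details` as optional keeps the renderer usable while the backend is still filling in context for individual error classes.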

On the frontend side, there's a separate meta issue:
- https://github.com/opensearch-project/OpenSearch-Dashboards/issues/11577 is the catch-all for integrating with the reporting interface in the frontend.
- Explain-analyze --> debug will be a call to action shown when a query either runs over some threshold (1000ms?) or returns 0 results.
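
The trigger for that call to action could be as small as the Python sketch below. The function name, the returned labels, and the choice to check zero results before slowness are assumptions for illustration, and the 1000 ms default is only the placeholder value floated above.

```python
from typing import Optional

# Placeholder: the threshold is still an open question ("1000ms?").
SLOW_QUERY_THRESHOLD_MS = 1000


def debug_call_to_action(
    elapsed_ms: float,
    result_count: int,
    threshold_ms: float = SLOW_QUERY_THRESHOLD_MS,
) -> Optional[str]:
    """Return which debug prompt to offer, or None if the query looks fine."""
    if result_count == 0:
        return "zero_results"
    if elapsed_ms > threshold_ms:
        return "slow_query"
    return None
```

Returning a label rather than a boolean lets the frontend word the prompt differently for "why 0 results?" and "why is this slow?".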

Once the core flows here are done, what's left is to start vetting specific error cases:
- #4771 
- #4872 
- #4869 
- #4896 
- #5065 
- Field not found (issue pending)
- Suggest syntax rewrites (#5262 is a special case but it'd be nice to do it in the general case)
- Zero-result & slow query explanations on the frontend
- Optionally: [anything else with the error-experience label](https://github.com/opensearch-project/sql/issues?q=sort%3Aupdated-desc%20is%3Aissue%20is%3Aopen%20label%3Aerror-experience). Feel free to suggest some!

