[Data] Custom error handling for failed rows in dataset processing

### Description

I would like to control what happens when a single row fails, e.g. continue on error vs. interrupt, or run custom post-processing for failed rows. Ideally the API would look something like:
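```python
from ray.data.llm import build_llm_processor

# `on_error` and `error_handler` are the parameters proposed in this issue,
# not an existing API. `processor_config`, `fn`, and `name` are defined elsewhere.
build_llm_processor(
    processor_config,
    preprocess=lambda row: dict(
        payload=dict(
            messages=[
                {
                    "role": "system",
                    "content": "foo",
                },
                {
                    "role": "user",
                    "content": fn(row),
                },
            ],
            model="bar",
            temperature=0.1,
        )
    ),
    # Drop the raw HTTP response and keep only the generated text.
    postprocess=lambda row: {
        **{key: value for key, value in row.items() if key != "http_response"},
        f"{name}_formatted_output": str(
            row["http_response"]["choices"][0]["message"]["content"]
        ),
    },
    on_error="continue",
    error_handler=error_handling_fn,
)
```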
where I define a custom `error_handling_fn(row, e)` to control how failures are handled.
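To make the requested semantics concrete, here is a minimal pure-Python sketch, independent of Ray, of how `on_error` and `error_handler` might behave around a per-row function. `with_error_policy` and `risky_step` are hypothetical names used only for illustration, and the `"raise"` value for the interrupt behavior and the default `error` flag are assumptions, not part of any existing API.

```python
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]


def with_error_policy(
    fn: Callable[[Row], Row],
    on_error: str = "raise",
    error_handler: Optional[Callable[[Row, Exception], Row]] = None,
) -> Callable[[Row], Row]:
    """Wrap a per-row function with the proposed failure policy (hypothetical helper)."""
    if on_error not in ("raise", "continue"):
        raise ValueError(f"unsupported on_error value: {on_error!r}")

    def wrapped(row: Row) -> Row:
        try:
            return fn(row)
        except Exception as e:
            if on_error == "raise":
                # Interrupt: propagate the failure, as happens today.
                raise
            if error_handler is not None:
                # Continue: let the user decide what the failed row becomes.
                return error_handler(row, e)
            # Assumed default when no handler is given: flag the row.
            return {**row, "error": True}

    return wrapped


def error_handling_fn(row: Row, e: Exception) -> Row:
    # Keep the original row and record why it failed.
    return {**row, "error": True, "error_message": str(e)}


def risky_step(row: Row) -> Row:
    # Stand-in for a per-row step that can fail on some inputs.
    if len(row["text"]) > 10:
        raise ValueError("prompt too large for context window")
    return {**row, "error": False}


rows = [{"text": "short"}, {"text": "x" * 50}]
safe_step = with_error_policy(
    risky_step, on_error="continue", error_handler=error_handling_fn
)
results = [safe_step(row) for row in rows]
```

With `on_error="continue"`, the second row is returned with `error` set and the exception message attached instead of aborting the loop; with `on_error="raise"`, the exception propagates unchanged.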

### Use case

I was using the Ray Data LLM APIs to analyze a decent-sized corpus of text data, and there were a few transient issues with specific rows of data, for instance a dynamically generated prompt too large for the LLM's context window. This caused the whole pipeline to fail several hours into processing. I would have liked to specify that processing should continue on error and flag the failed rows in my corpus with `row['error'] = True`.

[Data] Custom error handling for failed rows in dataset processing #52449
