Skip to content

Address Scale Items for Lambda Processor and Sink #5031

@srikanthjg

Description

@srikanthjg

Is your feature request related to a problem? Please describe.
It would be beneficial to have the ability to offload tasks asynchronously to AWS Lambda functions, especially when handling large volumes of data in Data Prepper. Currently, the synchronous Lambda invocation can limit concurrency and performance. Having an async client will allow Data Prepper to handle Lambda invocations concurrently, improving throughput and scalability.

Describe the solution you'd like
I propose adding support for

  1. AWS Lambda Async Client by default in Data Prepper's Lambda-related components. This will enable non-blocking Lambda invocations for more efficient handling of high throughput data streams. The LambdaAsyncClient from the AWS SDK will be integrated for all Lambda invocations, making the system more scalable.
  2. SDK defaults the connection timeout to 60secs. This means that if the lambda processing takes >60sec, the requests would fail causing all the records to drop. We should give this as a tunable parameter to the user.
  3. Address Acknowledgements for processor and sink. For Processor, we will need to handle the response cardinality:
    3.1. When a batch of N events that is configured by the user (N could be <=pipeline batch) ie, request to lambda contains N events in a batch , the lambda could return back N responses or M responses(N!=M). When N responses are sent as a json array, we resuse the original records and clear the old event data and populate it with the response from lambda, that way the acknowledgement set need not be changed.
    3.2 When M responses(N!=M), we create new events and populate them to the original acknowledgement set. The older events are also retained in the ack set but will be released by core later.
  4. Address failures at process the events, the events in the processor will be tagged and forwarded. This processor will NOT drop events on failure.
  5. Lambda sink will send to DLQ on failure and will acknowledge as true. If a dlq is not setup, we will send a negative acknowledgement.
  6. Add json codec for request and response.

Additional context
This enhancement allows for improved scalability, better error handling, and non-blocking invocations of Lambda functions, which is crucial for high-throughput systems.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

Status

Done

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions