[BUG] UTF-8 Character Encoding Issues in opensearchproject/data-prepper container #5238

**Bug Description**
The `opensearchproject/data-prepper` container image mishandles UTF-8 text when streaming data from DynamoDB to an S3 bucket in NDJSON format: non-ASCII characters are replaced with question marks (`?`) in the output files.

**Steps to Reproduce**
1. Set up data-prepper using the `opensearchproject/data-prepper` container image
2. Create a DynamoDB table with items containing strings with non-ASCII characters (e.g., Mandarin, Tamil)
3. Configure data-prepper to stream changes from the DynamoDB table to an S3 bucket using NDJSON format
4. Observe the resulting S3 objects

**Actual Behavior**
All non-ASCII characters in the original DynamoDB data are replaced with question marks (?) in the S3 output files.
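One plausible mechanism, offered as an assumption rather than a confirmed diagnosis of Data Prepper's code: Java replaces characters that a charset cannot represent with that charset's replacement bytes, and for US-ASCII the replacement is a single `?`. Under a container's default `POSIX`/`C` locale, JDKs before 18 typically resolve the platform default charset to US-ASCII, so any component that encodes text with the default charset would produce exactly this output. The class `AsciiSubstitutionDemo` and its sample string are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Data Prepper.
public class AsciiSubstitutionDemo {

    // Encodes text with the given charset, then decodes it with the same charset.
    // String.getBytes(Charset) replaces unmappable characters with the charset's
    // default replacement bytes, which for US-ASCII is a single '?' (byte 63).
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // The two Unicode escapes below spell Mandarin "ni hao"; escapes keep
        // this source file ASCII-only so it compiles the same under any locale.
        String text = "\u4F60\u597D";

        // Whatever the JVM picked up from the environment (locale, file.encoding).
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Print raw byte values so the console's own encoding cannot mask the result.
        System.out.println("US-ASCII bytes: " + Arrays.toString(text.getBytes(StandardCharsets.US_ASCII)));
        System.out.println("UTF-8 bytes:    " + Arrays.toString(text.getBytes(StandardCharsets.UTF_8)));

        System.out.println("US-ASCII round trip intact: " + text.equals(roundTrip(text, StandardCharsets.US_ASCII)));
        System.out.println("UTF-8 round trip intact:    " + text.equals(roundTrip(text, StandardCharsets.UTF_8)));
    }
}
```

With US-ASCII, each of the two characters collapses to byte 63 (`?`), matching the corrupted S3 objects; with UTF-8 the string survives the round trip unchanged.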

**Expected Behavior**
All UTF-8 characters, including non-ASCII characters, should be preserved in the output NDJSON files exactly as they appear in the source DynamoDB table.

**Workaround**
Setting the environment variable `LC_ALL=C.UTF-8` in the container configuration resolves the issue. The image should set this variable by default so that UTF-8 text is handled correctly without extra configuration.
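Setting `LC_ALL=C.UTF-8` fixes the environment. As a complementary safeguard, sketched under the assumption that the affected code path encodes records itself, a writer can pin the charset explicitly so the NDJSON bytes no longer depend on the container locale. `Utf8NdjsonWriter` and `toNdjson` are hypothetical names, not Data Prepper's S3 sink API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper; not part of Data Prepper.
public class Utf8NdjsonWriter {

    // Writes each pre-serialized JSON record on its own line (NDJSON), always
    // encoding as UTF-8 regardless of LC_ALL, LANG, or the JVM default charset.
    static byte[] toNdjson(List<String> jsonRecords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The charset is passed explicitly instead of relying on the platform default.
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            for (String record : jsonRecords) {
                writer.write(record);
                writer.write('\n'); // one JSON document per line
            }
        } catch (IOException e) {
            // Writing to an in-memory stream should not fail, but Writer declares it.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file ASCII-only: Mandarin "ni hao" and Tamil "tamil".
        List<String> records = List.of(
                "{\"id\":1,\"text\":\"\u4F60\u597D\"}",
                "{\"id\":2,\"text\":\"\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD\"}");

        byte[] ndjson = toNdjson(records);
        String decoded = new String(ndjson, StandardCharsets.UTF_8);

        System.out.println("bytes written: " + ndjson.length);
        System.out.println("round trip intact: " + decoded.equals(String.join("\n", records) + "\n"));
    }
}
```

Because the encoder is fixed to UTF-8, the bytes written are identical whether or not the container sets `LC_ALL`; the environment variable then matters only for components the application does not control.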
