
Set the JVM file encoding to UTF-8. #5420

Merged
dlvenable merged 1 commit into opensearch-project:main from dlvenable:5238-unicode on Feb 7, 2025


@dlvenable

Description

Sets the JVM default file encoding to UTF-8 by adding -Dfile.encoding=UTF-8 to the JVM arguments, so the JVM uses UTF-8 as its default charset instead of whatever the platform provides. I found that this fixes writing Unicode data to S3.
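A quick way to confirm the flag takes effect is to print the JVM's default charset. The class below is a hypothetical helper, not part of Data Prepper; it only uses standard `java.nio.charset` APIs. Running it as `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` should report UTF-8, while running it without the flag shows whatever default the environment supplies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper that reports the JVM's default charset.
 * Run it with and without -Dfile.encoding=UTF-8 to compare.
 */
public class DefaultCharsetCheck {

    /** Returns true when the JVM's default charset is UTF-8. */
    public static boolean isUtf8Default() {
        return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value passed via -Dfile.encoding, or the JVM's own value if none was passed.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        // The charset used by APIs such as String.getBytes() or new String(byte[])
        // when no charset argument is given.
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        System.out.println("UTF-8 default? = " + isUtf8Default());
    }
}
```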

Here is some data read back from S3 with this fix applied:

{"test":"😀!!ああ😀!!ああ","id":"abc11"}
{"test":"😀!!ああ😀!!ああ😀!!ああ","id":"😀3"}
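To illustrate why the default charset matters for records like these, the sketch below writes the first sample record through an `OutputStreamWriter` and reads it back with the same charset. `EncodingRoundTrip` is a hypothetical class, and US-ASCII is only an assumed stand-in for a non-UTF-8 default; the PR does not say which default the container actually had. `OutputStreamWriter` replaces characters the charset cannot represent, so the emoji and Japanese characters survive with UTF-8 but come back as `?` with US-ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: encode text with a charset, then decode it with the same charset. */
public class EncodingRoundTrip {

    public static String roundTrip(String text, Charset charset) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // Characters the charset cannot represent are replaced (with '?' for US-ASCII).
            try (Writer writer = new OutputStreamWriter(bytes, charset)) {
                writer.write(text);
            }
            StringBuilder result = new StringBuilder();
            try (Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(bytes.toByteArray()), charset)) {
                char[] buffer = new char[256];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    result.append(buffer, 0, n);
                }
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only: U+1F600 (emoji) and U+3042 (Hiragana A).
        String record = "{\"test\":\"\uD83D\uDE00!!\u3042\u3042\uD83D\uDE00!!\u3042\u3042\",\"id\":\"abc11\"}";
        System.out.println("UTF-8:    " + roundTrip(record, StandardCharsets.UTF_8));
        System.out.println("US-ASCII: " + roundTrip(record, StandardCharsets.US_ASCII));
    }
}
```

Passing an explicit charset, as above, avoids depending on the JVM default at all; setting -Dfile.encoding=UTF-8 instead fixes every call site that relies on the default.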

Issues Resolved

Resolves #5238.

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: David Venable <dlv@amazon.com>
dlvenable merged commit 815ddc0 into opensearch-project:main on Feb 7, 2025
dlvenable deleted the 5238-unicode branch on February 10, 2025 21:51
chenqi0805 pushed a commit to chenqi0805/data-prepper that referenced this pull request Apr 2, 2025
…opensearch-project#5420)

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: George Chen <qchea@amazon.com>


Development

Successfully merging this pull request may close these issues.

[BUG] UTF-8 Character Encoding Issues in opensearchproject/data-prepper container
