Remove uses of Charset name parsing by rjernst · Pull Request #85795 · elastic/elasticsearch

rjernst · 2022-04-11T21:33:35Z

There are many places in Elasticsearch which must decode some stream of
bytes into characters. Most of the time this is expected to be UTF-8
encoded data, and we hardcode that charset name. However, methods in the
JDK that take a String charset name require catching
UnsupportedEncodingException. Yet most of these APIs also have a variant
of the same method which takes a known Charset instance, for which we
can use StandardCharsets.UTF_8. This commit converts most instances of
passing string charset names to use a Charset instance.
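
As an illustration of the kind of change described above, here is a minimal sketch, not code taken from this PR. The class and method names (`CharsetDecodeExample`, `decodeWithName`, `decodeWithCharset`) are hypothetical; only the JDK `String` constructors and `StandardCharsets.UTF_8` are real APIs.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of the Elasticsearch code base.
public class CharsetDecodeExample {

    // Before: passing the charset as a String name forces the caller to
    // handle the checked UnsupportedEncodingException, even though UTF-8
    // is required to be available on every Java platform.
    public static String decodeWithName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    // After: the Charset overload declares no checked exception,
    // so the try/catch disappears.
    public static String decodeWithCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // Both variants decode the same bytes to the same string.
        System.out.println(decodeWithName(bytes).equals(decodeWithCharset(bytes)));
    }
}
```

Both methods return the same result for valid UTF-8 input; the difference is only that the second needs no exception handling and no name lookup.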


elasticmachine · 2022-04-11T21:33:39Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

pugnascotia

LGTM. Is there any scope for adding some forbidden APIs rules to avoid more instances being added?

rjernst · 2022-04-12T19:05:17Z

Unfortunately this doesn't quite get rid of all of the uses, because there are some older APIs that simply don't have methods taking a Charset. I will separately investigate whether we can forbid the String-name variants of the ones I can find that do take a Charset.


ChrisHegarty

LGTM.


rjernst added the :Core/Infra/Core, >refactoring, and v8.3.0 labels Apr 11, 2022

elasticmachine added the Team:Core/Infra label Apr 11, 2022

spotless

f92997f

pugnascotia approved these changes Apr 12, 2022


rjernst merged commit f0d0c37 into elastic:master Apr 12, 2022

rjernst deleted the utf8_charset branch April 12, 2022 19:05

ChrisHegarty reviewed Apr 13, 2022


rjernst merged 2 commits into elastic:master from rjernst:utf8_charset


4 participants
