Remove uses of Charset name parsing#85795
Merged
rjernst merged 2 commits intoelastic:masterfrom Apr 12, 2022
Merged
Conversation
There are many places in Elasticsearch which must decode some stream of bytes into characters. Most of the time this is expected to be UTF-8 encoded data, and we hardcode that charset name. However, methods in the JDK that take a String charset name require catching UnsupportedEncodingException. Yet most of these APIs also has a variant of the same methods which take a known Charset instance, for which we can use StandardCharsets.UTF_8. This commit converts most instances of passing string charset names to use a Charset instance.
Collaborator
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
pugnascotia
approved these changes
Apr 12, 2022
Contributor
pugnascotia
left a comment
There was a problem hiding this comment.
LGTM. Is there any scope for adding some forbidden APIs rules to avoid more instances being added?
Member
Author
|
Unfortunately this doesn't quite get rid of all of the uses because there are some older apis that simply don't have methods taking Charset. I will separately investigate if we can forbid the ones I can find that do take Charset. |
weizijun
added a commit
to weizijun/elasticsearch
that referenced
this pull request
Apr 13, 2022
* upstream/master: (40 commits) Fix BuildTests serialization (elastic#85827) Use urgent priority for node shutdown cluster state update (elastic#85838) Remove Task classes from HLRC (elastic#85835) Remove unused migration classes (elastic#85834) Remove uses of Charset name parsing (elastic#85795) Remove legacy versioned logic for DefaultSystemMemoryInfo (elastic#85761) Expose proxy settings for GCS repositories (elastic#85785) Remove SLM classes from HLRC (elastic#85825) TSDB: fix the time_series in order collect priority (elastic#85526) Remove ILM classes from HLRC (elastic#85822) FastVectorHighlighter should use ValueFetchers to load source data (elastic#85815) Iteratively execute synchronous ingest processors (elastic#84250) Remove TransformClient from HLRC (elastic#85787) Mute XPackRestIT deprecation/10_basic/Test Deprecations (elastic#85807) Unmute Lintian packaging test (elastic#85778) Add a highlighter unit test base class (elastic#85719) Remove NIO Transport Plugin (elastic#82085) [TEST] Remove token methods from HLRC SecurityClient (elastic#85515) [Test] Use thread-safe hashSet for result collection (elastic#85653) [TEST] Mute BuildTests.testSerialization (elastic#85801) ... # Conflicts: # server/src/test/java/org/elasticsearch/search/aggregations/timeseries/TimeSeriesIndexSearcherTests.java
weizijun
added a commit
to weizijun/elasticsearch
that referenced
this pull request
Apr 13, 2022
…n/elasticsearch into datastream-reuse-pipeline-source * 'datastream-reuse-pipeline-source' of github.com:weizijun/elasticsearch: (28 commits) Add JDK 19 to Java testing matrix [ML] add nlp config update serialization tests (elastic#85867) [ML] A text categorization aggregation that works like ML categorization (elastic#80867) [ML] Fix serialisation of text embedding updates (elastic#85863) TSDB: fix wrong initial value of tsidOrd in TimeSeriesIndexSearcher (elastic#85713) Enforce external id uniqueness during DesiredNode construction (elastic#84227) Fix Intellij integration (elastic#85866) Upgrade Azure SDK to version 12.14.4 (elastic#83884) [discovery-gce] Fix initialisation of transport in FIPS mode (elastic#85817) Remove unnecessary docs/changelog/85534.yaml Prevent ThreadContext header leak when sending response (elastic#68649) Add support for impact_areas to health impacts (elastic#85830) Reduce port range re-use in tests (elastic#85777) Fix TranslogTests#testStats (elastic#85828) Remove hppc from cat allocation api (elastic#85842) Fix BuildTests serialization (elastic#85827) Use urgent priority for node shutdown cluster state update (elastic#85838) Remove Task classes from HLRC (elastic#85835) Remove unused migration classes (elastic#85834) Remove uses of Charset name parsing (elastic#85795) ...
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
There are many places in Elasticsearch which must decode some stream of
bytes into characters. Most of the time this is expected to be UTF-8
encoded data, and we hardcode that charset name. However, methods in the
JDK that take a String charset name require catching
UnsupportedEncodingException. Yet most of these APIs also has a variant
of the same methods which take a known Charset instance, for which we
can use StandardCharsets.UTF_8. This commit converts most instances of
passing string charset names to use a Charset instance.