Store Template's mappings as bytes for disk serialization#78746
Store Template's mappings as bytes for disk serialization#78746elasticsearchmachine merged 23 commits intoelastic:masterfrom
Conversation
original-brownbear
left a comment
There was a problem hiding this comment.
I think we should do this more like we do for mappings (maybe even reuse some code from there) to keep things simpler. No need to do any hacks around reading base64 from JSON.
| if (uncompressedMapping.size() > 0) { | ||
| builder.field(MAPPINGS.getPreferredName()); | ||
| builder.map(reduceMapping(uncompressedMapping)); | ||
| if (Metadata.CONTEXT_MODE_GATEWAY.equals(params.param(Metadata.CONTEXT_MODE_PARAM, Metadata.CONTEXT_MODE_API))) { |
There was a problem hiding this comment.
It would be nice if this followed the exact same conventions we use for mappings, respecting the binary parameter and serializing as bytes for everything but the API.
There was a problem hiding this comment.
I've added binary parameter and changed usage to similar to IndexMetada
| Object compressed = values.get("compressed"); | ||
| if (compressed == null) { | ||
| return new CompressedXContent(Strings.toString(XContentFactory.jsonBuilder().map(values))); | ||
| } else if (compressed instanceof String) { |
There was a problem hiding this comment.
No need for this if we do things the same way we do them for mappings.
There was a problem hiding this comment.
This handles situation when you use binary parameter with JsonXContent - it stores byte arrays as BASE64 encoded strings. Without it ToAndFromJsonMetadataTests#testSimpleJsonFromAndTo fails
| } else { | ||
| return path + ": first element is null, the second element is not null"; | ||
| } | ||
| } else if (first instanceof byte[]) { |
There was a problem hiding this comment.
Probably also not necessary to extend things this way if we just do what we do for mappings and have a special binary param path here I think.
There was a problem hiding this comment.
This is required for ESIntegTestCase#ensureClusterStateCanBeReadByNodeTool to pass - this test uses binary serialization and this is the first instance where we have to compare byte[] in it.
There was a problem hiding this comment.
Why don't we have that for mappings already, they should be binary serialized as well and appear to run through the exact same code?
There was a problem hiding this comment.
There was different path with parsing IndexMetadata->bytes->IndexMetada and ComposableIndexMetadata->bytes->UnknownMetadataCustom when using ElasticsearchNodeCommand#namedXContentRegistry so results differed when using binary serialization. I changed ElasticsearchNodeCommand to parse ComposableIndexMetadata so the code is no longer needed.
|
@elasticmachine update branch |
|
@elasticmachine run elasticsearch-ci/part-2 |
|
@original-brownbear I simplified code and made it similar to |
original-brownbear
left a comment
There was a problem hiding this comment.
Still not sure why we have to do anything different than we do with mappings here in the tests and in part of production code. Can you explain what's different here?
| new CompressedXContent(Strings.toString(XContentFactory.jsonBuilder().map(p.mapOrdered()))), MAPPINGS); | ||
| PARSER.declareField(ConstructingObjectParser.optionalConstructorArg(), (p, c) -> { | ||
| XContentParser.Token token = p.currentToken(); | ||
| if (token == XContentParser.Token.VALUE_STRING) { |
There was a problem hiding this comment.
Where do we encode the template like this? This seems broken?
We don't seem to run into a base64 JSON string anywhere for mappings so we shouldn't for templates?
There was a problem hiding this comment.
This shows up with combination of JsonXContent and binary parameter. I think we have that only in ToAndFromJsonMetadataTests#testSimpleJsonFromAndTo. Thing is we don't serialize IndexMetadata there because Metadata.toXContent will filter them out if context is not XContentContext.API so we don't have to serialize mappings for them as well. Test like below would fail exactly because we don't handle VALUE_STRING case in IndexMetadata, we just don't test it. It would work with SmileXContent.
IndexMetadata.Builder indexMetadataBuilder = IndexMetadata.builder("test12")
.settings(settings(Version.CURRENT))
.numberOfShards(1)
.numberOfReplicas(0)
.putMapping("{\"mapping1\":{\"text1\":{\"type\":\"string\"}}}");
Map<String, String> params = new HashMap<>(2);
params.put("binary", "true");
params.put(Metadata.CONTEXT_MODE_PARAM, Metadata.CONTEXT_MODE_GATEWAY);
XContentBuilder builder = JsonXContent.contentBuilder();
builder.startObject();
indexMetadataBuilder.build().toXContent(builder, new ToXContent.MapParams(params));
builder.endObject();
IndexMetadata indexMetadata = IndexMetadata.fromXContent(createParser(builder));
assertNotNull(indexMetadata.mapping());| } else { | ||
| return path + ": first element is null, the second element is not null"; | ||
| } | ||
| } else if (first instanceof byte[]) { |
There was a problem hiding this comment.
Why don't we have that for mappings already, they should be binary serialized as well and appear to run through the exact same code?
|
@elasticmachine update branch |
|
@elasticmachine update branch |
|
@elasticmachine update branch |
|
@elasticmachine update branch |
|
@elasticmachine update branch |
💔 Backport failed
You can use sqren/backport to manually backport by running |
) This change the way we store mappings in `Template` during serialization to disk - instead storing it as map we use byte array that we already have. This avoids deserialization-serialization cycle during storing cluster state on disk. # Conflicts: # server/src/main/java/org/elasticsearch/cluster/coordination/ElasticsearchNodeCommand.java
…79522) This change the way we store mappings in `Template` during serialization to disk - instead storing it as map we use byte array that we already have. This avoids deserialization-serialization cycle during storing cluster state on disk. # Conflicts: # server/src/main/java/org/elasticsearch/cluster/coordination/ElasticsearchNodeCommand.java
* upstream/master: (24 commits) Implement framework for migrating system indices (elastic#78951) Improve transient settings deprecation message (elastic#79504) Remove getValue and getValues from Field (elastic#79516) Store Template's mappings as bytes for disk serialization (elastic#78746) [ML] Add queue_capacity setting to start deployment API (elastic#79433) [ML] muting rest compat test issue elastic#79518 (elastic#79519) Avoid redundant available indices check (elastic#76540) Re-enable BWC tests TEST Ensure password 14 chars length on Kerberos FIPS tests (elastic#79496) [DOCS] Temporarily remove APM links (elastic#79411) Fix CCSDuelIT for skipped shards (elastic#79490) Add other time accounting in HotThreads (elastic#79392) Add deprecation info API entries for deprecated monitoring settings (elastic#78799) Add note in breaking changes for nameid_format (elastic#77785) Use 'migration' instead of 'upgrade' in GET system feature migration status responses (elastic#79302) Upgrade lucene version 8b68bf60c98 (elastic#79461) Use Strings#EMPTY_ARRAY (elastic#79452) Quicker shared cache file preallocation (elastic#79447) [ML] Removing some code that's obsolete for 8.0 (elastic#79444) Ensure indexing_data CCR requests are compressed (elastic#79413) ...
|
Pinging @elastic/es-data-management (Team:Data Management) |
This change the way we store mappings in
Templateduring serialization to disk - instead storing it as map we use byte array that we already have. This avoids deserialization-serialization cycle during storing cluster state on disk.