Support synthetic source together with ignore_malformed in histogram fields #109882

Merged
lkts merged 4 commits into elastic:main from lkts:feature/histogram_synthetic_source_ignore_malformed on Jun 20, 2024

@lkts (Contributor) commented Jun 18, 2024

Contributes to #106483.

@lkts added the :StorageEngine/Mapping label (the storage-related side of mappings) on Jun 18, 2024

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine (Collaborator)

Hi @lkts, I've created a changelog YAML for you.

* Typical use case is to gather field values from doc_values and append malformed values
* stored in a different field in case of ignore_malformed being enabled.
*/
public class CompositeSyntheticFieldLoader implements SourceLoader.SyntheticFieldLoader {
Contributor Author

I noticed after implementing this that this is very close to what ObjectMapper.SyntheticSourceFieldLoader does. Maybe we can unify some code later.

This is also an alternative to the current implementation of, for example, SortedNumericDocValuesSyntheticFieldLoader, where malformed-value handling is implemented explicitly. That logic is repeated in multiple loaders that handle different doc values types. I didn't refactor that in this PR but wanted to gather some thoughts.

Member

+1 I think in a followup we can explore how to have a common base class for this class and ObjectMapper.SyntheticSourceFieldLoader.
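
The layering idea discussed in this thread can be illustrated with a small sketch. It is not the PR's code: `Layer` and `CompositeLoaderSketch` are hypothetical names, values are plain JSON strings rather than XContentBuilder calls, and the rule for choosing between a single value and an array is an assumption based on the review discussion.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one source of values (doc values, stored malformed values, ...). */
interface Layer {
    /** Values this layer holds for the current document, already rendered as JSON fragments. */
    List<String> values();
}

/** Simplified composite: asks every layer for its values and writes them under one field name. */
final class CompositeLoaderSketch {
    private final String fieldName;
    private final List<Layer> layers;

    CompositeLoaderSketch(String fieldName, List<Layer> layers) {
        this.fieldName = fieldName;
        this.layers = layers;
    }

    /** Returns the JSON fragment for the field, or an empty string when no layer has a value. */
    String write() {
        List<String> all = new ArrayList<>();
        for (Layer layer : layers) {
            all.addAll(layer.values());
        }
        if (all.isEmpty()) {
            return "";
        }
        String key = "\"" + fieldName + "\":";
        if (all.size() == 1) {
            // Assumed rule: a single value is written without an array wrapper.
            return key + all.get(0);
        }
        // Several values (for example a histogram plus malformed values) become an array.
        return key + "[" + String.join(",", all) + "]";
    }
}
```

With this shape, each loader only has to supply its own values, and the handling of malformed values lives in one place instead of being repeated per doc values type.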

@lkts (Contributor Author) commented Jun 18, 2024

@elasticmachine update branch

@martijnvg (Member) left a comment

LGTM

private List<Object> values;

public MalformedValuesLayer(String fieldName) {
    this.fieldName = fieldName + "._ignore_malformed";
Member

"._ignore_malformed" should be a const somewhere.
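
One way to act on this suggestion is sketched here. The class name `MalformedFieldNames`, the constant name, and the helper method are hypothetical; the review does not say where the constant should live.

```java
/** Hypothetical holder for the suffix; the real class and constant names are not given in the review. */
final class MalformedFieldNames {
    /** Suffix appended to a field name to get the field that stores its malformed values. */
    static final String IGNORE_MALFORMED_SUFFIX = "._ignore_malformed";

    private MalformedFieldNames() {}

    /** Name of the companion field that stores malformed values for {@code fieldName}. */
    static String malformedFieldFor(String fieldName) {
        return fieldName + IGNORE_MALFORMED_SUFFIX;
    }
}
```

A constructor like the one under review could then call `MalformedFieldNames.malformedFieldFor(fieldName)` instead of concatenating the literal.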

if (v instanceof BytesRef r) {
    XContentDataHelper.decodeAndWrite(b, r);
} else {
    b.value(v);
Member

What's the use case for this one? I thought malformed values are always encoded.

Contributor Author

If the value is, for example, text, some fields skip encoding. This branch is for compatibility with existing code.
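
The two branches can be pictured with a simplified sketch. Everything in it is a stand-in: `byte[]` plays the role of `BytesRef`, and the "encoding" is assumed to be a UTF-8 JSON fragment purely for illustration. The real format handled by `XContentDataHelper` is not described in this thread.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of writing a stored malformed value that may or may not be encoded. */
final class MalformedValueWriterSketch {
    private MalformedValueWriterSketch() {}

    /** Renders one stored malformed value as a JSON fragment. */
    static String render(Object v) {
        if (v instanceof byte[]) {
            // Encoded path: stand-in for XContentDataHelper.decodeAndWrite(b, r).
            // Assumption for this sketch only: the bytes are a UTF-8 JSON fragment.
            return new String((byte[]) v, StandardCharsets.UTF_8);
        }
        if (v instanceof String) {
            // Unencoded path: stand-in for b.value(v) on a plain string stored as-is.
            // Quote escaping is omitted to keep the sketch short.
            return "\"" + v + "\"";
        }
        // Other unencoded values (numbers, booleans) are written using their string form.
        return String.valueOf(v);
    }
}
```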

if (binaryValue == null) {
    return;
}
b.startObject();
Member

Why was this changed from b.startObject(simpleName()); ?

@lkts (Contributor Author), Jun 20, 2024

This is because the composite loader writes the field name now. Malformed values may also be present, in which case the field is no longer a single object but an array that contains an object.

id: 2
- match:
_source:
latency: [{"values": [2.0], "counts": [2]}, {"values": [1.0], "counts": [1], "hello": "world"}, 123, 456, "fox"]
Member

We lose the fact that [123, 456] came in as a pair. Not a big deal, but I wonder if there's an easy way to preserve the array.

Contributor Author

This is intentional; this is how it works everywhere.
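
The behavior described in this exchange, where nested array boundaries disappear and only leaf values remain, can be sketched as a depth-first flatten. `FlattenSketch` is a hypothetical name, and the input shape is an assumption: the indexed document is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of how nested arrays of values collapse into one flat list. */
final class FlattenSketch {
    private FlattenSketch() {}

    /** Collects leaf values depth-first, dropping any nested array boundaries. */
    static List<Object> flatten(List<?> input) {
        List<Object> out = new ArrayList<>();
        for (Object o : input) {
            if (o instanceof List) {
                // A nested array contributes its elements, not itself.
                out.addAll(flatten((List<?>) o));
            } else {
                out.add(o);
            }
        }
        return out;
    }
}
```

Under that assumption, an input such as `[[123, 456], "fox"]` comes back as three separate values, which matches the flat tail of the expected `latency` array in the test.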

@kkrik-es (Member) left a comment

Nice, just a few minor ones.

@lkts merged commit 8bc5ecd into elastic:main on Jun 20, 2024
@lkts deleted the feature/histogram_synthetic_source_ignore_malformed branch on June 20, 2024 16:09
@felixbarny mentioned this pull request on Aug 6, 2024