[Ingest] Have processors operate one field at time.#15133
[Ingest] Have processors operate one field at time.#15133talevy merged 1 commit intoelastic:feature/ingestfrom
Conversation
There was a problem hiding this comment.
say which processor in the error message?
There was a problem hiding this comment.
this message might get confusing if people don't specify any fields, in which case we are going to throw index out of bounds later. btw we should add a test for that.
|
I left some comments, I like the simplification. Tests need to be adapted though, they won't compile at the moment, thus there are some missing bits in prod code too (e.g. missing pkg private getters). |
|
@javanna thanks for the comments! I hoped you wouldn't do a full review just yet :) but appreciated! Wanted to first put it out there to share with you to see if you are a fan of the simplification proposed... looks like you are! I'll go ahead and complete the PR |
|
thanks @talevy btw don't forget to update the docs as well ;) |
eafaa1c to
eb64b7c
Compare
is failing, still figuring out why. I may not be understanding the intention here |
eb64b7c to
62af8d3
Compare
|
@talevy that test was added because of the deep copy change. Processor settings could be modified by a document running through the pipeline. For example if one a field in the set processor holds a map, a remove processor could alter that map. This tests verifies that this doesn't happen. |
docs/plugins/ingest.asciidoc
Outdated
There was a problem hiding this comment.
at this point I wonder if we should always require a field and a pattern ( or separator) for these processors too? key:value might be more verbose but clearer. it would make things also more consistent with other processors.
There was a problem hiding this comment.
so,
"join" : {
"field" : "joined_array_field",
"separator" : "-"
}
I think you are correct. it will be more consistent. then it also gets rid of the whole "multiple fields" provided issue. where there is a concrete set of keys to be expected
There was a problem hiding this comment.
@kimchy @martijnvg @clintongormley do you have any preferences surrounding this?
Whether these single field key:value configurations should move to a more explicit configurations with two fields, one representing the field, and one representing the value. In join's case, that value is the separator.
There was a problem hiding this comment.
@martijnvg haha, everything is a key-value and I have two examples. which one do you prefer? the existing solution?
or this example?
"join" : {
"field" : "joined_array_field",
"separator" : "-"
}
There was a problem hiding this comment.
and this of course brings the question of consistency with all other processors as well. Are we explicit there or not. I see that with Gsub we are explicit... so now we have inconsistency across the different processors. I vote to always be explicit (so always have the "field" explicitly set)
There was a problem hiding this comment.
++
I'll update the syntax for the ones that apply:
- convert
- join
- rename
- set
- split
There was a problem hiding this comment.
updated to reflect the new spec
There was a problem hiding this comment.
seems like the join example needs updating
There was a problem hiding this comment.
@talevy this example seems not aligned still: "joined_array_field": "-"
62af8d3 to
5fbf25a
Compare
|
@martijnvg thanks for the fix! tests pass now.
EDIT: resolved. |
ac3a7f9 to
5508719
Compare
There was a problem hiding this comment.
when we have field: null containsKey will return true, but the method will return null as remove returns null, which was the actual value. I think that null shouldn't be returned and we should throw an exception instead, like we do in other methods. Just call remove and check the returned value, without containsKey.
There was a problem hiding this comment.
for the sake of separated error messages, I will treat contains property vs. null property value separately
There was a problem hiding this comment.
I am also not entirely convinced this is correct. I was expecting a user to wish to set a field to be null.
There was a problem hiding this comment.
yes, in a document. but this configuration, i think we can simplify and consider null and missing values the same way.
There was a problem hiding this comment.
In all other methods in this class we don't call containsKey. I would stay aligned with those here.
There was a problem hiding this comment.
@talevy I think you have missed my last comment ^ . We can simplify things like we do in other methods of the same class here.
To simplify both error handling and specialization of each processor definition, we can operate on one field only. If one wishes to operate on a list of fields, further processors can be defined and repeated within a pipeline.