Skip to content

[Ingest] Have processors operate one field at time.#15133

Merged
talevy merged 1 commit intoelastic:feature/ingestfrom
talevy:the_one_true_field
Dec 7, 2015
Merged

[Ingest] Have processors operate one field at time.#15133
talevy merged 1 commit intoelastic:feature/ingestfrom
talevy:the_one_true_field

Conversation

@talevy
Copy link
Copy Markdown
Contributor

@talevy talevy commented Dec 1, 2015

To simplify both error handling and specialization of each processor definition, we can operate on one field only. If one wishes to operate on a list of fields, further processors can be defined and repeated within a pipeline.

@talevy talevy added the :Distributed/Ingest Node Execution or management of Ingest Pipelines label Dec 1, 2015
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say which processor in the error message?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this message might get confusing if people don't specify any fields, in which case we are going to throw index out of bounds later. btw we should add a test for that.

@javanna
Copy link
Copy Markdown
Contributor

javanna commented Dec 1, 2015

I left some comments, I like the simplification. Tests need to be adapted though, they won't compile at the moment, thus there are some missing bits in prod code too (e.g. missing pkg private getters).

@talevy
Copy link
Copy Markdown
Contributor Author

talevy commented Dec 1, 2015

@javanna thanks for the comments! I hoped you wouldn't do a full review just yet :) but appreciated! Wanted to first put it out there to share with you to see if you are a fan of the simplification proposed... looks like you are!

I'll go ahead and complete the PR

@javanna
Copy link
Copy Markdown
Contributor

javanna commented Dec 1, 2015

thanks @talevy btw don't forget to update the docs as well ;)

@talevy talevy force-pushed the the_one_true_field branch from eafaa1c to eb64b7c Compare December 1, 2015 23:37
@talevy talevy changed the title [Ingest] [WIP] Have processors operate one field at time. [Ingest] Have processors operate one field at time. Dec 1, 2015
@talevy
Copy link
Copy Markdown
Contributor Author

talevy commented Dec 1, 2015

gradle test -Dtests.class=org.elasticsearch.ingest.PipelineTests -Dtests.method="testProcessorSettingsRemainUntouched"

is failing, still figuring out why. I may not be understanding the intention here

@talevy talevy added the review label Dec 2, 2015
@talevy talevy force-pushed the the_one_true_field branch from eb64b7c to 62af8d3 Compare December 2, 2015 00:30
@martijnvg
Copy link
Copy Markdown
Member

@talevy that test was added because of the deep copy change. Processor settings could be modified by a document running through the pipeline. For example if one a field in the set processor holds a map, a remove processor could alter that map. This tests verifies that this doesn't happen.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point I wonder if we should always require a field and a pattern ( or separator) for these processors too? key:value might be more verbose but clearer. it would make things also more consistent with other processors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so,

"join" : {
  "field" : "joined_array_field",
  "separator" : "-"
}

I think you are correct. it will be more consistent. then it also gets rid of the whole "multiple fields" provided issue. where there is a concrete set of keys to be expected

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimchy @martijnvg @clintongormley do you have any preferences surrounding this?

Whether these single field key:value configurations should move to a more explicit configurations with two fields, one representing the field, and one representing the value. In join's case, that value is the separator.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talevy @javanna I personally prefer the key value approach (like in @talevy example) too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg haha, everything is a key-value and I have two examples. which one do you prefer? the existing solution?

or this example?

"join" : {
  "field" : "joined_array_field",
  "separator" : "-"
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this of course brings the question of consistency with all other processors as well. Are we explicit there or not. I see that with Gsub we are explicit... so now we have inconsistency across the different processors. I vote to always be explicit (so always have the "field" explicitly set)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

I'll update the syntax for the ones that apply:

  • convert
  • join
  • rename
  • set
  • split

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to reflect the new spec

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the join example needs updating

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talevy this example seems not aligned still: "joined_array_field": "-"

@talevy talevy force-pushed the the_one_true_field branch from 62af8d3 to 5fbf25a Compare December 3, 2015 16:20
@talevy
Copy link
Copy Markdown
Contributor Author

talevy commented Dec 3, 2015

@martijnvg thanks for the fix! tests pass now.

just need to resolve this open design question: #15133 (comment)

EDIT: resolved.

@talevy talevy force-pushed the the_one_true_field branch 2 times, most recently from ac3a7f9 to 5508719 Compare December 3, 2015 20:20
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we have field: null containsKey will return true, but the method will return null as remove returns null, which was the actual value. I think that null shouldn't be returned and we should throw an exception instead, like we do in other methods. Just call remove and check the returned value, without containsKey.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, you're right

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the sake of separated error messages, I will treat contains property vs. null property value separately

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not entirely convinced this is correct. I was expecting a user to wish to set a field to be null.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in a document. but this configuration, i think we can simplify and consider null and missing values the same way.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all other methods in this class we don't call containsKey. I would stay aligned with those here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talevy I think you have missed my last comment ^ . We can simplify things like we do in other methods of the same class here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Ingest Node Execution or management of Ingest Pipelines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants