Skip to content

[ML] Job and datafeed mappings with index template#32594

Closed
davidkyle wants to merge 4 commits intoelastic:feature/jindexfrom
davidkyle:job-mapping
Closed

[ML] Job and datafeed mappings with index template#32594
davidkyle wants to merge 4 commits intoelastic:feature/jindexfrom
davidkyle:job-mapping

Conversation

@davidkyle
Copy link
Copy Markdown
Member

Mappings for job and datafeed configurations and the index template.
For moving the configuration classes out of the clusterstate

@davidkyle davidkyle added >feature review :ml Machine learning labels Aug 2, 2018
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/ml-core

Copy link
Copy Markdown

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments and questions.

* is stored
* @return The index name
*/
public static String jobConfigIndexName() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just configIndexName(), as it will also store datafeed configs?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

.field(TYPE, KEYWORD)
.endObject()
.startObject(Detector.CUSTOM_RULES_FIELD.getPreferredName())
.field(TYPE, NESTED)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the nested document functionality here? If not then OBJECT would be lighter-weight. (Maybe at this level NESTED is useful - I'm not sure.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom rules are an array which is why I used nested

.field(ENABLED, false)
.endObject()
.startObject(DetectionRule.CONDITIONS_FIELD.getPreferredName())
.field(TYPE, NESTED)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this level NESTED is definitely overkill and should be OBJECT.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again conditions are an array. It's arguable that the extensive mappings are overkill anyway, not all fields need to be searchable and these mappings come with a maintenance burden. Is it conceivable that someone will want to search for a job where RuleCondition.AppliesTo == ACTUAL && RuleCondition.Operator == GT

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there will be relatively few documents in the index, so we can get away with high-cost mappings. We have NESTED in our results index where it is much more costly due to the higher document volume and it's never been found to be the root cause of any problem, so it's probably fine.

.endObject()
.endObject()
.startObject(Detector.DETECTOR_INDEX.getPreferredName())
.field(TYPE, LONG)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is INTEGER in other mappings.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

public static boolean isValidFieldName(String fieldName) {
String[] segments = DOT_PATTERN.split(fieldName);
return !RESERVED_FIELD_NAMES.contains(segments[0]);
return RESERVED_RESULT_FIELD_NAMES.contains(segments[0]) == false && RESERVED_CONFIG_FIELD_NAMES.contains(segments[0]) == false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary for the config field names to be included here? It will prevent fields being added to results documents just because they clash with a config field name. But that's not necessary because the config and results are in different indices. Is there another reason?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

.startObject(Job.RESULTS_INDEX_NAME.getPreferredName())
.field(TYPE, KEYWORD)
.endObject()
.startObject(Job.DELETED.getPreferredName()) // TODO can this be removed?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question. At the moment this is what other APIs use to determine that a job is in the process of being deleted. Maybe storing that flag in an index won't be sufficiently atomic in the future. An alternative might be to make the job deletion process a persistent task, and use the existence of that persistent task to determine whether a job is being deleted. This field is probably just one of several places where indices won't give the same ordering guarantee that cluster state gave us.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the loose eventually consistent guarantees this field is useless, we have considered other ways to mark the fact that a job is in the process of deletion I'd like to remove this

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the DELETED field from the mapping

Copy link
Copy Markdown

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>feature :ml Machine learning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants