[ML] Job and datafeed mappings with index template by davidkyle · Pull Request #32594 · elastic/elasticsearch

davidkyle · 2018-08-02T17:41:29Z

Mappings for job and datafeed configurations, and the index template.
These are needed for moving the configuration classes out of the cluster state.

elasticmachine · 2018-08-02T17:41:31Z

Pinging @elastic/ml-core

droberts195

I left a few comments and questions.

droberts195 · 2018-08-07T09:14:52Z

...ore/src/main/java/org/elasticsearch/xpack/core/ml/job/persistence/AnomalyDetectorsIndex.java

+     * is stored
+     * @return The index name
+     */
+    public static String jobConfigIndexName() {


Maybe just configIndexName(), as it will also store datafeed configs?

droberts195 · 2018-08-07T09:18:11Z

...ore/src/main/java/org/elasticsearch/xpack/core/ml/job/persistence/ElasticsearchMappings.java

+                            .field(TYPE, KEYWORD)
+                        .endObject()
+                        .startObject(Detector.CUSTOM_RULES_FIELD.getPreferredName())
+                            .field(TYPE, NESTED)


Do we need the nested document functionality here? If not then OBJECT would be lighter-weight. (Maybe at this level NESTED is useful - I'm not sure.)

Custom rules are an array, which is why I used NESTED.

droberts195 · 2018-08-07T09:18:36Z

...ore/src/main/java/org/elasticsearch/xpack/core/ml/job/persistence/ElasticsearchMappings.java

+                                    .field(ENABLED, false)
+                                .endObject()
+                                .startObject(DetectionRule.CONDITIONS_FIELD.getPreferredName())
+                                    .field(TYPE, NESTED)


I think at this level NESTED is definitely overkill and should be OBJECT.

Again, conditions are an array. Arguably the extensive mappings are overkill anyway: not all fields need to be searchable, and these mappings come with a maintenance burden. Is it conceivable that someone will want to search for a job where RuleCondition.AppliesTo == ACTUAL && RuleCondition.Operator == GT?

I guess there will be relatively few documents in the index, so we can get away with high-cost mappings. We have NESTED in our results index where it is much more costly due to the higher document volume and it's never been found to be the root cause of any problem, so it's probably fine.
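
For readers weighing NESTED against OBJECT in this thread: in Elasticsearch an array of objects under an object mapping is flattened into multi-valued fields, so the link between values inside one array element is lost, while a nested mapping indexes each element separately. The sketch below only simulates that difference in plain Java and does not call Elasticsearch. The `Condition` class, the method names, and the field names `conditions.applies_to` and `conditions.operator` are illustrative assumptions, not code from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of object vs nested matching; no Elasticsearch calls.
class NestedVsObjectSketch {

    // Hypothetical stand-in for one rule condition.
    static final class Condition {
        final String appliesTo;
        final String operator;

        Condition(String appliesTo, String operator) {
            this.appliesTo = appliesTo;
            this.operator = operator;
        }
    }

    // Object semantics: the array is flattened into one multi-valued
    // field per leaf name, so element boundaries disappear.
    static Map<String, List<String>> flatten(List<Condition> conditions) {
        Map<String, List<String>> fields = new HashMap<>();
        for (Condition c : conditions) {
            fields.computeIfAbsent("conditions.applies_to", k -> new ArrayList<>()).add(c.appliesTo);
            fields.computeIfAbsent("conditions.operator", k -> new ArrayList<>()).add(c.operator);
        }
        return fields;
    }

    // Matches if both values occur anywhere in the flattened fields.
    static boolean objectMatch(List<Condition> conditions, String appliesTo, String operator) {
        Map<String, List<String>> fields = flatten(conditions);
        return fields.getOrDefault("conditions.applies_to", List.of()).contains(appliesTo)
            && fields.getOrDefault("conditions.operator", List.of()).contains(operator);
    }

    // Nested semantics: both values must occur in the same array element.
    static boolean nestedMatch(List<Condition> conditions, String appliesTo, String operator) {
        for (Condition c : conditions) {
            if (c.appliesTo.equals(appliesTo) && c.operator.equals(operator)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Condition> conditions = List.of(
            new Condition("actual", "lt"),
            new Condition("typical", "gt"));
        // No single condition is (actual, gt), yet the flattened form matches.
        System.out.println("object match: " + objectMatch(conditions, "actual", "gt"));
        System.out.println("nested match: " + nestedMatch(conditions, "actual", "gt"));
    }
}
```

Under these assumptions the query from the comment above (AppliesTo == ACTUAL && Operator == GT) would only be answered precisely with nested semantics. If nobody needs that query, arrays still index fine under OBJECT.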

droberts195 · 2018-08-07T09:21:47Z

...ore/src/main/java/org/elasticsearch/xpack/core/ml/job/persistence/ElasticsearchMappings.java

+                            .endObject()
+                        .endObject()
+                        .startObject(Detector.DETECTOR_INDEX.getPreferredName())
+                            .field(TYPE, LONG)


This is INTEGER in other mappings.

droberts195 · 2018-08-07T09:25:16Z

...lugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/results/ReservedFieldNames.java

    public static boolean isValidFieldName(String fieldName) {
        String[] segments = DOT_PATTERN.split(fieldName);
-        return !RESERVED_FIELD_NAMES.contains(segments[0]);
+        return RESERVED_RESULT_FIELD_NAMES.contains(segments[0]) == false && RESERVED_CONFIG_FIELD_NAMES.contains(segments[0]) == false;


Is it necessary for the config field names to be included here? It will prevent fields being added to results documents just because they clash with a config field name. But that's not necessary because the config and results are in different indices. Is there another reason?
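
To make the concern concrete, here is a minimal sketch comparing the combined check from the diff with a results-only check. The set contents and the method name `isValidResultFieldName` are hypothetical and chosen only for illustration; the real reserved-name sets live in ReservedFieldNames.java and are not reproduced here.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Sketch only: the set contents below are illustrative assumptions.
class ReservedFieldNamesSketch {

    private static final Pattern DOT_PATTERN = Pattern.compile("\\.");

    // Hypothetical examples of reserved names, not the real lists.
    static final Set<String> RESERVED_RESULT_FIELD_NAMES = Set.of("bucket_span", "record_score");
    static final Set<String> RESERVED_CONFIG_FIELD_NAMES = Set.of("analysis_config", "datafeed_id");

    // Combined check, as in the diff: the first path segment must not
    // clash with either a result name or a config name.
    static boolean isValidFieldName(String fieldName) {
        String[] segments = DOT_PATTERN.split(fieldName);
        return RESERVED_RESULT_FIELD_NAMES.contains(segments[0]) == false
            && RESERVED_CONFIG_FIELD_NAMES.contains(segments[0]) == false;
    }

    // Hypothetical alternative: only result names are reserved, because
    // config documents are stored in a different index.
    static boolean isValidResultFieldName(String fieldName) {
        String[] segments = DOT_PATTERN.split(fieldName);
        return RESERVED_RESULT_FIELD_NAMES.contains(segments[0]) == false;
    }

    public static void main(String[] args) {
        // A field that clashes only with a config name.
        String field = "datafeed_id.custom";
        System.out.println("combined check:     " + isValidFieldName(field));
        System.out.println("results-only check: " + isValidResultFieldName(field));
    }
}
```

With these made-up sets, a results field whose first segment happens to equal a config field name is rejected by the combined check but accepted by the results-only check.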

droberts195 · 2018-08-07T09:31:03Z

...ore/src/main/java/org/elasticsearch/xpack/core/ml/job/persistence/ElasticsearchMappings.java

+        .startObject(Job.RESULTS_INDEX_NAME.getPreferredName())
+            .field(TYPE, KEYWORD)
+        .endObject()
+        .startObject(Job.DELETED.getPreferredName())  // TODO can this be removed?


This is an interesting question. At the moment this is what other APIs use to determine that a job is in the process of being deleted. Maybe storing that flag in an index won't be sufficiently atomic in the future. An alternative might be to make the job deletion process a persistent task, and use the existence of that persistent task to determine whether a job is being deleted. This field is probably just one of several places where indices won't give the same ordering guarantee that cluster state gave us.

With loose, eventually consistent guarantees this field is useless. We have considered other ways to mark that a job is in the process of being deleted, and I'd like to remove this field.

I've removed the DELETED field from the mapping.

droberts195

LGTM

Job and datafeed mappings with index template

4d9d269

davidkyle added >feature review :ml Machine learning labels Aug 2, 2018

droberts195 reviewed Aug 7, 2018


Address review comments

03a23c2

droberts195 approved these changes Aug 7, 2018


davidkyle added 2 commits August 7, 2018 13:23

Remove job is deleted mapping

6e2571e

Fix mappings and test

d757f1f

davidkyle force-pushed the job-mapping branch from 5c68da5 to d757f1f Compare August 8, 2018 12:18

davidkyle closed this Aug 8, 2018

davidkyle mentioned this pull request Aug 8, 2018

[ML] Job and datafeed mappings with index template #32719

Merged
