Add constraints to dimension fields #74939
Conversation
- It must be either indexed or doc_value
- It cannot be multi-valued (WIP)
server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
Cleaned up code
Dimension fields must be both indexed and doc_values
Pinging @elastic/es-analytics-geo (Team:Analytics)

Pinging @elastic/es-search (Team:Search)
nik9000 left a comment
Huh. I wish there were a cleaner way to check whether you had multiple values - but I see this is how we do it. I left some comments around the edges, but I think it's the right thing for the most part. I'd love to try to clean up the addWithKey stuff so its duplicate check is more appropriate for what we're doing.
```java
    context.doc().addWithKey(fieldType().name(), field);
} else {
    context.doc().add(field);
}
```
I see why you did it this way. I wish there were something cleaner - the addWithKey thing does seem to be something we do in other places though.
I wonder if we need the getByKey at all - maybe we could make addWithKey return the old field and we could check and throw an exception we like? Or something like that. I dunno, feels wasteful to check up front when we already have the check in addWithKey.
I could totally omit the getByKey() call. addWithKey() checks if there is already an entry and will throw a new IllegalStateException("Only one field can be stored per key"). I used getByKey() because I wanted to produce a better error message.
It looks like we do the `if (context.doc().getByKey(fieldType().name()) != null) {` dance all over the place. Would you be ok refactoring them all in a follow up? It feels like a good thing to clean up to me but not a good thing to mix into this change.
If we're going to keep the getByKey thing in this change, could you move it to right above the addWithKey line? I feel like putting them together would make it a little more readable.
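The getByKey()/addWithKey() pattern being discussed can be sketched outside of Elasticsearch with a plain map. `DocFields` below is a hypothetical stand-in for the real document class (only the one-value-per-key behavior is modeled, and the error message is invented); it illustrates why the up-front getByKey() check yields a friendlier error than the generic IllegalStateException from addWithKey():

```java
import java.util.HashMap;
import java.util.Map;

class DimensionGuard {
    // Hypothetical stand-in for Elasticsearch's document class; only the
    // one-value-per-key behavior of addWithKey() is modeled here.
    static class DocFields {
        private final Map<String, String> keyedFields = new HashMap<>();

        String getByKey(String key) {
            return keyedFields.get(key);
        }

        void addWithKey(String key, String value) {
            // putIfAbsent returns the previous value when the key is taken.
            if (keyedFields.putIfAbsent(key, value) != null) {
                throw new IllegalStateException("Only one field can be stored per key");
            }
        }
    }

    // Checking getByKey() first lets us throw a field-specific message
    // instead of addWithKey()'s generic IllegalStateException.
    static void addDimension(DocFields doc, String field, String value) {
        if (doc.getByKey(field) != null) {
            throw new IllegalArgumentException(
                "Dimension field [" + field + "] cannot be a multi-valued field");
        }
        doc.addWithKey(field, value);
    }

    public static void main(String[] args) {
        DocFields doc = new DocFields();
        addDimension(doc, "host.name", "a");
        try {
            addDimension(doc, "host.name", "b");
        } catch (IllegalArgumentException e) {
            // prints "Dimension field [host.name] cannot be a multi-valued field"
            System.out.println(e.getMessage());
        }
    }
}
```

As the review suggests, making addWithKey() return the old field would avoid the double lookup while still allowing the caller to build a good error message.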
```java
    throw new IllegalArgumentException(
        "Field [" + ignoreAbove.name + "] cannot be set in conjunction with field [dimension]"
    );
}
```
I think we should also refuse when there is a normalizer - it'd make it pretty complex to reason about the normalized values.
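A sketch of how such a parameter check could look. The method and parameter names here are invented, not the actual KeywordFieldMapper.Builder code; the real change wires this into the mapper's parameter validation:

```java
class DimensionParamCheck {
    // Hypothetical validation: a dimension field may use neither a
    // normalizer nor ignore_above (names are illustrative).
    static void validateDimensionParams(boolean dimension, String normalizer, Integer ignoreAbove) {
        if (dimension == false) {
            return;
        }
        if (normalizer != null) {
            throw new IllegalArgumentException(
                "Field [normalizer] cannot be set in conjunction with field [dimension]");
        }
        if (ignoreAbove != null) {
            throw new IllegalArgumentException(
                "Field [ignore_above] cannot be set in conjunction with field [dimension]");
        }
    }

    public static void main(String[] args) {
        // Not a dimension field, so both parameters are allowed.
        validateDimensionParams(false, "lowercase", 256);
        try {
            validateDimensionParams(true, "lowercase", null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Refusing a normalizer here keeps dimension values literal: otherwise the stored dimension would be the normalized string, which is hard to reason about.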
```java
    throw new IllegalArgumentException(
        "Dimension field [" + fieldType().name() + "] cannot be more than [" + DIMENSION_MAX_LENGTH + "] characters long."
    );
}
```
Could you do the assertion on the length of the binaryValue instead? That's the length we really care about.
The limit should be 1025 bytes, not chars.
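The distinction matters because a keyword's UTF-8 encoding (its binaryValue) can be longer than its char count. A self-contained sketch of a byte-based check, using the 1024-byte figure from the PR description (the constant name is invented):

```java
import java.nio.charset.StandardCharsets;

class DimensionLength {
    // Assumed limit; the PR description states keyword dimension values
    // are capped at 1024 bytes.
    static final int DIMENSION_MAX_BYTES = 1024;

    // Validate the encoded byte length, not the char count.
    static void validate(String name, String value) {
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        if (bytes > DIMENSION_MAX_BYTES) {
            throw new IllegalArgumentException(
                "Dimension field [" + name + "] cannot be more than ["
                    + DIMENSION_MAX_BYTES + "] bytes long.");
        }
    }

    public static void main(String[] args) {
        validate("host", "a".repeat(1024));   // 1024 chars == 1024 bytes: passes
        String multiByte = "é".repeat(600);   // 600 chars but 1200 UTF-8 bytes
        try {
            validate("host", multiByte);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A char-based check would have accepted the 600-character string above even though its encoded form exceeds the limit.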
```java
    context.doc().addWithKey(fieldType().name(), field);
} else {
    context.doc().add(field);
}
```
If we're going to keep the getByKey thing in this change, could you move it to right above the addWithKey line? I feel like putting them together would make it a little more readable.
When you backport this, please grab the second half of #76368 as well.
…rameters (#78265)

Backports the following PRs:

* Add dimension mapping parameter (#74450)
  Added the `dimension` parameter to the following field types: `keyword`, `ip`, and the numeric field types (`integer`, `long`, `byte`, `short`). The `dimension` parameter is of type boolean (default: `false`) and is used to mark a field as a time series dimension field. Relates to #74014
* Add constraints to dimension fields (#74939)
  This PR adds the following constraints to dimension fields:
  * It must be an indexed field and must have doc values
  * It cannot be multi-valued
  * The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (`index.mapping.dimension_fields.limit`)
  * `keyword` fields cannot be more than 1024 bytes long
  * `keyword` fields must not use a `normalizer`
  Based on the code added in PR #74450. Relates to #74660
* Expand DocumentMapperTests (#76368)
  Adds a test for setting the maximum number of dimensions setting and tests the names and types of the metadata fields in the index. Previously we just asserted the count of metadata fields, which made failures hard to read.
* Fix broken test for dimension keywords (#75408)
  The test was failing because it tested a 1024-byte-long keyword and the assertion failed. Closes #75225
* Checkstyle
* Add time_series_metric parameter (#76766)
  This PR adds the `time_series_metric` parameter to the following field types: numeric field types, `histogram`, `aggregate_metric_double`
* Rename `dimension` mapping parameter to `time_series_dimension` (#78012)
  This PR renames the `dimension` mapping parameter to `time_series_dimension` to make it consistent with the `time_series_metric` parameter (#76766). Relates to #74450 and #74014
* Add time series params to `unsigned_long` and `scaled_float` (#78204)
  Added the `time_series_metric` mapping parameter to the `unsigned_long` and `scaled_float` field types, and the `time_series_dimension` mapping parameter to the `unsigned_long` field type. Fixes #78100. Relates to #76766, #74450 and #74014

Co-authored-by: Nik Everett <nik9000@gmail.com>
This PR adds the following constraints to dimension fields:

* It must be an indexed field and must have doc values
* It cannot be multi-valued
* The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (`index.mapping.dimension_fields.limit`)
* `keyword` fields cannot be more than 1024 bytes long
* `keyword` fields must not use a `normalizer`

Based on the code added in PR #74450.
Relates to #74660
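For illustration, a hypothetical index body exercising these constraints (field names invented; at the time of this PR the parameter was named `dimension`, later renamed to `time_series_dimension` by #78012):

```json
{
  "settings": {
    "index.mapping.dimension_fields.limit": 16
  },
  "mappings": {
    "properties": {
      "host.name": { "type": "keyword", "dimension": true },
      "pod.ip": { "type": "ip", "dimension": true }
    }
  }
}
```

Under the constraints above, a `keyword` dimension value longer than 1024 bytes, a `normalizer` on a dimension field, or more than 16 dimension fields in the mapping would all be rejected.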