Add constraints to dimension fields #74939
Conversation
- It must be either indexed or doc_value
- It cannot be multi-valued (WIP)
server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
Cleaned up code
Dimension fields must be both indexed and doc_values
Pinging @elastic/es-analytics-geo (Team:Analytics)

Pinging @elastic/es-search (Team:Search)
nik9000 left a comment
Huh. I wish there were a cleaner way to check whether you had multiple values - but I see this is how we do it. I left some comments around the edges, but I think it's the right thing for the most part. I'd love to try to clean up the addWithKey stuff so its duplicate check is more appropriate for what we're doing.
```java
    context.doc().addWithKey(fieldType().name(), field);
} else {
    context.doc().add(field);
}
```
I see why you did it this way. I wish there were something cleaner - the addWithKey thing does seem to be something we do in other places though.
I wonder if we need the getByKey at all - maybe we could make addWithKey return the old field and we could check and throw an exception we like? Or something like that. I dunno, feels wasteful to check up front when we already have the check in addWithKey.
I could totally omit the getByKey() call. addWithKey() checks if there is already an entry and will throw a new IllegalStateException("Only one field can be stored per key"). I used getByKey() because I wanted to produce a better error message.
It looks like we do the `if (context.doc().getByKey(fieldType().name()) != null) {` dance all over the place. Would you be ok refactoring them all in a follow up? It feels like a good thing to clean up to me but not a good thing to mix into this change.
If we're going to keep the getByKey thing in this change, could you move it to right above the addWithKey line? I feel like putting them together would make it a little more readable.
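The getByKey()/addWithKey() pattern being discussed can be sketched outside of Elasticsearch with a plain map. `DocFields` below is a hypothetical stand-in for the real document class (only the one-value-per-key behavior is modeled, and the error message is invented); it illustrates why the up-front getByKey() check yields a friendlier error than the generic IllegalStateException from addWithKey():

```java
import java.util.HashMap;
import java.util.Map;

class DimensionGuard {
    // Hypothetical stand-in for Elasticsearch's document class; only the
    // one-value-per-key behavior of addWithKey() is modeled here.
    static class DocFields {
        private final Map<String, String> keyedFields = new HashMap<>();

        String getByKey(String key) {
            return keyedFields.get(key);
        }

        void addWithKey(String key, String value) {
            // putIfAbsent returns the previous value when the key is taken.
            if (keyedFields.putIfAbsent(key, value) != null) {
                throw new IllegalStateException("Only one field can be stored per key");
            }
        }
    }

    // Checking getByKey() first lets us throw a field-specific message
    // instead of addWithKey()'s generic IllegalStateException.
    static void addDimension(DocFields doc, String field, String value) {
        if (doc.getByKey(field) != null) {
            throw new IllegalArgumentException(
                "Dimension field [" + field + "] cannot be a multi-valued field");
        }
        doc.addWithKey(field, value);
    }

    public static void main(String[] args) {
        DocFields doc = new DocFields();
        addDimension(doc, "host.name", "a");
        try {
            addDimension(doc, "host.name", "b");
        } catch (IllegalArgumentException e) {
            // prints "Dimension field [host.name] cannot be a multi-valued field"
            System.out.println(e.getMessage());
        }
    }
}
```

As the review suggests, making addWithKey() return the old field would avoid the double lookup while still allowing the caller to build a good error message.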
```java
    throw new IllegalArgumentException(
        "Field [" + ignoreAbove.name + "] cannot be set in conjunction with field [dimension]"
    );
}
```
I think we should also refuse when there is a normalizer - it'd make it pretty complex to reason about the normalized values.
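A sketch of how such a parameter check could look. The method and parameter names here are invented, not the actual KeywordFieldMapper.Builder code; the real change wires this into the mapper's parameter validation:

```java
class DimensionParamCheck {
    // Hypothetical validation: a dimension field may use neither a
    // normalizer nor ignore_above (names are illustrative).
    static void validateDimensionParams(boolean dimension, String normalizer, Integer ignoreAbove) {
        if (dimension == false) {
            return;
        }
        if (normalizer != null) {
            throw new IllegalArgumentException(
                "Field [normalizer] cannot be set in conjunction with field [dimension]");
        }
        if (ignoreAbove != null) {
            throw new IllegalArgumentException(
                "Field [ignore_above] cannot be set in conjunction with field [dimension]");
        }
    }

    public static void main(String[] args) {
        // Not a dimension field, so both parameters are allowed.
        validateDimensionParams(false, "lowercase", 256);
        try {
            validateDimensionParams(true, "lowercase", null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Refusing a normalizer here keeps dimension values literal: otherwise the stored dimension would be the normalized string, which is hard to reason about.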
```java
    throw new IllegalArgumentException(
        "Dimension field [" + fieldType().name() + "] cannot be more than [" + DIMENSION_MAX_LENGTH + "] characters long."
    );
}
```
Could you do the assertion on the length of the binaryValue instead? That's the length we really care about.
The limit should be 1025 bytes, not chars.
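The distinction matters because a keyword's UTF-8 encoding (its binaryValue) can be longer than its char count. A self-contained sketch of a byte-based check, using the 1024-byte figure from the PR description (the constant name is invented):

```java
import java.nio.charset.StandardCharsets;

class DimensionLength {
    // Assumed limit; the PR description states keyword dimension values
    // are capped at 1024 bytes.
    static final int DIMENSION_MAX_BYTES = 1024;

    // Validate the encoded byte length, not the char count.
    static void validate(String name, String value) {
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        if (bytes > DIMENSION_MAX_BYTES) {
            throw new IllegalArgumentException(
                "Dimension field [" + name + "] cannot be more than ["
                    + DIMENSION_MAX_BYTES + "] bytes long.");
        }
    }

    public static void main(String[] args) {
        validate("host", "a".repeat(1024));   // 1024 chars == 1024 bytes: passes
        String multiByte = "é".repeat(600);   // 600 chars but 1200 UTF-8 bytes
        try {
            validate("host", multiByte);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A char-based check would have accepted the 600-character string above even though its encoded form exceeds the limit.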
```java
    context.doc().addWithKey(fieldType().name(), field);
} else {
    context.doc().add(field);
}
```
If we're going to keep the getByKey thing in this change, could you move it to right above the addWithKey line? I feel like putting them together would make it a little more readable.
When you backport this, please grab the second half of #76368 as well.
…rameters (#78265)

Backports the following PRs:

* Add dimension mapping parameter (#74450)
  Added the `dimension` parameter to the following field types: `keyword`, `ip`, and the numeric field types (`integer`, `long`, `byte`, `short`). The `dimension` parameter is of type boolean (default: `false`) and is used to mark a field as a time series dimension field. Relates to #74014
* Add constraints to dimension fields (#74939)
  This PR adds the following constraints to dimension fields:
  * It must be an indexed field and must have doc values
  * It cannot be multi-valued
  * The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (`index.mapping.dimension_fields.limit`)
  * `keyword` fields cannot be more than 1024 bytes long
  * `keyword` fields must not use a `normalizer`
  Based on the code added in PR #74450. Relates to #74660
* Expand DocumentMapperTests (#76368)
  Adds a test for setting the maximum number of dimensions setting and tests the names and types of the metadata fields in the index. Previously we just asserted the count of metadata fields, which made failures hard to read.
* Fix broken test for dimension keywords (#75408)
  The test was failing because it tested a 1024-byte-long keyword and the assertion failed. Closes #75225
* Checkstyle
* Add time_series_metric parameter (#76766)
  This PR adds the `time_series_metric` parameter to the following field types: numeric field types, `histogram`, `aggregate_metric_double`
* Rename `dimension` mapping parameter to `time_series_dimension` (#78012)
  This PR renames the `dimension` mapping parameter to `time_series_dimension` to make it consistent with the `time_series_metric` parameter (#76766). Relates to #74450 and #74014
* Add time series params to `unsigned_long` and `scaled_float` (#78204)
  Added the `time_series_metric` mapping parameter to the `unsigned_long` and `scaled_float` field types, and the `time_series_dimension` mapping parameter to the `unsigned_long` field type. Fixes #78100. Relates to #76766, #74450 and #74014

Co-authored-by: Nik Everett <nik9000@gmail.com>
This PR adds the following constraints to dimension fields:

* It must be an indexed field and must have doc values
* It cannot be multi-valued
* The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (`index.mapping.dimension_fields.limit`)
* `keyword` fields cannot be more than 1024 bytes long
* `keyword` fields must not use a `normalizer`

Based on the code added in PR #74450.
Relates to #74660
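For illustration, a hypothetical index body exercising these constraints (field names invented; at the time of this PR the parameter was named `dimension`, later renamed to `time_series_dimension` by #78012):

```json
{
  "settings": {
    "index.mapping.dimension_fields.limit": 16
  },
  "mappings": {
    "properties": {
      "host.name": { "type": "keyword", "dimension": true },
      "pod.ip": { "type": "ip", "dimension": true }
    }
  }
}
```

Under the constraints above, a `keyword` dimension value longer than 1024 bytes, a `normalizer` on a dimension field, or more than 16 dimension fields in the mapping would all be rejected.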