Skip to content

Add constraints to dimension fields#74939

Merged
csoulios merged 15 commits intoelastic:masterfrom
csoulios:ts-dimensions-constraints
Jul 9, 2021
Merged

Add constraints to dimension fields#74939
csoulios merged 15 commits intoelastic:masterfrom
csoulios:ts-dimensions-constraints

Conversation

@csoulios
Copy link
Copy Markdown
Contributor

@csoulios csoulios commented Jul 5, 2021

This PR adds the following constraints to dimension fields:

  • It must be an indexed field and must has doc values
  • It cannot be multi-valued
  • The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (index.mapping.dimension_fields.limit)
  • keyword fields cannot be more than 1024 bytes long
  • keyword fields must not use a normalizer

Based on the code added in PR #74450
Relates to #74660

- It must be either indexed or doc_value
- It cannot be multi-valued (WIP)
@csoulios csoulios added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Jul 5, 2021
@csoulios csoulios marked this pull request as ready for review July 6, 2021 18:37
@elasticmachine elasticmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team labels Jul 6, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (Team:Search)

@csoulios csoulios requested review from jimczi and nik9000 July 6, 2021 18:38
Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I wish there were a cleaner way to check if you had multiple values - but I see this is how we do it. I left some comments around the edges, but I think its the right thing for the most part. I'd love to try to clean up the addWithKey stuff to check for duplicates to be more appropriate for what we're doing.

context.doc().addWithKey(fieldType().name(), field);
} else {
context.doc().add(field);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why you did it this way. I wish there were something cleaner - the addWithKey thing does seem to be something we do in other places though.

I wonder if we need the getByKey at all - maybe we could make addWithKey return the old field and we could check and throw an exception we like? Or something like that. I dunno, feels wasteful to check up front when we already have the check in addWithKey.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could totally omit the getByKey() call. addWithKey() checks if there is an entry already and will throw a throw new IllegalStateException("Only one field can be stored per key");

I used getByKey() because I wanted to produce a better error message

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we do the if (context.doc().getByKey(fieldType().name()) != null) { dance all over the place. Would you be ok refactoring them all in a follow up? It feels like a good thing to clean up to me but not a good thing to mix into this change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to keep the getByKey thing in this change, could you move it to right above addWithKey line? I feel like putting them together would make it a little more readable.

throw new IllegalArgumentException(
"Field [" + ignoreAbove.name + "] cannot be set in conjunction with field [dimension]"
);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also refuse when there is a normalizer - it'd make it pretty complex to reason about the normalized values.

throw new IllegalArgumentException(
"Dimension field [" + fieldType().name() + "] cannot be more than [" + DIMENSION_MAX_LENGTH + "] characters long."
);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do the assertion on the length of the binaryValue instead? That's the length we really care about.

@csoulios csoulios requested a review from nik9000 July 6, 2021 19:46
context.doc().addWithKey(fieldType().name(), field);
} else {
context.doc().add(field);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to keep the getByKey thing in this change, could you move it to right above addWithKey line? I feel like putting them together would make it a little more readable.

Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@csoulios csoulios merged commit 91a1591 into elastic:master Jul 9, 2021
@csoulios csoulios deleted the ts-dimensions-constraints branch July 9, 2021 11:57
@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Aug 11, 2021

When you backport this please grab the second half of #76368 as well.

csoulios added a commit that referenced this pull request Sep 27, 2021
…rameters (#78265)

Backports the following PRs:

* Add dimension mapping parameter (#74450)

Added the dimension parameter to the following field types:

    keyword
    ip
    Numeric field types (integer, long, byte, short)

The dimension parameter is of type boolean (default: false) and is used
to mark that a field is a time series dimension field.

Relates to #74014

* Add constraints to dimension fields (#74939)

This PR adds the following constraints to dimension fields:

    It must be an indexed field and must has doc values
    It cannot be multi-valued
    The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (index.mapping.dimension_fields.limit)
    keyword fields cannot be more than 1024 bytes long
    keyword fields must not use a normalizer

Based on the code added in PR #74450
Relates to #74660

* Expand DocumentMapperTests (#76368)

Adds a test for setting the maximum number of dimensions setting and
tests the names and types of the metadata fields in the index.
Previously we just asserted the count of metadata fields. That made it
hard to read failures.

* Fix broken test for dimension keywords (#75408)

Test was failing because it was testing 1024 bytes long keyword and assertion was failing.

Closes #75225

* Checkstyle

* Add time_series_metric parameter (#76766)

This PR adds the time_series_metric parameter to the following field types:

    Numeric field types
    histogram
    aggregate_metric_double

* Rename `dimension` mapping parameter to `time_series_dimension` (#78012)

This PR renames dimension mapping parameter to time_series_dimension to make it consistent with time_series_metric parameter (#76766)

Relates to #74450 and #74014

* Add time series params to `unsigned_long` and `scaled_float` (#78204)

    Added the time_series_metric mapping parameter to the unsigned_long and scaled_float field types
    Added the time_series_dimension mapping parameter to the unsigned_long field type

Fixes #78100

Relates to #76766, #74450 and #74014

Co-authored-by: Nik Everett <nik9000@gmail.com>
@wchaparro wchaparro assigned csoulios and unassigned csoulios Dec 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team v7.16.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants