Skip to content

Make data streams in APIs resolvable.#54726

Merged
martijnvg merged 39 commits intoelastic:masterfrom
martijnvg:make_data_streams_resolvable_in_apis
Apr 16, 2020
Merged

Make data streams in APIs resolvable.#54726
martijnvg merged 39 commits intoelastic:masterfrom
martijnvg:make_data_streams_resolvable_in_apis

Conversation

@martijnvg
Copy link
Copy Markdown
Member

@martijnvg martijnvg commented Apr 3, 2020

This change makes data stream resolvable in APIs.

The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api. In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.

Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).

Relates to #53100

This change makes data stream resolvable in all APIs.
For APIs that have `FORBID_ALIASES_TO_MULTIPLE_INDICES` set
in the index options, this means that a data stream is resolved
to the latest backing index. For all other APIs this means
that a data stream is resolved to all backing indices.

Additionally also a `IGNORE_DATA_STREAMS` indices option is added,
so that for specific APIs data stream resolvability can be disabled.

Relates to elastic#53100
@martijnvg martijnvg added WIP :StorageEngine/Data streams Data streams and their lifecycles labels Apr 3, 2020
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Data streams)

@martijnvg martijnvg marked this pull request as ready for review April 8, 2020 10:53
@martijnvg martijnvg requested a review from danhermann April 8, 2020 10:54
to a change in the error message, this pr now returns `alias` with
originates from the index abstraction type's display name.
… indices option.

Data streams are not resolved unless the INCLUDE_DATA_STREAMS indices option is specified.
Only search, msearch, index (with op type create) and refresh requests are configured with
the INCLUDE_DATA_STREAMS indices option. All other requests / apis don't resolve data streams.
Copy link
Copy Markdown
Contributor

@danhermann danhermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest PR updates LGTM

@henningandersen
Copy link
Copy Markdown
Contributor

I think we should we add a few REST tests too that checks that doing index admin ops against a data-stream fails? Like we have also for aliases, for instance when deleting an index. I am not sure we need full coverage on that, but would be good to ensure a handful of most important cases are handled, like delete index, update settings, update mapping, GET index (should that fail, I think so?).

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg let me know your thoughts on moving the includeDataStreams option from IndicesOptions to IndexNameExpressionResolver and failing always when hitting data-stream (and not wanting to resolve them)?

@@ -133,12 +133,19 @@ public enum Option {
new IndicesOptions(EnumSet.of(Option.ALLOW_NO_INDICES, Option.FORBID_CLOSED_INDICES),
EnumSet.of(WildcardStates.OPEN, WildcardStates.HIDDEN));
public static final IndicesOptions STRICT_EXPAND_OPEN_FORBID_CLOSED_IGNORE_THROTTLED =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we stick to this approach, this should also be renamed?

concreteIndices.add(writeIndex.getIndex());
}
} else {
if (indexAbstraction.getIndices().size() > 1 && !options.allowAliasesToMultipleIndices()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think reusing this flag for data streams is not ideal. I wonder if we should instead flat out fail, if an expression matches a data-stream and we are not in one of the cases where data-streams are resolved.

I think I prefer this over skipping data-streams. Index, alias and data-stream are in the same namespace. I think it will be confusing to have some index admin operations that match indices and aliases, but ignores data-streams. Suppose you have:

xalias -> yindex
xindex
xdatastream

and you do POST /x*/_close, I think it would close yindex, xindex, but silently ignore xdatastream. Obviously, users should design their namings to avoid such scenarios, but I think a better default would be to fail, if an operation unexpectedly resolves to a data-stream.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether an operation or api is expected to expand a wildcard expression to a data stream that depends on whether that api is setup to resolve data streams. So in this case that would mean that ignoring data streams is expected behaviour. This is at least my expectation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Henning and I chatted about this and we both now agree that the request should fail if data streams are not resolvable both the expression matches with one or more data streams. It makes it clear that an api doesn't support data streams (just like if a user specified a concrete data stream name) and this makes it easier to add data stream support if an api should support data streams.

}
}

public void testDataSteams() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
public void testDataSteams() {
public void testDataStreams() {

…ssion expands to a data stream then fail.

This means the api doesn't support data streams, but the wildcard expression is resolving to a data stream.
Prior to this commit data streams would be quietly ignored.

We think that failing is more clear, so that it is clear that a specific api call isn't executed over a data stream.
Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

out.writeEnumSet(options);
if (out.getVersion().before(Version.V_8_0_0) && options.contains(Option.INCLUDE_DATA_STREAMS)) {
EnumSet<Option> copy = EnumSet.copyOf(options);
copy.remove(Option.INCLUDE_DATA_STREAMS);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put in a check into CreateDataStreamAction to not allow creating data streams before all nodes are on the target release (7.8)? To ensure that there are no surprises in a mixed cluster.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will make this change in a followup change.

@martijnvg martijnvg merged commit fada09a into elastic:master Apr 16, 2020
martijnvg added a commit that referenced this pull request Apr 17, 2020
Backport from: #54726

The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.

In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.

Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).

Relates to #53100
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Apr 17, 2020
Enabled data streams and itv2 feature enabled system properties in server module's integ test task.

PR elastic#54726 added java integration tests for data streams, so this is why these system properties
need to be enabled when running release build.
martijnvg added a commit that referenced this pull request Apr 17, 2020
Enabled data streams and itv2 feature enabled system properties in server module's integ test task.

PR #54726 added java integration tests for data streams, so this is why these system properties
need to be enabled when running release build.
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Apr 17, 2020
Enabled data streams and itv2 feature enabled system properties in server module's integ test task.

PR elastic#54726 added java integration tests for data streams, so this is why these system properties
need to be enabled when running release build.
martijnvg added a commit that referenced this pull request Apr 17, 2020
)

Enabled data streams and itv2 feature enabled system properties in server module's integ test task.

PR #54726 added java integration tests for data streams, so this is why these system properties
need to be enabled when running release build.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants