
Test #4

Closed
nik9000 wants to merge 4 commits into master from foo1

Conversation


@nik9000 nik9000 commented Jan 24, 2019

I am a test PR.

@nik9000 nik9000 closed this Jan 28, 2019
nik9000 pushed a commit that referenced this pull request Dec 19, 2019
…tic#46506)

* Add retention to Snapshot Lifecycle Management (elastic#46407)

This commit adds retention to the existing Snapshot Lifecycle Management feature (elastic#38461) as described in elastic#43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a schedule configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That returns statistics about snapshots taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. elastic#45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (elastic#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to elastic#38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (elastic#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example, to keep
the size of the PR down (we still need to discuss the different options
we want to support and their names). It also does not yet include
version serialization checks, since the original SLM branch has not yet
been merged.

Relates to elastic#43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (elastic#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to elastic#43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (elastic#45018)

Semi-related to elastic#44465, this allows the `"retention"` configuration map
to be missing.

Relates to elastic#43663

* Add min_count and max_count as SLM retention predicates (elastic#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets these criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.

Relates to elastic#43663
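
As a rough illustration only (hypothetical names and a simplified ordering; this is not the actual `SnapshotRetentionConfiguration` logic), the three predicates described above could combine along these lines:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of combining expire_after, min_count and max_count.
// Names and structure are hypothetical, not Elasticsearch's implementation.
record Snapshot(String name, Instant startTime) {}

final class RetentionSketch {
    static boolean shouldDelete(Snapshot candidate, List<Snapshot> all,
                                Duration expireAfter, Integer minCount, Integer maxCount,
                                Instant now) {
        // Sort newest first so position 0 is the most recent snapshot.
        List<Snapshot> newestFirst = all.stream()
            .sorted(Comparator.comparing(Snapshot::startTime).reversed())
            .toList();
        int position = newestFirst.indexOf(candidate);

        // min_count wins over everything: never delete one of the N newest snapshots.
        if (minCount != null && position < minCount) {
            return false;
        }
        // max_count: anything beyond the newest max_count snapshots is eligible.
        if (maxCount != null && position >= maxCount) {
            return true;
        }
        // expire_after: snapshots older than the configured age are eligible.
        return expireAfter != null && candidate.startTime().plus(expireAfter).isBefore(now);
    }
}
```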

* Time-bound deletion of snapshots in retention delete function (elastic#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deleting snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard-stop a deletion
at the one-hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.

Relates to elastic#43663
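
A minimal sketch of that best-effort bound (assumed names; the real `SnapshotRetentionTask` differs in detail): the clock is checked between deletions, so a deletion already in flight is never interrupted.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: stop issuing new deletions once the time budget is spent.
final class TimeBoundedDeletionSketch {
    static int deleteWithinBudget(List<String> snapshotsToDelete,
                                  long maximumDeletionMillis,
                                  LongSupplier nowMillis, // injectable clock
                                  Consumer<String> deleteSnapshot) {
        final long start = nowMillis.getAsLong();
        int deleted = 0;
        for (String snapshot : snapshotsToDelete) {
            // Best effort: once the budget is exhausted, defer the rest to the next run.
            if (nowMillis.getAsLong() - start >= maximumDeletionMillis) {
                break;
            }
            deleteSnapshot.accept(snapshot);
            deleted++;
        }
        return deleted;
    }
}
```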

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (elastic#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.
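
As an illustration of the per-policy bookkeeping (hypothetical class and field names; `SnapshotLifecycleStats` itself is serialized into the cluster state and differs in detail):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the kind of counters gathered per policy.
final class SlmStatsSketch {
    private final LongAdder retentionRuns = new LongAdder();
    private final Map<String, LongAdder> snapshotsTakenByPolicy = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> snapshotsDeletedByPolicy = new ConcurrentHashMap<>();

    void onRetentionRun() {
        retentionRuns.increment();
    }

    void onSnapshotTaken(String policyId) {
        // computeIfAbsent keeps per-policy counter creation concise.
        snapshotsTakenByPolicy.computeIfAbsent(policyId, id -> new LongAdder()).increment();
    }

    void onSnapshotDeleted(String policyId) {
        snapshotsDeletedByPolicy.computeIfAbsent(policyId, id -> new LongAdder()).increment();
    }
}
```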

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```

This does not yet include HLRC support, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to elastic#43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

* Record history of SLM retention actions (elastic#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (elastic#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete; however, the waiting time is not counted against
the limit on actual deletions.

Relates to elastic#43663
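
A simplified, self-contained sketch of that behaviour (the real code reacts to cluster state changes via a `ClusterStateObserver` rather than polling; names here are hypothetical):

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch: wait for the running snapshot to finish before deleting,
// without charging the wait against the deletion time budget.
final class WaitForSnapshotSketch {
    static boolean waitUntilNoSnapshotRunning(BooleanSupplier snapshotRunning,
                                              long maxWaitMillis,
                                              LongSupplier nowMillis) throws InterruptedException {
        final long start = nowMillis.getAsLong();
        while (snapshotRunning.getAsBoolean()) {
            if (nowMillis.getAsLong() - start >= maxWaitMillis) {
                return false; // give up; retention retries on the next cycle
            }
            Thread.sleep(100); // placeholder for reacting to cluster state updates
        }
        // Time spent here is tracked separately from the deletion budget.
        return true;
    }
}
```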

* Increase timeout waiting for snapshot completion

* Apply patch

From https://github.com/original-brownbear/elasticsearch/commit/2374316f0d1912c9e1498bece195546a1dc60bce.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (elastic#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to elastic#43663

* Check all actions preventing snapshot delete during retention (elastic#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.

Relates to elastic#43663
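
Sketched in isolation (hypothetical types; the real `okayToDeleteSnapshots` inspects the corresponding in-progress entries in the cluster state), the combined check amounts to:

```java
// Hypothetical snapshot of the relevant cluster activity.
record ClusterActivity(boolean snapshotInProgress,
                       boolean snapshotDeletionInProgress,
                       boolean repositoryCleanupInProgress,
                       boolean restoreInProgress) {}

final class RetentionSafetySketch {
    // Deletion is only safe when none of the blocking operations is running.
    static boolean okayToDeleteSnapshots(ClusterActivity activity) {
        return activity.snapshotInProgress() == false
            && activity.snapshotDeletionInProgress() == false
            && activity.repositoryCleanupInProgress() == false
            && activity.restoreInProgress() == false;
    }
}
```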

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (elastic#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to elastic#43663

* Rewrite SnapshotLifecycleIT as an ESIntegTestCase (elastic#46356)

* Rewrite SnapshotLifecycleIT as an ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests:
`SnapshotLifecycleRestIT`, which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests`, which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to elastic#43663
Resolves elastic#46205

* Add error logging when exceptions are thrown

* Update serialization versions

* Fix type inference

* Use non-Cancellable HLRC return value

* Fix Client mocking in test

* Fix SLMSnapshotBlockingIntegTests for 7.x branch

* Update SnapshotRetentionTask for non-multi-repo snapshot retrieval

* Add serialization guards for SnapshotLifecyclePolicy
nik9000 pushed a commit that referenced this pull request Mar 11, 2021
…astic#69765)

Previously we did not resolve the attributes recursively, which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s ended up pointing to nonexistent attributes.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:

```
OrderBy[[Order[j{r}#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.
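
A minimal, self-contained sketch of recursive resolution (a plain map standing in for `AttributeMap`, and strings standing in for attributes; the real types and cycle handling differ):

```java
import java.util.Map;

// Hypothetical sketch: follow alias chains until a concrete attribute is reached.
final class AttributeResolutionSketch {
    static String resolve(Map<String, String> aliases, String attribute) {
        String current = attribute;
        while (aliases.containsKey(current)) {
            current = aliases.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> aliases = Map.of(
            "j", "i",          // SELECT i AS j ...
            "i", "test.int"    // ... FROM (SELECT int AS i FROM test)
        );
        System.out.println(resolve(aliases, "j")); // test.int
    }
}
```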

Fixes elastic#67237
nik9000 pushed a commit that referenced this pull request Mar 15, 2021
…astic#69765) (elastic#70322)

Previously we did not resolve the attributes recursively, which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s ended up pointing to nonexistent attributes.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:

```
OrderBy[[Order[j{r}#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.

Fixes elastic#67237
nik9000 pushed a commit that referenced this pull request Mar 24, 2021
…astic#69765) (elastic#70325)

Previously we did not resolve the attributes recursively, which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s ended up pointing to nonexistent attributes.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:

```
OrderBy[[Order[j{r}#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.

Fixes elastic#67237
nik9000 pushed a commit that referenced this pull request Jan 22, 2026
…tic#140027)

This PR fixes the issue where `INLINE STATS GROUP BY null` was being
incorrectly pruned by `PruneLeftJoinOnNullMatchingField`.

Fixes elastic#139887

## Problem

For the query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}#4]]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
      \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
  \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is an `Aggregate` (originating from `INLINE STATS`). Since
`STATS` supports `GROUP BY null`, the join key being null is a valid use
case. Pruning this join would incorrectly eliminate the aggregation
results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}#2, n{r}#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is a `LocalRelation` (the `Aggregate` was optimized into a
`LocalRelation` containing the pre-computed aggregation results).
Pruning this join when the join key is null would discard the valid
aggregation results stored in the `LocalRelation`, incorrectly producing
null values instead of the expected count.

## Solution

The fix ensures that `PruneLeftJoinOnNullMatchingField` only
applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`.
For `INLINE STATS` joins, the right side can be:

 - `Aggregate` (before optimization), or
 - `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the
pruning optimization for `INLINE STATS` joins, preserving the expected
query results when grouping by null.
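
A toy sketch of that guard (hypothetical plan types; the real rule lives in the ES|QL optimizer and also rewrites the pruned join's outputs):

```java
// Hypothetical, heavily simplified plan nodes.
sealed interface LogicalPlan permits EsRelation, Aggregate, LocalRelation, Join {}
record EsRelation() implements LogicalPlan {}
record Aggregate() implements LogicalPlan {}
record LocalRelation() implements LogicalPlan {}
record Join(LogicalPlan left, LogicalPlan right, boolean joinKeyIsNull) implements LogicalPlan {}

final class PruneRuleSketch {
    static LogicalPlan apply(Join join) {
        // Only a LOOKUP JOIN (right side is an EsRelation) may be pruned when the
        // join key is known to be null. INLINE STATS joins have an Aggregate or
        // LocalRelation on the right and must be left intact.
        if (join.joinKeyIsNull() && join.right() instanceof EsRelation) {
            return join.left(); // simplified: the real rule also null-fills the right-hand outputs
        }
        return join;
    }
}
```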
nik9000 pushed a commit that referenced this pull request Mar 5, 2026
…tic#140027) (elastic#141095)

This PR fixes the issue where `INLINE STATS GROUP BY null` was being
incorrectly pruned by `PruneLeftJoinOnNullMatchingField`.

Fixes elastic#139887

## Problem

For the query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}#4]]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
      \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
  \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is an `Aggregate` (originating from `INLINE STATS`). Since
`STATS` supports `GROUP BY null`, the join key being null is a valid use
case. Pruning this join would incorrectly eliminate the aggregation
results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}#2, n{r}#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is a `LocalRelation` (the `Aggregate` was optimized into a
`LocalRelation` containing the pre-computed aggregation results).
Pruning this join when the join key is null would discard the valid
aggregation results stored in the `LocalRelation`, incorrectly producing
null values instead of the expected count.

## Solution

The fix ensures that `PruneLeftJoinOnNullMatchingField` only
applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`.
For `INLINE STATS` joins, the right side can be:

 - `Aggregate` (before optimization), or
 - `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the
pruning optimization for `INLINE STATS` joins, preserving the expected
query results when grouping by null.

(cherry picked from commit f3ccb70)

Co-authored-by: kanoshiou <uiaao@tuta.io>