Skip to content

[DSL Global Retention] Use data stream global retention metadata#106221

Merged
gmarouli merged 26 commits intoelastic:mainfrom
gmarouli:use-data-stream-global-retention-metadata
Mar 20, 2024
Merged

[DSL Global Retention] Use data stream global retention metadata#106221
gmarouli merged 26 commits intoelastic:mainfrom
gmarouli:use-data-stream-global-retention-metadata

Conversation

@gmarouli
Copy link
Copy Markdown
Contributor

@gmarouli gmarouli commented Mar 12, 2024

In this PR we make use in the code the global retention metadata that we introduced in #106170. We use it in responses of GET requests when data stream lifecycle was included.

{
  ...
  "lifecycle": {
    "data_retention": "365d",
    "effective_retention": "90d",
    "retention_determined_by": "max_global_retention"
  } 

Currently, there is no way to set the global retention for the user, so this PR is also marked as a >non-issue, it adds two new fields but the global retention functionality cannot influence them yet.

@gmarouli gmarouli added >non-issue :StorageEngine/Data streams Data streams and their lifecycles labels Mar 19, 2024
@gmarouli gmarouli marked this pull request as ready for review March 19, 2024 13:34
@elasticsearchmachine elasticsearchmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Mar 19, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@gmarouli
Copy link
Copy Markdown
Contributor Author

Known failure: #106457

public static final ConstructingObjectParser<DataStreamLifecycle, Void> PARSER = new ConstructingObjectParser<>(
"lifecycle",
false,
true,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this change. I think it's because we don't expect (or allow) the user to pass in effective_retention or retention_determined_by, but we use this parser for round-trip serialization tests, which need to use that. Is that right? It might be worth a short comment in the code since the ConstructingObjectParser constructor's javadocs recommend against setting it to true (This should generally be set to true only when parsing responses from external systems, never when parsing requests from users).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it's more than just the round-trip serialization test -- I tried setting this to false to see what else was impacted, and a yaml rest test also failed when the xcontent was being deserialized by DataStreamLifecycle.fromXContent().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for raising this topic @masseyke . I spent some more time looking around and I decided to change the approach for the following reasons:

  • Well, javadoc is very specific for the use of ignoreUnknownFields:

@param ignoreUnknownFields Should this parser ignore unknown fields? This should generally be set to true only when parsing responses from external systems, never when parsing requests from users.

  • DataStreamLifecycle is used for user input and using ignoreUnknownFields means that a user could add different fields including effective_retention which will be ignored but the user won't know.
  • The problem here I think is that DataStreamLifecycle is not only used for user input but it's also deserialised in the cluster state as XContent as part of the data stream.

As you see we do not have this problem with the rollover information, because they are only used when creating a response to the user. So, I decided to follow this approach. I added a flag, that is set only when we are creating a response to the user and it's false otherwise. This will ensure that the new and unexpected fields will only be present in the user responses and not during other serialisations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I requested another review because the approach changed and I would like your input. Thanks again for bringing this up! I think the original approach was not thought through properly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And uncovered a mistake on a payload... that's why the yaml was failing, or at least one of the reasons

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you did makes sense to me. I like that we no longer ignore unknown fields.

@masseyke
Copy link
Copy Markdown
Member

It looks good to me. It would probably be good to have some yaml rest tests -- maybe you're waiting to add those until you've added a way to set global retention?

Copy link
Copy Markdown
Member

@masseyke masseyke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few minor questions/comments, but it looks good to me.

@gmarouli
Copy link
Copy Markdown
Contributor Author

It looks good to me. It would probably be good to have some yaml rest tests -- maybe you're waiting to add those until you've added a way to set global retention?

Yes, indeed that would be in the final PR.

@gmarouli gmarouli requested a review from masseyke March 20, 2024 09:42
@masseyke
Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/Data streams Data streams and their lifecycles Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.14.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants