[Serverless] Add retries to get default Fleet Server URL#2901

Merged
mrodm merged 6 commits into elastic:main from mrodm:http_retry_serverless
Sep 15, 2025

Conversation

@mrodm
Contributor

@mrodm mrodm commented Sep 12, 2025

Closes #2902

While creating Elastic stacks with the serverless provider, the elastic-package stack up command sometimes fails to find the Fleet Server URL. This happens especially in the daily CI jobs. For instance:

The Kibana client already retries requests that fail, but a request can succeed and still not contain a default Fleet Server. This PR therefore adds retries to the query that gets the default Fleet Server URL.

Example of this new wait process (Buildkite link):

2025/09/12 08:58:17 DEBUG project <project> initialized
2025/09/12 08:58:18 DEBUG Fleet Server URL not found yet, retrying...
2025/09/12 08:58:20 DEBUG Fleet Server found
Elasticsearch host: ...

@mrodm mrodm self-assigned this Sep 12, 2025
@mrodm mrodm changed the title Add retries to get default Fleet Server URL [Serverless]Add retries to get default Fleet Server URL Sep 12, 2025
@mrodm mrodm changed the title [Serverless]Add retries to get default Fleet Server URL [Serverless] Add retries to get default Fleet Server URL Sep 12, 2025
@mrodm
Contributor Author

mrodm commented Sep 12, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 12, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 12, 2025

test serverless

@mrodm mrodm force-pushed the http_retry_serverless branch from 3ba63fc to 831ceca September 12, 2025 09:24
@mrodm
Contributor Author

mrodm commented Sep 12, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 12, 2025

test serverless

Comment on lines 113 to 128
fleetServerURL := ""
wait.UntilTrue(ctx, func(ctx context.Context) (bool, error) {
	fleetServerURL, err = project.DefaultFleetServerURL(ctx, sp.kibanaClient)
	if errors.Is(err, kibana.ErrFleetServerNotFound) {
		logger.Debug("Fleet Server URL not found yet, retrying...")
		return false, nil
	}
	if err != nil {
		return false, err
	}
	logger.Debug("Fleet Server found")
	return true, nil
}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)
if fleetServerURL == "" {
	return Config{}, fmt.Errorf("failed to get fleet URL: %w", err)
}
Contributor Author

This Kibana request can return a valid response that does not contain any default Fleet Server.
That is why I added this loop: it queries a few times until it gets the value, if possible.

Member

Should DefaultFleetServerURL return kibana.ErrFleetServerNotFound in this case?

Contributor Author

@mrodm mrodm Sep 12, 2025

DefaultFleetServerURL (from the Kibana Client code) already returns that error in that scenario:

return "", ErrFleetServerNotFound

I use that error in the retry logic to keep iterating until a Fleet Server URL is retrieved or the timeout is reached.

https://github.com/elastic/elastic-package/pull/2901/files#diff-b4efb1ac558d15d3077525cb9be59dce12ee723a4749253e9ca7cc860419f1f8R116

EDIT: project.DefaultFleetServerURL wraps kibana.ErrFleetServerNotFound in the error it returns:

return "", fmt.Errorf("failed to query fleet server hosts: %w", err)

@mrodm mrodm marked this pull request as ready for review September 12, 2025 10:34
@mrodm mrodm requested a review from a team September 12, 2025 10:34
Comment on lines +123 to 128
	logger.Debug("Fleet Server found")
	return true, nil
}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)
if fleetServerURL == "" {
	return Config{}, fmt.Errorf("failed to get fleet URL: %w", err)
}
Member

Should we move this check inside the retry logic? Or is an empty URL something that retrying would never solve?

Suggested change
-	logger.Debug("Fleet Server found")
-	return true, nil
-}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)
-if fleetServerURL == "" {
-	return Config{}, fmt.Errorf("failed to get fleet URL: %w", err)
-}
+	if fleetServerURL == "" {
+		return false, fmt.Errorf("failed to get fleet URL: %w", err)
+	}
+	logger.Debug("Fleet Server found")
+	return true, nil
+}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)

Contributor Author

IIUC, if the check inside the retry logic returns an error, the retry loop will stop without finding the actual Fleet Server:

return false, err

In this retry loop, it is likely that a valid response does not contain any Fleet Server URL, so we need to keep retrying in that scenario.

config.Parameters[ParamServerlessFleetURL], err = project.DefaultFleetServerURL(ctx, sp.kibanaClient)
if err != nil {
fleetServerURL := ""
wait.UntilTrue(ctx, func(ctx context.Context) (bool, error) {
Member

We should check the error returned by wait.UntilTrue.

Contributor Author

Right, I missed that check.
I'll add it.

Contributor Author

Added in 91accb9

@mrodm
Contributor Author

mrodm commented Sep 12, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 12, 2025

test serverless

@elasticmachine
Collaborator

💚 Build Succeeded

History

cc @mrodm

@mrodm
Contributor Author

mrodm commented Sep 15, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 15, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 15, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 15, 2025

test serverless

@mrodm
Contributor Author

mrodm commented Sep 15, 2025

test serverless

@mrodm mrodm requested a review from jsoriano September 15, 2025 12:54
@mrodm mrodm merged commit 4912d2f into elastic:main Sep 15, 2025
3 checks passed

Development

Successfully merging this pull request may close these issues.

Creating Serverless projects fail to find Fleet Server URL

3 participants