[Serverless] Add retries to get default Fleet Server URL #2901
mrodm merged 6 commits into elastic:main from
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/557
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/558
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/559
3ba63fc to 831ceca
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/560
test serverless
internal/stack/serverless.go
Outdated
```go
	fleetServerURL := ""
	wait.UntilTrue(ctx, func(ctx context.Context) (bool, error) {
		fleetServerURL, err = project.DefaultFleetServerURL(ctx, sp.kibanaClient)
		if errors.Is(err, kibana.ErrFleetServerNotFound) {
			logger.Debug("Fleet Server URL not found yet, retrying...")
			return false, nil
		}
		if err != nil {
			return false, err
		}
		logger.Debug("Fleet Server found")
		return true, nil
	}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)
	if fleetServerURL == "" {
		return Config{}, fmt.Errorf("failed to get fleet URL: %w", err)
	}
```
This Kibana request can return a valid response that does not contain any default Fleet Server.
This is why I added this loop: it queries several times until the value is retrieved, if possible.
Should DefaultFleetServerURL return kibana.ErrFleetServerNotFound in this case?
DefaultFleetServerURL (from the Kibana client code) already returns that error in that scenario:
elastic-package/internal/kibana/fleet.go
Line 71 in 3cbb744
I use that error in the retry logic to keep iterating until a Fleet Server URL is retrieved or the timeout is reached.
EDIT: kibana.ErrFleetServerNotFound is included (wrapped) in the error returned by project.DefaultFleetServerURL:
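The retry check works because of Go's error wrapping: an error built with fmt.Errorf and the %w verb still matches its sentinel under errors.Is, whereas a plain == comparison does not. This is a self-contained sketch; ErrFleetServerNotFound, kibanaDefaultFleetServerURL, projectDefaultFleetServerURL, and shouldRetry are hypothetical stand-ins (not the real elastic-package code), and the wrapping message is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFleetServerNotFound is a hypothetical local copy of kibana.ErrFleetServerNotFound.
var ErrFleetServerNotFound = errors.New("fleet server not found")

// kibanaDefaultFleetServerURL is a hypothetical stand-in for the Kibana client call:
// it returns the sentinel error when no default Fleet Server host is configured.
func kibanaDefaultFleetServerURL(hosts []string) (string, error) {
	if len(hosts) == 0 {
		return "", ErrFleetServerNotFound
	}
	return hosts[0], nil
}

// projectDefaultFleetServerURL is a hypothetical stand-in for project.DefaultFleetServerURL.
// Wrapping with %w adds context while keeping the sentinel reachable via errors.Is.
func projectDefaultFleetServerURL(hosts []string) (string, error) {
	url, err := kibanaDefaultFleetServerURL(hosts)
	if err != nil {
		return "", fmt.Errorf("failed to query fleet servers: %w", err)
	}
	return url, nil
}

// shouldRetry reports whether the error only means "no default Fleet Server yet".
func shouldRetry(err error) bool {
	return errors.Is(err, ErrFleetServerNotFound)
}

func main() {
	_, err := projectDefaultFleetServerURL(nil)
	fmt.Println(err)                           // failed to query fleet servers: fleet server not found
	fmt.Println(shouldRetry(err))              // true
	fmt.Println(err == ErrFleetServerNotFound) // false: == does not unwrap
}
```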
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/561
```go
		logger.Debug("Fleet Server found")
		return true, nil
	}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)
	if fleetServerURL == "" {
		return Config{}, fmt.Errorf("failed to get fleet URL: %w", err)
	}
```
Should we include this check inside the retry logic? Or is an empty URL something that retrying would never fix?
```diff
-		logger.Debug("Fleet Server found")
-		return true, nil
-	}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)
-	if fleetServerURL == "" {
-		return Config{}, fmt.Errorf("failed to get fleet URL: %w", err)
-	}
+		if fleetServerURL == "" {
+			return false, fmt.Errorf("failed to get fleet URL: %w", err)
+		}
+		logger.Debug("Fleet Server found")
+		return true, nil
+	}, sp.retriesDefaultFleetServerPeriod, sp.retriesDefaultFleetServerTimeout)
```
IIUC, if the check inside the retry logic returns an error, the retry loop stops without finding the actual Fleet Server:
elastic-package/internal/wait/wait.go
Line 23 in 3cbb744
In this retry loop, a valid response is quite likely to contain no Fleet Server URL yet, so we need to keep retrying in that scenario.
internal/stack/serverless.go
Outdated
```diff
-	config.Parameters[ParamServerlessFleetURL], err = project.DefaultFleetServerURL(ctx, sp.kibanaClient)
-	if err != nil {
+	fleetServerURL := ""
+	wait.UntilTrue(ctx, func(ctx context.Context) (bool, error) {
```
We should check the error returned by wait.UntilTrue.
Right, I missed that check.
I'll add it.
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/562
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/563
💚 Build Succeeded
cc @mrodm
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/567
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/568
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/569
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/570
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/573
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/574
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/575
test serverless
Triggered serverless pipeline: https://buildkite.com/elastic/elastic-package-test-serverless/builds/576
Closes #2902
While creating Elastic stacks using the serverless provider, the `elastic-package stack up` command sometimes fails to find the Fleet Server URL. This can be observed especially in the daily CI jobs. For instance:
The Kibana client already retries failed requests, but a successful response may still not include a default Fleet Server. This PR therefore adds retries to the query that gets the default Fleet Server.
Example of this new wait process (buildkite link):