AWS SD: ECS Discover Standalone Tasks #18029

Merged
sysadmind merged 1 commit into prometheus:main from matt-gp:ecs-sd-bug-17920
Feb 11, 2026

@matt-gp matt-gp commented Feb 6, 2026

The current ECS role in AWS SD assumes that every task belongs to a service. As a result, standalone tasks, such as those started by AWS Batch, are missed and never discovered.

This change reworks the ECS role so that it discovers ECS tasks regardless of whether they are part of a service.
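
To illustrate the difference, here is a minimal sketch, not the code from this PR. It assumes a simplified, hypothetical `task` record in which an empty `Service` field marks a standalone task, and it contrasts service-first discovery with listing every task in the cluster. The function names and ARNs are placeholders.

```go
package main

import "fmt"

// task is a simplified, hypothetical stand-in for an ECS task record.
// An empty Service means the task is standalone (for example, an AWS Batch job).
type task struct {
	ARN     string
	Service string
}

// tasksViaServices mimics service-first discovery: only tasks that belong
// to one of the listed services are returned, so standalone tasks are lost.
func tasksViaServices(all []task, services []string) []task {
	known := make(map[string]bool, len(services))
	for _, s := range services {
		known[s] = true
	}
	var found []task
	for _, t := range all {
		if known[t.Service] {
			found = append(found, t)
		}
	}
	return found
}

// tasksViaCluster mimics cluster-first discovery: every task in the cluster
// is returned, whether or not it has a service.
func tasksViaCluster(all []task) []task {
	return append([]task(nil), all...)
}

func main() {
	all := []task{
		{ARN: "task/web-1", Service: "web"},
		{ARN: "task/web-2", Service: "web"},
		{ARN: "task/batch-job-1"}, // standalone task
	}
	fmt.Println("via services:", len(tasksViaServices(all, []string{"web"})))
	fmt.Println("via cluster: ", len(tasksViaCluster(all)))
}
```

In this sketch the service name becomes optional metadata on a task rather than the key used to find it.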

This change also optimises some of the external calls so that they are made once per cluster instead of once per task. These calls now use pagination and/or batching wherever possible, so that discovery can handle situations where there are a lot of resources to discover. Finally, the custom batching function is removed in favour of the slices.Chunk function from the standard library.

The tests have also been updated to cover the scenario where a task is standalone and not part of a service.

Which issue(s) does the PR fix:

#17920

Does this PR introduce a user-facing change?

NONE

@matt-gp matt-gp requested review from a team and sysadmind as code owners February 6, 2026 10:24
@matt-gp matt-gp marked this pull request as draft February 6, 2026 10:24
@matt-gp matt-gp force-pushed the ecs-sd-bug-17920 branch 27 times, most recently from c7754ff to 2b550d2 February 7, 2026 21:52
@matt-gp matt-gp force-pushed the ecs-sd-bug-17920 branch 7 times, most recently from a931fff to 7375fc1 February 8, 2026 15:03
The current ECS role in AWS SD assumes that a task is part of a service.
This means that tasks that are started as part of AWS Batch will get
missed and not be discovered. This change fixes this so that standalone
tasks can be discovered as well.

Signed-off-by: matt-gp <small_minority@hotmail.com>
@matt-gp matt-gp marked this pull request as ready for review February 10, 2026 10:58
matt-gp commented Feb 10, 2026

@SuperQ @sysadmind Would you be able to review?

This one is quite big, but it fundamentally changes how the ECS discovery works and adds quite a lot of optimisations.

@sysadmind
Contributor

Looks like GitHub now supports reviewing individual commits - that would be helpful in a case like this to break down each of these functions into a very easy to review piece. Hopefully the new UI helps with this.

@sysadmind sysadmind merged commit 68df59b into prometheus:main Feb 11, 2026
32 checks passed
@matt-gp matt-gp deleted the ecs-sd-bug-17920 branch February 11, 2026 09:09
clusterMap := make(map[string]types.Cluster)
errg, ectx := errgroup.WithContext(ctx)
errg.SetLimit(d.cfg.RequestConcurrency)
for batch := range slices.Chunk(clusters, 100) {

Can you make a const ecsBatchSize = 100 just for cleanliness?

@SuperQ SuperQ left a comment

Seems mostly reasonable, just minor nit about the 100 constant.
