Open
Description
List of issues we want to be fixed or planned before advertising Testground EKS support. This is a follow-up to #1499
Contributes to Milestone #1529
Tasks
Some issues were already broken in the previous k8s cluster, and we want to fix or implement them carefully.
Others are simple quick fixes.
- Re-enable the k8s tests
Lines 53 to 55 in 8629aa2
    # SYNC_SERVICE_HOST=localhost ./integration_tests/01_k8s_kind_placebo_ok.sh
    # SYNC_SERVICE_HOST=localhost ./integration_tests/02_k8s_kind_placebo_stall.sh
    echo these tests where disabled temporarily (https://github.com/testground/testground/pull/1515)
- Review this task list and clarify each task; ideally create issues with clear tests & deliverables
- Error while fetching a lot of logs in k8s runner #1450
- metrics
- we have a configuration, but we don't know why it is not enabled as it was before (Testing testground/testground-infra#78 #1518 (comment))
- It's not enabled by default
- Testing testground/testground-infra#78 #1518 (comment)
- InfluxDB (testground SDK)
- Prometheus / Grafana (pods)
- we have metrics from the testground daemon machine
- we don't have graphs
- we don't have metrics from the worker machines.
- fix the persistentVolumeClaims error
- fix networking policy (find a "better" approach)
- fix measurement errors (this was broken before EKS)
- Fix the deploy script redirect to isolate stdout and stderr
- Identify limitations due to IPv6
- Fix the CPU & memory measure: eks: Implement deploy script and guide for EKS infra#78 (comment)
- Fix the dashboard links: eks: Implement deploy script and guide for EKS infra#78 (comment)
- Fix the cleanup: eks: Implement deploy script and guide for EKS infra#78 (comment)
- Resource constraints on the daemon: eks: Implement deploy script and guide for EKS infra#78 (comment)
- why do we use this arbitrary value?
- verify how new test instances impact the daemon's resource usage (do we need to scale its limits with the size of the cluster?)
- Investigate how we upgrade multus: eks: Implement deploy script and guide for EKS infra#78 (comment)
- Move the CI to publish on ECR: Migrate Docker images from DockerHub to AWS ECR on AWS Testground #1295
- Why is the testground goproxy failing "some" of the time?
- How do we get logs from the testground goproxy container? (We couldn't get them during the session.)
- Why is testground detecting errors with Grafana/Prometheus/Redis?
- Implement a "real" automated test for the sysctl configuration (https://github.com/testground/infra/blob/4a3f62510244ea2d7e245a045d49fb0368e25507/k8s/eks/bash/functions.sh#L340)
- Pod scheduling issue: eks: Implement deploy script and guide for EKS infra#78 (comment)
- Complete the monitoring work
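
For the "isolate stdout and stderr" task above, here is a minimal sketch of the kind of redirect we might want. It is not taken from the actual deploy script: `deploy_step` and the log file names are hypothetical stand-ins, and the real script may need a different layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a deploy step (not from the real script):
# progress goes to stdout, diagnostics go to stderr.
deploy_step() {
  echo "cluster created"
  echo "warning: retrying request" >&2
}

# Assumed location for the logs; the real script may use a fixed path.
log_dir="$(mktemp -d)"

# Send each stream to its own file so errors are not interleaved
# with regular progress output.
deploy_step >"$log_dir/deploy.out.log" 2>"$log_dir/deploy.err.log"

echo "stdout log: $(cat "$log_dir/deploy.out.log")"
echo "stderr log: $(cat "$log_dir/deploy.err.log")"
```

Keeping the two streams in separate files makes it easier to tell whether a run failed or only printed progress, at the cost of losing the relative ordering between the two streams.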