What happened?
The memory usage is observed with container_memory_working_set_bytes.
Before 1.30:
After upgrading to 1.30:
We haven't changed anything related to Kafka configuration in the meantime. Version used: quay.io/strimzi/kafka:0.33.0-kafka-3.2.0.
The problem with the new behaviour is that we sometimes get NodeHasInsufficientMemory, which means Kafka takes longer to recover.
The change in behaviour is present in other Java applications like Cassandra as well.
What did you expect to happen?
I'd expect the cache to fill the available memory and usage to stay near the limit, as it did before 1.30.
How can we reproduce it (as minimally and precisely as possible)?
Run a Kafka cluster on 1.29 and on 1.30. Kafka will always fill all the memory it can (up to either the container memory limit or the node memory limit).
On 1.30 you will see a recurring pattern of cache memory being cleared.
Anything else we need to know?
This is happening with multiple Kafka clusters. Some of them run on cgroup v1 nodes and some on cgroup v2 nodes.
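For context on the metric used above: as far as I understand it, container_memory_working_set_bytes is derived by cAdvisor as the cgroup memory usage minus the inactive file cache (total_inactive_file on cgroup v1, inactive_file on cgroup v2), floored at zero. Below is a minimal sketch of that calculation. The function names and the sample numbers are made up for illustration, and this is not the actual cAdvisor code.

```python
def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, memory_stat_text: str) -> int:
    """Approximate the working set: usage minus inactive file cache.

    Assumption: cgroup v1 exposes 'total_inactive_file' and cgroup v2
    exposes 'inactive_file'; whichever key is present is used.
    """
    stats = parse_memory_stat(memory_stat_text)
    inactive = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    # The result is floored at zero so it never goes negative.
    return max(usage_bytes - inactive, 0)


if __name__ == "__main__":
    # Hypothetical cgroup v2 sample, not taken from the affected nodes.
    sample = "anon 1000\nfile 5000\ninactive_file 3000\nactive_file 2000\n"
    print(working_set_bytes(6000, sample))
```

Because only inactive file pages are subtracted, active page cache still counts towards the working set, which is why a cache-heavy workload such as Kafka can show a working set close to its limit.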
Kubernetes version
Cloud provider
OS version
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=395e8b40dd8bc3fe97fa563ffa370c25bd1da560
GOOGLE_CRASH_ID=Lakitu
VERSION=113
VERSION_ID=113
BUILD_ID=18244.151.27
$ uname -a
Linux 6.1.100+ #1 SMP PREEMPT_DYNAMIC Sat Aug 24 16:19:44 UTC 2024 x86_64 AMD EPYC 7B13 AuthenticAMD GNU/Linux