
KAFKA-9123 Test a large number of replicas #7621

Merged
mumrah merged 30 commits into apache:trunk from mumrah:KAFKA-9123-lots-of-replicas on Nov 23, 2019

@mumrah (Member) commented Oct 31, 2019

This PR adds two new system tests to exercise the system with a large number of replicas.

The first test creates 500 topics, each with 34 partitions and a replication factor of 3, for a total of 51,000 replicas. These are spread across 8 brokers, which works out to 6,375 replicas per broker. Once the topics have been created and verified, a controlled shutdown of each broker is performed. Finally, each of the topics is deleted.

The other test runs produce and consume benchmark utilities against a small number of topics.
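As a quick sanity check, the replica counts quoted in the description can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the per-broker figure assumes replicas are spread evenly across the brokers.

```python
# Figures taken from the PR description.
topic_count = 500
partition_count = 34
replication_factor = 3
broker_count = 8

# Each partition is stored replication_factor times.
total_replicas = topic_count * partition_count * replication_factor

# Assumption: replicas are distributed evenly across the brokers.
replicas_per_broker = total_replicas // broker_count

print("total replicas: %d" % total_replicas)            # total replicas: 51000
print("replicas per broker: %d" % replicas_per_broker)  # replicas per broker: 6375
```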

Comment thread tests/kafkatest/tests/core/replica_scale_test.py Outdated

@soondenana (Contributor) left a comment

Hey David, do you plan to bump the number of topics created to 100K in the two tests? Currently they are 10 and 1000.

Comment thread tests/kafkatest/services/kafka/kafka.py Outdated

@hachikuji (Contributor) left a comment

Thanks, left a few small comments.

Comment thread tests/kafkatest/services/kafka/kafka.py Outdated
Comment thread tests/kafkatest/services/kafka/kafka.py Outdated
kafka_topic_script = self.path.script("kafka-topics.sh", node)

cmd = kafka_topic_script + " "
cmd += "--zookeeper %(zk_connect)s --delete --topic %(topic)s " % {

Contributor

Perhaps we may as well use --bootstrap-server since delete_topic is a new addition?
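A minimal sketch of what that suggestion could look like, following the string-building style of the hunk above. The helper name `build_delete_topic_cmd`, the script path, and the broker address are placeholders for illustration and are not taken from the merged code; `--bootstrap-server`, `--delete`, and `--topic` are existing `kafka-topics.sh` options.

```python
def build_delete_topic_cmd(kafka_topic_script, bootstrap_servers, topic):
    """Build a kafka-topics.sh command line that deletes one topic via the brokers.

    Hypothetical helper for illustration only; not the code that was merged.
    """
    cmd = kafka_topic_script + " "
    cmd += "--bootstrap-server %(bootstrap_servers)s --delete --topic %(topic)s" % {
        "bootstrap_servers": bootstrap_servers,
        "topic": topic,
    }
    return cmd


# Placeholder path and broker address, chosen only for this example.
cmd = build_delete_topic_cmd("/opt/kafka/bin/kafka-topics.sh", "broker1:9092", "topic-0000")
print(cmd)
```

Going through the brokers rather than ZooKeeper keeps the new helper off the older `--zookeeper` code path, which seems to be the point of the review comment.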

@cluster(num_nodes=12)
@parametrize(topic_count=500, partition_count=34, replication_factor=3)
def test_produce_consume(self, topic_count, partition_count, replication_factor):
    t0 = time.time()

Contributor

nit: maybe use meaningful names? e.g. topic_creation_start

Even better would be to add some kind of timed function

Member Author

Do we have any timer utilities in the python tests?

Contributor

I don't know of any, but I haven't checked. I thought it might be straightforward to add one, but it's up to you.
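For reference, a timing helper of the kind discussed in this thread could be a small context manager. The sketch below is an assumption about what such a utility might look like, not something from the kafkatest code base; the name `timed` and its `log` parameter are made up for the example, and it targets Python 3.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Report how long the enclosed block took, in seconds.

    Hypothetical helper; `label` names the step and `log` is any
    callable that accepts a single string.
    """
    start = time.time()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed steps are still timed.
        elapsed = time.time() - start
        log("%s took %.3f seconds" % (label, elapsed))


# Usage: collect the message in a list instead of printing it.
messages = []
with timed("topic creation", log=messages.append):
    time.sleep(0.01)  # stand-in for the real work
```

In a test, something like `self.logger.info` could be passed as `log`, which would replace the manual `t0 = time.time()` bookkeeping with a labeled block.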

    "replication-factor": replication_factor,
    "configs": {"min.insync.replicas": 1}
}
self.kafka.create_topic(topic_cfg, describe=False)

Contributor

Not for this patch, but we should do a KIP to add support for batch topic creation.

@hachikuji (Contributor) left a comment

LGTM, assuming tests pass. Just had one question.

for i in range(topic_count):
    topic = "topic-%04d" % i
    self.logger.info("Deleting topic %s" % topic)
    self.kafka.delete_topic(topic)

Contributor

Adding batch deletion here would be useful also. This has caused problems in Kafka previously.

@mumrah (Member, Author) commented Nov 22, 2019

Yes, definitely. Exposing the batch create/delete through kafka-topics.sh would be nice and help out a lot in the tests (since our kafka fixture uses the CLIs).

Comment thread tests/kafkatest/tests/core/replica_scale_test.py Outdated

@hachikuji (Contributor) left a comment

LGTM

@mumrah (Member, Author) commented Nov 23, 2019

@mumrah merged commit b15e05d into apache:trunk on Nov 23, 2019

mumrah added a commit to mumrah/kafka that referenced this pull request on Nov 26, 2019:
Two tests using 50k replicas on 8 brokers:
* Do a rolling restart with clean shutdown, delete topics
* Run produce bench and consumer bench on a subset of topics

Reviewed-By: David Jacot <djacot@confluent.io>, Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io>

4 participants