Skip to content

KAFKA-2373: Add Kafka-backed offset storage for Copycat.#202

Closed
ewencp wants to merge 8 commits into
apache:trunkfrom
ewencp:kafka-2373-copycat-distributed-offset
Closed

KAFKA-2373: Add Kafka-backed offset storage for Copycat.#202
ewencp wants to merge 8 commits into
apache:trunkfrom
ewencp:kafka-2373-copycat-distributed-offset

Conversation

@ewencp

@ewencp ewencp commented Sep 10, 2015

Copy link
Copy Markdown
Contributor

No description provided.

@ewencp

ewencp commented Sep 10, 2015

Copy link
Copy Markdown
Contributor Author

@gwenshap Sorry, this turned out a bit bigger than intended because some of the MockConsumer stuff was incomplete, but much more useful than trying to use an EasyMock object.

One issue with the current patch is that it is only unit tested. Two additional JIRAs (2374, 2375) address other components necessary for the full distributed version. Until then, it doesn't make much sense to have more extensive tests. I did, however, manually verify by changing the Worker to use this implementation then running the system tests using that version. Not sure if we want something more intermediate until the other two patches are in place or if we should just leave it to the last one to integrate them all and test them end-to-end.

@asfbot

asfbot commented Sep 10, 2015

Copy link
Copy Markdown

kafka-trunk-git-pr #386 SUCCESS
This pull request looks good

Comment thread checkstyle/import-control.xml Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate line

@asfbot

asfbot commented Sep 10, 2015

Copy link
Copy Markdown

kafka-trunk-git-pr #388 SUCCESS
This pull request looks good

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean you manually assign this consumer to read all partitions? So the consumer group id doesn't matter or is not used? I couldn't see where a consumer group was being set.

That means that each instance of this code consumes the entire topic, right? Which is exactly what you want.

I ask because I have many use cases where I want a consumer to get all partitions. We currently do it by trying to create unique consumer group ids, but that is kind of annoying.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are using this in simple consumer mode. I initially started with the approach you're describing. I didn't actually want a consumer group since each consumer reads the entire topics, we don't need offset commits, etc. However, one drawback with this approach is that it doesn't automatically pick up changes in the # of partitions in the topic.

However, I think this shouldn't be a problem anyway because if you wan to use a compacted topic for this (which should be reasonable), you can't just change the # of partitions since it'll break the key -> partition mapping.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this seems much nicer.

You said "simple consumer mode". So this is the New Consumer in simple consumer mode, correct?

Will the New Consumer automatically handle rebalances due to leader failover (broker failure where the partition leader changes), even when in simple consumer mode? Because that was a downside to the previous Simple Consumer -- you had to handle leadership changes on your own.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new consumer, if you specify what to consume by using assign(), it consumes from those topic partitions and doesn't use any of the group management features. This only makes sense if you want a single consumer instance to see everything. Everything related to which broker you need to be fetching data for should be handled for you. If you use subscribe(), then you join your consumer group and are assigned a subset of the topic partitions by the consumer coordinator. All the consumer group rebalancing is handled automatically in that case.

@ewencp

ewencp commented Sep 14, 2015

Copy link
Copy Markdown
Contributor Author

@gwenshap Updated with some issues after some updates to trunk. Also added some basic system tests. They are very close to the standalone tests for now. As we add the other components we can improve them to do a better job of sanity checking the distributed mode.

Also, if you can keep a careful eye on the MockConsumer/MockConsumerTest that'd be helpful. I noticed what appeared to be an error with how it was reporting offsets, and I want to make sure we get those semantics right. MockConsumer isn't much use if it doesn't actually match KafkaConsumer behavior....

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your IDE did this automatically and we normally stick to the no wildcard rule?

@ewencp ewencp force-pushed the kafka-2373-copycat-distributed-offset branch from b3eacbe to 04bcc1c Compare September 23, 2015 01:35
@asfbot

asfbot commented Sep 23, 2015

Copy link
Copy Markdown

kafka-trunk-git-pr #496 FAILURE
Looks like there's a problem with this pull request

@ewencp

ewencp commented Sep 23, 2015

Copy link
Copy Markdown
Contributor Author

@gwenshap Rebased to fix some conflicts after (I think) resolving all your comments. Tests pass locally, and we'll see what @asfbot says. I'm running system tests here but don't expect any issues since the final fixes were pretty minimal. Anything else you want addressed before commit?

@gwenshap

Copy link
Copy Markdown
Contributor

Looks like there are some conflicts now :(

Mind rebasing again?

LGTM otherwise.

@asfgit asfgit closed this in 48b4d69 Sep 25, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants