Skip to content

Refactor TMTTableFlusher#2

Merged
flowbehappy merged 1 commit intoraftfrom
region-flush-refactor
Feb 22, 2019
Merged

Refactor TMTTableFlusher#2
flowbehappy merged 1 commit intoraftfrom
region-flush-refactor

Conversation

@flowbehappy
Copy link
Contributor

@flowbehappy flowbehappy commented Feb 18, 2019

This PR asynchronize the flush process from raft log apply process.

@flowbehappy flowbehappy force-pushed the region-flush-refactor branch from 242f51e to 46b8d25 Compare February 20, 2019 07:22
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say for a specific region r1 containing two tables t1 and t2, a raft command put 10 bytes of t1 and 10 bytes of t2 into r1. Then delta here would be 20 bytes, and would be added to relative tables (both t1 and t2). That is not accurate. Am I correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct. But actually we don't need to be 100% percent accurate. And a region belonging to more than one table is very rare.

Copy link
Contributor

@zanmato1984 zanmato1984 Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about let Region::onCommand return table id and delta bytes pair and pass into updateRegion? So that in updateRegion we could know how many bytes each table changes. This doesn't require much code I assume, but I could be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can theoretically, but I still don't think it is necessary to make the logic more complicated. It doesn't loss anything even if we flush a partition by mistake. And again this situation is very rare.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought this wouldn't take much code, and the logic could be quite less confusing, so worth doing. But it's your call after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should add some documents to help reducing confuses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good too.

Copy link
Contributor

@zanmato1984 zanmato1984 Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situation will the storage be null? Does it imply some potential risks that we should do more than simply log and ignore (return), like throwing an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed we should throw an exception. Some potential bugs may cause this I guess. Although haven't found it happen yet.

Copy link
Contributor

@zanmato1984 zanmato1984 Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw updateRegion was after Region::onCommand. What if background flush happens in between? More specifically: 1. some data was inserted into a region; 2. background flush grabbed the mutex in RegionPartition and flushed the data of the region (time limit reached for example); 3. updateRegion then executed, it saw an empty region with wrong update arguments (before_cache_bytes for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The risk does exist. And it is one of the many multi-thread issues in our system. I would like to file an issue and fix it until we have a better global design. https://internal.pingcap.net/jira/browse/FLASH-132

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@zanmato1984
Copy link
Contributor

LGTM as long as comments addressed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conflict with 42c6a91

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants