-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Add rowset id generator to FE and BE #1678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| OLAPStatus FeBasedRowsetIdGenerator::get_next_id(RowsetId* gen_rowset_id) { | ||
| std::lock_guard<std::mutex> l(_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::mutex is too heavy for get next id. In most cases, it is a only add operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| import com.google.common.collect.Maps; | ||
|
|
||
| public class RowsetIdGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we name it a PersistedIdGernator? I think it can not only be used for RowsetId, but also be used for other id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could not. because rowsetid generator could have many implementations, for example, we could scan all rowset id in storage engine at start time and then generate rowset it one by one. In this case it is not persistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Id generator logic, we can use a class.
Then you can let RowsetIdGenerator contain it.
gensrc/thrift/FrontendService.thrift
Outdated
| // (start_rowset_id, end_rowset_id) not include start and end | ||
| struct TGetRowsetIdResponse { | ||
| 1: optional i64 start_rowset_id = -1 | ||
| 2: optional i64 end_rowset_id = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer [start, end) with start_id and id_count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
f94cb37 to
7f14f10
Compare
|
Can anyone explain what's the purpose of this PR and give a brief description of what's changed? |
efbc29d to
0d3988a
Compare
| vector<string> error_msgs; | ||
| TStatusCode::type status_code = TStatusCode::OK; | ||
| return_value.__set_snapshot_version(PREFERRED_SNAPSHOT_VERSION); | ||
| int32_t return_snapshot_version = PREFERRED_SNAPSHOT_VERSION; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some comment here to describe the version meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
be/src/olap/storage_engine.h
Outdated
|
|
||
| bool rowset_id_in_use(RowsetId& rowset_id) { return _rowset_id_generator->id_in_use(rowset_id); }; | ||
|
|
||
| void release_rowset_id(RowsetId& rowset_id) { return _rowset_id_generator->release_id(rowset_id); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some comments here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| std::mutex _lock; | ||
| RowsetId _next_id = -1; | ||
| RowsetId _id_batch_end = -1; | ||
| virtual void release_id(RowsetId& rowset_id) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment to describe the api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // generator a id according to data dir | ||
| // rowsetid is not globally unique, it is dir level | ||
| // it saves the batch end id into meta env | ||
| OLAPStatus next_id(RowsetId* rowset_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| OLAPStatus next_id(RowsetId* rowset_id); | |
| OLAPStatus next_id(RowsetId* rowset_id) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // it saves the batch end id into meta env | ||
| OLAPStatus next_id(RowsetId* rowset_id); | ||
|
|
||
| bool id_in_use(RowsetId& rowset_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| bool id_in_use(RowsetId& rowset_id); | |
| bool id_in_use(const RowsetId& rowset_id) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| lo = (id_version << 56) + (low & LOW_56_BITS); | ||
| } | ||
|
|
||
| std::string to_string() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to_string and serialize and deserialize api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_string is ok, because it is a string value in pb.
be/src/olap/rowset/beta_rowset.cpp
Outdated
|
|
||
| namespace doris { | ||
|
|
||
| std::string BetaRowset::segment_file_path(const std::string& dir, RowsetId rowset_id, int segment_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::string BetaRowset::segment_file_path(const std::string& dir, RowsetId rowset_id, int segment_id) { | |
| std::string BetaRowset::segment_file_path(const std::string& dir, const RowsetId& rowset_id, const int segment_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| return _valid_rowset_ids.find(rowset_id) != _valid_rowset_ids.end(); | ||
| } | ||
|
|
||
| void UniqueRowsetIdGenerator::release_id(RowsetId& rowset_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| void UniqueRowsetIdGenerator::release_id(RowsetId& rowset_id) { | |
| void UniqueRowsetIdGenerator::release_id(const RowsetId& rowset_id) { |
| } | ||
|
|
||
|
|
||
| TEST_F(UniqueRowsetIdGeneratorTest, GenerateIdTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a case to test BE restart situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // options | ||
| doris::EngineOptions options; | ||
| options.store_paths = paths; | ||
| options.backend_uid = doris::UniqueId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is backend_uid act like a random seed that will be generated every time BE started?
If so, I think the name is confused that make people think of it as the ID of a BE?
Maybe random_seed is better?
| static const uint16_t OLAP_STRING_MAX_LENGTH = 65535; | ||
|
|
||
| static const int32_t PREFERRED_SNAPSHOT_VERSION = 2; | ||
| static const int32_t PREFERRED_SNAPSHOT_VERSION = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some comment to explain the version 1 , 2, and 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| request.setVersion_hash(versionHash); | ||
| request.setList_files(true); | ||
| request.setPreferred_snapshot_version(2); | ||
| request.setPreferred_snapshot_version(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can make this a global variable? Its better not to write magic number
| @@ -0,0 +1,111 @@ | |||
| // Licensed to the Apache Software Foundation (ASF) under one | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should add this file to run-ut.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
be/src/olap/storage_engine.h
Outdated
| std::unique_ptr<TabletManager> _tablet_manager; | ||
| std::unique_ptr<TxnManager> _txn_manager; | ||
|
|
||
| RowsetIdGenerator* _rowset_id_generator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should delete it in destructor or use std::unique_ptr
|
|
||
| namespace doris { | ||
|
|
||
| UniqueRowsetIdGenerator::UniqueRowsetIdGenerator(UniqueId backend_uid) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| UniqueRowsetIdGenerator::UniqueRowsetIdGenerator(UniqueId backend_uid) : | |
| UniqueRowsetIdGenerator::UniqueRowsetIdGenerator(const UniqueId& backend_uid) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| _backend_uid(backend_uid), _inc_id(1) { | ||
| } | ||
| // generator a id according to data dir | ||
| // rowsetid is not globally unique, it is dir level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rowset id is global unique
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
594ee87 to
218e07e
Compare
Fix bug Compile success Build unit test successfully Rowsetid version error Add rowsetid gc logic Fix bug Add unique rowset id unit test
imay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Fix BE UT failure introduced by apache#1526 in TabletSchemaCache (apache#1661) * Fix be ut * Revert "Fix BE UT failure introduced by apache#1526 in TabletSchemaCache (apache#1647)" This reverts commit bca5d32. * (selectdb-cloud) Modify debug checks for detached schema kv (apache#1678)
No description provided.