Conversation
force-pushed from 264726d to a910cf2
force-pushed from 633fc44 to 780208f
This commit allows soft deletion of a database using the `influxdb3 database delete <db_name>` command. The write buffer and last value cache are cleared as well.

closes: #25523
force-pushed from 8a265cc to 8f0c25a
- In the previous commit, deleting a database immediately triggered clearing the last cache and query buffer, but the same logic had to be repeated on restart to handle databases deleted before shutdown. This commit removes the immediate deletion (explicitly calling the necessary methods) and moves the logic into `apply_catalog_batch`, which already applies each `CatalogOp`; the cache and buffer are cleared in the `buffer_ops` method, which has hooks into the other places.

closes: #25523
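A minimal sketch of that shape, with hypothetical names standing in for the real influxdb3 types: the catalog applies every `CatalogOp` through one path, and a buffer-side hook reacts to the same ops, so live deletes and startup replay share the same logic.

```rust
// Sketch of routing all catalog mutations through one apply path, so that
// restart replay and live requests share the delete logic.
// Names are illustrative, not the actual influxdb3 code.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum CatalogOp {
    CreateDatabase(String),
    DeleteDatabase(String),
}

#[derive(Default)]
struct Catalog {
    // name -> deleted flag (soft delete keeps the entry around)
    databases: HashMap<String, bool>,
}

#[derive(Default)]
struct WriteBuffer {
    cleared_for: Vec<String>,
}

impl Catalog {
    // Single entry point: both startup replay and live commands call this.
    fn apply_catalog_batch(&mut self, ops: &[CatalogOp]) {
        for op in ops {
            match op {
                CatalogOp::CreateDatabase(name) => {
                    self.databases.insert(name.clone(), false);
                }
                CatalogOp::DeleteDatabase(name) => {
                    // Soft delete: mark, don't remove.
                    if let Some(deleted) = self.databases.get_mut(name) {
                        *deleted = true;
                    }
                }
            }
        }
    }
}

impl WriteBuffer {
    // buffer_ops-style hook: reacts to the same ops to clear buffers/caches.
    fn buffer_ops(&mut self, ops: &[CatalogOp]) {
        for op in ops {
            if let CatalogOp::DeleteDatabase(name) = op {
                self.cleared_for.push(name.clone());
            }
        }
    }
}

fn main() {
    let mut catalog = Catalog::default();
    let mut buffer = WriteBuffer::default();
    let ops = vec![
        CatalogOp::CreateDatabase("db1".to_string()),
        CatalogOp::DeleteDatabase("db1".to_string()),
    ];
    catalog.apply_catalog_batch(&ops);
    buffer.buffer_ops(&ops);
    assert_eq!(catalog.databases.get("db1"), Some(&true));
    assert_eq!(buffer.cleared_for, vec!["db1".to_string()]);
    println!("soft delete applied and buffer cleared");
}
```

Because both code paths go through `apply_catalog_batch`, there is no second copy of the delete logic to keep in sync on restart.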
force-pushed from 8f0c25a to 2b729e4
hiltontj left a comment:
This mostly looks good. The only comment I think you need to address is adding the `deleted` field to the `DatabaseSnapshot` struct, to ensure the `deleted` field round-trips to and from JSON when the catalog is serialized. Once that is in and you fix the compiler warnings / insta snapshots, I will approve.
The other comments are up to you if you want to address them.
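For context, a minimal illustration of the round-trip concern (illustrative types, not the actual influxdb3_catalog code): if the intermediate snapshot struct omits `deleted`, the flag is silently lost when the catalog is serialized and reloaded, so the snapshot type must carry it too.

```rust
// Sketch: the schema type is converted through a snapshot type for
// (de)serialization, so every field that must survive persistence has to
// exist on the snapshot as well. Hypothetical types, not the real code.

struct DatabaseSchema {
    name: String,
    deleted: bool,
}

// The serialized form; in the real code the serde derives live on this type.
struct DatabaseSnapshot {
    name: String,
    deleted: bool, // without this field, `deleted` is lost on a round-trip
}

impl From<&DatabaseSchema> for DatabaseSnapshot {
    fn from(db: &DatabaseSchema) -> Self {
        Self { name: db.name.clone(), deleted: db.deleted }
    }
}

impl From<DatabaseSnapshot> for DatabaseSchema {
    fn from(snap: DatabaseSnapshot) -> Self {
        Self { name: snap.name, deleted: snap.deleted }
    }
}

fn main() {
    let db = DatabaseSchema { name: "db1".into(), deleted: true };
    // Round-trip through the snapshot type, as (de)serialization would.
    let restored = DatabaseSchema::from(DatabaseSnapshot::from(&db));
    assert!(restored.deleted, "deleted flag must survive the round-trip");
    println!("round-trip preserved deleted = {}", restored.deleted);
}
```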
influxdb3/tests/server/configure.rs (Outdated)
```rust
    .send()
    .await
    .expect("delete database call succeed");

assert_eq!(StatusCode::INTERNAL_SERVER_ERROR, resp.status());
```
Can we send back 404 NOT_FOUND in this case?
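A hedged sketch of the suggested mapping, with a plain enum and bare `u16` codes standing in for the server's real error and `StatusCode` types: a request naming a nonexistent database is a client error, so it should get 404 rather than a blanket 500.

```rust
// Illustrative only: map a "database not found" delete error to 404,
// reserving 500 for genuine server-side failures.

#[derive(Debug, PartialEq)]
enum DeleteError {
    DbNotFound,
    Internal,
}

fn status_code_for(err: &DeleteError) -> u16 {
    match err {
        DeleteError::DbNotFound => 404, // NOT_FOUND: caller named a missing db
        DeleteError::Internal => 500,   // INTERNAL_SERVER_ERROR: real failures
    }
}

fn main() {
    assert_eq!(status_code_for(&DeleteError::DbNotFound), 404);
    assert_eq!(status_code_for(&DeleteError::Internal), 500);
    println!("not-found maps to 404");
}
```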
```diff
 if let Some(db) = self.databases.get(&catalog_batch.database_id) {
-    let existing_table_count = db.tables.len();
-
-    if let Some(new_db) = db.new_if_updated_from_batch(catalog_batch)? {
-        let new_table_count = new_db.tables.len() - existing_table_count;
-        if table_count + new_table_count > Catalog::NUM_TABLES_LIMIT {
-            return Err(Error::TooManyTables);
-        }
-        let new_db = Arc::new(new_db);
-        self.databases.insert(new_db.id, Arc::clone(&new_db));
-        self.sequence = self.sequence.next();
-        self.updated = true;
-        self.db_map.insert(new_db.id, Arc::clone(&new_db.name));
+    if let Some(new_db) = DatabaseSchema::new_if_updated_from_batch(db, catalog_batch)? {
+        check_overall_table_count(db, &new_db, table_count)?;
+        self.upsert_db(new_db);
     }
 } else {
-    if self.databases.len() >= Catalog::NUM_DBS_LIMIT {
-        return Err(Error::TooManyDbs);
-    }
-
-    let new_db = DatabaseSchema::new_from_batch(catalog_batch)?;
-    if table_count + new_db.tables.len() > Catalog::NUM_TABLES_LIMIT {
-        return Err(Error::TooManyTables);
-    }
-
-    let new_db = Arc::new(new_db);
-    self.databases.insert(new_db.id, Arc::clone(&new_db));
-    self.sequence = self.sequence.next();
-    self.updated = true;
-    self.db_map.insert(new_db.id, Arc::clone(&new_db.name));
+    let new_db = self.check_db_count(catalog_batch, table_count)?;
+    self.upsert_db(new_db);
 }
```
IMHO the `upsert_db` method is nice here to DRY things up, but the logic captured in `check_overall_table_count` and `Self::check_db_count` is simple enough to keep inline. Furthermore, I don't think their names, as is, capture what they are doing very well.
Sure; I think `check_db_count` does a bit more: it creates the db and checks the table count again. I'll rejig it and see if you still think it should be inlined. Having said that, am I missing something in `check_overall_table_count`? I don't mind inlining, by the way; I just wanted to see if I've missed anything.
No, I don't think you missed anything; the logic is still good. I'm being pedantic, but I felt that moving the logic out to helpers that aren't otherwise re-used was overkill.
Yea, definitely. What I wanted to tease out was the table count check, as that gets used in both branches. I've kept the db check and creation inline now.
influxdb3_catalog/src/serialize.rs (Outdated)
```rust
// todo: check if it's right to default to false here,
// not sure where this is called
deleted: false,
```
Should be fine to add a `deleted` field on the `DatabaseSnapshot` struct, then base it off of that.
This bit wasn't too obvious when I saw the `From` impls. I see all the types have `..Snapshot` structs for serializing and deserializing; is that to avoid building a deserializer by hand? I guess with the partial (`..Snapshot`) structs you can use the derived `Deserialize` directly and then build the relevant fields for the target type manually, without a visitor impl. Is that the motivation?
Yeah, that about sums it up. The main motivation is the bi-directional maps that map from ID to name and vice versa. Those only need to be held in memory, because the information in the bi-directional map is contained in the main map, which maps the ID to the whole object and is serialized with the `SerdeVecMap`. There's no need to also serialize the bi-directional map and waste bytes.
Earlier on in the project, the motivation for the `_Snapshot` types was to capture the info that is contained in the arrow `Schema`, since we could not rely on its `Serialize`/`Deserialize` impls being stable.
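That design can be illustrated with a small sketch (hypothetical names, not the real catalog types): only the main ID-to-object map is persisted, and the ID-to-name map is rebuilt in memory when loading from a snapshot.

```rust
// Sketch of the motivation above: the id -> name map is derivable from the
// main id -> schema map, so only the latter is serialized; the snapshot
// conversion rebuilds the rest in memory. Illustrative names only.
use std::collections::HashMap;

struct DbSchema {
    name: String,
}

struct Catalog {
    // serialized (as a SerdeVecMap in the real code)
    databases: HashMap<u32, DbSchema>,
    // in-memory only; derived, never serialized
    db_map: HashMap<u32, String>,
}

// What actually hits disk: just the main map, as ordered pairs.
struct CatalogSnapshot {
    databases: Vec<(u32, String)>,
}

impl Catalog {
    fn from_snapshot(snap: CatalogSnapshot) -> Self {
        let databases: HashMap<u32, DbSchema> = snap
            .databases
            .into_iter()
            .map(|(id, name)| (id, DbSchema { name }))
            .collect();
        // Rebuild the bi-directional map instead of serializing it.
        let db_map = databases
            .iter()
            .map(|(id, db)| (*id, db.name.clone()))
            .collect();
        Self { databases, db_map }
    }
}

fn main() {
    let snap = CatalogSnapshot { databases: vec![(1, "db1".to_string())] };
    let catalog = Catalog::from_snapshot(snap);
    assert_eq!(catalog.db_map.get(&1).map(String::as_str), Some("db1"));
    assert!(catalog.databases.contains_key(&1));
    println!("db_map rebuilt from snapshot");
}
```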
Co-authored-by: Trevor Hilton <thilton@influxdata.com>
- `DatabaseSchema` serialization/deserialization is delegated to `DatabaseSnapshot`, so the `deleted` flag should be included in `DatabaseSnapshot` as well.
- insta test snapshots fixed

closes: #25523
force-pushed from d1a914d to f447bde
force-pushed from f447bde to 5fbf7b4
force-pushed from 5fbf7b4 to c7d62aa
This commit allows soft deletion of a database using `influxdb3 database delete <db_name>`. The write buffer and last value cache are cleared as well.

closes: #25523