Add DB_OBJECT_APPROXIMATE_SIZE and DB_OBJECT_APPROXIMATE_TOTAL_SIZE#4070
katzyn merged 2 commits into h2database:master
Conversation
I just wonder what the use of this function is at all? If it tries to determine ("precisely" or approximately) the disk space used by a DB object, shouldn't all pages related to it be included, not just the current version of the tree? On the other hand, the implementation has already become complex enough, so is it really worth it? Maybe we can drop it altogether instead. Looks to me like a perfect solution to a non-existent problem.
Such functions are used for data storage optimization. Simple queries can be written to get a size estimation for all tables and all indexes, to determine good candidates for further investigation. Sometimes disk space is mostly used by very unexpected tables or indexes. Disk space used by different versions isn't really important here. The current implementation returns very useful results, but it is horribly slow, so it would be nice to have an inaccurate but fast alternative version. Maybe separate semi-accurate and fast versions aren't needed, but I'm not sure that this fast version doesn't have any flaws; everything depends on data distribution.
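The kind of "simple query" described above could look like the following sketch, which ranks tables by total size (table plus indexes) via `INFORMATION_SCHEMA`. The exact argument order of `DB_OBJECT_TOTAL_SIZE` ('TABLE', schema, name) follows the H2 documentation, but verify it against your H2 version:

```sql
-- Rough top-10 disk consumers: each table including its indexes.
SELECT TABLE_SCHEMA, TABLE_NAME,
       DB_OBJECT_TOTAL_SIZE('TABLE', TABLE_SCHEMA, TABLE_NAME) AS TOTAL_BYTES
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TOTAL_BYTES DESC
FETCH FIRST 10 ROWS ONLY;
```

With the functions added by this PR, substituting `DB_OBJECT_APPROXIMATE_TOTAL_SIZE` gives the fast, inaccurate variant of the same query.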
I don't like my implementation of
@andreitokar: We at manticore have a very real use case and need for this, because at banks and insurance companies we run very large history tables without a clear retention policy -- naturally, you want to keep as much business data as possible until space constraints bring you back down to reality. So we regularly check the database for the top 10 largest objects (tables + indexes) and then start to prune. @katzyn: at least for us, large deviations do not matter; we only need an indicator of where to start pruning. A typical 20% : 80% approach (find the 20% of the tables consuming 80% of the space) is more than sufficient.
Massive updates have their own overhead, but this is a different problem and it isn't critical in many cases where H2 is used as a production database. For example, the size of one production database usually floats between 40 GiB after full defragmentation and 75 GiB during normal operation; this behavior is known and acceptable.

Situation 1. The sizes of the database, its backup copies, and especially its SQL dumps have increased significantly, and we know that it isn't a garbage collection problem. There are no tables with too many rows compared with our expectations, so what happened? Oh, table X has 30 GiB of data in it, but it has only 20,000 rows, what is it? Totally unexpected large values were put into

Situation 2. We need to decrease the overall size of a poorly designed database in an old large project, and currently there is no time to change everything, so we need to figure out the most important parts first. After that we can check what can be done with them.

These problems aren't database-specific; I used similar functions in PostgreSQL too.
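For investigations like the two situations above, it helps to see table data and index overhead side by side. A sketch of such a breakdown, assuming the 'TABLE' object type and (type, schema, name) argument order described in the H2 documentation:

```sql
-- Per-table breakdown: raw table data vs. table plus all its indexes.
-- A large gap between the two columns points at index bloat;
-- a large DATA_BYTES with few rows points at unexpectedly large values.
SELECT TABLE_SCHEMA, TABLE_NAME,
       DB_OBJECT_APPROXIMATE_SIZE('TABLE', TABLE_SCHEMA, TABLE_NAME) AS DATA_BYTES,
       DB_OBJECT_APPROXIMATE_TOTAL_SIZE('TABLE', TABLE_SCHEMA, TABLE_NAME) AS TOTAL_BYTES
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TOTAL_BYTES DESC;
```

This uses the approximate variants introduced by this PR, since for a "where to look first" scan their speed matters more than exact byte counts.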
Situation 3: history tables (cards keep being added onto the stack periodically) without a clear retention policy (which is mostly a political problem, not a technical one -- the DBA is still stuck with it). It really doesn't depend on the RDBMS, only on the business logic and operations.
Personally, I'm perfectly fine with the current performance, even if it's slow. It's not information one needs every time a DB is accessed. My question is: why does the web console need to check table/index sizes while opening a DB? This information is not displayed upfront. When I open a DB via JdbcDataSource.getConnection, it opens instantly, and the web console should behave the same way.
Because JDBC metadata was designed many years ago in such a bad way. Some of its parts look too MySQL-specific, but other database systems are very different from MySQL. H2 Console reads information about indexes from
Unfortunately, it can't, because it needs to read various information about this database. But we can speed up the problematic
`Page.getDiskSpaceUsed()` and related methods in tables and indexes now have an `approximate` parameter. With `approximate = true` this method doesn't read every page any more and therefore works much faster with large tables. The initial implementation for this case isn't perfect and can be improved in the future. At least it is faster, it is stable, and it is accurate enough.

New `DB_OBJECT_APPROXIMATE_SIZE` and `DB_OBJECT_APPROXIMATE_TOTAL_SIZE` functions are introduced. They are similar to `DB_OBJECT_SIZE` and `DB_OBJECT_TOTAL_SIZE`, but they use this parameter for a faster evaluation. `DatabaseMetaData.getIndexInfo()` with `approximate = true` now also uses the fast inaccurate version to resolve a problem with H2 Console: #4058 (comment)
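Usage of the new functions mirrors the existing exact ones; a minimal before/after sketch (the 'PUBLIC' schema and `BIG_TABLE` name are illustrative placeholders):

```sql
-- Exact but slow: walks every page of the table and its indexes.
SELECT DB_OBJECT_TOTAL_SIZE('TABLE', 'PUBLIC', 'BIG_TABLE');

-- Fast estimate: skips the full page traversal, may deviate from
-- the exact value depending on data distribution.
SELECT DB_OBJECT_APPROXIMATE_TOTAL_SIZE('TABLE', 'PUBLIC', 'BIG_TABLE');
```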