Description
So, large databases again. Here are the stats for a database that is block-deduplicated yet still large. A couple of the classes are custom:
% stindex -mode account index-v0.14.0.db
0x00: 2206187 items, 182629 KB keys + 1463447 KB data, 82 B + 663 B avg, 2573 B max
0x01: 185656 items, 10388 KB keys + 1790909 KB data, 55 B + 9646 B avg, 924525 B max
0x02: 916890 items, 84795 KB keys + 3667 KB data, 92 B + 4 B avg, 192 B max
0x03: 384 items, 27 KB keys + 5 KB data, 72 B + 15 B avg, 87 B max
0x04: 1109 items, 17 KB keys + 17 KB data, 15 B + 15 B avg, 69 B max
0x06: 383 items, 3 KB keys + 0 KB data, 9 B + 2 B avg, 18 B max
0x07: 510 items, 4 KB keys + 12 KB data, 9 B + 24 B avg, 41 B max
0x09: 194 items, 0 KB keys + 123 KB data, 5 B + 634 B avg, 11484 B max
0x0a: 3 items, 0 KB keys + 0 KB data, 14 B + 20 B avg, 58 B max
0x0b: 181836 items, 2363 KB keys + 10694 KB data, 13 B + 58 B avg, 173 B max
0x0d: 44282 items, 1461 KB keys + 59899 KB data, 33 B + 1352 B avg, 1606431 B max
0xf0: 206952 items, 23084 KB keys + 31301 KB data, 111 B + 151 B avg, 340 B max
0xf1: 6725022 items, 665162 KB keys + 1116642 KB data, 98 B + 166 B avg, 4788 B max
Total 10469408 items, 969939 KB keys + 4476724 KB data.
The two interesting ones here are f0 (deleted files less than 90 days old) and f1 (deleted files more than 90 days old). These are really class 00 entries that I've broken out for accounting. Compared to 00 (normal, live files), the old deleted files are almost half of the combined data volume (1.1 GB out of 1.4 + 1.1 GB) and the vast majority of the items (6.7 M out of 6.7 + 2.2 M). The latter especially means a lot of unnecessary iteration whenever we walk the database. Oh, and by volume f1 is also the majority of all keys (665162 KB out of 969939 KB).
I propose that we add a config option to garbage collect old, deleted files. It could default to something conservative like a 180 day limit, or even be off by default. When set, we would periodically iterate and clean out old entries, but only when all known devices agree with the delete. That is, only if the global list contains only delete entries and the entry itself is older than the cutoff.
Manual cleanup would also be acceptable, but today there is really no way to clean out deleted items and have them stay cleaned out.