[WIP] FASTER Log Compaction#112

Merged
badrishc merged 17 commits into master from logcompaction
Mar 28, 2019
Conversation

@badrishc
Collaborator

Until now, the only way to clean up the log has been via log truncation (using ShiftBeginAddress), which has the model that older keys expire when they fall off the head of the log (if they have not been updated after that point).

Log compaction introduces the ability to scan the head of the log, up to the specified address, collecting the keys that will be deleted in that range. It then scans the rest of the log to check which of those keys are still live. Live keys are copied over to the tail of the log, so compaction eliminates deleted or subsequently updated keys.

Log compaction will be exposed via an API call on FASTER as follows:

   fht.Log.Compact(compactUntil)

Our first iteration of the feature is single threaded, so that log compaction can be run as a background process. We will not support parallel instantiation of multiple log compaction threads. Log compaction is blocking, i.e., it returns only after compaction up to the specified address is complete and BeginAddress has been shifted to compactUntil.
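For intuition, the phases described above can be modeled in a few lines of Python. This is an illustrative single-threaded sketch of the algorithm only, not FASTER's actual C# implementation; the log is modeled as a list of (key, value, tombstone) records, with addresses as list indices.

```python
# Illustrative model of log compaction; tombstone=True marks a delete record.
def compact(log, compact_until):
    """Compact records in [BeginAddress, compact_until); return the new log."""
    # Phase 1: scan the head of the log, keeping the latest record per key.
    head = {}
    for rec in log[:compact_until]:
        key, _value, _tombstone = rec
        head[key] = rec                 # a later record supersedes earlier ones

    # Phase 2: scan the rest of the log; a key updated (or deleted) there
    # is not live in the head region, so drop it from the candidate set.
    for key, _value, _tombstone in log[compact_until:]:
        head.pop(key, None)

    # Phase 3: copy the surviving non-tombstone records to the tail, then
    # shift BeginAddress to compact_until (modeled here by dropping the head).
    live = [rec for rec in head.values() if not rec[2]]
    return log[compact_until:] + live
```

For example, if a key in the head region was updated later in the log, the stale head record is dropped and only the newer record survives; a key whose latest head record is a tombstone is eliminated entirely.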

Also refer to issue #70.

Not yet ready for merge or review, but adding @gunaprsd and @peterfreiling for comments.

@badrishc badrishc added the "enhancement (New feature or request)" and "work in progress" labels Mar 20, 2019
@gunaprsd
Contributor

I have not talked about concurrency between updates and reinsertion in #70

The reinsertion algorithm needs to check for updates that occur after the time we checked a record for GC. A straightforward approach is to record the tail address at the start and always check the record chain after this address before doing the CAS for reinsertion.
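That check might look like the following toy model (hypothetical names; in FASTER the record chain and CAS live inside the hash index, and this single-threaded Python sketch only models the control flow):

```python
class ToyStore:
    """Minimal in-memory stand-in for the record chain (illustration only)."""
    def __init__(self):
        self.records = []                 # append-only: (address, key, value)

    def tail_address(self):
        return len(self.records)

    def updated_after(self, key, address):
        # "Check the record chain after this address": was the key written
        # again after `address` (the tail recorded at scan start)?
        return any(k == key for _addr, k, _v in self.records[address:])


def try_reinsert(store, key, value, scan_start_tail):
    # Skip reinsertion if the key was updated after we checked it for GC.
    if store.updated_after(key, scan_start_tail):
        return False
    # Single-threaded stand-in for the CAS: install the copied record at the tail.
    store.records.append((store.tail_address(), key, value))
    return True
```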

@badrishc
Collaborator Author

Here is my plan for handling the final case (NYI). We first complete a scan until SafeReadOnly (may need to continue the scan in case SRO has moved forward in the interim during the main log scan).

Then, for each live key to be re-inserted, I will perform a range-limited TraceBack until that SRO point, in order to check if the key exists in the region between tail and that point. If not found, we will perform a CAS to install the key.

Note that we cannot use the tail address above instead of SRO, because the log beyond SRO may still be getting allocated and records there may be in an unstable state. In general, it is unsafe to scan beyond SRO because records in that region of memory may not yet have been successfully upserted, i.e., chained into the hash table.
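The range-limited TraceBack might be sketched as follows (illustrative only; the Record shape, previous_address links, and address comparison are assumptions about the hash-chain layout, not FASTER's types):

```python
from collections import namedtuple

# Each record links to the previous record on the same hash chain.
Record = namedtuple("Record", ["key", "previous_address"])

def exists_between(chain, key, sro_address, tail_address):
    """Follow the hash chain backward from the tail, stopping once the
    address drops below sro_address. Returns True iff the key has a record
    in [sro_address, tail_address], i.e., between SRO and the tail - the
    region a live key must be checked against before reinsertion."""
    addr = tail_address
    while addr >= sro_address:
        rec = chain[addr]
        if rec.key == key:
            return True
        addr = rec.previous_address
    return False
```

Walking the chain (rather than scanning the log) keeps the check safe: each record reached via the chain has already been linked into the hash table.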

The Log Scan capability has made compaction significantly simpler, and I am using an in-memory instance of FasterKv for temporarily storing keys to apply the validity test.

… An Upsert/RMW after a delete will create a new record at the tail. We never unset the Tombstone bit, which is necessary for correctness.
@badrishc
Collaborator Author

We now have a fully working prototype that supports deletes and log compaction. We will keep testing and cleaning up to bring the support up to production quality before merging. The API for log compaction also needs some thinking, e.g., do we provide a fixed memory budget or let users specify a log head range to compact from.

@badrishc badrishc merged commit dbe75e2 into master Mar 28, 2019
@badrishc badrishc deleted the logcompaction branch March 28, 2019 04:58