Conversation
…not be deleted outright from memory).
I have not talked about concurrency between updates and reinsertion in #70. The reinsertion algorithm needs to check for updates that occur after the time we checked a record for GC. A straightforward way is to record the tail address at the start and always check the record chain after this address before doing the CAS for reinsertion.
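The check above can be sketched as follows. This is a hedged toy, not FASTER's actual code: Record, Log, and reinsert_if_unchanged are hypothetical stand-ins, and the final upsert stands in for the CAS install.

```python
# Toy model: the log is a list (address == index), and the hash index maps
# each key to the address of its newest record; records chain backwards.
from dataclasses import dataclass

@dataclass
class Record:
    key: str
    value: object
    prev_address: int      # previous record in the same hash chain

class Log:
    def __init__(self):
        self.records = []       # address == list index
        self.hash_index = {}    # key -> address of newest record

    @property
    def tail_address(self):
        return len(self.records)

    def upsert(self, key, value):
        addr = self.tail_address
        self.records.append(Record(key, value, self.hash_index.get(key, -1)))
        self.hash_index[key] = addr
        return addr

    def updated_since(self, key, since_address):
        # Walk the hash chain newest-first, but only through records
        # written at or after `since_address`.
        addr = self.hash_index.get(key, -1)
        while addr >= since_address:
            if self.records[addr].key == key:
                return True
            addr = self.records[addr].prev_address
        return False

def reinsert_if_unchanged(log, key, value, tail_at_start):
    # Re-insert only if no update to `key` landed after the GC check.
    if log.updated_since(key, tail_at_start):
        return False            # a newer record exists; drop the reinsertion
    log.upsert(key, value)      # stands in for the CAS install
    return True
```

Because only the chain segment above the recorded tail is walked, the check is bounded by the amount of log written since the GC pass, not the whole chain.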
Here is my plan for handling the final case (not yet implemented). We first complete a scan until SafeReadOnly (SRO); we may need to continue the scan if SRO has moved forward in the interim during the main log scan. Then, for each live key to be re-inserted, I will perform a range-limited TraceBack down to that SRO point, to check whether the key exists in the region between the tail and that point. If not found, we perform a CAS to install the key. We cannot use the "tail address" above instead of SRO, because the log may just be getting allocated and a record beyond SRO may be in an unstable state. In general, it is unsafe to scan beyond SRO because records in that region of memory may not yet have been successfully upserted, i.e., chained into the hash table. The Log Scan capability has made compaction significantly simpler, and I am using an in-memory instance of FasterKv to temporarily store keys for the validity test.
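The control flow of that plan can be sketched as a runnable toy. FakeStore and its methods (scan_until, traceback_found_in, cas_install) are hypothetical stand-ins, not FASTER's API; the point is the shape: scan to SRO, re-scan if SRO advanced, then a range-limited TraceBack before each CAS install.

```python
class FakeStore:
    def __init__(self):
        self.log = []                   # address == list index -> (key, value)
        self.safe_read_only_address = 0

    def upsert(self, key, value):
        self.log.append((key, value))

    def scan_until(self, address):
        pass  # the real scan collects deleted/live keys; elided here

    def traceback_found_in(self, key, lower_bound):
        # Look for `key` only at addresses >= lower_bound, i.e. the
        # region between the SRO point and the tail.
        return any(k == key for k, _ in self.log[lower_bound:])

    def cas_install(self, key, value):
        self.upsert(key, value)         # stands in for the CAS at the tail

def compact_reinsert(store, live_keys):
    # 1. Scan up to SafeReadOnly; keep scanning if SRO moved meanwhile.
    sro = store.safe_read_only_address
    store.scan_until(sro)
    while store.safe_read_only_address != sro:
        sro = store.safe_read_only_address
        store.scan_until(sro)
    # 2. Range-limited TraceBack down to SRO, then install if absent.
    for key, value in live_keys:
        if not store.traceback_found_in(key, lower_bound=sro):
            store.cas_install(key, value)
```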
… An Upsert/RMW after a delete will create a new record at the tail. We never unset the Tombstone bit; this is necessary for correctness.
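A minimal toy of that tombstone rule, with illustrative structures that are not FASTER's: a delete appends a tombstone record at the tail, and a later Upsert appends a fresh record rather than clearing the old tombstone.

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: str
    value: object
    tombstone: bool = False

class TinyLog:
    def __init__(self):
        self.records = []

    def upsert(self, key, value):
        # A new record at the tail; any older tombstone stays set.
        self.records.append(Record(key, value))

    def delete(self, key):
        # Deletes are append-only too: a tombstone record at the tail.
        self.records.append(Record(key, None, tombstone=True))

    def read(self, key):
        # Newest record wins; hitting a tombstone means "not found".
        for rec in reversed(self.records):
            if rec.key == key:
                return None if rec.tombstone else rec.value
        return None
```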
We now have a fully working prototype that supports deletes and log compaction. We will keep testing and cleaning up to bring the support to production quality before merging. The API for log compaction also needs some thought, e.g., do we provide a fixed memory budget, or let users specify a log head range to compact?
…nto logcompaction
Until now, the only way to clean up the log has been log truncation (using
ShiftBeginAddress), under the model that older keys expire when they fall off the head of the log (if they have not been updated after that point). Log compaction introduces the ability to scan the head of the log, up to a specified address, for the keys that will get deleted in that range. It then scans the rest of the log to check which of those keys are live. Live keys are then copied over to the tail of the log, so compaction eliminates deleted or subsequently updated keys.
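The two-phase scheme just described can be sketched over a toy list-based log (this is illustrative, not FASTER's implementation): records are (key, value, tombstone) tuples and addresses are list indices.

```python
def compact(log, begin, until):
    # Phase 2 input: keys that appear later in the log supersede any copy
    # in the head region [begin, until).
    later_keys = {key for key, _, _ in log[until:]}

    live, seen = [], set()
    # Walk the head region newest-first so only the latest version of each
    # key is considered; an older version is superseded within the head.
    for key, value, tombstone in reversed(log[begin:until]):
        if key in seen or key in later_keys:
            continue
        seen.add(key)
        if not tombstone:               # deleted keys are dropped
            live.append((key, value, tombstone))

    log.extend(reversed(live))          # copy live records to the tail
    # The head [begin, until) can now be truncated; return the new
    # BeginAddress (cf. ShiftBeginAddress).
    return until
```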
Log compaction will be exposed via an API call on FASTER that takes the address to compact until.
Our first iteration of the feature is single threaded, so that log compaction can be run as a background process. We will not support parallel instantiation of multiple log compaction threads. Log compaction is blocking, i.e., it returns once compaction up to the specified address is complete and BeginAddress has been shifted to compactUntil. Also refer to issue #70.
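The blocking contract can be illustrated with a stub; note the thread above says the API is still being designed, so FasterKvStub and its compact method are hypothetical, not FASTER's finalized API.

```python
class FasterKvStub:
    def __init__(self, tail_address=100):
        self.begin_address = 0
        self.tail_address = tail_address

    def compact(self, compact_until):
        # Single threaded and blocking: when this call returns, compaction
        # up to `compact_until` is complete and BeginAddress has shifted.
        assert self.begin_address <= compact_until <= self.tail_address
        # ... scan head, check liveness, copy live records to tail (elided)
        self.begin_address = compact_until

store = FasterKvStub()
store.compact(compact_until=40)
```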
Not yet ready for merge or review, but adding @gunaprsd and @peterfreiling for comments.