Skip to content

ARROW-12986: [C++][Gandiva] Implement new cache eviction policy algorithm in Gandiva#10465

Closed
jpedroantunes wants to merge 39 commits intoapache:masterfrom
s1mbi0se:feature/change-cache-policy
Closed

ARROW-12986: [C++][Gandiva] Implement new cache eviction policy algorithm in Gandiva#10465
jpedroantunes wants to merge 39 commits intoapache:masterfrom
s1mbi0se:feature/change-cache-policy

Conversation

@jpedroantunes
Copy link
Copy Markdown
Contributor

@jpedroantunes jpedroantunes commented Jun 7, 2021

This PR replaces the LRU based cache for gandiva with a new cache which takes into account the LLVM build time along with the LRU factor.
Here is a description of the suggested algorithm:

// A particular cache based on the GreedyDual-Size cache which is a generalization of LRU
// which defines costs for each cache values.
// The algorithm associates a cost, C, with each cache value. Initially, when the value 
// is brought into cache, C is set to be the cost related to the value (the cost is 
// always non-negative). When a replacement needs to be made, the value with the lowest C
// cost is replaced, and then all values reduce their C costs by the minimum value of C 
// over all the values already in the cache. 
// If a value is accessed, its C value is restored to its initial cost. Thus, the C costs 
// of recently accessed values retain a larger portion of the original cost than those of
// values that have not been accessed for a long time. The C costs are reduced as time 
// goes and are restored when accessed.

More info here

@jpedroantunes jpedroantunes changed the title ARROW-12986: [C++][Gandiva] Implement new cache eviction policy in Gandiva ARROW-12986: [C++][Gandiva] Implement new cache eviction policy algorithm in Gandiva Jun 7, 2021
@jpedroantunes jpedroantunes force-pushed the feature/change-cache-policy branch from e1bb1fc to d94618d Compare June 8, 2021 12:23
@jpedroantunes jpedroantunes force-pushed the feature/change-cache-policy branch 2 times, most recently from fa13d02 to 92f5984 Compare June 15, 2021 11:32
@jpedroantunes
Copy link
Copy Markdown
Contributor Author

@augustoasilva can you review, please?

Copy link
Copy Markdown
Contributor

@augustoasilva augustoasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is almost all good, just review this points that I've marked and evaluate them.

@jpedroantunes jpedroantunes force-pushed the feature/change-cache-policy branch from 92f5984 to e1911e7 Compare June 23, 2021 11:07
@jpedroantunes jpedroantunes force-pushed the feature/change-cache-policy branch 2 times, most recently from 35a3812 to 1083f91 Compare June 30, 2021 00:49
@jpedroantunes jpedroantunes force-pushed the feature/change-cache-policy branch from 1083f91 to 92708bc Compare July 6, 2021 22:58
@github-actions
Copy link
Copy Markdown

@jpedroantunes jpedroantunes force-pushed the feature/change-cache-policy branch from 92708bc to 213b74a Compare July 12, 2021 12:09
jvictorhuguenin pushed a commit to s1mbi0se/arrow that referenced this pull request Sep 21, 2021
…ithm in Gandiva

This PR replaces the LRU based cache for gandiva with a new cache which takes into account the LLVM build time along with the LRU factor.
Here is a description of the suggested algorithm:

```
// A particular cache based on the GreedyDual-Size cache which is a generalization of LRU
// which defines costs for each cache values.
// The algorithm associates a cost, C, with each cache value. Initially, when the value
// is brought into cache, C is set to be the cost related to the value (the cost is
// always non-negative). When a replacement needs to be made, the value with the lowest C
// cost is replaced, and then all values reduce their C costs by the minimum value of C
// over all the values already in the cache.
// If a value is accessed, its C value is restored to its initial cost. Thus, the C costs
// of recently accessed values retain a larger portion of the original cost than those of
// values that have not been accessed for a long time. The C costs are reduced as time
// goes and are restored when accessed.
```
More info [here](https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html)

Closes apache#10465 from jpedroantunes/feature/change-cache-policy and squashes the following commits:

02a7998 <João Pedro> Add todo for overflow handling for correctness
a62f2d4 <João Pedro> Add overflow handler for cache algorithm
540ac76 <João Pedro> Remove unused constructor for ValueCacheObject
58c64a9 <João Pedro> Apply linter corrections
213b74a <João Pedro> Apply corrections on the greedydualsize cache definition
4ab1bf1 <João Pedro> Remove base cache header file not used anymore
e66bff9 <João Pedro> Remove lru cache and change to use the greedy dual by default
6136f7c <João Pedro> Add string variations on cache tests
0d25678 <João Pedro> Correct linter errors
e008f8c <João Pedro> Add identation to PriorityItem class
e2c38a9 <João Pedro> Correct linter errors
08f1bd6 <João Pedro> Change cache implementation to handle special structure objects as values
d9ef056 <João Pedro> Change cache main abstraction to consider the usage of a ValueCacheObject
8658b1e <João Pedro> Remove unused getCacheType function
a45adb9 <João Pedro> Change BaseCache insert method to receive only key and value
9e46ce3 <João Pedro> Remove unused operator< implementation from cache keys
dabea56 <João Pedro> Remove unused operator< implementation from cache keys
88af686 <João Pedro> Add base logic for gd-size algorithm implementation on the new cache
3b3746e <João Pedro> Rename the created cache files to consider the new approach using greedy-dual-size algorithm
abed895 <João Pedro> Apply corrections and optimization on new cache classes
a8b0bd8 <João Pedro> Change lvu cache for not using unnecessarily a pair as value
3c013c8 <João Pedro> Change cache logic to use unique_ptr
8871018 <João Pedro> Fix wrong use of u_long to use uint64 instead
ddd90fc <João Pedro> Correct missing identation on gandiva cache files
a04726a <João Pedro> Change for not using u_long and use u_int64 on cache
10a7016 <João Pedro> Fix lint problems found on new cache documents on CI builds
f4874f6 <João Pedro> Fix lint problems on all added files
bcec5fc <João Pedro> Add logic for calculating the llvm build time to be considered on cache logic
253c826 <João Pedro> Adapt lru cache test for using the new insertion method definition
cd49c23 <João Pedro> Remove unused method definition from base cache class
0bee1a5 <João Pedro> Add inheritance definition as public on cache child classes
6763be6 <João Pedro> Add cache method for defining order parameter
24ec647 <João Pedro> Change cache class to handle with BaseClass pointer
e31d922 <João Pedro> Change base cache methods to be virtual
b8ff8b8 <João Pedro> Add base cache file as attemp to generalize caches
c32f90e <João Pedro> Add operator< implementation necessary for filter and project cache keys for compatibility with the new cache
1ece64d <João Pedro> Add new cache unit test file to the project CMakeLists.txt
8ebf667 <João Pedro> Add unit test for the new cache implementation
a08832a <João Pedro> Add implementation for cache based on a lower value policy

Authored-by: João Pedro <joaop@simbioseventures.com>
Signed-off-by: Praveen <praveen@dremio.com>
(cherry picked from commit a628ee0)
@anthonylouisbsb anthonylouisbsb deleted the feature/change-cache-policy branch January 27, 2022 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants