[Colossus] Add pruning service#4845
Conversation
mnaamani
left a comment
There was a problem hiding this comment.
Overall this looks good and will be quite valuable. Left some feedback. Also needs an update from master to fix some merge conflicts.
storage-node/src/commands/server.ts
Outdated
| default: 1, | ||
| }), | ||
| cleanup: flags.boolean({ | ||
| char: 's', |
There was a problem hiding this comment.
char 's' already assigned for sync. Better remove it as I don't know what the behavior is if short flag is used and multiple flags assigned same char.
There was a problem hiding this comment.
Done, changed the short flag to c. WDYT?
storage-node/src/commands/server.ts
Outdated
| cleanupInterval: flags.integer({ | ||
| char: 'i', | ||
| description: 'Interval between periodic cleanup actions (in minutes)', | ||
| default: 1, |
There was a problem hiding this comment.
I would say this should be much higher like 1h, maybe even longer?
Depends on how much work/time it takes per cleanup run.
There was a problem hiding this comment.
Yeah, ideally it should be higher that the sync interval, I just set it to 1min, basically test the pruning behavior locally. Now I have set it to 6hrs
| ): Promise<{ dataObjectId: DataObjectId; pinnedCount: DataObjectPinCount } | undefined> { | ||
| if (idCache.has(dataObjectId)) { | ||
| await lock.acquireAsync() | ||
| const id = { dataObjectId, pinnedCount: idCache.get(dataObjectId) as DataObjectPinCount } |
There was a problem hiding this comment.
return type of function doesn't seem to match id, dataObjectId is a string where as in the return type it is DataObjectId (u64)
There was a problem hiding this comment.
The DataObjectId is also a string, as you can see
There was a problem hiding this comment.
Ah yes, i didn't spot that type definition.
|
|
||
| const timeoutMs = 60 * 1000 // 1 minute since it's only a HEAD request | ||
| const deletionTasks: DeleteLocalFileTask[] = [] | ||
| await Promise.all( |
There was a problem hiding this comment.
| await Promise.all( | |
| await Promise.allSettled( |
To avoid the promise being rejected when HEAD request to one data object fails, Promise.allSettled() is more appropriate.
Also I can image if the number of dataobjects could potentially be very large, doing a massive number of HTTP HEAD requests in parallel might be a bad idea. Perhaps we should do it in batches.
There was a problem hiding this comment.
Done, but I have to change target option in tsconfig.json to es2020 as Promise.allSettled does not seem to be available in es2017
mnaamani
left a comment
There was a problem hiding this comment.
One final small change and this should be ready to test in production :)
And updating from master to resolve merge conflict.
| dataObjectId: string | ||
| ): Promise<{ dataObjectId: DataObjectId; pinnedCount: DataObjectPinCount } | undefined> { | ||
| if (idCache.has(dataObjectId)) { | ||
| await lock.acquireAsync() |
There was a problem hiding this comment.
shouldn't the lock be acquired before checking idCache.has() ?
There was a problem hiding this comment.
Addressed. However, placing a lock before the if condition can lead to unnecessary blocking if the dataObjectId is not present in the cache. So I reimplemented by first checking the cache without acquiring the lock. This is a quick operation and if the dataObjectId is not present, just return without the overhead of acquiring the lock.
5e4be6e to
f692d15
Compare
mnaamani
left a comment
There was a problem hiding this comment.
I will merge this work into a new branch colossus-beta where we can do final tweaks for a "beta" release.
addresses #4813
This PR adds:
How the pruning of assets work
The
cleanupServiceruns the data objects cleanup/pruning workflow. It removes all the locally stored data objects that the operator is no longer obliged to keep, because either the data object has been deleted from the runtime or it has been moved to some other bucket/sPRECONDITIONS:
MAXIMUM_QN_LAGGING_THRESHOLDblocks, otherwise the cleanup workflow would not be performed in order to avoid pruning assets based on outdated state.MINIMUM_REPLICATION_THRESHOLD