Add support for external datastore as source of historical ledger data#437
Conversation
9f70442 to
83c48d4
Compare
Shaptic
left a comment
There was a problem hiding this comment.
Just a lil' drive by review 🚗
| DefaultValue: 5 * time.Second, | ||
| }, | ||
| { | ||
| Name: "serve-ledgers-from-datastore", |
There was a problem hiding this comment.
Isn't this implied by the presence of a datastore config?
There was a problem hiding this comment.
reason
Adds a new flag serve_ledger_from_datastore. The flag used instead of just relying on the presence of a datastore config because same datastore config can in future be also be used for ingestion from a datastore, so this flag control over which behavior is enabled.
84e5f50 to
33ab837
Compare
16e5297 to
abba4e4
Compare
| return protocol.GetLedgersResponse{ | ||
| Ledgers: ledgers, | ||
| Ledgers: ledgers, | ||
| // TODO: update these fields using ledger range from datastore |
There was a problem hiding this comment.
I think we should consider using separate fields for the ledger range belonging to the data store. I think it would be useful to know which ledgers are available directly from rpc and which ledgers require access to the data store.
There was a problem hiding this comment.
Can you describe a client use case which benefits from the breakout of ledgers?
This pushes forward an internal aspect of RPC, totally understand it given the possible difference of meta content mentioned. It seems like this introduces additional work for clients to handle and possibly some confusion.
Ideally, the outcome from sourcing other backends for ledger metadata by RPC should be transitive at the client API layer, i.e. data provided by RPC is from the network, nothing more?
One thought..Can the potential difference of ledger close meta between RPC or datastores be mitigated if Galexie were to always enable all captive core flags that affect ledger close meta? In that way a datastore would always provide the superset of meta avoiding missing data cases when interleaved with meta from any other source.
|
Now that there are several stellar-core flags (diagnostic events, classic cap-67 events, etc) which affect the contents of ledger close meta, I think we should consider the scenario where ledgers in the datastore are inconsistent with the ledgers served locally from rpc. This issue can be addressed separate from this PR but I think we'll need to introduce some way to determine which features are enabled in the ledgers found from the data lake vs the rpc captive core. https://github.com/stellar/go/issues/5507 could be relevant |
tamirms
left a comment
There was a problem hiding this comment.
there are 2 issues that imo warrant further design discussions but I think those can be addressed separately from this PR
This is a really good point. Another way to solve this might be, that if you have a datastore configured as your backend, RPC's ledger retention window gets automatically set to 0, so RPC is effectively just acting as a lightweight facade/API layer on top of the data lake. I think I would be more comfortable with doing that if |
… available ledger range from the datastore.
2f0397a to
6b62f20
Compare
|
I'm going to merge the PR but I've tried to capture all the open items from the PR in this issue #453. Please add anything I might have missed. |
What
This PR addresses #425 - adding support for handling getLedgers requests that fall outside the rpc's local ledger retention window by proxies out of range requests to the datastore.
gen-config-filecommand now includes defaultdatastore_configandbuffered_storage_backend_configsections (commented out by default).Adds a new flag
serve_ledger_from_datastore. The flag used instead of just relying on the presence of a datastore config because same datastore config can in future be also be used for ingestion from a datastore, so this flag control over which behavior is enabled.Includes basic unit tests.
Why
#425
Known limitations
Currently, this assumes that the datastore has all ledgers starting from genesis. This is because there's no way yet to query datastore what range of ledgers it actually has. That limitation will be fixed once this stellar/go#5498) is done. We will then also use the real range to fill in the latestLedger and oldestLedger fields more accurately.
The
datastore_configandbuffered_storage_backend_configcan only be set through the config file and not with command-line flags. That’s because these settings can be complex or different depending on the type of datastore being used (like S3 or GCS). Supporting them in the config file keeps things simpler and more flexible.