Skip to content

General Connection Record#882

Merged
zonotope merged 44 commits intomainfrom
refactor/general-connection
Oct 15, 2024
Merged

General Connection Record#882
zonotope merged 44 commits intomainfrom
refactor/general-connection

Conversation

@zonotope
Copy link
Contributor

@zonotope zonotope commented Sep 1, 2024

This patch remove the specific connection types in favor of a generalized connection record with a configuration system to provide specific features the old connections had. The configuration and system is managed with integrant, and I think we should add validation to connection configs in a later pull request, but I left that out here.

This patch also removes many methods from the connection protocols, but more work needs to be done to disentangle and remove circular dependencies from the code that operates on connections before we can remove the connection protocols all together, and that work will come in a later pull request.

This patch also adds an index-store to back FlakeDB records that combines storage, serialization, and caching so that FlakeDB records will eventually no longer need access to connections to function. More work has to be done to disentangle dbs from connections which will come in a later pull request.

These changes are based on #858, so please review that first if you haven't already.

@zonotope zonotope requested a review from a team September 1, 2024 20:01
@zonotope zonotope self-assigned this Sep 1, 2024

(derive :fluree.storage/file :fluree/content-storage)
(derive :fluree.storage/file :fluree/byte-storage)
(derive :fluree.storage/file :fluree/json-archive)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the semantic distinction between these types of storage? content-storage and byte-storage I get, but what's a json-archive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json-archive corresponds to the fluree.db.storage/JsonArchive protocol. It's a read-only protocol with a single method for retrieving json data. It's the other side of the content-addressed store protocol, which only has a method for writing data. most of the storage implementations implement both protocols, but the remote storage implementation only implements JsonArchive because you can't modify data in a remote store; you can only read it.

(-index-file-delete [conn file-address] "Deletes an index item from storage"))
(-txn-read [conn txn-key] "Reads a transaction from storage"))

(comment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still accurate? My understanding is that the Connection ledger state is a map of ledger-alias->channel[Ledger].

Copy link
Contributor

@dpetran dpetran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this looks good to me.

Here's how I understand what's different:

There is now only one Connection (!!!). The connection has these storage keys:

  • :store, for implementing iConnection, reading and writing commits and txns
  • :index-store, for reading and writing index nodes
  • :primary-ns and :aux-nses, which use a mutable :byte-storage store.

I'm a fan of all the different protocols for storage, but I'm not sure I understand why :index-store is separate from :store, since I think they are the same in practice (I think, for instance, that both keys point to the same store instance).

A Connection is still a single-store-for-all-ledgers record, you couldn't have one ledger using a MemoryStore and another using a FileStore in the same Connection, I think.

[parallelism cache-max-mb defaults]
(-> (base-config parallelism cache-max-mb defaults)
(assoc :fluree.storage/memory {})
(assoc-in [:fluree.nameservice/storage-backed :address-prefix] "fluree:memory://")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a personal bugaboo, so feel free to not take it seriously, but I would really love to see the double-slash // removed. Tim Berners-Lee calls the decision to add the double slash his chief regret, and I'd really love to not perpetuate it : )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here has always been that this allows us to distinguish a location. it is probably misused a bit here with 'memory' as doesn't really make sense in this situation.

When a // exists, we know a remote location is being specified (e.g. IP/dns address, but it could be a P2P network address or otherwise).

I've looked into multi-addr as an alternative, but it has more complexity than I think we'll ever need and it is something new people have to learn. For that reason I have preferred the // as, like it or not, it is an established convention.

@zonotope
Copy link
Contributor Author

I'm a fan of all the different protocols for storage, but I'm not sure I understand why :index-store is separate from :store, since I think they are the same in practice (I think, for instance, that both keys point to the same store instance).

An index store might be backed by the same underlying storage found under the :store key, but it is a different in practice. An index store isn't just storage, it is a record that encompasses storage, a cache, and a serializer. The system writes index data to/reads index data from an index store, and the store itself handles the serialization and caching. We don't do this for commits, so that's why there are two separate entries.

But, besides all that, I would like to further decouple index storage from commit storage because these are two different types of data that the system uses in different ways, so there shouldn't be a constraint that they use the same storage medium. I could easily imagine a use case where a user would like to store indexes on the local filesystem, but commits on ipfs, for example.

A Connection is still a single-store-for-all-ledgers record, you couldn't have one ledger using a MemoryStore and another using a FileStore in the same Connection, I think.

This should also change eventually. I think a user should be able to configure storage on a per-ledger basis. They should even be able to configure multiple storage media on a single ledger, but I'm changing enough right now so I think this specific change should wait until later.

@zonotope zonotope mentioned this pull request Oct 4, 2024
Base automatically changed from refactor/nameservice-protocol to main October 15, 2024 07:48
@zonotope zonotope merged commit 31f0e8e into main Oct 15, 2024
@zonotope zonotope deleted the refactor/general-connection branch October 15, 2024 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants