Skip to content

Feature/remote resources#906

Merged
zonotope merged 80 commits intomainfrom
feature/remote-resources
Oct 15, 2024
Merged

Feature/remote resources#906
zonotope merged 80 commits intomainfrom
feature/remote-resources

Conversation

@zonotope
Copy link
Contributor

@zonotope zonotope commented Oct 4, 2024

This patch removes the remote connection and replaces it with the fluree.db.remote-system/RemoteSystem component that is integrated in the generalized connection objects. Remote system objects implement the fluree.db.storage/JsonArchive protocol, so they behave like read only storage for both commits and index nodes. They also implement the fluree.db.nameservice/iNameService protocol to allow for commit lookups and the fluree.db.nameservice/Publication protocols to allow for subscriptions.

General connections can have multiple name services, storage components, publications, and publishers, and the remote systems just sit along side all of those components and behave the same way as they do.

Because there can now be multiple storage components, this patch also adds the fluree.db.storage/Catalog record to integrate storage objects and route reads or writes to the correct storage components based on the address of the data being written or read.

To differentiate between addresses with the same storage method on different systems (e.g. fluree:file://... on the local and remote system), I've introduced an optional storage identifier associated with storage components. These are not necessary for a fully local system with only a single storage component per storage method, but it's required for multiple components with the same method or for remote systems. The identifier has a namespace and a name separated by a '/' and can't contain a ':', and they work the same way as maven coordinates. The namespace should be something the user controls, like a domain, and the name part should be unique within the namespace. When storage identifiers exist on the storage component, addresses for that component will contain the identifier. This allows those addresses to be usable as part of a broader network of interconnected systems.

Besides these changes, I have done some more work to disentangle namespace dependencies making a lot of functionality go through connections, and start in the connection namespace.

This patch builds on #876, #858, #882, and #888, in that order, so please review those first if you haven't already.

@zonotope zonotope requested a review from a team October 4, 2024 06:37
@zonotope zonotope self-assigned this Oct 4, 2024
1.12.0 triggers a bug in either eastwood or the new clojure version that causes
a weird exception error during eastwood linting. I bumped it to 1.12 previously
to save a few kestrokes in non-essential code in some tests so it's easiest to
bump it back down until the bug is fixed.
Copy link
Contributor

@dpetran dpetran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's how I've understood the changes here:

Instead of a singular store, a Connection now has a storage Catalog that encapsulates multiple stores. The connection can now read from any of the specified stores, using the address to discern which store in the catalog is the desired one. The connection must pick a specific storage to write to.

We also have a RemoteSystem, but I don't actually see that being used. They are Publications, which can subscribe to a ledger. Subscribed publications are sent new commits via websocket. There's no concept of a remote storage, just a subscribed ledger. They are also JsonArchives, so we can read from them.

Everything is now run through the Connection, appropriately, since it connects everything together.


;; ledger operations

(defn parse-connection-options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally not a fan of putting a function in a different ns than the only place it is used, but I think we might want to tuck this into connection or system. And maybe move promise-wrap into a util ns while we're at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a work in progress, but I think the changes I'm envisioning are big enough to warrant their own pr so I decided to leave this as is for now.

Since the cyclic namespace dependencies have (mostly) been cleaned up, I think we should do two things to clean up the clojure api a bit more: (1) remove the "ledger" as a first class api object and (2) have the fluree.db.connection namespace be the api entry point (appropriately renamed, of course).

Ledger data is already cached on the connection object, and the current ledger object is another, in my opinion redundant, source of state. We should load dbs and transact through the connection object, both localizing the application state to the connection and simplifying the api.

We have a fluree.db.api namespace because the namespace hierarchy was so complex and cyclic. now that most functionality flows through the connection, the code in the connection namespace can serve as the main api with a little more cleanup, and the separate api (as well as transact and query api namespaces) would be made redundant.

(let [{:keys [method] :as opts*} (parse-connection-options opts)

config (case method
:remote (<? (remote-conn/connect opts*))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we no longer have a :remote method we can remove that from the parse-connection-options function.

{:status 400, :error :db/unsupported-operation}))))]
(system/start config)))))

(defn connect-file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lose this one and not the others? Just curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to lose all of them eventually. This one just wasn't used by any of the tests.

(:require [fluree.db.connection :as connection]
[fluree.db.cache :as cache]
[fluree.db.storage :as storage]
[fluree.db.remote-system :as remote]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used, but only by code in fluree/server as of now.


(defrecord Ledger [id address alias did state cache primary-publisher
secondary-publishers commit-storage index-storage reasoner])
(defrecord Ledger [conn id address alias did state cache commit-storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be commit-catalog and index-catalog?

(go-try
(loop [nses nameservices]
(when-let [nameservice (first nses)]
(or (<? (nameservice/lookup nameservice ledger-address))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any way to ensure that this address is the most up-to-date? Say we have two nameservices, one is offline for quite a while but then comes back on right as we load, will we fork at that point? And will we have a mechanism for "rebasing" if we need to later?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing, this will create a fork. Also, if the only nameservice with a ledger is read only and the user transacts to it, a local fork will be created.

Copy link
Contributor

@dpetran dpetran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📺

Base automatically changed from refactor/encapsulate-connection to main October 15, 2024 07:49
@zonotope zonotope merged commit 3f05a64 into main Oct 15, 2024
@zonotope zonotope deleted the feature/remote-resources branch October 15, 2024 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants