Feature/remote resources by zonotope · Pull Request #906 · fluree/db

zonotope · 2024-10-04T06:37:11Z

This patch removes the remote connection and replaces it with the fluree.db.remote-system/RemoteSystem component, which is integrated into the generalized connection objects. Remote system objects implement the fluree.db.storage/JsonArchive protocol, so they behave like read-only storage for both commits and index nodes. They also implement the fluree.db.nameservice/iNameService protocol to allow for commit lookups and the fluree.db.nameservice/Publication protocol to allow for subscriptions.

General connections can have multiple name services, storage components, publications, and publishers; remote systems sit alongside all of those components and behave the same way they do.

Because there can now be multiple storage components, this patch also adds the fluree.db.storage/Catalog record to integrate storage objects and route reads or writes to the correct storage components based on the address of the data being written or read.

To differentiate between addresses with the same storage method on different systems (e.g. fluree:file://... on the local and remote system), I've introduced an optional storage identifier associated with storage components. Identifiers are not necessary for a fully local system with only a single storage component per storage method, but they are required when multiple components share a method or when remote systems are involved. An identifier has a namespace and a name separated by a '/', can't contain a ':', and works the same way as Maven coordinates. The namespace should be something the user controls, like a domain, and the name should be unique within the namespace. When a storage component has an identifier, addresses for that component contain it. This allows those addresses to be usable as part of a broader network of interconnected systems.
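As a rough illustration of the identifier rules described above, here is a minimal sketch. The function names (`parse-identifier`, `build-address`) and the exact position of the identifier inside the address string are assumptions made for this example only; they are not taken from the patch, and the real address layout may differ.

```clojure
(ns example.identifier
  (:require [clojure.string :as str]))

(defn parse-identifier
  "Hypothetical helper: splits an identifier into namespace and name.
  Returns nil when the identifier contains a ':' or lacks a non-empty
  namespace and name around the first '/'."
  [identifier]
  (when-not (str/includes? identifier ":")
    (let [[ns-part name-part] (str/split identifier #"/" 2)]
      (when (and (seq ns-part) (seq name-part))
        {:namespace ns-part, :name name-part}))))

(defn build-address
  "Hypothetical helper: builds an address for `method` and `path`,
  embedding the identifier when one is given. ASSUMPTION: the identifier
  sits between the \"fluree\" prefix and the storage method."
  [identifier method path]
  (if identifier
    (str "fluree:" identifier ":" method "://" path)
    (str "fluree:" method "://" path)))

;; A fully local system needs no identifier:
(build-address nil "file" "my/ledger/commit.json")

;; Two file stores on different systems stay distinguishable:
(build-address "example.org/remote" "file" "my/ledger/commit.json")
```

Disallowing ':' in identifiers keeps the address unambiguous under this assumed layout, because ':' is already used as the separator between address parts.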

Besides these changes, I have done some more work to disentangle namespace dependencies, making a lot of functionality go through connections and start in the connection namespace.

This patch builds on #876, #858, #882, and #888, in that order, so please review those first if you haven't already.

1.12.0 triggers a bug in either eastwood or the new clojure version that causes a weird exception during eastwood linting. I bumped it to 1.12 previously to save a few keystrokes in non-essential code in some tests, so it's easiest to bump it back down until the bug is fixed.

dpetran

Here's how I've understood the changes here:

Instead of a singular store, a Connection now has a storage Catalog that encapsulates multiple stores. The connection can now read from any of the specified stores, using the address to discern which store in the catalog is the desired one. The connection must pick a specific storage to write to.
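The routing described in this summary can be sketched roughly as follows. This is a simplified model, not the fluree.db.storage/Catalog record itself: stores are plain atoms, the routing key is simply everything before `://`, and the names `address-location`, `catalog-read`, and `catalog-write!` are hypothetical.

```clojure
(ns example.catalog
  (:require [clojure.string :as str]))

(defn address-location
  "Returns the part of an address before \"://\".
  ASSUMPTION: this prefix is enough to pick a store."
  [address]
  (first (str/split address #"://" 2)))

(defn catalog-read
  "Finds the store responsible for `address` and reads from it.
  Stores are modeled as atoms holding address->data maps."
  [catalog address]
  (if-let [store (get-in catalog [:stores (address-location address)])]
    (get @store address)
    (throw (ex-info "No storage component matches address"
                    {:address address}))))

(defn catalog-write!
  "Writes to the catalog's single default location and returns the
  address of the newly written data."
  [catalog path data]
  (let [location (:default-location catalog)
        address  (str location "://" path)]
    (swap! (get-in catalog [:stores location]) assoc address data)
    address))

;; One local store (the write target) and one remote store (only read here).
(def catalog
  {:default-location "fluree:file"
   :stores {"fluree:file" (atom {})
            "fluree:example.org/remote:file"
            (atom {"fluree:example.org/remote:file://commit/abc.json" {:t 1}})}})
```

Reads can hit any store in the catalog because the address carries enough information to choose one, while writes have no address yet and so need an explicit default target.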

We also have a RemoteSystem, but I don't actually see that being used. They are Publications, which can subscribe to a ledger. Subscribed publications are sent new commits via websocket. There's no concept of a remote storage, just a subscribed ledger. They are also JsonArchives, so we can read from them.

Everything is now run through the Connection, appropriately, since it connects everything together.

dpetran · 2024-10-04T17:15:54Z

src/clj/fluree/db/api.cljc


-;; ledger operations
-
 (defn parse-connection-options


I'm generally not a fan of putting a function in a different ns than the only place it is used, but I think we might want to tuck this into connection or system. And maybe move promise-wrap into a util ns while we're at it.

This is still a work in progress, but I think the changes I'm envisioning are big enough to warrant their own pr so I decided to leave this as is for now.

Since the cyclic namespace dependencies have (mostly) been cleaned up, I think we should do two things to clean up the clojure api a bit more: (1) remove the "ledger" as a first class api object and (2) have the fluree.db.connection namespace be the api entry point (appropriately renamed, of course).

Ledger data is already cached on the connection object, and the current ledger object is another, in my opinion redundant, source of state. We should load dbs and transact through the connection object, both localizing the application state to the connection and simplifying the api.

We have a fluree.db.api namespace because the namespace hierarchy was so complex and cyclic. Now that most functionality flows through the connection, the code in the connection namespace can serve as the main api with a little more cleanup, and the separate api (as well as transact and query api namespaces) would be made redundant.

dpetran · 2024-10-08T19:29:58Z

src/clj/fluree/db/api.cljc

      (let [{:keys [method] :as opts*} (parse-connection-options opts)

            config (case method
-                     :remote (<? (remote-conn/connect opts*))


If we no longer have a :remote method we can remove that from the parse-connection-options function.

dpetran · 2024-10-08T19:33:31Z

src/clj/fluree/db/api.cljc

                                                {:status 400, :error :db/unsupported-operation}))))]
        (system/start config)))))

-(defn connect-file


Why lose this one and not the others? Just curious.

We're going to lose all of them eventually. This one just wasn't used by any of the tests.

dpetran · 2024-10-08T20:51:31Z

src/clj/fluree/db/connection/system.cljc

  (:require [fluree.db.connection :as connection]
            [fluree.db.cache :as cache]
+            [fluree.db.storage :as storage]
+            [fluree.db.remote-system :as remote]


Is this actually used?

It is used, but only by code in fluree/server as of now.

dpetran · 2024-10-08T20:54:22Z

src/clj/fluree/db/ledger.cljc


-(defrecord Ledger [id address alias did state cache primary-publisher
-                   secondary-publishers commit-storage index-storage reasoner])
+(defrecord Ledger [conn id address alias did state cache commit-storage


Should this be commit-catalog and index-catalog?

dpetran · 2024-10-08T21:03:44Z

src/clj/fluree/db/connection.cljc

+  (go-try
+    (loop [nses nameservices]
+      (when-let [nameservice (first nses)]
+        (or (<? (nameservice/lookup nameservice ledger-address))


Do we have any way to ensure that this address is the most up-to-date? Say we have two nameservices, one is offline for quite a while but then comes back on right as we load, will we fork at that point? And will we have a mechanism for "rebasing" if we need to later?

After discussing, this will create a fork. Also, if the only nameservice with a ledger is read only and the user transacts to it, a local fork will be created.
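To make the concern in this thread concrete, here is a synchronous simplification of the lookup loop in the diff. The real code runs inside `go-try` and takes from channels with `<?`; this sketch drops the asynchrony and models each nameservice as a plain function from ledger address to commit data, which is an assumption for illustration only.

```clojure
(defn lookup-first
  "Consults nameservices in order and returns the first non-nil result.
  Later nameservices are never asked once one has answered."
  [nameservices ledger-address]
  (some #(% ledger-address) nameservices))

;; Hypothetical nameservices; maps act as lookup functions in Clojure.
(def offline-ns (constantly nil))
(def stale-ns   {"fluree:file://my/ledger" {:t 5}})
(def fresh-ns   {"fluree:file://my/ledger" {:t 9}})

(lookup-first [offline-ns stale-ns fresh-ns] "fluree:file://my/ledger")
;; => {:t 5}
```

The first answer wins regardless of how recent it is, so a nameservice that has fallen behind but is listed earlier supplies the commit that gets loaded, which is the situation in which a fork arises.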

dpetran

📺

zonotope added 30 commits September 12, 2024 09:16

add explicit function to read file address to connection api

aab6ad2

move committing under the domain of the connection

9343551

add a function to replicate index nodes

0f67505

update namespace for commit function in sid migration

8836d94

update commit! location in deprecated api ns

63ba264

remove unused requires

eea7353

Merge branch 'refactor/encapsulate-connection' into feature/remote-re…

1bef911

…sources

add fn to split address into method and local parts

52c3cc2

content-write -> content-write-bytes

776c5c2

parse storage identifier from an address if present

6932de9

use parse-local-path function to parse local paths

97e18b0

remove replication connection api

5efd77d

add Catalog record to group multiple storage media on connections

6c905c7

incorporate identifier if present when building addresses

d2cb6b1

add function to build storage location strings

e89b12c

consolidate s3 namespaces

c04e22f

include auxiliary data in address; s3 bucket and prefix is auxiliary

627c513

add identifiers to storage records

67b6abe

add addressable protocol that allows implementers to report location

875d75a

add a component for a remote system used by remote storage and ns

aeaad6c

create -> open to match other storages

4dd66cb

fluree.db.method.remote -> fluree.db.remote-system

deb66d5

add remote system component from remote ns

15129af

parse-local-path -> get-local-path

75a1830

integrate remote systems into catalogs; address identifiers fn

4ef8058

add default location to write to catalogs

2cfa52d

split storage and catalog namespaces to remove circular dependency

4fbb2d6

go back to expecting fully realized components in conection/connect

3860e70

add independent catalog ns

a736810

add identifiable protocol for storage address identifiers

404a5e3

zonotope added 16 commits October 3, 2024 01:29

remove unused requires

bc10c5b

remove unused and nonexistent subscribers opt

6fc2cc4

remove unused ledger keys

42c8973

read primary address from the primary publisher directly

f446cf3

ns-addresses -> publish-addresses

f727641

only consider publishers when looking up ledger addresses

041927a

move address method to publisher protocol

23cc147

rename storage nameservice

2dbadfd

add protocol method for addresses known to a publication

934110c

try all known addresses when loading an alias, not just the primary

aa1b45b

published-ledger? returns a channel

22bb0ab

use only published addresses when loading ledger aliases

1b3427b

use storage fns to parse nameservice addresses

5cfe805

add byte stores to catalogs; add function to replicate index nodes

d160de8

disallow colon in identifiers

4f3e84d

appease cljs compiler

0e61b17

zonotope requested a review from a team October 4, 2024 06:37

zonotope self-assigned this Oct 4, 2024

zonotope mentioned this pull request Oct 4, 2024

Federated queries across servers fluree/server#96

Merged

zonotope added 2 commits October 5, 2024 22:13

Merge branch 'refactor/encapsulate-connection' into feature/remote-re…

aaa60b6

…sources

dpetran reviewed Oct 8, 2024

zonotope added 4 commits October 9, 2024 14:05

add fn to extract graph from jsonld; use it in parsing transactions

c2041bc

update json-ld dependency

8465e42

Merge branch 'refactor/encapsulate-connection' into feature/remote-re…

be4b461

…sources

Merge branch 'refactor/encapsulate-connection' into feature/remote-re…

82c3252

…sources

dpetran approved these changes Oct 14, 2024

Base automatically changed from refactor/encapsulate-connection to main October 15, 2024 07:49

zonotope merged commit 3f05a64 into main Oct 15, 2024

zonotope deleted the feature/remote-resources branch October 15, 2024 07:51

zonotope merged 80 commits into main from feature/remote-resources
