Federated queries across servers #96

Merged
zonotope merged 81 commits into main from refactor/encapsulate-connection on Oct 15, 2024

@zonotope zonotope commented Oct 4, 2024

This patch integrates fluree/db#906 to implement federated queries across distinct server instances. Please see fluree.server.test.integration.remote-system-test for a demonstration of three separate server instances, each running on a different port, that can each pull in data from the others when processing queries.

We needed to make Fluree connections and server instances much more flexible to support this functionality, so as part of this work I've also added a new JSON-LD-based configuration format. It lets users describe different configurations and how they fit together using the same format that the data in the database is stored in.

@zonotope zonotope requested a review from a team October 4, 2024 06:46
@zonotope zonotope self-assigned this Oct 4, 2024

zonotope commented Oct 4, 2024

@fluree/core There are 6 test failures here, but they are unrelated to this work: updating the db dependency to the latest version on fluree/server main HEAD produces the same failures. I will try to get to the bottom of them later.

[_ config]
config)

(def default-resource-name "config.jsonld")
Collaborator

@zonotope Right now it seems config.jsonld is an example of what's possible (e.g. with the S3 & IPNS publishers). But since resources/config.jsonld is the default that's loaded if no config is passed, we may want to create a new file like resources/full-example-config.jsonld and then make config.jsonld a more representative default. Maybe something like this:

{
  "@context": {
    "@base": "https://ns.flur.ee/dev/config/main/",
    "@vocab": "https://ns.flur.ee/system#",
    "profiles": {
      "@container": [
        "@graph",
        "@id"
      ]
    }
  },
  "@id": "standaloneServer",
  "@graph": [
    {
      "@id": "localDiskStorage",
      "@type": "Storage",
      "addressIdentifier": "ee.flur/main",
      "filePath": "/opt/fluree-server/data"
    },
    {
      "@id": "connection",
      "@type": "Connection",
      "parallelism": 4,
      "cacheMaxMb": 200,
      "commitStorage": {
        "@id": "localDiskStorage"
      },
      "indexStorage": {
        "@id": "localDiskStorage"
      },
      "primaryPublisher": {
        "@type": "Publisher",
        "storage": {
          "@id": "localDiskStorage"
        }
      }
    },
    {
      "@id": "consensus",
      "@type": "Consensus",
      "consensusProtocol": "standalone",
      "maxPendingTxns": 42,
      "connection": {
        "@id": "connection"
      }
    },
    {
      "@id": "http",
      "@type": "API",
      "httpPort": 8090,
      "maxTxnWaitMs": 120000
    }
  ]
}

Currently, if you run the server as an uberjar with no config option or env vars passed in, it fails with AWS client connect errors. I'm assuming that's because the default config.jsonld includes those hypothetical examples.
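
To make the wiring in the proposed config concrete, here is a minimal Python sketch that follows the `{"@id": ...}` references between nodes. It is illustrative only: it is not how fluree/server loads its configuration, the helper names `index_nodes` and `resolve` are hypothetical, the config is a trimmed copy of the one above, and it ignores the `@context` processing (`@base`, `@vocab`) that a real JSON-LD processor would apply. It also assumes the references contain no cycles.

```python
import json

# Trimmed copy of the proposed config; only the keys needed to show how
# "@id" references link nodes together are kept.
CONFIG = json.loads("""
{
  "@id": "standaloneServer",
  "@graph": [
    {"@id": "localDiskStorage", "@type": "Storage",
     "filePath": "/opt/fluree-server/data"},
    {"@id": "connection", "@type": "Connection",
     "commitStorage": {"@id": "localDiskStorage"},
     "indexStorage": {"@id": "localDiskStorage"}},
    {"@id": "consensus", "@type": "Consensus",
     "consensusProtocol": "standalone",
     "connection": {"@id": "connection"}}
  ]
}
""")


def index_nodes(config):
    """Map each top-level node's "@id" to the node itself."""
    return {node["@id"]: node for node in config["@graph"]}


def resolve(value, nodes):
    """Recursively replace bare {"@id": ...} references with the node they name.

    Assumes the references are acyclic; a cycle would recurse forever.
    """
    if isinstance(value, dict):
        if set(value) == {"@id"} and value["@id"] in nodes:
            return resolve(nodes[value["@id"]], nodes)
        return {key: resolve(val, nodes) for key, val in value.items()}
    if isinstance(value, list):
        return [resolve(item, nodes) for item in value]
    return value


nodes = index_nodes(CONFIG)
consensus = resolve(nodes["consensus"], nodes)
# The consensus node now carries its connection, which carries its storage.
print(consensus["connection"]["commitStorage"]["filePath"])
# prints /opt/fluree-server/data
```

Both `commitStorage` and `indexStorage` resolve to the same `localDiskStorage` node, which is the kind of sharing the reference-based layout allows.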

Contributor Author

Yes, we should definitely do this. I was planning to do it in the final cleanup push, but I haven't had a chance yet.

Contributor Author

We should convert the raft example too.

@dpetran dpetran left a comment

I'm curious about why we've moved the config to be json-ld. It seems like a source of significant additional complexity.

@zonotope
Contributor Author

> I'm curious about why we've moved the config to be json-ld. It seems like a source of significant additional complexity.

The additional complexity in the configuration format is necessitated by the additional flexibility of the systems it needs to support. Any configuration format that wasn't JSON-LD, yet was flexible enough to start all the different types of systems we need to start, would just be an ad-hoc reimplementation of JSON-LD and yet another syntax for our users to learn. This is a JSON-LD database, so JSON-LD is the one format we know our users either already know or soon will.

We can ship pre-built configs for common scenarios, but we needed a richer format for the general case, and JSON-LD happened to support all the required features.
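
As a rough illustration of what a reference-based format offers, the Python sketch below lists which nodes each component of a config points at by `@id`. Everything here is an assumption made for illustration: the helper names `references` and `dependency_map` are hypothetical, the config is a trimmed version of the standalone example proposed earlier in this thread, and the code is plain dictionary walking rather than anything fluree/server is known to do.

```python
import json

# Trimmed version of the standalone example config from this thread.
CONFIG = json.loads("""
{
  "@id": "standaloneServer",
  "@graph": [
    {"@id": "localDiskStorage", "@type": "Storage"},
    {"@id": "connection", "@type": "Connection",
     "commitStorage": {"@id": "localDiskStorage"},
     "indexStorage": {"@id": "localDiskStorage"},
     "primaryPublisher": {"@type": "Publisher",
                          "storage": {"@id": "localDiskStorage"}}},
    {"@id": "consensus", "@type": "Consensus",
     "connection": {"@id": "connection"}},
    {"@id": "http", "@type": "API", "httpPort": 8090}
  ]
}
""")


def references(value):
    """Yield every "@id" named by a bare {"@id": ...} reference inside value."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for nested in value.values():
                yield from references(nested)
    elif isinstance(value, list):
        for item in value:
            yield from references(item)


def dependency_map(config):
    """Map each top-level node's "@id" to the sorted ids it refers to."""
    return {
        node["@id"]: sorted(set(references(node)))
        for node in config["@graph"]
    }


deps = dependency_map(CONFIG)
print(deps["connection"])  # prints ['localDiskStorage']
print(deps["consensus"])   # prints ['connection']

# References that name no top-level node would show up here.
dangling = {ref for refs in deps.values() for ref in refs} - set(deps)
print(dangling)            # prints set()
```

A single storage node is referenced three times by the connection, so swapping storage backends in this layout means editing one node rather than three.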

@dpetran dpetran left a comment

💯

@zonotope zonotope merged commit d5eb31b into main Oct 15, 2024
@zonotope zonotope deleted the refactor/encapsulate-connection branch October 15, 2024 08:21