Fix/federated graph crawl by zonotope · Pull Request #744 · fluree/db

zonotope · 2024-03-22T04:32:40Z

This patch adds support for selecting and formatting subgraphs rooted at subject nodes in federated queries. It required reorganizing our graph crawl code to run in parallel so that we can execute the queries on each component of the federated graph and stitch the results together at each step of the recursive search.

I also added fuel tracking support to the graph crawling code and activated the fuel tracker during normal query execution, but we still have to turn it on for both history queries and simple subject crawl queries.

I also made a few ergonomic changes to make the internal index reading and dataset APIs easier to use.
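As a rough illustration of the "stitch the results together" idea, here is a minimal sketch in plain Clojure. It is not the code from this patch: the in-memory maps standing in for component databases and the names `subject-properties`, `combined-properties`, and `crawl` are all hypothetical, and the per-component lookups run sequentially here, whereas the patch runs them in parallel.

```clojure
;; Hypothetical stand-ins for two component databases of a federated dataset.
;; Each maps a subject IRI to a map of property -> vector of values.
;; References to other subjects are represented as {"@id" iri} maps.
(def db-a {"ex:alice" {"ex:name"   ["Alice"]
                       "ex:friend" [{"@id" "ex:bob"}]}})
(def db-b {"ex:alice" {"ex:email" ["alice@example.org"]}
           "ex:bob"   {"ex:name" ["Bob"]}})

(defn subject-properties
  "Hypothetical lookup: the property map for `iri` in one component, or {}."
  [db iri]
  (get db iri {}))

(defn combined-properties
  "Looks up `iri` in every component of `dataset` and merges the results,
  concatenating the values of properties that appear in several components."
  [dataset iri]
  (->> dataset
       (map #(subject-properties % iri))
       (apply merge-with into {})))

(defn crawl
  "Formats the subgraph rooted at `iri`, following references across the
  whole dataset until `depth` is exhausted."
  [dataset iri depth]
  (into {"@id" iri}
        (map (fn [[prop values]]
               [prop (mapv (fn [v]
                             (if (and (map? v) (pos? depth))
                               (crawl dataset (get v "@id") (dec depth))
                               v))
                           values)]))
        (combined-properties dataset iri)))
```

Calling `(crawl [db-a db-b] "ex:alice" 1)` combines the properties of `ex:alice` from both components and expands the `ex:friend` reference using data that lives only in `db-b`. Note that the sketch identifies subjects by IRI rather than by a database-specific id, so the same key is meaningful in every component.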

These were never read. We definitely need fuel tracking for this component, but we should hook in to the same fuel tracking system used by queries and transactions. A few more architectural changes have to happen first before that's possible.

dpetran · 2024-03-29T15:01:49Z

src/clj/fluree/db/query/exec/where.cljc

           async/merge)
-      (match-tuple active-graph fuel-tracker solution pattern filters error-ch))))
+      (match-tuple active-graph fuel-tracker solution pattern filters error-ch))
+    (go)))


What's the deal with this go call? I've not seen something like that before.

Oh, is it just meant to return a channel with nothing in it, because the other if branches return a channel? That makes sense, though maybe you could help me understand when we wouldn't have an active graph.

I just changed it to reuse the same closed channel so as not to create any unnecessary go blocks. It wouldn't be a big deal if we did, but there's no need to in this situation. This is just the equivalent of nil in the async execution environment.

We don't have an active graph in the case where a query specifies a from-named clause but no from clause, and the particular where clause pattern does not specify a named graph. We used to throw an inscrutable error, but it actually just means there's no data to query. We could throw another error during validation to catch this situation, and I went back and forth over whether we should do that, but I think that's outside the scope of this work and doesn't match with how we treat other situations like this: if there's no data to query, we return an empty set.
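The "closed channel as the async equivalent of nil" idea discussed above can be sketched as follows. This is an illustration only: `nil-channel` and `match-pattern` are hypothetical names, the active graph is modeled as a plain vector of maps, and nothing here is taken from the fluree/db source. It assumes `org.clojure/core.async` (1.2 or later, for `to-chan!`) is on the classpath.

```clojure
(require '[clojure.core.async :as async])

;; One shared channel that is already closed. Taking from a closed, empty
;; channel returns nil immediately, so consumers see "no results" without
;; any new go block being created.
(def nil-channel
  (doto (async/chan) async/close!))

(defn match-pattern
  "Hypothetical matcher: returns a channel of the entries in `active-graph`
  whose :p equals `pattern`, or the shared closed channel when there is no
  active graph to query."
  [active-graph pattern]
  (if active-graph
    (async/to-chan! (filter #(= pattern (:p %)) active-graph))
    nil-channel))

;; Callers treat both branches uniformly because both return a channel.
(defn results
  "Collects everything `match-pattern` puts on its channel into a vector."
  [active-graph pattern]
  (async/<!! (async/into [] (match-pattern active-graph pattern))))
```

With no active graph, `(results nil "ex:name")` yields an empty vector rather than an error, which matches the behavior described above: if there is no data to query, the result is an empty set.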

dpetran · 2024-03-29T15:05:46Z

src/clj/fluree/db/query/json_ld/response.cljc

+          (resolve-references ds cache context compact-fn select-spec current-depth fuel-tracker error-ch)
+          (append-id ds iri select-spec cache compact-fn error-ch)))))
+
+(defn format-subject-flakes


Love the new name and the refactor of this whole namespace 🥇

dpetran

🦀

zonotope added 30 commits March 13, 2024 03:26

check cache first in caching function

8910433

calculate type value in its own function

e40d7e7

add list-element? to determine if a flake is in a list

05a8970

refactor reference formatting and retrieval into its own function

461e9f7

add unwrap-singleton fn to format lists

f839c62

perform nil check as early as possible for spec

944020f

don't pass in depth because it's present in select spec

8f1eef4

delay resolving reference until later

adfed81

We'll eventually need to resolve references in a combined dataset, and not just a single database, so I've isolated this step to allow for refactoring later.

refactor to add independent fns to format properties and objects

e24c4f5

use transducer instead of loop for easier reuse and refactoring

a7ccdcb

add function that returns a sequence of all dbs in a dataset

7664e93

add function that retrieves the property values for a subject iri

c040bb3

remove unnecessary sequence traversal if no flake-xf provided

3c46967

move this function to the only ns where it's used

7d58973

add fn to build cached t-range resolvers from connections

7c04a53

more general flake bound function

17121c2

correct transduce reducing fn

eab053e

make flakes->res take an error channel instead of throwing

50d257e

do dataset checks in dataset functions

5272770

use error channel when resolving references

ab52f6e

dedicated fn to format references

f6970c9

add fn that resolves initial attributes before formatting subject

ce4a83f

remove track-fuel fn because we still always have to check for nil

41d2d1f

don't resolve initial attributes in flake->res

985d913

make format-reference take sid instead of flake

0b613f0

make resolve references fn take a channel

327e0fa

add fuel-tracker to json-ld response formatter

a5975c3

add function to reverse properties in parallel

fa846ff

remove unused function

7f4aacb

zonotope added 15 commits March 20, 2024 17:16

more accurate names for format-subgraph and flakes->res functions

42b5535

cleanup display reference to remove extra async channel

408708b

more consistent names for reverse property functions

86a27aa

use pipe instead of extra go block

a0a6587

store references as iris, not sids so they can be used in datasets

193e54b

filter out unauthorized reverse property flakes

9048d39

authorize flakes directly in resolve-flake-slices

9e77743

We never call resolve-flake-slices without immediately authorizing all the flakes it returns, so this should be done in one step.

combine forward properties across components if dataset provided

b0460ac

combine reverse properties if dataset provided

413a77b

change arg to indicate that reference fns take datasets

ae91222

make append-id take iri over sid so it can be used across datasets

ca066ed

store the db along with the sid when caching iri conversion

1ce4049

SIDs are not unique across databases, so caching without considering the database as well could lead to some nasty bugs

check across the full dataset to see if the id attribute is allowed

aa2ea07

return no results when there's no active graph

a61353a

add subgraph tests for federated queries

e6c63d6

zonotope requested a review from a team March 22, 2024 04:32

zonotope self-assigned this Mar 22, 2024

zonotope added 5 commits March 22, 2024 11:01

add arity using default depth for format-node

259f108

update pre-existing docstring

9c2bcf3

Merge remote-tracking branch 'origin/main' into fix/federated-graph-c…

61dee37

…rawl

fix select-spec argument

50d97b5

mark broken test pending

5c47627

This test breakage has nothing to do with the work on this branch and was only coincidentally passing before. I have marked it pending for now so as not to hold up merging this branch until we fix it.

dpetran reviewed Mar 29, 2024

dpetran approved these changes Mar 29, 2024

zonotope added 2 commits March 29, 2024 12:27

use pre-defined nil channel instead of a new empty go block

007d86e

remove redundant let binding

de1894b

zonotope merged commit ce981c9 into main Mar 29, 2024

zonotope deleted the fix/federated-graph-crawl branch March 29, 2024 16:52

zonotope mentioned this pull request Apr 8, 2024

Refactor Consensus Protocol fluree/server#48

Merged

zonotope merged 61 commits into main from fix/federated-graph-crawl
