Skip to content

SPARQL Query Results JSON format#973

Merged
dpetran merged 36 commits intomainfrom
feature/sparql-output
Mar 13, 2025
Merged

SPARQL Query Results JSON format#973
dpetran merged 36 commits intomainfrom
feature/sparql-output

Conversation

@dpetran
Copy link
Contributor

@dpetran dpetran commented Feb 19, 2025

This adds the ability to produce query results in SPARQL 1.1 Query Results JSON Format.

https://github.com/fluree/core/issues/170

This introduces the :output query option. If it is not supplied it defaults to :fql, which is what we currently return. Supplying :sparql provides the output like this:

{
  "head" : {
    "vars" : [ "favNum", "handle" ]
  },
  "results" : {
    "bindings" : [ {
      "handle" : {
        "value" : "bbob",
        "type" : "literal"
      },
      "favNum" : {
        "value" : "23",
        "type" : "literal",
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer"
      }
    }, {
      "handle" : {
        "value" : "jdoe",
        "type" : "literal"
      },
      "favNum" : {
        "value" : "42",
        "type" : "literal",
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer"
      }
    }, {
      "handle" : {
        "value" : "jdoe",
        "type" : "literal"
      },
      "favNum" : {
        "value" : "99",
        "type" : "literal",
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer"
      }
    } ]
  }
}

Note that it retains the datatype and language metadata about values, as well as distinguishes IRIs from strings. It's also a tabular format, so aggregates are "unnested" and flattened.

@dpetran
Copy link
Contributor Author

dpetran commented Feb 19, 2025

There's still room for further cleanup, especially around the select/display abstraction, but I want to leave it as-is while I tackle CONSTRUCT support, which will introduce a new case to display results as json-ld. I consider this a working but still immature abstraction.

@dpetran dpetran requested a review from a team February 19, 2025 16:51
"Format a where-pattern match for presentation based on the match's datatype.
Return an async channel that will eventually contain the formatted match."
(fn [match _compact]
(fn [match output-format _compact]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new output-format binding exists only to provide runtime polymorphism. This implementation also litters this output-format argument throughout the call stack forcing already working existing code to be modified to support this new feature.

We should use the built in language facilities to support run time polymorphism instead of rolling our own. Either new implementations of existing protocols/multi-methods, or a new protocol or multimethod would be a better fit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current plan is to change the where/match into a record and implement a display method for it. I've got it work-in-progress for construct support, but didn't want to block Nexus and CAM work while I was still figuring out the proper abstraction.

@dpetran dpetran force-pushed the feature/sparql-output branch from b15d51b to 511e200 Compare February 28, 2025 18:43
@dpetran
Copy link
Contributor Author

dpetran commented Feb 28, 2025

Ok, I've overhauled the approach here. Instead of checking the output type each time we format a result, we now just check during query parsing and add the correct implementation of format-value for the output type.

@dpetran dpetran force-pushed the feature/sparql-output branch from f74b898 to 4b1fd82 Compare March 10, 2025 15:52
@dpetran
Copy link
Contributor Author

dpetran commented Mar 10, 2025

Rebased on main

@dpetran dpetran force-pushed the feature/sparql-output branch from 4b1fd82 to 1193d39 Compare March 12, 2025 15:26
@dpetran
Copy link
Contributor Author

dpetran commented Mar 12, 2025

Rebased on main.

@@ -35,6 +35,7 @@
;; ensure :max-fuel key is present
(-> (assoc opts :max-fuel max-fuel)
(merge opts override-opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though not explicitly part of this pr, there is an extra opts on line 36. I think line 36 and 37 should be replaced with:

(-> opts
    (assoc :max-fuel max-fuel)
    (merge override-opts)
    ...


(defn wrap-sparql-results
[results]
(let [format (fn [results] {"head" {"vars" (vec (sort (keys (first results))))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't this anonymous function defined at the top level? it doesn't look like it closes over anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to keep the function definition local. This entire function is obsolete, in the CONSTRUCT PR (#985) I move all the wrapping into the collect-results phase of query execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge that PR into this one so the whole chain is visible.

_ (log/debug "query-connection SPARQL fql: " fql "override-opts:" override-opts)
results (<? (query-connection-fql conn fql override-opts))]
(if (= (:output override-opts) :sparql)
(wrap-sparql-results results)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to disallow someone from setting the output format to sparql for an fql query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and this is made possible in the CONSTRUCT PR by moving the wrapping into the collect-results phase of query execution.


(defmethod display/fql ::grouping
[match compact]
((display-aggregate display/fql) match compact))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code would be easier to understand if these functions derived from apply display-aggregate were pre-defined at the top level.

(some-> match where/get-value vec))

(defprotocol ValueSelector
:extend-via-metadata true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this namespace is convoluted by relying on this, and I don't see any benefit.

I think instead, there should be two additional namespaces: fluree.db.query.exec.select.fql, and fluree.db.query.exec.select.sparql for the different output formats, each with their own implementations of the ValueSelector protocol that don't reference each other.

Then, during parse time, you can read the output format once and decide which ValueSelector records to instantiate based on the select clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose :extend-via-metadata because the other protocols the records implement (ValueAdapter, SolutionModifier) aren't affected by the display and if I create separate Selector records for FQL vs SPARQL I'd have to move the common implementations to a third, common namespace or duplicate them, and on the balance it seemed that :extend-via-metadata was the simpler choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'm worried about the performance implications of extending via metadata as well as the semantic implications of having opaque functions hanging off of records.

This pr has been held open for long enough so I'll let this through and we can address these concerns later if they prove to be a problem.

[fluree.db.query.exec.where :as where]
[fluree.db.util.json :as json]))

(defmulti fql (fn [match compact] (where/get-datatype-iri match)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also be split. Instead of this new namespace with these two multimethods, we should instead put the display/fql multimethod in fluree.db.query.exec.select.fql and the sparql multimethod in the sparql ns.

Essentially, we should take everything except for the protocol definitions from fluree.db.exec.select.fql and move them to a new namespace fluree.db.query.exec.select.fql, and then define a parallel fluree.db.query.exec.select.sparql namespace with the sparql functionality.

variable is a string, symbol, or keyword whose name starts with '?'."
[x]
(when (v/variable? x)
(when (or (v/variable? x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to define a more general function for detecting variables because you repeat this or call a lot. It might even be useful to just change the definition of the variable? function to include that of bnode-variable? if the current variable? function is never called alone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually need to distinguish between the two types - in a transaction you can have bnodes and they are not variables, they are "fresh" identifiers that will be mapped to new, internal identifiers. So it's only inside of a query where pattern that we need these or statements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I think it makes sense to have a general function for detecting variables in queries to remove the or repetition here.

I see this branch is already merged, so we can address this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a follow-up PR to address it:
#989

(solution-value
[_ _ solution]
solution))
(solution-value [_ _ solution] solution))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the argument list and function definitions should both be on new lines.

ValueSelector
(format-value
[_ _ _ _ _ _ error-ch solution]
[_ _ _ _ compact _ error-ch solution]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like compact is used.

(some-> match where/get-value vec))

(defprotocol ValueSelector
:extend-via-metadata true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'm worried about the performance implications of extending via metadata as well as the semantic implications of having opaque functions hanging off of records.

This pr has been held open for long enough so I'll let this through and we can address these concerns later if they prove to be a problem.

@dpetran dpetran merged commit e345dd9 into main Mar 13, 2025
6 checks passed
@dpetran dpetran deleted the feature/sparql-output branch March 13, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants