RDF triplifier

[README last update: 2018-07-11] pre-alpha version

This component provides a simple microservice for creating an RDF representation of data from a JDBC connector.

The RDF processor used is Ontop, which implements the W3C-standard R2RML mapping language for tabular-to-RDF conversion.
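
For orientation, a minimal R2RML mapping in Turtle looks roughly like the sketch below; the regions table, its id and name columns, and the ex: vocabulary are hypothetical, not taken from this project:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns#> .

# maps each row of a hypothetical "regions" table to an ex:Region resource
<#RegionsMapping>
    rr:logicalTable [ rr:tableName "regions" ] ;
    rr:subjectMap [
        rr:template "http://example.org/regions/{id}" ;
        rr:class ex:Region
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "name" ]
    ] .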

NOTE (Impala)

The ssl_impala folder should be created under the project root, and should contain the following files:

├── ssl_impala
│   ├── jssecacerts
│   ├── master-impala.jks
│   └── master-impala.pem
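
These files provide the TLS trust material for the Impala JDBC connection. As an illustration only (the exact connection setup depends on the project configuration, and the master-impala host name and port are hypothetical), a JDBC URL for the Cloudera Impala driver can reference the trust store like this:

jdbc:impala://master-impala:21050;SSL=1;SSLTrustStore=ssl_impala/jssecacerts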

NOTE (dependencies)

This project uses third-party dependencies that are included under the local /lib folder (there is currently no publicly available Maven repository for the DAF components).

├── lib
│   ├── ImpalaJDBC41.jar                   (required for Impala)
│   ├── TCLIServiceClient.jar              (required for Impala)
│   └── http-api-jersey-0.2.0-SNAPSHOT.jar (uber jar)

stateless endpoint

A simple (stateless) version of the endpoint for executing the R2RML mapping process can be used as follows:

curl -X POST 'http://localhost:7777/kb/api/v1/triplify/process' \
	-H "accept: text/plain" \
	-H "content-type: application/x-www-form-urlencoded" \
	--data-urlencode "config=${config}" \
	--data-urlencode "r2rml=${r2rml}" \
	-d 'format=text/turtle'

NOTE that this version of the service expects the actual content of the mapping (not a file path), so when using curl it is best to load it into a shell variable such as r2rml=`cat r2rml_file` before launching curl.
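
For example (regions.conf and regions.r2rml.ttl are illustrative file names, matching the test layout shown further below):

config=`cat regions.conf`
r2rml=`cat regions.r2rml.ttl`

After these two assignments, the curl command above can be run as-is.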

The /script directory contains some examples, which can be extended.

Alternatively, the endpoint can be tested via the example page http://localhost:7777/static/r2rml.html:

[screenshot: http_rdf_processor example page]


dataset-oriented endpoint

Another endpoint is provided, which may be useful for invoking the process per dataset, for example from a workflow/pipeline orchestrator. This can be handy when a mapping process has to be split into several distinct datasets (for example, one dataset per resource type).

The structure of a call is the following:

/kb/api/v1/triplify/datasets/{group}/{dataset_path}.{ext}?cached={T|F}

The idea is to expose the most recently created RDF representation of a dataset, unless an explicit cached=false parameter is provided: the first call generates the dump, and subsequent calls can choose to reuse the data already created. The group parameter is simply a convenient way to separate test data from the rest, while dataset_path can be used to create further subdivisions. The mappings need to be prepared on disk accordingly, as explained later.

example: creating the RDF for the regions dataset

curl -X GET \
	'http://localhost:7777/kb/api/v1/triplify/datasets/test/territorial-classifications/regions.ttl?cached=true' \
	-H "accept: text/plain"
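
To force regeneration of the dump instead of reusing a previously created one, the same call can be issued with cached=false:

curl -X GET \
	'http://localhost:7777/kb/api/v1/triplify/datasets/test/territorial-classifications/regions.ttl?cached=false' \
	-H "accept: text/plain"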

Each configuration on disk will have a structure similar to the one used for testing with the SQLite example database:

├── data
│   └── test
│       └── territorial-classifications
│           ├── cities
│           │   └── ...
│           ├── provinces
│           │   └── ...
│           └── regions
│               ├── regions.conf
│               ├── regions.metadata.ttl
│               └── regions.r2rml.ttl
...

TODO:

  • add a config example
  • add an R2RML example
  • add a metadata example/explanation


SEE ALSO: daf-semantics project


running locally

mvn clean package

# win
java -cp "target/triplifier-0.0.5.jar;target/libs/*" triplifier.main.MainHTTPTriplifier

# *nix
java -cp "target/triplifier-0.0.5.jar:target/libs/*" triplifier.main.MainHTTPTriplifier

TODO

  • update the Dockerfile
  • merge external manual Swagger definitions
  • fix Swagger problems with multi-line content: try updating to version 3+
  • add an internal interlinking strategy, using Silk/Duke

Ideally, we could imagine having some dedicated microservices:

  • one for handling the merging of RDF and its direct publication
  • one for creating relations between the current datasource and an external target, using Silk or Duke
