YAML syntax of kgsteward config file (version 2)

Preamble

YAML 1.1 syntax is supported.
A single YAML extension is supported: !include <filename>. This directive inserts the content of <filename> in place. A relative path in <filename> is resolved against the directory of the parent YAML file. The inclusion is performed early, before the YAML configuration is validated.
Within the YAML config file(s), UNIX environment variables can be referred to using the ${...} syntax. These are evaluated late, i.e. at the time of command execution. Hence the ${...} syntax cannot be used in an !include directive. Using UNIX environment variables is recommended to keep the YAML config file(s) portable. These variables are usually written in uppercase.
In addition to UNIX environment variables, kgsteward creates temporary variables reflecting the content of the YAML config file. For example, ${kgsteward_server_brand} contains the … server brand, e.g. graphdb. The most useful of these variables is certainly ${kgsteward_dataset_context}, which contains the IRI of the current target context. These variables are written in lowercase.
The terminology adopted here is a compromise. Different server brands use different names for the same concept. For example, ‘context’ in RDF4J/GraphDB terminology is the same as ‘named graph’ in RDF/SPARQL terminology. kgsteward uses ‘context’ because ‘graph’ already carries too many meanings.
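
The two mechanisms described above can be sketched as follows. This is an illustrative fragment, not taken from the kgsteward distribution: the file name `datasets.yaml` and the environment variable `KG_DATA_DIR` are hypothetical, and using `!include` as the value of the `dataset` key is an assumption about how the directive is typically placed.

```yaml
# Main config file (fragment). The included file is inserted before
# validation; its path is resolved relative to this parent file.
dataset: !include datasets.yaml   # hypothetical file name
---
# datasets.yaml (hypothetical included file)
- name: example
  file:
    # KG_DATA_DIR is a hypothetical UNIX environment variable. It is
    # evaluated late, at command execution time, which is why ${...}
    # cannot appear in the !include directive itself.
    - "${KG_DATA_DIR}/example/*.ttl"
```

Keeping machine-specific paths in environment variables lets the same YAML files be reused unchanged on another host.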

YAML syntax

The entry point (top level keys) is KGStewardConf.

KGStewardConf

Top level YAML keys

Type: `object`

Property	Type	Required	Possible values	Description
version	`const`	✅	`kgsteward_yaml_2`	This mandatory fixed value determines the admissible YAML syntax
server	`object`	✅	FusekiConf or GraphDBConf or RDF4JConf
file_loader	`object`	✅	HttpServerFileLoader and/or RiotChunkStoreFileLoader and/or SparqlFileLoader and/or StoreFileLoader
url_loader	`object`	✅	CurlRiotChunkStoreUrlLoader and/or SparqlUrlLoader
dataset	`array`	✅	DatasetConf	Mandatory key to specify the content of the knowledge graph in the triplestore.
context_base_IRI	`string`	✅	string	Base IRI used to construct the graph context. If in doubt, try `http://example.org/context/`.
queries	`array` or `null`	✅	QueryConf	Structured list of SPARQL queries.
validations	`array` or `null`		string	A list of paths to files containing SPARQL queries used to validate the repository. Wildcards `*` can be used. By convention, a valid result is empty, i.e. no row is returned. A failed validation should return rows that help diagnose the problem.
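
To show how the top-level keys fit together, here is a minimal sketch of a complete config file. Every key and fixed value comes from the tables in this document; the repository name, file names and directory layout are hypothetical placeholders.

```yaml
# Minimal sketch of a kgsteward config (version 2 syntax).
# Repository name and all file paths below are hypothetical.
version: kgsteward_yaml_2
server:
  brand: graphdb
  location: http://localhost:7200      # GraphDB default location
  server_config: graphdb_config.ttl    # hypothetical file name
  repository: demo                     # hypothetical repository name
file_loader:
  method: http_server
url_loader:
  method: sparql_load
context_base_IRI: http://example.org/context/
dataset:
  - name: demo_data
    file:
      - data/demo.ttl
queries:
  - name: examples
    file:
      - "queries/*.rq"
validations:
  - "validation/*.rq"
```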

Definitions

CurlRiotChunkStoreUrlLoader

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Default	Description
method	`const`	✅	`curl_riot_chunk_store`		URLs are downloaded using curl to a temporary file, which is then loaded with the `riot_chunk_store` method.
tmp_dir	`string` or `null`		string	`"/tmp"`	temporary directory
size	`integer` or `null`		integer	`100000000`	chunk size
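
As a sketch, a `url_loader` entry using this method could look like the fragment below. The values shown are simply the defaults from the table above; the unit of `size` is not stated in the source.

```yaml
url_loader:
  method: curl_riot_chunk_store
  tmp_dir: /tmp        # default temporary directory
  size: 100000000      # default chunk size (unit not stated in the source)
```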

DatasetConf

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
name	`string`	✅	`^[a-zA-Z]\w{0,31}$`	Mandatory name of a dataset record.
context	`string` or `null`		string	The IRI of the target context. If missing, it will be built by concatenating `context_base_IRI` and `name`.
parent	`array` or `null`		string	A list of dataset names to declare dependencies between dataset records. Updating the parent dataset will trigger the update of its children, unless it is frozen.
frozen	`boolean` or `null`		boolean	Frozen record, which can only be updated explicitly with the `-d <name>` option. The option `-C` has no effect.
system	`array` or `null`		string	A list of system commands. This is a simple convenience provided by kgsteward, which is not meant to replace a serious Make-like system such as git/dvc.
file	`array` or `null`		string	List of files containing RDF data. Wildcards `*` can be used. The strategy used to load these files depends on whether a file server is used (see the `file_server_port` option). With GraphDB, there might be a maximum file size (possibly 200 MB by default) and compressed files may not be supported. Using a file server overcomes these limitations, but see the security warning in the description of `file_server_port`.
url	`array` or `null`		string	List of URLs from which to load RDF data.
stamp	`array` or `null`		string	List of file paths or URLs whose last modification dates will be used. The file contents are ignored. Wildcards `*` can be used.
replace	`object` or `null`		object	Dictionary to perform string substitution in SPARQL queries from the `update` list. Of utmost interest is `${TARGET_GRAPH_CONTEXT}`, which permits restricting updates to the current context.
update	`array` or `null`		string	List of files containing SPARQL update commands. Wildcards are not recommended here, as the order of the SPARQL updates may matter.
zenodo	`array` or `null`		integer	Do not use! Fetch Turtle files from Zenodo. This is a completely ad hoc command developed for ENPKG, which will be removed sooner or later.
special	`array` or `null`		SpecialEnum	A list of special dataset records. Supported values are listed under SpecialEnum, e.g. “sib_swiss_void”.
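
The dependency and freezing rules described in this table can be sketched with three dataset records. All names, paths and URLs below are hypothetical, and the default context shown in the comments assumes `context_base_IRI` is `http://example.org/context/`.

```yaml
dataset:
  - name: ontology
    # No explicit context: built as context_base_IRI + name,
    # i.e. http://example.org/context/ontology under the assumption above.
    url:
      - http://example.org/ontology.ttl     # hypothetical URL
  - name: instances
    parent:
      - ontology        # updating 'ontology' triggers an update of 'instances'
    file:
      - "data/instances/*.ttl"
    update:
      # Listed one by one rather than with a wildcard, so the order is fixed.
      - sparql/01_cleanup.ru
      - sparql/02_enrich.ru
  - name: reference
    frozen: true        # only updated explicitly, with -d reference
    context: http://example.org/context/reference_v1
    file:
      - data/reference.ttl
```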

FusekiConf

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
brand	`const`	✅	`fuseki`	String identifying the server brand. One of ‘graphdb’, ‘rdf4j’, ‘fuseki’
location	`string`	✅	string	URL of the server. The SPARQL endpoint locations for queries, updates and stores are specific to a server brand. Fuseki has location ‘http://localhost:3030’ by default
repository	`string`	✅	`^\w{1,32}$`	The name of the ‘repository’ (GraphDB/RDF4J naming) or ‘dataset’ (fuseki) in the triplestore.
file_server_port	`integer` or `null`		integer	Integer, `0` by default, i.e. the file server is turned off. When set to a positive integer, say `8000`, local files are exposed through a temporary HTTP server and loaded from it. Support for the different RDF file types and their compressed versions depends on the triplestore. The benefit is that RDF data from `file` are processed with the same protocol as those supplied remotely through `url`. For GraphDB in particular, file-size limits are lifted and compressed formats are supported. Beware that the Python-based server used is potentially insecure (see here for details). This should however pose no real threat on a personal computer or on a server behind a firewall.

GraphDBConf

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
brand	`const`	✅	`graphdb`	String identifying the server brand. One of ‘graphdb’, ‘rdf4j’, ‘fuseki’
location	`string`	✅	string	URL of the server. The SPARQL endpoint locations for queries, updates and stores are specific to a server brand. GraphDB has location ‘http://localhost:7200’ by default
server_config	`string`	✅	string	Filename with the triplestore configuration, possibly a Turtle file. This file can be saved from the user interface of RDF4J/GraphDB after a first repository has been created interactively, making it possible to reproduce the repository configuration elsewhere. This file is used by the `-I` and `-F` options. Beware that the repository ID may be hard-coded in the config file and should be kept in sync with `repository`.
repository	`string`	✅	`^\w{1,32}$`	The name of the ‘repository’ (GraphDB/RDF4J naming) or ‘dataset’ (Fuseki) in the triplestore.
file_server_port	`integer` or `null`		integer	Integer, `0` by default, i.e. the file server is turned off. When set to a positive integer, say `8000`, local files are exposed through a temporary HTTP server and loaded from it. Support for the different RDF file types and their compressed versions depends on the triplestore. The benefit is that RDF data from `file` are processed with the same protocol as those supplied remotely through `url`. For GraphDB in particular, file-size limits are lifted and compressed formats are supported. Beware that the Python-based server used is potentially insecure (see here for details). This should however pose no real threat on a personal computer or on a server behind a firewall.
username	`string` or `null`		string	The name of a user with write-access rights to the triplestore.
password	`string` or `null`		string	The password of a user with write-access rights to the triplestore. It is recommended to pass this value through an environment variable, so that the password is not stored explicitly in the config file. Alternatively, `?` can be used and the password will be asked for interactively at run time.
prefixes	`array` or `null`		string	A list of Turtle files from which prefix definitions can be obtained. This list will be used to update the namespace definitions in GraphDB and RDF4J. Otherwise it is ignored.
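
The recommendation on credentials can be sketched as below. The environment variable names `GRAPHDB_USERNAME` and `GRAPHDB_PASSWORD`, the repository name and the file names are hypothetical.

```yaml
server:
  brand: graphdb
  location: http://localhost:7200
  server_config: graphdb_config.ttl     # hypothetical file name
  repository: demo                      # must match the ID inside server_config
  username: "${GRAPHDB_USERNAME}"       # hypothetical environment variable
  password: "${GRAPHDB_PASSWORD}"       # hypothetical; keeps the secret out of the file
  # password: "?"                       # alternative: prompt at run time
  prefixes:
    - prefixes.ttl                      # hypothetical file name
```

Quoting `"?"` is advisable because a bare `?` is the YAML indicator for a complex mapping key.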

HttpServerFileLoader

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Default	Description
method	`const`	✅	`http_server`		Files are exposed through a temporary HTTP server. This is the recommended method with GraphDB.
port	`integer` or `null`		integer	`8000`	Integer, `0` by default, i.e. the file server is turned off. When set to a positive integer, say `8000`, local files are exposed through a temporary HTTP server and loaded from it. Support for the different RDF file types and their compressed versions depends on the triplestore. The benefit is that RDF data from `file` are processed with the same protocol as those supplied remotely through `url`. For GraphDB in particular, file-size limits are lifted and compressed formats are supported. Beware that the Python-based server used is potentially insecure (see here for details). This should however pose no real threat on a personal computer or on a server behind a firewall.

QueryConf

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Default	Description
name	`string`	✅	`^[a-zA-Z]\w{0,31}$`		Mandatory name of a set of queries.
system	`array` or `null`		string		A list of system commands. This is a simple convenience provided by kgsteward, which is not meant to replace a serious Make-like system such as git/dvc.
test	`object` or `null`		TestConf		assert nothing/something
public	`boolean` or `null`		boolean	`true`	no description
file	`array` or `null`		string		List of files containing one SPARQL query each. Wildcards `*` can be used, and implied file names will be sorted alphabetically. The file name of each file is interpreted as the query label. In each file, lines starting with “#” are considered as the query documentation (comment)
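
A `queries` entry could be sketched as follows. The set name and directory are hypothetical. The source does not document the semantics of `test`, so reading `min_row_count` as "each query must return at least this many rows" is an assumption.

```yaml
queries:
  - name: statistics
    public: true                    # default value
    file:
      # Each file holds one SPARQL query; the file name becomes the query
      # label, and files matched by the wildcard are sorted alphabetically.
      - "queries/statistics/*.rq"
    test:
      min_row_count: 1              # assumption: expect a non-empty result
```

Within each query file, lines starting with “#” are treated as the query documentation.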

RDF4JConf

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
brand	`const`	✅	`rdf4j`	String identifying the server brand. One of ‘graphdb’, ‘rdf4j’, ‘fuseki’
location	`string`	✅	string	URL of the server. The SPARQL endpoint locations for queries, updates and stores are specific to a server brand. RDF4J has location ‘http://localhost:8080’ by default
repository	`string`	✅	`^\w{1,32}$`	The name of the ‘repository’ (GraphDB/RDF4J naming) or ‘dataset’ (fuseki) in the triplestore.
file_server_port	`integer` or `null`		integer	Integer, `0` by default, i.e. the file server is turned off. When set to a positive integer, say `8000`, local files are exposed through a temporary HTTP server and loaded from it. Support for the different RDF file types and their compressed versions depends on the triplestore. The benefit is that RDF data from `file` are processed with the same protocol as those supplied remotely through `url`. For GraphDB in particular, file-size limits are lifted and compressed formats are supported. Beware that the Python-based server used is potentially insecure (see here for details). This should however pose no real threat on a personal computer or on a server behind a firewall.

RiotChunkStoreFileLoader

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Default	Description
method	`const`	✅	`riot_chunk_store`		riot_chunk_store
size	`integer` or `null`		integer	`100000000`	chunk size

SparqlFileLoader

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
method	`const`	✅	`sparql_load`	Files are loaded using the SPARQL update statement: “LOAD <file:// INTO…”. This strategy is likely to fail for large files or, worse, silently truncate them.

SparqlUrlLoader

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
method	`const`	✅	`sparql_load`	URLs are loaded using the SPARQL update statement: “LOAD INTO…”. This strategy could fail for large files or, worse, silently truncate them.

SpecialEnum

No description provided for this model.

Type: `string`

Possible Values: sib_swiss_void or sib_swiss_prefix or sib_swiss_query

StoreFileLoader

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
method	`const`	✅	`file_store`	Files are loaded using the graph store protocol. This strategy is likely to fail for large files or, worse, silently truncate them.

TestConf

No description provided for this model.

Type: `object`

Property	Type	Required	Possible values	Description
min_row_count	`integer` or `null`		integer
max_row_count	`integer` or `null`		integer

Markdown generated with jsonschema-markdown.
