^{back to TOC}

User guide

Practical advice for running real kgsteward projects. For the internal mechanics of each backend, see triplestore drivers; this page is about how to work effectively, not how the drivers work.

Maturity level

kgsteward as a client for GraphDB was developed over several years to gather and manage experimental data from chemistry (LC-MS/MS) and biology (bio-activities), together with reference chemical structures derived from public databases (LOTUS, Wikidata). This work was published in "A Sample-Centric and Knowledge-Driven Computational Framework for Natural Products Drug Discovery". Several other complex research projects whose RDF data are managed by kgsteward are ongoing, including collaborations with industry.

The Fuseki, RDF4J and qlever drivers came later and are less battle-tested than the GraphDB one.

Portability across stores

In a perfect world, if the content of an RDF repository is specified strictly according to W3C recommendations, one would expect to obtain the same content regardless of the store, although performance may differ. In the real world it is a little more complicated. The examples in first steps yield exactly the same results across stores, but only on very small datasets.

Comparing store contents (debugging)

To check whether two backends really hold the same data, dump each store to sorted TSV and diff them:

kgsteward config.yaml --dump_all_dataset --dump_dir dumpA   # run against backend A
kgsteward config.yaml --dump_all_dataset --dump_dir dumpB   # run against backend B
diff -r dumpA dumpB
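When the dumps are large, the raw `diff -r` output can be overwhelming. The following is a minimal sketch that only reports which files differ, using the Python standard library. The function name `summarize_dump_diff` is hypothetical and not part of kgsteward; it assumes both directories contain the `.tsv` files produced by the commands above.

```python
import filecmp
from pathlib import Path


def summarize_dump_diff(dir_a, dir_b):
    """Return (changed, only_in_a, only_in_b) as sorted lists of .tsv file names."""
    names_a = {p.name for p in Path(dir_a).glob("*.tsv")}
    names_b = {p.name for p in Path(dir_b).glob("*.tsv")}
    common = sorted(names_a & names_b)
    # shallow=False compares file contents instead of trusting size/mtime alone.
    _match, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, common, shallow=False)
    # Files that could not be compared are reported together with changed ones.
    changed = sorted(mismatch + errors)
    return changed, sorted(names_a - names_b), sorted(names_b - names_a)


if __name__ == "__main__":
    changed, only_a, only_b = summarize_dump_diff("dumpA", "dumpB")
    print("changed:", changed)
    print("only in dumpA:", only_a)
    print("only in dumpB:", only_b)
```

Once the list of changed files is known, a plain `diff` on a single pair of files is usually enough to locate the discrepancy.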

--dump_all_dataset writes one sorted <dataset>.tsv per named graph; --dump_all_select does the same for every configured SELECT query (--dump_dataset / --dump_select restrict to named ones). Sorting is enforced so triple/row order never shows up as a spurious diff, and every cell is escaped so embedded tabs or newlines inside a literal cannot misalign the grid.
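To illustrate why sorting and escaping make the dumps diff-friendly, here is a minimal sketch of the idea. It is not kgsteward's actual implementation: the helper names and the backslash-based escaping scheme are assumptions chosen for the example.

```python
def escape_cell(value):
    """Escape characters that would otherwise break the one-row-per-line grid."""
    # Escape the backslash first so the escapes added below are not escaped again.
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_sorted_tsv(rows):
    """Serialize rows to TSV text whose line order does not depend on input order."""
    lines = sorted("\t".join(escape_cell(cell) for cell in row) for row in rows)
    return "".join(line + "\n" for line in lines)


rows = [("b", "x\ty"), ("a", "line1\nline2")]
# The same rows in a different order produce byte-identical output.
assert to_sorted_tsv(rows) == to_sorted_tsv(list(reversed(rows)))
```

Because every row occupies exactly one line and lines are sorted, two stores that return the same rows in a different order yield identical files.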

This is strictly a debugging and comparison aid: the TSV is meant to be read and diffed, not to retrieve RDF for further processing. It intentionally drops datatype and language tags so that trivial serialization differences between stores do not register as diffs, which also means it is not a faithful RDF serialization. To move actual RDF between stores, use the normal load path or, for qlever, the checkpoint / quad-dump mechanisms described in triplestore drivers.
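The loss of information can be made concrete with a small sketch. It assumes terms written in an N-Triples-like syntax and uses a hypothetical helper, `lexical_form`, that is not part of kgsteward; the regular expression is deliberately simplified and does not handle every escaped-quote case.

```python
import re

# A quoted literal, optionally followed by a language tag or a datatype IRI.
_LITERAL = re.compile(
    r'^"(?P<lex>.*)"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?$',
    re.DOTALL,
)


def lexical_form(term):
    """Return the bare lexical form of a literal; leave other terms unchanged."""
    match = _LITERAL.match(term)
    return match.group("lex") if match else term


# A plain literal and a typed literal collapse to the same cell value, so the
# original RDF term cannot be reconstructed from the dump.
assert lexical_form('"42"') == lexical_form(
    '"42"^^<http://www.w3.org/2001/XMLSchema#integer>'
)
```

This is what makes the dump robust against stores that serialize the same value slightly differently, and also why it cannot be loaded back as RDF.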

Develop on GraphDB, deploy where you need

A practical workflow that has worked well: author and debug a project against GraphDB, then point the same YAML at the production backend.

The reason is the ingestion model (detailed in triplestore drivers). On a live backend like GraphDB, each update: entry is sent and persisted immediately, so when you iterate you see every SPARQL statement run, and fail, at its own position. This makes locating an error in a long update straightforward, especially with -v. qlever, by contrast, stages files and queues updates for a deferred index rebuild, so the develop-fix-rerun loop is heavier. The GraphDB free edition is also extremely robust, well aligned with the W3C RDF/SPARQL specifications, and trouble-free across software updates, which is why it remains the recommended backend for the authoring phase.

Other servers

Many stores were de facto excluded because they do not support SPARQL update and/or named graphs (a.k.a. contexts), both of which kgsteward relies on.
