User guide
Practical advice for running real kgsteward projects. For the internal
mechanics of each backend, see triplestore drivers;
this page is about how to work effectively, not how the drivers work.
Maturity level
kgsteward as a client for GraphDB was developed over several years to gather
and manage experimental data from chemistry (LC-MS/MS) and biology
(bio-activities), together with reference chemical structures derived from
public databases (LOTUS, Wikidata). This work was published in "A Sample-Centric
and Knowledge-Driven Computational Framework for Natural Products Drug
Discovery". Several other complex research projects whose RDF data are managed
by kgsteward are currently ongoing, including collaborations with industry.
The Fuseki, RDF4J and qlever drivers came later and are less battle-tested than the GraphDB one.
Portability across stores
In a perfect world, if the content of an RDF repository is specified strictly according to W3C recommendations, one could expect to obtain the same resource whatever the triplestore, although performance may differ. In the real world it is a little more complicated. The examples in first steps yield exactly the same results across stores, but they only involve very small datasets.
Comparing store contents (debugging)
To check whether two backends really hold the same data, dump each store to
sorted TSV and diff them:
kgsteward config.yaml --dump_all_dataset --dump_dir dumpA # run against backend A
kgsteward config.yaml --dump_all_dataset --dump_dir dumpB # run against backend B
diff -r dumpA dumpB
--dump_all_dataset writes one sorted <dataset>.tsv per named graph;
--dump_all_select does the same for every configured SELECT query (--dump_dataset
/ --dump_select restrict to named ones). Sorting is enforced so triple/row order
never shows up as a spurious diff, and every cell is escaped so embedded tabs or
newlines inside a literal cannot misalign the grid.
This is strictly a debugging / comparison aid: the TSV is meant to be read and diffed, not to serve as a source of RDF for further processing. It intentionally drops datatype and language tags so that trivial serialization differences between stores do not register as diffs, which also means it is not a faithful RDF serialization. To move actual RDF between stores, use the normal load path or, for qlever, the checkpoint / quad-dump mechanisms described in triplestore drivers.
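When a project has many named graphs, the raw output of diff -r can be long. As a minimal sketch, the shell function below prints one line per dump file that differs or is missing on one side. The name compare_dumps is hypothetical and not part of kgsteward. It assumes only what is stated above: each dump directory holds sorted .tsv files with matching names.

```shell
# Hypothetical helper (not part of kgsteward): summarize the differences
# between two dump directories, one line per .tsv file.
# Returns 0 if the directories match, 1 otherwise.
compare_dumps() {
    dir_a=$1
    dir_b=$2
    status=0
    # Files in A: report those missing from B or whose content differs.
    for file_a in "$dir_a"/*.tsv; do
        [ -e "$file_a" ] || continue    # the glob matched nothing
        name=$(basename "$file_a")
        if [ ! -e "$dir_b/$name" ]; then
            echo "only in $dir_a: $name"
            status=1
        elif ! cmp -s "$file_a" "$dir_b/$name"; then
            echo "differs: $name"
            status=1
        fi
    done
    # Files that exist only in B.
    for file_b in "$dir_b"/*.tsv; do
        [ -e "$file_b" ] || continue
        name=$(basename "$file_b")
        if [ ! -e "$dir_a/$name" ]; then
            echo "only in $dir_b: $name"
            status=1
        fi
    done
    return $status
}
```

Calling compare_dumps dumpA dumpB after the two dump runs narrows the problem to specific datasets or queries. Running diff on a single reported file then shows the offending rows.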
Develop on GraphDB, deploy where you need
A practical workflow that has worked well: author and debug a project against GraphDB, then point the same YAML at the production backend.
The reason is the ingestion model (detailed in triplestore
drivers). On a live backend like GraphDB each update:
is sent and persisted immediately, so when you iterate you see every SPARQL
statement run (or fail) at its own position, which makes locating an error in
a long update straightforward, especially with -v. qlever, by contrast,
stages files and queues updates for a deferred index rebuild, so the
develop-fix-rerun loop is heavier. The free edition of GraphDB is also extremely
robust, well aligned with the W3C RDF/SPARQL specifications, and trouble-free
across software updates, which is why it remains the recommended backend for
the authoring phase.
Other servers
Many stores were de facto excluded because they do not support SPARQL update
and/or named graphs (a.k.a. contexts), both of which kgsteward relies on.