
GBA1 ‘Lysosomal acid glucosylceramidase’

Learning outcomes

After having completed this chapter, you will be able to:

  • Understand the structure of a simple ‘biological’ database encoded in Turtle
  • Write simple SELECT SPARQL queries

Material

The exercises below follow the same structure as the music example, with a focus on the GBA1 gene, which is involved in Gaucher disease.

GBA1 graph exploration

Take some time to explore the GBA1 protein graph.

It has many classes connecting the different pieces of knowledge curated and aggregated by the UniProt team.

Class hierarchy in the GBA1 graph

Class relationships in the GBA1 graph

GBA1 graph SPARQL queries

We will do some exercises on the GBA1 graph, with a focus on the reactions catalyzed by the enzyme GBA1.

DESCRIBE

You have seen during the graph exploration that some properties can be cryptic: their names are not always meaningful.

The DESCRIBE command is for you! As its name suggests, DESCRIBE returns an RDF fragment describing a resource, such as all the known details for each URI found.

Describe_up:catalyzedReaction.sparql
# Describe the up:catalyzedReaction property

PREFIX up: <http://purl.uniprot.org/core/>
DESCRIBE up:catalyzedReaction

Exercise:

  • Where is the up:catalyzedReaction property found in the graph?
  • In which predicates is it involved?
Answer
     subject                 predicate             object
1    up:catalyzedReaction    rdf:type              rdf:Property
2    up:catalyzedReaction    rdfs:subPropertyOf    up:catalyzedReaction
3    up:catalyzedReaction    rdfs:subPropertyOf    up:catalyzedReaction
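
What DESCRIBE returns is left to each triplestore, so it can help to compare it with a plain SELECT. The query below is a minimal sketch, not an exact equivalent of DESCRIBE: it simply lists every triple whose subject is up:catalyzedReaction. The variable names ?predicate and ?object are arbitrary.

```sparql
# List every triple whose subject is the up:catalyzedReaction property

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?predicate ?object WHERE {
    up:catalyzedReaction  ?predicate  ?object .
}
```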


SELECT

What is the GBA1 protein name?

The predicate qualifying the protein name is up:fullName.

Exercise: Use this predicate to find the GBA1 protein (full) name.

A SELECT manual can be found here.

Answer

Protein_name.sparql
# Retrieve the protein name associated with the P04062 UniProt entry

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein_name WHERE {
    ?s up:fullName ?protein_name .
}
####################################################################

      protein_name
1    "Lysosomal acid glucosylceramidase"

There are in fact several names: over time, this protein has accumulated many aliases.
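
To see which kind of name each one is, the sketch below also returns the predicate that links the entry to each name. It assumes that names are stored in intermediate nodes attached directly to the P04062 entry, which may not hold for every name in this graph. The uniprot: prefix label and the variables ?name_kind and ?name_node are chosen for this example only.

```sparql
# List the names attached to the P04062 entry, with the predicate leading to each name

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprot: <http://purl.uniprot.org/uniprot/>
SELECT ?name_kind ?protein_name WHERE {
    uniprot:P04062  ?name_kind   ?name_node .
    ?name_node      up:fullName  ?protein_name .
}
```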


SELECT and ORDER

Which reactions are catalyzed by this enzyme?

Exercise: Using the up:catalyzedReaction predicate, get the reactions catalyzed by GBA1.

Answer

This_enz_catalyses.sparql
# Which reactions are catalyzed by this enzyme?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?reactions WHERE {
    ?s up:catalyzedReaction ?reactions .
}
##########################################

      reactions
1     http://rdf.rhea-db.org/13269
2     http://rdf.rhea-db.org/14297
3     http://rdf.rhea-db.org/11956
4     http://rdf.rhea-db.org/58264
5     http://rdf.rhea-db.org/58324
6     http://rdf.rhea-db.org/58316
7     http://rdf.rhea-db.org/70303
8     http://rdf.rhea-db.org/70307
9     http://rdf.rhea-db.org/70311
10    http://rdf.rhea-db.org/70315
11    http://rdf.rhea-db.org/70235
12    http://rdf.rhea-db.org/70255
13    http://rdf.rhea-db.org/70239
14    http://rdf.rhea-db.org/70251

All GBA1-catalyzed reactions are Rhea reactions.

Exercise: Order this list by descending Rhea IDs.

Answer

# Which reactions are catalyzed by this enzyme?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?reactions WHERE {
    ?s up:catalyzedReaction ?reactions .
}
ORDER BY DESC(?reactions)
##########################################

      reactions
1     http://rdf.rhea-db.org/70315
2     http://rdf.rhea-db.org/70311
3     http://rdf.rhea-db.org/70307
4     http://rdf.rhea-db.org/70303
5     http://rdf.rhea-db.org/70255
6     http://rdf.rhea-db.org/70251
7     http://rdf.rhea-db.org/70239
8     http://rdf.rhea-db.org/70235
9     http://rdf.rhea-db.org/58324
10    http://rdf.rhea-db.org/58316
11    http://rdf.rhea-db.org/58264
12    http://rdf.rhea-db.org/14297
13    http://rdf.rhea-db.org/13269
14    http://rdf.rhea-db.org/11956


SELECT (with multiple triples)

What are the reactions associated with an EC number?

The GBA1 graph also contains an enzyme class (up:enzymeClass predicate).

Exercise: Get the GBA1 Rhea reactions associated with an EC number

Answer

# What are Rhea reactions associated with an EC number?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?rhea ?EC WHERE {
    ?CatalyticActivity  up:catalyzedReaction   ?rhea .
    ?CatalyticActivity  up:enzymeClass         ?EC .
}
##########################################

     rhea                            EC
1    http://rdf.rhea-db.org/13269    enzyme:3.2.1.45
2    http://rdf.rhea-db.org/14297    enzyme:3.2.1.46

The two triples use the same subject named ?CatalyticActivity in this example.

When consecutive triples share the same subject, the query can be simplified with the ; punctuation sign.

Exercise: Simplify the previous query with ;

Answer

# What are Rhea reactions associated with an EC number?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?rhea ?EC WHERE {
    ?CatalyticActivity  up:catalyzedReaction   ?rhea ;
                        up:enzymeClass         ?EC .
}
##########################################

     rhea                            EC
1    http://rdf.rhea-db.org/13269    enzyme:3.2.1.45
2    http://rdf.rhea-db.org/14297    enzyme:3.2.1.46


SELECT and OPTIONAL

What are the reactions associated with an EC number, and those which are not?

We have seen previously that GBA1 catalyzes 14 reactions. All of them are linked to Rhea, but not all of them are linked to an EC number.

Exercise: Get all the GBA1 Rhea reactions, whether or not they are associated with an EC number.

An OPTIONAL manual can be found here.

Answer

# What are reactions associated or not with an EC number?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?reaction ?EC  WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
}
##########################################

     reaction                        EC
1    http://rdf.rhea-db.org/13269    enzyme:3.2.1.45
2    http://rdf.rhea-db.org/14297    enzyme:3.2.1.46
3    http://rdf.rhea-db.org/11956    
4    http://rdf.rhea-db.org/58264    
5    http://rdf.rhea-db.org/58324    
6    http://rdf.rhea-db.org/58316    
7    http://rdf.rhea-db.org/70303    
8    http://rdf.rhea-db.org/70307    
9    http://rdf.rhea-db.org/70311    
10   http://rdf.rhea-db.org/70315    
11   http://rdf.rhea-db.org/70235    
12   http://rdf.rhea-db.org/70255    
13   http://rdf.rhea-db.org/70239    
14   http://rdf.rhea-db.org/70251    


SELECT and FILTER

The results of the previous query are URIs. You can see that by clicking on the Raw response button in the GraphDB result section.

To filter on them with the comparison operators you have seen in the music example, you first have to cast them into a type that is easier to work with.

You can stringify a URI/IRI with the STR function (A STR manual can be found here).

E.g. STR(?reaction)

Exercise: Starting from the previous SPARQL query, add a filter to keep only the reactions greater than “http://rdf.rhea-db.org/13269”.

Answer

# What are reactions associated or not with an EC number, only with reactions greater than "http://rdf.rhea-db.org/13269"?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?reaction ?EC  WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
    FILTER( STR(?reaction) > "http://rdf.rhea-db.org/13269" )
}
##########################################

     reaction                        EC
1    http://rdf.rhea-db.org/14297    enzyme:3.2.1.46
2    http://rdf.rhea-db.org/58264
3    http://rdf.rhea-db.org/58324
4    http://rdf.rhea-db.org/58316
5    http://rdf.rhea-db.org/70303
6    http://rdf.rhea-db.org/70307
7    http://rdf.rhea-db.org/70311
8    http://rdf.rhea-db.org/70315
9    http://rdf.rhea-db.org/70235
10   http://rdf.rhea-db.org/70255
11   http://rdf.rhea-db.org/70239
12   http://rdf.rhea-db.org/70251

FILTER is very powerful. Combined with the REGEX() function, it can express almost any condition you can think of (REGEX manual).
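
As an illustration of FILTER with REGEX(), here is a minimal sketch that keeps only the reactions whose Rhea ID starts with 70. The pattern is an example chosen for this page; with the reaction list shown earlier, it should keep the eight reactions numbered from 70235 to 70315.

```sparql
# Keep only the reactions whose Rhea ID starts with 70

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?reaction WHERE {
    ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    # REGEX works on strings, so the IRI is stringified first
    FILTER( REGEX( STR(?reaction), "/70[0-9]+$" ) )
}
```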


SELECT and BIND

The result of the STR() cast can be assigned to a new variable.

Exercise: Use the BIND function to do it

A BIND manual can be found here.

Answer

# What are reactions associated or not with an EC number, only with reactions greater than "http://rdf.rhea-db.org/13269"?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?reaction ?EC  WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
    BIND ( STR(?reaction) AS ?reac_string )
    FILTER( ?reac_string > "http://rdf.rhea-db.org/13269" )
}

EC numbers are easily identifiable. We don’t really need the enzyme: prefix.

The REPLACE function is here for that (REPLACE manual). It replaces all occurrences of a pattern with a replacement string.

Exercise: Remove the enzyme: prefix, i.e. replace it with nothing, in a BIND clause.

Remember to stringify ?EC first.

Answer

# What are reactions associated or not with an EC number, only with reactions greater than "http://rdf.rhea-db.org/13269"?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?reaction ?ec  WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
    BIND ( STR(?reaction) AS ?reac_string )
    BIND ( REPLACE( STR(?EC), "enzyme:", "" ) AS ?ec )
    FILTER( ?reac_string > "http://rdf.rhea-db.org/13269" )
}
##########################################

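     reaction                        ec
1    http://rdf.rhea-db.org/14297    "http://purl.uniprot.org/enzyme/3.2.1.46"

Stringification turns ?EC into a string holding its full IRI, "http://purl.uniprot.org/enzyme/3.2.1.46", not the abbreviated enzyme:3.2.1.46 form shown in the result tables. The "enzyme:" pattern therefore never matches and nothing is replaced.

The right REPLACE pattern to apply is: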

Answer

BIND ( REPLACE( STR(?EC), "http://purl.uniprot.org/enzyme/", "" ) AS ?ec )
# for regex lovers
BIND ( REPLACE( STR(?EC), "^.*enzyme/", "" ) AS ?ec )
##########################################

     reaction                        ec
1    http://rdf.rhea-db.org/14297    "3.2.1.46"
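
Keep in mind that comparing strings with > is done character by character: as strings, "9" is greater than "13269". The filter used so far gives the expected answer only because all the Rhea IDs in this graph have five digits. The sketch below shows one possible way to compare the IDs as numbers instead, by stripping everything up to the last slash and casting the rest to xsd:integer. The variable name ?rhea_id is chosen for this example.

```sparql
# Compare and sort Rhea IDs as numbers rather than as strings

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?reaction ?rhea_id WHERE {
    ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    # Remove everything up to the last "/" and cast the remaining digits to an integer
    BIND ( xsd:integer( REPLACE( STR(?reaction), "^.*/", "" ) ) AS ?rhea_id )
    FILTER( ?rhea_id > 13269 )
}
ORDER BY DESC(?rhea_id)
```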


SELECT and aggregation

Go back to the SELECT and OPTIONAL query.

Exercise: We now want to COUNT how many reactions are found by this SPARQL query.

A COUNT manual can be found here or here.

Answer

# How many reactions associated or not with an EC number?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT (COUNT(?reaction) AS ?count) WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
}
##########################################

     count
1    "14"^^xsd:integer

Notice that the returned result has the right type for a number, i.e. xsd:integer.
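
COUNT only counts the solutions in which its variable is bound. The sketch below uses this to count, in one query, all the reactions and those that have an EC number. The variable names ?total and ?with_ec are chosen for this example; given the OPTIONAL results shown earlier, one would expect 14 and 2.

```sparql
# How many reactions in total, and how many with an EC number?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT (COUNT(?reaction) AS ?total) (COUNT(?EC) AS ?with_ec) WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
}
```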


SELECT and GROUP BY

Exercise: We now want to COUNT per EC number, i.e. to know how many times each EC number is found.

A GROUP BY manual can be found here.

Answer

# How many reactions associated with each EC number?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?EC (COUNT(?reaction) AS ?count) WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
}
GROUP BY ?EC
##########################################

     EC                 count
1    enzyme:3.2.1.45    "1"^^xsd:integer
2    enzyme:3.2.1.46    "1"^^xsd:integer
3                       "12"^^xsd:integer
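
The group without an EC number appears with an empty label. One possible way to make it explicit, shown here as a sketch, is to build the grouping key with COALESCE, which returns its first argument that evaluates without error. The variable ?ec_label and the text "no EC number" are chosen for this example.

```sparql
# How many reactions per EC number, with an explicit label when there is none?

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?ec_label (COUNT(?reaction) AS ?count) WHERE {
        ?CatalyticActivity  up:catalyzedReaction  ?reaction .
    OPTIONAL {
        ?CatalyticActivity  up:enzymeClass        ?EC .
    }
    # STR(?EC) fails when ?EC is unbound, so COALESCE falls back to the fixed label
    BIND ( COALESCE( STR(?EC), "no EC number" ) AS ?ec_label )
}
GROUP BY ?ec_label
```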


Property paths

A property path describes how two items are connected. The simplest path is just a single property, which forms an ordinary triple:

?item  path  ?property
?item  --->  ?property

If items are not directly connected, their paths are longer. You can add path elements with a forward slash (/).

?item path1/path2/path3 ?property

This is equivalent to either of the following:

?item   path1  ?temp1 .
?temp1  path2  ?temp2 .
?temp2  path3  ?property .

or

?item path1 [ path2 [ path3 ?property ] ] .

A property paths manual can be found here.

Path view

?protein                        up:annotation         ?catalytic_activity_annotation .
?catalytic_activity_annotation  up:catalyticActivity  ?activity .
?activity                       up:catalyzedReaction  ?rhea .

Exercise: Using property paths, simplify the query above:

Answer

# Use property paths to simplify the previous query

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?rhea WHERE {
    ?protein up:annotation/up:catalyticActivity/up:catalyzedReaction ?rhea .
}
##########################################

      protein                                   rhea
1     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/13269
2     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/14297
3     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/11956
4     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/58264
5     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/58324
6     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/58316
7     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70303
8     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70307
9     http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70311
10    http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70315
11    http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70235
12    http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70255
13    http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70239
14    http://purl.uniprot.org/uniprot/P04062    http://rdf.rhea-db.org/70251

Note that we follow the directions shown as arrows on the graph picture, from the light green P04062, to the dark yellow SIPEEA5CAFFB8CFF4D9, to the light blue SIP7F4F633380447C8F, then to the red 13269.

Inverse path

To go in the other direction, i.e. against the direction of the arrows, we have to use an inverse path.

Adding the symbol ^ in front of a predicate (or a property path expression) makes it an inverse path expression. An inverse path expression simply flips the direction of the match: the subject of the triple pattern will match the object of the triple in the data, and the object of the triple pattern will match the subject.

Exercise: Write a query to display EC numbers and associated Rhea (from red 13269 to red 3.2.1.45 in the graph picture above).

Answer
PREFIX up: <http://purl.uniprot.org/core/>
SELECT * WHERE {
    ?rhea ^up:catalyzedReaction/up:enzymeClass ?EC .
}
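
For comparison, here is the same query sketched without a property path. The inverse step ^up:catalyzedReaction becomes an ordinary triple whose subject is an intermediate node, named ?activity here for the example, and the second step starts from that same node.

```sparql
# Same question, written with two ordinary triples instead of an inverse path

PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?rhea ?EC WHERE {
    ?activity  up:catalyzedReaction  ?rhea .
    ?activity  up:enzymeClass        ?EC .
}
```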

Recursive path

An RDF enzyme hierarchy has also been added to the GBA1 graph.

The skos:broaderTransitive predicate lets you go up the EC number hierarchy, one parent at a time.

Exercise: From the property path example, display the parents of the EC numbers found in GBA1.

Answer
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT DISTINCT ?EC2 WHERE {
    ?protein up:annotation/up:catalyticActivity/up:enzymeClass ?EC .
    ?EC skos:broaderTransitive+ ?EC2 .
}
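
The + after the predicate means "one or more steps", so the query follows skos:broaderTransitive repeatedly until no parent is left. Replacing + with * means "zero or more steps", which also returns the starting EC number itself. The sketch below uses * and keeps ?EC in the output, so that each ancestor is shown next to the EC number it comes from. The variable name ?ancestor is chosen for this example.

```sparql
# Display each GBA1 EC number with itself and all its ancestors

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT DISTINCT ?EC ?ancestor WHERE {
    ?protein up:annotation/up:catalyticActivity/up:enzymeClass ?EC .
    # "*" = zero or more steps, so ?ancestor also takes the value of ?EC itself
    ?EC skos:broaderTransitive* ?ancestor .
}
ORDER BY ?EC ?ancestor
```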