Simple start to GRAPH DATABASE

Teepika R M
6 min read · Feb 14, 2022


Data engineers and other data experts work with data every day, yet many of them go blank when it comes to graph databases. It is generally a grey area. Let us break down the core concepts of graph data models to understand their technical details.

What is a Data Model?

Once you have the requirements in hand, the first step is to identify the entities and their relationships and represent them in an ER diagram. Data modeling follows: architects or analysts organize the elements of data, relate them to one another and determine the structure of the data. Generally we design table schemas to represent the entities and relationships obtained, i.e., we try to map the ER diagram to tables and the relationships between them. Because of the rigid form of tables, it is not always possible to capture every relationship naturally in a tabular design. As a result, the data model is not an exact replica of the actual requirements. Here the graph data model comes to the rescue: it helps capture the nuances of relationships.

The same requirement designed in relational, key-value and graph data models:
https://www.nextplatform.com/2018/09/19/the-graph-database-poised-to-pounce-on-the-mainstream/
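As a rough illustration of the comparison above, the following sketch stores one small "knows" requirement in three shapes: relational rows, a key-value store and a graph edge list. The names (art, bob, bea), the table layout and the helper functions are assumptions made for this example only, not taken from the linked figure.

```python
# Hypothetical data: three people and who knows whom.

# Relational style: one table per entity, plus a join table for the
# many-to-many "knows" relationship.
person_table = [
    {"id": 1, "name": "art"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "bea"},
]
knows_table = [
    {"person_id": 1, "knows_id": 2},
    {"person_id": 1, "knows_id": 3},
]

# Key-value style: the relationship is folded into each stored value.
key_value_store = {
    "person:art": {"name": "art", "knows": ["person:bob", "person:bea"]},
    "person:bob": {"name": "bob", "knows": []},
    "person:bea": {"name": "bea", "knows": []},
}

# Graph style: relationships are first-class edges between nodes.
nodes = {"art", "bob", "bea"}
edges = [("art", "knows", "bob"), ("art", "knows", "bea")]


def known_by_relational(name):
    """Answer 'whom does <name> know?' by joining the two tables."""
    ids = {row["id"] for row in person_table if row["name"] == name}
    known_ids = {r["knows_id"] for r in knows_table if r["person_id"] in ids}
    return sorted(row["name"] for row in person_table if row["id"] in known_ids)


def known_by_graph(name):
    """Answer the same question by following edges directly."""
    return sorted(dst for src, label, dst in edges if src == name and label == "knows")


print(known_by_relational("art"))
print(known_by_graph("art"))
```

Both functions return the same people; the difference is that the relational version has to go through a join table, while the graph version reads the relationship directly from the edges.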

There are two popular graph data models: the Resource Description Framework (RDF) and the property graph (PG) model. The standard query language for RDF is SPARQL, and a widely used query language for the property graph model is Cypher.

Next, we will look at an overview of both data models and explore how to query them using SPARQL and Cypher.

RDF Data Model:

RDF represents data as triples, each in the format: a subject, a predicate, and an object.

An RDF triple can be seen as node-edge-node, as shown in the above picture. A node can be of three types: IRI, literal or blank node. An IRI (Internationalized Resource Identifier) is used to uniquely identify a resource on the web. A blank node is an anonymous node that does not have a global identifier. A literal is a value of a datatype such as integer or string.

Let us look at the node types in examples to understand the differences.

IRIs can be abbreviated by defining a prefix, which shortens the RDF expression of the data. The following RDF example represents the knows relationship between people. FOAF (Friend of a Friend) is a machine-readable ontology describing persons, their activities and their relations to other people and objects.

@prefix foaf: <http://xmlns.com/foaf/0.1/> . # foaf is defined as a prefix

@prefix ex: <http://example.org/> . # ex is defined as a prefix

@prefix knows: <http://xmlns.com/foaf/0.1/knows> . # knows is defined as a prefix (not used below)

ex:bea foaf:knows ex:cory . # Format: IRI IRI IRI. A person named bea knows a person named cory.

ex:bea foaf:age 23 . # Format: IRI IRI Literal. A person named bea is of age 23. 23 is an integer value and is a literal node.

ex:bea foaf:based_near _:o1 . # Format: IRI IRI BlankNode. A person named bea is based near o1. The _: prefix marks o1 as a blank node, which makes it anonymous. o1 is only an internal label and has no meaning outside the present graph.
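To make the three node types concrete, here is a minimal sketch that holds the same three triples in memory and reports the "shape" of each one. The class names IRI, Literal and BlankNode and the shape helper are invented for this illustration; they are not the API of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str  # a globally unique identifier


@dataclass(frozen=True)
class Literal:
    value: object  # a plain data value such as 23 or "bea"


@dataclass(frozen=True)
class BlankNode:
    label: str  # meaningful only inside this one graph


EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

# The three example triples from the text, as (subject, predicate, object).
triples = [
    (IRI(EX + "bea"), IRI(FOAF + "knows"), IRI(EX + "cory")),
    (IRI(EX + "bea"), IRI(FOAF + "age"), Literal(23)),
    (IRI(EX + "bea"), IRI(FOAF + "based_near"), BlankNode("o1")),
]


def shape(triple):
    """Describe a triple by the node type found in each position."""
    return " ".join(type(term).__name__ for term in triple)


for t in triples:
    print(shape(t))
```

The printed shapes line up with the "Format" notes in the comments of the example above.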

SPARQL, the SPARQL Protocol and RDF Query Language (RDF):

SPARQL is a query language used to retrieve and manipulate data stored in RDF format. The results of SPARQL queries can be result sets (tables of variable bindings) or RDF graphs.

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

@prefix ex: <http://example.org/> .

@prefix knows: <http://xmlns.com/foaf/0.1/knows> .

ex:art foaf:knows ex:bob . # In plain English: art knows bob

ex:art foaf:knows ex:bea . # In plain English: art knows bea

ex:bob foaf:knows ex:cal . # In plain English: bob knows cal

ex:bob foaf:knows ex:cam . # In plain English: bob knows cam

ex:bea foaf:knows ex:coe . # In plain English: bea knows coe

ex:bea foaf:knows ex:cory . # In plain English: bea knows cory

ex:bea foaf:age 23 . # In plain English: bea is of age 23

ex:bea foaf:based_near _:o1 . # In plain English: bea is based near o1

Let’s see a simple SPARQL query against the RDF data graph given above. The query has a SELECT clause that names the variables to appear in the result set and a WHERE clause that gives the graph pattern to match against the input data graph.

SELECT ?person WHERE { <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> ?person }

The above SPARQL in plain English: retrieve the persons whom bob knows.

Result Set: ?person is bound to ex:cal and ex:cam.
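The matching step behind this query can be sketched in a few lines of Python. This is only a toy matcher under simplifying assumptions (triples as tuples of strings, variables as strings starting with "?"); the match function is hypothetical and is not how a real SPARQL engine is implemented.

```python
EX = "http://example.org/"
KNOWS = "http://xmlns.com/foaf/0.1/knows"

# The "knows" triples from the data graph above, as plain tuples.
triples = [
    (EX + "art", KNOWS, EX + "bob"),
    (EX + "art", KNOWS, EX + "bea"),
    (EX + "bob", KNOWS, EX + "cal"),
    (EX + "bob", KNOWS, EX + "cam"),
    (EX + "bea", KNOWS, EX + "coe"),
    (EX + "bea", KNOWS, EX + "cory"),
]


def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # A variable matches anything, but must stay consistent.
            if bindings.get(want, have) != have:
                return None
            bindings[want] = have
        elif want != have:
            # A constant must match exactly.
            return None
    return bindings


# Equivalent of: SELECT ?person WHERE { ex:bob foaf:knows ?person }
pattern = (EX + "bob", KNOWS, "?person")
results = [b for b in (match(pattern, t) for t in triples) if b is not None]

for row in results:
    print(row["?person"])
```

Each surviving binding corresponds to one row of the result set.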

SELECT ?person ?person1 WHERE { <http://example.org/art> <http://xmlns.com/foaf/0.1/knows> ?person . ?person <http://xmlns.com/foaf/0.1/knows> ?person1 }

The above SPARQL in plain English: retrieve personA and personB where art knows personA and personA knows personB.

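Result Set: (?person, ?person1) is bound to (ex:bob, ex:cal), (ex:bob, ex:cam), (ex:bea, ex:coe) and (ex:bea, ex:cory).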
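A query with two triple patterns is effectively a join on the shared variable. The sketch below shows one naive way to evaluate it: keep a list of partial solutions and try to extend each one with every triple. The extend and solve helpers are assumptions made for this example, and the nested loops are far simpler than what a real query engine does.

```python
EX = "http://example.org/"
KNOWS = "http://xmlns.com/foaf/0.1/knows"

triples = [
    (EX + "art", KNOWS, EX + "bob"),
    (EX + "art", KNOWS, EX + "bea"),
    (EX + "bob", KNOWS, EX + "cal"),
    (EX + "bob", KNOWS, EX + "cam"),
    (EX + "bea", KNOWS, EX + "coe"),
    (EX + "bea", KNOWS, EX + "cory"),
]


def extend(pattern, triple, bindings):
    """Extend existing bindings so the pattern equals the triple, or return None."""
    new = dict(bindings)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # setdefault keeps an earlier binding, so a conflict is detected here.
            if new.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return new


def solve(patterns, data):
    """Evaluate the patterns one by one, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in data:
                extended = extend(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Equivalent of the two-pattern query: art knows ?person, ?person knows ?person1.
patterns = [
    (EX + "art", KNOWS, "?person"),
    ("?person", KNOWS, "?person1"),
]
rows = [(s["?person"], s["?person1"]) for s in solve(patterns, triples)]

for person, person1 in rows:
    print(person.replace(EX, "ex:"), person1.replace(EX, "ex:"))
```

Because ?person appears in both patterns, only friends of art's friends survive the second step.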

Labeled Property Graph Model:

RDF was designed to model information on the web, whereas property graphs are used for a wide range of purposes and by many graph database systems. A property graph is represented in terms of nodes, relationships and properties. Each node has a label and a set of properties in the form of key-value pairs. The keys are always strings. Each relationship (a directed edge between two nodes) has a label and can also have a set of properties.

Let’s see a few examples to understand the data representation.

Example 1: Property Graph

In the above graph, art and bea are nodes of type Person. They are linked by the knows relationship, which has the property ‘since 2005’. The bea node has the properties ‘age 23’ and ‘based_near seattle’. In plain English: a person named art has known, since 2005, a person named bea who is 23 years old and based near Seattle.

Example 2: Property Graph

In the second graph, the based_near value is modeled as a separate node instead of as a property, as it was in the first graph. The reason for modeling the city as a separate node is that the application may need to relate it to multiple other nodes in the graph. Depending on the requirements, a value can be designed as either a property or a node in the property graph model.
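The two modeling choices can be sketched with plain Python dictionaries. The dictionary layout, the "City" label and the people_near helper are assumptions made for illustration; the original figures are not reproduced here.

```python
# Example 1: the city is stored as a property on the bea node.
graph_v1 = {
    "nodes": {
        "art": {"label": "Person", "props": {"name": "art"}},
        "bea": {
            "label": "Person",
            "props": {"name": "bea", "age": 23, "based_near": "seattle"},
        },
    },
    "edges": [
        {"from": "art", "to": "bea", "label": "knows", "props": {"since": 2005}},
    ],
}

# Example 2: the city becomes its own node, linked by a based_near edge.
# The "City" label is an assumed name for this sketch.
graph_v2 = {
    "nodes": {
        "art": {"label": "Person", "props": {"name": "art"}},
        "bea": {"label": "Person", "props": {"name": "bea", "age": 23}},
        "seattle": {"label": "City", "props": {"name": "seattle"}},
    },
    "edges": [
        {"from": "art", "to": "bea", "label": "knows", "props": {"since": 2005}},
        {"from": "bea", "to": "seattle", "label": "based_near", "props": {}},
    ],
}


def people_near(graph, city_id):
    """With the city as a node, finding everyone near it is an edge lookup."""
    return sorted(
        edge["from"]
        for edge in graph["edges"]
        if edge["label"] == "based_near" and edge["to"] == city_id
    )


print(people_near(graph_v2, "seattle"))
```

In the second form, any number of other nodes can point at the same seattle node, which is the motivation given above for promoting a value to a node.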

Cypher Query Language (Property Graph Model):

Cypher is a query language used to retrieve and manipulate data in the property graph format. In addition to querying, Cypher can be used to create, update and delete data in a graph database.

Let’s start with an example of a Cypher query.

The query has two clauses, MATCH and RETURN. The MATCH clause specifies the pattern that must be matched against the data, and the RETURN clause specifies what to return.

For the data represented by the ‘knows’ relationship, the following query retrieves the persons known by the person named art.

MATCH (p1:Person {name: 'art'})-[:knows]->(p2:Person) RETURN p2

Here the graph pattern is specified in an ASCII notation for graphs: each node is written in parentheses, and each edge is written as an arrow. Both node and relation specifications include their respective types, and any additional properties that should be matched.

In the next example, the query retrieves the persons whom the person named art has known since 2010.

MATCH (p1:Person {name: 'art'})-[:knows {since: 2010}]->(p2:Person)

RETURN p2
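To show what such a MATCH pattern checks, here is a small Python sketch that filters an in-memory property graph the same way. The sample data (including the since values for bob) and the match_knows helper are hypothetical; a real graph database plans and executes patterns very differently.

```python
# Hypothetical property graph: node ids map to labels and properties.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "art"}},
    2: {"labels": {"Person"}, "props": {"name": "bob"}},
    3: {"labels": {"Person"}, "props": {"name": "bea"}},
}
edges = [
    {"src": 1, "dst": 2, "type": "knows", "props": {"since": 2010}},
    {"src": 1, "dst": 3, "type": "knows", "props": {"since": 2005}},
]


def node_matches(node, label, props):
    """A node matches when it has the label and every requested property."""
    return label in node["labels"] and all(
        node["props"].get(key) == value for key, value in props.items()
    )


def match_knows(start_props, edge_props=None):
    """Mimic MATCH (p1:Person {...})-[:knows {...}]->(p2:Person) RETURN p2."""
    edge_props = edge_props or {}
    found = []
    for edge in edges:
        if edge["type"] != "knows":
            continue
        if any(edge["props"].get(k) != v for k, v in edge_props.items()):
            continue
        p1, p2 = nodes[edge["src"]], nodes[edge["dst"]]
        if node_matches(p1, "Person", start_props) and node_matches(p2, "Person", {}):
            found.append(p2["props"]["name"])
    return found


print(match_knows({"name": "art"}))                   # everyone art knows
print(match_knows({"name": "art"}, {"since": 2010}))  # filtered on the edge property
```

The second call shows the effect of adding a property to the relationship part of the pattern: only edges carrying that property value are followed.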

The following is a Cypher query example for creating a graph:

CREATE (a:Person {name: "Tom Hanks", born: 1956})-[r:ACTED_IN {roles: ["Forrest"]}]->(m:Movie {title: "Forrest Gump", released: 1994})

CREATE (d:Person {name: "Robert Zemeckis", born: 1951})-[:DIRECTED]->(m)

RETURN a, d, r, m

Output Graph generated:
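As a rough picture of what the two CREATE clauses produce, the sketch below builds the same three nodes and two relationships in a tiny in-memory store. The create_node and create_relationship helpers are invented for this example and do not correspond to any database driver API.

```python
import itertools

_ids = itertools.count(1)   # simple id generator for new nodes
nodes = {}                  # node id -> {"label": ..., "props": {...}}
relationships = []          # list of directed, typed edges with properties


def create_node(label, **props):
    """Add a labeled node with key-value properties and return its id."""
    node_id = next(_ids)
    nodes[node_id] = {"label": label, "props": props}
    return node_id


def create_relationship(src, rel_type, dst, **props):
    """Add a directed relationship from src to dst, with optional properties."""
    rel = {"src": src, "type": rel_type, "dst": dst, "props": props}
    relationships.append(rel)
    return rel


# First CREATE: (a:Person)-[r:ACTED_IN]->(m:Movie)
a = create_node("Person", name="Tom Hanks", born=1956)
m = create_node("Movie", title="Forrest Gump", released=1994)
r = create_relationship(a, "ACTED_IN", m, roles=["Forrest"])

# Second CREATE: (d:Person)-[:DIRECTED]->(m), reusing the same movie node
d = create_node("Person", name="Robert Zemeckis", born=1951)
create_relationship(d, "DIRECTED", m)

print(len(nodes), "nodes,", len(relationships), "relationships")
```

Note that the second clause reuses m rather than creating a new movie, so both people end up connected to the same Movie node.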

Conclusion:

Beyond the basics, RDF and the property graph model differ in several features. The two main differences are as follows. First, in the property graph model, relationships (edges) can carry properties, like roles: ["Forrest"] in the above example. Second, the property graph model does not support blank nodes and does not need IRIs. The RDF model offers reification as a way to support edge properties.
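To illustrate the last point, the sketch below converts one property graph edge into RDF-style triples using the standard reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The ex: IRIs, the statement label s1 and the reify helper are assumptions made for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Property graph edge: the property lives directly on the relationship.
pg_edge = {
    "src": "TomHanks",
    "type": "ACTED_IN",
    "dst": "ForrestGump",
    "props": {"roles": ["Forrest"]},
}


def reify(edge, statement_id):
    """Turn one property-graph edge into RDF triples using reification."""
    stmt = "_:" + statement_id  # a blank node standing for the statement itself
    triples = [
        # The plain triple for the edge.
        (EX + edge["src"], EX + edge["type"], EX + edge["dst"]),
        # Four triples that describe that triple as a resource.
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", EX + edge["src"]),
        (stmt, RDF + "predicate", EX + edge["type"]),
        (stmt, RDF + "object", EX + edge["dst"]),
    ]
    # Edge properties become triples about the statement node.
    for key, value in edge["props"].items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((stmt, EX + key, item))
    return triples


triples = reify(pg_edge, "s1")
for t in triples:
    print(t)
```

One edge with one property turns into six triples here, which gives a sense of why edge properties are more direct to express in the property graph model.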


Written by Teepika R M

AWS Certified Big Data Specialty| Linux Certified Kubernetes Application Developer| Hortonworks Certified Spark Developer|Hortonworks Certified Hadoop Developer
