How to import data into a graph database?
How to import RDF data into Neo4j?
About Neosemantics:
Neosemantics is a plugin that enables the use of RDF (Resource Description Framework) in Neo4j. RDF is a standard model for data interchange on the Web.
Some key features of Neosemantics are:
- Store RDF data in Neo4j in a lossless manner (imported RDF can subsequently be exported without losing a single triple in the process)
- Export property graph data from Neo4j as RDF on demand
This post uses Neo4j Desktop to demonstrate the import process. Neosemantics can be installed in Neo4j Desktop to make its features available.
About RDF Data:
As seen in the previous post, RDF represents data as triples, each consisting of a subject, a predicate, and an object.
RDF can be seen as node-edge-node, as shown in the picture above. A node can be one of three types: an IRI, a literal or a blank node. An IRI (Internationalized Resource Identifier) uniquely identifies a resource on the web. A blank node is an anonymous node that has no global identifier. A literal is a data value such as an integer or a string.
To explore how to import, query and interpret RDF data using Neo4j, we will use foodkg-core.trig, one of the three publicly available FoodKG files.
About the file used:
To learn more about the FoodKG knowledge graph, see https://foodkg.github.io/foodkg.html. FoodKG is a Semantics-Driven Knowledge Graph for Food Recommendation. It consolidates recipe details (Im2Recipe), nutrient information (USDA) and ontology details (FoodOn) to provide additional data about the origin and preparation of foods. The linked page gives instructions for generating the output TriG files. The graph construction process generates three files: usda-links.trig, foodon-links.trig and foodkg-core.trig. This post uses foodkg-core.trig for the demonstration.
You can try other small RDF files as well. I used these because I had already explored them for my project.
RDF has several serialization formats, including TriG, Turtle, N-Triples, N-Quads, JSON-LD, N3 (Notation3), RDF/XML and RDF/JSON. A serialization format is a way of writing data so that it can be stored or transmitted and later reconstructed in its original form. The file used in this post is in TriG format.
Demonstration of RDF file import:
Before opening Neosemantics from Graph Apps in Neo4j Desktop, make sure you have a project and Graph Database created.
In the snapshot below, trial is the project name.
The trial project has two graph databases, Graph and Graph DBMS. I use Graph DBMS for this exploration, so I started the database instance and made sure it was active before beginning the import process.
The database instance is now in the active state:
Steps with snapshots of setting up the configuration and importing the RDF file:
Step 1: Open Neosemantics and choose the project, active graph DBMS for persisting data
Step 2: Set up the prerequisite and start running the commands.
The prerequisite for persisting RDF data in Neo4j is a uniqueness constraint on the property uri of nodes with the label Resource.
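As a minimal sketch, the constraint described above could be created as follows. The constraint name n10s_unique_uri is only a label chosen for illustration, and the exact syntax depends on your Neo4j version, so treat both forms as assumptions to check against your installation.

```cypher
// Uniqueness constraint on the uri property of Resource nodes,
// needed before neosemantics can import RDF.
// The name n10s_unique_uri is a free choice.
CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE;

// Older Neo4j 4.x releases use the earlier syntax instead:
// CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
```

Run only the form that matches your version. If the constraint is missing, the import step is expected to fail rather than load any data.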
Step 3: Before importing and working with the data, we need to set the Graph Config, which defines how Neo4j handles the RDF data. For example, it controls whether schema elements use full names and whether multivalued properties are ignored or stored as arrays. Once set, these settings remain for the lifetime of the graph, so we need to decide how we want the graph to look before importing the data.
Cypher is Neo4j’s graph query language, used to access and retrieve data from the graph. The following Cypher commands create the Graph Config. Called without arguments, the procedure sets every configuration item to its default value; passing a parameter map sets specific items to the values you choose.
CALL n10s.graphconfig.init(); // Creates the Graph Config with default values for every configuration item
or
CALL n10s.graphconfig.init({ handleMultival: "ARRAY" }); // All other configuration is set with default values except the property 'handleMultival'
or
CALL n10s.graphconfig.init();
CALL n10s.graphconfig.set({ keepLangTag: true, handleRDFTypes: "LABELS_AND_NODES" }); // Lets you update configuration items after the initial Graph Config has been created
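Since these settings are hard to change later, it can help to check what is in effect before importing anything. A small sketch, assuming the neosemantics plugin is installed and a Graph Config has already been initialised on the active database:

```cypher
// List the Graph Config settings currently in effect
// (for example handleMultival, handleRDFTypes, keepLangTag).
CALL n10s.graphconfig.show();
```

If a value is not what you intended, correct it at this point, while the graph is still empty.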
Step 4: Import the file (from local file system)
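For readers who prefer running the import as a Cypher command, a sketch of the equivalent call is shown here. The path file:///path/to/foodkg-core.trig is a placeholder, not the actual location used in this post; replace it with the location of the file on your machine. The second argument names the serialization format of the file.

```cypher
// Optional dry run: parse the file and return a preview without writing to the graph.
// Run this as its own statement.
CALL n10s.rdf.preview.fetch("file:///path/to/foodkg-core.trig", "TriG");

// Actual import: persist the parsed triples in the active database.
// The path is a placeholder; point it at your local copy of the file.
CALL n10s.rdf.import.fetch("file:///path/to/foodkg-core.trig", "TriG");
```

The import call returns a summary row that includes a termination status and the number of triples loaded, which is a quick way to confirm that the file was read as expected.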
Step 5: Start querying the imported RDF data using the Cypher query language.
MATCH (n)-[r]-(m) RETURN distinct type(r);
The above query retrieves the distinct relationship types that exist in the graph.
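A couple of further queries can help you get oriented in the imported graph. This is only a sketch: it relies on the Resource label and uri property that Neosemantics uses for imported nodes, and the labels and relationship types you see will depend on your Graph Config settings and on the file you imported.

```cypher
// Count nodes per label combination to see which kinds of resources were imported.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Read a few relationships back in subject / predicate / object form.
MATCH (s:Resource)-[r]->(o:Resource)
RETURN s.uri AS subject, type(r) AS predicate, o.uri AS object
LIMIT 10;
```

Triples whose object is a literal do not appear in the second query, because literal values are stored as properties on the subject node rather than as relationships.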
I hope this post helps you get started with importing and exploring RDF data in Neo4j. Please comment with any suggestions or questions.