Skip to content

Importing Relational Data to Neo4j using Apache Hop - Graph Output

Effortlessly import relational data to Neo4j using Apache Hop. Transform your data with the Graph Output plugin.

Introduction

Hello and welcome to this tutorial on how to import relational data into a Neo4j database using Apache Hop.

Neo4j Logo

If you are here I suppose you already know about Neo4j and Apache Hop, let’s say that having a basic understanding of the property graph model is the only requirement to have the task done.

Apache Hop and Neo4j1

The idea is exporting data from a relational database system (we’ll use PostgreSQL in this case) and importing the data into a Neo4j database (graph database) using the Neo4j Graph Output transform (Apache Hop plugin).

You can find the code and the files in the public repo how-to-apache-hop.

We can divide the task into small pieces:

  1. First, we need to translate the relational schema to a graph model.
  2. Then, we configure or design the translation in Apache Hop. We create a graph model metadata object in Apache Hop.
  3. Finally, we implement a pipeline in Apache Hop to extract the data from the relational database and loading the data to the Neo4j database.

You can also check our post Apache Hop: Importing relational data into Neo4j - Neo4j Output, which is similar but in that case we load all the nodes, not only the nodes with at least a relationship and we use another Apache Hop plugin: Neo4j Output.

The relational database

We are going to use a sample PostgreSQL database. The dvdrental database represents the business processes of a DVD rental store, including data about the films, actors, and demographic data of the staff.

dvdrental PostgreSQL database 

🗒 To keep the graph smaller, we just choose some of the entities that are most relevant for our graph and would most likely benefit from understanding the relationships in our example.

dvdrental database model fragment

The relational data model must first be turned into a graph data model. When constructing a graph model from a relational model, there are a few standard guidelines to follow.
Convert relational model to graph model

Step 1: Translate the relacional schema to the graph schema

Based on this, we convert our relational model to a graph model.

Nodes and Labels

  • Each row on our Actor table becomes a node in our graph with Actor as the label.

  • Each row on our Film table becomes a node with Film as the label.

  • Each row on our Category table becomes a node with Category as the label.

  • Each row on our Language table becomes a node with Language as the label.

Relationships

  • The join between Actor and Film becomes a relationship named ACTS_IN.

  • The join between Film and Category becomes a relationship named BELONGS_TO.

  • The join between Film and Language becomes a relationship named IN.

If we draw our translation on a whiteboard, we have this graph data model.

The dvdrental graph database model

 

Step 2: Create the graph model in Apache Hop

You can build a Neo4j Graph Model in Apache Hop as a metadata object. A graph model in Apache Hop allows you to create nodes with their attributes, as well as the connections between these nodes.

The Neo4j Graph Output transform's input fields may then be mapped to properties using a graph model.

The following is how the dvdrental graph model is set up.

Model tab

The dvdrental graph model - Model tab

Model name: set the name of the graph model (dvdrental).

Nodes tab

The dvdrental graph model - Nodes tab

 
  • Name: specify the node name (Actor).
  • Labels: choose the label name (Actor).
  • Properties: set the property keys (actor_idlast_name, first_name) and the property types (IntegerStringString).
  • Primary: specify the primary key (actor_id).

As we did for the node Actor, configure an entrance for the rest of the nodes: FilmCategory and Language.

Relationships tab

The dvdrental graph model - Relationships tab

Configure all the relationships by specifying the fields:

  • Name: specify the name of the relationship (ACTS_IN).

  • Label: set the name of the label for the relationship (ACTS_IN).

  • Source: specify the origin node of the relationship (Actor).
  • Target: specify the target node of the relationship (Film).

As we did for the relationship ACTS_IN, configure an entrance for the rest of the relationships: BELONGS_TO and IN.

Graph tab

You can visually check the model you just created.

The dvdrental graph model - Graph tab

Step 3: Load the data to Neo4j

The data may now be imported into the Neo4j database. Using a Table Input and a Neo4j Graph Output transform, we build an Apache Hop pipeline:

Pipeline import_graph_output.hpl

Using a PostgreSQL connection and the Table Input transform, retrieve the data from the dvdrental database:

SQL query

SELECT a.actor_id,
a.first_name,
a.last_name,
f.film_id,
f.title,
f.description,
f.rental_rate,
c.category_id,
c.name as category,
l.language_id,
l.name as language
FROM public.actor a
JOIN film_actor fa ON fa.actor_id = a.actor_id
JOIN film f ON f.film_id = fa.film_id
JOIN film_category fc ON fc.film_id = f.film_id
JOIN category c ON c.category_id = fc.category_id
JOIN language l ON l.language_id=f.language_id;

Table input transform 

The Neo4j Graph Output transform allows you to input data into the Neo4j dvdrental database using the dvdrental Neo4j Graph Model as a reference.

Neo4j Graph Output transform - Fields to properties mappings tab

  • Transform name: choose a name for this transform in the pipeline (write-graph).
  • Neo4j Connection: select the Neo4j connection to write the graph to (neo4j-connection).

  • Graph model: select the graph model we created (dvdrental).

  • Fields: to map the input-output fields, you can use the Map fields option.

We just have to map the concepts following the same rules.

Neo4j Graph Output transform - Fields mapping

You can select the fields and hit the Add option or use the Guess option to generate all the mappings.

The mapping looks like this in the image:

Neo4j Graph Output transform - Fields mapping1

Now you’re ready to run the pipeline:

Pipeline import_graph_output.hpl1

The dvdrental knowledge graph

If your pipeline runs successfully, you will get a graph database containing your dvdrental data.

Let's explore the graph database you just loaded.

The dvdrental knowledge graph
Run the following Cypher code to see the graph schema:

Cypher

CALL db.schema.visualization;

The dvdrental graph database model

Run the following Cypher code to see some of the nodes:

Cypher

MATCH (n) RETURN n LIMIT 15;

The dvdrental graph database model