
Exporting Data from Neo4j using Apache Hop - Neo4j Cypher

Export data from Neo4j using Apache Hop and Neo4j Cypher. Optimize your data handling with our step-by-step guide.

Introduction

Hi there, this is a how-to post: we’ll walk through how to export data from a Neo4j database to CSV files using the Neo4j Cypher transform (plugin) in Apache Hop.

If you are here, you probably already know about Neo4j and Apache Hop; a basic understanding of the property graph model is the only requirement to get the task done.


The idea is to export data from a Neo4j database using the Neo4j Cypher transform (plugin) and load it into CSV files. You can find the code and the files in the public repo how-to-apache-hop.

We can divide the task into small pieces:

  1. Identify the data we’re going to export and the target format.
  2. Implement a pipeline that gets the node labels and executes another pipeline to generate the CSV files.
  3. Implement a pipeline that extracts the node data and loads it into the CSV files.
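The steps above can be sketched in plain JavaScript (the same language used later in the pipeline’s script transform); the labels and query shape are illustrative, not Hop code:

```javascript
// Conceptual sketch of the two-pipeline design (plain JavaScript, not Hop code).
// Pipeline 1: fetch the node labels, then build one export job per label.
var labels = ["Actor", "Film", "Language", "Category"]; // result of CALL db.labels

var jobs = labels.map(function (label) {
  // Each job carries the parameters passed to the second pipeline.
  return { label: label, cypher: "MATCH (n:" + label + ") RETURN n;" };
});

// Pipeline 2 (run once per job by the Pipeline executor):
// execute job.cypher against Neo4j and write the rows to <label>.csv
```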

The graph database

We are going to use a sample Neo4j database created in previous posts. You can check Apache Hop: importing relational data into Neo4j - Graph Output and Apache Hop: importing relational data into Neo4j - Neo4j Output.

The dvdrental graph database represents the business processes of a DVD rental store, including data about the films, actors, and demographic data of the staff.

The dvdrental graph schema:

dvdrental graph schema
Graph schema of the dvdrental database
 

Step 1: Identify the data to be exported and the format

We’ll export the nodes data in this case:

  • Actor
  • Film
  • Language
  • Category

As a result, we should get one CSV file per node label, with each node’s data stored as JSON.
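For example, a node’s properties can be serialized to a JSON string that becomes a single column value in the CSV. A minimal sketch, where the property names are hypothetical rather than taken from the dvdrental schema:

```javascript
// Illustrative only: serializing a node's properties as JSON for a CSV column.
// The property names are hypothetical, not taken from the dvdrental schema.
var node = { labels: ["Language"], properties: { language_id: 1, name: "English" } };
var jsonColumn = JSON.stringify(node.properties);
// jsonColumn → '{"language_id":1,"name":"English"}'
```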

Step 2: Implement a pipeline to get the node labels

To keep the amount of code to a minimum, we’ll implement only two pipelines to get the task done. How? By using the Pipeline executor plugin available in Apache Hop.

Let’s see it in an image:

Pipeline executor implementation diagram
Pipeline executor implementation

The first pipeline will get all the node labels by executing a Cypher query and will execute another pipeline (using the Pipeline executor transform) to generate the CSV files. We’ll generate a CSV file per node label.

Pipeline export_labels.hpl

The second pipeline is used as a template and will extract the node data by using a Cypher query and generate a CSV file.

Pipeline export_nodes.hpl

Let's configure the pipeline export_labels.hpl

Neo4j Cypher transform

First, configure a Neo4j Cypher transform to get the node labels.

Options tab

Set the transform name and select the Neo4j database connection.

Neo4j Cypher transform - Options tab

 

  • Transform name: the name for this transform in the pipeline.
  • Neo4j Connection: select the Neo4j connection to read the data from (neo4j-connection).

Cypher tab

Write the Cypher query.

Cypher query

 

CALL db.labels;
Neo4j Cypher transform - Cypher tab
 

Returns tab

Use the Get Output Fields option to display the query result fields.

Neo4j Cypher transform - Returns tab
If you Preview the results you should see all the node labels:

 

Preview Neo4j Cypher transform

Now, back in the pipeline, the first transform is configured, so we can get the labels. Next, we need to build a Cypher query that gets the node data for each label. To do so, we use the JavaScript plugin, the second transform in the pipeline.
Pipeline export_labels.hpl

JavaScript transform

Next, configure the JavaScript transform to build the Cypher queries by concatenating the query text with the label field.
JavaScript transform
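A minimal version of the script could look like this; in Hop, the label field arrives on each incoming row, which we simulate here with a plain variable:

```javascript
// Sketch of the script inside the JavaScript transform (simplified).
// In Apache Hop, `label` is an input field on each row; here it is simulated.
var label = "Actor"; // incoming field from the Neo4j Cypher transform

// New output field `cypher`: the query that fetches all nodes with this label.
var cypher = "MATCH (n:" + label + ") RETURN n;";
```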
 

If you Preview the results you should see the Cypher query to get the node data according to the label:

Preview JavaScript transform
 

Pipeline executor transform

Finally, we need to execute another pipeline that gets the data from the Cypher query and loads the data to a CSV file. We use the Pipeline executor plugin with the fields label and cypher as parameters.

Pipeline export_labels.hpl
 

Set the path to the new pipeline and add the parameters.

🗒 Note that the pipeline will be executed once per row in the data stream, receiving the label and cypher fields as parameters.

Pipeline executor transform
 

Step 3: Implement a pipeline to extract the node data and write the CSV files

The second pipeline will extract the node data and generate a CSV file with the label name.

Pipeline export_nodes.hpl
 
Get variables transform

First, configure the Get variables transform to read the ${CYPHER} parameter received from the first pipeline; we’ll use it in the next transform.

Get variables transform

 

  • Transform name: choose the transform name (get-cypher).
  • Name: choose the field name (cypher).
  • Variable: set the variable to get the value from (${CYPHER}).
  • Type: specifies the field type (String).
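Conceptually, the transform turns a pipeline variable into a regular row field, roughly like this (a plain JavaScript sketch with a hypothetical value, not Hop code):

```javascript
// Conceptual sketch of the Get variables transform (not Hop code).
// The second pipeline receives CYPHER as a parameter/variable;
// the transform exposes it as a String field named `cypher`.
var variables = { CYPHER: "MATCH (n:Actor) RETURN n;" }; // hypothetical value
var cypher = variables["CYPHER"];
```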

Neo4j Cypher transform

Next, configure the Neo4j Cypher transform to get the data using the previously created cypher field that stores the query.

Options tab

Neo4j Cypher transform - Options tab
  • Transform name: set the name of the transform.
  • Check the option Get Cypher from the input field and select the cypher field.

Returns tab

Configure the output based on the query return; every generated query returns a single node field, n:

MATCH (n:Language) RETURN n;

Neo4j Cypher transform - Returns tab

Text file output transform

Finally, configure the Text file output transform.

File tab


Use the ${OUTPUT_DIR} variable, defined in the dev environment file (env_dev.json), together with the ${LABEL} variable to set the output file path.
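The resulting file path is simply the two variables joined; a sketch with hypothetical values:

```javascript
// Hypothetical sketch of how the output path is composed.
var OUTPUT_DIR = "/tmp/dvdrental-export"; // from env_dev.json (hypothetical value)
var LABEL = "Actor";                      // parameter passed by the first pipeline
var path = OUTPUT_DIR + "/" + LABEL + ".csv";
// path → "/tmp/dvdrental-export/Actor.csv"
```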
 
Fields tab
  • Name: the output field in the Cypher query (n).
  • Type: set the data type (String).

The generated CSV files

Pipeline export_nodes.hpl
If the main pipeline runs successfully, you will get one CSV file per node label:
  • Actor.csv
  • Category.csv
  • Film.csv
  • Language.csv
Generated CSV files