Export data from Neo4j using Apache Hop and Neo4j Cypher. Optimize your data handling with our step-by-step guide.
Hi there, this is a how-to post and we’ll walk through how to export data from a Neo4j database to CSV files using the Apache Hop plugin for Cypher.
If you are here, I assume you already know about Neo4j and Apache Hop. A basic understanding of the property graph model is the only requirement for this task.
The idea is to export data from a Neo4j database using the Neo4j Cypher transform (plugin) and write it to CSV files. You can find the code and the files in the public repo how-to-apache-hop.
We can divide the task into small pieces:
We are going to use a sample Neo4j database created in previous posts. You can check Apache Hop: importing relational data into Neo4j - Graph Output and Apache Hop: importing relational data into Neo4j - Neo4j Output.
The dvdrental graph database represents the business processes of a DVD rental store, including data about the films, actors, and demographic data of the staff.
The dvdrental graph schema:
Graph schema of the dvdrental database
In this case, we’ll export the node data:
The result should be one CSV file per node label, with the node data in JSON format.
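Because JSON text contains commas and double quotes, it has to be enclosed and escaped when it is stored in a CSV field. The following is a minimal sketch of that idea in plain JavaScript, not the output of Apache Hop itself: the node object, its property names, and the toCsvField helper are hypothetical, and the comma separator with doubled-quote escaping is an assumption (separator and enclosure are configurable in the Text file output transform).

```javascript
// Hypothetical node properties; the real shape depends on your graph.
var node = { language_id: 1, name: "English" };

// Hypothetical helper: encloses a value in double quotes when it contains
// a comma, a double quote, or a line break, and doubles any inner quotes.
function toCsvField(value) {
  var text = String(value);
  if (/[",\n\r]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Serialize the node to JSON, then make it safe to store in one CSV field.
var json = JSON.stringify(node);
var row = toCsvField(json);

console.log(row);
```

Reading the file back is the reverse operation: strip the enclosing quotes, collapse the doubled quotes, and parse the JSON.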
To keep the amount of work to a minimum, we’ll implement only two pipelines. This is possible because Apache Hop provides the Pipeline executor transform, which lets one pipeline run another.
Let’s see it in an image:
Pipeline executor implementation
The first pipeline will get all the node labels by executing a Cypher query and will execute another pipeline (using the Pipeline executor transform) to generate the CSV files. We’ll generate a CSV file per node label.
Pipeline export_labels.hpl
The second pipeline is used as a template and will extract the node data by using a Cypher query and generate a CSV file.
Let's configure the pipeline export_labels.hpl
Neo4j Cypher transform
First, configure a Neo4j Cypher transform to get the node labels.
Options tab
Set the transform name and select the Neo4j database connection.
Neo4j Cypher transform - Options tab
Cypher tab
Write the Cypher query.
Neo4j Cypher - Cypher tab
Returns tab
Use the Get Output Fields option to display the query result fields.
Neo4j Cypher - Returns tab
Preview Neo4j Cypher transform
Pipeline export_labels.hpl
JavaScript transform
JavaScript transform
The JavaScript transform builds the Cypher query for each label. If you preview the results, you should see the query that gets the node data for each label:
Preview JavaScript transform
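The script itself is only visible in the screenshot. As a minimal sketch, assuming the incoming field is named label and the new field is named cypher (the field names passed to the Pipeline executor in this post), the script could look something like this. The buildNodeQuery helper is hypothetical, and the query shape follows the MATCH (n:Language) RETURN n; example used in this post.

```javascript
// Hypothetical helper: builds the per-label query stored in the "cypher" field.
// Assumes the label is a plain identifier (no spaces or special characters).
function buildNodeQuery(label) {
  return "MATCH (n:" + label + ") RETURN n;";
}

// In the JavaScript transform, "label" would be the incoming field.
// It is defined by hand here so the snippet runs on its own.
var label = "Language";
var cypher = buildNodeQuery(label);

console.log(cypher);
```

If a label could contain spaces or special characters, it would need to be escaped with backticks in the query; the sketch leaves that out to stay close to the example above.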
Pipeline executor transform
Finally, we need to execute another pipeline that runs the Cypher query and writes the resulting data to a CSV file. We use the Pipeline executor transform, passing the fields label and cypher as parameters.
Pipeline export_labels.hpl
Set the path to the new pipeline and add the parameters.
Pipeline executor transform
The second pipeline will extract the node data and generate a CSV file with the label name.
Pipeline export_nodes.hpl
First, configure the Get variables transform to read the ${CYPHER} parameter received from the first pipeline, since the next transform uses it.
Get variables transform
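Conceptually, the parameters act as named values that are substituted wherever a ${NAME} placeholder appears in the child pipeline. The sketch below imitates that substitution in plain JavaScript to show the idea; it is not how Apache Hop implements variables. The resolveVariables helper, the LABEL parameter name, and the ${LABEL}.csv file name are assumptions made for illustration; only ${CYPHER} appears in this post.

```javascript
// Conceptual sketch only: replaces ${NAME} placeholders with parameter values.
// Placeholders without a matching parameter are left untouched.
function resolveVariables(text, params) {
  return text.replace(/\$\{([^}]+)\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(params, name)
      ? String(params[name])
      : match;
  });
}

// Parameters as the parent pipeline might pass them for a single row.
// LABEL is an assumed name; CYPHER is the parameter used in this post.
var params = {
  LABEL: "Language",
  CYPHER: "MATCH (n:Language) RETURN n;"
};

var query = resolveVariables("${CYPHER}", params);
var fileName = resolveVariables("${LABEL}.csv", params);

console.log(query);
console.log(fileName);
```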
Neo4j Cypher transform
Next, configure the Neo4j Cypher transform to get the data using the previously created cypher field that stores the query.
Options tab
Neo4j Cypher transform - Options tab
Returns tab
Configure the output based on the query return:
MATCH (n:Language) RETURN n;
Neo4j Cypher transform - Returns tab
Text file output transform
Finally, configure the Text file output transform.
File tab