Hi there! In this how-to post, we’ll walk through how to export data from a Neo4j database to CSV files using the Apache Hop plugin for Cypher.
If you are here, I assume you already know about Neo4j and Apache Hop. A basic understanding of the property graph model is the only requirement to complete the task.
The idea is to export data from a Neo4j database using the Neo4j Cypher transform (plugin) and load the data to CSV files.
This post is similar to “Apache Hop: exporting data from Neo4j to CSV - Neo4j Cypher”, but in this case we use metadata injection to export the data in a table format instead of JSON.
You can find the code and the files in the public repo how-to-apache-hop.
We can divide the task into small pieces:
Let’s see this in an image:
We implement a metadata injection solution that works regardless of the number of node labels and requires no manual configuration of the input and output fields.
How? We use the Metadata Injection plugin and the Pipeline Executor plugin.
The first pipeline will get all the node labels by executing a Cypher query and will execute another pipeline (using the Pipeline executor) to generate the CSV files. We’ll generate a CSV file per node label.
The second pipeline is used to get the needed metadata and pass this metadata at runtime to a pipeline template.
The template will extract the node data by using a Cypher query and generate a CSV file per node label.
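To make the flow of the three pipelines easier to follow, here is a minimal Python sketch of the same idea outside of Hop. It is only an illustration, not how Hop implements it: the in-memory `SAMPLE_GRAPH`, the property names, and the function names are all hypothetical stand-ins for the Neo4j database and the Hop pipelines.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical in-memory stand-in for the graph: label -> list of node property maps.
SAMPLE_GRAPH = {
    "Language": [{"languageId": 1, "name": "English"}, {"languageId": 2, "name": "Italian"}],
    "Category": [{"categoryId": 1, "name": "Action"}],
}


def get_labels(graph):
    """First pipeline: list the node labels (in Hop this comes from a Cypher query)."""
    return sorted(graph)


def build_metadata(graph, label):
    """Second pipeline: work out the output fields and file name for one label.

    In Hop this step also builds the Cypher query that is injected into the template.
    """
    fields = sorted({key for node in graph[label] for key in node})
    return {"fields": fields, "filename": f"{label}.csv"}


def run_template(graph, label, metadata, out_dir):
    """Template: read the nodes of one label and write them as a table."""
    path = Path(out_dir) / metadata["filename"]
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=metadata["fields"])
        writer.writeheader()
        for node in graph[label]:
            # Missing properties become empty cells.
            writer.writerow({name: node.get(name, "") for name in metadata["fields"]})
    return path


def export_all(graph, out_dir):
    """Run the template once per label, like the pipeline executor does per row."""
    return [
        run_template(graph, label, build_metadata(graph, label), out_dir)
        for label in get_labels(graph)
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for csv_path in export_all(SAMPLE_GRAPH, tmp):
            print(csv_path.name)
```

The point of the split is that nothing in `run_template` is specific to a label: everything it needs arrives as metadata at runtime, which is what lets one template serve any number of labels.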
We are going to use a sample Neo4j database created in previous posts.
The dvdrental graph database represents the business processes of a DVD rental store, including data about the films, actors, and demographic data of the staff.
The dvdrental graph schema:
We’ll export the node data in this case:
As a result, we should get a CSV file per node label with the node data in a table format.
First, configure a Cypher transform to get the node labels.
Options tab
Set the transform name and select the Neo4j database connection.
Cypher tab
Write the Cypher query.
Use the Get Output Fields option to display the query result fields.
If you preview the results, you should see all the node labels:
The second pipeline will extract the node data and generate a CSV file with the label name.
Let’s take a step back. So far, we have all the labels we need to build the Cypher query that is passed as metadata to the pipeline template, but we also need to build the output fields and data types for each Cypher query.
To do so, we need all the properties associated with a label.
Example
Configure a Cypher transform to get the properties by label.
Options tab
Cypher tab
Cypher query
CALL db.schema.nodeTypeProperties;
Return tab
If you run the query in your Neo4j database, you will get the following result:
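As a rough illustration of how that result can be grouped per label, here is a minimal Python sketch. The column names (`nodeType`, `nodeLabels`, `propertyName`, `propertyTypes`, `mandatory`) are the ones the procedure returns; the sample rows, the property names, and the `properties_by_label` helper are hypothetical and do not reproduce the real dvdrental output.

```python
from collections import defaultdict

# Hypothetical rows shaped like the output of db.schema.nodeTypeProperties.
ROWS = [
    {"nodeType": ":`Language`", "nodeLabels": ["Language"],
     "propertyName": "languageId", "propertyTypes": ["Long"], "mandatory": True},
    {"nodeType": ":`Language`", "nodeLabels": ["Language"],
     "propertyName": "name", "propertyTypes": ["String"], "mandatory": True},
    {"nodeType": ":`Category`", "nodeLabels": ["Category"],
     "propertyName": "name", "propertyTypes": ["String"], "mandatory": True},
    # A label without properties, if any, has no property name.
    {"nodeType": ":`Empty`", "nodeLabels": ["Empty"],
     "propertyName": None, "propertyTypes": None, "mandatory": False},
]


def properties_by_label(rows):
    """Group (property name, property type) pairs under each node label."""
    grouped = defaultdict(list)
    for row in rows:
        if row["propertyName"] is None:
            continue  # nothing to export for this row
        for label in row["nodeLabels"]:
            # Assumption: use the first reported type when a property has several.
            grouped[label].append((row["propertyName"], row["propertyTypes"][0]))
    return dict(grouped)


if __name__ == "__main__":
    for label, props in properties_by_label(ROWS).items():
        print(label, props)
```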
Now, we have all the labels stored in the label field and all the properties, so we can:
We apply all these changes in the following block of transforms:
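The logic behind this kind of transform block can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the actual Hop configuration: the `NEO4J_TO_HOP` type mapping, the helper names, and the sample properties are hypothetical, and the generated query is just one reasonable way to return each property as its own column.

```python
# Assumed mapping from Neo4j property types to Hop field types; adjust to your data.
NEO4J_TO_HOP = {
    "String": "String",
    "Long": "Integer",
    "Double": "Number",
    "Boolean": "Boolean",
}


def build_cypher(label, properties):
    """Build a query that returns every property of a label as a separate column."""
    # Backticks escape labels and property names that contain special characters.
    returns = ", ".join(f"n.`{name}` AS `{name}`" for name, _ in properties)
    return f"MATCH (n:`{label}`) RETURN {returns}"


def build_fields(properties):
    """Build the output field definitions (name and type) for the same columns."""
    return [
        # Unknown types fall back to String so the export never fails on a type.
        {"name": name, "type": NEO4J_TO_HOP.get(neo4j_type, "String")}
        for name, neo4j_type in properties
    ]


if __name__ == "__main__":
    sample = [("languageId", "Long"), ("name", "String")]  # hypothetical properties
    print(build_cypher("Language", sample))
    print(build_fields(sample))
```

Keeping the query and the field list derived from the same property list guarantees that the columns returned by Cypher and the fields written to the CSV stay in the same order.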
You can verify the results in the logs. For the label Language, for example, we get the following values:
Finally, we configure the Metadata Injection transform. But which fields do we need as metadata?
The pipeline template will extract the node data from the dvdrental graph database with a Neo4j Cypher transform and generate a CSV with a Text file output transform.
To do so, configure the Metadata Injection transform as follows:
Next, configure the Neo4j Cypher transform to get the data using the previously created cypher field that stores the query.
Options tab
Finally, configure the Text file output transform.
File tab