Export Data from Neo4j using Apache Hop - Neo4j Cypher & Metadata Injection

Facilitate data export from Neo4j using Apache Hop, leveraging Neo4j Cypher and Metadata Injection. Explore this guide for efficient data export processes.

Introduction

Hi there! This is a how-to post: we’ll walk through how to export data from a Neo4j database to CSV files using the Neo4j Cypher transform in Apache Hop.

If you are here, I suppose you already know about Neo4j and Apache Hop. The only requirement for completing the task is a basic understanding of the property graph model.

The idea is to export data from a Neo4j database using the Neo4j Cypher transform (plugin) and load the data to CSV files.

This post is similar to "Apache Hop: exporting data from Neo4j to CSV - Neo4j Cypher", but in this case we use metadata injection to export the data in a table format instead of JSON.

You can find the code and the files in the public repo how-to-apache-hop.

We can divide the task into small pieces:

  1. First, we need to identify the data we’re going to export and the format.
  2. Implement a metadata injection solution with 3 pipelines:
    • A pipeline to get the node labels and execute the pipeline with metadata injection.
    • A pipeline to inject the needed metadata into a pipeline template.
    • The pipeline template to extract the nodes data and generate the CSV files.

Let’s see this in an image:

Apache Hop - Neo4j Cypher and Metadata Injection

 

The metadata injection solution works no matter how many node labels the database has, and you don’t need to configure the input and output fields manually.

How? We use the Metadata Injection transform and the Pipeline Executor transform.

The first pipeline will get all the node labels by executing a Cypher query and will execute another pipeline (using the Pipeline executor) to generate the CSV files. We’ll generate a CSV file per node label.

The second pipeline is used to get the needed metadata and pass this metadata at runtime to a pipeline template.

The template will extract the node data by using a Cypher query and generate a CSV file per node label.

The graph database

We are going to use a sample Neo4j database created in previous posts.

The dvdrental graph database represents the business processes of a DVD rental store, including data about the films, actors, and demographic data of the staff.

The dvdrental graph schema:

dvdrental Neo4j schema

 

Step 1: Identify the data to be exported and the format

In this case, we’ll export the data of the following nodes:

  • Actor
  • Film
  • Language
  • Category

As a result, we should get one CSV file per label, with the node data in table format.
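
To make the target format concrete, here is a minimal Python sketch of a table-format CSV for one label: a header row with the property names, then one line per node. The sample nodes and property names are hypothetical and are not taken from the dvdrental database.

```python
import csv
import io

# Hypothetical sample: properties of two Language nodes (illustrative values only).
nodes = [
    {"language_id": 1, "name": "English"},
    {"language_id": 2, "name": "Italian"},
]


def to_table_csv(rows):
    """Render a list of property dicts as CSV text: header first, one line per node."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_table_csv(nodes)
print(csv_text)
```

Each property becomes a column, which is what distinguishes this output from the JSON export in the earlier post.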

Step 2: Get the node labels and execute the metadata injection

Apache Hop - Neo4j Cypher and Metadata Injection

First, configure a Cypher transform to get the node labels.

Options tab

Set the transform name and select the Neo4j database connection.

Apache Hop - Neo4j Cypher

  • Transform name: the name for this transform in the pipeline (read-labels).
  • Neo4j Connection: select the Neo4j connection to read the graph from (neo4j-connection).

Cypher tab

Write the Cypher query.

 

Apache Hop - Neo4j Cypher
Cypher query
 
CALL db.labels();
 
Returns tab

Use the Get Output Fields option to display the query result fields.

Apache Hop - Neo4j Cypher
If you Preview the results you should see all the node labels:
Apache Hop -Preview data
Next, we need to execute another pipeline that injects the metadata into the template. We use the Pipeline Executor transform, passing the fields label and cypher as parameters.
 
Apache Hop - Neo4j Cypher and Metadata Injection
Set the path to the new pipeline and add the parameters.
 
Apache Hop - Pipeline Execution
 
🗒 Note that the pipeline will be executed once per row in the data stream (label, cypher).
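
The once-per-row behaviour can be pictured with a small Python sketch. It is not Hop code: `run_child_pipeline` is a hypothetical stand-in for the executed pipeline, the `cypher` values are placeholders, and `CYPHER` is an assumed parameter name (only `LABEL` appears in this post).

```python
# Hypothetical rows leaving the first pipeline: one per node label.
# The cypher values are placeholders, not the queries used in the post.
rows = [
    {"label": "Actor", "cypher": "MATCH (n:Actor) RETURN n"},
    {"label": "Film", "cypher": "MATCH (n:Film) RETURN n"},
]


def run_child_pipeline(parameters):
    """Stand-in for the child pipeline: it only reports what it received."""
    return f"executed with LABEL={parameters['LABEL']}"


def execute_per_row(stream):
    """Call the child once per incoming row, mapping fields to parameters."""
    results = []
    for row in stream:
        # LABEL comes from the post; CYPHER is an assumed parameter name.
        parameters = {"LABEL": row["label"], "CYPHER": row["cypher"]}
        results.append(run_child_pipeline(parameters))
    return results


executions = execute_per_row(rows)
print(executions)
```

Two input rows produce two executions, each with its own parameter values.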
 

Step 3: Implement a pipeline to inject the metadata

The second pipeline gathers the metadata that the template needs to extract the node data and generate a CSV file named after the label.

Apache Hop - Neo4j Cypher and Metadata Injection

First, we get the ${LABEL} parameter received from the first pipeline because we’ll use it in future transforms.
Apache Hop - Get variables
  • Name: label, the field name.
  • Variable: ${LABEL}, the variable to get the value from.
  • Type: String, specifies the field type.

Let’s take a step back. So far, we have the labels we need to build the Cypher query that we pass as metadata to the pipeline template, but we also need to build the output fields and data types for each Cypher query.

To do so, we need all the properties associated with a label.

Example

Apache Hop - Neo4j Example
 

Configure a Cypher transform to get the properties by label.

Options tab

Apache Hop - Neo4j Cypher
  • Neo4j Connection: select the neo4j-connection.

Cypher tab

Apache Hop - Neo4j Cypher
  • Set the Cypher query to get the properties.

Cypher query

CALL db.schema.nodeTypeProperties;

Returns tab

Apache Hop - Neo4j Cypher
  • Use the Get Output Fields option to set the output for the query return.

If you run the query in your Neo4j database, you will get the following result:

Screenshot 2024-02-28 at 11.12.28
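
As a rough illustration of the result's shape, the procedure returns one row per label and property, with the columns nodeType, nodeLabels, propertyName, propertyTypes and mandatory. The Python sketch below groups such rows by label; the sample rows are hypothetical and only mimic that shape.

```python
from collections import defaultdict

# Hypothetical rows shaped like the output of db.schema.nodeTypeProperties.
rows = [
    {"nodeType": ":`Language`", "nodeLabels": ["Language"],
     "propertyName": "language_id", "propertyTypes": ["Long"], "mandatory": True},
    {"nodeType": ":`Language`", "nodeLabels": ["Language"],
     "propertyName": "name", "propertyTypes": ["String"], "mandatory": True},
    {"nodeType": ":`Category`", "nodeLabels": ["Category"],
     "propertyName": "name", "propertyTypes": ["String"], "mandatory": True},
]


def properties_by_label(schema_rows):
    """Collect the property names that belong to each node label."""
    grouped = defaultdict(list)
    for row in schema_rows:
        for label in row["nodeLabels"]:
            grouped[label].append(row["propertyName"])
    return dict(grouped)


schema = properties_by_label(rows)
print(schema)
```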

 

Now we have the labels stored in the label field and all the properties, so we can:

  • Filter by label, keeping only the properties associated with the label received from the previous pipeline (one label per execution).
  • Clean the propertyName and propertyTypes fields by removing the characters ", [ and ], and map some Neo4j data types (Long, Double) to Hop data types, to build the Cypher query we’ll pass as metadata to the template pipeline.

We apply all these changes in the following block of transforms:

Apache Hop - Pipeline
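
The filtering and cleanup can be sketched in plain Python. This is not the Hop implementation: the sample rows are hypothetical, they assume the list values arrive as strings such as `["Long"]`, and the mapping Long → Integer and Double → Number is an assumption based on the Hop data type names.

```python
# Assumed mapping from Neo4j type names to Hop data type names.
TYPE_MAP = {"Long": "Integer", "Double": "Number"}

# Hypothetical rows, with list values already rendered as strings.
rows = [
    {"nodeLabels": '["Language"]', "propertyName": '"language_id"', "propertyTypes": '["Long"]'},
    {"nodeLabels": '["Language"]', "propertyName": '"name"', "propertyTypes": '["String"]'},
    {"nodeLabels": '["Film"]', "propertyName": '"rental_rate"', "propertyTypes": '["Double"]'},
]


def clean(value):
    """Remove the characters ", [ and ] from a field value."""
    return value.replace('"', "").replace("[", "").replace("]", "").strip()


def prepare_fields(schema_rows, label):
    """Keep the rows for one label and return cleaned names with mapped types."""
    fields = []
    for row in schema_rows:
        if clean(row["nodeLabels"]) != label:
            continue
        neo4j_type = clean(row["propertyTypes"])
        fields.append({
            "name": clean(row["propertyName"]),
            # Types without a mapping (String, for example) are kept unchanged.
            "type": TYPE_MAP.get(neo4j_type, neo4j_type),
        })
    return fields


language_fields = prepare_fields(rows, "Language")
print(language_fields)
```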

 

You can check the results by checking the logs. For the label Language, for example, we get the following values:

Apache Hop - Logs

 

Finally, we configure the Metadata Injection transform. But, which fields do we need as metadata?

The pipeline template will extract the node data from the dvdrental graph database with a Neo4j Cypher transform and generate a CSV with a Text file output transform.

Apache Hop - Neo4j Cypher
  • Neo4j Cypher: For this transform we need to set the Cypher query we built as cypher_query and the node properties we store in the field concat.
Apache Hop - Neo4j Cypher
 
Apache Hop - Neo4j Cypher
  • Text file output: For this transform, we need to set the output field names we store in concat and the field types we store in propertyTypes.
Apache Hop - Metadata Injection

To do so, configure the Metadata Injection transform as follows:

Apache Hop - Metadata Injection
  • Browse and select the pipeline template.
  • Set the field cypher_query to inject the Cypher field.
Apache Hop - Metadata Injection

 

  • Set the return fields for the Neo4j Cypher transform.
Apache Hop - Metadata Injection
  • Set the return fields for the Text file output transform.
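
To picture what gets injected for one label, here is a minimal Python sketch that builds a query and the matching field lists. The `MATCH ... RETURN` shape is an assumption, since the post does not show the generated query; `build_injection_metadata` and the sample fields are hypothetical, and property names are assumed to be plain identifiers that need no escaping.

```python
def build_injection_metadata(label, fields):
    """Build the query and the per-field metadata for one label.

    The query shape is an assumption; the post does not show the real one.
    """
    returns = ", ".join(f"n.{field['name']} AS {field['name']}" for field in fields)
    return {
        "cypher_query": f"MATCH (n:{label}) RETURN {returns}",
        "field_names": [field["name"] for field in fields],
        "field_types": [field["type"] for field in fields],
    }


# Hypothetical cleaned fields for the Language label.
fields = [
    {"name": "language_id", "type": "Integer"},
    {"name": "name", "type": "String"},
]

metadata = build_injection_metadata("Language", fields)
print(metadata["cypher_query"])
```

The same names and types feed both the return fields of the Neo4j Cypher transform and the fields of the Text file output transform, which is why the template needs no manual field configuration.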

Step 4: Create a pipeline to be used as template

First, configure the Neo4j Cypher transform to get the data using the previously created cypher field that stores the query.

Options tab

Apache Hop - Neo4j Cypher
  • Neo4j Connection: select the neo4j-connection.

Finally, configure the Text file output transform.

File tab

Apache Hop - Text File Output

Use the ${OUTPUT_DIR} variable added to the dev environment file and the ${LABEL} variable to set the output file path.
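
As an illustration of how the two variables combine into a file path, the following Python sketch resolves a ${...} expression with the standard library. The expression and the variable values are hypothetical; the exact path expression used in the transform is not reproduced here.

```python
from string import Template

# Hypothetical values; in Hop they come from the environment file and the parameter.
variables = {"OUTPUT_DIR": "/tmp/hop-output", "LABEL": "Actor"}


def resolve(expression, values):
    """Replace ${NAME} placeholders with their values."""
    return Template(expression).substitute(values)


output_path = resolve("${OUTPUT_DIR}/${LABEL}.csv", variables)
print(output_path)
```

Because ${LABEL} changes on every execution, each label ends up in its own file.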

 

The generated CSV files

Apache Hop - Metadata injection
 

If the main pipeline runs successfully, you will get one CSV file containing the data of each node type:

  • Actor.csv
  • Category.csv
  • Film.csv
  • Language.csv

 

Exported files