In this tutorial, we'll explore how to utilize the Apache Hop “Row Denormalizer” transform.
Advanced Transforms in Apache Hop: Rows Normalization
In this tutorial, we'll explore how to utilize the Apache Hop “Row Normalizer” transform.
“Row Normalizer” transform is ideal for standardizing data from a pivot or denormalized table, allowing for adjustments in row relationships by converting columns into rows.
What will you find here?
Consider a scenario where we have films listed in one column, along with their respective years and the revenue generated by each film per quarter.
Notice that each quarter is a column.
We may need to restructure the revenue data based on quarters for each film. We need to convert the data from this format:

To this format:

To accomplish this, we can use the "Row Normalizer" transform to normalize the pivot data.
The data has been generated using a data grid.
While we've demonstrated this with a simplified example, it's important to note that the data source can be in any format, such as Excel, tables, etc.
To begin, we'll cover the steps involved in this use case scenario:
Step 1: Add and connect the "Row Normalizer" transform.
Step 2: Configure the "Row Normalizer" transform.
-
Transform name: Define a transform name (normalize-rows).
-
Type field: Specify a name for the output field name (quarter).
-
Fieldname and Type: Select the desired fields from the data grid. We keep the fields ("Fieldname") namely quarter1, quarter2, quarter3 and quarter4 and the "Type" with the same name. This way the quarter output field will have these types as the output values.
-
New field: Define a new field for revenue (revenue).
By previewing the normalize-rows transform, you'll see the transformed data, achieving the desired output format.

But in our simplified example, that concludes the process! We've effectively transformed the data in the datagrid into a normalized format.
How it works
In a nutshell, how we did it❓
-
We created a pipeline and mapped the input data using a normalizer.
-
The type field was set as "quarter". This is the new quarter column that will be created.
-
The "Fieldname" is the column name in the source, in this case, we will only normalize the quarters: quarter1, quarter2, quarter3 and quarter4.
-
The type is a string we specify as the values in the new column. In this case we want to store the quarters with the same name of teh column: quarter1, quarter2, quarter3 and quarter4. We could have also specified, for example, Q1, Q2, Q3, Q4, or first, second, third, fourth.
-
The "new field" indicates the revenue value. This is the new revenue column that will be created.
The result is the normalized output:Now, instead of 4 quarter columns containing the revenue data, we have one new quarter column and one new revenue column.
💡 Important tip: When merging several columns with different types (such as String and Integer) into a new field, there's no automatic type conversion. Instead, the initial type is retained. This absence of conversion might cause problems with subsequent data row transformations. It's highly recommended to ensure alignment of data types for values being merged into the same field before normalization.
Summary
-
Understand the purpose: The "Row Normalizer" transform in Apache Hop is designed to standardize data from pivot or denormalized tables by converting columns into rows.
-
Consider your data structure: Before using the "Row Normalizer," assess whether your data needs restructuring, especially if it involves pivot-like structures.
-
Utilize the right fields: When configuring the transform, select the appropriate fields from your dataset, ensuring you include those relevant to the normalization process.
-
Preview before finalizing: Always preview the transformation results to ensure they align with your expectations and requirements.
-
Export options: Once the normalization process is complete, you can export the transformed dataset to various target formats, catering to your specific needs and preferences.