Skip to content

Advanced Transforms in Apache Hop: Rows Normalization

In this tutorial, we'll explore how to utilize the Apache Hop “Row Normalizer” transform.

Screenshot 2024-03-19 at 21.25.16

“Row Normalizer” transform is ideal for standardizing data from a pivot or denormalized table, allowing for adjustments in row relationships by converting columns into rows.

What will you find here?

Consider a scenario where we have films listed in one column, along with their respective years and the revenue generated by each film per quarter.

Screenshot 2024-03-20 at 15.06.46 

Notice that each quarter is a column.

Screenshot 2024-03-19 at 21.31.03

We may need to restructure the revenue data based on quarters for each film. We need to convert the data from this format:

Screenshot 2024-03-19 at 21.32.13
 

To this format: 

Screenshot 2024-03-19 at 21.33.23
                                                 ...
 

To accomplish this, we can use the "Row Normalizer" transform to normalize the pivot data.

The data has been generated using a data grid.

Screenshot 2024-03-19 at 21.34.53-Photoroom.png-Photoroom

While we've demonstrated this with a simplified example, it's important to note that the data source can be in any format, such as Excel, tables, etc.

To begin, we'll cover the steps involved in this use case scenario:

Step 1: Add and connect the "Row Normalizer" transform.

Screenshot 2024-03-19 at 21.39.12-Photoroom.png-Photoroom

Step 2: Configure the "Row Normalizer" transform.

Screenshot 2024-03-19 at 21.52.10

  • Transform name: Define a transform name (normalize-rows).

  • Type field: Specify a name for the output field name (quarter).

  • Fieldname and Type: Select the desired fields from the data grid. We keep the fields ("Fieldname") namely quarter1, quarter2, quarter3 and quarter4 and the "Type" with the same name. This way the quarter output field will have these types as the output values.

  • New field: Define a new field for revenue (revenue).

By previewing the normalize-rows transform, you'll see the transformed data, achieving the desired output format.

Screenshot 2024-03-19 at 21.47.39-Photoroom.png-Photoroom
 
After normalization, you can export the dataset to different formats based on your requirements.

But in our simplified example, that concludes the process! We've effectively transformed the data in the datagrid into a normalized format.

Screenshot 2024-03-19 at 21.50.12

How it works

In a nutshell, how we did it❓

  1. We created a pipeline and mapped the input data using a normalizer.

    Screenshot 2024-04-18 at 09.19.35
  2. The type field was set as "quarter". This is the new quarter column that will be created.

    Screenshot 2024-04-18 at 09.31.28
  3. The "Fieldname" is the column name in the source, in this case, we will only normalize the quarters: quarter1, quarter2, quarter3 and quarter4.

    Screenshot 2024-04-18 at 09.31.58
  4. The type is a string we specify as the values in the new column. In this case we want to store the quarters with the same name of teh column: quarter1, quarter2, quarter3 and quarter4. We could have also specified, for example, Q1, Q2, Q3, Q4, or first, second, third, fourth.

    Screenshot 2024-04-18 at 09.32.05
  5. The "new field" indicates the revenue value. This is the new revenue column that will be created.

    Screenshot 2024-04-18 at 09.32.11

The result is the normalized output:Screenshot 2024-04-18 at 09.47.41Now, instead of 4 quarter columns containing the revenue data, we have one new quarter column and one new revenue column.

💡 Important tip: When merging several columns with different types (such as String and Integer) into a new field, there's no automatic type conversion. Instead, the initial type is retained. This absence of conversion might cause problems with subsequent data row transformations. It's highly recommended to ensure alignment of data types for values being merged into the same field before normalization.

Summary

  1. Understand the purpose: The "Row Normalizer" transform in Apache Hop is designed to standardize data from pivot or denormalized tables by converting columns into rows.

  2. Consider your data structure: Before using the "Row Normalizer," assess whether your data needs restructuring, especially if it involves pivot-like structures.

  3. Utilize the right fields: When configuring the transform, select the appropriate fields from your dataset, ensuring you include those relevant to the normalization process.

  4. Preview before finalizing: Always preview the transformation results to ensure they align with your expectations and requirements.

  5. Export options: Once the normalization process is complete, you can export the transformed dataset to various target formats, catering to your specific needs and preferences.