In this tutorial, we'll explore how to utilize the Apache Hop “Row Denormalizer” transform.
The "Row denormalizer" transform enables you to denormalize data by referencing key-value pairs, with the added flexibility to convert data types during the process.
Let's consider a scenario where we have films listed in one column, along with their respective years, quarters, and the revenue generated.
Notice that each quarter is a row in the column quarter.
We need to restructure the revenue data based on quarters for each film. We need to convert the data from this format:
To accomplish this, we can use the "Row denormalizer" transform to convert each quarter on a column.
The data has been generated using a data grid.
While we've demonstrated this with a simplified example, it's important to note that the data source can be in any format, such as Excel, tables, etc.
To begin, we'll cover the steps involved in this use case scenario:
Step 1: Add and connect the "Row denormalizer" transform.
Step 2: Configure the "Row denormalizer" transform:
💡 Please, note that we maintain the "Target fieldnames" with the same "Key values" as quarter1, quarter2, quarter3, and quarter4 respectively but you can choose a different name for the columns to be generated.
However, we're overlooking a crucial step.
Upon clicking "OK" to save, a message appears, indicating that if the data is not sorted, the denormalized results may be incorrect. Therefore, it is necessary to sort the data before proceeding with denormalization.
We add and connect a Sort Rows transform, then configure it by providing a name and fetching the fields from the previous transform.
Then, we configure it by providing a name and fetching the fields from the previous transform.
By previewing the denormalize-rows transform, you'll see the transformed data, achieving the desired output format.
But in our simplified example, that concludes the process! We've effectively transformed the data in the datagrid into a denormalized format.
In summary, how we did it❓
We created a pipeline and mapped the input data using a denormalizer.
The key field was set as the "quarter" column in the source.
We grouped the revenue per quarter using "film" and "year", resulting in a revenue entry for each film, year, and quarter.
The target fieldname represents the new columns to be created, generating a column for each quarter value.
The value field name indicates the revenue value.
Finally, the key value corresponds to the quarter number for each distinct value in the "quarter" column.
The result is the denormalized output:
We haven't explored all the options available in the Target fields table within the "Row denormalizer" transform.
For example, in the following example, we only group by film, and select "Number of Values" as the "Aggregation" method.
Upon previewing the "Row denormalizer" transform, we obtain the following results:
Instead of displaying the revenue value, the output now indicates how many revenue values each film has per year and quarter. For example, the film "Academy Dinosaur" has 2 years in the input dataset.
💡 Notice that, in this case, the year value is not correct because we only grouped by film.
Apart from the aggregation method, you can explore the other options in the "Target fields" table we didn't cover in this post: