
Implementing the advanced incremental refresh on big datasets - Part 3

To give you a quick recap, we have so far implemented the incremental refresh on the data model, configured the parameter values, and published the data model successfully with all 30 historical monthly partitions created.  

In this final article of the series, we will discuss how to load data into each of the monthly partitions and how to configure incremental refresh policies on multiple tables within the same model using external tools such as ALM Toolkit and SQL Server Management Studio (SSMS).

Data Processing 

Before you start loading data into the partitions using SSMS, a Power BI admin needs to enable Read/Write access on the XMLA endpoint for the capacity hosting the published dataset.

Now that Read/Write access is enabled, data can be pushed into each of these partitions. In addition, you can alter the incremental refresh policies, the parameter values, and their filtering behavior from SSMS.

As we discussed in Article 2, the workaround for the time-out issue was to filter all the rows out from the model so that the model could be published with ease. Now, we need to remove the filters so that all the rows can be loaded into the fact table. 

Removing the filter criteria and changing the incremental policy on the published Power BI dataset can be done either by running an XMLA query in SSMS or by using the external tool ALM Toolkit. In this example, we will use the ALM Toolkit. As a word of warning: removing the filters or changing the incremental behavior in the Power BI Desktop file and publishing it again will override the initial settings.

ALM Toolkit is a free and open-source tool to manage Microsoft Power BI datasets. This tool allows you to compare the model that you published to the data model you have in your development environment (the local .pbix file). 

In the screenshot below, you can see the Source and Target comparison of the data model. The Source here is the Power BI Desktop environment (.pbix file), which has the filter on Transaction_ID removed, whereas the published model still has the filter Transaction_ID = 0, resulting in all of the rows being filtered out of the model.

The Source (the Power BI Desktop environment, .pbix file)

The Target (Published data model) 

The goal is to remove the filter on the Transaction_ID column in the Target and push this update from the ALM Toolkit. This loads all the rows into the partitions without republishing the Power BI Desktop version of the data model.

You can follow the steps below to update the filter on the target data model:
Click on Select Actions → Hide Skip Objects → Validate Selection → Update Script
This will push the changes to the published model in the Power BI service.

Once you push the changes, you will see the 'Success. Metadata deployed' message.

To validate that this change was successful, log in to your SSMS application and right-click the published Fact table → Script Table as → CREATE OR REPLACE To → New Query Editor Window.

This generates an XMLA query window, and you can see in the screenshot below that the filter was removed.
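For reference, the CREATE OR REPLACE script that SSMS generates is a TMSL (JSON) definition of the table and its partitions. A trimmed-down sketch of what the relevant part can look like once the filter step is gone is shown below; the database, table, server, and column names are illustrative placeholders, not the actual model's:

```json
{
  "createOrReplace": {
    "object": { "database": "SalesModel", "table": "Fact_Transactions" },
    "table": {
      "name": "Fact_Transactions",
      "partitions": [
        {
          "name": "2020-January",
          "mode": "import",
          "source": {
            "type": "m",
            "expression": [
              "let",
              "    Source = Sql.Database(\"myserver\", \"mydb\"),",
              "    Fact = Source{[Schema = \"dbo\", Item = \"Fact_Transactions\"]}[Data],",
              "    // Only the RangeStart/RangeEnd filter remains;",
              "    // the Transaction_ID = 0 filter step has been removed.",
              "    Filtered = Table.SelectRows(Fact, each [TransactionDate] >= RangeStart and [TransactionDate] < RangeEnd)",
              "in",
              "    Filtered"
            ]
          }
        }
      ]
    }
  }
}
```

If the Transaction_ID = 0 step still appears as a Table.SelectRows call in this script, the ALM Toolkit deployment did not go through.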

We can now start running the partitions to load the data. Below is the general process for running partitions in SSMS:

Right-click the Fact table on which the incremental refresh was built → click on Partitions → validate the list of partitions → click on the 'Process' symbol. There are five processing options to choose from, which you can read more about here: https://docs.microsoft.com/en-us/analysis-services/multidimensional-models/processing-options-and-settings-analysis-services?view=sql-analysis-services-2022

From the list of process modes available, we chose 'Process Full' so that the data would be loaded and the KPI calculations and table relationships in the data model would be recalculated as well.
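If you prefer scripting over the processing dialog, the same operation can be expressed as a TMSL refresh command of type "full" and run from an XMLA query window in SSMS. The database, table, and partition names below are placeholders:

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "SalesModel",
        "table": "Fact_Transactions",
        "partition": "2020-January"
      }
    ]
  }
}
```

Listing several partition objects in the objects array processes them in a single transaction; omitting the partition property processes the whole table.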

In the screenshot below, you will see that we started processing data for January 2020. The dialog box indicates that the process is running and actively loading data into the partition.

Once the processing is completed, you will see another message box pop up, indicating that all the rows were loaded into that partition successfully.

To verify that the data has been loaded, you can now open your Power BI report and start interacting with the data.

The Impact 

This was a huge win: query times improved significantly, and we did not need to load the December 2020 partition again because the model maintained its history. In a similar fashion, you can load data into the remaining 29 historical monthly partitions. Once the data load is complete for all the historical partitions, the fact table will, per the incremental refresh policy defined earlier, only refresh the most recent three days of data. Another win was that the dataset refresh times dropped significantly, to no more than 10 minutes.

Incremental Policy on Multiple Fact Tables 

The next step for us was to implement the incremental policies on other candidate tables in the model. As we discussed in Article 2, you cannot use differently named parameters, such as RangeStart1 and RangeEnd1, for the other candidate tables. Therefore, we had to reuse the original 'RangeStart' and 'RangeEnd' parameters as defined for the first Fact table.

Also, as mentioned in Article 2, we cannot republish the model after building incremental refresh on the other candidate tables, because that would overwrite the initial settings and you would lose the configuration for the first Fact table. The best way to add another Fact table is therefore to use the ALM Toolkit or SSMS to add or update the incremental policy values.

The ALM Toolkit lets you push changes from the source Power BI Desktop environment (.pbix file) to the target dataset in the Power BI service. Similarly, you can add the incremental policy script directly to an XMLA query in SSMS and run it. Either way, this avoids overriding the incremental refresh policy that was set up for the first Fact table.
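As a sketch of what such a script can look like: in TMSL, the incremental policy on a table is represented by a refreshPolicy block. The values below mirror the policy described in this series (30 months of history, refresh the last 3 days), but the database, table, server, and column names are illustrative:

```json
{
  "createOrReplace": {
    "object": { "database": "SalesModel", "table": "Fact_Returns" },
    "table": {
      "name": "Fact_Returns",
      "refreshPolicy": {
        "policyType": "basic",
        "rollingWindowGranularity": "month",
        "rollingWindowPeriods": 30,
        "incrementalGranularity": "day",
        "incrementalPeriods": 3,
        "sourceExpression": [
          "let",
          "    Source = Sql.Database(\"myserver\", \"mydb\"),",
          "    Fact = Source{[Schema = \"dbo\", Item = \"Fact_Returns\"]}[Data],",
          "    Filtered = Table.SelectRows(Fact, each [ReturnDate] >= RangeStart and [ReturnDate] < RangeEnd)",
          "in",
          "    Filtered"
        ]
      }
    }
  }
}
```

Because createOrReplace replaces the whole table definition, in practice you would first script the existing table from SSMS and add the refreshPolicy block to that full definition, rather than sending a bare fragment like this one.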

In the screenshot below, you can see that we were able to define the incremental policy for the second Fact table in the model.

Using the ALM toolkit we were able to modify the initial RangeStart date parameter to 1/2/2018 from 1/2/2021 without publishing this change to Power BI service from Power BI Desktop. 
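Under the hood, RangeStart is a shared M expression in the model, so the same change can also be scripted directly in SSMS. Below is a sketch of such an update via TMSL; the database name is a placeholder, and we are assuming the article's dates are month/day/year, so 1/2/2018 reads as January 2, 2018:

```json
{
  "createOrReplace": {
    "object": { "database": "SalesModel", "expression": "RangeStart" },
    "expression": {
      "name": "RangeStart",
      "kind": "m",
      "expression": "#datetime(2018, 1, 2, 0, 0, 0) meta [IsParameterQuery = true, Type = \"DateTime\", IsParameterQueryRequired = true]"
    }
  }
}
```

The meta record marks the expression as a parameter so that the Power BI service continues to treat RangeStart as an incremental refresh parameter rather than a plain query.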

We followed the same data load process as before and managed to load all the data into each of these partitions using the ‘Process Full’ function successfully.  

Summary

In this 3-part article series, we covered the entire process: understanding and evaluating the business requirements, choosing the appropriate storage mode to support future scalability, and building out incremental refresh on multiple fact tables. Along the way, we tackled the time-out issue and implemented different incremental policies on different tables within the same data model, all without having to republish the model and risk losing the initial design considerations.

This implementation solved the issue of slow report rendering and slow query execution times on a dataset with multiple fact tables holding over a billion rows of data. Not only did the data model perform faster despite the huge dataset size, but the dataset refresh times were also significantly lower.

In the end, we had a happy client.  
