Power Query is a data preparation and transformation technology. This technology is found in many Microsoft products such as Analysis Services, Excel, Power BI, among others.
Power Query can be thought of as the extract, transform, and load (ETL) functionality within these Microsoft products, and has a wide range of options to achieve the data preparation and transformation goals needed for analysis.
In this article we will discuss the importance of Power Query optimization best practices and why these are best practices.
We will specifically explore Power Query within Power BI. Given that data preparation typically takes up a significant portion of the time dedicated to analysis, Power Query emerges as an immensely powerful component of Power BI.
Optimization is the action or process of making the most effective use of a technology or of a situation.
In Power BI, it is important to optimize whenever possible, whether it is during Power Query, DAX, or creating visualizations. By doing so, performance gains can be made, and performance issues can be minimized.
There are several best practices that should be kept in mind when optimizing Power Query. The following best practices will be discussed.
Microsoft recommends filtering data as early as possible. But why?
There are several benefits of this which will be discussed below.
Data that has been connected to via Power Query is displayed as queries. The data within these queries can be previewed. By filtering out data that is not needed, only relevant data will be pulled into the underlying data model and displayed in the preview of Power Query, which can increase productivity and reduce distraction.
When using import mode while connecting to data, data is saved within the application’s internal memory. By filtering out data early in the process, whether it is via the data connector (i.e., using a custom SQL query rather than selecting all data) or by filtering out the data via Power Query, the amount of data consumed is less. This results in a smaller pbix file, as the data for the report ends up embedded in the file.
If too much data is consumed, especially data that is not needed, this can cause performance issues later in the process, mainly due to the amount of data that visuals or DAX must go through.
Having less data in the model not only decreases the overall size of the model when using import mode, it also results in faster data refreshes. This is due to putting less pressure on machine or Power BI resources.
Query folding allows for a query to generate a single, simpler and less computationally expensive query statement. This query statement is used to retrieve and transform source data. When filtering early, some connectors will take advantage of the filters used when applying query folding.
The ability to query fold reduces the amount of data that is transferred and processed since filtering is being done at retrieval time. This increases the productivity of retrieving data and the overall performance of the Power BI report.
There are numerous data connectors that Power Query provides. These data connectors range from connecting to different databases such as Oracle, SQL Server and PostgreSQL, file sources like Excel, XML, and CSV and others like a SharePoint list and Python.
Choosing the best fit connector for the data source being used can result in performance benefits, and can enable Power BI to take advantage of the previously discussed query folding feature.
An example of a connector that can improve experience and performance is the SQL Server connector. When connecting to an SQL Server database, you can connect via different connectors such as the SQL Server connector or an ODBC connector.
It is best to use a SQL Server connector in this case because it offers features like query folding that the ODBC connector does not have.
When certain operations are done within Power Query, a full scan of the data is required for the data to be displayed correctly within the Power Query preview. One of these operations is sort. If the unsorted data is in a very different order than the sort operation will produce, it could take a while for the sort to properly take place and the query results to display.
Operations like filtering operate over data through a streaming process rather than having to go through the entire data set. As a result, data in the preview displays faster since it is displayed as the data is streamed.
The reason performing expensive operations last results in a performance gain is because you can minimize the amount of time the data preview renders when adding new steps. This allows for better performance and productivity.
In most report applications and programming, it is important to use the correct data type.
Within Power BI, the reason for this is that Power Query provides several features that are based on specific data types. For instance, date type columns allow for date and time column options under the “Add Column” menu. If the column is not set to a date data type, these options will not be available.
The same happens when filtering in Power Query. There are additional options when filtering on a column that has a data type of date that would not be available if it were a different data type. Examples of these options are “Is Earliest”, “Between”, etc.
Sometimes, a specific query can have numerous steps. These steps can become confusing, especially if there are numerous calculations and/or transformations being done. It is a good practice to split up your query if there are a lot of steps, to simplify it.
In Power Query, you can split one query into multiple queries. The queries can reference each other.
Following this approach will allow for better understanding of your queries by yourself and others who may be working within the report.
It is important when creating queries to keep them flexible and dynamic. One way to achieve this is through parameters. A parameter in Power BI allows for storing and managing values that we can use and re-use in our queries. It is a good practice to use parameters when possible. The following describes the benefits of using parameters.
All parameters created can be found in the “Manage Parameters” window. This is a centralized view that can help you access all your parameters in one place, making it easy for Power BI developers to find and alter parameter values.
Parameters can be referenced in other queries or steps. If the parameter changes, so do the queries that reference the parameter. For instance, if your steps reference a particular server and you need to connect to a different server, this will prevent you having to go into each step and changing the server's name. Not only will this result in less work when a change is needed, but it will also prevent issues.
Sometimes, changes that happen to the structure of the source data result in unintended changes to the underlying model in a Power BI report.
Say for example you are connected to a SQL Server Database, and only certain columns from a particular table are needed for your query. When working in Power Query you decide to just remove the columns that you do not need. Later, there are modifications to the SQL Server Database table and additional columns are added. Now your Power Query is pulling in these new columns.
By not preventing the above, now your data model has increased and possibly your performance has decreased. It is important to prevent situations like this for this reason. Power Query has several options to help prevent this such as the “Choose Columns” and “Remove Other Columns” features.
In Power Query, you have the option to disable loading of data for a query or queries each time the model is refreshed. There are certain instances where you should disable the load.
Queries that do not change often or that support data integration should be disabled. The reason for this is because the more queries that need to be loaded into your model when refreshed, the longer the refresh takes.
If a table contains a date or datetime column, a hidden auto date/time table is created by Power BI. This hidden table is created to help when configuring filters, drill-down actions, and grouping for date periods.
If there are several date columns in tables, multiple hidden auto date/time tables are created. These tables will increase the size of the model. It is a good practice to disable the Auto date/time option in Power BI Desktop so that your model does not increase in size and cause performance issues.
Power Query is a powerful tool enabling Power BI developers to effectively import and transform data.
Following the best practices described in this article will help you produce more efficient reports with appropriate model sizes while avoiding slow performance and improving your productivity.
If you’re seeking expert help with unlocking and understanding a specific aspect of Power BI, our business intelligence consultants are ready to show you how.
Our professionals can provide expert assistance in various services, such as planning analytics consulting, integration, implementation, dashboard design, QA, and training.
Feel free to get in touch with us.
Power Query is a data connection technology that enables users to discover, connect, combine, and refine data across a wide variety of sources. It is primarily used for data transformation and preparation tasks in the context of business intelligence and data analysis.
Power Query refers to both a technology and a tool in the Microsoft Power BI and Excel ecosystem. As a technology, it allows users to connect to various data sources, transform and shape data, and load it into their desired destination for analysis. As a tool, Power Query provides a graphical interface within Excel and Power BI for users to perform these data transformation tasks.
No, Power Query is not the same as SQL. SQL (Structured Query Language) is a programming language used for managing and manipulating relational databases. Power Query, on the other hand, is a tool used for connecting to and transforming data from various sources, not limited to databases, and it is often used in conjunction with Excel and Power BI.
Power Query is not the same as Excel but is a feature integrated into Excel. Excel is a spreadsheet application, while Power Query is a tool within Excel (and also in Power BI) that allows users to import, transform, and manipulate data from various sources before loading it into Excel for analysis.
Power Query is used for data transformation and preparation tasks. It helps users connect to diverse data sources, clean and shape the data, combine multiple sources, and load the refined data into Excel or Power BI for further analysis and reporting.
Power Query is particularly useful when dealing with data from multiple sources, handling large datasets, or when extensive data transformation and cleaning are required before analysis. It is beneficial for users who want to streamline and automate the process of importing and preparing data for reporting and analytics.
Yes, Power Query can be considered an ETL (Extract, Transform, Load) tool. It allows users to extract data from various sources, apply transformations to clean and reshape the data, and then load it into a destination for analysis. The ETL capabilities of Power Query make it a valuable tool for data integration and preparation.
With Power Query, you can:
Power Query Editor is the interface within Power BI and Excel where users can visually build and apply data transformations to their datasets. It provides a user-friendly environment for cleaning, shaping, and manipulating data before it is loaded into the analysis tool. Power Query Editor allows users to see a preview of their data at each transformation step and provides a range of tools for efficient data preparation.
Have a project in mind? No need to be shy, drop us a note and tell us how we can help realise your vision.