Power Query - powertothemasses.com

Seasonality matters to many businesses. In this post the term covers not only the seasons in the usual sense (winter, summer and so on) but also custom-defined seasons that you can use to report on business activity.

To state the obvious, an ice cream shop might sell less ice cream during winter than it sells during summer. Clothing, cycling gear and even chocolate are also products that have seasonal elements in their trade. There are countless more examples, but the point is that you can plan your sourcing and/or manufacturing activity in accordance with seasonal demand. Doing so can have significant financial and logistical implications too: why would you want the previous season’s stock taking up room (and tying up money) in a shop or warehouse where you need space for the current season’s product? And if the product has a short shelf life, or is sensitive to temperature fluctuations from one season to the next, you really only want to buy enough stock or raw materials for the relevant season.

You may even decide that you want the amount of safety stock you hold to depend on the season, or schedule promotions by season.

DEFINING THE SEASONS

Central to your ability to analyse your data on a seasonal basis is the definition of the seasons you choose to adopt. For instance, do you have only two seasons (busy and not busy), or three, four or even sixteen? This is the most important step to nut out before continuing.

Let’s use an example where we have four seasons, as defined for Australia in general:

Summer: December to February
Autumn: March to May
Winter: June to August
Spring: September to November

The next step is to bring those seasons into your calendar table. The syntax for doing this depends on whether you used Get & Transform (Power Query) or Power Pivot to construct your Calendar table.

Let’s assume you constructed one using the Calendar construction method described here, starting at 2015.

A Switch(True(),…) calculated column can be used to identify the season of interest:

Season =
SWITCH (
    TRUE (),
    'Calendar'[Month Number] = 12
        || 'Calendar'[Month Number] <= 2, "Summer",
    'Calendar'[Month Number] <= 5, "Autumn",
    'Calendar'[Month Number] <= 8, "Winter",
    "Spring"
)
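If you want to sanity-check the month-to-season mapping outside Power Pivot, the same top-to-bottom logic can be sketched in Python. This is only an illustration of the SWITCH column above; the function name is made up for the example.

```python
def season_for_month(month: int) -> str:
    """Return the Australian season for a month number (1-12).

    The checks run top to bottom and stop at the first match,
    mirroring the SWITCH ( TRUE (), ... ) calculated column.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    if month == 12 or month <= 2:
        return "Summer"
    if month <= 5:
        return "Autumn"
    if month <= 8:
        return "Winter"
    return "Spring"


for m in range(1, 13):
    print(m, season_for_month(m))
```

Because the first matching condition wins, the order of the tests matters: month 2 is caught by the Summer test before the "<= 5" Autumn test is ever evaluated.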

ADD A SEASON INDEX

Next we want to add a counter, or index to the Season, but if we just count the seasons as defined above, we’ll only get a maximum of 4. Perhaps we can combine the Season with the Year value, and therefore get a unique season for each year? Not a bad guess, but there’s a complicating factor here.

The problem with how our seasons are defined is that every Summer spans two different Year values: it starts in December of one year and ends in February of the next.

If we simply combined Season with Year to obtain our unique index for each season as follows:

SeasonYear = 'Calendar'[Season] & " " & 'Calendar'[Year]

we would get the same index value for Summer 1 (Jan & Feb) and Summer 2 the following December, which falls in the same year. Our index values would effectively mix Summer 1 with Summer 2, Summer 2 with Summer 3, and so on. Not good.
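The collision is easy to see with two sample dates. The Python sketch below mimics the SeasonYear concatenation; the helper names and the sample dates are assumptions made for the illustration.

```python
from datetime import date


def season_for_month(month: int) -> str:
    """Same month-to-season mapping as the Season column."""
    if month == 12 or month <= 2:
        return "Summer"
    if month <= 5:
        return "Autumn"
    if month <= 8:
        return "Winter"
    return "Spring"


def season_year(d: date) -> str:
    """Naive label: season name followed by the calendar year."""
    return f"{season_for_month(d.month)} {d.year}"


jan_2015 = date(2015, 1, 15)   # tail end of the first summer
dec_2015 = date(2015, 12, 15)  # start of the second summer
jan_2016 = date(2016, 1, 15)   # still the second summer

print(season_year(jan_2015))  # Summer 2015
print(season_year(dec_2015))  # Summer 2015 (a different summer, same label)
print(season_year(jan_2016))  # Summer 2016 (same summer as December, different label)
```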

How then, do we get the appropriate index number?

My preferred way is to consider again the definition of the seasons, as well as the starting year of the calendar. The first year in my calendar is 2015, so by varying the number I subtract from the year value depending on the month number, I can ensure that Summer 2 in Dec 2015 yields the value “Summer 2” and Summer 1 in 2015 yields the value “Summer 1” with the following formula:

SeasonNumber =
'Calendar'[Season] & " "
    & SWITCH (
        TRUE (),
        'Calendar'[Month Number] >= 12, 
        'Calendar'[Year] - 2013,
        'Calendar'[Year] - 2014
    )

So when I consider year 2015, I subtract one less year from 2015 in December, making the index value one larger than that for any other in 2015.
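As a quick check of the arithmetic, here is a minimal Python sketch of the same labelling rule. It assumes the calendar starts in 2015, as above; the function and parameter names are invented for the example.

```python
def season_for_month(month: int) -> str:
    """Same month-to-season mapping as the Season column."""
    if month == 12 or month <= 2:
        return "Summer"
    if month <= 5:
        return "Autumn"
    if month <= 8:
        return "Winter"
    return "Spring"


def season_number(year: int, month: int, first_year: int = 2015) -> str:
    """Label a month with its season and a per-season counter.

    Months before December subtract (first_year - 1), i.e. 2014 here.
    December subtracts one year less (2013), so it already belongs to
    the next summer.
    """
    offset = first_year - 1
    if month >= 12:
        offset -= 1
    return f"{season_for_month(month)} {year - offset}"


print(season_number(2015, 1))   # Summer 1
print(season_number(2015, 12))  # Summer 2
print(season_number(2016, 2))   # Summer 2
print(season_number(2016, 12))  # Summer 3
```

December 2015 and February 2016 now share the label "Summer 2", which is exactly what the plain Season and Year combination could not achieve.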

The numeric counter, or season index is then calculated, referring to the SeasonNumber column as the value we wish to count:

SeasonIndex =
CALCULATE (
    DISTINCTCOUNT ( 'Calendar'[SeasonNumber] ),
    -- all calendar rows on or before the current row's date
    FILTER ( 'Calendar', 'Calendar'[Date] <= EARLIER ( 'Calendar'[Date] ) )
)

The SeasonIndex value provides you with a unique numeric value for each chronological Season in your calendar.
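The running distinct count can be pictured as walking through the calendar in date order and counting how many different SeasonNumber labels have appeared so far. A rough Python sketch at month level, assuming a calendar starting in January 2015 and using made-up helper names:

```python
def season_for_month(month: int) -> str:
    """Same month-to-season mapping as the Season column."""
    if month == 12 or month <= 2:
        return "Summer"
    if month <= 5:
        return "Autumn"
    if month <= 8:
        return "Winter"
    return "Spring"


def season_number(year: int, month: int, first_year: int) -> str:
    """Season label with a counter that rolls over in December."""
    offset = first_year - 2 if month >= 12 else first_year - 1
    return f"{season_for_month(month)} {year - offset}"


def season_index_by_month(first_year: int, last_year: int) -> dict:
    """Map (year, month) to the count of distinct labels seen up to that month."""
    seen = set()
    index = {}
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            seen.add(season_number(year, month, first_year))
            index[(year, month)] = len(seen)
    return index


index = season_index_by_month(2015, 2016)
print(index[(2015, 1)], index[(2015, 12)], index[(2016, 2)], index[(2016, 12)])
```

For a calendar covering 2015 and 2016 this gives index 1 for January 2015, 5 for both December 2015 and February 2016, and 9 for December 2016.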

“Why do I need a Season index?”, you may ask. Well, having a numeric index for the season allows us to do some clever stuff in analysing sales (or budgets), such as calculating the average product sales for a particular season in a particular year. Calculating just the average for a particular year, or the average for one of our four seasons (by implication all years), would give us quite a different result.

Your calendar should look something like this now (I deleted day of week detail and hid the columns I don’t want to be available in my Pivot table fields):

Remember to sort the Month column by the Month Number and the FinMonth column by the FinMthNum, otherwise your graphs and pivot tables will sort months alphabetically.

EVALUATE BY SEASON

Let’s imagine you have a table of product quantity per month, for a period spanning a few years. The quantity can be a mixture of sales and budget figures, depending on timing:

…

I imported the numbers in the table to Power Pivot using Get & Transform (Power Query), where I converted the Yearmonth values to a date field. This allows me to create a relationship to my Calendar using the Date column. After that I can report on the sales by season, because the Season column is already in my Calendar:

AVERAGE QTY PER SEASON

Say for instance I wanted to know what the average sales per season was, superimposed on my existing pivot chart. Hint: Adding a measure that is just the Average( ) of the sales qty to the chart won’t look any different to our existing ProductQty measure using Sum( ) as aggregator, because the evaluation context that the pivot table provides is too granular to see the effect of average aggregation. The desired measure needs to delve a little deeper. Enter AVERAGEX:

SeasonAvgProdQty =
AVERAGEX (
    FILTER (
        ALL ( 'Calendar' ),
        'Calendar'[SeasonIndex] = MAX ( 'Calendar'[SeasonIndex] )
    ),
    [ProductQty]
)

Adding this to the pivot chart (with some formatting to highlight the average values and diminish the monthly values) results in the following:

See how useful that Season index has become? The same measure can be added to a pivot table, of course, if you wanted to see the actual values.
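To make the arithmetic behind SeasonAvgProdQty concrete, here is a small Python sketch that groups monthly quantities by season index and gives every month the average of its season. The quantities are invented, and the closed-form index is a shortcut that only holds for the four-season definition with a calendar starting in January 2015. It also assumes one quantity per month, so averaging over months matches what the measure returns when days without data are blank.

```python
from collections import defaultdict
from statistics import mean


def season_index(year: int, month: int, first_year: int = 2015) -> int:
    """Chronological season counter: Jan-Feb of first_year is 1, then +1 per season."""
    months_since_december = (year - first_year) * 12 + month
    return months_since_december // 3 + 1


def season_average(qty_by_month: dict) -> dict:
    """Map each (year, month) to the average quantity of the season it belongs to."""
    by_season = defaultdict(list)
    for (year, month), qty in qty_by_month.items():
        by_season[season_index(year, month)].append(qty)
    return {
        (year, month): mean(by_season[season_index(year, month)])
        for (year, month) in qty_by_month
    }


# Hypothetical monthly quantities, for illustration only.
qty = {
    (2015, 12): 120, (2016, 1): 150, (2016, 2): 90,  # one summer
    (2016, 3): 60, (2016, 4): 50, (2016, 5): 40,     # the following autumn
}
averages = season_average(qty)
print(averages[(2016, 1)], averages[(2016, 4)])
```

Here December 2015, January 2016 and February 2016 all report the summer average of 120, while the three autumn months report 50.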

So what else could you use the seasonal average value for? Well, how about that safety stock buffer we talked about earlier? You might have a view that you always want to maintain a safety stock buffer of ten percent above the average seasonal budget.

SAFETY STOCK LINKED TO SEASONAL AVERAGES

You can probably already guess how this measure will look:

SafetyStock =
1.1 * [SeasonAvgProdQty]

Predictably, it will just always be 10% higher than the seasonal average value.

Depending on your data, you might decide that working with four seasons provides too much granularity. In the example above, I would be tempted to combine autumn with winter, and spring with summer, resulting in a Warm and a Cold, or perhaps a Dry and Wet season. This would require a redefinition of the previous seasonal split:

Season =
SWITCH (
    TRUE (),
    'Calendar'[Month Number] >= 9
        || 'Calendar'[Month Number] <= 2, "Warm",
    "Cold"
)

We would also need to amend the SeasonNumber definition, as the season change where we span across two distinct Year values now starts in month 9 instead of month 12:

SeasonNumber =
'Calendar'[Season] & " "
    & SWITCH (
        TRUE (),
        'Calendar'[Month Number] >= 9,
        'Calendar'[Year] - 2013,
        'Calendar'[Year] - 2014
    )

This results in the following average spread:

Hopefully you’ve learned something. Let me know in the comments if you found it useful, and feel free to share with your fellow Power Pivot fans.

Many people who do not regard themselves as Excel gurus find themselves in situations where they would like to be able to represent tabular data somehow to allow them the functionality of filtering, cross-tabulation (pivoting) and hierarchical sorting that pivot tables offer.

The Excel wizard for creating pivot tables could certainly have been more instructive, but that doesn’t mean creating a pivot table has to be hard.

An important concept to understand for creating pivot tables is the difference between data in a pivoted layout versus data in an unpivoted layout. Data in a pivoted layout is what you would commonly encounter in tabulated form in books, web sites, scientific journals, or even the calendar displayed on your computer or smartphone. It is optimised for human consumption and understanding, and a typical feature of it is that one or more categories are represented as rows, while another category (very often related to some reference to time) is represented in the column headings, with the numeric or categorical data presented in the body of the table. The following table is an example of fictitious data in a pivoted layout (not necessarily an actual pivot table):

Without knowing about pivoted vs. unpivoted layouts, people tend to create pivot tables using data in a pivoted layout as source for their pivot tables:

The problem with this approach is that it is actually not possible to swap rows with columns, and if you had hundreds of columns in the source data, it would take you nearly forever to include all the data in the pivot table. Not good. By the way, if the default aggregation that the pivot table performs is Count instead of Sum, it is an indication that you have text or error values in your source data somewhere.

There is some good news though. The same data can be presented in unpivoted form by condensing the table to contain as few columns as is absolutely necessary, while not being concerned with the number of rows. Let’s look at the first table. Unpivoting can be achieved by identifying that the columns all represent one category of data, which in this case is time. If we instead wish to represent that category as one column (MthYear), we can represent the steps conceptually as follows:

Step 1: The columns that represent the category to be transformed into one column (unpivoted) are identified in blue:

Step 2: A new column is created that will contain the old column headings as values:

Step 3: Another new column (Qty) is created that contains all the numeric data of the original table:

Step 4: The old, multiple columns disappear and are replaced by the two new columns, while the row category values are repeated for each new row created, to preserve the context of each numeric value:

All the original information is still there; the only difference is that the unpivoted table has three columns, whereas before it had seven. Numbers in this layout are usually harder for our brains to take in than the same data in a pivoted layout, but Excel actually prefers the unpivoted version as the data source for a pivot table. Check out the video below for a demonstration of this:

With this source layout, it is very easy to swap rows with columns, and you only have three fields to include, irrespective of the time horizon your source data considers. Filtering on either rows or columns is also much easier.
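For readers who think in code, the four unpivoting steps described earlier boil down to a simple loop: every time column becomes a value in MthYear, its number goes into Qty, and the row category is repeated. The Python sketch below is only a conceptual illustration (in Excel you would do this with Get & Transform); the product names and figures are invented.

```python
def unpivot(rows, id_column, attribute_name="MthYear", value_name="Qty"):
    """Turn one wide row per category into one long row per category and column."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the category itself is repeated, not unpivoted
            long_rows.append({
                id_column: row[id_column],
                attribute_name: column,  # old column heading becomes a value
                value_name: value,       # old cell content moves to one column
            })
    return long_rows


# Hypothetical pivoted data: one row per product, one column per month.
wide = [
    {"Product": "Apples", "Jan-17": 10, "Feb-17": 12, "Mar-17": 9},
    {"Product": "Pears", "Jan-17": 4, "Feb-17": 7, "Mar-17": 5},
]

for record in unpivot(wide, "Product"):
    print(record)
```

Two rows with three month columns become six rows with three columns, and adding more months only adds rows, never columns.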

There are some great videos online explaining how to unpivot data using Get&Transform, but I’ll add a video here showing how to do it for this particular case. For good measure I’ll also show a few useful data transformations and how to change the pivot table layout itself:

I hope you found this explanation useful. I certainly wish I knew about un-pivoting and how to optimise data sources for pivot tables much earlier than when I did find out, but it’s never too late to learn!

Many companies operate across financial years rather than calendar years, and still others operate across a 4-4-5 calendar or some variation thereof.

This post will talk about reporting numbers across financial years – that is, starting on the 1st of July and finishing on the 30th of June.

You may be familiar with some of DAX’s inbuilt time intelligence formulas, such as TOTALMTD, DATESYTD and so on, but alas, there aren’t similar ones available for FYTD (Financial Year to Date) or 4-4-5 calendars. Perhaps one day they will become available, but until then we have to rely on custom time intelligence formulas, or some manipulation of the inbuilt formulas.

The other significant drawback to inbuilt formulas is that your data has to have daily granularity. This may simply not be available, in which case you have to group data to coincide with the last day of the week or month, or quarter, etc. Or, if you have millions of rows of data and you don’t need to evaluate things at daily granularity, you may just have to suffer the resulting performance hit.

The good thing about using a custom time intelligence formula is that the structure of the formula is quite powerful and adaptable to a variety of different scenarios (whether you want to report on financial years, quarters, financial year quarters, weeks, arbitrarily defined seasons, or a combination of the aforementioned). Your data can also be at any granularity, and you get to develop a much deeper understanding of how the result is obtained, making it possible to tailor formulas to give you exactly what you’re after.

The basic structure of a custom time intelligence formula has been covered extensively by others, and I also like Matt’s explanation here. They can seem a bit daunting to write at first, but once you’ve done it a few times, the penny drops and it becomes quite easy. I strongly recommend you read their posts for a full understanding of how the formula works, but that’s not the aim of this post.

I’ll do a few examples at different levels of granularity to illustrate.

Assuming you have some data at daily granularity, you might have defined a measure aggregating your total sales:

Total Sales =
SUM ( Salestable[SalesQty] )

You would then have to create a Financial Year (text format) column and a Financial Year Index column (numeric format) in your Calendar, assuming it’s actually called “Calendar” (you can follow instructions on how to do that here), and once you have those, you can define the FYTD sales formula as follows:

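Sales FYTD =
CALCULATE (
    [Total Sales],
    FILTER (
        ALL ( 'Calendar' ),
        'Calendar'[FinYearIndex] = MAX ( 'Calendar'[FinYearIndex] )
            && 'Calendar'[Date] <= MAX ( 'Calendar'[Date] )
    )
)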

Looking at the formula, you might realise why it is necessary to have a numeric index column for FinYear. Hint: It’s because of the MAX function, which only accepts numeric data. DAX doesn’t know that the text “2017/18” represents a greater value than “2016/17”, which is why the numeric Index value is needed.
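The logic of the custom FYTD filter is simply "same financial year, and on or before the current date". A minimal Python sketch of that idea follows. It assumes a July-to-June financial year whose index starts at 1 for the year beginning 1 July 2014; that starting point, the function names and the sales figures are all invented for the example.

```python
from datetime import date


def fin_year_index(d: date, first_fin_year_start: int = 2014) -> int:
    """Numeric counter for July-June financial years (1 = year starting 1 July 2014)."""
    start_year = d.year if d.month >= 7 else d.year - 1
    return start_year - first_fin_year_start + 1


def sales_fytd(sales, as_of: date) -> int:
    """Sum sales in the same financial year as as_of, up to and including as_of."""
    current = fin_year_index(as_of)
    return sum(
        qty for day, qty in sales
        if fin_year_index(day) == current and day <= as_of
    )


# Hypothetical daily sales as (date, quantity) pairs.
sales = [
    (date(2017, 6, 30), 5),    # last day of the previous financial year
    (date(2017, 7, 1), 10),
    (date(2017, 12, 15), 20),
    (date(2018, 3, 1), 30),
    (date(2018, 7, 1), 40),    # first day of the next financial year
]

print(sales_fytd(sales, date(2018, 3, 31)))  # 60
print(sales_fytd(sales, date(2018, 7, 31)))  # 40
```

The comparison on the index is numeric, which is why a plain integer counter is more convenient here than a label such as "2017/18".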

But there is also an alternative option you could use if your data is at daily granularity, which doesn’t require the custom time intelligence formula or an index column. In fact, it relies on one of the built-in time intelligence aggregation functions, where you can specify the year end date as a text string:

Sales FYTD =
CALCULATE (
    [Total Sales],
    DATESYTD ( 'Calendar'[Date], "June 30" ),
    ALL ( 'Calendar' )
)

If you left out the optional “June 30” part of the formula, it would revert to the default, which is December 31. Of course you could type any end date here if you wish, and the formula should still work.
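The effect of the year-end date can be sketched as a small date calculation: the year-to-date window starts on the day after the most recent year end and runs up to the current date. The Python below only illustrates that idea and is not how DATESYTD is implemented; the function name and parameters are assumptions.

```python
from datetime import date, timedelta


def year_to_date_start(as_of: date, year_end_month: int = 12, year_end_day: int = 31) -> date:
    """First day of the reporting year that contains as_of.

    The defaults mimic a calendar year ending on 31 December.
    """
    year_end = date(as_of.year, year_end_month, year_end_day)
    if as_of > year_end:
        previous_end = year_end
    else:
        previous_end = date(as_of.year - 1, year_end_month, year_end_day)
    return previous_end + timedelta(days=1)


print(year_to_date_start(date(2018, 3, 15), 6, 30))  # 2017-07-01
print(year_to_date_start(date(2018, 7, 1), 6, 30))   # 2018-07-01
print(year_to_date_start(date(2018, 3, 15)))         # 2018-01-01
```

With a 30 June year end, 15 March 2018 accumulates from 1 July 2017; with the default 31 December year end, it accumulates from 1 January 2018.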

Now let’s consider data at monthly granularity. Say you have sales data aggregated by Yearmonth (format YYYYMM, but make sure it’s formatted as a number). The structure of the formula is exactly the same:

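Sales FYTD =
CALCULATE (
    [Total Sales],
    FILTER (
        ALL ( 'Calendar' ),
        -- assumes the Calendar table has a numeric Yearmonth column (YYYYMM)
        'Calendar'[FinYearIndex] = MAX ( 'Calendar'[FinYearIndex] )
            && 'Calendar'[Yearmonth] <= MAX ( 'Calendar'[Yearmonth] )
    )
)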

If your aggregation time interval is specified as text, you may need to create a numeric index column for it first.

So there you have it, a few formulas for aggregating data across financial years, allowing for different levels of granularity.