Many people who do not regard themselves as Excel gurus find themselves in situations where they would like to be able to represent tabular data somehow to allow them the functionality of filtering, cross-tabulation (pivoting) and hierarchical sorting that pivot tables offer.
The Excel wizard for creating pivot tables could certainly have been more instructive, but that doesn’t mean creating a pivot table has to be hard.
An important concept to understand for creating pivot tables is the difference between data in a pivoted layout versus data in an unpivoted layout. Data in a pivoted layout is what you would commonly encounter in tabulated form in books, web sites, scientific journals, or even the calendar displayed on your computer or smartphone. It is optimised for human consumption and understanding, and a typical feature of it is that one or more categories are represented as rows, while another category (very often related to some reference to time) is represented in the column headings, with the numeric or categorical data presented in the body of the table. The following table is an example of fictitious data in a pivoted layout (not necessarily an actual pivot table):
Without knowing about pivoted vs. unpivoted layouts, people tend to create pivot tables using data in a pivoted layout as source for their pivot tables:
The problem with this approach is that it is actually not possible to swap rows with columns, and if you had hundreds of columns in the source data, it would take you nearly forever to include all the data in the pivot table. Not good. By the way, if the default aggregation that the pivot table performs is Count instead of Sum, it is an indication that you have text or error values in your source data somewhere.
There is some good news though. The same data can be presented in unpivoted form by condensing the table to contain as few columns as is absolutely necessary, while not being concerned with the number of rows. Let’s look at the first table. Unpivoting can be achieved by identifying that the columns all represent one category of data, which in this case is time. If we instead wish to represent that category as one column (MthYear), we can represent the steps conceptually as follows:
Step 1: The columns that represent the category to be transformed into one column (unpivoted) are identified in blue:
Step 2: A new column is created that will contain the old column headings as values:
Step 3: Another new column (Qty) is created that contains all the numeric data of the original table:
Step 4: The old, multiple columns disappear and are replaced by the two new columns, while the row category values are repeated for each new row created, to preserve the context of each numeric value:
All the original information is still there, the difference is just that in the unpivoted state, the table only has three columns, whereas before it had seven. It is usually harder for our brains to conceptualise the numbers than the same data in a pivoted state, but the reason for wanting to do this is because Excel actually prefers the unpivoted version as data source to the pivoted layout. Check out the video below for a demonstration of this:
With this source layout, it is very easy to swap rows with columns, and you only have three fields to include, irrespective of the time horizon your source data considers. Filtering on either rows or columns is also much easier.
There are some great videos online explaining how to unpivot data using Get&Transform, but I’ll add a video here showing how to do it for this particular case. For good measure I’ll also show a few useful data transformations and how to change the pivot table layout itself:
I hope you found this explanation useful. I certainly wish I knew about un-pivoting and how to optimise data sources for pivot tables much earlier than when I did find out, but it’s never too late to learn!