Introduction
One of the most popular topics I have written about in this blog is a method I call “Tableau buckets”. I have been very surprised by the number of people that have found this concept to be useful. For this reason, I am going to extend this “bucket” concept to Alteryx workflows.
In this article, I demonstrate an example of how you can create these useful “bucket” fields within your Alteryx data-processing workflows. By doing this, you can use these buckets when you visualize and analyze your data in Tableau. This approach is especially useful if you are processing big data sets in your Alteryx workflow, as I show with an example.
If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!
What is a Bucket?
A bucket can represent a period of time, a grouping of items, or just a temporary placeholder to store some things for an intermediate calculation. A bucket allows you go group things in logical fashion so that you can compare results between the buckets.
To learn more about how to create and use these in Tableau, you can click here to read my original article. To see another specific example of buckets for comparing weekday and weekend data, click here. Buckets can help you to be very creative when displaying and analyzing your data in Tableau. In Tableau, buckets are created using calculated fields. In Alteryx, buckets are created using the formula tool.
Installing Buckets in Alteryx
In this example, I create a series of buckets based on time. The motivation for these buckets has to do with the longevity of the data I used, as you will soon understand.
The data set I am using contains potentially a few billion records. This data represents climate readings from all across the world, from the mid 1700’s to the present day. There are approximately 100,000 monitoring stations from which the data has been recorded. There are monitoring stations that have over 175 years worth of data and others that have only relatively recent data. To learn more about this data and the program that creates and maintains it, click here.
For me to better understand how the climactic data is changing over time, I have decided to develop buckets based on time. Since the longest time aggregation currently available in Tableau is 1 year, I will be developing buckets to contain data from 10 years, or a decade. In this application I also develop buckets for centuries (100 years) but in the example I demonstrate in this article they are not as useful as the decade buckets.
The Decade Bucket
Click here to download my Excel worksheet for the code to create a decade bucket. Figure 1 shows what this code looks like when installed in an Alteryx workflow via the formula tool. Essentially, the decades get set using a formula tool that has two features. First, the decade gets set based on a DateTimeParse operation of a string-based date (like “1/1/1780”). These string-based dates are strategically picked to create your bucket. Secondly, an if-then-else logic block is used to identify the correct decade by doing a comparison between the measurement date and the string-based “bucket” date. A new field called “Decade” gets created by Alteryx for every climate measurement at every monitoring point. This formulation is not necessarily optimized and could likely be created another way, but for my purposes, it works just fine!
An Update to the Decade Bucket After Original Publication
If you look down in the comments section, you will see that my buddy Chris Love made a suggestion for a simpler decade bucket formulation than what was shown in Figure 1. After thinking about his suggestion, I simplified it further to a one-line formula that is fast and elegant as shown in Figure 1b. Thanks to Chris for his suggestion!
Using the Decade Bucket
Figure 2 shows the location of a randomly-selected climate monitoring station (USW00093822) that is operated at the airport in the northwestern quadrant of Springfield, Illinois. This station is titled the Springfield Capital AP.
This monitoring station has climate data from the early 1900’s to the present (12 decades of data). One of the questions I was interested in answering is related to precipitation. I wanted to visualize how precipitation might be changing over time at this monitoring station. To do so, I used Tableau to create time series precipitation charts to answer my question. The time series charts were created after the precipitation data was processed in Alteryx.
Since I had to establish a significantly-complex Alteryx workflow (to be shown in another blog post) to process the climate data, I began my analysis by extracting data from a completely different data source than the one I was using. I wanted to perform some quality assurance checking of the data that I had processed before drawing any conclusions.
As shown in Figure 3, the precipitation data for Springfield, Illinois can be easily obtained by clicking here and visiting US Climate Data website.
The graph in Figure 3 shows how precipitation in Springfield varies from month to month, across an average year. I wanted to determine two things: (1) how the total precipitation (average of 37.44 inches) might be changing over time and (2) if this monthly precipitation distribution is changing over time.
To answer question #1, I used the data available in the world-wide climate database that is available by clicking here. Once I had this data, I processed the information for the monitoring station shown in Figure 2 using my Alteryx workflow. In this workflow, I installed the decade bucket and generated Tableau *.tde files for rapid analysis of the precipitation data.
Figures 4 through 6 show the precipitation data on a daily, monthly and annual basis. These time series charts to NOT allow me to understand how precipitation might be varying over time in Springfield. There are no discernible patterns that I can detect with such an approach, and a 1-year time aggregation is too short for me to be able to draw any conclusions. The only conclusion that I can make is that the long-term average shown in Figure 6 of 35.4 inches of precipitation is less than the 37.4 inches shown in Figure 3 from the other data source.
By using my decade buckets as shown in Figure 7, I could visualize how precipitation is changing at the monitoring station over time. This simple insight of using a 10-year time aggregation in Tableau allows me to answer one of the the questions I original pondered.
From the 1950’s to the 2010’s, two things have happened to precipitation at this monitoring station. First, the total amount of annual precipitation has been rising. Secondly, the annual average precipitation has been steadily rising. These two trends were not clearly visible without the use of the decade buckets.
To answer question #2 of how the monthly distribution of precipitation has been changing, the decade buckets were again used to visualize over 120 years of data. Figure 8 shows the average monthly rainfall totals within each decade. Each month represents the 10-year average precipitation for that month within that decade. The most obvious insight from this Figure is that peak monthly rainfall is increasing in the spring months from the 1990’s through the 2010’s. There is an increasing maximum monthly precipitation being experienced in Springfield and an increase in the annual total precipitation.
Final Thoughts
All software programs have limits. Tableau has limits in time aggregations. To extend these limits, you can design and build your own aggregation levels using buckets, like I have shown here with the decade buckets. Alteryx and Tableau both allow you to install buckets very easily.
There are so many potential applications of bucket concepts that I could write a book on how to use them. You can create buckets for winter, spring, summer and fall time periods, for example. You can have time-varying buckets. The potential uses of buckets are endless, much like Alteryx and Tableau. The only thing that might be limiting your ability to solve problems with these tools is your own ingenuity and creativity, so “think outside of the box” to unlock insights hidden in data by using concepts like “buckets”.
Hey Ken, great post – try this as a quick improvement to your formula:
tostring(round(tonumber(datetimeformat([Order Date],’%Y’)),10))+”‘s”
Hi Chris,
In my article, I wrote that the decade bucket wasn’t optimized. You gave me another formula to try, I looked at it and realized there was a problem with the rounding function.
Once I did that, I thought of the most simple solution possible using the basic construct that you gave me. Here it is:
left(datetimeformat([Date],’%Y’),3)+”0’s”
All you do is take the first three digits of the year and add a “0’s” to it to form the decade. The power of collaboration at work!
Thanks,
Ken
Pingback: Using #Alteryx To Process Non-Ideal Flat-Files | 3danim8's Blog
Pingback: How To Build An #Alteryx Workflow to Visualize Data in #Tableau, Part 4 | 3danim8's Blog