Introduction
The title of this post is non-specific and conceptually broad because I use Alteryx in so many ways. It could represent hundreds of approaches that I routinely use. Less than two years ago, I wouldn’t have been able to write an article like this because I did not know the great things that Alteryx can do with data.
In this article, I cover a couple of topics. First, I show how I create synthetic data for blog post publications using real-world information. I am always careful to disguise data so that it is not possible to identify the source of the information or to draw conclusions from the data shown. However, to write blog posts that demonstrate techniques like I do here, I need data. I use Alteryx to create synthetic data that inherits data structures but does not contain real-world, identifiable data.
Secondly, I show how I use Alteryx to handle one type of data complexity, rather than having Tableau do it for me. I really enjoy how easily Alteryx handles data complexity, which makes creating Tableau workbooks fast and efficient.
For this article, I chose a common Tableau topic of interest from my recent work: the “Top N and Others” approach for shrinking a long list of information down to a more manageable size. This very common Tableau technique is useful in many different ways.
Part 1: Creating Synthetic Data
The idea I show here is how I create a synthetic data set from a real-world data set without publishing the original information. Many times I need the complexity of a real-world data set so that I can demonstrate the techniques I want to show. The best way I have found to capture this complexity is to take an original data source and modify it so that the data itself is no longer recognizable or traceable to the original data. Figure 1 shows an example of a simple workflow that I use to create synthetic data.
The two keys to this workflow are:
- Using the random sampling tool for capturing a portion of the original data set (Figure 2);
- Using a formula for randomly changing the data you will be using in your analysis (Figure 3).
Figure 2 shows the settings I use to generate a random sample of my data. This tool gives each record in my data set a random 1 in 10 chance of being selected. This is a great setting to have at your fingertips.
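For readers who want to see the sampling idea outside of Alteryx, here is a minimal Python sketch of a 1 in 10 random sample. It is only an analogy to the tool setting described above, not how Alteryx implements it. The function name sample_one_in_n, the seed parameter, and the example records are all hypothetical.

```python
import random


def sample_one_in_n(records, n=10, seed=None):
    """Keep each record independently with probability 1/n.

    Hypothetical helper: a plain-Python analogy to a 1 in N random
    sample, not the Alteryx tool itself.
    """
    rng = random.Random(seed)  # a seed makes the sample repeatable
    return [rec for rec in records if rng.random() < 1.0 / n]


# Made-up records standing in for a real-world data set.
records = [{"id": i, "sales": 100.0 + i} for i in range(1000)]

sample = sample_one_in_n(records, n=10, seed=42)
print(len(sample), "of", len(records), "records kept")
```

Because each record is tested independently, the sample size will be close to one tenth of the input but will not be exactly one tenth every time.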
Figure 3 shows how I use the rand() function to change the value of the numbers I’m going to disguise. The rand() function creates a number between 0 and 1, such as 0.342. By applying this random number as a multiplier, I change the original data so that it is no longer recognizable.
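The same multiplier idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the formula from Figure 3: the function name disguise_values and the sample numbers are hypothetical, and I assume each record gets its own random multiplier.

```python
import random


def disguise_values(values, seed=None):
    """Multiply each value by its own random number in [0, 1).

    Hypothetical helper that mimics applying a rand() multiplier
    to every record; the originals cannot be read from the output.
    """
    rng = random.Random(seed)
    return [value * rng.random() for value in values]


# Made-up numbers standing in for the real measures to be disguised.
original = [1200.0, 560.0, 89.5, 43000.0]
disguised = disguise_values(original, seed=7)

for before, after in zip(original, disguised):
    print(f"{before:>10.2f} -> {after:>10.2f}")
```

Since every multiplier is less than 1, each disguised value is smaller than its original, and because the multipliers differ from record to record, the relative sizes of the records change as well.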
I also discuss these topics in the video shown at the end of this article.
Part 2: Using Alteryx to Replace The Tableau “Top N and Others” Approach
There are several approaches that can be used in Tableau to collapse long lists of information down into shorter lists. This is what I call the general category of “Top N and Others”. Many articles have been written about ways of doing this in Tableau. If you are interested in seeing what these techniques include, search for “Top N and Others” to find the articles, or click on the link I gave you in the first sentence of this paragraph.
In the video shown below, I show how I move the complexity that is inherent in many of the “Top N and Others” approaches and place it into Alteryx. My objective in doing this is to speed up my production work in Tableau so that I can rapidly create graphics like the one shown in Figure 4.
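To make the general “Top N and Others” idea concrete, here is a minimal Python sketch of doing the collapse before the data reaches the visualization tool. It is a generic illustration, not the Alteryx workflow from the video: the function name top_n_and_others, its parameters, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def top_n_and_others(rows, key, value, n=5, others_label="Others"):
    """Total `value` per `key`, keep the n largest, lump the rest together.

    Hypothetical helper illustrating the general technique only.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]

    # Rank categories from largest total to smallest.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    result = ranked[:n]
    rest = ranked[n:]
    if rest:  # only add an "Others" row when something was collapsed
        result.append((others_label, sum(total for _, total in rest)))
    return result


# Made-up rows standing in for a long list of categories.
rows = [
    {"product": "A", "sales": 10.0},
    {"product": "B", "sales": 5.0},
    {"product": "C", "sales": 3.0},
    {"product": "D", "sales": 2.0},
    {"product": "A", "sales": 1.0},
]

summary = top_n_and_others(rows, key="product", value="sales", n=2)
for name, total in summary:
    print(name, total)
```

The output is already in the short form needed for a chart, so the charting tool only has to plot the rows it is given.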
The Technique Video
The following video demonstrates how I create the synthetic data using Alteryx. Once I have that data to work with, I show how I use Alteryx to replace the Tableau “Top N and Others” approach.
By shifting the computations from Tableau to Alteryx, I am able to complete the work with less set-up time. This makes me more efficient in my work and it makes my Tableau workbooks easier to maintain and use. This type of approach is especially good if you have several of these types of analyses to do in a big data set. Even if the data set changes, each time I run the Alteryx workflow, the approach produces what I need.
Final Thoughts
After nearly 30 years of writing custom computer codes to do data manipulation, I now find that Alteryx has become the most essential tool I use for getting work done. Two years ago, I couldn’t have predicted that this would be the case.
Alteryx is simply an amazing platform built from a massive collection of highly optimized computer routines. The programmers at Alteryx are world-class and have a vision that is going to stretch far into the future. The Alteryx development team members are relentless in their pursuit of excellence.
The most amazing thing to me: Alteryx has NEVER crashed on me. Thousands of uses, NO crashes. Perfect execution. How many complex computer codes have you used that have had a perfect record of execution? This is one reason I have learned to depend upon Alteryx to solve the most vexing business and science problems I encounter.