Introduction
My career is flying by like a ride on a high-speed Japanese train. For this reason, I often stop and ponder the significance of what I am accomplishing on my job. At each passenger loading/departure station, it feels like a job has ended and a new one is about to begin. In between stations, I go from zero to 200+ miles per hour in no time at all as I tackle one advanced analytics job after another. In all aspects of my life, time is passing very fast.
When you have worked as long as I have, you understand the value of time and how wasting it hurts productivity. You can sense that time is accelerating and that there isn't an unlimited amount of it to experiment and learn how to do things in different ways.
As you age, you begin to see time as a precious resource, and you learn to use it wisely every day. I don't have time to play around with a variety of software tools for the fun of it, and I don't have time for inefficiency on my projects. I strive to be maximally efficient in everything I do.
Alteryx and Tableau allow me to use my time wisely, and in this article I am going to show how with a project example. For the past five years, I have been documenting my data analytics journey in this blog. As I look back on what I have written, I can see that I have uncovered a method for working very fast and very accurately, with capabilities that lead to incredible insights. This method uses Alteryx to create custom data sources and Tableau to visualize the results.
I think of this as RAI -> ROI: Rapid Analytical Insights (RAI) lead to Return On Investment (ROI). Collecting data costs money, and you get the highest return on that investment if you rapidly uncover important and valuable insights by analyzing it.
Background
Data is being collected everywhere. Companies want to use it to their advantage, to work more efficiently and to make more money. Just because data is being collected, it doesn’t mean that it will be easy to use or will lead to rapid strategic insights. Being able to quickly uncover the stories hidden in diverse data sets requires skill, vision and the right software.
I am just going to state what I have learned as simply as I can. Alteryx is absolutely the best platform I have ever used for rapidly preparing data for quantitative and visual analysis. Tableau is the best platform for fast visual interpretation of that data. Used together, these two packages let me explore massive quantities of data extraordinarily fast. That is the truth as I have lived it.
The Challenge I Face
One of my primary challenges in writing this blog and explaining my viewpoints has always been that I cannot show my best work. My best work occurs during my daytime hours, which is when all the work I do is proprietary. Even within the companies I have worked for, many of my colleagues have not been able to see much of my work because of project confidentiality.
This means that I have to ask readers of these articles to take my word for what I write. I write honestly, and I hope my enthusiasm for Alteryx and Tableau conveys the truth of how I work and what these software packages allow us to accomplish. I can assure any skeptics out there that I am not trying to promote myself or gain some other form of reward. I write these articles simply to share knowledge, in hopes of making other business analysts better at their jobs.
I try as best I can to provide techniques that readers can use for themselves, but one truth about this mission has become apparent to me over time: people find my work when the time is right for them, typically through an internet search on a particular topic.
Many Alteryx and/or Tableau users will need more time to advance their skills before they need some of the techniques I have written about, simply because I have more career experience doing this work than most of the users who are searching for information. For this reason, a lot of what I have written is considered a bit complicated and possibly ahead of its time. That is what I have been told, at least.
From my perspective, having information like this available is important if we want to advance what we can do with data. There are always people working at the cutting edge of what is possible. Couple this with continuing software advancements, and it becomes very important to learn how to work with the best platforms available so you can be as productive as possible when working with data.
Why I Believe Alteryx Is The Best Platform For The Type of Work I Do
To explain how Alteryx and Tableau allow me to rapidly derive analytical insights, I’m going to share the results of an experiment that I have been conducting for over one and a half years. In some ways, what I will be showing can be considered an Alteryx benchmark study conducted over time.
An Example Workflow
About 2.5 years ago, I was asked to develop a system that could track all expenses across an organization. Employee travel expenses were one category of particular interest, although there were dozens of other types of expenses to be tracked, too.
The problem with this task was that all the data related to expenses was stored in different systems, both inside and outside the company, that were never designed to work together. This became the kind of job a detective might choose to solve: a trial-and-error process in the beginning, followed by incremental advancements over time.
Eventually, I found ways to relate the expenses to one another. I also found ways to define and apply business rules by working with the experts in the organization. This initial workflow development took several weeks to complete and verify (100-140 hours of work). For about a year, we used this system to report expenses across the organization through interactive Tableau dashboards.
About 1.5 years ago, a new financial system was put in place, which meant that this workflow had to be rebuilt and verified. That was accomplished in about 80 hours. Since that time, this workflow has been used to produce monthly reports and interactive dashboards for this organization.
I will not specify all the details of what the Alteryx workflow accomplishes. Here are a few features of it:
- Eight different types of data files are used in the workflow
- One of the primary data files contains expense report information, which is accumulating at about 200K records per month
- Five of the data files have data that changes each month, while three data files are essentially static and represent business rules and categorizations
- The workflow examines many credit card transactions and it has been used to identify potentially fraudulent charges
- The workflow gives visibility into spending throughout the entire organization, and the design allows spending to be aggregated at every level of management
- The workflow delivers many kinds of business value, and the results it produces are sent to Tableau Server to power multiple Tableau dashboards consumed by people throughout the organization (a simplified sketch of this blend-and-publish pattern appears after this list)
- About 170 Alteryx tools are used in the workflow
- Finally, I won't even bother stating the savings achieved in the first year of using this system. Even I couldn't believe it when I heard it. I'll just say there were 8 digits in the number, and that was for North America activities alone. There is now a desire for worldwide coverage. That is RAI -> ROI.
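Since the workflow itself is proprietary, here is a minimal, purely illustrative sketch in Python/pandas of the general pattern the list above describes: blend several input files, apply business rules, flag suspicious charges, aggregate spending by management level, and publish the result for Tableau. Every file name, column, and threshold below is a made-up assumption for illustration only; the real workflow is built from Alteryx tools, not Python, and is far more involved.

```python
# Hypothetical sketch of the blend -> rules -> aggregate -> publish pattern.
# All file names, columns, and thresholds are illustrative assumptions only.
import pandas as pd

# Monthly, changing inputs (five in the real workflow; two shown here)
expenses = pd.read_csv("expense_reports.csv")        # ~200K new records/month
cards = pd.read_csv("card_transactions.csv")

# Static inputs that encode business rules and categorizations
categories = pd.read_csv("expense_categories.csv")   # maps expense codes to categories
org = pd.read_csv("org_hierarchy.csv")               # employee -> manager chain

# Blend the sources on shared keys (the detective work in practice)
blended = (
    expenses
    .merge(categories, on="expense_code", how="left")
    .merge(org, on="employee_id", how="left")
)

# Apply a simple, rule-based screen for potentially fraudulent charges
blended["possible_fraud"] = (
    (blended["amount"] > 5000) & (blended["category"] == "Travel")
)

# Aggregate spending at every level of management
by_manager = (
    blended
    .groupby(["management_level", "manager_id", "category"], as_index=False)["amount"]
    .sum()
)

# Publish for Tableau (the real workflow writes extracts directly to Tableau Server)
by_manager.to_csv("spend_by_manager.csv", index=False)
```

In the real workflow, each of these steps corresponds to a handful of the roughly 170 Alteryx tools mentioned above, and the final output goes directly to Tableau Server rather than to a local file.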
The Workflow Performance
During the past 1.5 years, this workflow has been used to update the monthly dashboards. I will not be able to show the dashboards. However, I have recorded the time required for the workflow to run and the number of output records processed.
Figure 1 contains the results of this work (click on the Figure 1 link to download the 4K graphic). The number of output expense records is shown as the reddish line rising from left to right and is now over 18M records per month (see the column Full LOD – level of detail). The workflow output performance in records per minute (rpm) is shown along the top of the graphic. The orange bars (about 1,000,000 rpm) occur when the Tableau Server writes are disabled, and the blue bars (about 500,000 rpm) occur when the two Tableau Server (*.tde) files are written. The size of the primary *.tde file is shown in the column called Avg Monthly Record and is now over 11M records.
What intrigued me about the results of this study can be summarized simply. Although the incoming expense data grows at about 200K records per month across many tens of thousands of employees (and the four other changing input files grow proportionately), the computational performance of the workflow has remained steady over time: the processing rate is constant, so total run time grows only linearly with data size. I didn't know whether this would be the case, given the complexity of the operations and the variability of the incoming data. Intuitively, I expected to see some performance degradation because more data is being handled, the data is becoming more variable, and the managerial structure of the company keeps changing.
To summarize, if I write the results to Tableau Server, the workflow outputs data at about 500,000 records per minute. If I write the results locally, it outputs data at about 1,000,000 records per minute. Each record contains 77 fields, so that is a fairly staggering 38.5M to 77M data fields per minute. In other words, the workflow's throughput in records per minute is independent of the size of the incoming data. That is a great result in itself, in addition to the blazing speed of operations.
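The fields-per-minute figure is just the record rate multiplied by the 77 fields per record; here is a quick back-of-the-envelope check using the values quoted above:

```python
# Back-of-the-envelope throughput check using the rates quoted in the text.
fields_per_record = 77

tableau_server_rpm = 500_000    # records/minute when writing to Tableau Server
local_rpm = 1_000_000           # records/minute when writing locally

print(f"{tableau_server_rpm * fields_per_record:,} fields/minute")  # 38,500,000
print(f"{local_rpm * fields_per_record:,} fields/minute")           # 77,000,000
```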
What These Results Mean To Me
What these results mean to me is that Alteryx does a great job with memory management and that workflow performance is not being degraded by rapidly increasing data size. Each new month of data adds about 2 minutes to the workflow run time because roughly 200K new expense records enter the workflow.
There has been no discernible decrease in performance (records processed per minute) as the amount of information being processed increases. This workflow has executed the work flawlessly, it has been flexible enough to let me make some recent changes, and it has been easy to use because of the direct writing to Tableau Server.
The recent changes I have made to the workflow have changed this application from a Type 1 to a Type 2 reporting system. The workflow performance jumped to nearly 4 million records per minute for local file writing (another article will be needed on that topic!), and nearly 1 million records per minute for Tableau Server writes. I was able to make this change with a few hours of work and testing. Once again, the versatility of Alteryx continues to impress me.
Typically I spend about 2 hours per month preparing the data, running the workflow, and checking the results on Tableau Server. That is a very efficient use of my time to create a very valuable data product that didn’t exist before Alteryx and Tableau were used to create it. That is how I achieve rapid analytical insights using Alteryx and Tableau.
Filling a Niche In a Large Enterprise Setting
In a recent podcast, I discussed an aspect of Alteryx that I believe is under-appreciated. I see very clearly that Alteryx is a great tool for rapidly building prototypes and performing proof of concept work in a large enterprise setting.
For many of these cases, such as this proof of concept project, Alteryx can fill a void in a large enterprise setting by allowing teams to rapidly create workflows that can do a significant amount of very valuable work. The development time needed for these applications is typically much smaller than the time required by traditional development teams. To demonstrate what I mean, I offer the following details.
The second version of this workflow initially took 80 to 100 hours for me to build and verify. On two different occasions, we tried to “productionalize” this approach by using enterprise tools and traditional development teams. The first attempt used Oracle SQL and the development team worked for about 6 months on the project. There were numerous problems encountered along the way. Although we eventually developed a product, it never reached a completed, productionalized version.
The second time we revisited this workflow many months later, a larger team was assembled to produce time estimates for creating a “productionalized” application. The total time needed for completion was estimated between 1200 and 1500 hours. Due to staff constraints and other issues, this project attempt never began and the Alteryx workflow continues to be used.
I suspect that Alteryx could be very heavily used in large enterprise settings for applications like this one. The way I see it, rapid prototyping in Alteryx can be used to build valuable applications that are then deployed on Alteryx servers to serve the needs of many projects.
Thanks for reading.
Thanks for the insights, and you’ve given me some thoughts on how to justify Alteryx for our group. I’m using a trial version of Alteryx now to blend 7 files from 3 different data sources for rapid prototyping. This project has the potential to save millions of dollars.
Gary,
Good luck and let me know how I can help.
Ken
Great post Ken! You are spot on with your comments about IT attempting to 'productionalize' an Alteryx workflow. It's amazing how long it takes them with enterprise tools. It really does make me appreciate Alteryx and helps build an ROI case.
~ Nathan
Nathan,
It really shouldn't be a surprise to anyone who has written computer software that it takes 10 to 20X longer to productionalize a workflow using a language like SQL. The big advantage we have is that Ned Harding, along with all the other developers at Alteryx, has spent the past two decades writing reusable, highly optimized and efficient "tools" for us to use. These tools represent thousands of lines of code each (millions in total), and they are designed not only to work together, but to make our data processing jobs easier. Alteryx takes care of all the dirty work, like allocating and deallocating memory, error handling, object definitions and instantiations, and all the other time-sucking things we have to do as programmers. I hope the day never comes when I have to go back to writing code like I have done for the past 40 years! Long live Alteryx!
Thanks and keep up your tremendous work!
Ken