Introduction
Recently I was consulted on an issue that featured a complicated data structure. After patiently listening to the complexities inherent in the data, I calmly said something to the effect of: “Don’t worry, I’ve got the tool for the job”. That tool, of course, is Alteryx.
This article is going to be different from most of my other articles because I explain an Alteryx workflow in a new way. I’ve created a series of seven videos that explain the specific Alteryx techniques I used to solve the problems inherent in this data situation, along with the tool settings needed to accomplish each task. Several of the techniques were developed specifically for this problem, and these are explained in the videos. As usual with Alteryx, there are other tools and techniques that could be used to handle some of the situations I discuss in the videos.
I have previously discussed some of the methods used in this solution. If you want to learn more about these approaches, here are three references:
- //datablends.us/2014/12/12/using-alteryx-to-process-non-ideal-flat-files/
- //datablends.us/2014/10/03/a-truly-unbelievable-alteryx-flat-file-reading-result/
- //datablends.us/2015/02/02/how-to-build-an-alteryx-workflow-to-visualize-data-in-tableau-part-4/
It still amazes me every time I use Alteryx that it enables such rapid prototyping of data parsing, creation, and conversion. The workflow described in this article took me about a day to complete.
The Alteryx Techniques
Here are the Alteryx techniques discussed in the seven videos in this article:
- Splitting a flat-file that contains variable record structures
- Setting up a flat-file configuration file
- Using the RecordID tool to help quickly interrogate data
- Using the auto field tool to help keep the data compact
- Operating on unsigned numeric fields that have implicit decimal digits
- Creating dates from a flat-file when the dates are incomplete
- Converting time-variable currencies from many countries into US dollars
The Workflow
Figures 1 and 2 show the workflow from beginning to end.
The Seven Videos That Describe The Workflow Components
Step 1: Splitting a flat-file that contains variable record structures
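The video shows how this is done with Alteryx tools, but the underlying idea is simple enough to sketch outside of Alteryx. The Python snippet below routes each line of a flat file into a bucket based on a record-type indicator; the assumption that the indicator is the first character of the line, and the file name in the usage comment, are illustrative only and not details of this data set.

```python
# Rough sketch of splitting a flat file whose lines follow different record
# layouts. Assumes (for illustration) that the first character of each line
# identifies the record type ("1", "2", or something else).

def split_by_record_type(path):
    buckets = {"1": [], "2": [], "other": []}
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            rec_type = line[0]                      # assumed indicator position
            buckets.get(rec_type, buckets["other"]).append(line)
    return buckets

# Hypothetical usage:
# records = split_by_record_type("source_extract.txt")
# print(len(records["1"]), "Type 1 records,", len(records["2"]), "Type 2 records")
```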
Step 2: Setting up a flat-file configuration file
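If you are curious what a flat-file configuration amounts to conceptually, here is a minimal Python stand-in: a table of field names, start positions, and lengths that drives the parsing of each fixed-width line. The field names, positions, and lengths shown are placeholders, not the actual layout used in this workflow.

```python
# Illustrative fixed-width layout, analogous in spirit to the flat-file
# definition the Alteryx Input tool consumes. All names and positions are made up.

LAYOUT = [
    # (field name, start position, length) -- 1-based starts, as in most specs
    ("record_type",  1,  1),
    ("account_id",   2, 10),
    ("country_code", 12, 3),
    ("amount",       15, 9),
]

def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of raw string fields."""
    return {name: line[start - 1:start - 1 + length]
            for name, start, length in layout}

# parse_fixed_width("1ACCT000123USA000012345")
# -> {'record_type': '1', 'account_id': 'ACCT000123',
#     'country_code': 'USA', 'amount': '000012345'}
```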
Step 3: Using the RecordID tool to help quickly interrogate data
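The RecordID tool simply tags every record with a sequential integer so that problem rows can be traced back to their source lines quickly. For readers working outside Alteryx, the pandas equivalent is a one-liner; the sample data below is made up.

```python
# Adding a sequential record ID, similar in spirit to the Alteryx RecordID tool.
import pandas as pd

df = pd.DataFrame({"raw_line": ["1ACCT000123USA000012345",
                                "2SUMMARY-RECORD",
                                "1ACCT000456CAN000009876"]})
df.insert(0, "RecordID", range(1, len(df) + 1))   # 1-based, like the tool's default

# A suspicious record can now be located by its ID:
# df[df["RecordID"] == 2]
```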
Step 4: Using the auto field tool to help keep the data compact
In this workflow, the use of the auto field tool isn’t technically necessary because I define the length of each field in the flat-file definition. However, I wanted to use this tool in this example because it is very useful and I haven’t previously written about it.
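As a rough point of comparison for non-Alteryx readers, the auto field tool’s job is to assign each column the smallest data type that still holds its values. The pandas sketch below approximates that idea by downcasting numeric columns; it illustrates the concept rather than replicating the tool’s logic.

```python
# Approximate pandas analogue of "give each field the smallest type that fits
# its data" (illustrative only).
import pandas as pd

def shrink_columns(df):
    out = df.copy()
    for col in out.columns:
        numeric = pd.to_numeric(out[col], errors="coerce")
        if numeric.notna().all():                        # column is fully numeric
            if (numeric % 1 == 0).all():                 # whole numbers -> small int
                out[col] = pd.to_numeric(numeric.astype("int64"), downcast="integer")
            else:                                        # decimals -> small float
                out[col] = pd.to_numeric(numeric, downcast="float")
        # non-numeric columns are left as strings
    return out

# Comparing df.memory_usage(deep=True) before and after shows the savings.
```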
Step 5: Operating on unsigned numeric fields that have implicit decimal digits
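Implied-decimal fields store, for example, the digits 000012345 to mean 123.45, so the raw digits have to be divided by 10 raised to the implied decimal count. Here is that conversion in Python; the two-decimal assumption is just for the example.

```python
# Converting an unsigned digit string with implied decimal places.
from decimal import Decimal

def implied_decimal(raw, decimals=2):
    """'000012345' with 2 implied decimals -> Decimal('123.45')."""
    return Decimal(raw) / (Decimal(10) ** decimals)

# implied_decimal("000012345")     -> Decimal('123.45')
# implied_decimal("000012345", 3)  -> Decimal('12.345')
```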
Step 6: Creating dates from a flat-file when the dates are incomplete
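When a source file supplies only part of a date, the missing piece has to be filled in with a rule before the field can be treated as a real date. The sketch below uses one common rule, defaulting a missing or zero day to the first of the month; that default is an assumption for illustration, and the video covers the rule actually used in this workflow.

```python
# Building a usable date from incomplete date parts (illustrative rule:
# a missing or "00" day defaults to the first of the month).
from datetime import date

def build_date(year_str, month_str, day_str="00"):
    year, month = int(year_str), int(month_str)
    day = int(day_str) if day_str.strip("0") else 1   # "00" or "" -> day 1
    return date(year, month, day)

# build_date("2014", "07")        -> datetime.date(2014, 7, 1)
# build_date("2014", "07", "15")  -> datetime.date(2014, 7, 15)
```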
Step 7: Converting time-variable currencies from many countries into US dollars
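Conceptually, this step is a join of each transaction against a date-specific exchange-rate table, followed by a multiplication. The pandas sketch below shows that shape; the currencies, dates, and rates are made-up examples, and a real workflow would bring in a proper daily-rate reference file.

```python
# Joining transactions to time-varying exchange rates and converting to USD.
import pandas as pd

transactions = pd.DataFrame({
    "txn_date": pd.to_datetime(["2015-01-05", "2015-01-05"]),
    "currency": ["EUR", "JPY"],
    "amount":   [100.0, 10000.0],
})

rates_to_usd = pd.DataFrame({
    "rate_date":    pd.to_datetime(["2015-01-05", "2015-01-05"]),
    "currency":     ["EUR", "JPY"],
    "usd_per_unit": [1.19, 0.0084],          # illustrative rates only
})

merged = transactions.merge(
    rates_to_usd,
    left_on=["txn_date", "currency"],
    right_on=["rate_date", "currency"],
    how="left",
)
merged["amount_usd"] = merged["amount"] * merged["usd_per_unit"]
```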
Final Thoughts
This workflow processes 5.5 million Type 1 records in five minutes, or over 1.1 million records per minute. With 64 data fields per record and about 10 new fields added in the workflow, that translates into a processing rate of over 80 million data fields per minute or 1.35 million data points per second. That is the performance for the complete workflow without any optimization. This workflow was built in the minimal amount of time to accomplish a series of tasks. Its performance can be improved with additional work.
In contrast to this performance, let me offer the following comparison. One 500,000-record flat file was processed in Excel with some custom VBA code. There are 24 of these flat files that hold the 12.6 million records of data. The VBA parsing operation did not do all that was accomplished in this Alteryx workflow. Nevertheless, the Excel parsing operation took about 50 minutes for 500K records, or about 10K records per minute, which is about 11,000 data points per second. To process the entire data set of Type 1 and Type 2 records would have taken about 20 hours, compared to less than 10 minutes in Alteryx.
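For anyone who wants to check the math behind those two rates, the arithmetic is straightforward (figures taken from the text above):

```python
# Throughput arithmetic from the comparison above.
alteryx_records_per_min = 5_500_000 / 5                  # 1,100,000 records/min
fields_per_record = 64 + 10                              # 64 source + ~10 derived
alteryx_fields_per_sec = alteryx_records_per_min * fields_per_record / 60   # ~1.36M

vba_records_per_min = 500_000 / 50                       # ~10,000 records/min
vba_fields_per_sec = vba_records_per_min * 64 / 60       # ~10,700 data points/sec
vba_total_hours = 12_600_000 / vba_records_per_min / 60  # ~21 hours
```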
After processing for 20 hours, there would still be many data operations left to do. Which procedure makes more sense to you? That is why Alteryx is the best tool available for these types of jobs. Of course, all of this data complexity is easily handled in Alteryx and the processed and transformed data flows seamlessly into Tableau for visual analysis. This is a great example of using the right two tools for the job.