Introduction
For four months (October 2013 to January 2014), I worked on a tough job. It was tedious and time-consuming, and it caused Tableau to buckle under the pressure of multiple data sources, multiple data joins, and inconsistent data formats.
That four-month grind taught me a lot about working with big data files that originate from multiple database tables. The client held this information in multiple data warehouses covering more than 40 years of activity. Because these warehouses were separate entities, there was no way for the client to blend the data into one table for me. I had to do the dirty work myself.
Luckily, I had enough experience to complete the operations using regular expressions and other manually intensive data manipulations. I distinctly remember how glad I was when I finished grinding through the files. Jobs like that test your patience and drain your energy.
The Problems
This job featured numerous data problems. Dates were not stored as formatted dates. Fields had leading zeros, including the key identifying field. Each data file came without headers; the headers were provided in a separate file, and they did not align with the data (there were extra fields!). The client would periodically change the key identifying field for certain individuals without telling us! Some files had hundreds of thousands of records and nearly 100 columns of information, and there were about a dozen such files, each containing key strategic information for the project.
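To make these problems concrete, here is a minimal Python sketch of the kind of cleanup they call for. It is not the procedure used on this project (that was done with regular expressions and manual work, and later in Alteryx). It assumes, hypothetically, that dates arrive as YYYYMMDD text, that the header file lists one field name per line with any extra names at the end, and that key changes can be supplied as an old-to-new mapping. All function, field, and file names are illustrative.

```python
import csv
import io
from datetime import date, datetime


def load_headers(header_text: str, width: int) -> list[str]:
    """Read one field name per line from the separate header file.

    Assumption (hypothetical): any extra names in the header file come after
    the real columns, so only the first `width` names are kept.
    """
    names = [line.strip() for line in header_text.splitlines() if line.strip()]
    if len(names) < width:
        raise ValueError("header file lists fewer names than the data has columns")
    return names[:width]


def normalize_key(raw: str) -> str:
    """Strip leading zeros so '000123' and '123' compare equal; all zeros -> '0'."""
    return raw.strip().lstrip("0") or "0"


def parse_yyyymmdd(raw: str) -> date:
    """Assumption: dates are stored as plain YYYYMMDD text, e.g. '20131015'."""
    return datetime.strptime(raw.strip(), "%Y%m%d").date()


def clean_rows(data_text, header_text, key_field, date_field, key_remap):
    """Attach headers, normalize the key, apply key changes, and parse dates."""
    rows = list(csv.reader(io.StringIO(data_text)))
    headers = load_headers(header_text, len(rows[0]) if rows else 0)
    cleaned = []
    for row in rows:
        record = dict(zip(headers, row))
        key = normalize_key(record[key_field])
        # key_remap (hypothetical) maps a normalized old key to its replacement.
        record[key_field] = key_remap.get(key, key)
        record[date_field] = parse_yyyymmdd(record[date_field])
        cleaned.append(record)
    return cleaned


# Illustrative inputs: three data columns, four header names (one extra).
header_file = "member_id\nsignup_date\nregion\nunused_extra\n"
data_file = "000123,20131015,West\n045,20140102,East\n"
rows = clean_rows(data_file, header_file, "member_id", "signup_date", {"45": "46"})
# rows[0] == {"member_id": "123", "signup_date": date(2013, 10, 15), "region": "West"}
# rows[1]["member_id"] == "46" ("045" -> "45" -> remapped to "46")
```

Normalizing the key before applying the remap means a single mapping entry covers every zero-padded variant of the same identifier.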
After working on the data for what seemed like weeks, I was finally able to begin processing in Tableau. That is when all hell broke loose. We ran left, right, and inner joins to gain the insights we needed, and a lot of time went into joining across all of those files. We attempted multiple levels of joins. Data densification and domain completion issues erupted. Computers bogged down and locked up, and I finally relented and called Joe Mako.
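To see why the joins got heavy, here is a small pure-Python sketch of inner, left, and right join semantics on a shared key. It illustrates general join behavior only, not how Tableau executes joins, and the table and field names are hypothetical. Note the fan-out: a key that appears m times on one side and n times on the other produces m * n joined rows, which is one way files with hundreds of thousands of records can balloon.

```python
from collections import defaultdict


def index_by(rows, key):
    """Group rows by key value so each lookup returns every matching row."""
    idx = defaultdict(list)
    for row in rows:
        idx[row[key]].append(row)
    return idx


def inner_join(left, right, key):
    """Keep only left rows that have at least one match; one output row per pair."""
    idx = index_by(right, key)
    return [{**lrow, **rrow} for lrow in left for rrow in idx.get(lrow[key], [])]


def left_join(left, right, key, right_fields):
    """Keep every left row; unmatched rows get None for the right-hand fields."""
    idx = index_by(right, key)
    out = []
    for lrow in left:
        matches = idx.get(lrow[key], [])
        if matches:
            out.extend({**lrow, **rrow} for rrow in matches)
        else:
            out.append({**lrow, **{field: None for field in right_fields}})
    return out


def right_join(left, right, key, left_fields):
    """A right join is a left join with the two sides swapped."""
    return left_join(right, left, key, left_fields)


# Hypothetical tables: member 1 has two orders, member 3 has none.
members = [
    {"id": "1", "region": "West"},
    {"id": "2", "region": "East"},
    {"id": "3", "region": "East"},
]
orders = [
    {"id": "1", "amount": 10},
    {"id": "1", "amount": 5},
    {"id": "2", "amount": 7},
]

inner = inner_join(members, orders, "id")              # 3 rows: member 3 drops out
left = left_join(members, orders, "id", ["amount"])    # 4 rows: member 3 kept, amount None
right = right_join(members, orders, "id", ["region"])  # 3 rows: one per order
```

Indexing one side by key keeps each join roughly proportional to the input sizes plus the output size, so the cost that remains is the fan-out itself.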
Joe is a cleaner, the guy you call to get a dirty job done. Joe is like Harvey Keitel’s character in the 1993 movie “Point of No Return”. The difference is that whereas Harvey cleaned the blood and bodies from the crime scene, Joe cleans up the Tableau settings to get the job done. As usual, Joe helped me understand part of the problem and showed me how to resolve it. The dirty job got cleaned, thanks to Joe.
The Tableau Solution
After implementing a few changes, things went OK. We got the job done, but I dreaded the day when we would have to revisit the project. I knew that day would eventually arrive because we had conducted a multivariable test for this client. Now a year has passed, and we need to determine how successful the test was. In other words, we need to process more of this data!
The good news is that about four months after finishing the first phase of that project, we bought an Alteryx license. I’ve been learning to use it for the past eight months or so. During that time, I had not thought about the project I just described, until today.
Alteryx to the Rescue
In two hours today, the difficulties I described earlier were quietly and seamlessly gobbled up and digested by Alteryx. I wrote a sweet little workflow to solve a small portion of this job (Figure 1, intentionally undocumented). It got us to an answer that previously would have taken days, replacing several days of teeth gnashing, potential cussing, and hoping that we hadn’t forgotten a step along the way. Forty-four seconds was all it took to crunch the data, followed by about a minute to create the time-series chart in Tableau.
Weeks of work were replaced by two hours of writing and testing the workflow. Getting to the answer took less than two minutes, and processing the next data update will take just two minutes as well.
Final Thoughts
This real-world project was initially very hard for me to complete because of the multitude of files received and the inconsistent data formats. The job required too much “data formatting” and “data joining” to process the information efficiently in Tableau. This example has made it perfectly clear why I need Alteryx in my data toolbox.
Alteryx makes hard jobs seem easy, especially when you had to do them the hard way the first time. With Alteryx, the job doesn’t just seem easy; it is actually fun!
Comments
My only critique of this blog is that the Harvey Keitel reference is from “Point of No Return” instead of “Pulp Fiction”, which came out a year later but is much more recognizable lol. And it’s just a better movie. Now I have to go watch Pulp Fiction brb!
P.S. Joe Mako’s nickname should be “The Wolf” if it isn’t already.
Mark,
You are correct. Harvey played a cleaner in both movies. I always enjoy his movies, but I chose to reference “Point of No Return” because his character stunned me the first time I saw it! I laughed so hard at myself for writing that reference that tears came to my eyes. I know that Joe will get a kick out of it too because he is so easy-going.
Thanks!
Ken
From Wikipedia:
Harvey Keitel played a cleaner in the film Point of No Return (1993), an American remake of the film Nikita, and in Quentin Tarantino’s 1994 film Pulp Fiction.