Introduction
Last year, I innocently turned on the Alteryx logging capability. That might have been a mistake. After the brain meltdown I suffered last night due to excessive use of regular expressions, I’m not so sure it was a good idea.
After persevering and overcoming multiple hidden challenges, I decided to see what these logs could tell me about how much I love to use Alteryx. If you decide to follow my lead and do this for yourself, don’t blame me. Consider this your fair warning.
Turning on Logging
You can turn on logging by editing the field called Logging Directory in the User Settings/Default tab (Figure 1). Once you enter a directory, Alteryx saves your workflow usage logs there.
A typical log file looks like the one shown in Figure 2. In this case, I have word wrap turned on to be able to read the content. Although the content is good, the format of the log file is a PITA (Pain-in-the-A$$) to process because of variabilities that inevitably occur in input file names, among other things.
For instance, if a file name contains spaces, you have to handle that explicitly, because this unstructured log file is read in a space-delimited fashion. Issues like this make extracting data from these files challenging.
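To see why spaces in file names hurt, here is a minimal Python sketch. The log line in it is made up for illustration and is NOT the real Alteryx log layout; the field names and the file path are assumptions as well. A naive split on spaces shreds the path, while a regular expression anchored on the fields around the path keeps it whole.

```python
import re

# Hypothetical line for illustration only, NOT the actual Alteryx log layout.
line = r"2018-06-01 09:15:02 Input C:\My Data\sales 2018.csv 1200 records"

# Naive space-delimited split: the path is broken into three pieces.
naive_fields = line.split(" ")

# Anchor on the fields before and after the path so the path may contain spaces.
pattern = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<tool>\S+) "
    r"(?P<path>.+) (?P<count>\d+) records$"
)
match = pattern.match(line)
parsed = match.groupdict() if match else {}

print(len(naive_fields))   # more fields than the line really has
print(parsed.get("path"))  # the full path, spaces included
```

The idea carries over to any layout: find the tokens that are reliably fixed, and let the variable part (here the path) absorb whatever sits between them.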
After completing this exercise, I now offer a challenge to the Alteryx developers. Please, please, please write log files that are easier to use! There is gold in these logs which can be used to produce the types of insights I show in this article. However, finding the gold requires the ingenuity of a geologist (which I happen to be!).
I also mention this because I know that there must be other data dorks out there just like me that like to quantify how important Alteryx is to them. I previously did this type of exercise with Tableau, more than 5 years ago. This was early in my blogging career, so please don’t go back and read that dork meister article. If you do, I beg you to have mercy on me.
Processing the Log Files
As I said previously, processing these log files is a PITA. In fact, I encourage others to try to do it as I have done (NOT!). I spent much of last night in regex hell, writing operations so cryptic that my brain began oozing from my ears. I was about to beg for mercy by calling the masters Dunkerley, Mako, and/or Harding, but I persevered until the end. I’m sure I suffered permanent brain scarring, however.
If you don’t believe me, try it yourself (NOT!). You might need to do things like this: REGEX_Replace([Field_1], "([^>]+)", "()"). Yep, now you know what I mean, because that was the lazy man’s way of solving the problem; it was a quick fix. Figure 3 shows the notoriously tricky workflow I wrote for this.
If the authors of the weekly challenge were so inclined, they could offer this little doozy up for the masses (NOT!). I wouldn’t recommend it, however, because there are several really sneaky things that can bite you. In fact, the structure of the log file is so variable that I had to string together a multi-step regex-based QA program just to get consistent results to send over to Tableau.
Performing QA
If you are like me, you perform QA on your work even though it doesn’t matter to anyone but you. In fact, this work matters to nobody but me. Can you see how ridiculous I can be?
During the QA process, I noticed something when I was looking at workflows that ran for more than 1 hour (Figure 4): the 7-hour run time of one entry perfectly matched the total duration of the next three workflows.
I immediately knew that the first entry represented the list runner macro, while the next three were the individual workflows that were executed by the list runner macro. This meant that I had to remove the 30+ workflow logs that contained the list runner macros. That little insight caused me to develop round 2 of the workflow. Luckily it was fairly easy after last night’s adventure.
If I didn’t do that, I would be double counting the total workflow run times when list runner was used. If you want to understand why the list runner macro is so awesome, read this little beauty about Alteryx and Big Data techniques.
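A minimal Python sketch of the double-counting check follows. The records are hypothetical (only the 7-hour parent followed by three children mirrors Figure 4), and the matching rule, which flags a run whose duration equals the total of at least two runs immediately after it, is an assumption for illustration, not documented Alteryx behavior and not the workflow I actually built.

```python
def drop_parent_runs(runs, tolerance=1.0):
    """Remove runs whose duration matches the total of the runs right after them.

    runs: list of (name, minutes) tuples in log order.
    tolerance: allowed difference in minutes when comparing totals.
    """
    parents = set()
    for i, (_, minutes) in enumerate(runs):
        total = 0.0
        for j in range(i + 1, len(runs)):
            total += runs[j][1]
            # Require at least two following runs before calling it a parent.
            if j - i >= 2 and abs(total - minutes) <= tolerance:
                parents.add(i)
                break
            # Stop early once the running total overshoots the candidate.
            if total > minutes + tolerance:
                break
    return [run for i, run in enumerate(runs) if i not in parents]


# Hypothetical durations in minutes; the names are made up.
runs = [
    ("list_runner", 420.0),
    ("step_1", 200.0),
    ("step_2", 150.0),
    ("step_3", 70.0),
    ("standalone", 95.0),
]

kept = drop_parent_runs(runs)
print(sum(m for _, m in runs))  # total with the parent counted on top of its children
print(sum(m for _, m in kept))  # total after removing the parent
```

A duration match like this can produce false positives by coincidence, so filtering on the workflow name is the safer route whenever the list runner entries can be identified directly.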
What Have I Learned About My Alteryx Love Affair?
Here are my findings (Figure 5), which represent my Alteryx activity from mid-October 2017 through June of 2018 (8.5 months). I am almost certain that these results are not complete because I remember blasting a bunch of log files to save space. That was a mistake I wish I had not made.
- I used Alteryx over 5200 times. This just means I told Alteryx to launch 5200 workflows.
- Alteryx computed for over 612 hours. If you know Alteryx, you know that is a lot of work. Sometimes I computed multiple workflows that spanned more than a day in duration.
- I used Alteryx on weekends, as well as weekdays. That’s the sign of a nice love affair.
- I did a lot of number crunching on Sundays. Please don’t tell my employer because I never charged them for this work! We will keep that as our little secret as I pushed billions of records through Alteryx to help a great company keep moving forward.
- I did a lot of development work on Mondays, followed by more number crunching Tuesday through Friday. Launch counts were high on Mondays but compute times were low (i.e., workflow development using the techniques discussed here).
- There is evidence of some whopper workflow run times when I put the hurt on Alteryx. I even once asked Mr Ned if it was OK to do that! On some days, I ran multiple workflows simultaneously, which can push the total run time past 24 hours in a single day.
Of course, this analysis would not be complete until I examined my daily usage traits. The slideshow shown below contains the number of times I launched Alteryx workflows. It is easy to see which days were development days when I built or modified workflows.
On those development days, I could run Alteryx over 150 times, or roughly 20 times per hour, or every three minutes. This gives me some idea of how much time I spend between developing concepts before running Alteryx to test my work.
As far as the daily compute times go, the next slideshow holds those results. During the work week, I ask Alteryx to compute about 4 hours a day, with Monday only being 2 hours because of the development work. On Sundays, I push it to about 7 hours a day. Clearly, Alteryx is my main computational workhorse.
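For anyone who wants to reproduce this kind of weekday summary outside of Tableau, here is a minimal Python sketch. The run records are hypothetical and the record shape (a start timestamp plus a run time in hours) is an assumption; the sketch simply counts launches and totals compute hours per weekday.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (start timestamp, run time in hours) records.
runs = [
    ("2018-06-03 08:00", 3.5),   # a Sunday
    ("2018-06-03 13:00", 3.0),
    ("2018-06-04 09:00", 0.25),  # a Monday
    ("2018-06-04 09:03", 0.25),
    ("2018-06-04 09:06", 0.5),
    ("2018-06-05 10:00", 4.0),   # a Tuesday
]

launches = defaultdict(int)   # number of workflow launches per weekday
hours = defaultdict(float)    # total compute hours per weekday

for started, run_hours in runs:
    # weekday() returns 0 for Monday through 6 for Sunday.
    day = WEEKDAYS[datetime.strptime(started, "%Y-%m-%d %H:%M").weekday()]
    launches[day] += 1
    hours[day] += run_hours

for day in WEEKDAYS:
    if launches[day]:
        print(day, launches[day], hours[day])
```

Many short runs on one day point to development work, while few long runs point to number crunching, which is the same pattern the slideshows show.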
Final Thoughts
Even after 5 years of serious usage, I feel that I am an Alteryx novice. When I hear the Alteryx developers discuss their work, I know that I am a novice.
To understand what I mean by that, listen to this excellent Alter-Everything podcast. That discussion will help you understand the brilliance of Alteryx and how it can take years to explore the endless possibilities present in the software.