Prelude
I believe that learning should be fun and exciting. Studying a topic in college should lead to passion and excitement, especially considering the cost of acquiring such an education. My expectation is that when you are working in the field of data analytics, you should be a happy data guy, like the data dork shown in Figure 1.
Sometimes, however, learning can feel more like drudgery. In the video shown below, I try to explain the thoughts and emotions I have been having over the past few months as I watch my son navigate his college experience. This article represents my opinions, but these opinions have been forming over several years.
Introduction
Imagine this. Imagine that you had to learn things backward. Imagine that you were taught that 10 = 5 + 5, but you didn’t understand what 5 meant. Before you could understand the equation, you had to go back and learn the meaning of 5.
If we had to learn this way, I wonder how frustrating it would be for us. The story I am about to tell feels like that situation to me.
The Agony I’ve Experienced This Semester
For the past three months, I have been spending a lot of time sitting next to my son Colton as he does his homework. During this time, this article has been forming in my mind as I experience “the agony of an education”.
As we work together, I offer Colton advice and pointers when I am able to do so. He is 22 years old and in his final year of undergraduate studies in business analytics. He has two classes that I’ve been participating in with him. The first is a SQL programming class. The second is a business analytics class taught using the language R. Neither of these topics ranks among my top 10 skills.
For over 40 years, I have written literally millions of lines of computer code in as many as 10 different languages. I have done so much coding that my brain just automatically does what is needed to solve a particular problem. However, I have never taken a formal SQL class and I have never taken an R class. For me, both of these are new academic frontiers, although I have used both in my professional career over many years.
The SQL Mystery
Working through SQL homework problems feels mysterious to me. There are a lot of easy problems, but complexity can be introduced quickly. When subqueries and “having” clauses are needed, the problems take a lot longer to solve.
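To make that jump in complexity concrete, here is a minimal sketch of a query that needs both a subquery and a “having” clause. It is not one of the homework problems. The orders table, its columns, and the data are invented for illustration, and the query is run through Python's built-in sqlite3 module only so that the example is self-contained.

```python
import sqlite3

# Hypothetical data: the table, columns, and values are invented for illustration.
ROWS = [
    ("alice", 120.0),
    ("alice", 80.0),
    ("bob", 40.0),
    ("carol", 300.0),
    ("carol", 20.0),
    ("dave", 10.0),
]


def build_db():
    """Create an in-memory database holding the sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    return conn


def big_spenders(conn):
    """Return customers whose total spend exceeds the average order amount."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer                    -- one row per customer
        HAVING SUM(amount) >                 -- filter the groups, not the rows
               (SELECT AVG(amount) FROM orders)  -- subquery: average over all orders
        ORDER BY customer
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(big_spenders(build_db()))
```

The easy version of this problem (total spend per customer) needs only the GROUP BY. The harder version adds a filter on the grouped totals and a second query nested inside the first, which is the kind of step where the guessing tends to start.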
What I have learned is that as a new SQL programmer, it is really easy to burn a lot of time trying different things in SQL. In many ways, Colton and I have been guessing on how to solve the more difficult problems. We can spend between one and two hours on a problem by trying many different combinations of approaches.
It is very frustrating for me to work this way for two reasons. First, I’m not attending the SQL lectures, so I am essentially unequipped for applying more advanced techniques. The truth is, I do not even have all of the SQL fundamentals locked into my working repertoire. Essentially, I am trying to learn SQL without receiving formal instruction, which is not a great recipe for success. Second, I have a couple of secret weapons at my disposal, and these weapons are built to execute instructions the way my brain thinks about them. Working natively in SQL does not work this way for me.
My secret weapons are called Alteryx and Tableau. What Colton and I generally do is work on a SQL problem until we are frustrated and exasperated. We search online, read blogs, and do other forms of research. I have even been repeatedly reading Ken Flerlage’s excellent SQL series for help and guidance on the topic.
When we relent and admit we can’t solve the problem, I write the database tables to separate sheets in an Excel file and mail it to myself. I go to my computer that has Alteryx installed and we typically solve the problem in a few minutes, including taking the time to teach my son what is happening. By having a visual workflow with easy-to-understand tools, Alteryx allows me to teach my son how to do things the easy way, rather than by struggling through a trial-and-error method in SQL.
What this experience has proven to me is what I already have known for years. Alteryx is a magnificent platform for solving a wide variety of problems. Alteryx is so easy to teach and use that it makes working directly in SQL seem like drudgery when you don’t have much direct SQL programming experience. If I had formal training in SQL, I might not feel this way, but for now, I much prefer to do the data manipulation work in Alteryx followed by data visualization in Tableau.
The question I keep asking myself is this: Is it necessary for me to do an in-depth study of SQL so that I can appreciate what it can do? Would SQL training help me become a more capable analyst? Considering what I have been able to accomplish in Alteryx (and Tableau) without in-depth SQL training, my intuition tells me the answer to both questions is likely to be no.
The reason I answer the question in this way has to do with the continuous improvement of software tools that let us be highly productive analysts. Tableau and Alteryx both produce SQL queries for us. The experts at those companies have absorbed the complexities of constructing SQL queries and bottled them up in the form of awesome graphical software that is intuitive and easy to use.
The complex SQL queries are abstracted away from us through these outstanding graphical user interfaces and the intellectual power embedded in the engines of Alteryx and Tableau. I do not see the need to go back to learn SQL fundamentals when I don’t ever have to write custom queries from scratch. I let VizQL and the Alteryx engine do the work for me. This is how I stand on the shoulders of giants who have come before me.
The analytical prowess that I possess was built upon decades of scientific programming rather than working with database programming languages, even though my database experience started way back when dBase III was popular. For this reason, learning to become more proficient in SQL would not help me advance too much considering I use Alteryx and Tableau for many hours every day on my job. I think it is wise for me to let Alteryx and Tableau do the hard work in SQL while I get to explore and develop my problem-solving skills.
To summarize, using Alteryx allows me to solve a very wide variety of problems without having to know how to formulate the SQL commands to solve the problem. This is what I meant at the beginning of this article by learning things backward. Maybe I should have learned SQL first before learning Alteryx. At this point, I don’t think that it really matters.
The R Quandary
On some days we write SQL, while on others we go deep in problem-solving using R. At this point, I’d say that I have grown to appreciate the cleverness of R, especially for mathematical model building and statistical analysis. The main problems with R as I see them are the cumbersome nomenclature of the language combined with the sporadic and inconsistent documentation of some functions.
There is a lot to learn and a lot to remember when you are learning to program in R. This is why there is an abundance of R “cheat sheets”. Because the R community is so large, many people develop R functions. One drawback of this extended development approach is that consistency in naming and grouping functions is not a strong aspect of R.
Compared with the way tools are grouped in Alteryx, determining which functions to use for any particular problem can take too much time when working in R. This is especially true if you are working in a dynamic and rapidly evolving ecosystem of big data. When I am developing a custom data solution in Alteryx, I don’t need to use any cheat sheets because each tool is a self-documenting masterpiece with tool configurations given to us on the GUI canvas.
There are many functions that are available for us to use to do great things in R, but it is up to the programmer to know or be able to find this information. If you are given problem statements without example code, it can be quite a burden to find and learn how to use the available functions, as well as to properly sequence the operations to solve the problem.
Considering the wide variety of problems we have been given to solve this semester, we are getting a great education on how to use R across a spectrum of topics. Without the example problems given to us, however, we would never finish the assignments because so many different functions are needed to solve these problems. We would have to spend dozens of hours each week performing research to learn how to construct the algorithmic approaches that are given to us by the instructors as example solution techniques.
Some of these approaches are cryptic and nearly incomprehensible the first time you see them. An example is shown in Figure 2. The numerics array is built by applying a series of functions in one line. Sometimes these constructions can be so large that it takes a while to deconstruct what they do. Maybe this is great for writing tight code, but it is problematic when you are first learning a language.
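As a rough illustration of why these one-line constructions are hard to read at first, the sketch below builds the same result twice: once as a single nested expression and once as a series of named steps. This is not the code from Figure 2. It is written in Python rather than R, and the function names and sample values are invented for the example.

```python
def is_number(text):
    """Return True if the string can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Hypothetical raw input: a mix of numeric and non-numeric strings.
RAW = ["3", "apple", "1.5", "3", "", "10"]


def numerics_one_liner(values):
    """Tight version: four operations nested into one expression."""
    return sorted(set(map(float, filter(is_number, values))))


def numerics_stepwise(values):
    """Same result, with every intermediate stage given a name."""
    kept = [v for v in values if is_number(v)]   # 1. drop non-numeric strings
    as_floats = [float(v) for v in kept]         # 2. convert text to numbers
    unique = set(as_floats)                      # 3. remove duplicates
    return sorted(unique)                        # 4. order the values


if __name__ == "__main__":
    print(numerics_one_liner(RAW))
    print(numerics_stepwise(RAW))
```

The nested version has to be read from the inside out, while the stepwise version reads top to bottom, which is closer to how a flow of tools on a canvas is followed.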
Compared to learning Alteryx, learning R is much more difficult and time-consuming. Even so, there is no question in my mind that R is a great complement to Alteryx because of the wide range of functionality it offers.
For this reason, I am going to focus on implementing R functions within Alteryx rather than writing stand-alone programs in RStudio. The flow-based programming approach used in Alteryx, combined with the great GUI and self-documenting, repeatable workflows, makes the Alteryx experience much more effective for me compared to writing code directly in RStudio. Being able to so easily track how we are modifying the data stream in Alteryx is a huge advantage compared to working in R. For more thoughts on why this is the case, I encourage you to read this series of articles on improving your data comprehension.
To be succinct, I believe that the very nature of a continuous data stream in Alteryx that is modified by a series of in-line tools is more effective and intuitive compared to the method of stacking together a series of complex, black-box functions in R. In R, we write code snippets to accomplish a particular task. This approach never really allows me to achieve a flow or continuity in developing the workflows needed to solve the entire problem. What this means is that solving individual problems in R takes me much longer than in Alteryx, which is similar to what I described for SQL.
I would say that the discrete tool operations in Alteryx are easier to understand and teach than some of the advanced, black-box functions that pack a variety of operations into one routine. Many times in R we are asked to use very powerful functions that do a lot of things to our data without really understanding what is happening. For this reason, R is a bit mysterious to me compared to working in Alteryx. I think this is why I feel much more comfortable in Alteryx than I do in R. In Alteryx, I understand every operation being performed on the data. There is little to no guesswork in Alteryx for me.
On the flip side of this argument, it could be said that these powerful, custom black-box functions in R would take a lot longer to construct if you had to develop them natively in Alteryx. Well, the good news is that Alteryx can use these R functions directly without having to write them natively in Alteryx. This is the approach that I plan to use as I move forward in my career. For this reason, I have put together an Alteryx and R reference section on my blog to help me learn new techniques and perform future work.
Final Thoughts
Over the past decade, I have been a witness to and participant in a software revolution that is giving us amazing capabilities when working with data. The tools that are now available for working with data are immensely more powerful than when I began my career 30 years ago. Specifically, tools for working with data for quantitative and visual analysis have become so much fun to use because they offer us the ability to gain quick insights without the frustration of doing the dirty work. The fun-factor of using Alteryx and Tableau is one of the reasons why the annual conferences of each product continue to expand.
Tools like Alteryx and Tableau are a lot more fun to use than SQL and R, and they are state-of-the-art (Figure 3). Maybe my opinion is not shared by everyone, but there is no doubt in my mind that my rapid problem-solving abilities are fueled by these two great software packages. I encourage young data workers to explore these platforms for themselves and to compare them to the techniques learned in school using SQL and R. You will quickly find that using Alteryx and Tableau will make you a more productive data worker in a very short amount of time. Increased productivity is one key measure that will help propel your career forward!