How To Achieve Better Data Comprehension, Part 2

Introduction

This article explains how I have learned to improve my data comprehension in modern-day analytics projects. Strap in, because I’m not going to mince words or pad what I say.

My thoughts and beliefs are definitive and backed by years of working and paying attention to the changes going on around us. The advice I give in this article is not academic. It was forged in the real world, where productivity, achieving results, and meeting deadlines were paramount to staying employed.

If you want to understand the motivation for this article, please read Part 1 by clicking here.

Background

I started my career writing computer programs and performing predictive analytics. That phase lasted 20 years, involved at least 10 programming languages, and I burned a lot of electrical energy doing the number crunching. It ended when I said these words to myself:

I am as good as anyone I’ve ever met at doing this type of work. I could do it until I die and still be really good at it. However, I will not have progressed much in the second half of my life and I cannot let that happen. I need to find a new challenge. I need to get better and develop new skills. I’ve got to use this math knowledge and experience to do good things in the world.

During the first phase of my career, I also spent a huge amount of time comparing one piece of software to another. I did this for compilers, numerical models, and software packages for things like computer-aided-design (CAD), graphical engines, and geographical information systems (GIS).

I also wrote, documented, and tested numerical algorithms for solving systems of equations. I wrote parallel processing routines, tested and documented them, and determined which methods worked best. I spent countless hours embedded in the theoretical basis of numerical modeling, always knowing that what I was doing was never completely correct. For a person like me, that was hard to accept.

I was maniacal in my work and I was never satisfied with my accomplishments. Due to this perseverance, and because I surrounded myself with unbelievable talent, my friend Dudley and I created software that was at least a decade ahead of its time. Some of the software we developed would still be considered state-of-the-art today.

OK, so what. Who cares?

Well, we did. Dudley and I cared a lot. We were searching, developing, testing, and pushing technologies as hard as we could. We wanted to work better, more efficiently, and we wanted to create solutions that could solve problems that could not easily be solved before we wrote the codes.

Sometimes it worked and those times were great. Sometimes our work didn’t lead to a viable solution, but we learned a lot in the process of trying to do those things. We were software innovators that were writing tools to do science in better ways.

Once again, I know what you are thinking. Who cares what we did 20 years ago?

Well, you should care because I’m about to drop some knowledge on you that will change the course of your career if you are willing to read, think critically, and comprehend what I am trying to teach you.

What Did I Learn From All That Critical Thinking, Testing and Software Development?

What I learned is lesson #1 of this article. When you find software that works for you and you understand it, stick with it.  Go deep, learn it, love it, and use it all the time. Never stop learning.

If you spend the first 20 years of your career chasing whatever best technology happens to come along, you will never become a master of your domain. You will become a jack-of-all-trades, you will not reach your potential, and you will not develop the deep insights needed to consistently achieve great data comprehension. The key, however, is to make sure that the software tools you choose are solid and dependable.

We are pawns in the great game of life. We are surrounded by companies vying to take us to the promised land. “Use our product – we are the fastest in the world!” We are pursued, coerced, and cajoled into using the “Next Big Thing.” Some of these companies might survive, while most of them will not.

The reasons for this are numerous. Some companies survive because they write good software and are able to achieve critical mass by demonstrating continuous growth in their client base. Achieving the status of a surviving company is a tough thing to do in a rapidly evolving marketplace like advanced analytics, where competitors are found in every direction you look.

Sometimes companies develop the absolute best technology but they fail to survive because they did not market their product properly, or they couldn’t get enough paying clients to understand the brilliance of their product. Sometimes companies simply promote old technologies by repackaging them into a new form, with some fancy buzzwords designed to baffle us with BS.

A lot of companies die a slow death simply due to single points of failure, a loss of institutional knowledge over time due to staff attrition, or an incorrect choice of the base platform on which to build their product. If you invest a decade of your life in learning the software from one of these companies, you have made the wrong choice. Sorry, Charlie.

If you want to know more about my beliefs on this topic, you can read this article or even this article.

How Do You Know Which Software To Use?

The answer to that question depends a lot on what you want to do. You have to understand what job you will want to do over the next decade or two.

Are you going to work in the 80%, the 20%,  or the 100% (20+80) part of the spectrum of data analytics? If you don’t understand what I mean by that, go back to Part 1 and re-read the article.

Based on what I can tell so far, there are a lot of people interested in the 20% and not as many in the 80%. I predict that will change in the near future as people discover the excitement of performing the 80% job.

The choice of software tools you can use to do these jobs is quite large. You can use free tools like Python, R, D3, and other open source software tools to do everything you need to do. For many people, this is how they work and the capabilities they possess are very impressive. You can also choose many different types of commercial software products that can do the 80%, the 20% or the whole job.

If you are trained as a computer programmer, then using a series of open source tools like these and others may be perfect for you. If that is the case, get ready to study because the rate of change in these tools will force you to continuously learn new techniques. You will be in chase mode, trying to keep up with the new developments.

You can spend your life continuously learning new packages, new syntax, and new features as they are developed. You can spend your days writing code, learning new GUIs, new operating systems, and practicing how to write solutions to particular problems. In other words, you will work like I did for the first 20 years of my career. The good part about working like this is that you will automatically be undergoing self-improvement.

I loved working like that. I thought I was solving important problems. Sometimes I did achieve that goal, but oftentimes I wrote programs that were used only once. Some of these codes took me a long time to write for a particular application, and it was only later that I realized the application I had built was very narrowly focused and not that useful beyond its original purpose.

As I advanced through my career, probably in the second decade, we decided to start writing general purpose codes. The general purpose codes were harder to write, required more skill and vision, but they were worth it in the long run. The time it took to develop these (years) was worth it because we used the codes on numerous sites over many years. The vision needed to develop these tools occurred in part because of the number of applications we had completed.

Now I bet you are wondering what this has to do with the current topic of data comprehension.  I am about to tell you.

Lower-Level Programming

When you are working as a programmer writing code line by line, stacking functions on top of one another like you can do in R, you are immersed in an ocean of technical details. You have to worry about everything, including a function’s three parts: the body(), the formals(), and the environment().

These three features represent the code inside the function, the list of arguments that controls how you can call the function, and the “map” of where the function’s variables are located. In other words, you have to do a lot of bookkeeping to get things done, and you have to practice a lot to be good at this.
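
To make that concrete, here is a minimal base R sketch. The add_tax function is made up purely for illustration; body(), formals(), and environment() are the standard accessors described above.

```r
# A small, made-up example function
add_tax <- function(price, rate = 0.07) {
  price * (1 + rate)
}

body(add_tax)        # the code inside the function
formals(add_tax)     # the argument list: price, rate = 0.07
environment(add_tax) # the "map" of where the function's variables live
```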

Malcolm Gladwell suggests that it takes about 10,000 hours to achieve mastery of an endeavor like programming. There are so many things to learn that I believe the 10K estimate is a good one. The great thing about programming is that the more you learn, the more versatile you become, and you will quickly become capable of working proficiently across multiple platforms and computing languages.

This means that you should plan on taking 5 to 10 years to build these skills such that you can work efficiently and effectively as a programmer in this environment. When I say efficiently and effectively, I mean that you can get jobs done quickly, accurately, and with great data comprehension.

One of the reasons it takes so long is that you have to map your neural network (i.e., your brain) in such a way that it will understand all the necessary details of doing this work correctly. You have to learn to think in low-level terms, like a computer, and you have to learn all the fundamentals of a language like R, Python, or C++. You also have to become automatic with the language, with the syntax and the order of operations needed to write and execute programs.

If you choose R, you have to know how R is designed and the way its functions work with each other. You will have to study Hadley Wickham intensively and learn to think the way he does. In effect, you are learning all aspects of a computer programming language, and you need to think and act like a programmer when you are using these tools. One drawback of this approach is that the effort needed to learn these things takes time away from working on your problem-solving skills with data.

That approach is one way to work and it is very popular. It is an awesome experience, and if you are on this path, enjoy every minute of it. I know this because I did it before, from the 1980s through the early 2000s. I still work like this at times, but those times are now fleeting because of what I’m going to tell you very soon.

Instead of learning from Hadley, I learned from another very brilliant guy named Dudley Benton. In fact, I was very lucky because we had adjoining offices and we collaborated for years. If I were to tell you the names of the five most innovative and brilliant modern programmers I have ever known about and directly benefited from, the list would include Donald Knuth, Dudley Benton, Ned Harding, Hadley Wickham, and Linus Torvalds. That is pretty rare air, and I’ve been very lucky to have met two of these people in person.

If you can’t see yourself working like this, I have some really great news for you. There is an alternative working methodology and it is simply stunning, incredibly powerful, unbelievably fun, intoxicating and so much easier to learn.

Instead of writing code, debugging, testing, pulling your hair out, and screaming when you can’t figure out the problem, you will be able to find a tranquility in your work. You will enter a “data zone” and you will achieve a type of flow with your data like you have never achieved before.

This achievement of “flowing data” will occur in a fraction of the time (1 to 3 years) and you will reach a higher level of data comprehension than you have ever achieved. You will be able to solve problems more easily, sooner, and with a software product that will allow you to explain your work to the non-technical people you interface with.

Does this sound too good to be true? Well, it is true and I am going to teach you how to obtain these skills. I have found this alternate method to be more powerful, valuable, easier to learn, and a million times more fun. Wow. Did I really mean that? Yes, I did.  Let me explain.

Higher-Level Workflow Development

There are a multitude of object-oriented software programs available for us to use to do this type of work, which I loosely call a “flow-based programming” methodology. My choice of this term isn’t necessarily the best one, because there is a formal programming approach officially called classical flow-based programming (FBP). Those methods were developed in the 1970s by J. Paul Morrison, and they represented a paradigm shift in computing capabilities.

However, I like using this term because it describes the basic approach I use to solve business problems. The analytics programs that are available are not necessarily classical flow-based programming examples, but the concept of moving a quantum of data from one point to the next, with changes being made to it along the way, is consistent with flow-based programming methods. In my mind, when I work this way, data flows along through a sequence of operations designed to accomplish various goals, like little data processing units would do in an FBP application. Even though the techniques I use are not actual FBP implementations, I enjoy thinking of my work this way because it makes sense to me.
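
To make the mental model concrete, here is a minimal sketch in R of that idea: small processing steps, each doing one job, with the data handed from one to the next. The step names and the use of the built-in mtcars data set are illustrative only; this sketches the concept, not any particular product.

```r
# Each function is a small "processing unit"; data flows from one to the next.
filter_rows   <- function(df) subset(df, cyl > 4)                          # keep only the rows we need
add_variable  <- function(df) transform(df, kpl = mpg * 0.425)             # derive a new variable (km per litre)
summarize_kpl <- function(df) aggregate(kpl ~ cyl, data = df, FUN = mean)  # roll up by group

# The "flow": the built-in mtcars data moves through the sequence of operations.
result <- summarize_kpl(add_variable(filter_rows(mtcars)))
print(result)
```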

The lucky thing for me was that I happened to find the best program for working like this the very first time I went looking for such a tool. At this time, I am not going to discuss any particular software products (I’ll get to that soon enough). If you haven’t noticed, I have not used the name of any commercially available software package in either of these articles.

To work at a higher level of programming compared to what I previously described, you will have the benefit of others helping you perform your work. The people I’m referring to are the computer programmers that wrote the software package you are going to use to do the 80% of the work. These people will have removed all the tedium from your life so that you can be an innovator with your data.

By working at a higher level of workflow development, you are offered a series of pre-packaged tools that do a lot of work for you automatically. These tools can easily be configured to accomplish the types of common operations that you want to perform on the data.

A single tool can replace dozens to hundreds of lines of code that you would write in a classic computer program. In essence, you will not be programming with a line-by-line approach. You will be stringing together subroutines, functions, or modules that each accomplish a quantum of work for you without you having to write or initialize them. These items are called “tools.”

The difference between using these and writing code in R, for example, is that you do not have to write the function or even know anything about the parameters required for that section of code to work. You can drag a picture of the tool onto a canvas and set a few configuration options. Sometimes you don’t even have to do that because some tools can configure themselves.

By placing these tools in the right order to perform the operations your brain is telling you to do, you will rapidly be able to architect a solution to your problem. In effect, you have to tell R how to do what you want to do, whereas you only have to tell flow-based programs what you want to do. These programs already know how to do the job because the programmers have taken care of that part of the work. This difference is huge when it comes to achieving high productivity and keen data comprehension.
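
To see the difference, here is a rough sketch of the “how” you spell out yourself in code for a job that a single drag-and-drop tool handles with a few configuration settings. The file name and column names below are made up purely for illustration.

```r
# The "how": in code, you spell out every step yourself.
# (hypothetical file and column names, shown only to illustrate the idea)
sales <- read.csv("sales_2016.csv", stringsAsFactors = FALSE)           # read and type the data
sales <- sales[!is.na(sales$amount), ]                                   # drop incomplete records
sales$month <- format(as.Date(sales$order_date), "%Y-%m")                # derive a reporting period
monthly <- aggregate(amount ~ month + region, data = sales, FUN = sum)   # summarize by month and region
write.csv(monthly, "monthly_sales.csv", row.names = FALSE)               # hand off the result

# In a flow-based tool, the same work is roughly an input step, a filter,
# a formula, a summarize, and an output step dropped onto a canvas and configured.
```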

When these tools are placed in a logical sequence to transform data from Point A to Point Z, you will be working your way toward achieving data comprehension. Not only do you get to visualize your data as it moves through the workflow you have created, but the visual sequence of tools also reinforces what you are doing. In other words, your brain is conditioning itself to complete the task of achieving data comprehension.

Once again, for these reasons, I call this working methodology “flow-based programming”. Other people call it self-service analytics, and I’m ok with that name, too.

The reason it is called self-service analytics is that the techniques allow you to do 100% of the job, from gathering the data to making the transformations, all the way through visualization. There is little to no dependence on others to deliver data to you, since you can do everything yourself. If this sounds like a good idea to you, then this method of working might just be for you, although some other people might not be too happy when you develop your new skills.

Solving Business Problems With A Flow-Based Programming Methodology

As you attack a new business problem, the basic methodology you will use to formulate and solve the problem will involve a number of topics and logical steps.

The whole process begins with a definition and understanding of the business problem to be solved. Once that is known, you start by getting the data and beginning to understand what it means. You then prepare the data for descriptive or predictive use, if you need to run some type of model. Once you have calibrated and/or verified your model, you might use it to make future predictions, and then you visualize the data and the model results.
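
As a toy illustration of that sequence in code form, here is a minimal R sketch. The built-in mtcars data set stands in for real business data, and the model is deliberately simple; this shows the shape of the steps, not a real project.

```r
# 1. "Get" the data: mtcars stands in for the business data.
cars <- mtcars

# 2. Prepare it: keep only the fields this toy model needs.
cars <- cars[, c("mpg", "wt", "hp")]

# 3. Fit and check a simple predictive model.
fit <- lm(mpg ~ wt + hp, data = cars)
summary(fit)

# 4. Use the model to predict outcomes for hypothetical new cases.
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(110, 200))
predict(fit, newdata = new_cars)

# 5. Visualize the data and the fitted relationship.
plot(cars$wt, cars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = cars), col = "red")
```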

This is what I did for 20 years straight, very intensively, in environments where teams of hundreds of people were working collaboratively, deadlines were applied and results were demanded. Millions of dollars were riding on our ability to achieve defensible results. This experience helps me to see the beauty of this “flow-based programming” methodology.

Now getting back to the basics of “flow-based programming”, you will be working in a natural state, the way your brain is wired to work. If you remember what I said in part 1, your brain likes to pre-plan and visualize what it is going to do. In much the same way, this is how the “flow-based programming” methodology is executed.

What these “flow-based programming” tools do for you can be summarized as follows:

  1. They read your data, allocate memory space, define variable types, and get you ready to work with the data, all in one step. It doesn’t matter where the data originates or how big it is; you can go get it seamlessly, effortlessly, and with surgical precision over which data you want and need to grab.
  2. The tools allow you to manipulate the data in any way you need to. These operations include various quantitative and non-quantitative transformations, creating new variables, filtering, joining, summarizing, sampling, cleaning, and a host of other things.
  3. You get more work done with fewer commands and in much less time. Comparatively speaking, working like this is lightning fast compared to the development possible with traditional programming tools.
  4. By using these tools, you don’t sweat the details that computer programs require. In other words, you let the developers of the tools handle the required details while you get to freely focus on the natural flow and transformation of the data in the workflow as you pursue data comprehension. You simply tell the program what to do by setting simple configuration settings – you do not have to specifically tell the computer how to do the operations.
  5. You can easily test the data transformations that you need to create in the workflow. You get to visualize the data as it moves through the workflow and you get to visualize the logical steps you designed to move and transform the data.
  6. You get to produce various types of output at the end of your workflow. You can output as much or as little as required. The output choices available to you make it very easy to visualize the data.
  7. You can easily re-run, modify, or add new capabilities to the workflow any time you want. The self-documenting nature of these “flow-based programming” tools makes it easy to follow what they are doing and it makes it easy to collaborate as a team when developing complex workflows.

The most important part of working at a higher-level with your data is this: Our brains naturally think this way as I discussed in Part 1 of this article. We are conditioned to visualize what we want to accomplish, and we are programmed to execute the steps necessary to complete the work.

The algorithms that form in our heads for solving problems are perfectly visualized and easily materialized on the drawing canvas of these types of tools. This higher-level approach allows us to create a data flow, from beginning to end, with data modifications made along the way that allow us to comprehend our data. This approach is so powerful that I find my brain automatically drawing these workflows before I ever reach the computer keyboard.

My Hope: Rewarding People Who Achieve Data Insights and Comprehension

When I think about the value of people that have skills like these, I can’t help but make logical comparisons to other skilled workers in the world. I believe that we highly undervalue people with these skills, and I think that is going to change very soon. These people hold the keys to improving the performance of our companies, our schools, and our lives in general. After all, I believe data is becoming the language of life.

Let’s have a look at the skilled artisans that we celebrate. You might notice a common theme running throughout this list. I’ll leave it up to you to find it.

We celebrate:

  1. Great Painters – They take a canvas and put colors on it to tell a story of a moment in time. Does that sound familiar? We are willing to pay millions for the great stories.
  2. Great Chefs – They take food ingredients and blend them together to produce something we love to consume and transform into energy. Does that sound familiar? We pay big bucks to experience these short-term, singular events.
  3. Athletes – We pay a lot of cash to watch the physically gifted people perform in their respective sports. They get paid handsomely because they can create revenue for their business/sport. Why shouldn’t we pay the intellectually gifted people who can perform with data? I propose that their pay should be commensurate with the revenue they create for their companies.
  4. Singers – We pay big bucks to see these talented people blend words and music together to make our brains feel good. Why don’t we pay data workers that can do the same thing with data?
  5. Celebrities – These people get paid huge money to make us believe something that will not have any effect on our lives. We watch movies, laugh at comedians and gain satisfaction by watching TV. Why don’t we pay people who use data to make us believe in the impossible, to visualize the dreams we have?  I predict that one day there will be a recognition of this talent and the people with these abilities will be celebrated for their skills.
  6. Medical Doctors – We celebrate medical doctors for helping us survive throughout our lives. We pay them huge money for helping us avoid the inevitable. Well, I can prove to you that doctors are not working optimally throughout their careers. As a person with these data skills, I can help them improve dramatically. I think that people who can help doctors improve their performance should be compensated for these skills.

This list could go on. My point is that people who can routinely achieve data comprehension need to be more highly valued. We need more people who can work like this. It is my belief, my contention, that we are on a rocket that is about to blast off. The rocket is fueled, the people are on board, and we are ready to achieve some great things with data as we fly to the heavens. In fact, I just saw this article about SpaceX’s Big F*cking Rocket (Figure 1). Without data comprehension, nobody will board that rocket.


Figure 1 – The Big Data Rocket is getting set to launch. Will you be on board?

The abilities that exist in our brains, the parallel processing capabilities coupled with experience, knowledge, and intuition, are still more powerful than our best algorithms for solving common business problems.

What I think we need are people with these skills at the top of organizations, not at the bottom looking up, because we are the visionaries. We can see the past, the present, and the future through data better than the top decision makers can by guessing. Companies need to change to help themselves. The people that achieve data comprehension are the ones that should be celebrated, promoted, and rewarded.

Upcoming in Part 3

I have made you wait long enough. I have spent nearly 8,000 words talking about ideas, concepts and beliefs. I have promised to help you learn what is needed for you to do these types of things.

Well, in Part 3, I’m going to unleash some ideas that will surprise you. I’m going to take you on a journey of the improbable. The concepts and ideas I will express have formed over the past three decades as the tools and technologies have been rapidly developed, as I have developed, and as I have seen things change. For all ten of you who have made it this far, thanks for reading.

Click here to read Part 3.


Always Remember to Think Big.