Introduction
What the heck is a high-volume Tableau user? Any person that uses Tableau a lot, like using Tableau every day, is a candidate for being a high-volume user. In this context, however, I define a high-volume Tableau user in this way. People that use Tableau to investigate many different data sets from different data sources, many of which you do not necessarily know much (or anything) about, are high-volume users. In other words, Tableau is your data analysis workhorse and you ride this horse very frequently in many different settings.
So who are high-volume Tableau users? Generally, I believe that these people are business intelligence consultants who work with a multitude of clients throughout their careers. They may work in different industries and with different groups within the same company. The data that they receive and are asked to analyze can come in all shapes and sizes from a wide variety of data warehouses. The data formats they receive will vary as much as the data itself. So what can a high-volume Tableau user do to make their life easier on any project? There is one specific thing that they can do, and I describe it in this post.
Although I target this advice at high-volume users, the lessons learned and explained in this post are equally valid (and maybe even more important) for new Tableau users.
My Secret Weapon For Maximizing My Tableau Productivity
The insight I’m about to drop on you took me years to formulate and understand. The reason it took so long was that I had to learn Tableau (and I mean really LEARN Tableau) to the point where my decisions on what to do with data became automatic. It is like learning to effectively ride a bike or hit a baseball. It took me a lot of practice and it took me making a lot of mistakes along the way. The mistakes were important because they drill into your head how important data formats and data structures are to being able to work efficiently in Tableau (or Excel, or a multitude of other BI programs). When you are on the receiving end of data that you have never seen before, what you get isn’t always what you need (that sounds like a song verse, and in my head I can see Mick Jagger singing that song at the beginning of #TCC15).
My best advice when starting a new project is to write a data specification after you receive the first data set. What? What does that mean? I know that sounds ridiculous because you already have the data, so why do you need a data specification (and what exactly is that?). If you stick with me for a few more paragraphs, I’ll explain what I mean.
The Iterative Cycle of Data Delivery and Analysis
In the hundreds of scientific and business projects in which I have been involved, the data never, ever is delivered perfectly the first time. In fact, I have come to learn that I have to treat all data as guilty until I can prove it to be innocent. There are problems with formats, problems with data content, problems with missing fields, problems with erroneous data, problems with how dates are formatted, etc. I could go on and on about how data is never, ever delivered properly the first time.
The reason this occurs is that there are a lot of people involved in a typical project initiation. Project instructions, goals and intent are lost from person to person by the time the data warehouse/data delivery person writes the scripts to do the first data extraction. Their job is to get the data moving out of their system and into the hands of the consultant. They don’t necessarily know the details on how the data will be used or how it has to be formatted for maximum efficiency on your end as you begin working with it in Tableau. Lastly, every data warehouse has its own methods for storing and writing data and these methods vary from site to site and system to system.
To eliminate your future data headaches, I have three simple steps for you to follow.
Step #1. Get the name, phone number, and email address of the person that is going to deliver the data to you. Do this in the project kick-off meeting and don’t leave without it. Meet the data delivery person if they are in the meeting and if they are not, see if it is possible to meet them before you leave the site. Many times they are not at the client site but are physically located at a data center that is in a different building, state, or even country, so it might not be possible to meet them.
Step #2. Once you receive the first data delivery from the data person, call or write to them and thank them for sending you the data because they are about to become your best friend. Tell them that you are going to write a data specification that will be used for future data deliveries and that you will send it to them in a little while. Explain that you are going to call them to explain what the specification actually means.
By placing this one phone call you will be able to solve one of the biggest problems that you will encounter in any new project. That problem is getting the correct data with the correct formats, definitions and structures so that you can do your job in its entirety with efficiency and effectiveness in Tableau. After all, working in Tableau is a lot more fun than toiling away, fixing formats, swapping columns, and otherwise just wasting time changing data that could have been fixed at the beginning of the project.
Step #3. Once you have completed step #2, begin the iterative process to resolve all the problems inherent in receiving the proper data. This simply means that you ask the data delivery person to rewrite the data and to send it to you with data specifications applied. By applying the formats and structures that you produced in the data specification, the data delivery person will make the corrections needed to give you a file that will be Tableau compatible and will allow you to fly directly into your analysis.
With complicated data or multiple data tables, you might have to iterate a few times with the data delivery person to get everything resolved. With good communication and another simple trick (ask for only a few records during each iteration), you will eliminate these problems at the beginning of the project. This will save you a huge amount of time and effort, especially in projects that have periodic data deliveries (e.g., weekly or monthly data deliveries).
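The "few records per iteration" trick lends itself to a quick automated check. What follows is only a minimal sketch, assuming the spec can be written down as an ordered list of field names with one validation rule each. The field names (`order_date`, `store_id`, `dma`) and the helper `check_sample` are hypothetical and are not taken from any real project.

```python
import csv
import io
from datetime import datetime


def is_date(value):
    """True if value parses as mm/dd/yyyy."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


def is_integer(value):
    """True if value parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


# Hypothetical spec: ordered (field name, validation rule) pairs.
SPEC = [
    ("order_date", is_date),
    ("store_id", is_integer),
    ("dma", lambda v: len(v) > 0),
]


def check_sample(text, spec=SPEC):
    """Return a list of human-readable problems found in a small CSV sample."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    expected = [name for name, _ in spec]
    if header != expected:
        problems.append(f"header {header} does not match spec {expected}")
        return problems
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(spec):
            problems.append(
                f"line {line_no}: expected {len(spec)} fields, got {len(row)}"
            )
            continue
        for (name, valid), value in zip(spec, row):
            if not valid(value):
                problems.append(f"line {line_no}: bad value {value!r} in {name}")
    return problems


sample = (
    "order_date,store_id,dma\n"
    '01/01/2015,12,"Washington, DC"\n'
    "2015-01-01,12,Boston\n"
)
for problem in check_sample(sample):
    print(problem)
```

Running a check like this on a handful of records gives you a short, concrete list of problems to walk through with the data delivery person before they export millions of rows.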
If you follow these three steps, you will be able to communicate with the data delivery person such that by the time you finish data delivery iteration #2 or #3, you will have the data you need for your project. You will know what the data means and you will know with certainty that your analysis can proceed and that future data deliveries will be consistent with what you started with! The other advantage of having this documentation is that as the project moves into the future and the data delivery person is replaced by someone else on the project (which happens at least 50% of the time), you can give the data spec sheet to the new data delivery person to ensure consistency of the data.
If you don’t follow these steps, prepare for mistakes, extra workload in reformatting data, and the ire of your boss because you have taken too long to get the job done. Also, expect that your wife or husband will be unhappy with you since you’ll be working late at night and on weekends trying to catch up with the project demands. So unless you want to find yourself in the doghouse at home and at work, read the following section carefully.
The Data Specification
A data specification does not have to be a big, technical document. If you are in the medical industry, you might be thinking the following: “Oh no, a data specification will require signatures and take me weeks to complete!” Do not worry; this is not what I am talking about.
This type of data specification is far simpler than a medical data specification, and Figure 1 shows an example. In this example, the data specification changes the order of fields 1 and 3 to accommodate sorting of the *.csv file (millions of records), and it specifies different formats for fields 1 and 3. The quotes had to be added to the DMA field because certain DMA names sent in iteration #1 had commas in the name, which throws off the *.csv import into Tableau.
Figure 1 – A very simple data specification example that changes the order and format of fields 1 and 3.
This simple example only has 6 fields in the database but the basic concept of the data spec can be used for much more complex data sets. If the field names were not self-explanatory, you would introduce another column in the data spec to ask for the data field definition (not necessary in this case).
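To make the comma problem concrete, here is a small sketch of why the quotes matter. The rows and the DMA name are made-up illustrative values, not the client data behind Figure 1, and `write_csv` is a hypothetical helper built on Python's standard `csv` module.

```python
import csv
import io

# Illustrative rows only; the second field stands in for the DMA name.
rows = [
    ["01/01/2015", "Washington, DC (Hagerstown)", "1200"],
    ["01/01/2015", "Boston", "950"],
]


def write_csv(rows, quoting):
    """Write rows to a CSV string using the given csv quoting mode."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Naive export: join on commas with no quoting, then split on commas again.
naive = "\n".join(",".join(row) for row in rows)
naive_field_counts = [len(line.split(",")) for line in naive.splitlines()]
print(naive_field_counts)  # the first row gains an extra field

# Quoted export: fields that contain a comma are wrapped in double quotes.
quoted = write_csv(rows, csv.QUOTE_MINIMAL)
parsed = list(csv.reader(io.StringIO(quoted)))
print(quoted)
print(parsed == rows)
```

`csv.QUOTE_MINIMAL` quotes only the fields that need it. If the spec asks for every text field to be quoted, `csv.QUOTE_ALL` is the stricter choice.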
You should also stick with a standard date format (mm/dd/yyyy) whenever possible. By doing so, you can avoid having to do Tableau-based operations that create dates from string or integer fields. For example, if the client gives you two fields called “order year” (yyyy) and “order month” (mm), just have them write one field for you called “order date” as (mm/dd/yyyy), where dd = 01. If you do not do this, you will have to assemble the “order date” field using string operations and concatenations, as well as casting the result as a date. Those operations will take a lot of time and cause you more grief than they are worth when you are processing millions of records. By pre-processing the data formats, field orders and table structures, you can save yourself millions of Tableau operations every time you receive a new file from the client.
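On the delivery side, combining the two fields is a one-line operation in most scripting languages. The sketch below uses Python purely for illustration; the function name `order_date` is hypothetical, and the data delivery person would do the equivalent in whatever tool writes their extract.

```python
from datetime import date


def order_date(order_year, order_month):
    """Combine a year (yyyy) and month (mm) into mm/dd/yyyy with dd = 01."""
    return date(int(order_year), int(order_month), 1).strftime("%m/%d/%Y")


print(order_date("2015", "3"))
print(order_date(2014, 12))
```

Going through a real date object instead of gluing strings together also catches bad input early: a month of 13 raises an error at export time instead of surfacing later in Tableau.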
You can create this type of data spec in Excel and send it to your client as a working document. While on the phone with the data delivery person in step #2, you will systematically work your way through each field, taking the time to look at the format and definitions for each existing field (from iteration #1) and then comparing them to what you really want to receive in all future data deliveries. Explain to them why it is important for them to make the changes you are suggesting. Since you have already established your relationship with the data delivery person, this working session will be easy and will pay dividends far into the future.
It is also in this stage that you have to add any other fields that you might need for your analysis. You should also use the opportunity to do a little “data mining” of your own by asking the data person about other types of information that they have that could help you in your analysis. These people know the data better than anyone else in the company. For example, you might find that they have access to physical attribute data like latitude and longitude of stores, store square footage, DMA information, or a multitude of other types of data that can help you in your Tableau analysis. You might find yourself writing a data spec for this data, too. Always remember that Tableau excels at both temporal and spatial analysis, so you will also need that type of data to really impress your boss and your client. Finally, thank the data delivery person every time they send you updates. Praise the data delivery people in your project meetings and your projects will go as smooth as silk because you will have a friend on your side as well as your most powerful ally: Tableau.
Great to see your process clearly laid out like this, Ken. In my practice I have been following a similar routine, but not in such a systematic way. Thanks for putting your method in writing; it is very helpful.
Thanks for the kind words, George. Like I said in the post, this took me years to determine. As you grow into a career and experience recurring problems, you have to find a way to improve your work process. That is what this article describes. It isn’t sexy or glamorous, but if you think about what I am saying, it will make your life much easier if you are a consultant who works with a bunch of different clients. The key finding is that by communicating directly with the data provider (i.e., the script writer), you take away any possibility of ambiguities with respect to the data that you want to receive.