How to Add Geographies to Lat/Long Data for Better #Tableau Mapping

Featured


Introduction

Little did I realize over 4 years ago that this simple article would become my most popular article ever. Much to my amazement, it has been viewed over 13,000 times and continues to be popular over 4 years later, as shown in Figure 1!

 

Geo_Performance_DB

Figure 1 – The time series history of the original article about lat/long techniques for Tableau, which was written in Sept 2013. It took about 6 months, but suddenly that article started getting attention, and that has continued for 4 years.

The Title Mistake

Even with all that activity and the countless times I read that article, I never knew that I made a mistake in the title until a few minutes ago. Can you see the mistake, which is shown in Figure 2? This just cracks me up! Maybe I should make mistakes in all my titles!

Title_mistake

Figure 2 – I never knew that I used the word “add” twice in this title!


 

Although I never expected that type of readership to happen for that article, sometimes simple techniques can be very powerful and have widespread usage. When you need to plot data based on some type of geography, having various techniques at your disposal is a necessity, especially for data that needs to be visualized at a high level of detail.

Defining The Inverse Geographical Problem

In this case, I will explain how to solve the inverse problem. We start with a data set that contains accurate location information about where and when an event took place (i.e., I have the latitude and longitude of the event), but I don’t know anything else about it. Maybe I want to add the state name to the record, the county of occurrence or even the zip code region the event occurred in. In these situations, I know the finest location details, but now I need information about some type of larger-scale geographical entity.

In this situation, Figure 3 shows the data that I start with where I know where and when an event occurred in terms of latitude and longitude. This type of data is typically created by a device with a GPS (global positioning unit) installed.  Since there are now zillions of these running around, this type of data is becoming more common.

LatLong_only

Figure 3 – This is my starting data. It is all I have. Now I want to determine in which US State did the event occur.


 

Figure 4 shows the data after I complete the operation to add the state abbreviation and name to the records. I needed this to be able to filter or aggregate my data in Tableau based on the state of occurrence.  Having this level of control is very advantageous for large datasets.

 

LatLong_withstate

Figure 4 – Now the data set has been augmented with the State abbreviation and the State Name. How I did this is shown in the next section.


The Alteryx Solution

In the future, I might add a section to this article that shows how this can be done in Tableau (see next section). However, for now, I’m going to show how this is done in Alteryx because it is what I have already completed and I am really short on time.

Figure 5 is the workflow that is used to complete the mapping from lat/long to the US States. The shapefile shown in the lower left part of the workflow (cb_2016_us_state_500K.shp) can be obtained by clicking here. If you want zip codes shapefiles (click here) or county shapefiles (click here), you can read my previous articles. Another useful article on this topic includes some techniques for cleaning geographical data (click here). To do the spatial matching operation in Alteryx requires the geographical package add-on to the desktop.

US_state_mapping

Figure 5 – An Alteryx workflow to add state names and abbreviations to a database that starts off with only latitude and longitudes.


 

I explain how this works in the following short video.


Doing This Type of Operation With A Custom Program

If any readers want to practice building an advanced tool for Tableau, you can convert this custom C code to a python-based function that can determine if a point is inside a polygon. In this example, I chose to map 45,000 points into the states of occurrence. If you choose to download that C code, there are some other goodies included, such as a polygon file for all US state boundaries stored in csv format.

In the custom program, I chose to use the Illinois state boundary as a test case. Figure 6 shows that there is an exact match of 1,968 marks shown in upper right panel (custom C program) and the lower left panel (Alteryx). The lower right panel shows how all 45K points get mapped into their respective states (Alteryx).

Adding_State_to_Data

Figure 6 – Comparing the results from the custom C program to Alteryx. There was an exact match of 1,968 points found in IL. (Click for a very high-resolution version of the graphic)


 

Figure 7 shows the execution of custom program called poly.c. The code runs within a second.

Program_Execution

Figure 7 – The program execution of the program poly.c, which checkes to see if a series of points are included in a polygon boundary.


 

The concept for using this in Tableau (ver 10.3+) would be to ingest a shapefile, which are the polygon definitions (the array Xb,Yb).  You send to the InsidePolygon function an array of points and the point to check (Xp,Yp). If the function returns a 1, the point is inside the polygon. Many thanks to my buddy for providing this function and helping me with its implementation.

int InsidePolygon(double*Xb,double*Yb,int n,double Xp,double Yp,double Xm,double Xx,double Ym,double Yx)
{
int above1,above2,i,right=0;
double X1,X2,Y1,Y2;
if(Xp<Xm)
return 0;
if(Xp>Xx)
return 0;
if(Yp<Ym)
return 0;
if(Yp>Yx)
return 0;
X2=Xb[n-1];
Y2=Yb[n-1];
above2=Y2>Yp?1:0;
for(i=0;i<n;i++)
{
X1=X2;
Y1=Y2;
X2=Xb[i];
Y2=Yb[i];
above1=above2;
above2=Y2>Yp?1:0;
if(above1==above2)
continue;
if(X1>Xp&&X2>Xp)
right++;
else if(Y1<Y2)
{
if((Xp-X1)*(Y2-Y1)<(X2-X1)*(Yp-Y1))
right++;
}
else if(Y1>Y2)
{
if((Xp-X1)*(Y2-Y1)>(X2-X1)*(Yp-Y1))
right++;
}
}
return(right&1);
}

 

Future Work

If can find some future time to complete a Python equivalent, I’ll develop and demonstrate how to use this function within Tableau.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.