Introduction
Little did I realize over 4 years ago that this simple article would become my most popular article ever. Much to my amazement, it has been viewed over 13,000 times and continues to be popular over 4 years later, as shown in Figure 1!
The Title Mistake
Even with all that activity and the countless times I read that article, I never knew that I made a mistake in the title until a few minutes ago. Can you see the mistake, which is shown in Figure 2? This just cracks me up! Maybe I should make mistakes in all my titles!
Although I never expected that type of readership to happen for that article, sometimes simple techniques can be very powerful and have widespread usage. When you need to plot data based on some type of geography, having various techniques at your disposal is a necessity, especially for data that needs to be visualized at a high level of detail.
Defining The Inverse Geographical Problem
In this case, I will explain how to solve the inverse problem. We start with a data set that contains accurate location information about where and when an event took place (i.e., I have the latitude and longitude of the event), but I don’t know anything else about it. Maybe I want to add the state name to the record, the county of occurrence or even the zip code region the event occurred in. In these situations, I know the finest location details, but now I need information about some type of larger-scale geographical entity.
In this situation, Figure 3 shows the data that I start with where I know where and when an event occurred in terms of latitude and longitude. This type of data is typically created by a device with a GPS (global positioning unit) installed. Since there are now zillions of these running around, this type of data is becoming more common.
Figure 4 shows the data after I complete the operation to add the state abbreviation and name to the records. I needed this to be able to filter or aggregate my data in Tableau based on the state of occurrence. Having this level of control is very advantageous for large datasets.
The Alteryx Solution
In the future, I might add a section to this article that shows how this can be done in Tableau (see next section). However, for now, I’m going to show how this is done in Alteryx because it is what I have already completed and I am really short on time.
Figure 5 is the workflow that is used to complete the mapping from lat/long to the US States. The shapefile shown in the lower left part of the workflow (cb_2016_us_state_500K.shp) can be obtained by clicking here. If you want zip codes shapefiles (click here) or county shapefiles (click here), you can read my previous articles. Another useful article on this topic includes some techniques for cleaning geographical data (click here). To do the spatial matching operation in Alteryx requires the geographical package add-on to the desktop.
I explain how this works in the following short video.
Doing This Type of Operation With A Custom Program
If any readers want to practice building an advanced tool for Tableau, you can convert this custom C code to a python-based function that can determine if a point is inside a polygon. In this example, I chose to map 45,000 points into the states of occurrence. If you choose to download that C code, there are some other goodies included, such as a polygon file for all US state boundaries stored in csv format.
In the custom program, I chose to use the Illinois state boundary as a test case. Figure 6 shows that there is an exact match of 1,968 marks shown in upper right panel (custom C program) and the lower left panel (Alteryx). The lower right panel shows how all 45K points get mapped into their respective states (Alteryx).
Figure 7 shows the execution of custom program called poly.c. The code runs within a second.
The concept for using this in Tableau (ver 10.3+) would be to ingest a shapefile, which are the polygon definitions (the array Xb,Yb). You send to the InsidePolygon function an array of points and the point to check (Xp,Yp). If the function returns a 1, the point is inside the polygon. Many thanks to my buddy for providing this function and helping me with its implementation.
int InsidePolygon(double*Xb,double*Yb,int n,double Xp,double Yp,double Xm,double Xx,double Ym,double Yx)
{
int above1,above2,i,right=0;
double X1,X2,Y1,Y2;
if(Xp<Xm)
return 0;
if(Xp>Xx)
return 0;
if(Yp<Ym)
return 0;
if(Yp>Yx)
return 0;
X2=Xb[n-1];
Y2=Yb[n-1];
above2=Y2>Yp?1:0;
for(i=0;i<n;i++)
{
X1=X2;
Y1=Y2;
X2=Xb[i];
Y2=Yb[i];
above1=above2;
above2=Y2>Yp?1:0;
if(above1==above2)
continue;
if(X1>Xp&&X2>Xp)
right++;
else if(Y1<Y2)
{
if((Xp-X1)*(Y2-Y1)<(X2-X1)*(Yp-Y1))
right++;
}
else if(Y1>Y2)
{
if((Xp-X1)*(Y2-Y1)>(X2-X1)*(Yp-Y1))
right++;
}
}
return(right&1);
}
Future Work
If can find some future time to complete a Python equivalent, I’ll develop and demonstrate how to use this function within Tableau.