Juan Huerta is a contributing author to Making Data Meaningful. He is currently a Senior Data Scientist at PlaceIQ, where he focuses on location-based analytics. At the 2013 Business Intelligence Symposium, Juan spoke about his work extracting patterns, trends, intelligence and context from large amounts of structured and unstructured data. He holds a PhD from Carnegie Mellon University and resides in the Greater New York City area.
The availability of data that incorporates location information is growing. This influx has been driven by the emergence and broad adoption of mobile technology, the intersection of diverse streams of information, the abundance of data-generating, location-enabled devices, and the availability of tools and techniques for extracting insights from this type of data, among other factors.
Location-annotated data is increasingly abundant.
In addition to being abundant, location information has proven valuable as a proxy for human behavior. Location is a primary marker of consumer intent at both the individual and the segment level. The ebbs and flows created by the constant movement of mobile devices give us a picture from which patterns and insights can be extracted.
Given these characteristics, it is not surprising that interest in movement-based consumer insights is growing across industries. Marketers, analysts, and decision-makers are realizing the value of this type of data for delivering new kinds of consumer insights. The possibilities opened up by combining information streams on location, movement, demographics and behavior are exciting.
At the same time, because this data is so dynamic and large-scale, the mechanisms, tools and abstractions available for general-purpose data do not seem to suffice. Customized approaches are needed to leverage it.
Here are a few considerations to keep in mind when approaching this domain:
Data: We must consider the nature of the data. More specifically, where is the location-related data coming from? To better understand it, we can categorize it into two types: static and dynamic (i.e., movement data). Static data includes census data, satellite imagery and maps, business listings, and so on. Dynamic data consists of events that occur and are recorded as consumers move through their daily lives. The most important source of dynamic information is the data generated by mobile devices: as devices issue ad requests, they leave a “digital footprint,” or trace, of those requests. Mobile devices generate billions of these requests every day. That’s some big data.
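To make the static/dynamic split concrete, here is a minimal sketch of the two record types and a parser for dynamic ad-request events. The schemas and the comma-separated log format `device_id,unix_seconds,lat,lon` are hypothetical assumptions for illustration; real ad-request feeds differ by provider. The parser drops rows without a usable location, since only location-stamped requests contribute to the movement picture.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Place:
    """Static record: a business listing anchored at a fixed point (hypothetical schema)."""
    name: str
    category: str
    lat: float
    lon: float


@dataclass(frozen=True)
class AdRequest:
    """Dynamic record: one location-stamped ad request from a device (hypothetical schema)."""
    device_id: str
    ts: datetime
    lat: float
    lon: float


def parse_ad_requests(lines):
    """Parse hypothetical 'device_id,unix_seconds,lat,lon' lines into AdRequest
    events, skipping malformed rows and rows with missing or out-of-range coordinates."""
    events = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 4:
            continue  # wrong number of fields
        device_id, ts, lat, lon = parts
        try:
            lat_f, lon_f = float(lat), float(lon)
            when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
        except ValueError:
            continue  # empty or non-numeric fields
        if not (-90.0 <= lat_f <= 90.0 and -180.0 <= lon_f <= 180.0):
            continue  # impossible coordinates
        events.append(AdRequest(device_id, when, lat_f, lon_f))
    return events
```

For example, of the rows `"d1,1700000000,40.7128,-74.0060"`, `"d2,1700000060,,"` and `"bad"`, only the first yields an event.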
Framework of reference: Once the data is provisioned, we need a way to organize it. Given its enormous volume and heterogeneous nature, how can one make sense of it all without being overwhelmed? A grid-based frame of reference is very useful here: the map is divided into tiles, and we can aggregate, or tally, the location types found within each tile. Static information is anchored to this grid, while dynamic information is overlaid on it. This structure makes searching and averaging easier. Time can be discretized in the same way, for example into hourly buckets.
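The grid idea can be sketched in a few lines. This is a minimal illustration, not the author's actual tiling scheme: it assumes a simple equal-angle grid with a hypothetical tile size of 0.01 degrees and hour-long time buckets. Static places are anchored by tallying categories per tile, and dynamic events are overlaid by counting distinct devices per (tile, hour) cell.

```python
import math
from collections import Counter, defaultdict

TILE_DEG = 0.01  # hypothetical tile size in degrees (~1.1 km of latitude)


def tile_of(lat, lon, size=TILE_DEG):
    """Map a point to an integer (row, col) tile on a simple equal-angle grid."""
    return (math.floor(lat / size), math.floor(lon / size))


def hour_bucket(unix_seconds):
    """Discretize time into hour-long buckets."""
    return unix_seconds // 3600


def tally_places(places, size=TILE_DEG):
    """Anchor static data: count place categories within each tile.
    `places` holds (category, lat, lon) tuples."""
    counts = defaultdict(Counter)
    for category, lat, lon in places:
        counts[tile_of(lat, lon, size)][category] += 1
    return counts


def overlay_events(events, size=TILE_DEG):
    """Overlay dynamic data: count distinct devices per (tile, hour) cell.
    `events` holds (device_id, unix_seconds, lat, lon) tuples."""
    devices = defaultdict(set)
    for device_id, ts, lat, lon in events:
        devices[(tile_of(lat, lon, size), hour_bucket(ts))].add(device_id)
    return {cell: len(ids) for cell, ids in devices.items()}
```

One design caveat: equal-angle tiles get narrower toward the poles, so production systems often prefer hierarchical schemes such as geohashes; the lookup-by-key structure stays the same either way.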
Abstractions: We need adequate abstractions to join the dynamic data with the static data. One powerful abstraction is the audience: a segment of the population with homogeneous patterns of behavior. Naturally, when working with location information, location should substantially inform audience design. Audiences, in turn, can be designed and characterized by the information they convey. Examples include audiences based on location categories, audiences based on behaviors, and audiences enriched with contextual features.
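As an illustration of an audience based on location categories, the sketch below joins dynamic observations to static tile categories. The membership rule (seen in a tile containing the category on at least `min_days` distinct days) is a hypothetical assumption chosen for simplicity, not a documented audience definition.

```python
from collections import defaultdict


def build_audience(observations, tile_categories, category, min_days=2):
    """Hypothetical rule: a device joins the audience for `category` if it was
    observed in a tile containing that category on at least `min_days` distinct days.

    observations:    iterable of (device_id, day, tile) tuples (dynamic data)
    tile_categories: dict mapping tile -> set of place categories (static data)
    """
    days_seen = defaultdict(set)
    for device_id, day, tile in observations:
        # Join dynamic to static: does this tile contain the target category?
        if category in tile_categories.get(tile, ()):
            days_seen[device_id].add(day)
    return {d for d, days in days_seen.items() if len(days) >= min_days}
```

Requiring repeat visits on separate days is one way to favor habitual behavior over one-off pass-throughs; other audiences could swap in different rules over the same joined data.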
Algorithms, Metrics and Analysis: Once the data and audiences are in place, we need the right algorithmic pipelines for processing location information, as well as adequate metrics. For location, one very powerful measurement is Place Visit Rate (PVR™), which measures the percentage of people who visit a location of interest during a given period of time. When applied to marketing campaigns, we can focus on the PVR™ lift that a campaign attains.
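A minimal sketch of the metric follows. The PVR function follows the text's definition (percentage of a population that visited the location during the period). The lift formula is an assumption: it takes lift as the relative increase of an exposed group's PVR over an unexposed control group's PVR, which is a common convention but may not match how PVR™ lift is computed in practice.

```python
def place_visit_rate(devices, visitors):
    """PVR as described in the text: the percentage of `devices` that visited the
    location of interest during the period (`visitors` = devices observed there)."""
    if not devices:
        raise ValueError("empty population")
    return 100.0 * len(devices & visitors) / len(devices)


def pvr_lift(exposed, control, visitors):
    """Assumed definition: relative PVR increase (in percent) of the campaign-exposed
    group over an unexposed control group."""
    base = place_visit_rate(control, visitors)
    if base == 0:
        raise ValueError("control PVR is zero; lift is undefined")
    return 100.0 * (place_visit_rate(exposed, visitors) - base) / base
```

For instance, if 2 of 4 exposed devices and 1 of 5 control devices visit, the PVRs are 50% and 20%, and the lift is 150%.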
Bringing it all together: Once the right data, abstractions, and algorithms are in place, we can address questions that were once difficult, or even impossible, to answer. For example, to understand and analyze the drivers of purchase for a particular retailer, we could measure and characterize PreVisit behaviors in terms of audiences (i.e., “where was this particular group observed before they shopped at X?”). We could also compare response rates, in terms of PVR™ lift, across different behavioral audiences.
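The PreVisit question can be sketched as a look-back tally. This assumes observations have already been joined to place categories via the grid, and uses a hypothetical 3-hour look-back window; the actual PreVisit methodology is not described here. For each visit to retailer X, it records which categories the device was observed in shortly beforehand, counting each device at most once per category.

```python
from collections import Counter


def previsit_profile(visits, observations, lookback_s=3 * 3600):
    """Tally where visitors were seen before visiting the retailer.

    visits:       iterable of (device_id, visit_ts) for visits to retailer X
    observations: iterable of (device_id, ts, category) for all sightings
    lookback_s:   hypothetical look-back window in seconds (3 hours by default)
    Returns a Counter of distinct devices per category seen in the window.
    """
    by_device = {}
    for device_id, ts, category in observations:
        by_device.setdefault(device_id, []).append((ts, category))
    seen = set()  # (device_id, category) pairs, so a device counts once per category
    for device_id, visit_ts in visits:
        for ts, category in by_device.get(device_id, []):
            if visit_ts - lookback_s <= ts < visit_ts:
                seen.add((device_id, category))
    return Counter(category for _, category in seen)
```

Restricting `visits` to members of one audience yields that audience's PreVisit profile, which can then be compared across audiences.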
Location-based analytics not only enables us to answer these types of questions but also opens the door to many new kinds of analysis.