In focus

Location, location, location: why geospatial data science matters for investors


Data science has been an established part of the investors’ toolkit at Schroders since the Data Insights Unit (DIU) was set up in 2014. Geospatial data science is a subset of this field that combines an understanding of location, distance and geography with traditional data science skills, such as programming and statistics, with a view to uncovering new ways of looking at the world.

At Schroders, geospatial data science is one specialisation of many within the DIU. The team provides a research service to help Schroders’ investors extract actionable insight from alternative and big data, helping them to make better investment decisions. Geospatial data science is now part of the range of resources available to our investors, who are looking at companies as varied as supermarkets, industrial factories and wind turbine manufacturers. Here we will explore the history behind geospatial data science, the evolution of geospatial techniques, and some case studies of applied geospatial analysis in investment.

The birth of geospatial analysis

Geospatial analysis is not a new technique. One of the earliest examples dates back to the 19th century. In 1832, cartographer Charles Picquet created a map to show the extent of a cholera outbreak in the districts of Paris.

Geospatial-heatmap-363354.jpg

This was an important step forward, as it showed the physicians of the day where the disease was spreading through the population.

Inspired by Picquet, John Snow, a leading British physician, adopted the same principle during a severe cholera epidemic in London in 1854. An outbreak hit Soho in August; over the course of three days, 127 people died from the disease. By September, over 500 had died. 

Snow built on Picquet’s concept by making an argument supported entirely by spatial data. He plotted the locations of the local water pumps and of the households that had lost people to cholera, and identified the relationship between them. He presented this relationship to the local council, arguing that the Broad Street pump was contaminated. Convinced, the council removed the pump’s handle, disabling it. Snow was proved right, as the outbreak was quickly brought under control. This map is often cited as one of the earliest known examples of using geographic inquiry to understand a health epidemic.

Geospatial-choleramap-363354.jpg

These studies laid down the principles of geospatial data science that are still in use today – namely that some data only make sense when presented or interpreted on a map. However, the toolkit has expanded beyond hand-drawn maps – today, the Geographic Information System (GIS) is one of the geospatial scientist’s most important tools. Geospatial data science involves much more than looking up locations on Google Maps or Citymapper.

Fast forward to the present day

Today’s geospatial data scientists have access to an ever-increasing variety of datasets, and are armed with sophisticated techniques to tease out information and insights from them. Datasets range from points of interest, such as the locations of airports, hospitals, shops or ATMs, to the demographic characteristics of individual neighbourhoods, to “derived” data products that might plot drive times between customers and stores.

This explosion in the quantity of geospatial data is down to technological advances that have made gathering and processing satellite imagery and mobile phone GPS traces standard practice. Data scientists can also draw on data from the growing number of devices connected to the “internet of things”, be they fridges or freight vehicles. It is also becoming quicker to discover, download and dissect this data, connecting more and more data producers with data consumers.

Common formats for mapped data

Modern geospatial data comes in two main types:

  • Vector: composed of sets of mapped points, lines or polygons, this is the format of choice for scattered or irregular data sets. Store locations, administrative boundaries and catchment areas are all examples of vector data.
  • Raster: by contrast, raster data are grids of information, and look rather like a chess board overlaid on a geographical area. This is the format of choice for remotely-sensed data, such as satellite imagery, where each pixel has a value that can be interpreted.
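To make the distinction concrete, here is a minimal sketch in plain Python, assuming planar (x, y) coordinates. A vector polygon is stored as a list of vertices and queried with a ray-casting point-in-polygon test; a raster is stored as a grid of cell values plus an origin and cell size, so any coordinate maps to exactly one cell. The names `point_in_polygon` and `Raster` are hypothetical; in practice dedicated geospatial libraries handle both formats.

```python
from dataclasses import dataclass

# Vector data: a polygon is an ordered list of (x, y) vertices.
Polygon = list[tuple[float, float]]


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Ray-casting test: a point is inside if a ray from it crosses the boundary an odd number of times."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the crossing lies to the right of the point
                inside = not inside
    return inside


@dataclass
class Raster:
    """Raster data: a grid of cell values anchored at its top-left corner."""
    values: list[list[float]]  # values[row][col], row 0 is the top (northern) edge
    origin_x: float            # x-coordinate of the left edge
    origin_y: float            # y-coordinate of the top edge
    cell_size: float           # width and height of each square cell

    def value_at(self, x: float, y: float) -> float:
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)  # rows count downwards
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("point lies outside the raster")
        return self.values[row][col]
```

For example, the point (5, 5) lies inside the square with corners (0, 0) and (10, 10), and in a 2 by 2 raster with origin (0, 10) and 5-unit cells, the point (7, 8) falls in the top-right cell.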

Geospatial-vector-ratser-data-DB-363354.jpg

How data is crunched

The geospatial industry has undergone considerable change in recent years, driven by the open-source community. Software has emerged that is essential for analysing raw data far too large for regular tools like Excel. PostGIS[1] has emerged as a clear winner for data storage and manipulation, and QGIS for cartography and visualisation. Python and, to a lesser extent, R have powerful open-source libraries for handling vector and raster data. These languages are the main tools of most data scientists, and are used for data extraction, manipulation and analysis.

How we use geospatial findings

Geospatial analysis is used widely within Schroders across multiple investment desks. Applications have included:

  • analysing the rollout of broadband across the UK
  • investigating a grocery retailer’s growth opportunities in Turkey
  • understanding the joint market share for a combined entity after the merger of two companies
  • creating risk exposure models to score real estate asset exposure to a range of natural disasters, including hurricanes, earthquakes and landslides
  • validating company assumptions relating to weather, e.g. weather as a driver of low footfall
  • understanding a company's footprint in terms of demographics and geographical reach
  • estimating market share within various regions and countries

Geospatial data science in action

The techniques used to deliver this analysis are varied. Here are three real examples that illustrate three types of powerful analysis:

Gravity models – A gravity model provides an estimate of the volume of flows of, for example, goods, services, or people between two or more locations. This could be the movement of people between cities, or the volume of trade between countries.
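In its simplest form, a gravity model borrows from Newton: the flow T between locations i and j grows with their "masses" M (for example, population or economic output) and decays with the distance d between them, T_ij = k · M_i · M_j / d_ij^β. The sketch below assumes great-circle distances and illustrative defaults of k = 1 and β = 2; in real applications both parameters are calibrated against observed flows.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def gravity_flow(mass_i: float, mass_j: float, distance_km: float,
                 k: float = 1.0, beta: float = 2.0) -> float:
    """Basic gravity model: flow rises with both masses and decays with distance ** beta."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return k * mass_i * mass_j / distance_km ** beta
```

With β = 2, doubling the distance between two locations cuts the estimated flow to a quarter, all else being equal.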

In November 2018, The Restaurant Group proposed to take over Wagamama and its existing stores. This raised questions over whether the combined entity would suffer an overall reduction in market share, for example if the two estates competed for the same customers.

Using demographic data and store catchments, the DIU mapped the affluence of local populations to each Wagamama and Restaurant Group location. This analysis showed that Wagamama’s locations were in complementary positions to the Restaurant Group portfolio, and that different kinds of customers went to different restaurants. This meant that the combined entity would be unlikely to cannibalise the clientele of existing Restaurant Group locations.
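A catchment affluence profile of this kind can be sketched as a population-weighted average over the neighbourhoods around each store. This is a minimal illustration, not the DIU’s actual method: it assumes each neighbourhood is reduced to a centroid, a population and a hypothetical affluence index, and that a catchment is a simple radius around the store (a drive-time catchment would be more realistic). `Area` and `catchment_affluence` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Area:
    lat: float
    lon: float
    population: int
    affluence: float  # hypothetical affluence index for the neighbourhood


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def catchment_affluence(store_lat: float, store_lon: float,
                        areas: list[Area], radius_km: float) -> float:
    """Population-weighted mean affluence of neighbourhoods whose centroid lies within radius_km.

    Returns NaN if no neighbourhood falls inside the catchment.
    """
    total_pop = 0
    weighted = 0.0
    for area in areas:
        if haversine_km(store_lat, store_lon, area.lat, area.lon) <= radius_km:
            total_pop += area.population
            weighted += area.population * area.affluence
    return weighted / total_pop if total_pop else float("nan")
```

Scoring every site in both estates and comparing the two distributions is one way to check whether they serve different customer groups.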

Geospatial-wagamama-363354.jpg

Customer affluence table

Geospatial-affluence-363354.jpg

Source: Schroders Data Insights Unit, October 2018. Securities shown are for illustrative purposes only and should not be viewed as a recommendation to buy or sell.

Accessibility analysis – when looking at a company or an entire sector, it is often useful to identify which locations are well connected and which are not. Isochrones are a good way of expressing this. An isochrone describes the set of places which can be reached within a certain time (e.g. 30 minutes) from a single starting point. This is particularly useful for estimating the catchment areas of stores, or the number of amenities that can be reached from a hotel or residential building.
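Isochrones are usually computed on a road network. A minimal sketch, assuming the network is a graph whose edges carry travel times in minutes, runs Dijkstra’s shortest-path algorithm from the starting point and discards any route that exceeds the time budget. The nodes it returns approximate the isochrone; drawing a boundary polygon around them is a separate step. The graph format and the `isochrone` function name are hypothetical.

```python
import heapq


def isochrone(graph: dict[str, list[tuple[str, float]]], start: str,
              max_minutes: float) -> dict[str, float]:
    """Return {node: travel time} for every node reachable from start within max_minutes.

    graph maps each node to a list of (neighbour, minutes) edges, e.g. road segments.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > best[node]:
            continue  # stale entry: a faster route to this node was already found
        for neighbour, minutes in graph.get(node, []):
            arrival = t + minutes
            # Keep only routes inside the time budget that improve on the best so far.
            if arrival <= max_minutes and arrival < best.get(neighbour, float("inf")):
                best[neighbour] = arrival
                heapq.heappush(queue, (arrival, neighbour))
    return best
```

Counting the amenities attached to the returned nodes then gives a simple accessibility score for the starting point.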

Construction starts in the US

When new construction or infrastructure projects, such as housing, shopping centres or hospitals, are planned for a community, many companies can experience a surge in demand for their products or services. One such sector is industrial equipment leasing, in which companies lease out tools such as excavation equipment, diggers and mobile power generators.

Using the locations of industrial equipment rental stores and county-level data on new construction starts, Schroders’ DIU was able to calculate the distance from each store to the locations of new construction starts. Based on these results, the team were able to identify the equipment rental companies best placed to benefit from increasing demand.
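One simple way to turn store and construction-start locations into a comparable score is to count the construction starts within a fixed radius of each store and average the result per company. The sketch below assumes county-level starts are placed at each county’s centroid, uses great-circle distance and an illustrative 50 km radius; the function name, company names and data are hypothetical.

```python
import math
from collections import defaultdict


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def exposure_by_company(stores, counties, radius_km=50.0):
    """Average number of new construction starts within radius_km of each company's stores.

    stores: list of (company, lat, lon); counties: list of (centroid_lat, centroid_lon, starts).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for company, s_lat, s_lon in stores:
        # Sum the starts of every county whose centroid is within reach of this store.
        nearby_starts = sum(
            starts for c_lat, c_lon, starts in counties
            if haversine_km(s_lat, s_lon, c_lat, c_lon) <= radius_km
        )
        totals[company] += nearby_starts
        counts[company] += 1
    return {company: totals[company] / counts[company] for company in counts}
```

Averaging per store, rather than summing, stops large estates from scoring well simply because they have more locations.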

A proprietary tool was also created to model hypothetical mergers between equipment rental companies and estimate the market share the combined entities would hold, state by state.
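At its core, a hypothetical-merger view combines the two companies’ footprints and recomputes shares region by region. The sketch below uses store counts per state as a crude proxy for market share, an assumption made purely for illustration; revenue or fleet-size data would give a better estimate. The function name and inputs are hypothetical.

```python
from collections import Counter


def combined_share(stores, company_a, company_b):
    """Share of stores in each state that a merged company_a + company_b would hold.

    stores: iterable of (company, state) pairs. Store counts stand in for market share.
    """
    total = Counter()
    merged = Counter()
    for company, state in stores:
        total[state] += 1
        if company in (company_a, company_b):
            merged[state] += 1
    # Counter returns 0 for states where neither company has a store.
    return {state: merged[state] / total[state] for state in total}
```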

The analysis highlighted that BlueLine was an attractive acquisition target for rival firms, given its market share in areas experiencing high construction growth. The map below shows the percentage of the market that the combined BlueLine and United Rentals estate would hold if the two were to merge.

Geospatial-bluerental-363354.jpg

Source: Schroders Data Insights Unit, September 2018. Securities shown are for illustrative purposes only and should not be viewed as a recommendation to buy or sell.

In September 2018, it was announced that United Rentals would acquire BlueLine from Platinum Equity.

Zonal statistics – these are statistics derived for a defined zone, such as a country, state or neighbourhood, and are used to characterise differences between locations. Census data are an example of zonal statistics applied to demographics, but less obvious applications are also possible, such as measuring the exposure of different neighbourhoods in a city to flood risk or landslides.
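Computationally, zonal statistics summarise the cells of a value raster (for example, modelled flood depth) grouped by a second raster of the same shape that labels each cell with its zone. A minimal pure-Python sketch of a zonal mean, assuming the two grids are aligned cell for cell; `zonal_mean` is a hypothetical name, and GIS tools offer the same operation with many more statistics.

```python
from collections import defaultdict


def zonal_mean(values, zones, nodata=None):
    """Mean of raster cell values per zone.

    values: 2-D grid of numbers (e.g. flood depth); zones: grid of the same shape
    labelling each cell with a zone ID. Cells equal to nodata are skipped.
    """
    if len(values) != len(zones) or any(len(v) != len(z) for v, z in zip(values, zones)):
        raise ValueError("values and zones must have the same shape")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if nodata is not None and value == nodata:
                continue  # missing data should not drag the zone average down
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in counts}
```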

Risk modelling

Risk modelling is another area of largely untapped potential. The insurance industry has long made use of sophisticated geospatial modelling to calibrate premiums, including the modelling of natural disasters ranging from climate events such as flooding, hurricanes and wildfires to geophysical events like landslides, earthquakes and volcanic eruptions.

However, traditional asset management has rarely made serious efforts to get to grips with the exposure of different companies, cities or individual assets to these events, despite the implications for investment returns.

The DIU works with investment teams to cover this blind spot, using a mixture of remotely sensed and traditional alternative data. Measuring cities’ exposure to natural hazards, as well as their environmental impact and sustainability, is crucial to identifying which will be long-term winners. Delivering this consistently for cities worldwide requires data with global coverage, gathered to the same standards at approximately the same time, which makes remotely sensed geospatial data the natural choice.

The map below uses 100 years of data to show where the eyes of major tropical storms made landfall in the US. Each location along a storm’s path is scored according to the storm’s intensity, so locations that are regularly hit by high-intensity storms score higher than those that experience weaker storms. This provides a useful metric for assessing the risk to businesses in these regions.
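The exact scoring method behind the map is not described here, but a simple version of the idea can be sketched: for a given location, add up the intensity of every storm whose track passed within a chosen radius, counting each storm once at its highest nearby intensity. The track format, the 100 km radius and the use of a Saffir-Simpson-style category as the intensity value are all assumptions for illustration, as is the `storm_score` name.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def storm_score(lat: float, lon: float, storms, radius_km: float = 100.0) -> float:
    """Sum, over storms, of the highest intensity recorded at any track point within radius_km.

    storms: list of tracks; each track is a list of (lat, lon, intensity) points.
    Each storm contributes at most once, at its strongest nearby point.
    """
    score = 0.0
    for track in storms:
        nearby = [intensity for p_lat, p_lon, intensity in track
                  if haversine_km(lat, lon, p_lat, p_lon) <= radius_km]
        if nearby:
            score += max(nearby)
    return score
```

Dividing the score by the number of years covered gives an annualised figure that can be compared across datasets of different lengths.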

Geospatial-tropical-storm-363354.jpg

Eyes in the sky: Focus on satellite data

This type of data is produced by satellites or by in-orbit platforms such as the International Space Station. Since the mid-20th century, satellites have regularly collected a range of useful data, much of it beyond the range of human vision:

  • Elevation – global elevation datasets measure the height of most of the planet’s surface by bouncing radar waves off it
  • Biosphere health – quantifying the health of vegetation and crops, the progression of deforestation, and the water content of topsoil, by looking at infra-red absorption and other properties
  • Urbanisation – observing the progression of human development, often by looking at the change in intensity of lights at night
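Vegetation health is commonly measured with the Normalised Difference Vegetation Index (NDVI), which exploits the fact that healthy vegetation absorbs red light strongly while reflecting near-infrared (NIR) light: NDVI = (NIR − Red) / (NIR + Red). Values range from −1 to 1, and higher values indicate denser, healthier vegetation. A minimal sketch on plain Python grids of reflectance values, assuming the two bands are already aligned pixel for pixel; the `ndvi` function name is hypothetical.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2-D grids.

    Pixels where NIR + Red == 0 (e.g. no signal) are returned as None.
    """
    if len(nir) != len(red) or any(len(n) != len(r) for n, r in zip(nir, red)):
        raise ValueError("bands must have the same shape")
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total != 0 else None)
        result.append(row)
    return result
```

Comparing NDVI for the same fields across successive satellite passes is one way to track crop health or deforestation over time.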

Thanks to NASA and the Copernicus programme, run by the European Union with the European Space Agency, making some of this data openly available, commercial applications are increasingly possible. Unlike traditional geospatial data, remotely sensed data often requires significant refinement and processing before insights can be extracted.

Conclusion

Geospatial data science is a fast-growing area of data science and is rich with potential insights for investors. It offers ways of viewing sectors and companies that have not traditionally been used in the asset management industry. As both the data sets involved and the skills required to tease out insights have multiple use cases, the discipline has become a valuable asset to investment teams. Given technological advances and the increasing number of data sets available, we believe that this discipline will only grow in importance in the future.