From Ürümqi to Minneapolis: Clustering City Climates with Self-Organising Maps
By: Niall McCarroll
As a Research Software Engineer, my job involves developing, testing and maintaining software that scientists can use to analyse earth observation and climate data. Recently I’ve been developing software that uses Self-Organising Maps to visualise climate data. A Self-Organising Map is an artificial neural network algorithm invented in the 1980s by the Finnish scientist Teuvo Kohonen. Artificial neural networks are computer programs that attempt to replicate the interconnection of neurons in the brain in order to learn to recognise patterns in input data. The Self-Organising Map algorithm helps us compare items that are each described by a list of many data values. It does this by plotting them on a two-dimensional map so that items with similar lists of values appear closer together. In doing so, it clusters similar items together.
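To make the idea concrete, here is a minimal sketch of the standard online Self-Organising Map training loop, written in plain Python. It repeatedly picks an input vector, finds the cell whose weight vector is closest (the "best-matching unit"), and pulls that cell and its neighbours towards the input. The pull uses a Gaussian neighbourhood that shrinks over time. The grid size, learning-rate schedule, neighbourhood width and initialisation are illustrative assumptions, not the settings used in the software described in this post. The function names `hex_grid`, `best_matching_unit` and `train_som` are hypothetical.

```python
import math
import random

def hex_grid(rows, cols):
    """Cell centres for a hexagonal grid: odd rows are shifted half a cell
    right and rows are sqrt(3)/2 apart, so every neighbour is at distance 1."""
    return [
        (c + (0.5 if r % 2 else 0.0), r * math.sqrt(3) / 2)
        for r in range(rows)
        for c in range(cols)
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def best_matching_unit(weights, vector):
    """Index of the cell whose weight vector is closest to `vector`."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], vector))

def train_som(data, rows=6, cols=8, iterations=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a SOM on `data` (a list of equal-length numeric lists).
    Returns (cell centres, cell weight vectors)."""
    rng = random.Random(seed)
    centres = hex_grid(rows, cols)
    # Assumption: initialise each cell with a copy of a randomly chosen input.
    weights = [list(rng.choice(data)) for _ in centres]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1 - frac)              # learning rate decays linearly towards 0
        sigma = sigma0 * (1 - frac) + 0.5  # neighbourhood shrinks towards 0.5 cells
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for i, centre in enumerate(centres):
            # Gaussian neighbourhood based on distance between cells on the map
            h = math.exp(-sq_dist(centre, centres[bmu]) / (2 * sigma * sigma))
            weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
    return centres, weights
```

After training, each item is allocated to its best-matching unit. Items whose vectors are similar tend to share a cell or land in neighbouring cells, because neighbouring cells were pulled towards the same inputs.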
To help me test the software, I chose a simple example task in a domain that I can easily understand: comparing the climates of many different cities. City location data was obtained from https://simplemaps.com/data/world-cities. Climate data can be obtained from the global meteorological dataset ERA5, released by the European Centre for Medium-Range Weather Forecasts (ECMWF). Its land-surface companion dataset, ERA5-Land, provides monthly averaged estimates of air temperature over land (Muñoz Sabater, J., 2019). From this we can calculate the monthly mean temperatures over a 20 km square area containing each city we’d like to compare, for the years 2000 to 2021. I prepared a dataset of 120 large cities, each described by a series of 12 monthly mean temperatures at its location, derived from this data.
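The post does not say exactly how the 20 km area averages were computed. The following is a hedged sketch of one way to reduce gridded monthly data to a 12-value profile for a city. It assumes the grid values have already been read from the ERA5-Land files into `(lat, lon, year, month, temperature)` tuples; reading NetCDF or GRIB is not shown. It also assumes the square is centred on the city, uses a flat-Earth conversion from degrees to kilometres, and weights all grid points equally. The names `in_box` and `monthly_climatology` are hypothetical.

```python
import math
from collections import defaultdict

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude, in km

def in_box(lat, lon, city_lat, city_lon, size_km=20.0):
    """True if (lat, lon) lies inside a size_km x size_km square centred on the city.
    Flat-Earth approximation; ignores wrap-around at the antimeridian."""
    half = size_km / 2
    dy = (lat - city_lat) * KM_PER_DEG_LAT
    dx = (lon - city_lon) * KM_PER_DEG_LAT * math.cos(math.radians(city_lat))
    return abs(dy) <= half and abs(dx) <= half

def monthly_climatology(records, city_lat, city_lon, first_year=2000, last_year=2021):
    """records: iterable of (lat, lon, year, month, temperature) grid values.
    Returns 12 mean temperatures, January to December, averaged over all grid
    points in the box and all years in [first_year, last_year]."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, year, month, temp in records:
        if first_year <= year <= last_year and in_box(lat, lon, city_lat, city_lon):
            sums[month] += temp
            counts[month] += 1
    missing = [m for m in range(1, 13) if counts[m] == 0]
    if missing:
        raise ValueError(f"no data for months {missing}")
    return [sums[m] / counts[m] for m in range(1, 13)]
```

Running this once per city yields the kind of 12-value input vector that the Self-Organising Map compares.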
We could easily base our climate comparison on a single value per city, such as the mean annual temperature around each city, but that would miss some important differences. For example, Belo Horizonte (Brazil) and Houston (USA) have very similar mean annual temperatures according to this dataset, but very different seasonal variations in temperature, so we could not say that they enjoy a similar climate.
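A small illustration of why a single value is not enough: two 12-month profiles can share the same annual mean yet be far apart when compared month by month. The profiles below are hypothetical, constructed for illustration, and are not taken from the ERA5 dataset or from the cities named above. Euclidean distance over the 12 values is assumed as the comparison measure, as in a typical Self-Organising Map.

```python
import math

def annual_mean(profile):
    """Mean of the monthly values."""
    return sum(profile) / len(profile)

def seasonal_range(profile):
    """Difference between the warmest and coldest month."""
    return max(profile) - min(profile)

def profile_distance(a, b):
    """Euclidean distance between two 12-month temperature profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical profiles in deg C (index 0 = January), NOT real city data:
# one with almost no seasonal change, one with a strong annual cycle.
flat = [21.0] * 12
seasonal = [21.0 + 8.0 * math.cos(2 * math.pi * (m - 6) / 12) for m in range(12)]

print(annual_mean(flat), round(annual_mean(seasonal), 6))
print(seasonal_range(flat), round(seasonal_range(seasonal), 6))
print(round(profile_distance(flat, seasonal), 2))
```

Both profiles have an annual mean of 21 °C, but the second spans 16 °C between its coldest and warmest months. The 12-value distance between them is about 19.6, which is the kind of difference the annual mean alone hides.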
Instead, we can apply the Self-Organising Map algorithm to this data to plot each city onto a “climate map” (Figure 1), where cities with similar monthly mean temperature patterns should be clustered closer together. The cities’ original locations on a conventional world map are ignored. The climate map is divided into hexagonal cells, to which the algorithm allocates cities. I have coloured each cell according to the mean annual temperature of the cities that the algorithm placed in that cell. Blank cells simply happen to have no cities from the test dataset allocated to them. Unlike oceans or ice caps on a conventional map, they do not represent areas where cities cannot exist.
To test the software, we need to consider whether the algorithm has made a reasonable attempt to place the cities from our dataset into clusters on the climate map. For the cities I am familiar with, the map does appear to have clustered cities with similar temperature patterns together. The map colours show larger regions, each made up of multiple cells, containing generally warmer or cooler climates. In most but not all cases, cities from the same geographical region appear near each other on the new map, which is what we would intuitively expect.
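The allocation and colouring described above can be sketched as follows. Each city goes to the cell whose weight vector is nearest its 12-value profile. Each cell's colour value is then the mean annual temperature of its cities, with `None` marking a blank cell. This is an assumption about how such a map could be computed, not the author's code. The function names `assign_cities` and `cell_colour_values` are hypothetical, and mapping values to actual colours is left to whatever plotting library is used.

```python
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def assign_cities(weights, profiles):
    """Map each city name to the index of its best-matching cell."""
    return {
        name: min(range(len(weights)), key=lambda i: sq_dist(weights[i], profile))
        for name, profile in profiles.items()
    }

def cell_colour_values(n_cells, assignments, profiles):
    """Mean annual temperature of the cities in each cell; None marks a blank cell."""
    members = defaultdict(list)
    for name, cell in assignments.items():
        members[cell].append(sum(profiles[name]) / len(profiles[name]))
    return [
        sum(members[c]) / len(members[c]) if members[c] else None
        for c in range(n_cells)
    ]
```

A blank (`None`) cell only means that no city in this particular dataset was closest to that cell's weights. This matches the caveat about blank cells above.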
We can plot the temperature patterns of cities that are clustered close together on the new map and check that the patterns are similar. This gives us some confidence that the software may be working as expected. Figure 2 shows the monthly mean temperatures of two cities, Minneapolis (USA) and Ürümqi (China), which are located in the same cell (highlighted in Figure 1) of our Self-Organising Map. You can see that their variations in monthly mean temperature are similar.
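The visual check above could also be made numerical, as a simple automated test. The following sketch lists every pair of cities that share a cell and flags pairs whose profiles differ by more than a chosen root-mean-square (RMS) tolerance in °C. The RMS measure, the tolerance and the names `rms_difference`, `same_cell_pairs` and `check_cells` are hypothetical choices for illustration. The post itself judges similarity by comparing plots.

```python
import math
from itertools import combinations

def rms_difference(a, b):
    """Root-mean-square difference between two monthly profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def same_cell_pairs(assignments):
    """Yield (city, city) pairs that were allocated to the same cell."""
    by_cell = {}
    for name, cell in assignments.items():
        by_cell.setdefault(cell, []).append(name)
    for names in by_cell.values():
        yield from combinations(sorted(names), 2)

def check_cells(assignments, profiles, tolerance):
    """Return (city_a, city_b, rms) for same-cell pairs exceeding `tolerance`."""
    flagged = []
    for a, b in same_cell_pairs(assignments):
        d = rms_difference(profiles[a], profiles[b])
        if d > tolerance:
            flagged.append((a, b, d))
    return flagged
```

An empty result means that every pair sharing a cell is within the tolerance. As with the plots, this gives some confidence but does not prove that the implementation is correct.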
This simple dataset has been useful for testing my implementation of the Self-Organising Map algorithm. For a more realistic comparison of climates as we experience them, we would need to expand our dataset to include other variables such as rainfall, snowfall, wind and humidity. We would also need to consider how temperatures vary between day and night. I hope this post has helped to explain what Self-Organising Maps can be useful for, in the context of understanding climate data.
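If further variables were added, their values would have different units and scales. For example, rainfall in millimetres could easily dominate temperature in °C in a Euclidean distance. One common option, assumed here rather than taken from the post, is to standardise each variable before joining the variables into a single input vector per city. The sketch below does this. It computes the mean and standard deviation over all cities and months for each variable, which preserves each variable's seasonal shape. The data layout and the name `build_feature_vectors` are hypothetical.

```python
import statistics

def build_feature_vectors(city_data, variables):
    """city_data: {city: {variable: [12 monthly values]}}.
    Each variable is z-scored using the mean and population standard deviation
    of all its values (all cities, all months), then variables are concatenated."""
    scaled = {city: [] for city in city_data}
    for var in variables:
        values = [v for city in city_data for v in city_data[city][var]]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values) or 1.0  # guard against a constant variable
        for city in city_data:
            scaled[city].extend((v - mean) / sd for v in city_data[city][var])
    return scaled
```

With two variables, each city becomes a 24-value vector that can be passed to the same Self-Organising Map training code in place of the 12 temperatures.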
The Muñoz Sabater (2019) data were downloaded from the Copernicus Climate Change Service (C3S) Climate Data Store.
The results contain modified Copernicus Climate Change Service information 2023. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains.
Muñoz Sabater, J., 2019: ERA5-Land monthly averaged data from 1981 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), accessed 06 January 2023, https://doi.org/10.24381/cds.68d2bb30