Measuring Temperature Anomalies Like a Scientist
In our previous article, we explored the National Oceanic and Atmospheric Administration (NOAA) Global Surface Summary of the Day (GSOD) weather data, available as a BigQuery public dataset. We looked at how to write simple SQL queries to explore the data, and we used PowerBI to visualize the station distribution over time and the temperature difference by country. This article focuses on identifying temperature anomalies the way scientists define them.
One method for identifying temperature anomalies is to compare the daily temperature at a station against that station's average temperature for the same calendar day over a 30-year baseline period (1951-1980). A positive difference means the day was warmer than the baseline, and a negative difference means it was cooler. The daily differences are then averaged by month for each station. Additionally, the monthly station values can be averaged by area, using a five-by-five-degree grid based on the longitude and latitude of each station. Averaging this way helps an analyst account for gaps in the data. The GSOD dataset is not the only dataset scientists use to measure climate change. It is used in conjunction with the NOAA ICOADS dataset, which includes temperature observations from ships and buoys with records as far back as 1662 and is also available as a BigQuery public dataset. Other data sources include datasets from the NASA Goddard Institute for Space Studies, the Met Office Hadley Centre and Climatic Research Unit, and the Japan Meteorological Agency.(1)(2)
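To make the anomaly calculation concrete, here is a minimal sketch in plain Python of the three steps described above: build a 1951-1980 baseline per station and calendar day, subtract it from each daily reading, and average the differences by station and month. The function name, the tuple layout, and the sample readings are all made up for illustration; they are not the GSOD schema or real observations.

```python
from collections import defaultdict
from statistics import mean

BASELINE_YEARS = range(1951, 1981)  # 1951-1980 inclusive


def monthly_anomalies(records):
    """Average daily anomaly per (station, year, month).

    records: iterable of (station, year, month, day, temp) tuples
    (a hypothetical layout chosen for this sketch).
    """
    records = list(records)

    # Step 1: baseline = mean temperature per station and calendar day
    # over the baseline years only.
    baseline_obs = defaultdict(list)
    for station, year, month, day, temp in records:
        if year in BASELINE_YEARS:
            baseline_obs[(station, month, day)].append(temp)
    baseline = {key: mean(vals) for key, vals in baseline_obs.items()}

    # Step 2: daily anomaly = observed temperature minus baseline.
    # Days with no baseline value for that station are skipped.
    daily = defaultdict(list)
    for station, year, month, day, temp in records:
        base = baseline.get((station, month, day))
        if base is not None:
            daily[(station, year, month)].append(temp - base)

    # Step 3: average the daily anomalies by station and month.
    return {key: mean(vals) for key, vals in daily.items()}


# Made-up readings for one hypothetical station "A".
sample = [
    ("A", 1951, 1, 1, 10.0),
    ("A", 1980, 1, 1, 12.0),
    ("A", 1951, 1, 2, 20.0),
    ("A", 2020, 1, 1, 13.0),
    ("A", 2020, 1, 2, 23.0),
]
result = monthly_anomalies(sample)
print(result[("A", 2020, 1)])  # 2.5: January 2020 was warmer than the baseline
```

A positive result means the month ran warmer than the station's 1951-1980 average for the same calendar days, and a negative result means it ran cooler.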
For this article series, we will stay focused on the GSOD dataset and apply the same standard scientists use by looking at the average temperature for each day and station from 1951-1980, then averaging the difference by month. Using SQL, we can create a table in BigQuery that handles the calculations needed to identify the temperature anomalies. The data can then be accessed from PowerBI and a visualization can be created to show the average temperature difference by location and year.
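Before charting by location and year, the monthly station values have to be rolled up. The sketch below shows one possible way to do that in plain Python, using the five-by-five-degree grid mentioned earlier as the unit of location. The function names, input layout, station coordinates, and anomaly values are hypothetical, and the simple unweighted mean is an assumption made for brevity rather than the method used to build the article's table.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, size=5):
    """Return the south-west corner of the size-by-size degree cell
    that contains the point (lat, lon)."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def yearly_grid_anomalies(monthly, station_coords, size=5):
    """Average monthly station anomalies by grid cell and year.

    monthly: {(station, year, month): anomaly}
    station_coords: {station: (lat, lon)}
    Both layouts are assumptions made for this sketch.
    """
    cells = defaultdict(list)
    for (station, year, _month), anomaly in monthly.items():
        lat, lon = station_coords[station]
        cells[(grid_cell(lat, lon, size), year)].append(anomaly)
    # Unweighted mean of every station-month that falls in the cell.
    return {key: mean(vals) for key, vals in cells.items()}


# Two made-up stations that fall in the same grid cell.
coords = {"A": (52.3, 4.8), "B": (51.1, 2.5)}
monthly = {
    ("A", 2020, 1): 2.0,
    ("B", 2020, 1): 4.0,
    ("A", 2020, 2): 3.0,
}
by_cell = yearly_grid_anomalies(monthly, coords)
print(by_cell[((50, 0), 2020)])  # 3.0
```

Grouping stations into cells means a cell still gets a value when one of its stations has missing months, which is one way gridding helps with gaps in the data.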
The data led us to develop the visualizations below. As shown, the average temperature anomaly rises in some areas and falls in others. Europe and a portion of Russia show an average decrease in 2010, but increases in 1990, 2000, and 2020. Overall, 2020 shows more of an average increase in temperature anomalies.
Globally, the trend also indicates an increase in the average temperature anomaly.
To conclude, in this article we looked at some of the methods and datasets scientists use to determine temperature anomalies. We then applied the methodology to the NOAA GSOD dataset, which we accessed in BigQuery, and created a view that we connected to from PowerBI. Using PowerBI, we created visualizations of the temperature anomalies over time and by location. The visualizations show an overall increase in temperature over time, and they show that station location and year both affect the variance in temperature anomalies.
Are there other variables that affect the variance in temperature anomalies? Is there a way that we can see the variables that have the most impact on temperature anomalies? Can we predict temperature anomalies? In the next article we will look at whether we can predict temperature anomalies using the random forest regression algorithm. We will also look at adding more information into our dataset by joining to other BigQuery public datasets. Finally, we will identify the variables that act as the best predictors and create some visualizations that show the relationships.
References:
(1) Carbon Brief, "Explainer: How do scientists measure global temperature?"
(2) NASA Earth Observatory, "World of Change: Global Temperatures"