Verstand AI data experts look for ways to highlight data democratization in subject areas that would benefit from collective research. This three-part article series will focus on climate change data analysis and visualization and is an example of Verstand AI’s continued effort to undertake projects that have a positive impact on society. To that end, Verstand has turned its attention to climate change– a subject that has been discussed for the last few decades and has provided a basis for the increasing need for data collection and analysis.
In this article series, article one will focus on access to climate data with a few visualizations that show the observation locations [MA1] [TB2] [MA3] and average temperatures, as well as average temperature change by country. Article two will provide some insight on how scientists measure climate change with visualizations that look at temperature anomalies over time. And finally, article three will focus on adding features for analysis by joining to additional datasets and then dive into the application of a random forest regression algorithm to predict the temperature anomalies, find the best preforming features, and create visualizations to see the relationships.
Accessing Climate Data Made Easy In BigQuery
Google continues its mission with the implementation of public datasets in BigQuery. Making data more “accessible and useful”(1), Google publishes new datasets and provides free access through BigQuery with up to 1 terabyte of query processing per month. The more than 200 public datasets include data from a variety of sources. These range from GitHub commits, contents, and file information for open-source repositories, to human genome variants, and from bitcoin to blackhole data.(2)
This article will focus on one of the twenty-one National Oceanic and Atmospheric Administration (NOAA) datasets available, the Global Surface Summary of Day Weather Data (GSOD). This dataset includes climate observations from over 9000 stations around the world, starting in 1929 and updated daily. The GSOD dataset includes the average daily values for temperature, visibility, dew point temperature, and wind speed, as well as the minimum and maximum temperature, maximum gust, wind speed, snow depth, and precipitation amount.(3)
Accessing the dataset is made simple using BigQuery. Setting up an account is required, and free. Charges only apply once the 1 terabyte limit is reached. Using the search feature in the Explorer section, the noaa_gsod dataset is easy to find. Google includes a description of the dataset along with some basic information.
The tables are arranged by year and include keys that allow for joins to a stations table which stores the latitude and longitude for each station. Each table includes the schema information as well as a description for each column.
BigQuery is Google’s serverless, petabyte, SQL data warehouse. BigQuery includes standard SQL which makes it is easy to query across tables and create a quick summary, including a Pearson coefficient of correlation calculation on the average temperature compared to the latitude and longitude.
Looking at quick statistics tells us about the dataset overall. But, by using a visualization tool like PowerBI(4), we can easily see average temperatures over time and by location and even by country. PowerBI desktop can be downloaded for free and connects to BigQuery tables and views.
The below visualization shows the average temperature by location and decade starting in 1950. You can see that the number of stations has grown over the decades and the pattern of average temperatures seem to remain the same; with higher average temperatures based on the location’s distance from the equator. The closer the station is to the equator, the hotter the average temperature. Also, the number of stations dramatically increased in 1980. The increase in data collection can be seen with these visualizations.
The below chart shows the average temperature change by country. It is calculated by taking the station average temperature in 2020 and subtracting the average station temperature in 1950. Then the values are averaged based on the country where the station is located. There are significant differences between countries, with China showing an average decrease of over 6 degrees, and Armenia showing an average increase of over 14 degrees.
In summary, within this first article we have shown how to access the NOAA GSOD data in BigQuery, visualized the temperature on a map using PowerBI, we looked at the station distribution and temperature changes over time, and finally we analyzed the temperature difference by country from 1950 to 2020. What does this analysis tell us about climate and climate change? How do scientists interpret this data? In the next article we will be discussing methods and data sources that scientists use to identify temperature anomalies and assess climate change.
References: (1) Google Search “Our Approach to Search”, https://www.google.com/search/howsearchworks/mission/#:~:text=Our%20company%20mission%20is%20to,a%20wide%20variety%20of%20sources. (2) Google Cloud “Google Cloud Public Datasets”, https://cloud.google.com/public-datasets (3) Google Cloud Platform “GSOD”, GSOD – Marketplace – Google Cloud Platform (4) Microsoft Power BI Desktop, Power BI Desktop—Interactive Reports | Microsoft Power BI