Overview
This repository contains tools and datasets related to the analysis of GHCN (Global Historical Climatology Network) data. The main focus is on extracting and analyzing brightness information (BI) and built-up area (BU) for climate stations, using datasets from Google Earth Engine (GEE). The repository contains multiple files, each with a specific purpose:
- GHCNv4_stations_with_BI_BU_orwell2022.csv: This file contains metadata and analysis results for GHCN stations, including BI and BU values.
- Extract_BI_for_GHCN.ipynb: A Jupyter notebook that details the process of extracting brightness information for GHCN stations.
- GHCN_US-vs-global.ipynb: A Jupyter notebook that compares temperature trends between US stations and global stations.
- Plot_T_anomalies_GHCN.ipynb: A Jupyter notebook that compares temperature trends with various filter and binning options (BU…)
Fields Explained
Built-up Area (BU)
- BU values represent the built-up area percentage around each station, indicating urbanization levels. The data is derived from the Global Human Settlement Layer (GHSL) dataset.
- The BU values are available for the following years: 1975, 2020.
- BU Source: Global Human Settlement Layer (GHSL)
Brightness Information (BI)
- BI values represent the average brightness around each station, extracted from satellite night-time lights data. This data helps in understanding the level of human activity and light pollution around the station.
- Note: The older BI values (e.g. from NASA-GISS) set every BI < 5 to 0, which can skew the data representation. The BI values provided here have been updated for accuracy.
- BI values are provided for the years 2012, 2017, 2020, and 2023.
- Additional columns calculated from BI values:
- BI_2020_lsq: Linear least squares estimate for brightness in the year 2020.
- BI_trend_lsq: Trend of brightness values calculated using a linear least squares regression.
- BI Source: NOAA VIIRS Nighttime Lights
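The two derived columns can be illustrated with a small least squares fit. This is only a sketch of the idea, not necessarily how the values in the CSV were produced: the function name `bi_lsq`, the sample readings, and the choice of "per year" as the trend unit are assumptions made for illustration.

```python
import numpy as np

def bi_lsq(years, values, ref_year=2020):
    """Fit BI = slope * (year - ref_year) + intercept by least squares.

    Returns (slope per year, fitted BI at ref_year).
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Centring the years on ref_year makes the intercept the fitted value there
    slope, intercept = np.polyfit(years - ref_year, values, deg=1)
    return slope, intercept

# Hypothetical BI readings for one station (not real data)
years = [2012, 2017, 2020, 2023]
values = [1.0, 2.0, 2.6, 3.2]

slope, bi_2020 = bi_lsq(years, values)
print(f"trend ~ {slope:.3f} per year, fitted BI in 2020 ~ {bi_2020:.3f}")
```

Using the fitted 2020 value rather than the raw BI_2020 reading smooths out noise in any single year, since all four years contribute to the estimate.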
Parameters in GHCNv4_stations_with_BI_BU_orwell2022.csv
The CSV file contains the following columns:
- ID: Unique identifier for each station.
- Station: Name of the station.
- USCRN_Y_N: Indicates if the station is part of the US Climate Reference Network (Yes/No).
- Lat: Latitude of the station.
- Lon: Longitude of the station.
- Elev-m: Elevation of the station (in meters).
- BI: Outdated brightness information from NASA-GISS (values < 5 were apparently set to 0 by NASA).
- Built_1975_50km_percent, Built_2020_50km_percent: Built-up area percentage at a 50km radius for the years 1975 and 2020.
- Percentage_Change_50km: Percentage change in built-up area at a 50km radius from 1975 to 2020.
- Built_1975_10km_percent, Built_2020_10km_percent: Built-up area percentage at a 10km radius for the years 1975 and 2020.
- Percentage_Change_10km: Percentage change in built-up area at a 10km radius from 1975 to 2020.
- Built_1975_2km_percent, Built_2020_2km_percent: Built-up area percentage at a 2km radius for the years 1975 and 2020.
- Percentage_Change_2km: Percentage change in built-up area at a 2km radius from 1975 to 2020.
- USCRN_Station: Whether the station is part of the US Climate Reference Network.
- BI_2012, BI_2017, BI_2020, BI_2023: Brightness values for different years.
- BI_change_decade: Decadal change in brightness values from 2012 to 2023.
- BI_2020_lsq: Linear least squares estimate for brightness in 2020, calculated using all available BI values.
- BI_trend_lsq: Linear trend of brightness over the years, derived using linear regression.
GEE Data Sources
GHCN_US-vs-global.ipynb Notebook
This notebook provides a comparative analysis of temperature trends between US stations and global stations, using GHCN data. It includes visualizations and statistical comparisons to understand how temperature trends differ regionally and globally.
How to Use
- To analyze brightness or built-up area data, use the Extract_BI_for_GHCN.ipynb notebook, which provides step-by-step instructions on extracting and processing the data.
- The CSV file GHCNv4_stations_with_BI_BU_orwell2022.csv can be used directly for statistical analysis or visualization in any tool of choice, such as Python (Pandas), R, or Excel.
- Use the file as a metadata file for the GHCNv4 temperature file by NASA-GISS, merging the two by station ID.
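A merge by station ID might look like the following pandas sketch. The metadata rows are made up and read from an in-memory string so that the example runs on its own. The `temps` table, with its `year` and `T` columns, is a hypothetical stand-in for the temperature file after parsing, since the actual layout of that file is not described here.

```python
import io
import pandas as pd

# Tiny stand-in for GHCNv4_stations_with_BI_BU_orwell2022.csv (made-up rows)
meta_csv = io.StringIO(
    "ID,Station,Lat,Lon,BI_2020,Built_2020_10km_percent\n"
    "STN00000001,Example A,40.0,-105.0,0.4,0.2\n"
    "STN00000002,Example B,48.1,11.6,35.0,22.5\n"
)
meta = pd.read_csv(meta_csv)

# Hypothetical temperature table, already parsed into long format
temps = pd.DataFrame({
    "ID": ["STN00000001", "STN00000001", "STN00000002", "STN00000003"],
    "year": [2019, 2020, 2020, 2020],
    "T": [9.8, 10.1, 11.3, 5.0],
})

# Left join keeps every temperature record; the indicator column
# flags records whose station has no metadata row
merged = temps.merge(meta, on="ID", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "ID"].unique()

print(merged[["ID", "year", "T", "BI_2020"]])
print("No metadata for:", list(unmatched))
```

A left join is used here so that temperature records are never silently dropped; stations without BI or BU values simply carry missing values in the metadata columns.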
Some pictures and examples


Interactive map showing all stations which deviate from being BI = 0-0.1 in both the old (NASA) and the new (orwell2022) BI analysis.

Mapping Brightness and Built-Up Indexes Across the Globe
In this analysis, we explore how NASA’s Brightness Index (BI) compares to actual built-up areas around meteorological stations. Using data from NASA’s GHCNv4 and VIIRS/NOAA, we mapped stations with minimal urbanization (BU_10km ≤ 1%).
Colors Explained:
- Purple: Stations with a NASA-flagged BI above 15.
- Red: Stations with a NASA-flagged BI between 10 and 15.
- Blue: Stations with a NASA-flagged BI between 6 and 10.
Despite these classifications, all stations shown here have both BI_2020 < 1 and BU_10km < 1%, meaning they are in deep rural areas. This highlights how NASA’s BI misclassifies these locations, suggesting the need for improved metrics (use of GHSL_S etc) to distinguish urban from rural areas.
Footnote: The colors are used to quickly highlight the most extreme misclassifications. All points are rural, with BU_10km < 1%.
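The selection behind such a map can be sketched with pandas. The rows below are made up. It is assumed that BU_10km corresponds to the Built_2020_10km_percent column and that the color bins are closed on the right (6 to 10, above 10 to 15, above 15), since the exact boundary handling is not stated.

```python
import numpy as np
import pandas as pd

# Made-up rows using column names from the CSV
df = pd.DataFrame({
    "ID": ["A", "B", "C", "D", "E"],
    "BI": [20, 12, 7, 0, 30],  # old NASA-GISS BI
    "BI_2020": [0.3, 0.8, 0.1, 0.2, 40.0],
    "Built_2020_10km_percent": [0.1, 0.5, 0.0, 0.2, 25.0],
})

# Keep only stations that are rural by both new measures
rural = df[(df["BI_2020"] < 1) & (df["Built_2020_10km_percent"] < 1)].copy()

# Assign a color by the old BI value; values below 6 get no color
rural["color"] = pd.cut(
    rural["BI"],
    bins=[6, 10, 15, np.inf],
    labels=["blue", "red", "purple"],
    include_lowest=True,
)
flagged = rural.dropna(subset=["color"])
print(flagged[["ID", "BI", "color"]])
```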

Correlation plot BI_2020_sqr versus BU_2km, colored by old BI values. 🔴 for BI>6.
Brightness Index (BI) is nearly a random number generator, failing to rigorously separate Rural 🔴 from Urban ⚫️ stations (“R”, “U” mark the ensemble’s weight centers). This is why no difference appears in the temperature curves either. Proper GHSL data should be used.

Trend lines (added feature on Nov 6th 2024, due to US Election Boredout Syndrome)
Motivation - to compare with Java. E.g. here

What can you do with this?
Extract trend slopes for various BU bins with a flexible start point.
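As a rough sketch of what extracting slopes per BU bin with a flexible start year can mean, the example below fits a straight line to the yearly mean anomaly in each bin. The function `slopes_by_bu_bin`, the column names `year`, `bu` and `anomaly`, and the synthetic data are all assumptions for illustration; the notebook itself may work differently.

```python
import numpy as np
import pandas as pd

def slopes_by_bu_bin(df, start_year, bu_edges):
    """Least-squares trend (per decade) of the mean anomaly in each BU bin."""
    recent = df[df["year"] >= start_year].copy()
    recent["bu_bin"] = pd.cut(recent["bu"], bins=bu_edges, include_lowest=True)
    out = {}
    for bu_bin, grp in recent.groupby("bu_bin", observed=True):
        yearly = grp.groupby("year")["anomaly"].mean()
        if len(yearly) < 2:
            continue  # a line cannot be fitted to a single year
        x = yearly.index.to_numpy(dtype=float)
        slope = np.polyfit(x, yearly.to_numpy(), 1)[0]
        out[str(bu_bin)] = 10.0 * slope  # convert per year to per decade
    return out

# Synthetic data: one rural and one urban station with known trends
years = np.arange(1980, 2021)
rural = pd.DataFrame({"year": years, "bu": 0.5, "anomaly": 0.01 * (years - 1980)})
urban = pd.DataFrame({"year": years, "bu": 20.0, "anomaly": 0.03 * (years - 1980)})
data = pd.concat([rural, urban], ignore_index=True)

trends = slopes_by_bu_bin(data, start_year=1990, bu_edges=[0, 1, 100])
for bu_bin, slope in trends.items():
    print(f"BU bin {bu_bin}: {slope:.2f} per decade")
```

Changing `start_year` shifts the fitting window without touching the binning, which makes it easy to check how sensitive the slopes are to the chosen start point.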

The analysis on the right is done ad hoc (just feed GPT with the trend values and do the plot). It depends on the experiment performed.

Comparing adjusted versus raw.

Note on output file GHCNv4_stations_with_BI_BU_orwell2022.csv
https://github.com/orwell2024/GHCN-tools/blob/main/GHCNv4_stations_with_BI_BU_orwell2022.csv
The image provides a global map of GISS GHCN stations, comparing them against the GHSL built-up land data retrieval process.
Gray dots represent all existing stations in the dataset.
Red dots highlight 386 stations missing from the fetched GHSL dataset, meaning built-up data could not be retrieved for these locations.
The missing stations are primarily located in polar regions (e.g., above ~70°N and Antarctica), suggesting a limitation in GHSL data coverage for extreme latitudes.
This issue could be due to GHSL dataset gaps where satellite-derived built-up data is unavailable in remote locations.
- Gray dots: Existing stations.
- Red dots: Stations missing from the fetched GHSL dataset.
- Total stations in fetched dataset: 27,519.
- Stations missing from the fetched GHSL list: 386.
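A check of this kind can be sketched as a simple comparison of station IDs. The station rows and the set of fetched IDs below are made up, and the 70 degree threshold on absolute latitude is only an assumed proxy for "polar".

```python
import pandas as pd

# Made-up station list with the ID and Lat columns from the CSV
all_stations = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "Lat": [52.5, 78.2, -75.1, 35.0],
})
fetched_ids = {"A", "D"}  # hypothetical IDs for which GHSL values came back

# Stations in the full list that are absent from the fetched set
missing = all_stations[~all_stations["ID"].isin(fetched_ids)]
# Of those, the ones at extreme latitudes (threshold is an assumption)
polar = missing[missing["Lat"].abs() > 70]

print(f"{len(missing)} missing, {len(polar)} of them beyond 70 degrees latitude")
```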
