airpurifyr: Open Air Quality in R

M. J. Lydeamore, D. Wu, J. P. Lakshika

23 October 2024

Air Quality

All sorts of processes release particles and gases into the air.

Together, measurements of these pollutants describe “air quality”.

Air quality has been linked with health conditions, life expectancy, mental health, poorer economic outcomes, global development indexes…

Air Quality

Typical measurements:

  • pm2.5/pm10: Particulate matter 2.5/10 microns or less in diameter. Typically comes from fires, industry, car exhausts, etc.
  • so2: Sulfur dioxide. Typically from oil refineries, diesel vehicles, coal power stations.
  • o3: Ozone. Typically from bushfires, power stations. We want ozone, but not too much!
  • no2: Nitrogen dioxide. Typically from car exhausts.
  • co: Carbon monoxide. Typically from wood smoke, car exhausts.
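The OpenAQ API refers to these pollutants by short parameter codes; the package examples use "pm25" for PM2.5. A small lookup like the one below can turn a `parameter` column into readable labels for plots. This is a sketch: `pollutant_labels` and `label_parameter()` are hypothetical names (not part of airpurifyr), and every code other than "pm25" is assumed from OpenAQ's naming rather than confirmed here.

```r
# Hypothetical lookup: OpenAQ parameter codes -> readable labels.
# Only "pm25" appears in the package examples; the other codes are assumed.
pollutant_labels <- c(
  pm25 = "PM2.5",
  pm10 = "PM10",
  so2  = "Sulfur dioxide",
  o3   = "Ozone",
  no2  = "Nitrogen dioxide",
  co   = "Carbon monoxide"
)

# Map a vector of codes to labels; unknown codes become NA
label_parameter <- function(code) {
  unname(pollutant_labels[code])
}

label_parameter(c("pm25", "o3"))
# [1] "PM2.5" "Ozone"
```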

Uses

Public Health

  • Association of Changes in Air Quality With Incident Asthma in Children in California, 1993-2014 (Garcia et al., JAMA 2019)
  • Uncertainty and Variability in Health-Related Damages from Coal-Fired Power Plants in the United States (Levy et al., Risk Analysis 2009)

Uses

Economics

  • A cost-effectiveness analysis of alternative air quality control strategies (Atkinson & Lewis, Journal of Environmental Economics and Management 1974)
  • Cost of economic growth: Air pollution and health expenditure (Chen & Chen, Science of the Total Environment 2021)

…and more

Air Quality

So people need this data. But how do they get it?

  • Ad-hoc
  • Single dataset/source
  • Range of dates/times
  • “Whatever is available”

So how is it collected?

Some governments (61% in this case) run programs to collect this data. Much of the global data comes from citizen science projects.

An air quality sensor

OpenAQ

OpenAQ is an environmental tech nonprofit.

It aggregates and harmonizes open air quality data from across the globe on an open-source, open-access data platform.

It provides a freely available API and data explorer.

The package

airpurifyr brings this API into R.

  • Uses httr (to be ported to httr2 one day)
  • Uses the v2 OpenAQ API (deprecated 18 days before this talk 😭)
  • Requires a free API key

Package example

The API is built around two concepts: locations and measurements.

australia_measurements <- get_measurements_for_location(
  country = "AU",
  max_observations = 1000,
  date_from = lubridate::ymd("2020-01-01"),
  date_to = lubridate::ymd("2020-01-14"),
  parameter = "pm25"
)

australia_measurements
# A tibble: 19,032 × 9
   location_id location parameter value date_utc            unit    lat  long
         <int> <chr>    <chr>     <dbl> <dttm>              <chr> <dbl> <dbl>
 1        2487 Bathurst pm25       18.9 2020-01-14 00:00:00 µg/m³ -33.4  150.
 2        2487 Bathurst pm25       19.4 2020-01-13 23:00:00 µg/m³ -33.4  150.
 3        2487 Bathurst pm25       20.2 2020-01-13 22:00:00 µg/m³ -33.4  150.
 4        2487 Bathurst pm25       20.9 2020-01-13 21:00:00 µg/m³ -33.4  150.
 5        2487 Bathurst pm25       22.1 2020-01-13 20:00:00 µg/m³ -33.4  150.
 6        2487 Bathurst pm25       23   2020-01-13 19:00:00 µg/m³ -33.4  150.
 7        2487 Bathurst pm25       23.7 2020-01-13 18:00:00 µg/m³ -33.4  150.
 8        2487 Bathurst pm25       24.6 2020-01-13 17:00:00 µg/m³ -33.4  150.
 9        2487 Bathurst pm25       25.3 2020-01-13 16:00:00 µg/m³ -33.4  150.
10        2487 Bathurst pm25       25.8 2020-01-13 15:00:00 µg/m³ -33.4  150.
# ℹ 19,022 more rows
# ℹ 1 more variable: country <chr>

Package example

Important

You will need to aggregate this data: sensors often report timestamps to the second, and these may be slightly off!

lubridate::floor_date() is great for this.
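A minimal sketch of that aggregation step, using made-up readings rather than real OpenAQ data: floor_date() snaps each timestamp down to the start of its hour, and the readings within each location-hour are then averaged. The column names mirror the measurements tibble above; taking the mean is one choice among several (the median or maximum may suit other questions).

```r
library(lubridate)

# Made-up readings for illustration only: timestamps a few seconds
# off the hour, as raw sensor data often is
raw_readings <- data.frame(
  location = "Sensor A",
  date_utc = ymd_hms(c(
    "2020-01-13 21:00:07", "2020-01-13 21:29:58",
    "2020-01-13 22:00:03", "2020-01-13 22:30:01"
  ), tz = "UTC"),
  value = c(20, 22, 18, 19)
)

# Snap each timestamp down to the start of its hour
raw_readings$hour_utc <- floor_date(raw_readings$date_utc, unit = "hour")

# Average all readings that fall in the same location-hour
hourly <- aggregate(value ~ location + hour_utc, data = raw_readings, FUN = mean)
hourly
```

With dplyr, the same step can be written as mutate(hour_utc = floor_date(date_utc, "hour")) followed by group_by(location, hour_utc) and summarise(value = mean(value)).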


Package example

locations_of_interest <- australia_measurements |>
  # East coast of Australia (roughly)
  dplyr::filter(long > 141, lat < -31) |>
  dplyr::distinct(location) |>
  dplyr::pull()

au_east_coast_2020 <- get_measurements_for_location(
  country = "AU",
  location = locations_of_interest,
  max_observations = 10000,
  date_from = lubridate::ymd("2019-12-01"),
  date_to = lubridate::ymd("2020-02-01"),
  parameter = "pm25"
)

Package example

library(dplyr)
library(ggplot2)

# Outlines of New South Wales and Victoria from ozmaps
states <- ozmaps::ozmap_states |>
  filter(NAME %in% c("New South Wales", "Victoria"))

# One row per unique station position
stations <- au_east_coast_2020 |>
  distinct(lat, long)

ggplot(states) +
  geom_sf() +
  geom_point(
    aes(x = long, y = lat),
    data = stations
  ) +
  theme_bw() +
  labs(x = "Longitude", y = "Latitude") +
  coord_sf()

Examples

# Daily weather near Melbourne (-37.8, 145) for 2020, requested in two halves
melb_weather_h1 <- get_data_drill(
  latitude = -37.8,
  longitude = 145,
  start_date = "20200101",
  end_date = "20200630",
  values = "all"
)
melb_weather_h2 <- get_data_drill(
  latitude = -37.8,
  longitude = 145,
  start_date = "20200701",
  end_date = "20201231",
  values = "all"
)
melb_weather <- dplyr::bind_rows(melb_weather_h1, melb_weather_h2)
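The two calls above split 2020 into half-year requests. If you need more windows, a helper along these lines can generate the start_date/end_date strings. It is a sketch: `date_windows()` is a hypothetical name, and the "YYYYMMDD" format is simply copied from the calls above.

```r
# Hypothetical helper: split [from, to] into consecutive windows and format
# the bounds as "YYYYMMDD" strings, the format used in the calls above
date_windows <- function(from, to, by = "6 months") {
  from <- as.Date(from)
  to <- as.Date(to)
  starts <- seq(from, to, by = by)
  # Each window ends the day before the next starts; the last ends at `to`
  ends <- c(starts[-1] - 1, to)
  data.frame(
    start_date = format(starts, "%Y%m%d"),
    end_date = format(ends, "%Y%m%d"),
    stringsAsFactors = FALSE
  )
}

date_windows("2020-01-01", "2020-12-31")
```

Each row can then be passed to get_data_drill(), for example inside lapply(), and the results combined with bind_rows().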

Examples

High pressure → temperature inversion → more ozone loss
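Relating weather such as pressure to ozone means putting both datasets on the same daily grain. A minimal sketch with made-up data, assuming the weather data has one row per local calendar day in a `date` column; `mslp` is a hypothetical pressure column name. Because date_utc is in UTC, timestamps are converted to Melbourne local dates before averaging; otherwise readings taken before about 10 or 11 am local time would be counted on the previous day.

```r
# Made-up hourly ozone readings (timestamps in UTC)
hourly_o3 <- data.frame(
  hour_utc = as.POSIXct(c("2020-01-01 02:00:00", "2020-01-01 12:00:00",
                          "2020-01-01 14:00:00"), tz = "UTC"),
  value = c(0.020, 0.030, 0.040)
)

# Made-up daily weather; `mslp` (pressure, hPa) is a hypothetical column name
weather <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02")),
  mslp = c(1015, 1022)
)

# Convert each UTC timestamp to a Melbourne local calendar date
hourly_o3$date <- as.Date(hourly_o3$hour_utc, tz = "Australia/Melbourne")

# Daily mean ozone, then join to the weather on date
daily_o3 <- aggregate(value ~ date, data = hourly_o3, FUN = mean)
combined <- merge(daily_o3, weather, by = "date")
combined
```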

Industrial fires

Thanh Cuong Nguyen & Arun Krishnasamy

Industrial fires

On July 11, there was a major factory fire in Brooklyn.

Nearby sensors

Pooja Rejendran Raju & Thi My Ngoc Tran

Nearby sensors

We can check geographical coherence: nearby sensors should report similar readings.
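One way to make "nearby" concrete is the great-circle distance between station coordinates. A sketch using the haversine formula with a mean Earth radius of 6371 km; `nearby_pairs()`, the 20 km threshold, and the stations themselves are hypothetical. It expects a data frame with location, lat and long columns, like the measurements tibble above.

```r
# Great-circle distance in km via the haversine formula
# (assumes a spherical Earth with mean radius 6371 km)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: every pair of stations within max_km of each other.
# Expects a data frame with location, lat and long columns (at least 2 rows).
nearby_pairs <- function(stations, max_km = 20) {
  pairs <- t(combn(nrow(stations), 2))  # each row is one (i, j) pair, i < j
  i <- pairs[, 1]
  j <- pairs[, 2]
  d <- haversine_km(stations$lat[i], stations$long[i],
                    stations$lat[j], stations$long[j])
  keep <- d <= max_km
  data.frame(
    from = stations$location[i[keep]],
    to = stations$location[j[keep]],
    distance_km = d[keep]
  )
}

# Made-up stations for illustration
toy_stations <- data.frame(
  location = c("A", "B", "C"),
  lat = c(-37.80, -37.85, -33.87),
  long = c(144.96, 145.00, 151.21)
)
nearby_pairs(toy_stations)
```

Readings from each returned pair can then be compared hour by hour; large, persistent differences between close neighbours are worth a closer look.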

Rush hour

Namandeep Kaur Saluja & Rowshni Farnaz Fatema

Rush hour

Can we detect extra pollution from the weekday “rush hour” in the Melbourne CBD?
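Answering this needs each reading labelled by local weekday and hour. Because date_utc is in UTC, it has to be converted to Melbourne time first. A sketch; `is_rush_hour()` is a hypothetical helper, the rush-hour windows (07:00-08:59 and 16:00-17:59) are assumptions, and public holidays are not excluded.

```r
# Hypothetical helper: TRUE for readings taken on a weekday during assumed
# rush hours (07:00-08:59 and 16:00-17:59 local time).
# Public holidays are not excluded.
is_rush_hour <- function(time_utc, tz = "Australia/Melbourne",
                         rush_hours = c(7, 8, 16, 17)) {
  lt <- as.POSIXlt(time_utc, tz = tz)  # UTC -> local clock time
  weekday <- lt$wday %in% 1:5          # wday: 0 = Sunday, 6 = Saturday
  weekday & lt$hour %in% rush_hours
}

times <- as.POSIXct(c("2020-01-13 21:00:00",   # Tue 14 Jan, 08:00 AEDT
                      "2020-01-13 22:00:00",   # Tue 14 Jan, 09:00 AEDT
                      "2020-01-17 21:00:00"),  # Sat 18 Jan, 08:00 AEDT
                    tz = "UTC")
is_rush_hour(times)
# [1]  TRUE FALSE FALSE
```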

Summary

  • OpenAQ gives reasonably clean air quality data
  • airpurifyr helps bring this into R
  • Plenty of hypotheses ready to explore

Next steps:

  • v3 API
  • httr2
  • More convenience cleaning/aggregating

Available at https://github.com/numbats/airpurifyr