Background

The following is an analysis of Environmental Protection Agency’s Greenhouse Gas Reporting Program. In 2009, the EPA published a rule (40 CFR Part 98) for the mandatory reporting of GHG from sources that “in general emit 25,000 metric tons or more of carbon dioxide equivalent per year.” According to the agency, around 85-90% of the country’s total emissions are covered by the 8000 facilities in the program.

Methodology

For this project, we focused on finding the largest emitters in Colorado. The latest available data covers 2011-2022. After importing the file, we filtered for the state and we normalized some of the text (like names and addresses) by making everything uppercase. We also replaced some NAs for 0s in the number fields to perform some calculations.

library(tidyverse)
library(readxl)
library(janitor)
library(data.table)
library(DT)

# import emissions data
data <- read_xlsx("01-data-raw/2022_data_summary_spreadsheets/ghgp_data_by_year_2022.xlsx", sheet = "Direct Emitters", skip = 3) %>% 
  clean_names() %>%
  filter(state == "CO") %>% # 189 obs
  mutate(facility_name = toupper(facility_name),
         city = toupper(city),
         address = toupper(address),
         county = gsub("COUNTY", "", county),
         county = toupper(county)) %>%
  mutate_at(vars(14:25), ~ ifelse(is.na(.), 0, .)) # convert NAs to 0s

This leaves us with 189 facilities in Colorado. A facility, as defined by the program, can be fully or partly owned by a company, but more on that later.

After the initial cleaning, we added the emissions over the entire data period. We also calculated each facilities’ contribution of emissions for the latest year (2022) and overall period (2011-2022).

# GHG data has units of metric tons of carbon dioxide equivalent
data <- data %>%
  mutate(historic_emissions = x2022_total_reported_direct_emissions +
           x2021_total_reported_direct_emissions +
           x2020_total_reported_direct_emissions +
           x2019_total_reported_direct_emissions +
           x2018_total_reported_direct_emissions +
           x2017_total_reported_direct_emissions +
           x2016_total_reported_direct_emissions +
           x2015_total_reported_direct_emissions +
           x2014_total_reported_direct_emissions +
           x2013_total_reported_direct_emissions +
           x2012_total_reported_direct_emissions +
           x2011_total_reported_direct_emissions,
         pct_emissions_2022 = round((x2022_total_reported_direct_emissions/sum(x2022_total_reported_direct_emissions, na.rm = T))*100, 2),
         pct_emissions_hist = round((historic_emissions/sum(historic_emissions, na.rm = T))*100, 2))

# get top 10 emitters over entire time period
top_10_historic <- data %>%
  arrange(-historic_emissions) %>%
  head(10)

We used the “historic emissions” total to then find the 10 largest emitters. These 10 facilities contributed to 67.33% of all the emissions reported to this program in the state over 2011-2022.

According to the 2023 Colorado GHG Inventory, the state emitted 128.901 MMT COe (excluding LULUCF) in 2020. The top 10 facilities contributed to 20.35% of this.

All of these are power plants and they all have a NAICS industry code of 221112, which stands for “Fossil Fuel Electric Power Generation.” Here is a preview:

By converting the data from a “wide” to a “long” format, we can plot it as a timeseries.

# pivot from wide to long
data_long <- data %>%
  select(-historic_emissions, -pct_emissions_2022, -pct_emissions_hist) %>%
  pivot_longer(cols = starts_with("x20"), 
               names_to = "year", 
               values_to = "emissions")

data_long$year <- as.numeric(gsub("x([0-9]+)_.*", "\\1", data_long$year))

# filter for top 10 historic emitters
top_10_historic_timeseries <- data_long %>%
  filter(facility_id %in% top_10_historic$facility_id)

Similarly, we can aggregate data by industry type sectors and get a timeseries format.

industry <- data %>%
  group_by(latest_reported_industry_type_sectors) %>%
  summarise(total_emissions_2022 = sum(x2022_total_reported_direct_emissions, na.rm = T),
            total_emissions_historic = sum(historic_emissions, na.rm = T)) %>%
  mutate(pct_emissions_2022 = round((total_emissions_2022/sum(total_emissions_2022)*100),2),
         pct_emissions_historic = round((total_emissions_historic/sum(total_emissions_historic)*100),2))

industry_timeseries <- data_long %>%
  group_by(year, latest_reported_industry_type_sectors) %>%
  summarise(emissions = sum(emissions, na.rm = T))

Power plants, highlighted in orange above, are by far the largest emitting industry type sector. But they have also been reducing emissions steadily (with a bigger dip in 2020, during the COVID-19 pandemic).

From 2011 to 2022, power plants reduced emissions by 22.54%. The majority of the reduction can be attributed the top 10 facilities.

To finish this, we imported parent company data and we matched it to the top 10 facilities by using a unique identifier, Facility ID. Here is a preview:

parent_company <- read_xlsx("01-data-raw/ghgp_data_parent_company_09_2023.xlsx", sheet = "2022") %>%
  clean_names() %>%
  filter(ghgrp_facility_id %in% top_10_historic$facility_id)

Interactive Data Viz