Background

Data comes from EPA’s Greenhouse Gas Reporting Program.

EPA’s GHGRP includes “facility-level data covering largest emitters (usually over 25,000 MTCO2e)” and “approximately 7,600 facilities report direct emissions, covering approximately 50 percent of total U.S. emissions” according to their website.

That means it does not cover the full range of the country’s emissions like the Greenhouse Gas Inventory. But, unlike the GHG Inventory, this data set contains more specific information about the direct emitters.

Most of these direct emitters are power plants and large industrial facilities. In addition to the direct emitters, there are a couple of “suppliers of fuel and industrial gases” and “underground injection of carbon dioxide (Subpart UU).”

Those classifications don’t include a specific type for agriculture so we will focus on using NAICS to make our own division.

Loading

After importing the file, we have 8325 entries and 22 variables in a data set called yearly.

library(tidyverse)
library(readxl)
library(knitr)
library(DT)

yearly <- read_excel("Data/EPA/ghgp_data_by_year.xlsx", sheet = 1, skip = 3)

Cleaning

The following will rename columns for easier handling and change NAICS variable type to something we can use to filter the data set.

yearly <- select(yearly, 
                 Facility.ID = `Facility Id`,
                 Name = `Facility Name`,
                 Primary.NAICS = `Primary NAICS Code`,
                 Sector = `Latest Reported Industry Type (sectors)`,
                 e2019 = `2019 Total reported direct emissions`,
                 e2018 = `2018 Total reported direct emissions`,
                 e2017 = `2017 Total reported direct emissions`,
                 e2016 = `2016 Total reported direct emissions`,
                 e2015 = `2015 Total reported direct emissions`,
                 e2014 = `2014 Total reported direct emissions`,
                 e2013 = `2013 Total reported direct emissions`,
                 e2012 = `2012 Total reported direct emissions`,
                 e2011 = `2011 Total reported direct emissions`,
                 City,
                 State, 
                 Zip = `Zip Code`,
                 Address,
                 County, 
                 Latitude, 
                 Longitude)

yearly$Primary.NAICS <- as.factor(yearly$Primary.NAICS)

Now we have to filter those eight thousand facilities to Big Ag. I made two new columns that will detect NAICS code 311 and 11.

Those correspond to Food Manufacturing and Agriculture, Forestry, Fishing and Hunting.

More info about the codes can be found on the Census website.

Then I filtered the data set to only rows matching those NAICS codes into a new set called big_ag.

yearly$n311 <- str_detect(yearly$Primary.NAICS, "^311")
yearly$n11 <- str_detect(yearly$Primary.NAICS, "^11")

yearly_311 <- filter(yearly, n311 == T)
yearly_11 <- filter(yearly, n11 == T) 

big_ag <- rbind(yearly_311, yearly_11)

rm(yearly_311)
rm(yearly_11)

big_ag has 378 facilities, from which 372 are from Food Manufacturing and just 6 from Agriculture, Forestry, Fishing and Hunting.

(e2019 is emissions from 2019 and so on. R just doesn’t let you name a column with a name)

You can explore the table and download a copy over here:

Transforming

To find which facilities have been the biggest emitters, we will make a new column that sums the total emissions from 2011 through 2019. (Note: not all the facilities have reported emissions throughout all those years.)

big_ag <- mutate(big_ag, total.emissions = e2019+e2018+e2017+e2016+e2015+e2014+e2013+e2012+e2011)

In order to plot this data over time we need each year to be a row, not a column. So I transposed a simpler data set, only containing the facility ID and yearly emissions.

The reason for this is that in order to transpose we can only keep one variable tied to each emission number, and having the facility ID is better than the name since a company name can appear several times because they have more than one facility in the data set.

But that’s something we will change on the final graphics.

# this makes the data set with only the ID and emissions
simple <- select(big_ag,
                 Facility.ID,
                 total.emissions,
                 e2019,
                 e2018,
                 e2017,
                 e2016,
                 e2015,
                 e2014,
                 e2013,
                 e2012,
                 e2011)

# sort it by highest emitters (total emissions from 2011-2019)
simple <- arrange(simple, desc(total.emissions))

# transpose
transposed <- as.data.frame(as.matrix(t(simple[,-1])))

# store the IDs here 
n <- simple$Facility.ID

# now paste those back on each column
colnames(transposed) <- n

# and change the year names to each row
rownames(transposed) <- NULL
transposed$year <- c("total", 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011)
transposed <- transposed[,c(379,1:378)]

That transformed table looks like this below. This is what we can use on Flourish to see how each Big Ag facility reported emissions from 2011 to 2019. Each row represents one facility and its corresponding yearly emissions of CO2e.

Stats

With that table ready, we can also calculate some stats for the top emitter. The first row from the table above has the ID: 1005661, and if you look that up on the first table that belongs to: Archer Daniels Midland Co. located at 4666 FARIES PARKWAY in DECATUR, IL.

This plant polluted 4,402,190 tons of CO2 equivalent in 2019 compared to the 4,710,939 in 2011.

That represents a 6.6 percent decrease over that time period.

The following table has a column for the calculated percent changes in emissions 2011 to 2019. They are ordered by the biggest increase to biggest reduction. (Note: not all the facilities have reported emissions in both 2011 and 2019, their percent change will not appear.)

To give you a better idea, the 378 Big Ag facilities we filtered, emitted a total of 35,653,618 tons of CO2e in 2019. While the entire data set contains 2,622,343,176 tons of CO2e reported emissions in 2019.

The ADM Decatur plant represents 12.3 percent of Big Ag emissions from this specific data set.

The entire Big Ag emissions reported from these facilities represent 1.4 percent from the entire data set.

Graph

To see the context around this plant, I plotted the 5 Big Ag facilities with the highest total emissions from 2011 through 2019.

The graph shows that the the top 4 facilities belong to ADM.

Map

The following graph shows the distribution of Big Ag facilities from this data. The tower height represents emissions from 2019, the latest available year.