Background

The following is a data analysis of nitrate concentration spikes in stream water after rainfall events at a location in Illinois. Nitrate data come from the US Geological Survey (USGS); precipitation data come from Daymet.

Because not every county has a nitrate monitoring location, we focused on a case study of the single location with the most data. To find it, we used the following criteria:

  1. Monitors USGS parameter code “99133” (nitrate, in milligrams per liter)

  2. Is still active today

  3. Recorded at least one nitrate spike above the federal threshold of 10 mg/L this year

  4. Has data going back as far as possible (the earliest start date among the sites that meet criteria 1 to 3)
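To make the order of these criteria concrete, here is a minimal base-R sketch on a small, made-up site table. The column names (site_no, parm_cd, end_date, begin_date, peak) and every value are hypothetical, chosen only to show how the filters combine; this is not real USGS data and not the code the analysis actually runs.

```r
# Hypothetical site table: every value here is invented for illustration
sites <- data.frame(
  site_no    = c("A", "B", "C", "D", "E"),
  parm_cd    = c("99133", "99133", "00060", "99133", "99133"),
  end_date   = as.Date(c("2022-02-03", "2022-02-03", "2022-02-03",
                         "2019-06-30", "2022-02-03")),
  begin_date = as.Date(c("2012-04-01", "2015-10-01", "2001-01-01",
                         "2005-03-15", "2008-07-01")),
  peak       = c(12.4, 9.1, 15.0, 11.8, 10.6),  # this year's maximum nitrate, mg/L
  stringsAsFactors = FALSE
)
run_date <- as.Date("2022-02-03")  # assumed "today"

# Criteria 1-3: nitrate parameter, still active, spike above 10 mg/L
keep <- sites$parm_cd == "99133" &
  sites$end_date == run_date &
  sites$peak > 10
candidates <- sites[keep, ]

# Criterion 4: the longest record has the earliest begin_date
chosen <- candidates[order(candidates$begin_date), ][1, ]
chosen$site_no
```

Sites A and E pass criteria 1 to 3, and E wins because its record starts earlier. The function later in this section tests `>= 10` rather than `> 10`, so a peak of exactly 10 mg/L would count as a spike there.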

Part 1

First, we load the R libraries we need, including dataRetrieval, a package that lets us request USGS data from within R or RStudio.

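library(dataRetrieval) # requests USGS water data
library(tidyverse)     # dplyr, ggplot2, and other core packages
library(lubridate)     # date helpers such as year()
library(dygraphs)      # interactive time-series charts
library(sp)            # spatial data classes
library(rgeos)         # geometry operations (retired from CRAN in 2023)
library(xts)           # extended time-series objects
library(data.table)    # fast tabular data; note it masks lubridate's year()
library(DT)            # interactive HTML tables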
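Some of these packages may not install cleanly on a current R setup, so it can help to check which ones are missing before running the chunk. The sketch below uses a hypothetical helper, missing_packages(), introduced here for illustration; it relies only on base R's requireNamespace(), which returns FALSE instead of raising an error when a package cannot be loaded.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("dataRetrieval", "tidyverse", "lubridate", "dygraphs",
            "sp", "rgeos", "xts", "data.table", "DT")
missing_packages(needed)
```

Any name it returns can be passed to install.packages(); a package that is no longer on CRAN will need another source or a replacement.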

Next, we run the following code chunk. It uses the dataRetrieval function readNWISdata() to request all USGS monitoring locations in Illinois that report nitrate (parameter 99133), then keeps only the stream sites that are still active. The result gives each site's number, name, coordinates, and the date its record begins, which tells us how long it has been active.

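# Date the analysis was run; replace with Sys.Date() to rerun on current data
today <- as.Date("2022-02-03")

IL_site <- readNWISdata(stateCd = "IL", parameterCd = "99133",
                        service = "site", seriesCatalogOutput = TRUE) %>%
  filter(site_tp_cd == "ST") %>%            # stream sites only
  filter(as.Date(end_date) == today) %>%    # still reporting on the run date
  filter(parm_cd == "99133") %>%            # keep the nitrate series rows
  distinct(site_no, .keep_all = TRUE) %>%   # one row per site
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date)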
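The series catalog typically returns several rows per site (one per parameter and data type), which is why the chunk filters before calling distinct(). The base-R sketch below mimics those steps on a made-up catalog; the rows are invented, and `!duplicated()` stands in for dplyr's `distinct(site_no, .keep_all = TRUE)`.

```r
# Made-up series catalog: one row per site x parameter x data type
catalog <- data.frame(
  site_no      = c("111", "111", "111", "222", "333"),
  site_tp_cd   = c("ST", "ST", "ST", "ST", "GW"),
  parm_cd      = c("99133", "99133", "00060", "99133", "99133"),
  data_type_cd = c("uv", "dv", "uv", "uv", "uv"),
  begin_date   = c("2012-04-01", "2012-04-02", "1990-01-01",
                   "2016-05-01", "2010-01-01"),
  end_date     = rep("2022-02-03", 5),
  stringsAsFactors = FALSE
)
today <- "2022-02-03"  # assumed run date

active <- catalog[catalog$site_tp_cd == "ST" &   # streams only
                  catalog$end_date == today &    # still reporting
                  catalog$parm_cd == "99133", ]  # nitrate series
# keep the first row for each site, like distinct(site_no, .keep_all = TRUE)
one_per_site <- active[!duplicated(active$site_no), ]
one_per_site[, c("site_no", "begin_date")]
```

Because only the first matching row survives, the begin_date kept for a site with more than one nitrate series depends on row order.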

We then wrote a function that does the following:

  1. Takes the site numbers from the table of active Illinois sites and requests this year's instantaneous nitrate readings for them.

  2. Aggregates the readings to find each site's maximum nitrate value, yearlyPeak, discarding peaks above 40 mg/L as suspect.

  3. Adds a column flagging whether each site's yearlyPeak was at or above 10 mg/L.

  4. Removes the sites whose peak did not reach the federal threshold this year.

  5. Sorts the remaining sites from oldest to newest record and keeps the first one.

This means the function chooses the one location in the state whose data go back the furthest among those that had a nitrate spike above the federal threshold.
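Steps 2 to 4 can be tried on toy data before calling the USGS service. The base-R sketch below uses a few invented instantaneous readings for three hypothetical sites; aggregate() plays the role of group_by() plus summarise(), and the 40 mg/L screen and 10 mg/L flag mirror the cutoffs in the function.

```r
# Invented instantaneous readings (mg/L) for three hypothetical sites
iv <- data.frame(
  site_no  = c("111", "111", "111", "222", "222", "333", "333"),
  dateTime = as.POSIXct(c("2021-03-01 00:00", "2021-05-10 12:00",
                          "2021-06-02 06:00", "2021-04-15 00:00",
                          "2021-06-20 18:00", "2021-05-05 00:00",
                          "2021-05-05 00:15"),
                        format = "%Y-%m-%d %H:%M", tz = "UTC"),
  nitrate  = c(6.2, 11.5, 9.8, 7.4, 8.9, 12.0, 55.0),
  stringsAsFactors = FALSE
)
iv$year <- as.integer(format(iv$dateTime, "%Y"))

# Step 2: maximum reading per site and year
peaks <- aggregate(nitrate ~ site_no + year, data = iv, FUN = max)
names(peaks)[names(peaks) == "nitrate"] <- "yearlyPeak"

# Same ad hoc screen as the function: drop peaks above 40 mg/L
peaks <- peaks[peaks$yearlyPeak <= 40, ]

# Steps 3-4: flag peaks at or above 10 mg/L and keep only flagged sites
peaks$illegal <- ifelse(peaks$yearlyPeak >= 10, "Yes", "No")
flagged <- peaks[peaks$illegal == "Yes", ]
flagged
```

Because the 40 mg/L screen is applied to the yearly maximum, a single suspect reading (site 333 here) removes the whole site, including its 12.0 mg/L reading. Screening individual readings before taking the maximum would keep such a site.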

state_function <- function(sites) {
  # sites: table of active nitrate sites from the previous chunk

  # instantaneous nitrate readings, 2021-01-01 through 2021-08-18
  iv <- readNWISdata(siteNumbers = sites$site_no, parameterCd = "99133",
                     startDate = "2021-01-01", endDate = "2021-08-18",
                     service = "iv")

  # aggregate to each site's yearly maximum
  # (lubridate:: is explicit because data.table also exports year())
  iv$year <- lubridate::year(iv$dateTime)

  yearPeak <- iv %>%
    group_by(site_no, year) %>%
    summarise(yearlyPeak = max(X_99133_00000, na.rm = TRUE),
              .groups = "drop") %>%
    filter(yearlyPeak <= 40) %>%   # ad hoc screen for implausibly high peaks
    select(-year)

  # join peaks to the site table, sort oldest first,
  # and keep the oldest site with a peak at or above 10 mg/L
  top_site <- sites %>%
    right_join(yearPeak, by = "site_no") %>%
    arrange(begin_date) %>%
    mutate(illegal = case_when(
      yearlyPeak >= 10 ~ "Yes",
      yearlyPeak < 10 ~ "No")) %>%
    filter(illegal == "Yes") %>%
    select(-illegal) %>%
    head(1) # this is our top location in this state

  return(top_site)
}

# The following site has the data we are looking for:
IL_site <- state_function(IL_site) # 03336890 (IL)

This is the location we will analyze in this case study:

Located in Champaign County, IL: