Analyzing Data for Racial Equity

Module 2: Interrogating Mechanism

Module 1: Identifying Injustice Recap

  • Racial equity is a practice, not a target
  • Racial inequity is a difference, or disparity, in quality of service or access to resources
  • Disparities can represent injustice
  • Identifying the mechanism of injustice helps us:
    • Contemplate effective interventions
    • Identify other places affected by the same mechanism

Module 1: Identifying Injustice Recap

As of 2024, California had 220 failing drinking water systems serving nearly half a million people.

Lamont has received $25 million in Water Board funding to help fix its failing drinking water system, which included three wells exceeding the Maximum Contaminant Level (MCL) for arsenic and 1,2,3-trichloropropane.

Module 1: Identifying Injustice Recap

We found that people living in the city center are more likely to use public water, and that in Lamont, those people are more likely to be Hispanic in origin.

We also know from the linked press release that mitigation efforts in Lamont included the destruction of three 45-year-old wells that exceeded the state Maximum Contaminant Levels (MCL) for arsenic and 1,2,3-trichloropropane.

Root cause analysis

“Technique that helps identify the fundamental reasons, or root causes, of a problem or unwanted outcome”

A few considerations:

  • Your team should include community partners
  • Allocate enough time to be thoughtful and pursue leads
  • Honor hard truths!

Root cause analysis

Methods for interrogating the problem

What are the root causes of contamination and exposure to it?

Take 10 minutes to discuss the root causes of contaminated water and exposure to it in Lamont. Some questions to consider include:

  • How are people exposed to the contaminated water?
  • How did contaminants get into the water?
  • Why hadn’t contaminants been removed from the water?

Use this press release for extra context, if needed.

One person from each group will share a summary of their group’s discussion when we reconvene.

How are people exposed to the contaminated water?

  • Public water use, which as we learned, is a function of where you live
  • Why do people live where they live?
    • Redlining
    • Tribal lands
    • Proximity to work

How did contaminants get into the water?

  • Age of wells
  • Concentration of agricultural activity

Why hadn’t contaminants been removed from the water?

  • Permitting
  • Cooperation
  • Limited resources

Scaling intervention

Where else is the mechanism at play?

SAFER has funded 15 water system consolidation projects in Kern County, where Lamont is located. That’s about 16 percent of SAFER-funded consolidations statewide.

Is there a common factor affecting water systems in Kern County that might also affect water systems in other counties?

🛎️ Chime in or share your ideas in the chat.

Where else is the mechanism at play?

Let’s focus on one of the mechanisms we suspect led to contamination: agricultural activity. If we can identify other places where agricultural activity occurs, then we might find other places where the water is contaminated.

What data might help us understand the relationship between agriculture and water system health?

🛎 Chime in or share your ideas in the chat.

Mapping notes: Projections

  • Maps are flat, the earth is not
  • Projection determines how your mapping software translates the earth’s curvature into a flat map
  • Different projections won’t overlay cleanly
    • A lot of software, like R, will throw an error if you ask it to operate on map layers in different projections
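As a minimal sketch of what checking and matching projections looks like in R, the example below uses the sf package on two made-up points. The coordinates and object names are illustrative assumptions, not part of the workshop data.

```r
library(sf)

# Two made-up points in California, given as longitude/latitude (EPSG:4326)
pts_lonlat <- st_as_sf(
  data.frame(name = c("A", "B"), lon = c(-118.9, -121.5), lat = c(35.3, 38.6)),
  coords = c("lon", "lat"), crs = 4326
)

# The same points re-projected to California Albers (EPSG:3310)
pts_albers <- st_transform(pts_lonlat, 3310)

# The two layers now use different coordinate reference systems
same_crs_before <- st_crs(pts_lonlat) == st_crs(pts_albers)

# Transforming one layer to the other's CRS makes them comparable again
pts_back <- st_transform(pts_albers, st_crs(pts_lonlat))
same_crs_after <- st_crs(pts_back) == st_crs(pts_lonlat)
```

Comparing `st_crs()` on each layer before joining or overlaying is a cheap way to catch a mismatch early.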

Examples of different map projections

Mapping notes: Accessibility

Mapping notes: Storytelling

| Map type | Pros ✅ | Cons 🚫 |
| --- | --- | --- |
| Points | Communicates precise location | Illegible above a certain number of points |
| Dot density | Communicates approximate distribution | Imprecise, illegible below and above a certain number of points |
| Choropleth | Communicates approximate distribution, able to distill large values | Difficult to layer |

Pesticide use

Let’s use CalEnviroScreen data to examine patterns of pesticide use across the state.

Why CES?

  • Pesticide use is a proven vector of contamination
  • CES does a lot of pre-processing for us ❤️
    • Filters for most hazardous and/or volatile pesticide ingredients
    • Calculates percentiles for comparability
    • Includes population characteristics (and other indicators!) for more detailed analysis
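To illustrate why percentiles help with comparability, here is a rough sketch in base R that turns raw values into percentiles. The values are made up, and this is only one common way to compute a percentile; CES's actual method may differ.

```r
# Hypothetical pesticide use values for six tracts (made-up numbers)
pesticide_lbs <- c(0, 12, 250, 3, 1800, 40)

# Percentile here = share of tracts with a value at or below this tract's value
pesticide_pctl <- ecdf(pesticide_lbs)(pesticide_lbs) * 100

round(pesticide_pctl, 1)
```

Raw values that span several orders of magnitude end up on a single 0 to 100 scale, which is easier to map and to compare with other indicators.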

Pesticide use

First, let’s retrieve our data from CalEnviroScreen.

library(here)
library(sf)
library(dplyr)

# Extract data from ZIP archive
unzip(here("data/raw/calenviroscreen40shpf2021shp.zip"),
    exdir = here("data/raw/calenviroscreen40shpf2021shp"))
ces_shp <- here("data/raw/calenviroscreen40shpf2021shp/CES4 Final Shapefile.shp")

# Read shapefile into R, then transform it to the California Albers projection (EPSG:3310)
ces <- read_sf(ces_shp) %>%
    st_transform(3310)

Pesticide use

Now, let’s map pesticide use by tract.

library(ggplot2)

# Set some attractive default styles for our maps
plotTheme <- list(
  theme_void(),
  theme(
    plot.title = element_text(size = 14, face = "bold")
  )
)

# Add tracts shaded by pesticide use percentile
pesticide_use <- ggplot() + geom_sf(
  data = ces, color = "white", linewidth = 0.001, mapping = aes(fill = PesticideP)
) + scale_fill_distiller(
  # Generate a legible color scale from ColorBrewer
  palette = "Greens", direction = 1
) + labs(
  # Add a title and subtitle
  title = "Agricultural Pesticide Use by Census Tract",
  subtitle = "Darker shade = higher percentile = more intensive pesticide use"
) + plotTheme

pesticide_use

Pesticide use

Let’s try a version of the map that only shows tracts at or above the 75th percentile to really emphasize heavy users.

top_pesticide_use <- ggplot() +
  # Base map of all tracts
  geom_sf(data = ces, color = "lightgray") +
  # Filter data to only include tracts with pesticide use >= 75th percentile
  geom_sf(
    data = ces %>% filter(PesticideP >= 75),
    color = "white", linewidth = 0.001, mapping = aes(fill = PesticideP)
  ) +
  scale_fill_distiller(palette = "Greens", direction = 1) +
  labs(
    title = "Agricultural Pesticide Use by Census Tract - Top 25%",
    subtitle = "Darker shade = higher percentile = more intensive pesticide use"
  ) +
  plotTheme

top_pesticide_use

Failing water systems

Now, let’s take a look at SAFER data.

library(readr)

safer_ra <- read_csv(here("data/raw/SAFER_RA.csv"))

# Create geospatial version of SAFER data: drop rows whose coordinates are recorded as 0,
# read lat/lon as EPSG:4326, then transform to the California Albers projection (EPSG:3310)
safer_ra_as_sf <- safer_ra %>%
  filter(LATITUDE_MEASURE != 0, LONGITUDE_MEASURE != 0) %>%
  st_as_sf(coords = c("LONGITUDE_MEASURE", "LATITUDE_MEASURE"), crs = 4326) %>%
  st_transform(3310)

Failing water systems

Plot the location of failing water systems on a map.

safer_point_map <- ggplot() +
  # Add a tract base map
  geom_sf(data = ces, color = "lightgray") +
  # Add a point layer showing failing water systems
  geom_sf(
    data = safer_ra_as_sf %>% filter(CURRENT_FAILING == "Failing"),
    colour = "darkred", alpha = 0.5
  ) +
  labs(title = "Failing Water Systems", subtitle = "Status determined by 2024 SAFER Risk Analysis") +
  plotTheme

safer_point_map

More pesticide use = more failing water systems?

Let’s overlay failing systems with pesticide use to see if we can identify a relationship.

ggplot() +
  # Tract base map
  geom_sf(data = ces, color = "lightgray") +
  # Tracts at or above the 75th percentile of pesticide use
  geom_sf(
    data = ces %>% filter(PesticideP >= 75),
    color = "white", linewidth = 0.001, mapping = aes(fill = PesticideP)
  ) +
  scale_fill_distiller(palette = "Greens", direction = 1) +
  # Failing water systems
  geom_sf(
    data = safer_ra_as_sf %>% filter(CURRENT_FAILING == "Failing"),
    colour = "darkred", alpha = 0.5
  ) +
  labs(title = "Pesticide Use and Failing Water Systems") +
  plotTheme

More pesticide use = more failing water systems?

Yikes! Red points on green tracts are hard to tell apart for readers with red-green color blindness. Let’s make a colorblind-friendly version.

ggplot() +
  # Tract base map
  geom_sf(data = ces, color = "lightgray") +
  # Tracts at or above the 75th percentile of pesticide use, now in blues
  geom_sf(
    data = ces %>% filter(PesticideP >= 75),
    color = "white", linewidth = 0.001, mapping = aes(fill = PesticideP)
  ) +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  # Failing water systems, now in orange
  geom_sf(
    data = safer_ra_as_sf %>% filter(CURRENT_FAILING == "Failing"),
    colour = "darkorange", alpha = 0.5
  ) +
  labs(title = "Pesticide Use and Failing Water Systems") +
  plotTheme

More pesticide use = more failing water systems?

We can see a relationship, but how strong is it? Let’s do a quick statistical analysis to see whether there’s a significant relationship between pesticide use and failing water systems.

First, we need to aggregate the SAFER data by tract.

# Perform geospatial join of water systems and tracts with CES data
systems_with_tracts <- st_join(safer_ra_as_sf, ces)

# Summarize number of systems, number of failing systems, and percentage of
# systems failing by tract
failing_systems_by_tract <- systems_with_tracts %>%
    group_by(Tract) %>%
    summarize(
        n_systems = n(),
        n_failing = sum(CURRENT_FAILING == "Failing", na.rm = TRUE),
        pct_failing = n_failing / n_systems * 100
    )

# Join summary data to tracts with CES, so that we can use both CES data and
# summary values for analysis
tracts_with_failing_systems <- st_join(ces, failing_systems_by_tract)
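To see what the point-to-tract join and summary do, it can help to run them on a tiny example. The sketch below uses two made-up square "tracts" and three made-up water systems; all names and values are illustrative assumptions rather than SAFER or CES data.

```r
library(sf)
library(dplyr)

# Helper (hypothetical) that builds a 10 x 10 square starting at xmin
square <- function(xmin) {
  st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 10, 0), c(xmin + 10, 10), c(xmin, 10), c(xmin, 0)
  )))
}

# Two made-up tracts side by side, in California Albers (EPSG:3310)
tracts <- st_sf(
  Tract = c("T1", "T2"),
  geometry = st_sfc(square(0), square(10), crs = 3310)
)

# Three made-up water systems: two fall inside T1, one inside T2
systems <- st_as_sf(
  data.frame(
    id = c("a", "b", "c"),
    status = c("Failing", "Not failing", "Failing"),
    x = c(2, 5, 15), y = c(5, 5, 5)
  ),
  coords = c("x", "y"), crs = 3310
)

# Each point picks up the attributes of the tract it falls in
joined <- st_join(systems, tracts)

# Count systems and failing systems per tract (geometry dropped for a plain table)
by_tract <- joined %>%
  st_drop_geometry() %>%
  group_by(Tract) %>%
  summarize(
    n_systems = n(),
    n_failing = sum(status == "Failing"),
    pct_failing = n_failing / n_systems * 100
  )
```

Dropping the geometry before summarizing is a choice made here to keep the toy output a plain table; the workshop code keeps the geometry so the summary can be joined back to tracts spatially.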

More pesticide use = more failing water systems?

Next, we’ll perform the analysis.

| Measure | Definition | Result |
| --- | --- | --- |
| Correlation | Strength of relationship (0-1) | 0.22 (weak positive) |
| R-squared | Share of variation in the outcome explained by the factor (0-1) | 0.049 (4.9%) |
| P-value | Probability of seeing a relationship this strong by chance if none existed (0-1) | 0.00000000000000045 (highly significant ***) |
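For readers who want a feel for where measures like these come from, here is a minimal sketch using base R on made-up data. It is not the workshop's analysis code: the variable names and simulated values are assumptions, and the numbers it produces will not match the table above.

```r
set.seed(42)  # make the made-up data reproducible

# Hypothetical tract-level data: pesticide percentile and percent of systems failing
n <- 500
pesticide_pctl <- runif(n, 0, 100)
noise <- rnorm(n, mean = 10, sd = 20)
pct_failing <- pmin(pmax(0.1 * pesticide_pctl + noise, 0), 100)

# Correlation: direction and strength of the linear relationship
test <- cor.test(pesticide_pctl, pct_failing)
correlation <- unname(test$estimate)
p_value <- test$p.value

# R-squared: share of variation in the outcome explained by a simple linear model
model <- lm(pct_failing ~ pesticide_pctl)
r_squared <- summary(model)$r.squared

c(correlation = correlation, r_squared = r_squared, p_value = p_value)
```

With a single predictor, R-squared is simply the correlation squared, which is why a weak correlation can still be highly significant when there are many tracts yet explain only a small share of the outcome.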

Curious how we got here? Check out the code on GitHub!

What should we do?

  • What do we have the authority to do? What cause/s can we address?
  • What interventions are already in place?
  • Determine relationship to existing interventions

🛎️ Chime in or share your ideas in the chat.

Filling the gaps

Let’s target failing water systems in tracts with high pesticide use (at or above the 50th percentile) that have not received SAFER funding.

unfunded_failing_systems <- systems_with_tracts %>%
  # Apply filters
  filter(CURRENT_FAILING == "Failing", FUNDING_RECEIVED_SINCE_2017 == 0, PesticideP >= 50) %>%
  # Retrieve subset of columns
  select(WATER_SYSTEM_NUMBER, PL_ADDRESS_CITY_NAME, POPULATION, SERVICE_CONNECTIONS, FUNDING_RECEIVED_SINCE_2017, SERVICE_AREA_ECONOMIC_STATUS, PesticideP) %>%
  # Sort by highest population first
  arrange(desc(POPULATION))

Filling the gaps

| WATER_SYSTEM_NUMBER | PL_ADDRESS_CITY_NAME | POPULATION | SERVICE_CONNECTIONS | FUNDING_RECEIVED_SINCE_2017 | SERVICE_AREA_ECONOMIC_STATUS | PesticideP | geometry |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CA1610005 | LEMOORE | 27,185 | 7,474 | 0 | DAC | 61 | POINT (18109 -191201) |
| CA1510021 | WASCO | 22,757 | 5,361 | 0 | SDAC | 77 | POINT (59192 -269383) |
| CA1010018 | KERMAN | 17,256 | 4,094 | 0 | SDAC | 95 | POINT (-5669 -143576) |
| CA1510013 | MCFARLAND | 14,161 | 2,849 | 0 | SDAC | 82 | POINT (69475 -259710) |
| CA3610850 | CHINO | 10,667 | 1,912 | 0 | Non-DAC | 84 | POINT (214349 -445739) |

Filling the gaps

Finally, let’s create a bubble map of those failing water systems, where the point is scaled to the population served.

# Define custom breaks to better show variation
BREAKS <- c(0, 1000, 2500, 5000, 10000, 20000)

ggplot() +
  # Re-project the tract base map to 4326, since we will be mapping lat/lon
  geom_sf(data = ces %>% st_transform(4326), color = "lightgray") +
  # Use geom_point() instead of geom_sf(), because the latter doesn't behave
  # correctly with the size aesthetic
  geom_point(
    # Join failing system data to original SAFER data to retrieve lat/lon;
    # columns present in both tables get .x/.y suffixes (hence POPULATION.x)
    data = unfunded_failing_systems %>%
      inner_join(safer_ra, by = "WATER_SYSTEM_NUMBER") %>%
      filter(LATITUDE_MEASURE != 0) %>%
      arrange(POPULATION.x),
    alpha = 0.75,
    shape = 21,
    stroke = 0,
    # Scale and color points by population
    mapping = aes(size = POPULATION.x, fill = POPULATION.x, x = LONGITUDE_MEASURE, y = LATITUDE_MEASURE)
  ) +
  scale_fill_fermenter(
    type = "seq",
    palette = "YlGnBu",
    # Pass custom breaks into our color and size scale functions
    breaks = BREAKS,
    direction = 1,
    name = "Population",
    # Use the same name and guide on both scales so that color and size
    # are shown on the same legend
    guide = guide_legend(override.aes = list(alpha = 1, stroke = 0.5, color = "black"))
  ) +
  scale_size_binned(
    breaks = BREAKS,
    name = "Population",
    guide = guide_legend(override.aes = list(alpha = 1, stroke = 0.5, color = "black"))
  ) +
  labs(title = "Unfunded Failing Water Systems\nin Tracts with High Pesticide Use", subtitle = "Systems scaled and colored by population served") +
  plotTheme

Putting it all together

Once you’ve identified your candidate water systems, repeat the original analysis with your reproducible code!

  • You may find a dozen places that “look” the same as Lamont
  • Prioritize places where you see disparity
  • Validate your hypothesis that the same mechanism is at work by engaging with the community, peers, etc.

Where do we go from here?

Is our analysis complete? What are we missing? What assumptions are we making? What would you do next?

🛎️ Chime in or share your ideas in the chat.

Where do we go from here?

  • There are lots of wrong ways, but also lots of right ways
  • Prioritize community engagement and partnerships
    • Validate and/or sharpen findings from data analysis
    • Address mechanisms outside your remit
    • Get at the full root cause, not just the water-shaped parts of it

Additional resources

You are not alone!