Analyzing Data for Racial Equity

Module 2: Interrogating Mechanism

Module 1: Identifying Injustice Recap

Racial equity is a practice, not a target
Racial inequity is a difference, or disparity, in quality of service or access to resources
Disparities can represent injustice
Identifying the mechanism of injustice helps us:
- Contemplate effective interventions
- Identify other places affected by the same mechanism

Module 1: Identifying Injustice Recap

As of 2024, California had 220 failing drinking water systems serving nearly half a million people.

Lamont has received $25 million in Water Board funding to help fix its failing drinking water system, which included three wells exceeding the Maximum Contaminant Level for arsenic and 1,2,3-trichloropropane.

Module 1: Identifying Injustice Recap

We found that people living in the city center are more likely to use public water, and that in Lamont, those people are more likely to be Hispanic in origin.

We also know from the linked press release that mitigation efforts in Lamont included the destruction of three 45-year-old wells that exceeded the state Maximum Contaminant Levels (MCL) for arsenic and 1,2,3-trichloropropane.

Root cause analysis

“Technique that helps identify the fundamental reasons, or root causes, of a problem or unwanted outcome”

A few considerations:

Your team should include community partners
Allocate enough time to be thoughtful and pursue leads
Honor hard truths!

Root cause analysis

Methods for interrogating the problem

What are the root causes of contamination and exposure to it?

Take 10 minutes to discuss the root causes of contaminated water and exposure to it in Lamont. Some questions to consider include:

How are people exposed to the contaminated water?
How did contaminants get into the water?
Why hadn’t contaminants been removed from the water?

Use this press release for extra context, if needed.

One person from each group will share a summary of their group’s discussion when we reconvene.

How are people exposed to the contaminated water?

Public water use, which, as we learned, is a function of where you live
Why do people live where they live?
- Redlining
- Tribal lands
- Proximity to work

How did contaminants get into the water?

Age of wells
Concentration of agricultural activity

Why hadn’t contaminants been removed from the water?

Permitting
Cooperation
Limited resources

Scaling intervention

Where else is the mechanism at play?

SAFER has funded 15 water system consolidation projects in Kern County, where Lamont is located. That’s about 16 percent of SAFER-funded consolidations statewide.

Is there a common factor affecting water systems in Kern County that might also affect water systems in other counties?

🛎️ Chime in or share your ideas in the chat.

Where else is the mechanism at play?

Let’s focus on one of the mechanisms we suspect led to contamination: agricultural activity. If we can identify other places where agricultural activity occurs, then we might find other places where the water is contaminated.

What data might help us understand the relationship between agriculture and water system health?

🛎 Chime in or share your ideas in the chat.

Mapping notes: Projections

Maps are flat, the earth is not
Projection determines how your mapping software translates the earth’s curvature into a flat map
Different projections won’t overlay cleanly
- A lot of software, like R, will throw an error if you ask it to operate on map layers in different projections

Examples of different map projections

Mapping notes: Projections

Projections to know
- WGS84: EPSG:4326 - Global lat/long
- California Albers Equal Area: EPSG:3310 - Statewide analysis of California
- California State Plane Coordinate System (SPCS) - Detailed analysis of California regions
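
A minimal sketch of checking and aligning projections in R with the sf package. The point below uses approximate, illustrative coordinates for Lamont (not taken from the workshop data), and the object names are hypothetical.

```r
library(sf)

# Approximate, illustrative lon/lat for Lamont, CA (an assumption, not workshop data).
# Note the order: longitude (x) first, then latitude (y).
lamont_wgs84 <- st_sfc(st_point(c(-118.91, 35.26)), crs = 4326)

# Reproject from global lat/long (EPSG:4326) to California Albers (EPSG:3310).
lamont_albers <- st_transform(lamont_wgs84, 3310)

# Compare coordinate reference systems before layering or joining.
same_before <- st_crs(lamont_wgs84) == st_crs(3310)
same_after <- st_crs(lamont_albers) == st_crs(3310)
```

Comparing `st_crs()` values before a join or overlay is a cheap way to catch a mismatch early, instead of waiting for the operation itself to fail.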

Mapping notes: Accessibility

Accessibility is part of equity!
Color blindness
- ColorBrewer
- In R (ggplot):
  - Discrete scales: Brewer color scales
  - Discrete and continuous scales: Viridis color scales
Vision impairment: Consider complementary text and/or data table
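
A minimal sketch of the two ggplot2 scale families mentioned above, using a small made-up data frame (all values and column names here are hypothetical). `Dark2` is one of the ColorBrewer palettes flagged as colorblind safe; other palettes may suit your data better.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only.
toy <- data.frame(
  status = c("Failing", "At-Risk", "Not At-Risk"),
  n_systems = c(12, 30, 58),
  pctl = c(90, 55, 10)
)

# Discrete fill: a ColorBrewer qualitative palette.
p_discrete <- ggplot(toy, aes(x = status, y = n_systems, fill = status)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2")

# Continuous fill: the viridis scale.
p_continuous <- ggplot(toy, aes(x = status, y = n_systems, fill = pctl)) +
  geom_col() +
  scale_fill_viridis_c()
```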

Mapping notes: Storytelling

Map type	Pros ✅	Cons 🚫
Points	Communicates precise location	Illegible above a certain number of points
Dot density	Communicates approximate distribution	Imprecise, illegible below and above a certain number of points
Choropleth	Communicates approximate distribution, able to distill large values	Difficult to layer

Pesticide use

Let’s use CalEnviroScreen data to examine patterns of pesticide use across the state.

Why CES?

Pesticide use is a proven vector of contamination
CES does a lot of pre-processing for us ❤️
- Filters for most hazardous and/or volatile pesticide ingredients
- Calculates percentiles for comparability
- Includes population characteristics (and other indicators!) for more detailed analysis
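
To make the percentile idea concrete, here is a simplified sketch in base R using made-up values. CES's actual percentile method may differ in its details (for example, in how it treats zeros or ties), so treat this only as an illustration of why percentiles make tracts comparable.

```r
# Hypothetical pesticide amounts for eight tracts (illustrative values only).
toy_lbs_per_sq_mi <- c(0.5, 12, 3, 250, 40, 7, 95, 1800)

# Percentile = share of tracts with a value at or below this tract's value.
toy_percentile <- ecdf(toy_lbs_per_sq_mi)(toy_lbs_per_sq_mi) * 100

data.frame(toy_lbs_per_sq_mi, toy_percentile)
```

The raw values span several orders of magnitude, but the percentiles all sit on the same 0-100 scale, which is what lets us filter on a threshold such as the 75th percentile.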

Pesticide use

First, let’s retrieve our data from CalEnviroScreen.

library(here)   # project-relative file paths
library(sf)     # read_sf(), st_transform()
library(dplyr)  # %>% pipe, filter(), summarize()

unzip(here("data/raw/calenviroscreen40shpf2021shp.zip"),
    exdir = here("data/raw/calenviroscreen40shpf2021shp"))  # (1)
ces_shp <- here("data/raw/calenviroscreen40shpf2021shp/CES4 Final Shapefile.shp")
ces <- read_sf(ces_shp) %>%
    st_transform(3310)  # (2)

1: Extract data from ZIP archive
2: Read shapefile into R and reproject it to the California Albers projection (EPSG:3310)
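
One thing worth checking after loading: shapefiles cannot store true missing values, so datasets often use a sentinel number instead. The sketch below assumes a sentinel of -999 (verify this against the CES data dictionary before relying on it) and uses a hypothetical toy data frame in place of `ces`.

```r
library(dplyr)

# Hypothetical stand-in for the CES attribute table.
toy_ces <- data.frame(
  Tract = c("T1", "T2", "T3"),
  PesticideP = c(82.4, -999, 0)
)

# Recode the assumed sentinel (-999) to NA so it cannot distort maps or statistics.
toy_ces_clean <- toy_ces %>%
  mutate(PesticideP = na_if(PesticideP, -999))
```

A genuine zero (tract T3) is kept; only the sentinel becomes `NA`.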

Pesticide use

Now, let’s map pesticide use by tract.

library(ggplot2)

plotTheme <- list(
  theme_void(),
  theme(
    plot.title = element_text(size = 14, face = "bold")
  )
) # (1)

pesticide_use <- ggplot() +
  geom_sf(
    data = ces, mapping = aes(fill = PesticideP),
    color = "white", linewidth = 0.001
  ) + # (2)
  scale_fill_distiller(palette = "Greens", direction = 1) + # (3)
  labs(
    title = "Agricultural Pesticide Use by Census Tract",
    subtitle = "Darker shade = higher percentile = more intensive pesticide use"
  ) + # (4)
  plotTheme

pesticide_use

1: Set some attractive default styles for our maps
2: Add tracts shaded by pesticide use percentile
3: Generate a legible color scale from ColorBrewer
4: Add a title and subtitle

Pesticide use

Let’s try a version of the map that only shows tracts at or above the 75th percentile to really emphasize heavy users.

top_pesticide_use <- ggplot() +
  geom_sf(data = ces, color = "lightgray") + # base map of all tracts
  geom_sf(
    data = ces %>% filter(PesticideP >= 75), # (1)
    mapping = aes(fill = PesticideP),
    color = "white", linewidth = 0.001
  ) +
  scale_fill_distiller(palette = "Greens", direction = 1) +
  labs(
    title = "Agricultural Pesticide Use by Census Tract - Top 25%",
    subtitle = "Darker shade = higher percentile = more intensive pesticide use"
  ) +
  plotTheme

top_pesticide_use

1: Filter data to only include tracts with pesticide use >= 75th percentile

Failing water systems

Now, let’s take a look at SAFER data.

library(readr) # read_csv()

safer_ra <- read_csv(here("data/raw/SAFER_RA.csv"))
safer_ra_as_sf <- safer_ra %>%
  filter(LATITUDE_MEASURE != 0, LONGITUDE_MEASURE != 0) %>% # drop systems without coordinates
  st_as_sf(coords = c("LONGITUDE_MEASURE", "LATITUDE_MEASURE"), crs = 4326) %>%
  st_transform(3310) # (1)

1: Create geospatial version of SAFER data in Albers California projection

Failing water systems

Plot the location of failing water systems on a map.

safer_point_map <- ggplot() +
  geom_sf(data = ces, color = "lightgray") + # (1)
  geom_sf(
    data = safer_ra_as_sf %>% filter(CURRENT_FAILING == "Failing"),
    colour = "darkred", alpha = 0.5
  ) + # (2)
  labs(
    title = "Failing Water Systems",
    subtitle = "Status determined by 2024 SAFER Risk Analysis"
  ) +
  plotTheme

safer_point_map

1: Add a tract base map
2: Add a point layer showing failing water systems

More pesticide use = more failing water systems?

Let’s overlay failing systems with pesticide use to see if we can identify a relationship.

ggplot() +
  geom_sf(data = ces, color = "lightgray") + # base map of all tracts
  geom_sf(
    data = ces %>% filter(PesticideP >= 75), # tracts with heavy pesticide use
    mapping = aes(fill = PesticideP),
    color = "white", linewidth = 0.001
  ) +
  scale_fill_distiller(palette = "Greens", direction = 1) +
  geom_sf(
    data = safer_ra_as_sf %>% filter(CURRENT_FAILING == "Failing"), # failing systems
    colour = "darkred", alpha = 0.5
  ) +
  labs(title = "Pesticide Use and Failing Water Systems") +
  plotTheme

More pesticide use = more failing water systems?

Yikes! Let’s make a version that’s color blind friendly.

ggplot() +
  geom_sf(data = ces, color = "lightgray") + # base map of all tracts
  geom_sf(
    data = ces %>% filter(PesticideP >= 75), # tracts with heavy pesticide use
    mapping = aes(fill = PesticideP),
    color = "white", linewidth = 0.001
  ) +
  scale_fill_distiller(palette = "Blues", direction = 1) + # blue fill instead of green
  geom_sf(
    data = safer_ra_as_sf %>% filter(CURRENT_FAILING == "Failing"),
    colour = "darkorange", alpha = 0.5 # orange points instead of dark red
  ) +
  labs(title = "Pesticide Use and Failing Water Systems") +
  plotTheme

More pesticide use = more failing water systems?

We can see a relationship, but how strong is it? Let’s do a quick statistical analysis to see whether there’s a significant relationship between pesticide use and failing water systems.

First, we need to aggregate the SAFER data by tract.

systems_with_tracts <- st_join(safer_ra_as_sf, ces) # (1)

failing_systems_by_tract <- systems_with_tracts %>%
  group_by(Tract) %>%
  summarize(
    n_systems = n(),
    n_failing = sum(CURRENT_FAILING == "Failing", na.rm = TRUE),
    pct_failing = n_failing / n_systems * 100
  ) # (2)

tracts_with_failing_systems <- st_join(ces, failing_systems_by_tract) # (3)

1: Perform geospatial join of water systems and tracts with CES data
2: Summarize number of systems, number of failing systems, and percentage of systems failing by tract.
3: Join summary data to tracts with CES, so that we can use both CES data and summary values for analysis.
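
To see what the spatial join and summary are doing, here is a toy version with two made-up square "tracts" and three made-up water systems. Every object and value below is hypothetical; only the function calls mirror the steps above.

```r
library(sf)
library(dplyr)

# Helper: a 1 x 1 square polygon whose left edge sits at xmin (planar units, no CRS).
square <- function(xmin) {
  st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1), c(xmin, 1), c(xmin, 0)
  )))
}

toy_tracts <- st_sf(
  Tract = c("A", "B"),
  PesticideP = c(20, 90),
  geometry = st_sfc(square(0), square(1))
)

# One system falls inside tract A, two fall inside tract B.
toy_systems <- st_sf(
  CURRENT_FAILING = c("Not Failing", "Failing", "Not Failing"),
  geometry = st_sfc(
    st_point(c(0.5, 0.5)), st_point(c(1.2, 0.5)), st_point(c(1.8, 0.5))
  )
)

# Point-in-polygon join: each system picks up the attributes of its tract.
toy_joined <- st_join(toy_systems, toy_tracts)

# Plain-table summary by tract (geometry dropped for readability).
toy_summary <- toy_joined %>%
  st_drop_geometry() %>%
  group_by(Tract) %>%
  summarize(
    n_systems = n(),
    n_failing = sum(CURRENT_FAILING == "Failing", na.rm = TRUE),
    pct_failing = n_failing / n_systems * 100
  )
```

Tract A ends up with one system and none failing; tract B has two systems, one failing, so 50 percent.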

More pesticide use = more failing water systems?

Next, we’ll perform the analysis.

Measure	Definition	Result
Correlation	Strength and direction of relationship (-1 to 1)	0.22 (weak positive)
R-squared	Share of variation in the outcome explained by the factor (0-1)	0.049 (4.9%)
P-value	Probability of seeing a relationship at least this strong if none actually existed (0-1)	0.00000000000000045 (highly significant ***)

Curious how we got here? Check out the code on GitHub!
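
For orientation, here is one plausible way to compute measures like these in base R. It runs on simulated data with hypothetical column names and is not necessarily how the code in the repository does it.

```r
set.seed(42)

# Simulated stand-in for tract-level data (hypothetical names and values).
n <- 200
toy <- data.frame(pesticide_pctl = runif(n, 0, 100))
toy$pct_failing <- 0.1 * toy$pesticide_pctl + rnorm(n, sd = 15)

# Pearson correlation with its significance test.
ct <- cor.test(toy$pesticide_pctl, toy$pct_failing)

# Simple linear regression of the outcome on the factor.
fit <- lm(pct_failing ~ pesticide_pctl, data = toy)

r <- unname(ct$estimate)             # correlation coefficient
r_squared <- summary(fit)$r.squared  # share of variation explained
p_value <- ct$p.value                # significance of the relationship
```

With a single predictor, R-squared is simply the correlation squared, which is why a weak correlation can still be highly significant when there are many observations.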

What should we do?

What do we have the authority to do? What cause/s can we address?
What interventions are already in place?
Determine relationship to existing interventions

🛎️ Chime in or share your ideas in the chat.

Filling the gaps

Let’s target failing water systems in tracts with high pesticide use (50th percentile or above) that have not received SAFER funding.

unfunded_failing_systems <- systems_with_tracts %>%
  filter(CURRENT_FAILING == "Failing", FUNDING_RECEIVED_SINCE_2017 == 0, PesticideP >= 50) %>% # (1)
  select(
    WATER_SYSTEM_NUMBER, PL_ADDRESS_CITY_NAME, POPULATION, SERVICE_CONNECTIONS,
    FUNDING_RECEIVED_SINCE_2017, SERVICE_AREA_ECONOMIC_STATUS, PesticideP
  ) %>% # (2)
  arrange(desc(POPULATION)) # (3)

1: Apply filters
2: Retrieve subset of columns
3: Sort by highest population first
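
Because the percentile cutoff is a judgment call, it can help to wrap the filter in a small function and try more than one threshold. The sketch below is a hypothetical helper (`find_unfunded_failing()` is not part of the workshop code) run on a made-up data frame that reuses the SAFER column names.

```r
library(dplyr)

# Hypothetical helper: same filters as above, with the percentile cutoff as a parameter.
find_unfunded_failing <- function(systems, min_pesticide_pctl = 50) {
  systems %>%
    filter(
      CURRENT_FAILING == "Failing",
      FUNDING_RECEIVED_SINCE_2017 == 0,
      PesticideP >= min_pesticide_pctl
    ) %>%
    arrange(desc(POPULATION))
}

# Made-up systems, for illustration only.
toy_systems <- data.frame(
  WATER_SYSTEM_NUMBER = c("TOY0001", "TOY0002", "TOY0003", "TOY0004"),
  CURRENT_FAILING = c("Failing", "Failing", "Not Failing", "Failing"),
  FUNDING_RECEIVED_SINCE_2017 = c(0, 0, 0, 250000),
  PesticideP = c(61, 95, 88, 77),
  POPULATION = c(1200, 300, 5000, 800)
)

at_50 <- find_unfunded_failing(toy_systems)
at_75 <- find_unfunded_failing(toy_systems, min_pesticide_pctl = 75)
```

Comparing `at_50` and `at_75` shows how sensitive the candidate list is to the cutoff, which is worth knowing before committing to a threshold.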

Filling the gaps

WATER_SYSTEM_NUMBER	PL_ADDRESS_CITY_NAME	POPULATION	SERVICE_CONNECTIONS	SERVICE_AREA_ECONOMIC_STATUS	PesticideP	geometry
CA1610005	LEMOORE	27,185	7,474	DAC	61	POINT (18109 -191201)
CA1510021	WASCO	22,757	5,361	SDAC	77	POINT (59192 -269383)
CA1010018	KERMAN	17,256	4,094	SDAC	95	POINT (-5669 -143576)
CA1510013	MCFARLAND	14,161	2,849	SDAC	82	POINT (69475 -259710)
CA3610850	CHINO	10,667	1,912	Non-DAC	84	POINT (214349 -445739)

Filling the gaps

Finally, let’s create a bubble map of those failing water systems, where the point is scaled to the population served.

BREAKS <- c(0, 1000, 2500, 5000, 10000, 20000) # (1)

ggplot() +
  geom_sf(data = ces %>% st_transform(4326), color = "lightgray") + # (2)
  geom_point(
    data = unfunded_failing_systems %>%
      inner_join(safer_ra, by = "WATER_SYSTEM_NUMBER") %>% # (3)
      filter(LATITUDE_MEASURE != 0) %>%
      arrange(POPULATION.x),
    mapping = aes(
      x = LONGITUDE_MEASURE, y = LATITUDE_MEASURE,
      size = POPULATION.x, fill = POPULATION.x
    ), # (4)
    alpha = 0.75, shape = 21, stroke = 0
  ) + # (5)
  scale_fill_fermenter(
    type = "seq", palette = "YlGnBu", direction = 1,
    breaks = BREAKS, # (6)
    name = "Population",
    guide = guide_legend(override.aes = list(alpha = 1, stroke = 0.5, color = "black")) # (7)
  ) +
  scale_size_binned(
    breaks = BREAKS,
    name = "Population",
    guide = guide_legend(override.aes = list(alpha = 1, stroke = 0.5, color = "black"))
  ) +
  labs(
    title = "Unfunded Failing Water Systems\nin Tracts with High Pesticide Use",
    subtitle = "Systems scaled and colored by population served"
  ) +
  plotTheme

1: Define custom breaks to better show variation
2: Re-project the tract base map to 4326, since we will be mapping lat/lon
3: Join failing system data to original SAFER data to retrieve lat/lon
4: Scale and color points by population
5: Use geom_point() instead of geom_sf(), because the latter doesn’t behave correctly with the size aesthetic.
6: Pass custom breaks into our color and scale functions
7: Specify that color/size should be shown on the same legend

Putting it all together

Once you’ve identified your candidate water systems, repeat the original analysis with your reproducible code!

You may find a dozen places that “look” the same as Lamont
Prioritize places where you see disparity
Validate your hypothesis that the same mechanism is at work by engaging with the community, peers, etc.

Where do we go from here?

Is our analysis complete? What are we missing? What assumptions are we making? What would you do next?

🛎️ Chime in or share your ideas in the chat.

Where do we go from here?

There are lots of wrong ways, but also lots of right ways
Prioritize community engagement and partnerships
- Validate and/or sharpen findings from data analysis
- Address mechanisms outside your remit
- Get at the full root cause, not just the water-shaped parts of it

Additional resources

You are not alone!

GitHub
Modules
Handbook
EJ Roundtable Equity Data Subcommittee SharePoint
- Next meeting: July 30, 2025
- To join the subcommittee: Submit this form!
Feedback survey
- Estimated time to complete: 5 min