Data Collection

Important

You should have addressed this during the Plan & Prepare phase of your process…but just in case you haven’t (yet) or you need a refresher - we’re restating here:

Achieving racial equity outcomes means that race can no longer be used to predict life outcomes and outcomes for all groups are improved (Glossary)

So, as you begin to collect the data for your project, be sure it includes:

  1. Data that can represent your management question(s) or project objectives.

  2. Data that can tell us something about the extent to which we are achieving equity outcomes. This may be limited to simple demographics data - but it could also be something more! Working with Tribal and community experts to decide what type(s) of data are most applicable to and reflective of their lived experiences as they relate to your management questions and project objectives is a great place to start!

The data related to your project may come from direct human observation, laboratory and field instruments, experiments, simulations, surveys, and/or compilations of data from other sources. Below we focus on data that can be downloaded from open data sources, and survey guidance.

Data Limitations

It’s important to remember that ALL data have limits in what they can actually tell us, constraints on how they should be used appropriately, and biases related to initial data collection or generation - and it’s crucial to be aware of and account for them during your project.

In most cases, the Water Boards programs were not developed or designed to collect the types of data needed to conduct analyses with an equity lens as a matter of process which means that most will rely on a few key data sources.

Data Benefit vs Risk

Incorporating a racial equity lens during data analysis includes incorporating individual, community, political, and historical contexts of race to inform analysis, conclusions, and recommendations. Solely relying on statistical outputs will not necessarily lead to insights without careful consideration during the analytic process, such as ensuring data quality is sufficient and determining appropriate statistical power. Disaggregation of data is also a series of tradeoffs. Without disaggregating data by subgroup, analysis can unintentionally gloss over inequity and lead to invisible experiences. On the other hand, when analysts create a subgroup, they may be shifting the focus of analysis to a specific population that is likely already over-surveilled. (Centering Racial Equity Throughout Data Integration)

Centering racial equity means paying attention to which data are highlighted and how they are framed, as well as the readability and accessibility of the communication method. This involves strategic consideration of the audience and the mode of dissemination that most effectively conveys the information. There are many ways to communicate information. These include briefs, interactive documents, websites, dashboards, social media content, data walks, posters, briefs, and infographics. Regardless of the form, content geared toward the public should avoid jargon that may be otherwise appropriate for internal program staff or academic audiences, while also using person-centered language and translating materials into languages most applicable to your community context. (Centering Racial Equity Throughout Data Integration)  

Furthermore, good quality data regarding marginalized communities is often lacking, but it is still important to discuss impacts to BIPOC communities. It may be appropriate in some cases to still present or analyze this data and also present caveats for the data limitations. In other cases, it may be more appropriate to rely only on qualitative discussion based on information derived from background research and feedback from affected communities.

Common Data Sources

Below we have provided a list of common data sources that can tell us something about the extent to which we are achieving equity outcomes.

Open Data Portals

Water Boards Data

Port over relevant water boards data and databases content?

Federal Data

United States Census Data

American Community Survey Data

Internal Administrative Data (i.e. Human Resources data)

Local and Regionally Collected Data

Demographics Data

The key to an equity analysis is to compare it to a baseline where race and ethnicity were not factors. Adding demographics data to your data project can help increase understanding of potential correlations or relationships between your data and demographic and socioeconomic characteristics of locations of interest.

Depending on what demographics data sources you decide to use, the methods needed to combine, overlay, or compare with the data you are interested in may vary. Below we outline methods of comparing demographics data to point, line, and polygon data types.

Demographics Data Needs Context

It’s important to remember that you can always benefit from setting context before trying to communicate demographic or specific racial equity answers to questions posed by our Board, a member of the public, or a partner in this work. What your program is about, what does it do, how well does it do these things (aka Performance Report), etc. This may take a few different visualizations to help frame the context of the program mission but will help viewers understand how you are approaching the racial equity data work within the scope of your program. 

Often the go-to resources for making inferences to demographic and socioeconomic characteristics is the National Census dataset and the associated American Community Survey dataset. While we are fortunate to have just updated this dataset in 2020 there are limitations and potential inaccuracies associated with relying solely on census data to enumerate demographic characteristics within a given census tract. This tool from the Department of Finance exists to measure this limitation.

A detailed example of using R programming to estimate demographics and other characteristics with U.S. census data to be used for custom spatial features is available and can be tailored to programs with the help of a data scientist proficient in R and staff familiar with the program.  https://daltare.github.io/example-census-race-ethnicity-calculation/example_census_race_ethnicity_calculation.html 

Example - SAFER Fund Expenditure Plan

Section VIII.G. Racial Equity and Environmental Justice provides several tables with data that incorporates demographics data. The tables and the text lack clarity and indepth analysis on why the data is telling the story it is. Race and ethnicity of the populations served by the systems likely isn’t the only only difference between the systems. Are there other factors that are associated with different populations that could be driving the imbalance in failing systems and funding? Perhaps, the majority-Hispanic population systems are much larger, or older, or have more severe problems. An apples to apples comparison would look at the major factors that determine cost and compare the racial difference between those subgroups. For example, comparing the racial and ethnic differences between systems of medium sized cities with a water treatment plant built within the past 30 years. Making these proportions more explicit and adding a section where Hispanics population systems get more funding, proportionally and list some additional explanatory factors that should be explored.

TIP: It’s ok to note data gaps, and in the case of racial equity data, gaps will be the norm. It is a key part of a good analysis to identify data gaps and set a course for filling those gaps.

Multi-Source Data Tools

California Water Boards Racial Equity Data Resource Hub

The California Water Boards Racial Equity Data Resource Hub provides a list of Water Boards Developed Racial Equity Data Tools

CalEnviroScreen

CalEnviroScreen 4.0

CalEnviroScreen can be a helpful tool in creating visualizations and performing analysis as it provides a number of index, as well as a “rolled-up” score that combines environmental and demographic data together. However, there can be things to consider, a couple of which are discussed below.

Missing Values for CalEnviroScreen Scores

Users conducting an analysis with the CalEnviroScreen (CES) 4.0 dataset should be aware that it contains missing values, both for individual indicators and overall CES scores. These missing values are distinct from zeros, which are also in the CES dataset. For more information about the missing (and zero) values, see the data dictionary (calenviroscreen40resultsdatadictionary_F_2021.pdf) that accompanies the CalEnviroScreen 4.0 results Excel workbook, available for download as a zip file here.

In the CES 4.0 data (for the version available as of April 2023), the shapefile containing CES 4.0 scores (available here) encodes these missing values as negative numbers (-999 for most variables, and -1998 for one variable). The Excel workbook containing CES 4.0 scores (available here) encodes these missing values as NA. Users should account for these missing values – and their different encodings – as needed when doing any analysis using CES data. Also, note that the CalEnviroScreen 3 shapefile (June 2018 update version) encoded missing values as 0, so users should be aware of this change if/when updating an analysis from CES 3 to CES 4.0 data.

Inconsistent Census Tract Boundaries in CalEnviroScreen 4.0 Shapefile

In the CES 4.0 data (for the version available as of April 2023), the shapefile containing CES 4.0 scores (available here) uses a simplified version of the polygons that represent 2010 census tracts. The boundaries of the census tracts defined by these simplified polygons do not always align with the boundaries of neighboring census tracts, resulting in slight gaps or overlaps between some neighboring census tracts. These inconsistencies are not likely to have a significant impact on most uses of the CES data, but they could impact some types of analysis based on CES data. For example, when assessing sites or facilities based on the CES score of the census tract they are located in, sites located near a census tract boundary could be associated with more than one census tract (and more than one CES score) in areas where there are overlapping census tract polygons, or not associated with any census tract (and no CES score) in areas where there are gaps between census tract polygons. This issue may be addressed in a future release of the CES dataset; in the meantime, a possible workaround is to use the official 2010 census tract boundaries from the US Census Bureau for any calculations, then use census tract IDs to tie this information to the associated CES score for each tract.

CalEnviroScreen vs. EnviroFacts vs. EnviroMapper vs. EJ Screen

Surveys

There may be instances where you need to collect new data using survey(s).

Survey Design

Creating surveys that yield actionable insights is all about the details - and writing effective survey questions is the first step. You do not have to be an expert to build and distribute an effective online survey, but by checking your survey against tried-and-tested benchmarks, you can help ensure you are collecting the best data possible.  

Tips for Building an Effective Survey:

  1. Make Sure That Every Question Is Necessary.

  2. Keep it Short and Simple.

  3. Ask Direct Questions.

  4. Ask One Question at a Time.

  5. Avoid Leading and Biased Questions.

  6. Speak Your Respondent’s Language.

  7. Use Response Scales Whenever Possible.

  8. Avoid Using Grids or Matrices for Responses.

  9. Rephrase Yes/No Questions if Possible

  10. Take Your Survey for a Test Drive

A good comprehensive guide for survey design can be found here:  https://files.eric.ed.gov/fulltext/ED619797.pdf 

https://www.qualtrics.com/blog/10-tips-for-building-effective-surveys/

Picking a Survey Software

Most Water Board staff will use Microsoft Forms which is available to all staff through the Microsoft 365 suite of applications.  Microsoft Forms has a lot of advantages because of its integration with other Microsoft tools like Excel and PowerBi which allow for the survey results to be analyzed and visualized.  Here is video on how to make that connection between Forms and PowerBi via Sharepoint that allows for consistent updating of results:  https://youtu.be/XBFVDedwLiY?si=O161oYja-FBhG1W7 

One issue with Microsoft Forms and other free software like Google Forms is that they produce wide data in Excel of Google Sheets.  This type of data is more difficult to transform.