Data Collection

Important

You should have addressed this during the Plan & Prepare phase of your process, but in case you haven’t yet (or you need a refresher), we’re restating it here:

Achieving racial equity outcomes means that race can no longer be used to predict life outcomes and outcomes for all groups are improved (Glossary).

So, as you begin to collect the data for your project, be sure it includes:

Data that can represent your management question(s) or project objectives.
Data that can tell us something about the extent to which we are achieving equity outcomes. This may be limited to simple demographics data - but it could also be something more! Working with Tribal and community experts to decide what type(s) of data are most applicable to and reflective of their lived experiences as they relate to your management questions and project objectives is a great place to start!

The data related to your project may come from direct human observation, laboratory and field instruments, experiments, simulations, surveys, and/or compilations of data from other sources. Below, we focus on data that can be downloaded from open data sources and provide guidance on designing surveys.

Data Limitations

As you begin to collect the data required for your project, it’s important to remember that ALL data have limits in what they can actually tell us, constraints on how they should be used appropriately, and biases related to initial data collection or generation - and it’s crucial to be aware of and account for them during your project.

In addition to data limitations related to data quality or analysis methods (e.g., insufficient quality, low sample size, data gaps; see the Quality Assurance and Data Analysis pages for more details), be sure to also keep the considerations below in mind so that you can prepare and analyze your data in a way that supports the advancement of equity, inclusion, and justice.

Remember - it’s OK to have data gaps!

It’s OK to have some data gaps, and in the case of conducting data analyses with a racial equity lens, gaps will be the norm. What’s important is to acknowledge what gaps exist, document how you will account for them, and (ideally) set a course for filling those gaps, as appropriate.

Unconscious bias of data sources

We all have unconscious biases and operate in inequitable and unjust systems. That can unconsciously and unintentionally impact how data are collected and result in datasets that reflect those biases. Take time to ask the following questions about the data you’re interested in using for your project so you can have a better understanding of the data’s context and be better able to detect and account for potential biases of said data:

Who collected the data?
Why were these specific data collected? What were the data collection goals?
How were the data collected?
What was prioritized during data collection?
What assumptions were made during data collection?

Who is missing from the data

It’s not uncommon for project teams to have gaps in data, even after all of your diligent work and investment in the planning and data preparation steps of the project. This limitation may be out of your control, especially when you are using data from external sources. Quality data regarding marginalized communities is often lacking. For example, recent research has shown that US lakes are monitored disproportionately less in communities of color, and similar trends are likely to exist in other data sources.

When this happens, it’s important to acknowledge, document, and accessibly communicate those gaps and who is not adequately represented in your data product. In some cases, it may be appropriate to still present or analyze these data while also presenting caveats about the data limitations. In other cases, it may be more appropriate to rely only on qualitative discussion based on information derived from background research and feedback from affected communities.

Consider how you will group the data

As you collect your data, start thinking through how the raw data you are collecting are already grouped and how you intend to group them. You may not be able to decide how you will ultimately aggregate or disaggregate the data until you are further along in the analysis or visualization stages, but it’s important to begin thinking about this issue during the data collection phase for the reasons outlined below.

How we aggregate or disaggregate the data can impact which groups are “seen” and represented (or not) in our data products. This can also influence who is centered, valued, or prioritized in the narrative of the visualization, and who is excluded.

Carefully consider how groups are lumped or split. By aggregating many groups in the visualization beyond what might be statistically necessary (and not acknowledging who is being grouped together and why), we can unintentionally misrepresent those groups, minimize inequities, and perpetuate the invisibility and erasure of those communities’ experiences. On the other hand, when analysts create a subgroup, they may be shifting the focus of analysis to a specific population that is likely already over-surveilled (Centering Racial Equity Throughout Data Integration).

UNC Health’s Equity and Inclusion Analytics Workgroup recommends that we ask ourselves the following questions when deciding whether (and how) to aggregate the data:

Is important data/nuance lost by combining categories? Ensure there is not a meaningful difference in our ability to understand equity outcomes between groups that would be lost if combined.
Does the inclusion of uncombined data negatively impact the interpretation of the data visualization? Having too many groups can make visualizations cluttered and hard to interpret. Additionally, disaggregation leads to smaller group sizes, which can make comparisons to larger groups more difficult and quantifying statistical significance more challenging. For those reasons, it can sometimes be best to combine groups.
Does sharing uncombined data compromise confidential information (e.g., Personal Identifiable Information) or information considered private by the community from which it comes (e.g., locations of sacred practices)? This will depend on the audience you are sharing the visualization with (e.g. internal vs public) and what information it contains.

If you ultimately decide to aggregate / combine groups, be sure to:

Avoid creating a dichotomy of races. Don’t use “White” vs. “non-White” or “people of color.” Rather, disaggregate the “non-White” group to show the diversity among communities.
Be transparent about why you’re making those decisions (including the trade-offs you considered) and document those decisions accordingly.
Acknowledge who is now not included in the data or visualization and explain what groups have been combined and why. Use comments, tooltips, or footnotes that can be easily accessed within the visualization to make it easier for users to find this information.
Think carefully about how groups are lumped in the “other” category of our analysis or visualization. Sometimes it’s necessary to combine groups into a single “other” category (e.g. to generalize small groups to protect confidentiality or to achieve adequate sample size for your analysis). The Urban Institute’s Do No Harm Data Visualization Recommendations include considering alternatives to using the term “other” as a catch-all category, including:
- Another ______ (e.g. Another race or Another group)
- Additional ______ (e.g. Additional races or Additional languages)
- All other self-descriptions
- People identifying as other or multiple races
- Identity not listed
- Identity not listed in the survey or dataset
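When small groups do need to be combined, the step can be scripted so that the list of combined groups is captured at the same time and can be reported in a footnote or tooltip. The sketch below is a minimal Python illustration, assuming records stored as dictionaries with a hypothetical `race_ethnicity` field and a hypothetical minimum group size of 10; the appropriate threshold depends on your confidentiality rules and analysis needs. It only folds small groups into an “Another race or ethnicity” category and never creates a “White” vs. “non-White” split.

```python
from collections import Counter

# Hypothetical threshold: groups with fewer records than this are combined.
MIN_GROUP_SIZE = 10
# Label taken from the alternatives to "other" listed above.
COMBINED_LABEL = "Another race or ethnicity"


def combine_small_groups(records, field="race_ethnicity",
                         min_size=MIN_GROUP_SIZE, label=COMBINED_LABEL):
    """Return (relabeled_records, combined_groups).

    Groups with fewer than `min_size` records are relabeled to `label`.
    The input records are not modified; copies are returned.
    """
    counts = Counter(record[field] for record in records)
    small = {group for group, n in counts.items() if n < min_size}
    relabeled = [
        {**record, field: label} if record[field] in small else dict(record)
        for record in records
    ]
    return relabeled, sorted(small)


def combined_groups_note(combined, min_size=MIN_GROUP_SIZE, label=COMBINED_LABEL):
    """Plain-language footnote saying which groups were combined and why."""
    if not combined:
        return ""
    return (f'"{label}" combines: {", ".join(combined)} '
            f"(groups with fewer than {min_size} records, combined to protect confidentiality).")
```

For example, with 25, 18, 12, 4, and 3 records in five groups, only the two groups with 4 and 3 records are relabeled, and `combined_groups_note` returns the text to display with the visualization so readers can see who was combined.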

Common Data Sources

In most cases, Water Boards programs were not developed or designed to routinely collect the types of data needed to conduct analyses with an equity lens, which means that most programs will rely on external data sources. Below we have provided a list of common data sources that can tell us something about the extent to which we are achieving equity outcomes.

Demographics Data

Adding demographics data to your data project can help increase understanding of potential correlations or relationships between your data and demographic and socioeconomic characteristics of locations of interest. More specifically, the integration of demographics data into your data project can highlight the extent to which race can still be used to predict outcomes associated with your project and therefore underscore where and to what extent racial equity outcomes have not yet been achieved.

Demographics Data Needs Context

It’s important to remember to provide context about your project before trying to communicate demographics-related results or answers to specific racial equity questions posed by our Board, the public, or our partners in this work. Contextual topics could include:

What is your program/project about? What is the mission?
What is your program/project meant to do? What are your objectives/goals?
How well does your program/project do these things now (aka Performance Report)?
What approaches are you taking with this project to advance equity outcomes?
What equity- and data-related questions do you have for your program/project? Which data types and datasets would be the best to use to answer those questions?

Communicating this contextual information may require different complementary modes of communication (e.g. presentation, fact sheets, visualizations), but investing the time to provide that grounding and framing will help the audience understand how you are approaching your data work with an equity lens within the scope of your program/project.

ALL of the questions above (and more!) should have already been answered to some degree during the Plan & Prepare Phase of your project.

Be sure to reference, pull from, and build on information you have already synthesized in your Equity Assessment and Data Management Plan documents!

Demographics Data Consideration Example

The Safe and Affordable Funding for Equity and Resilience (SAFER) Program developed the Final FY 2024-25 Fund Expenditure Plan for the Safe and Affordable Drinking Water Fund in response to Senate Bill (SB) 200 (Ch. 120, Stats. 2019), which requires the annual adoption of a Fund Expenditure Plan for the Safe and Affordable Drinking Water Fund.

The SAFER Program’s goal is to provide safe and affordable drinking water in every California community, for every Californian. FY 2024-25 marks the halfway point of the initial ten years of continuously appropriated funding to the Safe and Affordable Drinking Water Fund as originally envisioned in SB 200.

Looking at the SAFER Program goal with an equity lens, we can add that the goal would be for race to no longer predict a person’s ability to have access to safe and affordable drinking water.

The Racial Equity and Environmental Justice Section of the Plan (Section VIII.G.) provides several tables that incorporate demographics data. As we see below, presenting demographics data alone does not tell the full story of the issue: the tables and the text lack clarity and an in-depth analysis of why the data tell the story they do.

To tell a compelling story and truly advance equity - adding demographics data to an analysis is not enough!

As you go through the process of collecting demographics data for your project - be sure to understand:

WHY demographics data are needed for your project
How you plan on using demographics data, in concert with the other data sources you identified in your Data Management Plan, to tell the (often complex and nuanced) story behind the data.
How you plan on taking action to advance equity based on whatever is shown in the analysis. You might not be able to have a full plan in place at the data collection step - but now is the time to begin brainstorming so that you can take action as swiftly as possible once the analysis or data product is complete.

In this example, the race and ethnicity of the populations served by the water systems likely isn’t the only difference between the systems. What other factors associated with different populations could be driving the imbalance in failing systems and funding? Perhaps systems serving majority-Hispanic populations are much larger or older, or have more severe problems.

A more nuanced comparison would look at the major factors that determine cost (or other metrics driving the imbalance in failing systems and funding) and compare racial and ethnic differences within the subgroups defined by those factors. For example, it could compare racial and ethnic differences among systems serving medium-sized cities with water treatment plants built within the past 30 years.
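The stratified comparison described above can be sketched by grouping systems into strata defined by the factors that drive cost or failure (here, hypothetical `size_class` and `plant_age_class` fields) and comparing the share of failing systems between population groups only within each stratum. This is a minimal Python illustration under those assumptions, not the SAFER Program’s method; all field names and records are hypothetical.

```python
from collections import defaultdict


def failure_rates_by_stratum(systems, strata_keys=("size_class", "plant_age_class"),
                             group_key="majority_population", outcome_key="failing"):
    """Compare an outcome between population groups within each stratum.

    Returns {stratum: {group: (n_with_outcome, n_total, rate)}}, where a
    stratum is the tuple of values for `strata_keys`.
    """
    # tallies[stratum][group] = [count with outcome, total count]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for system in systems:
        stratum = tuple(system[key] for key in strata_keys)
        tally = tallies[stratum][system[group_key]]
        tally[0] += 1 if system[outcome_key] else 0
        tally[1] += 1
    return {
        stratum: {group: (hits, n, hits / n) for group, (hits, n) in groups.items()}
        for stratum, groups in tallies.items()
    }


# Hypothetical records for illustration only (not SAFER data).
systems = [
    {"size_class": "medium", "plant_age_class": "under 30 years",
     "majority_population": "Hispanic or Latino", "failing": True},
    {"size_class": "medium", "plant_age_class": "under 30 years",
     "majority_population": "Hispanic or Latino", "failing": False},
    {"size_class": "medium", "plant_age_class": "under 30 years",
     "majority_population": "White", "failing": False},
    {"size_class": "small", "plant_age_class": "30 years or older",
     "majority_population": "White", "failing": True},
]

for stratum, groups in failure_rates_by_stratum(systems).items():
    for group, (hits, n, rate) in groups.items():
        print(stratum, group, f"{hits}/{n} failing ({rate:.0%})")
```

Strata can become small quickly, so it is worth reporting the counts alongside the rates; smaller groups make comparisons and statistical significance harder to assess.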

Some recommendations for improving the storytelling and meaningful impact of the data shown in the report, to support the advancement of equity, could include:

making the proportions of different populations and their association with failing systems (or other metrics) more explicit
adding a section articulating how systems serving the most impacted and burdened populations (Hispanic populations in this case) will get proportionally more funding
listing some additional explanatory factors that could be explored in future analyses

Common Data Sources

The data sources many use to make inferences about demographic and socioeconomic characteristics come from the United States Census Bureau, including the decennial census and the American Community Survey (ACS).

While we are fortunate to have an updated census dataset from 2020, there are limitations and potential inaccuracies associated with relying solely on census data to enumerate demographic characteristics within a given census tract. This tool from the Department of Finance exists to measure this limitation.

The ACS also publishes Handbooks for Data Users to make it easier for folks to use the data appropriately.
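If you access ACS estimates programmatically, the Census Bureau’s data API returns results as a JSON array whose first row is a header. The sketch below builds a request URL for ACS 5-year estimates by census tract and converts a response into one record per tract keyed by an 11-digit tract GEOID. The year, county, and variable IDs (from table B03002, Hispanic or Latino Origin by Race) are assumptions for illustration; confirm variable IDs against the ACS variable list for the release you use. The sketch ignores margins of error, which the ACS handbooks explain how to use.

```python
import json
from urllib.parse import urlencode

# Assumed variable IDs from ACS table B03002; verify them for your release.
VARIABLES = {
    "B03002_001E": "total",
    "B03002_003E": "white_not_hispanic",
    "B03002_012E": "hispanic_or_latino",
}


def acs_tract_url(year, state_fips, county_fips, variables=VARIABLES):
    """Build a Census Data API URL for ACS 5-year estimates by census tract."""
    query = [
        ("get", ",".join(["NAME", *variables])),
        ("for", "tract:*"),
        ("in", f"state:{state_fips}"),
        ("in", f"county:{county_fips}"),
    ]
    # Keep ':', '*', and ',' readable instead of percent-encoding them.
    return f"https://api.census.gov/data/{year}/acs/acs5?" + urlencode(query, safe=":*,")


def parse_acs_response(text, variables=VARIABLES):
    """Convert the API's JSON (header row + data rows) into one dict per tract.

    Note: the API can return large negative annotation codes in place of an
    estimate; check for them before calculating percentages.
    """
    header, *rows = json.loads(text)
    tracts = []
    for row in rows:
        record = dict(zip(header, row))
        out = {
            "geoid": record["state"] + record["county"] + record["tract"],
            "name": record["NAME"],
        }
        for var_id, label in variables.items():
            out[label] = int(record[var_id])
        tracts.append(out)
    return tracts
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the response text to `parse_acs_response` yields records that can be joined to other data by `geoid`.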

Key Terms

In addition to the terms defined in the Handbook Glossary, users should know the following terms when using demographics data:

Demographics: statistical data relating to characteristics of human populations, such as: age, race, ethnicity, sex, gender, income, education, etc.
Geographic (Geospatial) Vectors: how data are stored in geographic information systems:
- Points: zero-dimensional objects that contain only a single coordinate pair (i.e. latitude, longitude) to define their location. Example: Census landmarks
- Lines: one-dimensional features composed of multiple, explicitly connected points. Examples: roads, streams
- Polygons: two-dimensional features created by multiple lines that loop back to create a “closed” feature. Examples: region, city, county, census tract, block boundaries, lakes
Geographies: the geographic unit(s) into which demographic data are aggregated.
- Blocks (Census Blocks or Tabulation Blocks) are statistical areas bounded by visible features, such as streets, roads, streams, and railroad tracks, and by nonvisible boundaries, such as selected property lines and city, township, school district, and county limits and short line-of-sight extensions of streets and roads. Blocks are numbered uniquely with a four-digit census block number from 0000 to 9999 within each census tract; census tracts nest within states and counties.
- Block Groups are statistical divisions of census tracts, are generally defined to contain between 600 and 3,000 people, and are used to present data and control block numbering. A block group consists of clusters of blocks within the same census tract that have the same first digit of their four-digit census block number.
  - Tribal Block Groups are only applicable to legal federally recognized American Indian reservation and off-reservation trust land areas, and are defined independently of the standard county-based block group delineation. Tribal block groups use the letter range A through K (except “I,” which could be confused with a number “1”) to identify and code the tribal block group. Tribal block groups nest within tribal census tracts.
- Census Tracts are small, relatively permanent statistical subdivisions of a county or statistically equivalent entity that generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. State and county boundaries are always census tract boundaries in the standard census geographic hierarchy.
  - Tribal Census Tracts are only applicable to legal federally recognized American Indian reservation and off-reservation trust land areas, are defined independently of the standard county-based census tract delineation, and can cross state and county boundaries. Tribal census tract codes are six characters long with a leading “T” alphabetic character followed by five-digit numeric codes having an implied decimal between the fourth and fifth character.
  Census Geography Unit Hierarchies
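Because these geographies nest, a standard census block GEOID contains the codes of every level above it: 2 digits for the state, 3 for the county, 6 for the tract, and 4 for the block, and the block group is the first digit of the block number. The sketch below splits a block GEOID accordingly, which can help when joining data published at different levels. It assumes the standard hierarchy only; tribal block groups and tribal census tracts use different codes and are not handled.

```python
def parse_block_geoid(geoid):
    """Split a 15-digit census block GEOID into its nested geographies.

    Layout: state (2) + county (3) + tract (6) + block (4). The block group
    is the first digit of the block number.
    """
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
    state, county, tract, block = geoid[:2], geoid[2:5], geoid[5:11], geoid[11:]
    return {
        "state": state,
        "county": state + county,
        "tract": state + county + tract,
        "block_group": state + county + tract + block[0],
        "block": geoid,
        # Tract codes carry an implied decimal before the last two digits.
        "tract_label": f"{int(tract[:4])}.{tract[4:]}",
    }
```

For example, `parse_block_geoid("060014001001000")` gives the tract GEOID `06001400100` (tract 4001.00) and the block group GEOID `060014001001`.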

Also see this U.S. Census Glossary for additional terms specific to U.S. Census programs, data, and products.

Data Integration Methods

Depending on what demographics data sources and software you decide to use, the methods needed to combine, overlay, or compare them with the data you are interested in may vary. See the Demographics Use Case page for step-by-step guidance on how to download and compare demographics data to point, line, and polygon data types.
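For point data (such as facility or sampling locations), the core of the comparison is a spatial join: finding the census tract polygon that contains each point. GIS software does this for you; the sketch below only illustrates the underlying idea with the standard ray-casting point-in-polygon test. It assumes simple polygons without holes, given as (x, y) vertex lists in a projected (planar) coordinate system, and hypothetical tract IDs. Returning every matching tract, rather than only the first, makes it easy to catch points that fall in zero or several polygons.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is the point (x, y) inside the polygon `ring`?

    `ring` is a list of (x, y) vertices; repeating the first vertex at the
    end is optional. Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tracts_containing(x, y, tracts):
    """Return the IDs of every tract polygon that contains the point."""
    return [tract_id for tract_id, ring in tracts.items()
            if point_in_polygon(x, y, ring)]
```

A site whose result is an empty list, or a list with more than one ID, should be reviewed rather than silently assigned.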

Important Reminders Before You Dive In

Data are NOT people - We need to use these data to get a better understanding of what’s going on in our communities, but the data (at best) only represent a sample of the community’s population and in no way reflect everyone or their lived experiences.

There’s no such thing as “equity data” - how we use data, interpret it, and act on what we learn makes our use equitable (or not). Simply including demographics data in your project’s analysis or data products does not make those resources equitable - to operationalize equity we need to take actions and make decisions in ways to advance equitable outcomes.

The data you’re using have limitations, so be sure you know what they are before moving forward - as discussed above, all data have limitations, and that is particularly true for demographics data. Be sure you have a clear and comprehensive understanding of the limitations that apply to the specific datasets you’re using so you can collect, and eventually process and analyze, those data in ways that are appropriate.

CalEnviroScreen

CalEnviroScreen is a mapping tool that helps identify California communities that are most affected by many sources of pollution, and where people are often especially vulnerable to pollution’s effects. CalEnviroScreen can be a helpful tool for creating visualizations and performing analysis, as it provides a number of indices, as well as a “rolled-up” score that combines environmental and demographic data. As with any dataset or visualization tool, there are things to consider; two of them are discussed below.

Missing Values for CalEnviroScreen Scores

Users conducting an analysis with the CalEnviroScreen (CES) 4.0 dataset should be aware that it contains missing values, both for individual indicators and overall CES scores. These missing values are distinct from zeros, which are also in the CES dataset.

In the CES 4.0 data (for the version available as of April 2023), the shapefile containing CES 4.0 scores encodes these missing values as negative numbers (-999 for most variables, and -1998 for one variable). The Excel workbook containing CES 4.0 scores encodes these missing values as NA. Also, note that the CalEnviroScreen 3 shapefile (June 2018 update version) encoded missing values as 0, so users should be aware of this change if/when updating an analysis from CES 3 to CES 4.0 data.

Users should account for these missing values – and their different encodings – as needed when doing any analysis using CES data.
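One way to account for these encodings is to convert every missing-value marker to a single representation before any calculation. The Python sketch below assumes the encodings described above for CES 4.0 (-999 and -1998 in the shapefile, NA in the Excel workbook) and that no legitimate value equals those sentinels; confirm both against the data dictionary for the version you download. It keeps zeros, which are real values in CES 4.0, and should not be reused as-is for CES 3 data, where 0 meant missing.

```python
import math

# Missing-value sentinels described above for the CES 4.0 shapefile.
CES4_SHAPEFILE_MISSING = {-999, -1998}


def clean_ces4_value(value):
    """Return None for a missing CES 4.0 value, otherwise the value as a float.

    Handles the shapefile sentinels (-999, -1998), the "NA" text used in the
    Excel workbook, empty cells, and NaN. Zeros are kept as real values.
    """
    if value is None:
        return None
    if isinstance(value, str):
        text = value.strip()
        if text == "" or text.upper() == "NA":
            return None
        value = text
    number = float(value)
    if math.isnan(number) or number in CES4_SHAPEFILE_MISSING:
        return None
    return number


def mean_of_present(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in (clean_ces4_value(x) for x in values) if v is not None]
    return sum(present) / len(present) if present else None
```

For example, `mean_of_present(["NA", 10, -999, 0, 20])` returns 10.0, whereas averaging the raw numbers 10, -999, 0, and 20 would give -242.25.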

For more information about the missing (and zero) values in the CES 4.0 dataset, see the data dictionary (PDF file) that accompanies the CalEnviroScreen 4.0 results Excel workbook, both of which are available for download as a zip file.

Inconsistent Census Tract Boundaries in CalEnviroScreen 4.0 Shapefile

In the CES 4.0 data (for the version available as of April 2023), the shapefile containing CES 4.0 scores uses a simplified version of the polygons that represent 2010 census tracts. The boundaries of the census tracts defined by these simplified polygons do not always align with the boundaries of neighboring census tracts, resulting in slight gaps or overlaps between some neighboring census tracts. These inconsistencies are not likely to have a significant impact on most uses of the CES data, but they could impact some types of analysis based on CES data. For example, when assessing sites or facilities based on the CES score of the census tract they are located in, sites located near a census tract boundary could be associated with more than one census tract (and more than one CES score) in areas where there are overlapping census tract polygons, or not associated with any census tract (and no CES score) in areas where there are gaps between census tract polygons.

This issue may be addressed in a future release of the CES dataset; in the meantime, a possible workaround is to use the official 2010 census tract boundaries from the US Census Bureau for any calculations, then use census tract IDs to tie this information to the associated CES score for each tract.
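The workaround above can be sketched as a join on tract IDs instead of geometry: assign each site to a tract using the official 2010 Census tract boundaries (the same vintage the CES 4.0 shapefile represents), then look up the CES score by tract ID. Tract IDs are easy to corrupt when files pass through spreadsheet software, which can drop California’s leading “0” state code, so the sketch normalizes them to 11-character strings first. Field names such as `tract_id` and `ces_score` are hypothetical.

```python
def normalize_tract_id(tract_id):
    """Return an 11-character census tract GEOID string.

    If an ID was stored as a number, California's leading "0" state code is
    lost; zero-padding restores it.
    """
    text = str(tract_id).strip()
    if text.endswith(".0"):  # e.g., 6001400100.0 after being read as a float
        text = text[:-2]
    if not text.isdigit() or len(text) > 11:
        raise ValueError(f"unexpected tract ID: {tract_id!r}")
    return text.zfill(11)


def attach_ces_scores(sites, ces_scores):
    """Add a CES score to each site by tract ID (not by geometry).

    `sites`: dicts with a "tract_id" assigned using official Census boundaries.
    `ces_scores`: mapping of tract ID -> CES score.
    Sites whose tract has no score get None so they can be reported as gaps.
    """
    lookup = {normalize_tract_id(k): v for k, v in ces_scores.items()}
    return [
        {**site, "ces_score": lookup.get(normalize_tract_id(site["tract_id"]))}
        for site in sites
    ]
```

Because each site is assigned to exactly one tract from the official boundaries, it receives at most one CES score, avoiding the gap and overlap problems of the simplified polygons.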

CalEnviroScreen Resources

Analysis of Race/Ethnicity and CalEnviroScreen 4.0 Scores Storymap | Report
SB 535 Disadvantaged Communities
CalEnviroScreen page of the California Water Boards Racial Equity Resource Hub

U.S. EPA EJScreen

EJScreen is EPA’s environmental justice mapping and screening tool that provides EPA with a nationally consistent dataset and approach for combining environmental and socioeconomic indicators.

First-time users may find the 5-minute EJScreen in 5: A Quick Overview of EJScreen video helpful as an introduction to the tool.

EJScreen Resources

EJScreen User Guide for navigating the various features of the tool,
EJScreen Glossary for better understanding the map layers and indicators being displayed in the tool, and
Frequent Questions about EJScreen

Internal Administrative Data

Most organizations, including the Water Boards, have various types of administrative data, which include internal demographics data related to the organization’s workforce. These data are normally confidential but can be very valuable when working on addressing workforce equity. If and when these data are used, it is critical to ensure the protection and security of the data to preserve confidentiality through the development and sharing (or not) of the final data product. See the Data Collection and Processing section of the Planning Phase for more guidance.

Surveys

There may be instances where the data you need are not already available and you need to collect them yourself through one or more surveys.

Survey Design

Creating surveys that yield actionable insights is all about the details - and writing effective survey questions is the first step. You do not have to be an expert to build and distribute an effective online survey, but by checking your survey against tried-and-tested benchmarks, you can help ensure you are collecting the best data possible.

Tips for Building an Effective Survey:

Make Sure That Every Question Is Necessary
Keep it Short and Simple
Ask Direct Questions
Ask One Question at a Time
Avoid Leading and Biased Questions
Speak Your Respondent’s Language
Use Response Scales Whenever Possible
Avoid Using Grids or Matrices for Responses
Rephrase Yes/No Questions if Possible
Take Your Survey for a Test Drive

Guides for good survey design include:

Beware of Common Types of Survey Bias

As you develop your survey, it’s important to design and implement it in a way that minimizes bias as much as possible. Below are three tables that provide an overview of common survey biases to be aware of and avoid¹:

Biases Associated with Question Design

While the specific sources below vary, they can generally be caused by one or more of the following issues: problematic wording; missing or inadequate data for the intended purpose of the question; a faulty scale in the question; or questions that are leading, intrusive, or inconsistent.

Bias Source	Description	Tips to Minimize Bias
Abrupt question	The question is too short and can come off as jarring to the respondent.	Add a transition or introduction to the beginning of the question.
Ambiguous question	The question leads respondents to understand it differently than intended and, therefore, to answer a different question than was intended.	Have a trusted partner who did not participate in survey development review the questions and provide feedback.
Complex question	The question is usually long, vague, too formal, or overly complex.	Keep questions short and to the point.
Data degradation	The question is phrased in a way that encourages respondents to provide less accurate information, which cannot be recovered or made more accurate after the survey is complete.	Phrase questions in a manner that collects the most accurate information (ideally in the form of continuous data). Example: What age category do you belong to? ➔ What is your age? OR What year were you born? Note that sometimes it is more beneficial to collect generalized information (e.g., age category) to avoid the unnecessary collection of Personal Identifiable Information (PII).
Double-barrelled question	The question is made up of two or more questions, which makes it difficult for the respondent to know which part of the question to answer and for the investigator to know which part the respondent actually answered.	Separate questions so that each question asks for only one thing.
Forced choice (aka insufficient category)	The question provides too few categories, forcing respondents to choose an imprecise option.	Add one or more of the following category options to your question: Unsure; Don’t know; Other; Not Applicable (N/A).
Hypothetical question	The question asks the respondent about their beliefs (hypothetical), which can yield more generalized results than are helpful.	Keep questions specific to the respondent’s behaviors. Example: Do you think…? ➔ Have you ever…?
Incomplete interval	The question does not include all possible intervals, so respondents cannot select a category that accurately reflects their experience.	Add more intervals, or broaden interval categories, when appropriate. Example - insufficient intervals: Once per month; Once per week; More than once per week. Example - sufficient intervals: Less than once per month; Once per month to once per week; More than once per week.
Insensitive scales	The question’s scale has too few categories to provide sufficient discriminating power to differentiate between respondents.	Use a scale that has five or more categories. Example: On a scale of 1-3…? ➔ On a scale of 1-5… OR On a scale of 1-10…?
Intrusive question	The question requests sensitive information (e.g., PII, income, identity, culture) too abruptly or directly and can feel intrusive to the respondent. This can lead respondents to withhold information or provide inaccurate information, and can influence how they answer subsequent questions.	Confirm the information you are asking for is truly essential to your project. If it is, frame the question in a way that respects the sensitivity of the information you are requesting. Try adding a transition into the question that provides context and explains why the information is needed.
Leading question	The question is worded in a way that subtly guides respondents toward a certain answer.	Use wording that is generalized and unbiased towards the respective answer choices. Example: Don’t you agree that…? ➔ Do you agree or disagree that…? OR What are your thoughts about…?
Overlapping interval	The question contains intervals that overlap, which can confuse respondents about which interval to select.	Use intervals that do not overlap. Example - overlapping intervals: None; 5 or less; 5 - 10; 10 or more. Example - non-overlapping intervals: None; 1 - 4; 5 - 9; 10 or more.
Scale format	An even or an odd number of categories in the scale may produce different results. Scales with an odd number of categories allow neutral answers, whereas those with an even number force respondents to pick a side.	There is no consensus on which approach (even vs. odd) is better overall. What’s important is that you select the scale that is most appropriate for the question and the data you need.
Technical jargon	The question includes technical terms that are specific to a profession that may not be understood by those outside of that field.	Replace jargon with more plain, accessible, and inclusive language.
Time period	The question does not identify a common time period for respondents’ experiences.	Include a specific time period in the question. Example: In the last 12 months…? ➔ Between Jan 1 and Dec 31 of last year…?
Uncommon word	The question uses uncommon or difficult words.	Replace uncommon words with those that are more commonly used by your survey audience. Example Uncommon/Common word pairs
Vague word	The question includes words that are undefined or may have multiple definitions.	Replace vague words with more precise ones. Examples: Occasionally ➔ once per month; Regularly ➔ once per week; Bi-weekly ➔ twice per week OR every other week; Bi-monthly ➔ twice per month OR every other month.
Word choice	The question may use words or phrases that pull focus or frame aspects of the question in a way that increases the likelihood of respondents choosing an answer that does not accurately reflect their intended choice.	Use consistent question structure and terminology. Example - poor word choice: Which operation would you prefer? (a) An operation that has a 5% mortality rate. (b) An operation in which 90% of the patients will survive. Patients scheduled for surgery may choose the second option when they see or hear the words “90%” and “survive,” but a 90% survival rate (or 10% mortality) is worse than a 5% mortality rate. Example - improved word choice: Which operation would you prefer? (a) An operation in which 95% of the patients will survive. (b) An operation in which 90% of the patients will survive.
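The interval problems in the table above (incomplete and overlapping intervals) can be checked automatically for questions whose answers are whole-number counts. The sketch below is a hypothetical helper, not part of any survey tool: each option is an inclusive (low, high) range, with None as the upper bound of an open-ended option such as “10 or more,” and it reports gaps and overlaps. Running it on the two examples from the “Overlapping interval” row flags three overlaps in the first set and none in the second.

```python
import math


def check_intervals(options, minimum=0):
    """Check whole-number response options for gaps and overlaps.

    `options` maps each label to an inclusive (low, high) range; use None for
    an open-ended upper bound. Returns a list of problem descriptions.
    """
    problems = []
    covered_up_to = minimum - 1  # highest value covered so far
    for label, (low, high) in sorted(options.items(), key=lambda item: item[1][0]):
        if low > covered_up_to + 1:
            problems.append(f"gap: no option covers {covered_up_to + 1} to {low - 1}")
        elif low <= covered_up_to:
            problems.append(f"overlap: {label!r} repeats values already covered")
        top = math.inf if high is None else high
        covered_up_to = max(covered_up_to, top)
    if covered_up_to != math.inf:
        problems.append(f"gap: no option covers {covered_up_to + 1} or more")
    return problems


overlapping = {"None": (0, 0), "5 or less": (0, 5), "5 - 10": (5, 10), "10 or more": (10, None)}
non_overlapping = {"None": (0, 0), "1 - 4": (1, 4), "5 - 9": (5, 9), "10 or more": (10, None)}
print(check_intervals(overlapping))
print(check_intervals(non_overlapping))
```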

Biases Associated with Survey Design

While the specific sources below vary, they can generally be caused by one or more of the following issues: inconsistencies with past surveys, which make it difficult to compare responses over time, or problematic formatting or design that can cause confusion or fatigue among respondents and result in inaccurate answers.

Bias Source	Description	Tips to Minimize Bias
Horizontal formatting	For questions associated with multiple choice responses, displaying multiple choices horizontally can cause confusion. This is especially important for surveys completed on paper.	List multiple choices vertically. Example - horizontal formatting Excellent … [ ] Good … [ ] Fair … [ ] Poor … [ ] Example - vertical formatting Excellent …….. [ ] Good …………. [ ] Fair …………… [ ] Poor ………….. [ ]
Inconsistency among surveys	Components of a survey are changed over time and over the course of multiple offerings of the survey. When components of a survey change, it can influence how people respond and the results of the different surveys may not be comparable. Survey components include: Formatting Word choice Scales / multiple choice options Definitions used	If there is interest in comparing responses to surveys that have been administered at multiple points in time, keep the sections of the survey that you want to compare identical. If there are instances where new questions arise and are essential to include in future surveys, add the new questions to the end of the existing survey so the respondent’s experience for the initial questions are as comparable as possible. If new questions are added - be sure the survey length does not exceed the times recommended in the “response fatigue” row below.
Juxtaposed scale (aka Likert scale)	This format, often called a Likert scale question, displays a list of single-answer questions alongside a shared rating scale, so that a respondent can select a value from the scale to answer each question. Likert scale questions tend to ask about agreement, frequency, satisfaction, importance, likelihood, quality, interest, usefulness, or ease of use.	The advantage of juxtaposed (aka Likert) scale formatting is that it can prompt respondents to think about and compare their responses to each item, because the items are side by side. However, this format has been shown to confuse respondents with less formal education. If you suspect this may apply to your intended survey audience, you may prefer to separate the questions so respondents review and answer only one question at a time.
No-saying / yes-saying (aka nay-saying / yea-saying)	For groups of questions that include only statements with yes/no response options, respondents tend to answer yes to all of the questions or no to all of them.	Mix positive and negative statements about the same issue throughout the group of questions to break up the pattern and encourage respondents to consider each question individually, rather than answering them as a group.
Non-response	Even with adequate sample representation (see below), individuals choose not to respond to your survey.	Consider when and how you are reaching out to your audience; changing the timing or method through which you offer the survey may improve the response rate. Consider reducing the time it takes to complete the survey. Keep in mind that respondents’ attention is limited: even a five-minute survey may feel like too much for your audience. Consider offering incentives for completing the survey.
Open-ended questions	Open-ended questions allow respondents to provide short or long text responses. There is no way to standardize the quality and vocabulary of the responses, which can make analysis more challenging. Moreover, respondents are less likely to take the time to answer these questions fully.	Only use open-ended questions when necessary. In some cases, particularly in surveys of knowledge and attitudes, open-ended questions can be more appropriate than closed-ended questions and can yield a wealth of information. If you decide to use an open-ended question, decide in advance how you will analyze the responses using appropriate qualitative methods.
Sample representation (aka sampling bias, selection bias)	The sample of individuals selected to complete the survey is not representative of the population, which can lead to inaccurate results and conclusions.	Ensure that you deliver the survey randomly to the audience(s) that represent the population, and that you receive enough responses to be representative of the population. Consider the best way(s) to reach your target audience, and use the methods that are most effective for and accessible to them. Consider extending the deadline to complete the survey and sending reminders to those who have not yet responded.
Skipping questions	Questions that instruct respondents to skip to another question based on their response can lead to the loss of important information.	Be sure that the questions respondents are directed to skip do not actually apply to them. Work with partners to test the survey before it is administered to resolve any such issues. Example of a skip instruction: (1) Are you self-employed? [ ] Yes [ ] No (go to question 8). In this case, individuals who are not self-employed would not answer questions 2-7. If the information requested in one or more of those skipped questions is pertinent to all respondents, try reordering or grouping questions so respondents can provide all essential information regardless of their choices.
Response choice alignment	If interviewers are completing the survey for respondents (e.g., during in-person or telephone interviews), placing the check-box to the left of (before) the response options can result in errors. If respondents are completing the survey themselves (e.g., mailed or online surveys), placing the check-box to the right of (after) the response options can make the survey more difficult to complete.	Select the response choice alignment according to how you will deliver the survey, to reduce confusion and errors. If interviewers are completing the survey for respondents, use a right-aligned format: Excellent …….. [ ] Good …………. [ ] Fair …………… [ ] Poor ………….. [ ] If respondents are completing the survey themselves, use a left-aligned format: [ ] Excellent [ ] Good [ ] Fair [ ] Poor
Response fatigue	The survey is too long, which fatigues respondents and can result in rushed, uniform, and/or inaccurate answers. For example, toward the end of a lengthy survey, respondents tend to answer all yes or all no, or to stop answering the remaining questions altogether.	Review the original purpose of your project and survey, and include only essential questions. The time needed to complete a survey should not exceed the following: self-administered survey (e.g., online, by mail): 10 to 20 minutes; telephone survey: 30 to 60 minutes; in-person survey: 50 to 90 minutes.
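
When you mix positive and negative statements to counter no-saying / yes-saying, the negatively worded items usually need to be reverse-scored before responses can be combined or compared during analysis. The sketch below is one minimal way to do this in Python, assuming a 5-point agreement scale coded 1 (Strongly disagree) to 5 (Strongly agree); the item names and the `NEGATIVE_ITEMS` set are hypothetical placeholders for your own survey, not part of any survey tool.

```python
# Minimal sketch (assumptions: 1-5 agreement scale; hypothetical item names).
# Reverse-score negatively worded items so a higher score means the same thing
# (e.g., a more favorable view) across every item.

SCALE_MIN = 1  # assumption: 1 = "Strongly disagree"
SCALE_MAX = 5  # assumption: 5 = "Strongly agree"

# Hypothetical IDs of statements worded in the negative direction
NEGATIVE_ITEMS = {"q2_access_is_difficult", "q4_info_not_useful"}


def reverse_score(value: int, scale_min: int = SCALE_MIN, scale_max: int = SCALE_MAX) -> int:
    """Mirror a response across the midpoint of the scale (e.g., 1<->5, 2<->4)."""
    if not scale_min <= value <= scale_max:
        raise ValueError(f"response {value} is outside the scale {scale_min}-{scale_max}")
    return scale_min + scale_max - value


def score_respondent(responses: dict[str, int]) -> dict[str, int]:
    """Return a copy of one respondent's answers with negative items reverse-scored."""
    return {
        item: reverse_score(value) if item in NEGATIVE_ITEMS else value
        for item, value in responses.items()
    }


def all_same_answer(responses: dict[str, int]) -> bool:
    """Flag a possible yes-saying/no-saying pattern: identical raw answers to every item."""
    return len(set(responses.values())) == 1


if __name__ == "__main__":
    respondent = {
        "q1_access_is_easy": 5,
        "q2_access_is_difficult": 5,  # agrees with both opposite statements
        "q3_info_is_useful": 5,
        "q4_info_not_useful": 5,
    }
    print(score_respondent(respondent))
    print(all_same_answer(respondent))
```

A respondent who agrees with both a statement and its opposite ends up with reverse-scored answers that contradict each other, and `all_same_answer` flags identical raw answers across a block of items as a possible no-saying / yes-saying pattern. Whether to exclude, weight, or simply document such responses is an analysis decision for your project; for yes/no items coded 0/1, the same `reverse_score` function works with `scale_min=0` and `scale_max=1`.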

Biases Associated with Survey Implementation

Although the specific sources below vary, these biases are generally caused by one or more of the following issues: a lack of interviewer objectivity; respondents’ conscious or subconscious reactions, learning, or inaccurate recall; or respondents’ perception of questions based on their lived experiences and culture.

Bias Source	Description	Tips to Minimize Bias
Acquiescence (aka yes bias, friendliness bias, confirmation bias)	Respondents tend to agree with survey questions, regardless of their actual beliefs, to avoid appearing disagreeable to the interviewer or to finish the survey more quickly.	Reframe questions so they use more neutral language and avoid asking for agreement on a topic. Avoid closed-ended questions that leave no room for nuance; use multiple-choice and scale questions, and provide space for additional open-ended responses. Train interviewers to deliver surveys consistently and objectively, so respondents don’t feel pressured to agree with the interviewer. Enable respondents to complete the survey anonymously and without an interviewer present.
Cultural differences	The culture and lived experiences of respondents can affect how they perceive questions and, therefore, how they answer.	Have a trusted partner who did not participate in developing the survey review the questions and provide feedback.
End aversion (aka central tendency)	Respondents usually avoid the ends of scales and tend to choose responses closer to the middle of the response options. Example: Respondents are more likely to check “Agree” or “Disagree” than “Strongly agree” or “Strongly disagree.”	None - but be aware of this potential bias as you analyze your results.
Extreme response bias	Respondents tend to choose responses at the ends of scales, that is, at the extremes of the possible response options. Example: Respondents are less likely to check “Agree” or “Disagree” than “Strongly agree” or “Strongly disagree.”	None - but be aware of this potential bias as you analyze your results.
Faking bad (aka hello-goodbye effect)	Respondents try to appear worse off than they actually are to qualify for support or resources that could be granted according to their responses.	If receipt of resources is tied to how people complete the survey, consider when and how to communicate how respondent data will be used to make such decisions.
Faking good (aka social desirability, conformity bias, obsequiousness)	Respondents may alter their responses in the direction they perceive to be desired by the investigator or by society at large. Socially undesirable answers tend to be under-reported.	Use wording that is generalized and neutral. Place questions that might be associated with individual or social desirability toward the end of the questionnaire so that they do not affect other questions. Let respondents complete the survey anonymously, and make questions about name or contact information optional. Instead of requiring in-person or telephone interviews, let respondents submit anonymous, mailed-in surveys. When asking about socially undesirable behaviors, ask whether the person has ever engaged in the behavior before asking about current practices, because past events are less threatening.
Hypothesis guessing	Respondents may systematically alter their responses when, during the process of answering the survey, they think they know the study hypothesis.	None - but be aware of this potential bias as you analyze your results.
Interviewer data gathering methods	Interviewers may pose questions or gather data in ways shaped by their own biases, assumptions they hold about the respondent, etc. This can introduce errors that affect survey results.	Train interviewers to deliver surveys consistently and objectively.
Nonblinding	When an interviewer is not blind to the study hypotheses, they may consciously gather selective data.	Train interviewers to deliver surveys consistently and objectively, and ensure that those delivering the survey are blind to the study hypotheses.
Positive satisfaction (aka positive skew)	Respondents tend to give positive answers when answering questions on satisfaction.	None - but be aware of this potential bias as you analyze your results.
Primacy and recency	Research indicates that in mailed surveys, respondents tend to choose the first few response options on the list (primacy bias), whereas in telephone or in-person interview surveys, they are more likely to choose the later options (recency bias).	Reduce the number of response options presented to respondents, and randomize the order of the answer options.
Proxy respondent (aka surrogate data)	Soliciting information from proxies (e.g., spouses, family members) may result in inaccurate information.	Keep questions focused on the respondent’s own experience. Do not ask someone to answer attitudinal, knowledge, or behavior questions on behalf of others.
Recall / telescoping bias	This type of bias results from differences in the accuracy or completeness of respondents’ recall of prior events or experiences (general recall), and from respondents recalling an event or experience from the distant past as having happened more recently (telescoping).	None - but be aware of this potential bias as you analyze your results.
Respondent’s learning	Having thought about earlier questions can affect a respondent’s answers to later questions, because respondents learn as they complete the questionnaire.	Randomize the order of the questions for different respondents.
Unacceptability	Questions or measurements that can hurt, embarrass, invade privacy, or require excessive commitment may be systematically refused or evaded.	Whenever possible, do not ask such questions. If asking such a question is absolutely essential, do so with sensitivity and consider using incentives to increase the participation rate.
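
Several tips above (primacy and recency, respondent’s learning) call for randomizing the order of questions or answer options. Many survey tools offer built-in shuffle settings, so check your tool’s documentation first. If you are preparing survey versions yourself, the Python sketch below shows one way to do it under stated assumptions: the `QUESTIONS` list and its `shuffle_options` flag are hypothetical, only unordered (nominal) options are shuffled, ordered scales such as Excellent/Good/Fair/Poor keep their natural order, and a catch-all “Other” option stays last.

```python
import random

# Hypothetical questionnaire. "shuffle_options" marks unordered (nominal) options;
# ordered scales like Excellent/Good/Fair/Poor keep their natural order.
QUESTIONS = [
    {"id": "q1", "text": "Which water source do you use most often?",
     "options": ["Tap water", "Bottled water", "Private well", "Other"], "shuffle_options": True},
    {"id": "q2", "text": "How would you rate your tap water quality?",
     "options": ["Excellent", "Good", "Fair", "Poor"], "shuffle_options": False},
    {"id": "q3", "text": "How did you hear about this survey?",
     "options": ["Email", "Community meeting", "Social media", "Other"], "shuffle_options": True},
]


def randomized_survey(respondent_id: str, questions: list[dict]) -> list[dict]:
    """Return one respondent's copy of the survey with the question order shuffled
    and, where allowed, the answer-option order shuffled."""
    # A string seed gives the same order every time for the same respondent ID.
    rng = random.Random(respondent_id)
    # Copy each question (and its options list) so the master QUESTIONS list is untouched.
    survey = [dict(q, options=list(q["options"])) for q in questions]
    rng.shuffle(survey)
    for q in survey:
        if q["shuffle_options"]:
            # Shuffle everything except a catch-all "Other", which stays last.
            movable = [o for o in q["options"] if o != "Other"]
            fixed = [o for o in q["options"] if o == "Other"]
            rng.shuffle(movable)
            q["options"] = movable + fixed
    return survey


if __name__ == "__main__":
    for q in randomized_survey("respondent-001", QUESTIONS):
        print(q["id"], q["options"])
```

Seeding the random number generator with each respondent’s ID means the order each person saw can be regenerated later, which helps if you want to check for order effects during analysis.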

Picking a Survey Software

Most Water Board staff will use Microsoft Forms, which is available to all staff through the Microsoft 365 suite of applications. A key advantage of Microsoft Forms is its integration with other Microsoft tools, such as Excel and Power BI, which can be used to analyze and visualize survey results. See this 6-minute video on Using Microsoft Forms data with Power BI for guidance on connecting Forms to Power BI via SharePoint so that results update consistently as new responses come in.

Note that if you use Microsoft Forms or other free software such as Google Forms, you will likely need to transform the form output from a wide format (one row per respondent, one column per question) to a long format (one row per respondent-question pair) for analysis. See the Processing Data with an Equity Lens section of the Data Processing page for more guidance.
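
The wide-to-long reshaping step can be sketched in Python with only the standard library. This is a minimal sketch, assuming a CSV export with one row per respondent; the column names (`ID`, `County`, `Q1 Satisfaction`, `Q2 Access`) and sample values are hypothetical, and real Microsoft Forms or Google Forms exports include additional columns (such as timestamps) that you may want to keep as ID columns. Keeping demographic columns as ID columns carries them onto every long-format row, so results can later be disaggregated by group.

```python
import csv
import io
import sys

# Hypothetical wide-format export: one row per respondent, one column per question.
WIDE_CSV = """ID,County,Q1 Satisfaction,Q2 Access
1,Fresno,Agree,Disagree
2,Kern,Strongly agree,
"""

# Columns copied onto every long-format row (respondent ID plus demographics).
ID_COLUMNS = ["ID", "County"]


def wide_to_long(rows, id_columns):
    """Yield one record per respondent-question pair (long format)."""
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for column, answer in row.items():
            if column in id_columns:
                continue
            # Store blank answers as None so skipped questions remain visible as gaps.
            yield {**ids, "question": column, "answer": answer or None}


if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(WIDE_CSV))
    long_rows = list(wide_to_long(reader, ID_COLUMNS))
    writer = csv.DictWriter(sys.stdout, fieldnames=[*ID_COLUMNS, "question", "answer"])
    writer.writeheader()
    writer.writerows(long_rows)  # csv writes None as an empty field
```

If you work in pandas, `DataFrame.melt(id_vars=[...])` performs the same reshaping, and in Excel or Power BI, Power Query’s Unpivot Columns command does as well.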

Footnotes

Most of the content in this section was informed by: A Catalog of Biases in Questionnaires