Dataset: U.S. School District Mission Statements


Organization: Pew Research Center
Tools: Python, data documentation, data privacy, text processing, data cleaning, content analysis
Tags: open data, education policy, text data, computational social science, dataset release

This dataset release provides the underlying text corpus used in Pew Research Center’s analysis of U.S. school district mission statements, enabling researchers to examine how districts articulate goals, values, and priorities in official language. The dataset includes 1,314 mission statements collected from public school district websites across the United States, along with metadata such as district location, political context, and demographic characteristics.

The dataset supports investigation of themes such as diversity, equity, and inclusion, and of how references to these themes vary across political and geographic contexts. The findings are available in this research report: School District Mission Statements and the Politics of DEI in K–12 Education.

The data has been processed to remove personally identifiable information while preserving the integrity of the text for analysis. The process is explained in this blog post: Computational methods for redacting identifying information in large text data.
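
As a rough illustration of what rule-based redaction of text can look like, the Python sketch below replaces email addresses and U.S.-style phone numbers with placeholder tokens. It is not the method used for this release, which is described in the blog post. The regular expressions, the `redact` function name, and the `[EMAIL]` and `[PHONE]` placeholders are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only; the actual release
# may have relied on different rules or tools.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and U.S.-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "Contact info@example.org or (555) 123-4567 for details."
    print(redact(sample))
```

Patterns like these only catch well-structured identifiers. Names of people, for example, generally need a different approach, so a sketch of this kind would be one step in a larger pipeline rather than a complete solution.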

What I did

I supported the preparation and documentation of the dataset for public release, including structuring the text data, validating entries, identifying and removing personally identifiable information, and ensuring the data is usable for secondary analysis and reproducible research.