Mapping Air Pollution to Economic Disparity
A quantitative analysis of the relationship between PM2.5 pollution exposure and median household income across 615 U.S. counties, using EPA and Census Bureau data.
REGRESSION
Pearson & Spearman correlation, Ordinary Least Squares, Huber robust regression, and median quantile regression for comprehensive relationship characterization.
DECILE ANALYSIS
ANOVA, Kruskal–Wallis tests, and Spearman rank trend tests across income deciles with monthly and seasonal sub-regressions.
QUINTILE COMPARISON
Welch's t-test, Kolmogorov–Smirnov test, and bootstrap confidence intervals comparing extreme income quintiles.
AirHealthLink: Quantifying the Relationship Between PM2.5 Air Pollution and Economic Status Across U.S. Counties
This study analyzes the relationship between fine particulate matter (PM2.5) and median household income across U.S. counties. Using data from the EPA Air Quality System and the U.S. Census, PM2.5 concentrations were aggregated for 615 counties and compared with income data through correlation and regression analysis. The results show a consistent negative association: counties with higher PM2.5 levels tend to have lower median incomes, with an estimated decrease of about $1,350 in median household income per 1 μg/m³ increase in PM2.5 (p < 0.001). Income-based comparisons further indicate that lower-income counties tend to experience approximately 0.8 μg/m³ higher PM2.5 concentrations than higher-income counties. These findings provide quantitative evidence of economic disparities in air pollution concentrations across the United States.
This study examines the relationship between airborne PM2.5 concentrations and median household income in the United States. Data from the EPA Air Quality System and the U.S. Census were merged for 615 counties and analyzed through regression, correlation, and distributional comparisons. The analysis shows significant national-level disparities in PM2.5 concentration and exposure between high-income and low-income communities.
- Annual mean, annual median, annual standard deviation, and annual 90th percentile concentration (μg/m³).
- Monthly mean concentrations.
- Seasonal averages (December through February, March through May, June through August, September through November).
- Counts of days exceeding 12, 25, and 35 μg/m³ thresholds.
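The annual statistics and threshold counts listed above can be illustrated with a minimal sketch for a single county. The function name, the nearest-rank percentile definition, and the strict `>` comparison for "exceeding" are assumptions made for illustration; the project's actual pipeline may compute these differently, and the readings shown are made up.

```python
import statistics

def summarize_daily_pm25(daily, thresholds=(12, 25, 35)):
    """Summarize one county's daily PM2.5 readings (μg/m³) for one year.

    Hypothetical helper, not taken from the AirHealthLink codebase.
    """
    ordered = sorted(daily)
    # Nearest-rank 90th percentile: ceil(0.9 * n), done in integer arithmetic.
    rank = -(-9 * len(ordered) // 10)
    summary = {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "std": statistics.stdev(ordered),  # sample SD; needs at least 2 readings
        "p90": ordered[rank - 1],
    }
    # Count days strictly above each threshold.
    for t in thresholds:
        summary[f"days_over_{t}"] = sum(1 for v in daily if v > t)
    return summary

# Made-up readings for illustration only (not study data).
example = summarize_daily_pm25(
    [4.0, 6.5, 8.0, 9.5, 13.0, 26.0, 36.0, 7.0, 5.5, 10.5]
)
print(example)
```

Monthly and seasonal means would follow the same pattern, grouping the readings by calendar month or season before averaging.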
- Correlation Analysis: Pearson and Spearman correlation coefficients were computed between annual mean PM2.5 and median household income.
- Regression Analysis: Ordinary least squares (OLS), robust (Huber), and median quantile regressions were used to model income as a function of PM2.5.
- Income Decile Analysis: Counties were divided into ten income deciles to evaluate how PM2.5 varies along economic lines. ANOVA and Kruskal–Wallis tests were used to assess differences among deciles, and a Spearman rank test was used to assess monotonic trend.
- Temporal Analysis: Correlations were repeated across monthly and seasonal PM2.5 means to detect possible temporal variation in the income–pollution relationship.
- Extreme Bin Comparison: The lowest and highest income quintiles were compared using Welch's t-test, the Kolmogorov–Smirnov test, and bootstrap confidence intervals for the difference in mean PM2.5.
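To make the extreme bin comparison concrete, the following is a minimal standard-library sketch of Welch's t statistic and a percentile bootstrap confidence interval for a difference in means. The function names, the resample count, the seed, and the synthetic input values are all assumptions for illustration; they are not the study's data or necessarily its implementation.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    # Resample each group with replacement and record the mean difference.
    diffs = sorted(
        statistics.fmean(rng.choices(a, k=len(a)))
        - statistics.fmean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lo = diffs[round((alpha / 2) * n_boot)]
    hi = diffs[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic placeholder values, NOT study data.
rng = random.Random(42)
low_income = [rng.gauss(7.6, 2.0) for _ in range(123)]
high_income = [rng.gauss(6.8, 1.5) for _ in range(123)]

t_stat, dof = welch_t(low_income, high_income)
ci_low, ci_high = bootstrap_mean_diff_ci(low_income, high_income)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Welch's test is a natural choice here because it does not assume the two quintiles share a variance, and the bootstrap interval avoids assuming normality of the county-level PM2.5 values.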
--pm25 data/county_level_pm25.csv \
--econ data/county_level_economic_status.csv
The lowest income decile, decile 0, exhibited the highest mean PM2.5 at 7.823 μg/m³ (SD = 2.373). The overall minimum was observed in decile 8 at 6.629 μg/m³ (SD = 1.619), resulting in a difference of 1.194 μg/m³. The highest income decile, decile 9, recorded a mean of 6.988 μg/m³ (SD = 1.421), yielding a gap of 0.835 μg/m³ between the income extremes.
| Decile | n | Mean (μg/m³) | Median (μg/m³) | SD (μg/m³) |
|---|---|---|---|---|
| 0 (lowest) | 62 | 7.823 | 7.810 | 2.373 |
| 1 | 61 | 7.407 | 7.514 | 1.727 |
| 2 | 62 | 7.457 | 7.530 | 1.578 |
| 3 | 61 | 7.344 | 7.407 | 1.453 |
| 4 | 62 | 7.633 | 7.725 | 2.107 |
| 5 | 61 | 7.198 | 7.031 | 1.886 |
| 6 | 61 | 7.073 | 7.099 | 1.881 |
| 7 | 62 | 7.136 | 7.171 | 2.093 |
| 8 | 61 | 6.629 | 6.837 | 1.619 |
| 9 (highest) | 62 | 6.988 | 6.957 | 1.421 |
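As a quick arithmetic check, the gaps quoted in the text can be recomputed from the decile means in the table. The sketch below only re-enters the table's n and mean columns; the variable names are illustrative and not taken from the project's codebase.

```python
# (decile, n, mean PM2.5 in μg/m³), copied from the decile table
deciles = [
    (0, 62, 7.823), (1, 61, 7.407), (2, 62, 7.457), (3, 61, 7.344),
    (4, 62, 7.633), (5, 61, 7.198), (6, 61, 7.073), (7, 62, 7.136),
    (8, 61, 6.629), (9, 62, 6.988),
]
means = {d: m for d, _, m in deciles}

# Decile with the lowest mean concentration.
min_decile = min(means, key=means.get)

# Gap between the lowest-income decile and the overall minimum.
max_to_min_gap = round(means[0] - means[min_decile], 3)

# Gap between the two income extremes (decile 0 vs. decile 9).
extreme_gap = round(means[0] - means[9], 3)

# Total number of counties across all deciles.
total_n = sum(n for _, n, _ in deciles)

print(min_decile, max_to_min_gap, extreme_gap, total_n)
# → 8 1.194 0.835 615
```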
The lowest income decile also exhibits the highest variability (SD = 2.373), indicating substantial heterogeneity in pollution exposure among the lowest-income counties. In contrast, the wealthiest decile has the smallest standard deviation (SD = 1.421), indicating the most uniform exposure.
ANOVA and Kruskal–Wallis tests both returned significant results (p < 0.001), confirming that PM2.5 distributions differ across income groups.
Several mechanisms could help explain this pattern. Lower-income counties may be located closer to industrial facilities, transportation corridors such as highways, or agricultural sources of particulate matter. Wealthier communities may also benefit from more effective environmental regulation, better urban planning, and greater political capacity.
Decile 4 departs most clearly from the downward trend: its mean of 7.633 μg/m³ is the second highest of any decile, exceeded only by decile 0. This may reflect a concentration of middle-income counties in regions with mixed industrial and suburban land use, where PM2.5 sources could include agriculture, industrial sites, or urban emissions.
The elevated standard deviation in decile 0 (SD = 2.373, the largest of any decile) shows that the lowest-income counties are the most internally diverse in pollution exposure. Some may be rural counties with relatively clean air, while others may sit near major pollution sources or in city centers. This heterogeneity suggests that further sub-county investigation may be warranted.
Because AirHealthLink's pipeline is fully automated and reproducible, other researchers can replicate our findings, extend the analysis to additional years, economic indicators, or pollutants, and conduct longitudinal or comparative studies.
Second, although the 615 counties with complete data form a sizable sample, they cover only about one fifth of the more than 3,100 counties in the United States. FIPS mismatches, unavailable ACS data, and counties lacking daily records made many counties' data unusable in the study.
Third, county-level aggregation masks intra-county disparities in both wealth and pollution exposure. A finer-grained analysis at the census tract or ZIP code level might reveal larger disparities.
Finally, the study uses a single year of data (2022), which may not capture longer-term trends or the effects of episodic events such as wildfires. A multi-year analysis would considerably strengthen the generalizability of our findings.
Explore the Source
Dive into the open-source codebase to replicate, verify, and extend the analysis.