Environmental Data Research

Mapping Air Pollution
to Economic Disparity

A quantitative analysis of the relationship between PM2.5 pollution exposure and median household income across 615 U.S. counties, using EPA and Census Bureau data.

−$1,350

Estimated decrease in median household income per 1 μg/m³ increase in PM2.5 concentration (p < 0.001)

📊

615

Counties Analyzed

Complete paired data

🌫️

7.82 → 6.99

PM2.5 by Income Decile

μg/m³ · lowest to highest income

📅

365 Days

Temporal Coverage

Full calendar year 2022

🔬

p < 0.001

Statistical Significance

OLS regression

🏛️

Federal Data Sources

EPA AQS + U.S. Census Bureau

Mean PM2.5 by Income Decile

Annual mean concentration (μg/m³) across 615 U.S. counties · Decile 0 = lowest income

Decile	n	Mean PM2.5	Median PM2.5	Std Dev
0 (lowest income)	62	7.823	7.810	2.373
1	61	7.407	7.514	1.727
2	62	7.457	7.530	1.578
3	61	7.344	7.407	1.453
4	62	7.633	7.725	2.107
5	61	7.198	7.031	1.886
6	61	7.073	7.099	1.881
7	62	7.136	7.171	2.093
8	61	6.629	6.837	1.619
9 (highest income)	62	6.988	6.957	1.421

Methodology

Rigorous Statistical Framework

Multi-method analytical approach combining parametric and non-parametric techniques to ensure robust, reproducible results.

REGRESSION

Pearson & Spearman correlation, Ordinary Least Squares, Huber robust regression, and median quantile regression for comprehensive relationship characterization.

DECILE ANALYSIS

ANOVA, Kruskal–Wallis tests, and Spearman rank trend tests across income deciles with monthly and seasonal sub-regressions.

QUINTILE COMPARISON

Welch's t-test, Kolmogorov–Smirnov test, and bootstrap confidence intervals comparing extreme income quintiles.

Data Pipeline

End-to-End Reproducible Workflow

From raw API calls to publication-ready analysis — fully automated and open source.

🌐

EPA AQS API

PM2.5 daily data

→

📊

Census API

Income by county

→

🔗

Merge & Clean

County-level join

→

📈

Aggregate

Seasonal + annual

→

🧪

Statistical Tests

Multi-method suite

Python · pandas · statsmodels · scipy · EPA AQS API · Census API · matplotlib

Aggregation

Comprehensive PM2.5 Summaries

County-level daily data aggregated across multiple temporal and statistical dimensions.

📉

Mean · Median · SD · P90

Annual Statistics

🗓️

Jan–Dec Averages

Monthly Resolution

🍂

DJF · MAM · JJA · SON

Seasonal Breakdowns

⚠️

12 · 25 · 35 μg/m³

Exceedance Thresholds

Full Paper

Research Whitepaper

AirHealthLink: Quantifying the Relationship Between PM2.5 Air Pollution and Economic Status Across U.S. Counties

Alexandre De Belen

Independent Project · github.com/leptio/AirHealthLink

Abstract

This study analyzes the relationship between fine particulate matter (PM2.5) and median household income across U.S. counties. Using data from the EPA Air Quality System and the U.S. Census Bureau, PM2.5 concentrations were aggregated for 615 counties and compared with income data through correlation and regression analysis. The results show a consistent negative association: counties with higher PM2.5 levels tend to have lower median incomes, with an estimated decrease of about $1,350 per 1 μg/m³ increase in PM2.5 (p < 0.001). Decile comparisons further indicate that the lowest-income counties experience approximately 0.8 μg/m³ higher mean PM2.5 concentrations than the highest-income counties. These findings provide quantitative evidence of economic disparities in air pollution concentrations across the United States.

Section 1

Introduction

Fine particulate matter (PM2.5) is a major air pollutant repeatedly linked to morbidity, chronic disease, and environmental degradation. Exposure is not uniform: PM2.5 levels vary substantially across regions and populations, and existing evidence associates PM2.5 exposure with socioeconomic status. Despite this, uniform county-level evaluations that integrate air quality and economic status across the United States remain scarce.

This study examines the relationship between ambient PM2.5 concentrations and median household income in the United States. Data from the EPA Air Quality System and the U.S. Census Bureau were merged for 615 counties and analyzed through regression, correlation, and distributional comparisons. The analysis finds statistically significant national-level differences in PM2.5 concentration between high-income and low-income counties.

Section 2

Data and Methods

2.1 Data Sources

All PM2.5 data was obtained from the U.S. Environmental Protection Agency's Air Quality System (AQS), which provides daily county-level averages of airborne fine particulate matter (aerodynamic diameter ≤2.5 μm). Household income data was obtained from the U.S. Census Bureau's American Community Survey (ACS) 5-year estimate of median household income.

2.2 Pre-processing

All relevant columns were standardized before analysis began. Dates were parsed into datetime objects, and PM2.5 and economic fields were converted to numeric types. County Federal Information Processing Standard (FIPS) codes were constructed by combining state-level and county-level codes.
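
A minimal pandas sketch of these pre-processing steps follows. The column names (state_code, county_code, date, pm25) are assumptions made for illustration and may not match the repository's actual schema. The 5-character FIPS layout, a 2-digit state code followed by a 3-digit county code, is the standard one.

```python
import pandas as pd


def preprocess_pm25(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize a daily PM2.5 table (column names are assumed, not from the source)."""
    df = raw.copy()
    # Parse dates; values that cannot be parsed become NaT instead of raising.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    # Coerce concentrations to numeric; unparseable values become NaN.
    df["pm25"] = pd.to_numeric(df["pm25"], errors="coerce")
    # Build the county FIPS code: zero-padded state code + zero-padded county code.
    state = df["state_code"].astype(str).str.zfill(2)
    county = df["county_code"].astype(str).str.zfill(3)
    df["fips"] = state + county
    # Drop rows that lost their date or measurement during coercion.
    return df.dropna(subset=["date", "pm25"]).reset_index(drop=True)


# Tiny made-up input: one valid row, one bad measurement, one bad date.
raw = pd.DataFrame({
    "state_code": [6, 6, 48],
    "county_code": [37, 37, 201],
    "date": ["2022-01-01", "2022-01-02", "not a date"],
    "pm25": ["9.1", "n/a", "7.4"],
})
clean = preprocess_pm25(raw)
print(clean[["fips", "date", "pm25"]])
```

Keeping the FIPS code as a zero-padded string, rather than an integer, avoids silently dropping leading zeros for states such as California (06), which is one common cause of join mismatches.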

2.3 Aggregation

Daily PM2.5 measurements were aggregated to county-level summaries. Each county's dataset includes:

Annual mean, annual median, annual standard deviation, and annual 90th percentile concentration (μg/m³).
Monthly mean concentrations.
Seasonal averages (December through February, March through May, June through August, September through November).
Counts of days exceeding 12, 25, and 35 μg/m³ thresholds.
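
The summaries listed above could be computed along the following lines. This is a sketch under assumed column names (fips, date, pm25), and it reads "exceeding" as strictly greater than the threshold; the repository may differ on both points.

```python
import pandas as pd

SEASON_BY_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}
THRESHOLDS = (12, 25, 35)  # ug/m3, as listed in the text


def summarize_counties(daily: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily rows (fips, date, pm25) into one summary row per county."""
    d = daily.assign(month=daily["date"].dt.month)
    # With a single calendar year, DJF pools January, February, and December of that year.
    d["season"] = d["month"].map(SEASON_BY_MONTH)

    by_county = d.groupby("fips")["pm25"]
    out = by_county.agg(annual_mean="mean", annual_median="median", annual_sd="std")
    out["annual_p90"] = by_county.quantile(0.90)

    # Count days strictly above each threshold.
    for t in THRESHOLDS:
        out[f"days_over_{t}"] = (d["pm25"] > t).groupby(d["fips"]).sum()

    # Monthly and seasonal means, one column per period.
    monthly = d.groupby(["fips", "month"])["pm25"].mean().unstack("month").add_prefix("mean_m")
    seasonal = d.groupby(["fips", "season"])["pm25"].mean().unstack("season").add_prefix("mean_")
    return out.join(monthly).join(seasonal)


# Made-up daily values for two counties.
daily = pd.DataFrame({
    "fips": ["06037"] * 4 + ["48201"] * 2,
    "date": pd.to_datetime(["2022-01-10", "2022-01-20", "2022-07-04",
                            "2022-07-05", "2022-03-01", "2022-12-31"]),
    "pm25": [10.0, 14.0, 30.0, 40.0, 6.0, 8.0],
})
summary = summarize_counties(daily)
print(summary.T)
```

Periods with no observations for a county come out as NaN rather than zero, which keeps missing monitoring data distinguishable from clean air.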

2.4 Normalization

The AQS and Census Bureau datasets were merged on the standardized FIPS code obtained in pre-processing. Only counties with valid PM2.5 and income data were retained. A merged table of 615 counties was produced.
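
The merge step can be illustrated with a short pandas sketch. The column names and the values below are hypothetical; only the join key (a standardized FIPS code) and the rule of keeping counties with valid PM2.5 and income data come from the text.

```python
import pandas as pd


def merge_pm25_income(pm25: pd.DataFrame, income: pd.DataFrame) -> pd.DataFrame:
    """Inner-join county summaries on FIPS and keep only complete pairs."""
    # validate="one_to_one" raises if either table has duplicate FIPS codes.
    merged = pm25.merge(income, on="fips", how="inner", validate="one_to_one")
    return merged.dropna(subset=["annual_mean", "median_income"]).reset_index(drop=True)


# Illustrative values only.
pm25 = pd.DataFrame({
    "fips": ["06037", "48201", "36061"],
    "annual_mean": [11.2, 9.0, None],       # third county has no valid PM2.5 mean
})
income = pd.DataFrame({
    "fips": ["06037", "48201", "36061", "17031"],
    "median_income": [76000, 65000, 93000, 72000],
})
merged = merge_pm25_income(pm25, income)
print(merged)
```

Here the county missing from the PM2.5 table is dropped by the inner join and the county with a missing mean is dropped by the completeness filter, leaving two rows.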

2.5 Statistical Analysis

The following analyses were executed automatically:

Correlation Analysis: Pearson and Spearman correlation coefficients were computed between annual mean PM2.5 and median household income.
Regression Analysis: Ordinary least squares (OLS), robust (Huber), and median quantile regressions were used to model income as a function of PM2.5.
Income Decile Analysis: Counties were divided into ten income deciles to evaluate how PM2.5 varies along economic lines. ANOVA and Kruskal–Wallis tests were used to assess differences among deciles, and a Spearman rank test was used to assess monotonic trend.
Temporal Analysis: Correlations were repeated across monthly and seasonal PM2.5 means to detect possible temporal variation in the income–pollution relationship.
Extreme Bin Comparison: The lowest and highest income quintiles were compared using Welch's t-test, the Kolmogorov–Smirnov test, and bootstrap confidence intervals to find the difference in mean PM2.5.

2.6 Reproducibility

After cloning the repository and configuring the required API keys, the complete analysis can be replicated by running:

python pm25_economic_analyzer.py \
--pm25 data/county_level_pm25.csv \
--econ data/county_level_economic_status.csv

Allow some time for the data to download and for the analysis to run.

Section 3

Results

3.1 Main Income Decile Results

Across the 615 counties in the analytical dataset, OLS regression identified a statistically significant negative relationship between annual mean PM2.5 and median household income (β ≈ −$1,350 per μg/m³; p < 0.001). This finding was corroborated by Huber robust regression and median quantile regression, both yielding coefficients of comparable magnitude and direction, which indicates that the result is not driven by outliers.

The lowest income decile, decile 0, exhibited the highest mean PM2.5 at 7.823 μg/m³ (SD = 2.373). The overall minimum was observed in decile 8 at 6.629 μg/m³ (SD = 1.619), resulting in a difference of 1.194 μg/m³. The highest income decile, decile 9, recorded a mean of 6.988 μg/m³ (SD = 1.421), yielding a gap of 0.835 μg/m³ between the income extremes.

Table 1. PM2.5 Summary Statistics by Income Decile

Decile	n	Mean (μg/m³)	Median (μg/m³)	SD (μg/m³)
0 (lowest)	62	7.823	7.810	2.373
1	61	7.407	7.514	1.727
2	62	7.457	7.530	1.578
3	61	7.344	7.407	1.453
4	62	7.633	7.725	2.107
5	61	7.198	7.031	1.886
6	61	7.073	7.099	1.881
7	62	7.136	7.171	2.093
8	61	6.629	6.837	1.619
9 (highest)	62	6.988	6.957	1.421

The general downward trend is not perfectly monotonic. Decile 4 shows a local elevation (7.633 μg/m³), suggesting that middle-income counties may face heightened pollution conditions, potentially reflecting a mix of suburban and industrial geographies. Decile 9 (6.988 μg/m³) also sits above decile 8 (6.629 μg/m³), which may reflect the pollution experienced by high-income residents of urban environments. Despite these departures, the Spearman rank trend test confirmed a statistically significant overall negative trend across deciles.

The lowest income decile also exhibits the highest variability (SD = 2.373), indicating substantial heterogeneity in pollution exposure among the lowest-income counties. In contrast, the wealthiest decile has the smallest standard deviation (SD = 1.421) and therefore the most consistent exposure, although its mean is not the lowest; that belongs to decile 8.
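
The gaps and the direction of the trend can be checked directly from the decile means in Table 1. The sketch below is an illustration only: it ranks the ten decile means, whereas the study's trend test may have been run on county-level data, so the coefficient here need not match the reported one.

```python
from scipy import stats

# Mean PM2.5 (ug/m3) for deciles 0 through 9, copied from Table 1.
decile_mean = [7.823, 7.407, 7.457, 7.344, 7.633, 7.198, 7.073, 7.136, 6.629, 6.988]
deciles = list(range(10))

# Gap between the highest and lowest decile means (decile 0 vs decile 8).
gap_max_min = max(decile_mean) - min(decile_mean)
# Gap between the income extremes (decile 0 vs decile 9).
gap_extremes = decile_mean[0] - decile_mean[9]

# Spearman rank correlation between decile index and decile mean.
rho, p_value = stats.spearmanr(deciles, decile_mean)

print(round(gap_max_min, 3), round(gap_extremes, 3), round(rho, 3))
# prints: 1.194 0.835 -0.891
```

A coefficient near −0.89 reflects a strong but imperfect ordering: the reversals at deciles 2, 4, 7, and 9 keep it away from −1.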

ANOVA and Kruskal–Wallis tests both returned significant results (p < 0.001), confirming that PM2.5 distributions differ across income groups.
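
A sketch of how such decile tests could be run with pandas and scipy is shown below. The data are synthetic and the column names are assumptions; using pd.qcut for the binning is also an assumption, since the text does not say how deciles were formed.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic counties with a built-in negative income-PM2.5 link (NOT the study's data).
rng = np.random.default_rng(1)
n = 615
income = rng.normal(65_000, 15_000, n)
pm25 = 9.0 - 5e-5 * income + rng.normal(0, 1.0, n)
df = pd.DataFrame({"income": income, "pm25": pm25})

# Assign each county to an income decile, labeled 0 (lowest) through 9 (highest).
df["decile"] = pd.qcut(df["income"], 10, labels=False)
groups = [g["pm25"].to_numpy() for _, g in df.groupby("decile")]
sizes = [len(g) for g in groups]

# One-way ANOVA compares decile means; Kruskal-Wallis is its rank-based counterpart.
f_stat, anova_p = stats.f_oneway(*groups)
h_stat, kw_p = stats.kruskal(*groups)

print("decile sizes:", sizes)
print(f"ANOVA p = {anova_p:.2e}, Kruskal-Wallis p = {kw_p:.2e}")
```

With 615 distinct income values, quantile binning yields groups of 61 or 62 counties, the same sizes that appear in the n column of Table 1. Both tests only establish that at least one decile differs; they do not test for an ordered trend.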

3.2 Temporal Patterns

Temporal analysis showed that the negative income–PM2.5 correlation persisted across all months and seasons, though the magnitude varied. Summer months (JJA) showed the strongest associations, likely reflecting elevated photochemical activity and wildfire contributions that disproportionately affect lower-income, rural counties.
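
Repeating the correlation for each period amounts to a loop over the period columns of the merged table. The sketch below uses Spearman correlation on synthetic seasonal means; the helper name, the column names, and the choice of Spearman over Pearson are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def correlations_by_period(df, period_cols, income_col="median_income"):
    """Spearman correlation of income with each period's mean PM2.5 (hypothetical helper)."""
    rows = []
    for col in period_cols:
        # Use only counties that have both values for this period.
        pair = df[[col, income_col]].dropna()
        rho, p = stats.spearmanr(pair[col], pair[income_col])
        rows.append({"period": col, "spearman_rho": rho, "p_value": p, "n": len(pair)})
    return pd.DataFrame(rows).set_index("period")


# Synthetic counties with a built-in negative link in every season (NOT the study's data).
rng = np.random.default_rng(2)
n = 200
income = rng.normal(65_000, 15_000, n)
seasons = ["mean_DJF", "mean_MAM", "mean_JJA", "mean_SON"]
data = {"median_income": income}
for s in seasons:
    data[s] = 9.0 - 5e-5 * income + rng.normal(0, 0.5, n)
counties = pd.DataFrame(data)

by_season = correlations_by_period(counties, seasons)
print(by_season)
```

The same helper works for monthly columns; reporting n alongside each coefficient makes it visible when a period is based on fewer counties.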

3.3 Extreme Quintile Comparison

Welch's t-test confirmed a statistically significant difference in mean PM2.5 between the lowest and highest income quintiles. Bootstrap confidence intervals for the mean difference excluded zero, which indicates that the observed gap is unlikely to be an artifact of sampling variability.
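
The three comparisons can be sketched as follows. The samples are synthetic, with a deliberately large gap so that the example is stable, and the interval uses a simple percentile bootstrap; the text does not state which bootstrap variant the study used, so that choice is an assumption.

```python
import numpy as np
from scipy import stats

# Synthetic annual-mean PM2.5 for two quintiles of 123 counties each (NOT the study's data).
rng = np.random.default_rng(3)
low_income = rng.normal(8.0, 2.1, 123)
high_income = rng.normal(6.5, 1.5, 123)

# Welch's t-test: compares means without assuming equal variances.
t_stat, t_p = stats.ttest_ind(low_income, high_income, equal_var=False)
# Two-sample Kolmogorov-Smirnov test: compares the full distributions.
ks_stat, ks_p = stats.ks_2samp(low_income, high_income)


def bootstrap_mean_diff_ci(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each group independently."""
    boot_rng = np.random.default_rng(seed)
    a_means = boot_rng.choice(a, size=(n_boot, a.size), replace=True).mean(axis=1)
    b_means = boot_rng.choice(b, size=(n_boot, b.size), replace=True).mean(axis=1)
    lo, hi = np.quantile(a_means - b_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi


ci_low, ci_high = bootstrap_mean_diff_ci(low_income, high_income)
print(f"Welch p = {t_p:.2e}, KS p = {ks_p:.2e}")
print(f"95% bootstrap CI for the mean difference: [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that lies entirely above zero is what "excluded zero" means in the paragraph above: the resampled differences almost never change sign.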

Section 4

Discussion

The consistently negative correlation between PM2.5 and income across multiple statistical methods underscores the economic dimension of air pollution exposure in the United States. These findings align with the broader environmental justice literature, which documents lower-income and minority communities bearing a disproportionate pollution burden.

Several mechanisms could help explain this pattern. Lower-income counties may be located closer to industrial facilities, transportation corridors such as highways, or agricultural sources of particulate matter. Wealthier communities may also benefit from more effective environmental regulation, better urban planning, and greater political capacity.

The non-monotonic behavior of decile 4 is notable: its mean (7.633 μg/m³) is the second highest of all deciles, exceeded only by decile 0. This may reflect a concentration of middle-income counties in regions with mixed industrial and suburban land use, where PM2.5 sources could include agriculture, industrial sites, or urban pollution.

The elevated standard deviation in decile 0 (SD = 2.373) shows that the lowest-income counties are the most internally diverse in pollution exposure. Some may be rural with relatively clean air, while others may sit near major pollution sources or in city centers. This heterogeneity suggests that further sub-county investigation may be warranted.

The fully reproducible and automated nature of AirHealthLink's pipeline enables other researchers to replicate our findings, to extend this analysis to additional years, economic indicators, or pollutants, and to conduct longitudinal or comparative studies.

Section 5

Limitations

Several limitations apply to our analysis. First, the analysis is purely observational and cross-sectional: causal inference cannot be drawn from association alone. Confounding variables such as population density, urbanization, industrial composition, racial demographics, and immigration status were not controlled for.

Second, although the 615 counties with complete data cover roughly one fifth of the more than 3,100 counties in the United States, they do not represent every county. FIPS mismatches and data unavailability, such as counties lacking daily PM2.5 records, made many counties unusable in the study.

Third, county-level aggregation masks intra-county disparities in both wealth and pollution exposure. A finer-grained analysis at the census tract or ZIP code level might reveal larger disparities.

Finally, the study uses a single year of data (2022), which may not capture longer-term trends or the effects of episodic events such as wildfires. A multi-year analysis would considerably strengthen the generalizability of our findings.

Explore the Source

Dive into the open-source codebase to replicate, verify, and extend the analysis.