METHODOLOGY APPENDIX

Universe definitions, sample years, weightings, and suppression rules for every headline metric.

Slide 2 — Pillar 1 (replication)

Metric | Universe
Spanish-at-home rate (1980 / 2024 anchors) | All Miami-Dade residents age 5+ (1980, 2024); also native-born age 20-29 cut
Geography sensitivity (Miami-Dade County vs 3-county MSA) | Native-born Miami Hispanic 20-29 (county vs broader MSA)

Slide 3 — Spotlight: Spanish retention across 11 metros, 1980-2024

Metric | Universe
3rd+-gen Spanish-at-home rate (strict 2-parent rule) | Hispanic adult age 20-35, US-state-born (BPL<100), RELATE=3, GQ∈{0,1,2}, both head AND head's spouse (if present) US-state-born; metro filter per PUMA reconstruction (2020-2024) or METAREA-coterminous-with-principal-county (1980-2010)
National 3rd+-gen Spanish rate (reference line) | Same strict rule, nationally pooled (all states)

Co-residence identification (known limitation, validated). The strict rule requires the young adult to co-reside with a parent, which is the only way to read parent birthplace from the modern ACS. CPS, which collects parent birthplace directly via MBPL/FBPL, indicates that approximately 66% of true 3rd+-gen Hispanic 20-35-year-olds in Miami-Dade live with a parent, so the strict rule captures roughly two-thirds of the true population. The selection bias is broadly similar across metros, and Miami's selection effect (Sp[strict] − Sp[universe] = −19pp) is smaller than the comparison-metro average (−32pp), so the cross-metro ranking is robust. See pipeline/test_coresidence_check.py and pipeline/test_cps_cross_survey_check.py.
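As a rough illustration of the strict 2-parent rule listed for Slide 3, the sketch below filters person records grouped by household. It assumes records are plain dicts keyed by IPUMS variable names. The Hispanic flag (HISPAN codes 1-4) and the function names are assumptions made for illustration and are not taken from the pipeline; the metro filter is omitted.

```python
from collections import defaultdict


def us_state_born(person):
    # BPL < 100 marks birth in a US state, per the rule stated for Slide 3.
    return person["BPL"] < 100


def strict_third_gen(persons):
    """Return young adults passing the strict 2-parent rule (metro filter omitted)."""
    households = defaultdict(list)
    for p in persons:
        households[p["SERIAL"]].append(p)

    selected = []
    for members in households.values():
        heads = [m for m in members if m["RELATE"] == 1]
        spouses = [m for m in members if m["RELATE"] == 2]
        # A head must exist; head and spouse (if present) must be US-state-born.
        if not heads or not all(us_state_born(m) for m in heads + spouses):
            continue
        for m in members:
            if (
                m["RELATE"] == 3               # child of the household head
                and 20 <= m["AGE"] <= 35
                and 1 <= m["HISPAN"] <= 4      # assumed Hispanic flag
                and us_state_born(m)
                and m["GQ"] in {0, 1, 2}
            ):
                selected.append(m)
    return selected
```

Because the rule only sees adults who still live with the household head, anyone who has moved out is dropped regardless of parent birthplace, which is the co-residence limitation described above.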

Slide 4 — Modern non-Hispanic immigrant comparison

Metric | Universe
Spanish / Mandarin / Tagalog / Japanese at home, by generation | Adult age 20-35 in target ethnic group (Hispanic / Chinese-RACED=400 / Filipino-RACED=600 / Japanese-RACED=500), strict 2-parent-native rule, 2024 ACS

Slide 5 — State-level Spanish retention map, 1980-2024

Metric | Universe
State-level 3rd+-gen Spanish-at-home rate | Strict 2-parent-native rule (same as Slide 3), per US state, across 6 samples
National rate (sparkline + summary) | PERWT-weighted across all states, same strict rule
Cohort-vs-composition shift-share decomposition (method panel) | 36 states with n≥30 in both 1980 and 2024 endpoints
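The appendix does not give the formula behind the shift-share decomposition, so the following is only a sketch of one common choice: a symmetric (Kitagawa-style) split of the change in the pooled rate into a within-state rate-change term and a state-composition term. Mapping the deck's "cohort" component to the within-state term is an assumption, as are the function name and input layout.

```python
def shift_share(start, end):
    """Split the change in a pooled rate into within-state and composition parts.

    start, end: dict mapping state -> (weighted_population, rate).
    Only states present at both endpoints are used.
    Returns (within, composition, total_change).
    """
    states = sorted(set(start) & set(end))
    if not states:
        raise ValueError("no states are present at both endpoints")
    pop0 = sum(start[s][0] for s in states)
    pop1 = sum(end[s][0] for s in states)

    within = 0.0
    composition = 0.0
    rate0 = 0.0
    rate1 = 0.0
    for s in states:
        w0, r0 = start[s][0] / pop0, start[s][1]
        w1, r1 = end[s][0] / pop1, end[s][1]
        within += (r1 - r0) * (w0 + w1) / 2        # rate change at average weight
        composition += (w1 - w0) * (r0 + r1) / 2   # weight change at average rate
        rate0 += w0 * r0
        rate1 += w1 * r1
    return within, composition, rate1 - rate0
```

With averaged weights and rates the two terms sum exactly to the total change, so no interaction residual needs to be allocated.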

Slide 6 — Language methodology + bilingualism + sensitivity

Metric | Universe
Panel A: Bilingualism trajectory 1980-2024 | US-born Hispanic adults 25-64 in Miami-Dade who speak Spanish at home
Panel B: Spanish-at-home by generation (looser head-only rule) | Hispanic adults age 18+ in Miami-Dade; generation = head-of-household birthplace
Panel C: Three-generation-household Spanish | Hispanic adults age 18+ in Miami-Dade in households spanning 3 generations
Panel D: FB rate by years in US | Foreign-born Hispanic adults age 18+ in Miami-Dade
Panel E: Cross-metro bilingualism breakdown (2nd + 3rd+ gen, faceted) | Hispanic adults age 20-35 in 11 metros, two parallel universes: (i) 2nd-gen strict rule: US-state-born + RELATE=3 + ≥1 parent foreign-born; (ii) 3rd+-gen strict rule: US-state-born + RELATE=3 + both parents US-state-born
Panel F: Intergenerational Spanish-at-home drop (2nd → 3rd+) | Per-metro comparison of the (sp_at_home 2nd-gen, sp_at_home 3rd+-gen) pair
Panel G: Sensitivity (4 identification rules) | 3rd+-gen Hispanic in Miami / LA / NYC / Houston under 4 rule variants (strict 20-35, loose head-only 18+, strict 25-40, strict 18-29)

Slide 7 — Within-Miami predictors + cross-metro replenishment

Metric | Universe
Predictor-spread chart (15 candidate predictors) | All Hispanic adults age 18+ in Miami-Dade, 2024 ACS
Spouse identity sub-chart | Married Hispanic adults age 18+ in Miami-Dade, 2024 ACS, with linked spouse via SERIAL/SPLOC
Replenishment intensity (X) | Hispanic adults age 18+ in each comparison metro, 2024 ACS
3rd+-gen retention (Y, same as Slide 3) | Strict 2-parent-native rule, age 20-35, per-metro PUMA universe
Pearson r (cross-metro X vs Y) | 11 comparison metros; correlations computed (a) across all 11, (b) across 9 non-border metros, (c) across 8 non-border non-Miami

Slide 8 — Education and jobs (Pillar 3 economic)

Metric | Universe
BA+ rate, US-born Hispanic adults 25+ (5 measures) | Per-metro: native-born Hispanic age 25+; also local NHW reference
LFPR age 25-64 | Native-born Hispanic 25-64
Median wage (2024 USD) | Native-born Hispanic 25-64, EMPSTAT=1, INCWAGE>0 and <999998
Mgmt/professional share | Native-born Hispanic 25-64, EMPSTAT=1
Homeownership rate | Native-born Hispanic householders (RELATE=1) 25-64
Non-Cuban naturalization rate | Foreign-born Hispanic adults excluding Cuban-origin (CAA recipients)
Within-origin BA+ (chart B) | Native-born Hispanic 25+ stratified by national_origin × metro
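The median-wage universe above (ages 25-64, EMPSTAT=1, INCWAGE>0 and <999998) can be sketched as a filter followed by a weighted median. The appendix says rates are PERWT-weighted; applying PERWT to the median, the tie-handling convention (first value at which cumulative weight reaches half the total), and the function names are assumptions. Nativity and Hispanic filters are presumed to have been applied upstream.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value
    raise ValueError("no observations")


def median_wage(persons):
    """PERWT-weighted median INCWAGE over the Slide 8 wage universe."""
    eligible = [
        p for p in persons
        if 25 <= p["AGE"] <= 64
        and p["EMPSTAT"] == 1              # employed
        and 0 < p["INCWAGE"] < 999998      # drops zero wages and the top codes
    ]
    return weighted_median(
        [p["INCWAGE"] for p in eligible],
        [p["PERWT"] for p in eligible],
    )
```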

Slide 9 — Six-measure assimilation composite (11 metros)

Metric | Universe
Composite score | 11 major Hispanic-population metros; native-born Hispanic adults per metric's universe. (Slide 14's 19-metro table uses a broader panel because each cell is shown individually; the composite requires all 6 metric cells to meet n ≥ 30 simultaneously, which restricts it to the 11-metro panel.)
NHW marriage propensity index (referenced on Slide 15, not part of the composite) | Married US-born Hispanic adults 25-64 per metro

Slide 10 — Mexican-American-only composite

Metric | Universe
Mexican-only 6-measure composite | US-born Mexican-origin Hispanic adults per metric's universe; 11 metros where Mexican-origin cell n≥30 per measure

Mexican selection caveat: Mexican-Americans living outside the primary Mexican-settlement metros (LA, Houston, El Paso, San Antonio, McAllen) are positively selected on educational attainment by roughly 21pp on BA+ relative to within-primary populations. The within-origin ranking on Slide 10 is robust, but absolute Mexican-American scores in non-primary metros (including Miami) reflect this selection on top of any metro effect.

Slide 11 — Miami Hispanic structural trajectory, 1980-2024

Metric | Universe
5 structural measures over time (Miami Hispanic only) | US-born Hispanic adults age 25-64 in Miami-Dade (1980-2010 METAREA-coterminous; 2020/2024 PUMA reconstruction), non-GQ, across 6 samples (1980, 1990, 2000, 2010, 2020, 2024)

Slide 12 — Miami Hispanic vs national NHW convergence, 1980-2024

Metric | Universe
5 structural measures over time (Miami) | US-born Hispanic adults 25-64 in Miami-Dade, non-GQ, across 6 samples (1980, 1990, 2000, 2010, 2020, 2024)
5 structural measures over time (national NHW) | Non-Hispanic-white adults 25-64 nationwide, non-GQ, across same 6 samples

Slide 13 (comparison tool) — All metrics

Metric | Universe
15 paired metrics × (10 origins × 11 metros) | 2024 ACS only; each metric's natural universe (e.g., BA+ on age 25+; marriage on married adults; veteran on age 18+)

Slide 14 — 5-dimension framing table

Metric | Universe
BA+ rate (Structural) | US-born Hispanic adults 25+ in each of 19 metros, 2024 ACS
Mainstream-NHW marriage rate (Marital) | Married US-born Hispanic adults 25-64
"White-only" race answer (Racial-identification) | Hispanic adults 18+
Hispanic-NHW residential dissimilarity (Residential) | All metro residents partitioned by Hispanic / NHW; PUMA as spatial unit
Veteran rate (Civic) | US-born Hispanic adults 18+ (default spec includes PR-born as native)

Veteran-rate sensitivity: Miami's rank drops further (≈8th → 10th of 11 in the major-Hispanic-metro sub-panel) if PR-born are excluded from the native universe. The deck's "below average" framing holds under both specifications.
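For the Residential row in the Slide 14 table, a minimal sketch of the standard two-group index of dissimilarity with the PUMA as the spatial unit is shown below. The appendix does not spell out the formula, so treating the metric as the standard index is an assumption, as are the function name and the input layout.

```python
def dissimilarity_index(pumas):
    """Two-group index of dissimilarity across spatial units.

    pumas: iterable of (hispanic_population, nhw_population) per PUMA,
    using weighted population counts.
    Returns a value in [0, 1]; 0 is an identical distribution across
    PUMAs and 1 is complete separation.
    """
    pumas = list(pumas)
    hispanic_total = sum(h for h, _ in pumas)
    nhw_total = sum(w for _, w in pumas)
    if hispanic_total == 0 or nhw_total == 0:
        raise ValueError("both groups need a nonzero metro total")
    return 0.5 * sum(
        abs(h / hispanic_total - w / nhw_total) for h, w in pumas
    )
```

PUMAs are large units (roughly 100,000+ residents each), so an index built on them will generally read lower than one built on tracts for the same metro.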

Slide 15 — Marriage patterns

Metric | Universe
Four-bucket spouse breakdown (cross-metro stacked bars) | Married US-born Hispanic adults 25-64 per metro
Hialeah sub-chart | Married US-born Hispanic adults 25-64 in Hialeah PUMAs (vs rest of Miami-Dade)

Raw mainstream-NHW marriage ranking: Miami ranks 9th of 11. After adjusting for the local NHW share (propensity index = raw rate ÷ metro NHW share), the ranking inverts and Miami leads the panel; see Slide 9.
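The propensity index defined above (raw rate ÷ metro NHW share) can be sketched as follows, together with a small ranking helper that shows how a raw ranking and an adjusted ranking can disagree. The function names are hypothetical and no deck values are reproduced here.

```python
def propensity_index(raw_rate, nhw_share):
    """Raw NHW-marriage rate divided by the metro's NHW population share."""
    if not 0 < nhw_share <= 1:
        raise ValueError("nhw_share must be in (0, 1]")
    return raw_rate / nhw_share


def rank_metros(metros, adjusted):
    """Rank metro names from highest to lowest.

    metros: dict mapping name -> (raw_rate, nhw_share).
    adjusted: False ranks on the raw rate, True on the propensity index.
    """
    def score(item):
        raw_rate, nhw_share = item[1]
        return propensity_index(raw_rate, nhw_share) if adjusted else raw_rate

    return [name for name, _ in sorted(metros.items(), key=score, reverse=True)]
```

A metro with a low raw rate but a very small NHW share can rank last on the raw measure and first on the index, which is the kind of inversion described above.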

Cross-cutting: data sources, weighting, suppression

Item | Detail
Primary microdata source | IPUMS USA (Ruggles et al., University of Minnesota). Samples used: 1980 5% (198001), 1990 5% (199001), 2000 5% (200001), 2010 ACS 1-year (201005), 2020 ACS 5-year (202003), 2024 ACS 5-year (202403).
Cross-validation microdata | IPUMS CPS ASEC pooled 2022-2024 (cps_derived.parquet); used only for adversarial-defense checks (parent-BPL-based generation identification, cross-survey BA+ check), not for any headline number in the deck.
Weighting | All rates are PERWT-weighted (population weights); sample sizes are reported throughout as unweighted n.
Default suppression threshold | n_unweighted ≥ 30 per cell (universal). The Slide 5 state map uses stricter tiered thresholds: n ≥ 50 (soft, muted color) and n ≥ 100 (full color).
CPI deflator | CPI-U, BLS series, with year → 2024-USD multipliers: 1980 = 3.916, 1990 = 2.475, 2000 = 1.831, 2010 = 1.443, 2020 = 1.222, 2024 = 1.0.
Metro geography | METAREA-coterminous-with-principal-county for the 1980/1990/2000/2010 samples (where METAREA is populated); PUMA reconstruction for 2020/2024 (METAREA suppressed in ACS 5-year). Per-vintage PUMA lists are in pipeline/metro_flags.py, verified against contemporary census county populations within ±5%.
External validation references | Census QuickFacts (Hispanic share of Miami-Dade); Pew Research Center (national Hispanic foreign-born share, English-fluency by nativity, conversational-Spanish-ability by generation); ACS S1501 (BA+ by Hispanic origin). The deck's headline numbers agree with these external sources within sampling tolerance where the constructs match.
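The weighting, suppression, and deflator rows above can be combined into a few small helpers. This is a sketch under stated assumptions: the function names and record layout are hypothetical, and treating state-map cells with n below 50 as hidden is an inference from the tier description rather than something the appendix states.

```python
# Year -> 2024-USD multipliers, copied from the CPI deflator row above.
CPI_TO_2024 = {
    1980: 3.916, 1990: 2.475, 2000: 1.831,
    2010: 1.443, 2020: 1.222, 2024: 1.0,
}


def to_2024_usd(amount, year):
    """Convert a nominal dollar amount from a sample year to 2024 USD."""
    return amount * CPI_TO_2024[year]


def weighted_rate(records, flag, min_n=30):
    """PERWT-weighted share of records where `flag` is true.

    Returns (rate, unweighted_n); rate is None when the cell falls
    below the unweighted suppression threshold.
    """
    n = len(records)
    if n < min_n:
        return None, n
    total = sum(r["PERWT"] for r in records)
    hits = sum(r["PERWT"] for r in records if r[flag])
    return hits / total, n


def state_map_tier(n):
    """Display tier for a Slide 5 state cell, given its unweighted n."""
    if n >= 100:
        return "full"
    if n >= 50:
        return "muted"
    return "suppressed"  # assumption: cells below the muted tier are not drawn
```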

Cross-cutting: methodological caveats readers should know

Item | Detail
Puerto Rico-born classification (differs by metric type) | The deck's strict 3rd+-gen LANGUAGE rule (Slides 3, 5, 6 Panels E/F/G, 14) excludes PR-born from the "native" universe (BPL < 100; PR-born is BPL=110), because Puerto Rico is itself a Spanish-speaking jurisdiction and counting PR-born grandparents as US-native would conflate "US-experience" with "Spanish-environment-experience" and bias the test toward finding Spanish retention. The deck's structural metrics (BA+, LFPR, wages, mgmt/pro, homeownership, naturalization, military service; Slides 8, 9, 12, 14) include PR-born in the "native" universe (native_headline ∈ {native, pr_born}), because the Spanish/English distinction is not germane to those outcomes. This is a deliberate per-metric choice; readers should not interpret cross-slide differences in "native Hispanic" counts as an error.
Standard-error estimation (known limitation) | All confidence intervals reported (Wilson CIs on the state map, implied precision in textual deck claims) use PERWT-weighted point estimates with simple binomial variance. The deck does not incorporate ACS replication weights (REPWT) for design-effect-adjusted variance estimation. ACS has a complex stratified clustered sample; the true design-effect inflation is typically 1.5x-3x. Reported CI half-widths therefore understate true uncertainty by a factor of roughly the square root of the design effect (so a ±5pp CI may actually be ±6-9pp). The deck's qualitative cross-metro differences are large enough that this gap does not change the substantive findings, but readers comparing borderline-close numbers should be aware that the apparent precision is optimistic. A full replication-weight rerun is a meaningful undertaking and is on the roadmap for any follow-up academic publication.
Multiple-comparisons posture | The deck reports many descriptive cross-metro and cross-generation comparisons without applying Bonferroni or false-discovery-rate corrections. Specific individual cells flagged as "interesting" (e.g., the small 2-to-3pp uptick in monolingual-Spanish rate from 2nd to 3rd+ generation in some metros on Slide 6 Panel E) could be sampling noise. The deck's headline findings (Miami leads on 3rd+-gen Spanish-at-home by 25+pp; the bilingualism share is ~95% across major metros for the population that speaks Spanish at home; the national 3rd+-gen Spanish rate fell from 55% to 25%) are large enough that they survive any reasonable correction. Readers are encouraged to focus on the magnitudes of differences rather than on statistical significance per cell.
Suppression threshold context (n ≥ 30) | The default n ≥ 30 unweighted threshold is the standard rule-of-thumb for cell suppression in IPUMS-based work. It is not a guarantee of useful precision: at n=30 and p=0.4, the binomial Wilson 95% CI half-width is ±17pp, wide enough to obscure most ~20pp metro differences. Cells reported at n=30-50 in the comparison tool, the within-origin chart on Slide 8, and the Mexican-only composite on Slide 10 carry wider CIs than the differences they highlight; readers should treat them as descriptive rather than precise. The deck's headline metro × generation cells (Slide 3, Slide 6 Panels E-G) all carry n ≥ 535. The state map (Slide 5) uses stricter tiered thresholds (n ≥ 50 muted; n ≥ 100 full color) for the same reason.
Cross-sectional generation comparison | All "2nd vs 3rd+ generation" comparisons in the deck (most prominently Slide 6 Panel F) are cross-sectional: they compare today's 2nd-gen 20-35-year-olds to today's 3rd+-gen 20-35-year-olds. These are different cohorts, descended from immigrant ancestors who arrived in different decades. The Miami 3rd+-gen 20-35 cohort today is largely grandchildren of 1960s-70s Cuban arrivals; the 2nd-gen 20-35 cohort is largely children of post-1980 arrivals with a more diverse origin mix. The "intergenerational drop" measured this way includes cohort-composition effects, not just intergenerational language transition within a family. A true within-family longitudinal measure would require linked family records across decades, which are not available in cross-sectional ACS samples.
Aggregate (ecological) correlations | The replenishment-vs-retention correlation on Slide 7 is computed across 11 metros (or 9, or 8, depending on the subset). With such small panels, point estimates of Pearson r carry wide confidence intervals: for r=0.58 with n=8, the 95% CI via Fisher z-transformation is approximately (−0.21, 0.91), i.e., not statistically distinguishable from zero. Aggregate-level correlations also cannot directly tell us about individual-level mechanisms (the ecological-fallacy risk). The deck treats these correlations as suggestive cross-metro patterns rather than as confirmed mechanisms; readers should do the same.
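The three numeric claims in the caveats above (the Wilson half-width at n=30, the design-effect inflation of a ±5pp interval, and the Fisher z interval for r=0.58 with n=8) can be reproduced with the standard formulas. The sketch below uses only the Python standard library; the function names are illustrative and are not taken from the pipeline.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile


def wilson_half_width(p, n, z=Z95):
    """Half-width of the Wilson score interval for a proportion p at sample size n."""
    return z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)


def inflate_for_design_effect(half_width, deff):
    """Scale a simple-random-sample half-width by the square root of the design effect."""
    return half_width * math.sqrt(deff)


def fisher_ci(r, n, z=Z95):
    """Confidence interval for Pearson r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    center = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(center - z * se), math.tanh(center + z * se)


if __name__ == "__main__":
    print(round(wilson_half_width(0.4, 30), 3))
    print([round(inflate_for_design_effect(5.0, d), 1) for d in (1.5, 3.0)])
    print(tuple(round(b, 2) for b in fisher_ci(0.58, 8)))
```

Under these formulas the Wilson half-width at n=30 and p=0.4 is roughly 16.5pp, a ±5pp interval inflates to about ±6.1pp and ±8.7pp at design effects of 1.5 and 3, and the Fisher interval rounds to (−0.21, 0.91), all consistent with the figures quoted in the caveats.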

Claude Code (Opus 4.7) was used to help build this deck and run portions of the underlying data analysis. I take full responsibility for any errors or oversights.