Universe definitions, sample years, weightings, and suppression rules for every headline metric.
Slide 2 — Pillar 1 (replication)
| Metric | Universe |
|---|---|
| Spanish-at-home rate (1980 / 2024 anchors) | All Miami-Dade residents age 5+ (1980, 2024); also native-born age 20-29 cut |
| Geography sensitivity (Miami-Dade County vs 3-county MSA) | Native-born Miami Hispanic 20-29 (county vs broader MSA) |
Slide 3 — Spotlight: Spanish retention across 11 metros, 1980-2024
| Metric | Universe |
|---|---|
| 3rd+-gen Spanish-at-home rate (strict 2-parent rule) | Hispanic adult age 20-35, US-state-born (BPL<100), RELATE=3, GQ∈{0,1,2}, both head AND head's spouse (if present) US-state-born; metro filter per PUMA reconstruction (2020-2024) or METAREA-coterminous-with-principal-county (1980-2010) |
| National 3rd+-gen Spanish rate (reference line) | Same strict rule, nationally pooled (all states) |
Co-residence identification — known limitation, validated. The strict rule requires the young adult to co-reside with a parent (the only way to read parent birthplace from the modern ACS). CPS — which collects parent birthplace directly via MBPL/FBPL — confirms that approximately 66% of true 3rd+-gen Hispanic 20-35 in Miami-Dade live with a parent. The strict rule therefore captures roughly two-thirds of the true population. The cross-metro selection bias is largely uniform, and Miami's selection effect (Sp[strict] − Sp[universe] = −19pp) is smaller than the comparison-metro average (−32pp), so the cross-metro ranking is robust. See pipeline/test_coresidence_check.py and pipeline/test_cps_cross_survey_check.py.
Slide 4 — Modern non-Hispanic immigrant comparison
| Metric | Universe |
|---|---|
| Spanish / Mandarin / Tagalog / Japanese at home, by generation | Adult age 20-35 in target ethnic group (Hispanic / Chinese-RACED=400 / Filipino-RACED=600 / Japanese-RACED=500), strict 2-parent-native rule, 2024 ACS |
Slide 5 — State-level Spanish retention map, 1980-2024
| Metric | Universe |
|---|---|
| State-level 3rd+-gen Spanish-at-home rate | Strict 2-parent-native rule (same as Slide 3), per US state, across 6 samples |
| National rate (sparkline + summary) | PERWT-weighted across all states, same strict rule |
| Cohort-vs-composition shift-share decomposition (method panel) | 36 states with n≥30 in both 1980 and 2024 endpoints |
Slide 6 — Language methodology + bilingualism + sensitivity
| Metric | Universe |
|---|---|
| Panel A: Bilingualism trajectory 1980-2024 | US-born Hispanic adults 25-64 in Miami-Dade who speak Spanish at home |
| Panel B: Spanish-at-home by generation (looser head-only rule) | Hispanic adults age 18+ in Miami-Dade; generation = head-of-household birthplace |
| Panel C: Three-generation-household Spanish | Hispanic adults age 18+ in MD in households spanning 3 generations |
| Panel D: FB rate by years in US | Foreign-born Hispanic adults age 18+ in Miami-Dade |
| Panel E: Cross-metro bilingualism breakdown (2nd + 3rd+ gen, faceted) | Hispanic adults age 20-35 in 11 metros, two parallel universes — (i) 2nd-gen strict rule: US-state-born + RELATE=3 + ≥1 parent foreign-born; (ii) 3rd+-gen strict rule: US-state-born + RELATE=3 + both parents US-state-born |
| Panel F: Intergenerational Spanish-at-home drop (2nd → 3rd+) | Per-metro comparison of the (sp_at_home 2nd-gen, sp_at_home 3rd+-gen) pair |
| Panel G: Sensitivity — 4 identification rules | 3rd+-gen Hispanic in Miami / LA / NYC / Houston under 4 rule variants (strict 20-35, loose head-only 18+, strict 25-40, strict 18-29) |
Slide 7 — Within-Miami predictors + cross-metro replenishment
| Metric | Universe |
|---|---|
| Predictor-spread chart (15 candidate predictors) | All Hispanic adults age 18+ in Miami-Dade, 2024 ACS |
| Spouse identity sub-chart | Married Hispanic adults age 18+ in Miami-Dade, 2024 ACS, with linked spouse via SERIAL/SPLOC |
| Replenishment intensity (X) | Hispanic adults age 18+ in each comparison metro, 2024 ACS |
| 3rd+-gen retention (Y, same as Slide 3) | Strict 2-parent-native rule, age 20-35, per-metro PUMA universe |
| Pearson r (cross-metro X vs Y) | 11 comparison metros; correlations computed (a) across all 11, (b) across 9 non-border metros, (c) across 8 non-border non-Miami |
Slide 8 — Education and jobs (Pillar 3 economic)
| Metric | Universe |
|---|---|
| BA+ rate, US-born Hispanic adults 25+ (5 measures) | Per-metro: native-born Hispanic age 25+; also local NHW reference |
| LFPR age 25-64 | Native-born Hispanic 25-64 |
| Median wage (2024 USD) | Native-born Hispanic 25-64, EMPSTAT=1, INCWAGE>0 and <999998 |
| Mgmt/professional share | Native-born Hispanic 25-64, EMPSTAT=1 |
| Homeownership rate | Native-born Hispanic householders (RELATE=1) 25-64 |
| Non-Cuban naturalization rate | Foreign-born Hispanic adults excluding Cuban-origin (CAA recipients) |
| Within-origin BA+ (chart B) | Native-born Hispanic 25+ stratified by national_origin × metro |
Slide 9 — Six-measure assimilation composite (19 metros)
| Metric | Universe |
|---|---|
| Composite score | 11 major Hispanic-population metros; native-born Hispanic adults per metric's universe. (Slide 14's 19-metro table uses a broader panel because each cell is shown individually; the composite requires all 6 metric cells to exceed n=30 simultaneously, which restricts to the 11-metro panel.) |
| NHW marriage propensity index (referenced on Slide 15, not part of the composite) | Married US-born Hispanic adults 25-64 per metro |
Slide 10 — Mexican-American-only composite
| Metric | Universe |
|---|---|
| Mexican-only 6-measure composite | US-born Mexican-origin Hispanic adults per metric's universe; 11 metros where Mexican-origin cell n≥30 per measure |
Mexican selection caveat: Mexican-Americans living outside the primary Mexican-settlement metros (LA, Houston, El Paso, San Antonio, McAllen) are positively selected on educational attainment by roughly 21pp on BA+ relative to within-primary populations. The within-origin ranking on Slide 10 is robust, but absolute Mexican-American scores in non-primary metros (including Miami) reflect this selection on top of any metro effect.
Slide 11 — Miami Hispanic structural trajectory, 1980-2024
| Metric | Universe |
|---|---|
| 5 structural measures over time (Miami Hispanic only) | US-born Hispanic adults age 25-64 in Miami-Dade (1980-2010 METAREA-coterminous; 2020/2024 PUMA reconstruction), non-GQ, across 6 samples (1980, 1990, 2000, 2010, 2020, 2024) |
Slide 12 — Miami Hispanic vs national NHW convergence, 1980-2024
| Metric | Universe |
|---|---|
| 5 structural measures over time (Miami) | US-born Hispanic adults 25-64 in Miami-Dade, non-GQ, across 6 samples (1980, 1990, 2000, 2010, 2020, 2024) |
| 5 structural measures over time (national NHW) | Non-Hispanic-white adults 25-64 nationwide, non-GQ, across same 6 samples |
Slide 13 (comparison tool) — All metrics
| Metric | Universe |
|---|---|
| 15 paired metrics × (10 origins × 11 metros) | 2024 ACS only; each metric's natural universe (e.g., BA+ on age 25+; marriage on married adults; veteran on age 18+) |
Slide 14 — 5-dimension framing table
| Metric | Universe |
|---|---|
| BA+ rate (Structural) | US-born Hispanic adults 25+ in each of 19 metros, 2024 ACS |
| Mainstream-NHW marriage rate (Marital) | Married US-born Hispanic adults 25-64 |
| "White-only" race answer (Racial-identification) | Hispanic adults 18+ |
| Hispanic-NHW residential dissimilarity (Residential) | All metro residents partitioned by Hispanic / NHW; PUMA as spatial unit |
| Veteran rate (Civic) | US-born Hispanic adults 18+ (default spec includes PR-born as native) |
Veteran-rate sensitivity: Miami's rank drops further (≈8th → 10th of 11 in the major-Hispanic-metro sub-panel) if PR-born are excluded from the native universe. The deck's "below average" framing holds under both specifications.
Slide 15 — Marriage patterns
| Metric | Universe |
|---|---|
| Four-bucket spouse breakdown (cross-metro stacked bars) | Married US-born Hispanic adults 25-64 per metro |
| Hialeah sub-chart | Married US-born Hispanic adults 25-64 in Hialeah PUMAs (vs rest of Miami-Dade) |
Raw mainstream-NHW marriage ranking: Miami ranks 9th of 11. After adjustment for the local NHW share (propensity index = raw rate ÷ metro NHW share), Miami inverts to lead the panel — see Slide 9.
Cross-cutting: data sources, weighting, suppression
| Item | Detail |
|---|---|
| Primary microdata source | IPUMS USA (Ruggles et al., University of Minnesota). Samples used: 1980 5% (198001), 1990 5% (199001), 2000 5% (200001), 2010 ACS 1-year (201005), 2020 ACS 5-year (202003), 2024 ACS 5-year (202403). |
| Cross-validation microdata | IPUMS CPS ASEC pooled 2022-2024 (cps_derived.parquet) — used only for adversarial-defense checks (parent-BPL-based generation identification, cross-survey BA+ check), not for any headline number in the deck. |
| Weighting | All rates PERWT-weighted (population weights); sample sizes reported throughout as unweighted n. |
| Default suppression threshold | n_unweighted ≥ 30 per cell (universal). Slide 5 uses a stricter threshold of n ≥ 50 (soft, muted color) and n ≥ 100 (full color). |
| CPI deflator | CPI-U, BLS series, with year → 2024-USD multipliers: 1980 = 3.916, 1990 = 2.475, 2000 = 1.831, 2010 = 1.443, 2020 = 1.222, 2024 = 1.0. |
| Metro geography | METAREA-coterminous-with-principal-county for the 1980/1990/2000/2010 samples (where METAREA is populated); PUMA reconstruction for 2020/2024 (METAREA suppressed in ACS 5-year). Per-vintage PUMA lists in pipeline/metro_flags.py, verified against contemporary census county populations within ±5%. |
| External validation references | Census QuickFacts (Hispanic share of Miami-Dade); Pew Research Center (national Hispanic foreign-born share, English-fluency by nativity, conversational-Spanish-ability by generation); ACS S1501 (BA+ by Hispanic origin). The deck's headline numbers agree with these external sources within sampling tolerance where the constructs match. |
Cross-cutting: methodological caveats readers should know
| Item | Detail |
|---|---|
| Puerto Rico-born classification (differs by metric type) | The deck's strict 3rd+-gen LANGUAGE rule (Slides 3, 5, 6 Panels E/F/G, 14) excludes PR-born from the "native" universe (BPL < 100; PR-born is BPL=110), because Puerto Rico is itself a Spanish-speaking jurisdiction and counting PR-born grandparents as US-native would conflate "US-experience" with "Spanish-environment-experience" and bias the test toward finding Spanish retention. The deck's structural metrics (BA+, LFPR, wages, mgmt/pro, homeownership, naturalization, military service — Slides 14, 8, 12, 9) include PR-born in the "native" universe (native_headline ∈ {native, pr_born}), because the Spanish/English distinction is not germane to those outcomes. This is a deliberate per-metric choice; readers should not interpret cross-slide differences in "native Hispanic" counts as an error. |
| Standard-error estimation — known limitation | All confidence intervals reported (Wilson CIs on the state map, implied precision in textual deck claims) use PERWT-weighted point estimates with simple binomial variance. The deck does not incorporate ACS replication weights (REPWT) for design-effect-adjusted variance estimation. ACS has a complex stratified clustered sample; the true design-effect inflation is typically 1.5x-3x. Reported precision therefore understates true uncertainty by roughly the square root of the design effect (so ±5pp CIs may actually be ±6-9pp). The deck's qualitative cross-metro rankings are large enough that this gap does not change the substantive findings, but readers comparing borderline-close numbers should be aware that the apparent precision is optimistic. A full replication-weight rerun is a meaningful undertaking and is on the roadmap for any follow-up academic publication. |
| Multiple-comparisons posture | The deck reports many descriptive cross-metro and cross-generation comparisons without applying Bonferroni or false-discovery-rate corrections. Specific individual cells flagged as "interesting" (e.g., the small 2-to-3pp uptick in monolingual-Spanish rate from 2nd to 3rd+ generation in some metros on Slide 6 Panel E) could be sampling noise. The deck's headline findings (Miami leads on 3rd+-gen Spanish-at-home by 25+pp; the bilingualism share is ~95% across major metros for the population that speaks Spanish at home; the national 3rd+-gen Spanish rate fell from 55% to 25%) are large enough that they survive any reasonable correction. Readers are encouraged to focus on the magnitudes of differences rather than on statistical significance per cell. |
| Suppression threshold context (n ≥ 30) | The default n ≥ 30 unweighted threshold is the standard rule-of-thumb for cell suppression in IPUMS-based work. It is not a guarantee of useful precision: at n=30 and p=0.4, the binomial Wilson 95% CI half-width is ±17pp — comfortably wide enough to obscure most ~20pp metro differences. Cells reported at n=30-50 in the comparison tool, the within-origin chart on Slide 8, and the Mexican-only composite on Slide 10 carry wider CIs than the differences they highlight; readers should treat them as descriptive rather than precise. The deck's headline metro × generation cells (Slide 3, Slide 6 Panels E-G) all carry n ≥ 535. The state map (Slide 5) uses stricter tiered thresholds (n ≥ 50 muted; n ≥ 100 full color) for the same reason. |
| Cross-sectional generation comparison | All "2nd vs 3rd+ generation" comparisons in the deck (most prominently Slide 6 Panel F) are cross-sectional: they compare today's 2nd-gen 20-35-year-olds to today's 3rd+-gen 20-35-year-olds. These are different cohorts, born to immigrant ancestors who arrived in different decades. The Miami 3rd+-gen 20-35 cohort today is largely grandchildren of 1960s-70s Cuban arrivals; the 2nd-gen 20-35 cohort is largely children of post-1980 arrivals with a more diverse origin mix. The "intergenerational drop" measured this way includes cohort-composition effects, not just intergenerational language transition within a family. A true within-family longitudinal measure would require linked family records across decades, which are not available in cross-sectional ACS samples. |
| Aggregate (ecological) correlations | The replenishment-vs-retention correlation on Slide 7 is computed across 11 metros (or 9, or 8, depending on the subset). With such small panels, point estimates of Pearson r carry wide confidence intervals — for r=0.58 with n=8, the 95% CI via Fisher z-transformation is approximately (−0.21, 0.91), i.e., not statistically distinguishable from zero. Aggregate-level correlations also can't directly tell us about individual-level mechanisms (the ecological-fallacy risk). The deck treats these correlations as suggestive cross-metro patterns rather than as confirmed mechanisms; readers should do the same. |
Claude Code (Opus 4.7) was used to help build this deck and run portions of the underlying data analysis. I take full responsibility for any errors or oversights.