
Sughosh P Dixit
2025-11-13 · 9 min read

Day 13 — Stratified Sampling: The Smart Way to Sample

Day 13 of 30

TL;DR

Quick summary

Stratified sampling divides your population into groups and samples from each separately, guaranteeing coverage of important subgroups and, when the groups differ from one another, substantially reducing variance. Learn proportional, equal, and Neyman allocation strategies to maximize precision.

Divide and conquer your sampling strategy for maximum precision.

Stratified sampling guarantees coverage of important subgroups and can cut variance substantially compared to simple random sampling: about 60% in the worked example below, and more when strata differ more sharply.


The Random Sampling Trap

Imagine you're conducting a health survey in a company of 1,000 employees:

  • 900 office workers (90%)

  • 100 executives (10%)

You randomly sample 100 people. Here's what can go wrong:

Unlucky Sample #1:

Office workers: 95 people

Executives: 5 people

Problem: Only 5 executives - can't say much about this group!

Unlucky Sample #2:

Office workers: 87 people

Executives: 13 people

Different from reality (90/10 split)!

Unlucky Sample #3:

Office workers: 100 people

Executives: 0 people

Complete miss on executive health! (At this sample size the odds are well under 1 in 10,000, but nothing in SRS rules it out.)

The problem: Simple Random Sampling (SRS) is... well, random!

The solution: Stratified Sampling - sample smartly within groups!
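To see how often simple random sampling drifts from the 90/10 split, the scenario above can be simulated. This is a minimal sketch using only Python's standard library; the function name `simulate_executive_counts`, the seed, and the number of trials are illustrative choices, not part of the original survey.

```python
import random

def simulate_executive_counts(n_office=900, n_exec=100, sample_size=100,
                              trials=5_000, seed=0):
    """Draw repeated simple random samples and count executives in each."""
    rng = random.Random(seed)
    # 1 marks an executive, 0 an office worker.
    population = [1] * n_exec + [0] * n_office
    return [sum(rng.sample(population, sample_size)) for _ in range(trials)]

counts = simulate_executive_counts()
average = sum(counts) / len(counts)
# Share of samples whose executive count misses the expected 10 by 3 or more.
share_off_by_3 = sum(1 for c in counts if abs(c - 10) >= 3) / len(counts)

print("average executives per sample:", round(average, 2))
print("fewest / most executives seen:", min(counts), max(counts))
print("share of samples off by 3 or more:", round(share_off_by_3, 3))
```

The average lands close to 10, but individual samples scatter around it. A proportional stratified design would put exactly 10 executives in every sample.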


What is Stratified Sampling?

Stratified Sampling means:

  1. Divide population into non-overlapping groups (strata)

  2. Sample from each stratum separately

  3. Combine results with proper weighting
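The three steps can be sketched in a few lines. This is an illustrative outline rather than a production sampler: `stratified_sample` and `weighted_mean` are hypothetical helper names, and the small population of `(group, score)` pairs is made up for the demonstration.

```python
import random
from collections import defaultdict
from statistics import mean

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Steps 1 and 2: split units into strata, then sample each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    samples = {h: rng.sample(members, n_per_stratum[h])
               for h, members in strata.items()}
    sizes = {h: len(members) for h, members in strata.items()}
    return samples, sizes

def weighted_mean(samples, sizes, value_of):
    """Step 3: combine stratum means using weights W_h = N_h / N."""
    total = sum(sizes.values())
    return sum(sizes[h] / total * mean(value_of(u) for u in sample)
               for h, sample in samples.items())

# Made-up population: 90 office workers scoring 50, 10 executives scoring 90.
population = [("office", 50)] * 90 + [("executive", 90)] * 10
samples, sizes = stratified_sample(population, lambda u: u[0],
                                   {"office": 9, "executive": 1})
estimate = weighted_mean(samples, sizes, lambda u: u[1])
print(sizes, estimate)
```

Weighting by stratum size in the last step is what keeps the combined estimate unbiased even when strata are sampled at different rates.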

Visual Comparison

Simple Random Sampling (SRS):

Population: office workers and executives (executives are a small minority)

Random sample of 10: every pick happens to be an office worker

Result: All office workers!

Stratified Sampling:

Population:

Stratum 1: 90 office workers

Stratum 2: 10 executives

Stratified sample of 10:

From Stratum 1: 9 people

From Stratum 2: 1 person

Result: Proper representation!

Why Stratify? Three Big Reasons

1. Guaranteed Coverage

Problem with SRS: Might miss rare but important groups

Example:

City population:

- Urban: 70%

- Suburban: 20%

- Rural: 10%

SRS of 100 might give:

Urban: 65, Suburban: 25, Rural: 10

OR

Urban: 75, Suburban: 22, Rural: 3  ← Rural underrepresented!

Stratified solution:

Explicitly sample from each:

Urban: 70 people (guaranteed)

Suburban: 20 people (guaranteed)

Rural: 10 people (guaranteed)

Coverage ensured!

2. Variance Reduction

The Math Intuition:

Variance comes from differences:

  • Between-stratum variance: How different are the groups?

  • Within-stratum variance: How different are people within each group?

The core idea: If strata are homogeneous (similar within), stratified sampling has lower variance than SRS!

Visual:

POPULATION (high variance):

Health scores: 45, 48, 50, 52 (office) | 85, 87, 88, 90, 91, 92 (executives)

Within-stratum variance:

Office: σ² ≈ 6.7 (people similar)

Executive: σ² ≈ 5.8 (people similar)

But the between-stratum difference is HUGE (means of about 49 vs 89)!

SRS estimates are affected by this big gap.

Stratified sampling accounts for it separately!
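The split into within- and between-stratum variance can be checked directly on the ten health scores. The sketch below uses population variance (dividing by the group size, via `statistics.pvariance`); `decompose` is a hypothetical helper name.

```python
from statistics import mean, pvariance

office = [45, 48, 50, 52]
executives = [85, 87, 88, 90, 91, 92]

def decompose(strata):
    """Split population variance into within- and between-stratum parts."""
    all_values = [x for values in strata for x in values]
    grand_mean = mean(all_values)
    total_n = len(all_values)
    # Each stratum contributes in proportion to its share of the population.
    within = sum(len(v) / total_n * pvariance(v) for v in strata)
    between = sum(len(v) / total_n * (mean(v) - grand_mean) ** 2 for v in strata)
    return within, between

within, between = decompose([office, executives])
total = pvariance(office + executives)
print(f"within  = {within:.2f}")
print(f"between = {between:.2f}")
print(f"total   = {total:.2f}")
```

The two parts add up to the total, and almost all of the total sits in the between-stratum part. That is the portion stratification removes from the estimate of the overall mean.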

3. Domain Insights

SRS result:

"Average health score: 75"

Okay... but tells us nothing about groups!

Stratified result:

"Average health scores:

Office workers: 52 (95% CI: 50-54)

Executives: 88 (95% CI: 86-90)"

Rich insights about each segment!
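Per-stratum estimates like these come from treating each stratum as its own small survey. The sketch below is one simple way to do it, assuming a normal approximation (z = 1.96) with a finite population correction; with samples this small a t-based interval would be somewhat wider. The sampled scores are hypothetical and `mean_with_ci` is an illustrative helper name.

```python
import math
from statistics import mean, stdev

def mean_with_ci(sample, stratum_size, z=1.96):
    """Return (mean, low, high) from a normal approximation with FPC."""
    n = len(sample)
    fpc = (stratum_size - n) / stratum_size
    se = math.sqrt(stdev(sample) ** 2 / n * fpc)
    center = mean(sample)
    return center, center - z * se, center + z * se

# Hypothetical sampled health scores; real data would come from the survey.
samples = {
    "office": ([48, 55, 51, 47, 58, 53, 50, 54, 49, 55], 900),
    "executive": ([86, 90, 88, 91, 85, 89, 87, 92, 88, 84], 100),
}
for name, (scores, stratum_size) in samples.items():
    center, low, high = mean_with_ci(scores, stratum_size)
    print(f"{name}: {center:.1f} (95% CI: {low:.1f}-{high:.1f})")
```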

The Math: How Much Better Is It?

Variance Formula

Simple Random Sampling variance:

Var(ȳ_SRS) = σ²/n × (N-n)/N

Where:

- σ² = overall population variance

- n = sample size

- N = population size

- (N-n)/N = finite population correction

Stratified Sampling variance:

Var(ȳ_strat) = Σ(Wₕ² × σₕ²/nₕ × (Nₕ-nₕ)/Nₕ)

Where:

- Wₕ = stratum h weight (Nₕ/N)

- σₕ² = variance within stratum h

- nₕ = sample size in stratum h

- Nₕ = population size in stratum h

The Variance Reduction:

Under proportional allocation:

Var(ȳ_SRS) - Var(ȳ_strat) = (1/n) × (N-n)/N × Σ Wₕ(μₕ - μ)²

The sum Σ Wₕ(μₕ - μ)² is the between-stratum variance!

Translation: The more different your strata are, the bigger the variance reduction!

Example Calculation

Population:

  • Stratum 1 (Office): N₁ = 900, μ₁ = 50, σ₁² = 100

  • Stratum 2 (Executive): N₂ = 100, μ₂ = 90, σ₂² = 64

  • Total: N = 1000

Sample: n = 100

Proportional allocation:

  • n₁ = 90 (90% of sample)

  • n₂ = 10 (10% of sample)

SRS Variance:

First, calculate overall variance:

μ = 0.9(50) + 0.1(90) = 45 + 9 = 54

σ² = 0.9(100 + (50-54)²) + 0.1(64 + (90-54)²)

= 0.9(100 + 16) + 0.1(64 + 1296)

= 0.9(116) + 0.1(1360)

= 104.4 + 136

= 240.4

Var(ȳ_SRS) = 240.4/100 × (1000-100)/1000

= 2.404 × 0.9

= 2.16

Standard error: √2.16 = 1.47

Stratified Variance:

W₁ = 900/1000 = 0.9

W₂ = 100/1000 = 0.1

Var(ȳ_strat) = 0.9² × (100/90) × (900-90)/900

+ 0.1² × (64/10) × (100-10)/100

= 0.81 × 1.11 × 0.9

+ 0.01 × 6.4 × 0.9

= 0.81 + 0.058

= 0.87

Standard error: √0.87 = 0.93

The Improvement:

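Variance reduction: 2.16 - 0.87 = 1.29 (60% reduction!)

Standard error:

SRS: 1.47

Stratified: 0.93

The stratified standard error is about 37% smaller (equivalently, the SRS standard error is 58% larger)!

Translation: To get the same precision with SRS, you'd need roughly 2.5× as many samples (about 2.2× once the finite population correction is taken into account)!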
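The arithmetic in this example is easy to reproduce. The sketch below codes the two variance formulas as given and plugs in the example's numbers; the function names are illustrative. It also checks that, under proportional allocation, the gap between the two variances equals the between-stratum variance scaled by (1/n) × (N-n)/N.

```python
import math

def var_srs(sigma2, n, N):
    """Variance of the sample mean under simple random sampling."""
    return sigma2 / n * (N - n) / N

def var_stratified(strata, N):
    """strata: list of (N_h, sigma2_h, n_h) tuples."""
    return sum((N_h / N) ** 2 * sigma2_h / n_h * (N_h - n_h) / N_h
               for N_h, sigma2_h, n_h in strata)

N, n = 1000, 100
sizes, means, variances = [900, 100], [50, 90], [100, 64]
weights = [N_h / N for N_h in sizes]

mu = sum(w * m for w, m in zip(weights, means))
within = sum(w * v for w, v in zip(weights, variances))
between = sum(w * (m - mu) ** 2 for w, m in zip(weights, means))
sigma2 = within + between  # overall population variance

v_srs = var_srs(sigma2, n, N)
v_strat = var_stratified([(900, 100, 90), (100, 64, 10)], N)
gap_from_formula = (1 / n) * (N - n) / N * between

# SRS sample size that matches the stratified variance, FPC included:
# sigma2/n' - sigma2/N = v_strat  =>  n' = sigma2 / (v_strat + sigma2/N)
n_equivalent = sigma2 / (v_strat + sigma2 / N)

print(f"SRS:        variance {v_srs:.2f}, SE {math.sqrt(v_srs):.2f}")
print(f"Stratified: variance {v_strat:.2f}, SE {math.sqrt(v_strat):.2f}")
print(f"Gap {v_srs - v_strat:.3f} vs formula {gap_from_formula:.3f}")
print(f"Equivalent SRS sample size: {n_equivalent:.0f}")
```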


Allocation Strategies: How Many Per Stratum?

Once you decide to stratify, how do you divide your sample across strata?

1. Proportional Allocation (Most Common)

Rule: Sample proportionally to stratum size

nₕ = n × (Nₕ/N)

Example:

Population: 900 office, 100 executive (1000 total)

Sample size: n = 100

Office sample: 100 × (900/1000) = 90

Executive sample: 100 × (100/1000) = 10

Pros:

  • Simple, intuitive

  • Self-weighting (no complex weights needed)

  • Represents population structure

Cons:

  • Small strata get small samples (might be imprecise)
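In practice n × (Nₕ/N) is rarely a whole number, so some rounding rule is needed. The sketch below uses largest-remainder rounding so the allocations still sum to n; that rounding rule is an assumption of this example, not something the formula prescribes, and `proportional_allocation` is a hypothetical helper name.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n units across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

company = proportional_allocation({"office": 900, "executive": 100}, 100)
city = proportional_allocation({"urban": 700, "suburban": 200, "rural": 100}, 25)
print(company)
print(city)
```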

2. Equal Allocation

Rule: Same sample size for each stratum

nₕ = n / H

Where H = number of strata

Example:

Population: 900 office, 100 executive

Sample size: n = 100

Strata: H = 2

Office sample: 100/2 = 50

Executive sample: 100/2 = 50

Pros:

  • Good for comparing strata (equal precision)

  • Ensures small strata have enough data

Cons:

  • Oversamples small strata (need complex weights)

  • Less efficient for overall mean estimation
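The weighting that equal allocation requires is short to write down: each sampled unit stands for Nₕ/nₕ people in its stratum. The sketch below shows how far an unweighted average can drift when executives are oversampled. The stratum sample means are hypothetical, chosen to match the stratum means used elsewhere in this article, and the helper names are illustrative.

```python
def equal_allocation(stratum_sizes, n):
    """Split n evenly across strata (assumes n divides evenly)."""
    per_stratum = n // len(stratum_sizes)
    # Never ask for more units than a stratum contains.
    return {h: min(per_stratum, N_h) for h, N_h in stratum_sizes.items()}

def design_weights(stratum_sizes, alloc):
    """Each sampled unit stands for N_h / n_h population units."""
    return {h: stratum_sizes[h] / alloc[h] for h in stratum_sizes}

sizes = {"office": 900, "executive": 100}
alloc = equal_allocation(sizes, 100)
weights = design_weights(sizes, alloc)

# Hypothetical stratum sample means, used only to show the weighting step.
sample_means = {"office": 50.0, "executive": 90.0}
naive = sum(alloc[h] * sample_means[h] for h in sizes) / sum(alloc.values())
weighted = (sum(weights[h] * alloc[h] * sample_means[h] for h in sizes)
            / sum(weights[h] * alloc[h] for h in sizes))
print(alloc, weights)
print("unweighted mean:", naive, "weighted mean:", weighted)
```

The unweighted mean treats executives as half the company; the weighted mean restores their true 10% share.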

3. Neyman Allocation (Optimal)

Rule: Allocate proportional to stratum size AND variance

nₕ = n × (Nₕ × σₕ) / Σ(Nₖ × σₖ)

Intuition: Sample more from:

  • Large strata (more people → more important)

  • High-variance strata (more diverse → need more samples)

Example:

Stratum 1: N₁ = 900, σ₁ = 10

Stratum 2: N₂ = 100, σ₂ = 8

Stratum 1 weight: 900 × 10 = 9,000

Stratum 2 weight: 100 × 8 = 800

Total weight: 9,800

Office sample: 100 × (9000/9800) = 91.8 ≈ 92

Executive sample: 100 × (800/9800) = 8.2 ≈ 8

Pros:

  • Mathematically optimal (minimizes variance!)

  • Accounts for both size and heterogeneity

Cons:

  • Requires knowing σₕ in advance (often unknown!)

  • Might still undersample important small strata
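The Neyman rule translates directly into code. This is a minimal sketch with an illustrative function name; it returns unrounded sizes and does not cap nₕ at Nₕ, which a real design would need to handle. The σₕ values are treated as known, whereas in practice they usually come from a pilot study or earlier survey.

```python
def neyman_allocation(strata, n):
    """strata maps name -> (N_h, sigma_h); returns unrounded sample sizes."""
    products = {h: N_h * sigma_h for h, (N_h, sigma_h) in strata.items()}
    total = sum(products.values())
    return {h: n * p / total for h, p in products.items()}

alloc = neyman_allocation({"office": (900, 10), "executive": (100, 8)}, 100)
rounded = {h: round(x) for h, x in alloc.items()}
print({h: round(x, 1) for h, x in alloc.items()})
print(rounded)
```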

4. Optimal Allocation with Cost

Rule: Account for different sampling costs per stratum

nₕ = n × (Nₕ × σₕ / √cₕ) / Σ(Nₖ × σₖ / √cₖ)

Where cₕ = cost to sample one unit from stratum h

Example:

Executives cost 5× more to survey (busy, need incentives)

c₁ = $10 (office worker)

c₂ = $50 (executive)

With σ₁ = 10 and σ₂ = 8 as in the Neyman example, this cuts the executive sample further, from about 8 to about 4 out of 100!

Use when: Budget constrained, different costs per stratum
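A sketch of the cost-adjusted rule, reusing the stratum sizes and standard deviations from the Neyman example (σ₁ = 10, σ₂ = 8) together with the $10 and $50 costs. The function name is illustrative, and the total sample size is held at 100 for comparison, although in a budget-driven design n would itself follow from the budget.

```python
import math

def cost_optimal_allocation(strata, n):
    """strata maps name -> (N_h, sigma_h, cost_h); returns unrounded sizes."""
    scores = {h: N_h * sigma_h / math.sqrt(c_h)
              for h, (N_h, sigma_h, c_h) in strata.items()}
    total = sum(scores.values())
    return {h: n * s / total for h, s in scores.items()}

strata = {"office": (900, 10, 10), "executive": (100, 8, 50)}
alloc = cost_optimal_allocation(strata, 100)
# Total survey cost implied by this allocation.
total_cost = sum(alloc[h] * strata[h][2] for h in strata)
print({h: round(x, 1) for h, x in alloc.items()})
print("total cost:", round(total_cost))
```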


Visual: Variance vs Allocation

Let's see how variance changes with different allocations:

Figure: variance (SE²) for each allocation strategy, from highest to lowest: SRS, Equal, Proportional, Neyman (optimal). Lower is better!

Takeaway: For estimating the overall mean with a fixed sample size, Neyman gives the lowest variance (if you know the variances)!
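The ranking can be checked with the numbers from the worked example (N₁ = 900, σ₁² = 100, N₂ = 100, σ₂² = 64, n = 100). The sketch below applies the stratified variance formula to each allocation; the names are illustrative, and the Neyman sizes are the rounded 92/8 split.

```python
def var_stratified(strata, alloc):
    """strata maps name -> (N_h, sigma2_h); alloc maps name -> n_h."""
    N = sum(N_h for N_h, _ in strata.values())
    return sum((N_h / N) ** 2 * s2 / alloc[h] * (N_h - alloc[h]) / N_h
               for h, (N_h, s2) in strata.items())

strata = {"office": (900, 100), "executive": (100, 64)}
allocations = {
    "Equal": {"office": 50, "executive": 50},
    "Proportional": {"office": 90, "executive": 10},
    "Neyman": {"office": 92, "executive": 8},
}
results = {name: var_stratified(strata, alloc)
           for name, alloc in allocations.items()}
# SRS uses the overall variance of 240.4 computed in the worked example.
results["SRS"] = 240.4 / 100 * (1000 - 100) / 1000

for name, value in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<13}{value:.3f}")
```

With these inputs Neyman beats proportional only narrowly, because the two strata have similar standard deviations. The gap widens when one stratum is much more variable than the others.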


Defining Strata: The Art and Science

Good strata are:

1. Mutually Exclusive

Each unit belongs to exactly one stratum

Bad: "Young", "Students"

(Young students counted twice!)

Good: "Student", "Non-Student"

2. Exhaustive

Every unit belongs to some stratum

Bad: "<30", "40-60", ">60"

(Missing 30-39 age range!)

Good: "<30", "30-39", "40-60", ">60"

3. Homogeneous Within

Units within stratum are similar

Bad stratum: "People" (too diverse!)

Good stratum: "Female doctors aged 40-50"

4. Heterogeneous Between

Strata are different from each other

Bad: "Age 30-40", "Age 31-41"

(Too much overlap, not distinct!)

Good: "Age 18-30", "Age 31-50", "Age 51+"

5. Meaningful

Based on domain knowledge, not arbitrary

Bad: "First 500 rows", "Last 500 rows"

(Arbitrary split!)

Good: "Urban", "Suburban", "Rural"

(Meaningful demographic divisions)
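The first two criteria, mutually exclusive and exhaustive, can be checked mechanically before any sampling starts. The sketch below is one way to do it, assuming each stratum is described by a predicate; `check_strata` and the age rules are hypothetical examples.

```python
def check_strata(units, rules):
    """Report units that match no stratum, or more than one.

    rules maps a stratum name to a predicate on a unit.
    """
    unassigned, multiply_assigned = [], []
    for unit in units:
        matches = [name for name, rule in rules.items() if rule(unit)]
        if not matches:
            unassigned.append(unit)
        elif len(matches) > 1:
            multiply_assigned.append(unit)
    return unassigned, multiply_assigned

ages = range(18, 81)  # hypothetical ages to test the rules against

gappy = {"<30": lambda a: a < 30,
         "40-60": lambda a: 40 <= a <= 60,
         ">60": lambda a: a > 60}
overlapping = {"<30": lambda a: a < 30,
               "30-40": lambda a: 30 <= a <= 40,
               "40-60": lambda a: 40 <= a <= 60,
               ">60": lambda a: a > 60}
clean = {"<30": lambda a: a < 30,
         "30-39": lambda a: 30 <= a <= 39,
         "40-60": lambda a: 40 <= a <= 60,
         ">60": lambda a: a > 60}

print("gappy, unassigned:", check_strata(ages, gappy)[0])
print("overlapping, counted twice:", check_strata(ages, overlapping)[1])
print("clean:", check_strata(ages, clean))
```

Shared boundaries are the easy mistake to miss: with bins written as "30-40" and "40-60", anyone aged exactly 40 lands in both.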

Common Stratification Variables:

Demographics:

  • Age groups

  • Gender

  • Education level

  • Income brackets

  • Geographic region

Business:

  • Customer segments (high/medium/low value)

  • Product categories

  • Time periods (Q1, Q2, Q3, Q4)

Medical:

  • Disease severity (mild/moderate/severe)

  • Treatment type

  • Risk factors present/absent


Wrapping Up

Stratified sampling is the "divide and conquer" of sampling:

Key Concepts:

Strata = non-overlapping, exhaustive groups

Proportional allocation = sample proportionally (simple, self-weighting)

Neyman allocation = optimal (proportional to Nₕ × σₕ)

Variance reduction = about 60% lower than SRS in the worked example, and larger when strata differ more!

Coverage guarantee = ensures rare groups included

Domain insights = separate estimates per stratum

The Math Win:

Variance reduction = (1/n) × (N-n)/N × Σ Wₕ(μₕ - μ)² (under proportional allocation)

Translation: The more different your strata,

the bigger the improvement!

Allocation Decision Tree:

Do you know σₕ for each stratum?

  Yes → Use Neyman (optimal!)

  No → Do you need equal precision per stratum?

    Yes → Use Equal allocation

    No → Use Proportional (simplest)

Real Impact:

In the worked example, stratified sampling cut variance by about 60% compared to SRS with the same sample size. That's like getting more than 200 SRS samples for the price of 100!


Where This Shows Up in Practice

  • Data Pipelines: Ensuring high-quality filtering and robust statistical metrics before feeding downstream ML models.
  • Production Anomaly Detection: Tracking system logs, performance latencies, or transaction volumes under heavy skew.
  • A/B Testing & Evaluation: Correctly partitioning user cohorts or comparing treatment outcomes without normal distribution assumptions.

