The Four Horsemen
of Validity

Session 6

PMAP 8521: Program evaluation
Andrew Young School of Policy Studies

Plan for today

Construct validity

Statistical conclusion validity

Internal validity

External validity

Construct validity

A new program hopes to
improve student commitment to school

Participants score 200 points higher on the SAT and have a 0.3 higher GPA, on average

Success!  Success?

The Streetlight Effect

[Image: streetlight]

A drunk person looks for their keys in the light of the lamppost instead of over in the bushes where they lost them.

Source: https://pxhere.com/en/photo/488829

Construct validity

Are you measuring what you want to measure?

Do test scores measure commitment to school?
Teacher performance? Principal skill?

Test scores measure how good kids are at taking tests

This is why we spend so much time
on outcome measurement construction!

Statistical conclusion
validity

Statistical conclusion validity

Are your statistics correct?

Statistical power

Violated assumptions of statistical tests

Fishing and p-hacking

Spurious statistical significance

Power

A training program causes incomes to rise by $40

Person  Group      Before   After   Difference
295     Control    122.09   229.04      106.95
126     Treatment  205.60   199.84       -5.76
400     Control    133.25   130.40       -2.85
94      Treatment  270.11   206.56      -63.54
250     Control    344.37   222.89     -121.49
59      Treatment  312.41   268.06      -44.35

Power

Survey 10 participants

Survey 200 participants
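
To see why sample size matters for a $40 effect, here is a minimal simulation sketch. Several things are assumptions not stated in the slides: individual income changes are treated as roughly normal with a standard deviation of 100, the sample sizes are read as per group, and the test uses a normal approximation (|z| > 1.96) rather than an exact t-test.

```python
import random
import statistics


def estimated_power(n_per_group, effect=40.0, sd=100.0, n_sims=1000, seed=1234):
    """Share of simulated studies that detect the effect at roughly the 5% level."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        # Change in income (after minus before) for each simulated person
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treatment = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        diff = statistics.mean(treatment) - statistics.mean(control)
        se = (statistics.variance(treatment) / n_per_group
              + statistics.variance(control) / n_per_group) ** 0.5
        # Normal approximation to a two-sample test of the difference in means
        if abs(diff / se) > 1.96:
            significant += 1
    return significant / n_sims


power_small = estimated_power(10)    # assumed: 10 people per group
power_large = estimated_power(200)   # assumed: 200 people per group
print(f"n = 10 per group:  power ~ {power_small:.2f}")
print(f"n = 200 per group: power ~ {power_large:.2f}")
```

Under these assumptions the small study misses the real $40 effect most of the time, while the larger one almost always finds it.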

What's the right sample size?

Use a statistical power calculator to
make sure you can potentially detect an effect

Google power calculator
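
A rough sketch of what such a calculator does for a simple two-group comparison of means, using the standard normal-approximation formula. The inputs are assumptions for illustration: an effect of 40, a standard deviation of 100, a two-sided 5% significance level, and 80% power. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


n = n_per_group(effect=40, sd=100)
print(n)  # 99 per group under these assumptions
```

Smaller effects or noisier outcomes push the required sample size up quickly, because the effect enters the formula squared.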

Test assumptions

Every statistical test has certain assumptions

For instance, for OLS:

Linearity, homoscedasticity, independence, normality

Make sure you're doing the stats correctly

Fishing and p-hacking

Wouldn't it be awesome to run thousands of models
with different combinations of variables
until you find coefficients that are statistically significant?

Don't!

p-hacking

Spurious statistical significance

If the p-value threshold is 0.05 and you measure 20 outcomes,
about 1 will likely look statistically significant by chance alone

xkcd: significance
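
The arithmetic behind the 20-outcomes claim can be sketched as follows, assuming the 20 tests are independent and none of the outcomes has a true effect (both are simplifying assumptions).

```python
import random

alpha = 0.05
n_outcomes = 20

# Expected number of false positives, and the chance of at least one
expected_false_positives = alpha * n_outcomes
p_at_least_one = 1 - (1 - alpha) ** n_outcomes

# Simulation: with no true effect, each p-value is uniform on [0, 1]
rng = random.Random(42)
n_studies = 10_000
studies_with_a_hit = sum(
    any(rng.random() < alpha for _ in range(n_outcomes))
    for _ in range(n_studies)
)
simulated_share = studies_with_a_hit / n_studies

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {p_at_least_one:.2f}")
print(f"Simulated share of studies with a false positive: {simulated_share:.2f}")
```

So a study with 20 unrelated outcomes has roughly a 64% chance of reporting at least one "significant" result even when nothing is going on.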

Internal validity

Internal validity

Omitted variable bias: selection, attrition

Trends: maturation, secular trends, seasonality, testing, regression

Study calibration: measurement error, time frame

Contamination: Hawthorne, John Henry, spillovers, intervening events

Selection

If people can choose to enroll in a
program, those who enroll will be
different from those who do not

How to fix

Randomization into
treatment and control groups
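
A small simulation sketch of why randomization helps. It assumes a hypothetical unobserved trait called `motivation` and made-up enrollment probabilities; none of these numbers come from the slides.

```python
import random
from statistics import mean

rng = random.Random(2024)
motivation = [rng.gauss(0, 1) for _ in range(2000)]  # hypothetical unobserved trait

# Self-selection: more motivated people are more likely to enroll (assumed 80% vs. 20%)
self_selected = [rng.random() < (0.8 if m > 0 else 0.2) for m in motivation]

# Randomization: a coin flip decides, regardless of motivation
randomized = [rng.random() < 0.5 for _ in motivation]


def motivation_gap(assignment):
    """Average motivation of the treated minus average motivation of the untreated."""
    treated = [m for m, t in zip(motivation, assignment) if t]
    control = [m for m, t in zip(motivation, assignment) if not t]
    return mean(treated) - mean(control)


gap_self = motivation_gap(self_selected)
gap_random = motivation_gap(randomized)
print(f"Gap under self-selection: {gap_self:.2f}")
print(f"Gap under randomization:  {gap_random:.2f}")
```

With self-selection the two groups differ before the program even starts; with random assignment the gap is close to zero, so differences in outcomes are easier to attribute to the program.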

Selection

If people can choose when to
enroll in a program, time might
influence the result

How to fix

Shift time around (measure time relative to enrollment, not the calendar)

Example: happier people are more likely to get married, so without randomly assigning marriage, how would you study the impact of marriage on happiness? The study's authors use a simple approach: since happiness varies over time, set marriage equal to time zero and build a pre-post design around it. You essentially leverage the within-group variance and iron out across-age differences because of the varying ages of marriage. The whole insight is to change the timeline from calendar years to program years.

https://doi.org/10.1016/j.socec.2005.11.043
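
The shift from calendar years to program years can be sketched like this. The people, years, and happiness scores are entirely hypothetical and only illustrate the re-indexing step, not the cited study's data or method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: year of marriage and yearly happiness scores
marriage_year = {"A": 2001, "B": 2004}
records = [
    ("A", 2000, 6.0), ("A", 2001, 7.0), ("A", 2002, 6.5),
    ("B", 2003, 5.0), ("B", 2004, 6.5), ("B", 2005, 6.0),
]

# Re-index each observation relative to the event: 0 = year of marriage
by_event_time = defaultdict(list)
for person, year, happiness in records:
    event_time = year - marriage_year[person]
    by_event_time[event_time].append(happiness)

# Average happiness by years since marriage, across people who married in different years
profile = {t: mean(scores) for t, scores in sorted(by_event_time.items())}
print(profile)  # {-1: 5.5, 0: 6.75, 1: 6.25}
```

Once everyone is lined up on the same event clock, the before and after periods can be compared even though the event happened in different calendar years.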

Attrition

If the people who leave a program or
study differ from those who stay,
the estimated effects will be biased

How to fix

Check characteristics of those
who stay and those who leave

Fake microfinance program results

ID   Increase in income   Remained in program
1    $3.00                Yes
2    $3.50                Yes
3    $2.00                Yes
4    $1.50                No
5    $1.00                No

ATE with attriters = $2.20

ATE without attriters = $2.83
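
The two averages can be reproduced from the five rows of fake data. This is only the arithmetic; the field names are made up for the sketch.

```python
from statistics import mean

participants = [
    {"id": 1, "income_gain": 3.00, "remained": True},
    {"id": 2, "income_gain": 3.50, "remained": True},
    {"id": 3, "income_gain": 2.00, "remained": True},
    {"id": 4, "income_gain": 1.50, "remained": False},
    {"id": 5, "income_gain": 1.00, "remained": False},
]

# Average gain across everyone, including people who left the program
ate_with_attriters = mean(p["income_gain"] for p in participants)

# Average gain among only those who stayed
ate_without_attriters = mean(p["income_gain"] for p in participants if p["remained"])

print(f"With attriters:    ${ate_with_attriters:.2f}")     # $2.20
print(f"Without attriters: ${ate_without_attriters:.2f}")  # $2.83
```

Because the people who left had the smallest gains, dropping them makes the program look better than it was.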

Maturation

Growth is expected naturally

e.g. programs targeted at childhood development
contend with the fact that children develop on their own too

How to fix

Use a comparison group to remove the trend

Secular trends

Patterns in data happen
because of larger global processes

Recessions, cultural shifts, marriage equality

How to fix

Use a comparison group to remove the trend
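
A minimal worked example of removing a trend with a comparison group. The numbers are hypothetical, and the calculation assumes that both groups would have followed the same trend without the program.

```python
# Hypothetical average outcomes (illustrative numbers, not from the slides)
treatment_before, treatment_after = 50.0, 62.0
comparison_before, comparison_after = 48.0, 55.0

# Change in the treatment group: program effect plus the underlying trend
naive_change = treatment_after - treatment_before

# Change in the comparison group: the underlying trend only
trend = comparison_after - comparison_before

# Subtracting the trend leaves the estimated program effect
effect_estimate = naive_change - trend

print(f"Naive before/after change: {naive_change}")  # 12.0
print(f"Trend in comparison group: {trend}")         # 7.0
print(f"Estimated effect:          {effect_estimate}")  # 5.0
```

The naive before/after comparison would credit the program with the whole 12-point change, even though 7 points would have happened anyway.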

Seasonal trends

Patterns in data happen because of
regular time-based trends

How to fix

Compare observations from same time period
or use yearly/monthly averages

Testing

Repeated exposure to questions or tasks
will make people improve naturally

How to fix

Change tests, maybe don't offer pre-tests,
use a control group that receives the test

Regression to the mean

People in the extreme have a tendency to
become less extreme over time

Luck, crime and terrorism, hot hand effect

How to fix

Don't select super high or
super low performers

This isn’t because the universe trends toward some average. An extreme value comes from a mix of systematic factors and random luck, and extreme luck is rare, so the luck goes away the next time around.
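
A simulation sketch of this idea, assuming each score is a stable skill plus luck that is redrawn at every measurement. The distributions and the 5% cutoff are assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)
n_people = 5000

# Each score = stable skill + luck that is redrawn on every measurement
skill = [rng.gauss(50, 10) for _ in range(n_people)]
first = [s + rng.gauss(0, 10) for s in skill]
second = [s + rng.gauss(0, 10) for s in skill]

# Select the top 5% of performers on the first measurement
cutoff = sorted(first)[int(0.95 * n_people)]
top = [i for i in range(n_people) if first[i] >= cutoff]

top_first = mean(first[i] for i in top)
top_second = mean(second[i] for i in top)
overall_second = mean(second)

print(f"Top group, first measurement:  {top_first:.1f}")
print(f"Top group, second measurement: {top_second:.1f}")
print(f"Everyone, second measurement:  {overall_second:.1f}")
```

The top group still scores above average the second time because their skill is real, but their scores fall back toward the middle with no intervention at all, which a naive evaluation could mistake for a program effect.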

Measurement error

Measuring the outcome incorrectly
will bias the effect

How to fix

Measure the outcome well

Time frame

If the study is too short, the effect might not
be detectable yet; if the study is too long,
attrition becomes a problem

How to fix

Use prior knowledge about the thing
you're studying to choose the right length

Hawthorne effect

Observing people makes them behave differently

How to fix

Hide? Use completely unobserved control groups

Experiments in 1924-1932 at Hawthorne Works

John Henry effect

Control group works hard to prove
they're as good as the treatment group

How to fix

Keep the two groups separate

Spillover effect

Control groups naturally pick up
what the treatment group is getting

Externalities, social interaction, equilibrium effects

How to fix

Keep the two groups separate;
use distant control groups

Intervening events

Something happens that affects one of
the groups and not the other

How to fix

🤷‍♂️

Internal validity

Omitted variable bias: selection, attrition

Trends: maturation, secular trends, seasonality, testing, regression

Study calibration: measurement error, time frame

Contamination: Hawthorne, John Henry, spillovers, intervening events

Fixing internal validity

Randomization fixes a host of issues

Selection, maturation, regression to the mean

Randomization doesn't fix everything!

Attrition, contamination, measurement

External validity

Generalizability

Are your findings generalizable
to the whole population?

Hospital lights increase risk of dying
…in mice

Lab conditions vs. real world

Study volunteers are weird

Western, educated, from industrialized,
rich, and democratic countries

Not everyone takes surveys

Online surveys, Amazon Mechanical Turk, random digit dialing

Different settings and circumstances

Does a study in one state
apply to other states?

Does the effect from a mosquito net trial
in Eritrea transfer to Bolivia?
