Are the two-way fixed-effects (TWFE) models wrong for difference-in-difference (DiD) data with heterogeneous treatment effects?
Note: the post was originally published on another website on 3 September 2022.
I recently came across this very good resource on the difference-in-difference (DiD) design. In short, it argues that standard two-way fixed-effects (TWFE) models may give wrong results in the presence of heterogeneous treatment effects, so new methods are required, and it reviews a range of such methods implemented in various statistical packages.
But are TWFE models really wrong? Let’s take the example documented in the section “So where do TWFE regressions go wrong?” (this is an archived version, which may display mathematical notation incorrectly; for the current version see here). The aim is to estimate the treatment effect in the simulated panel data plotted on that page, where each line is one unit and the slopes of all the lines increase after treatment.
As shown on the website, running a TWFE model gives the following results (D is the treatment indicator):
. xtreg Y i.t D, fe
Fixed-effects (within) regression Number of obs = 1,800
Group variable: id Number of groups = 30
R-squared: Obs per group:
Within = 0.7320 min = 60
Between = 0.8104 avg = 60.0
Overall = 0.3604 max = 60
F(60,1710) = 77.83
corr(u_i, Xb) = -0.1062 Prob > F = 0.0000
------------------------------------------------------------------------------
Y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
t |
2 | 1 10.25912 0.10 0.922 -19.12175 21.12175
3 | 2 10.25912 0.19 0.845 -18.12175 22.12175
(... other t dummies omitted)
60 | 242.4984 10.79994 22.45 0.000 221.3159 263.6809
|
D | -25.93176 3.374793 -7.68 0.000 -32.55092 -19.3126
_cons | 16.5 7.254293 2.27 0.023 2.271776 30.72822
-------------+----------------------------------------------------------------
sigma_u | 67.425362
sigma_e | 39.733399
rho | .74224277 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(29, 1710) = 97.21 Prob > F = 0.0000
It looks as if the result is wrong: the treatment effects are clearly positive (the slopes increase for all lines), yet the coefficient on D is negative (-25.93)!
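A sign flip like this does not need a large or unusual dataset. The minimal sketch below uses a hypothetical two-unit, three-period balanced panel (not the website's data): unit 0 is treated from period 1, unit 1 from period 2, and every treatment effect is positive but grows with time since treatment. For a balanced panel, the TWFE coefficient on D can be obtained via the Frisch-Waugh-Lovell theorem after subtracting unit and period means from D and Y; the names `double_demean`, `beta_twfe` and the data are illustrative only.

```python
# Hypothetical balanced panel: 2 units x 3 periods (not the website's data).
# Unit 0 is treated from period 1, unit 1 from period 2 (staggered adoption).
D = [[0, 1, 1],
     [0, 0, 1]]
# True treatment effects: positive in every treated cell and growing over time.
EFFECT = [[0.0, 1.0, 5.0],
          [0.0, 0.0, 1.0]]
N, T = len(D), len(D[0])

# Outcome = unit fixed effect + time fixed effect + treatment effect.
Y = [[10.0 * i + 2.0 * t + EFFECT[i][t] for t in range(T)] for i in range(N)]


def double_demean(m):
    """Subtract unit and period means, add back the grand mean.

    For a balanced panel this removes unit and time fixed effects exactly.
    """
    unit_mean = [sum(row) / T for row in m]
    time_mean = [sum(m[i][t] for i in range(N)) / N for t in range(T)]
    grand_mean = sum(unit_mean) / N
    return [[m[i][t] - unit_mean[i] - time_mean[t] + grand_mean
             for t in range(T)] for i in range(N)]


# TWFE coefficient on D via Frisch-Waugh-Lovell on the demeaned data.
Dd, Yd = double_demean(D), double_demean(Y)
cells = [(i, t) for i in range(N) for t in range(T)]
beta_twfe = (sum(Dd[i][t] * Yd[i][t] for i, t in cells)
             / sum(Dd[i][t] ** 2 for i, t in cells))

# Simple average of the true effects over treated cells, for comparison.
treated_effects = [EFFECT[i][t] for i, t in cells if D[i][t] == 1]
mean_effect = sum(treated_effects) / len(treated_effects)

print(f"TWFE coefficient on D: {beta_twfe:.3f}")
print(f"Mean effect in treated cells: {mean_effect:.3f}")
```

Running it prints a TWFE coefficient of -1.000 against a mean treated-cell effect of 2.333. In this design, the comparison that uses the already-treated unit 0 as a control for unit 1 subtracts unit 0's growing effect (from 1 to 5), which drags the estimate below zero.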
However, is this really because the TWFE model is wrong, or because we misinterpret the coefficient?
Contrary to popular belief, the coefficient of a linear (OLS) model is not necessarily a treatment effect. As demonstrated by Słoczyński (2022) (for presentation slides see here), the OLS coefficient on the treatment is in fact a weighted average of the average treatment effect on the treated (ATT) and the average treatment effect on the untreated (ATU). Also, as shown by Rosenblum and van der Laan (2010), to use a linear model to estimate treatment effects, one needs to include interactions with other covariates.
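The decomposition can be checked numerically on a small example. Following Słoczyński (2022), let ρ = P(d=1) and let V_1 and V_0 be the variance of the propensity score p(X) among treated and untreated units; the weights are w1 = (1 - ρ)·V_0 / (ρ·V_1 + (1 - ρ)·V_0) and w0 = 1 - w1. When V_0 = V_1 this reduces to w1 = 1 - ρ, so the smaller group gets the larger weight. The sketch below uses hypothetical toy data with a single binary covariate x; because x is binary, p(x) is exactly linear in x and the identity OLS = w1·ATT + w0·ATU holds exactly in the sample. The cell-level effects `tau_hat` are what a model with a treatment-by-covariate interaction estimates here.

```python
from statistics import fmean, pvariance

# Hypothetical toy data (not from the post): binary covariate x, binary
# treatment d, outcome y = 2 + 5x + d * tau(x). The treatment effect is
# heterogeneous: 0 when x == 0 and 10 when x == 1.
TAU = {0: 0.0, 1: 10.0}
rows = []  # (x, d, y)
for x, n_cell, n_treated in [(0, 10, 2), (1, 10, 5)]:
    for i in range(n_cell):
        d = 1 if i < n_treated else 0
        rows.append((x, d, 2.0 + 5.0 * x + d * TAU[x]))

# Propensity score p(x): share treated within each x cell.
p = {x: fmean(d for xx, d, _ in rows if xx == x) for x in (0, 1)}

# OLS coefficient on d in y ~ 1 + d + x, via Frisch-Waugh-Lovell:
# regressing d on (1, x) reproduces p(x) exactly, so the residual is d - p(x).
r = [d - p[x] for x, d, _ in rows]
ols = sum(ri * y for ri, (_, _, y) in zip(r, rows)) / sum(ri * ri for ri in r)


def mean_y(x, d):
    return fmean(y for xx, dd, y in rows if xx == x and dd == d)


# Cell-level effects, i.e. what a model with the d-by-x interaction estimates.
tau_hat = {x: mean_y(x, 1) - mean_y(x, 0) for x in (0, 1)}
treated = [x for x, d, _ in rows if d == 1]
untreated = [x for x, d, _ in rows if d == 0]
att = fmean(tau_hat[x] for x in treated)
atu = fmean(tau_hat[x] for x in untreated)
ate = fmean(tau_hat[x] for x, _, _ in rows)

# Słoczyński (2022) weights: rho = P(d=1), V_d = Var[p(X) | d].
rho = len(treated) / len(rows)
v1 = pvariance([p[x] for x in treated])
v0 = pvariance([p[x] for x in untreated])
w1 = (1 - rho) * v0 / (rho * v1 + (1 - rho) * v0)
w0 = 1 - w1

print(f"OLS = {ols:.3f}, ATT = {att:.3f}, ATU = {atu:.3f}, ATE = {ate:.3f}")
print(f"P(d=1) = {rho:.3f}, w1 = {w1:.3f}, "
      f"w1*ATT + w0*ATU = {w1 * att + w0 * atu:.3f}")
```

With these numbers the script prints OLS = 6.098, ATT = 7.143, ATU = 3.846 and ATE = 5.000, with P(d=1) = 0.350 and w1 = 0.683. The treated group is the smaller one, so ATT gets the larger weight and the OLS coefficient differs from the ATE even though every quantity is computed from the same data.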
Note that there is no such requirement (to include interactions) for Poisson regression, so we should expect to get a positive treatment effect straight away.[1] This is easily verified:
. xtpoisson Y i.t D, fe vce(robust) irr nolog
Conditional fixed-effects Poisson regression Number of obs = 1,800
Group variable: id Number of groups = 30
Obs per group:
min = 60
avg = 60.0
max = 60
Wald chi2(60) = 639556.88
Log pseudolikelihood = -7822.7233 Prob > chi2 = 0.0000
(Std. err. adjusted for clustering on id)
------------------------------------------------------------------------------
| Robust
Y | IRR std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
t |
2 | 1.060606 .0058044 10.75 0.000 1.04929 1.072044
(... other t dummies omitted)
60 | 11.37048 1.198453 23.06 0.000 9.248301 13.97964
|
D | 1.242274 .058687 4.59 0.000 1.132414 1.362791
------------------------------------------------------------------------------
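Because of the irr option, the D row reports the incidence-rate ratio exp(b) rather than the Poisson coefficient b itself. As a quick sanity check, the sketch below recovers the log-scale coefficient, the z statistic and the 95% interval from the two printed numbers, assuming the usual delta-method relation se(IRR) = IRR × se(b) and a normal critical value.

```python
import math
from statistics import NormalDist

# Figures printed for D in the xtpoisson ..., irr output above.
irr, se_irr = 1.242274, 0.058687

b = math.log(irr)                     # Poisson coefficient on the log scale
se_b = se_irr / irr                   # delta method: se(IRR) = IRR * se(b)
z = b / se_b
crit = NormalDist().inv_cdf(0.975)    # two-sided 95% critical value (~1.96)
lo, hi = math.exp(b - crit * se_b), math.exp(b + crit * se_b)

print(f"b = {b:.4f} (about {100 * (irr - 1):.0f}% higher rate when D = 1)")
print(f"z = {z:.2f}, 95% CI for IRR = [{lo:.4f}, {hi:.4f}]")
```

The recomputed z of about 4.59 and interval of about [1.1324, 1.3628] agree with the table, which is consistent with the interval being formed on the log scale and then exponentiated. An IRR of 1.24 corresponds to b ≈ 0.217, i.e. an expected outcome roughly 24% higher when D = 1, holding the unit and period effects fixed.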
As expected, we get a positive treatment effect, with an incidence-rate ratio (IRR) of 1.24. We can also obtain the average treatment effect on the treated (ATT) using Słoczyński’s method and a linear model with interactions (regression adjustment):
. * Słoczyński's method *
. hettreatreg i.t i.id, outcome(Y) treatment(D)
"OLS" is the estimated regression coefficient on D.
OLS = -25.93
P(d=1) = .523
P(d=0) = .477
w1 = .443
w0 = .557
delta = .079
ATE = -7.245
ATT = 105.2
ATU = -130.4
OLS = w1*ATT + w0*ATU = -25.93
. teffects ra (Y t i.id) (D), atet
Iteration 0: EE criterion = 1.708e-26
Iteration 1: EE criterion = 1.612e-28
Treatment-effects estimation Number of obs = 1,800
Estimator : regression adjustment
Outcome model : linear
Treatment model: none
------------------------------------------------------------------------------
| Robust
Y | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
ATET |
D |
(1 vs 0) | 105.9309 2.963968 35.74 0.000 100.1217 111.7402
-------------+----------------------------------------------------------------
POmean |
D |
0 | 55.82784 .5471447 102.03 0.000 54.75546 56.90023
------------------------------------------------------------------------------
As can be seen, both methods yield a positive ATT of about 105 (105.2 and 105.93). Słoczyński’s method also shows clearly how the OLS coefficient decomposes into two parts: ATT and ATU. In simple DiD settings (e.g. with a homogeneous treatment effect), ATT equals ATU and therefore also equals the coefficient, which is a weighted average of the two, so it is fine to look at the coefficient alone. In a more complicated DiD setting such as this one, ATT need not equal ATU; here the gap is very large (105.2 vs -130.4), so the coefficient needs to be decomposed.
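The decomposition printed by hettreatreg can be re-assembled by hand. The sketch below plugs in the rounded figures from the first example: the OLS coefficient is w1·ATT + w0·ATU, whereas the ATE uses the group shares P(d=1) and P(d=0) as weights. The printed delta matches P(d=1) - w1 up to rounding (an inference from the numbers above, not from the command's documentation). Subtracting the two weighted averages gives OLS - ATE = (w1 - P(d=1))·(ATT - ATU) = -delta·(ATT - ATU), so a large gap between ATT and ATU can push the OLS coefficient far from the ATE.

```python
# Rounded figures printed by hettreatreg in the first example above.
p1, p0 = 0.523, 0.477          # P(d=1), P(d=0)
w1, w0 = 0.443, 0.557          # weights on ATT and ATU in the OLS coefficient
att, atu = 105.2, -130.4
ols_printed, ate_printed, delta_printed = -25.93, -7.245, 0.079

ols = w1 * att + w0 * atu      # OLS = w1*ATT + w0*ATU
ate = p1 * att + p0 * atu      # ATE = P(d=1)*ATT + P(d=0)*ATU
delta = p1 - w1                # gap between the treated share and its OLS weight
gap = -delta * (att - atu)     # OLS - ATE = -delta * (ATT - ATU)

print(f"OLS: printed {ols_printed}, rebuilt {ols:.2f}")
print(f"ATE: printed {ate_printed}, rebuilt {ate:.2f}")
print(f"delta: printed {delta_printed}, P(d=1) - w1 = {delta:.3f}")
print(f"OLS - ATE: printed {ols_printed - ate_printed:.2f}, "
      f"-delta*(ATT - ATU) = {gap:.2f}")
```

The rebuilt values agree with the printed ones to within the rounding of the inputs (for example -26.03 vs -25.93 for OLS).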
We can also compare Słoczyński’s ATT estimator with other recently developed DiD estimators. Here we use the example on the Synthetic Difference-in-Differences (SDiD) page. We first show the results of the SDiD estimator, followed by Słoczyński’s ATT estimator. For completeness, we also look at a nonparametric (model-free) estimator: nearest-neighbour (NN) matching.
. sdid Y id year D, vce(bootstrap) seed(1000)
Bootstrap replications (50). This may take some time.
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
Synthetic Difference-in-Differences Estimator
-----------------------------------------------------------------------------
Y | ATT Std. Err. t P>|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
treatment | 131.07490 8.42889 15.55 0.000 114.55458 147.59522
-----------------------------------------------------------------------------
95% CIs and p-values are based on Large-Sample approximations.
Refer to Arkhangelsky et al., (2020) for theoretical derivations.
. hettreatreg i.year i.id, outcome(Y) treatment(D)
"OLS" is the estimated regression coefficient on D.
OLS = 77.59
P(d=1) = .328
P(d=0) = .672
w1 = .854
w0 = .146
delta = -.526
ATE = -119
ATT = 132.2
ATU = -241.8
OLS = w1*ATT + w0*ATU = 77.59
. teffects nnmatch (Y year i.id) (D), atet
Treatment-effects estimation Number of obs = 1,800
Estimator : nearest-neighbor matching Matches: requested = 1
Outcome model : matching min = 1
Distance metric: Mahalanobis max = 23
------------------------------------------------------------------------------
| AI robust
Y | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
ATET |
D |
(1 vs 0) | 146.7567 4.294783 34.17 0.000 138.3391 155.1743
------------------------------------------------------------------------------
As can be seen, the SDiD estimator and Słoczyński’s method give very close results (131.07 vs 132.2), while NN matching gives a somewhat higher ATT (146.76, about 12% above the SDiD estimate).
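To make the NN matching idea concrete, here is a minimal sketch of a 1-nearest-neighbour ATT estimator with replacement: each treated observation is compared with its closest untreated observation(s), and the ATT is the average of those differences. It is a simplification, not what teffects nnmatch does internally. The output above reports a Mahalanobis metric, whereas the sketch divides each squared covariate difference by that covariate's variance (a diagonal metric), and it averages over tied nearest neighbours, in the spirit of the min = 1, max = 23 match counts shown. The function name `nn_att` and the toy data are hypothetical.

```python
from statistics import fmean, pvariance


def nn_att(x, d, y, tol=1e-12):
    """ATT by 1-nearest-neighbour matching with replacement (simplified sketch).

    x: list of covariate tuples; d: 0/1 treatment flags; y: outcomes.
    Assumes at least one treated and one untreated observation.
    Distance: squared differences divided by each covariate's variance
    (a diagonal metric, not Mahalanobis). Tied nearest controls are averaged.
    """
    k = len(x[0])
    # Fall back to 1 for a constant covariate to avoid dividing by zero.
    var = [pvariance([row[j] for row in x]) or 1.0 for j in range(k)]
    controls = [c for c, dc in enumerate(d) if dc == 0]
    effects = []
    for i, di in enumerate(d):
        if di != 1:
            continue
        dist = {c: sum((x[i][j] - x[c][j]) ** 2 / var[j] for j in range(k))
                for c in controls}
        best = min(dist.values())
        matched = [c for c, v in dist.items() if v <= best + tol]
        # Treated outcome minus the mean outcome of its matched controls.
        effects.append(y[i] - fmean(y[c] for c in matched))
    return fmean(effects)


# Hypothetical toy data with one covariate.
x = [(0,), (2,), (4,), (1,), (4,)]
d = [0, 0, 0, 1, 1]
y = [1, 3, 7, 6, 10]
print(nn_att(x, d, y))  # 3.5
```

In the toy data, the treated unit at x = 1 ties between the controls at x = 0 and x = 2 (mean outcome 2), and the treated unit at x = 4 matches the control at x = 4 exactly (outcome 7), so the ATT is ((6 - 2) + (10 - 7)) / 2 = 3.5.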
Obviously, more research is required. The main lessons from the simulated exercises so far are:
A coefficient in a linear (OLS) model is not necessarily the ATU, ATT or average treatment effect (ATE).
The TWFE estimator is considered incorrect largely because of a misunderstanding of what the coefficient in a linear model is (or is not).
The Poisson regression and nearest-neighbour (NN) matching may be good alternatives.
References
Rosenblum, M., van der Laan, M.J., 2010. Simple, Efficient Estimators of Treatment Effects in Randomized Trials Using Generalized Linear Models to Leverage Baseline Variables. The International Journal of Biostatistics 6. https://doi.org/10.2202/1557-4679.1138
Słoczyński, T., 2022. Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights. The Review of Economics and Statistics 104, 501–509. https://doi.org/10.1162/rest_a_00953
[1] Note: after reading more, this statement seems incorrect (May 2023).