Are two-way fixed-effects (TWFE) models wrong for difference-in-differences (DiD) data with heterogeneous treatment effects?

May 08, 2023

Note: the post was originally published on another website on 3 September 2022.

I recently came across this very good resource on the difference-in-differences (DiD) design. In short, standard two-way fixed-effects (TWFE) models may give wrong results when treatment effects are heterogeneous, so new methods are needed. The website reviews a range of such methods implemented in various statistical packages.

But are TWFE models really wrong? Let’s take the example documented in the section “So where do TWFE regressions go wrong?” (the link points to an archived version, which may display the mathematical notation incorrectly; for the current version see here). The aim is to estimate the treatment effect in the simulated panel data plotted in that section.

If we run a TWFE model, as shown on the website, we get the following results (D is the treatment indicator):

. xtreg Y i.t D, fe

Fixed-effects (within) regression               Number of obs     =      1,800
Group variable: id                              Number of groups  =         30

R-squared:                                      Obs per group:
     Within  = 0.7320                                         min =         60
     Between = 0.8104                                         avg =       60.0
     Overall = 0.3604                                         max =         60

                                                F(60,1710)        =      77.83
corr(u_i, Xb) = -0.1062                         Prob > F          =     0.0000

------------------------------------------------------------------------------
           Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           t |
          2  |          1   10.25912     0.10   0.922    -19.12175    21.12175
          3  |          2   10.25912     0.19   0.845    -18.12175    22.12175
(... other t dummies omitted)
         60  |   242.4984   10.79994    22.45   0.000     221.3159    263.6809
             |
           D |  -25.93176   3.374793    -7.68   0.000    -32.55092    -19.3126
       _cons |       16.5   7.254293     2.27   0.023     2.271776    30.72822
-------------+----------------------------------------------------------------
     sigma_u |  67.425362
     sigma_e |  39.733399
         rho |  .74224277   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(29, 1710) = 97.21                   Prob > F = 0.0000

It looks as if the result is wrong: the treatment effects are clearly positive (the slopes increase for all lines), but we got a negative coefficient (-25.93)!
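To see how a TWFE coefficient can come out negative even when every treatment effect is positive, here is a minimal Python sketch on a hypothetical toy panel (two units, three periods; not the website’s data). It relies on the Frisch-Waugh-Lovell result that, in a balanced panel, the TWFE coefficient on D equals the slope from regressing Y on the double-demeaned treatment indicator. The names `twfe_coef`, `D`, `effect` and all numbers are illustrative assumptions.

```python
# Hypothetical toy panel: 2 units x 3 periods with staggered adoption.
# D[i][t] = 1 if unit i is treated in period t.
D = [[0, 1, 1],  # unit 0: treated from period 1 (early adopter)
     [0, 0, 1]]  # unit 1: treated from period 2 (late adopter)
# True treatment effects: all positive, and unit 0's effect grows over time.
effect = [[0, 1, 4],
          [0, 0, 1]]
alpha = [10, 20]  # arbitrary unit fixed effects
lam = [0, 5, 7]   # arbitrary time fixed effects
Y = [[alpha[i] + lam[t] + effect[i][t] for t in range(3)] for i in range(2)]


def twfe_coef(D, Y):
    """TWFE coefficient on D in a balanced panel via Frisch-Waugh-Lovell:
    regress Y on the double-demeaned treatment indicator."""
    n, T = len(D), len(D[0])
    unit_mean = [sum(row) / T for row in D]
    time_mean = [sum(D[i][t] for i in range(n)) / n for t in range(T)]
    grand_mean = sum(unit_mean) / n
    # Residual of D after projecting out unit and time dummies (balanced panel)
    Dt = [[D[i][t] - unit_mean[i] - time_mean[t] + grand_mean for t in range(T)]
          for i in range(n)]
    num = sum(Dt[i][t] * Y[i][t] for i in range(n) for t in range(T))
    den = sum(Dt[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den


# Average effect over the treated unit-periods (the ATT in this toy example)
treated = [effect[i][t] for i in range(2) for t in range(3) if D[i][t] == 1]
att = sum(treated) / len(treated)

beta = twfe_coef(D, Y)
print(f"TWFE coefficient = {beta:.2f}, ATT = {att:.2f}")  # -0.50 vs 2.00
```

The negative sign comes from the comparison in which the already-treated unit 0 acts as the control for unit 1 between periods 1 and 2: unit 1’s effect rises by 1 while unit 0’s rises by 3, giving 1 - 3 = -2. The fixed effects `alpha` and `lam` do not change the result, since the double-demeaned D is orthogonal to them.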

But is this really because the TWFE model is wrong, or because we misinterpret the coefficient?

Contrary to popular belief, the coefficient of a linear (OLS) model is not necessarily a treatment effect. As demonstrated by Słoczyński (2022) (for presentation slides see here), the coefficient of a linear model is actually a weighted average of the average treatment effects on the treated (ATT) and on the untreated (ATU). Also, as shown by Rosenblum and van der Laan (2010), using a linear model to estimate treatment effects requires including interactions between the treatment and the other covariates.

Note that there is no such requirement (including interactions) for Poisson regression, so we should expect to get a positive treatment effect straight away.¹ This is easily verified:

. xtpoisson Y i.t D, fe vce(robust) irr nolog

Conditional fixed-effects Poisson regression      Number of obs    =     1,800
Group variable: id                                Number of groups =        30

                                                  Obs per group:
                                                               min =        60
                                                               avg =      60.0
                                                               max =        60

                                                  Wald chi2(60)    = 639556.88
Log pseudolikelihood = -7822.7233                 Prob > chi2      =    0.0000

                                     (Std. err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
           Y |        IRR   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           t |
          2  |   1.060606   .0058044    10.75   0.000      1.04929    1.072044
(... other t dummies omitted)
         60  |   11.37048   1.198453    23.06   0.000     9.248301    13.97964
             |
           D |   1.242274    .058687     4.59   0.000     1.132414    1.362791
------------------------------------------------------------------------------
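As an arithmetic check using only the printed numbers, the IRR row can be reproduced from the underlying log-rate coefficient b: Stata displays exp(b), a delta-method standard error exp(b)·se(b), and a confidence interval obtained by exponentiating b ± 1.96·se(b), while z and P>|z| refer to b itself. An IRR of 1.24 means the expected outcome is about 24% higher when D = 1, holding the unit and time effects fixed. This is a sketch, not a re-estimation of the model.

```python
import math

# Printed values for D from the xtpoisson output above
irr, se_irr = 1.242274, 0.058687

b = math.log(irr)    # coefficient on the log-rate scale
se_b = se_irr / irr  # undo the delta method: se(IRR) = IRR * se(b)
z_crit = 1.959964    # 97.5th percentile of the standard normal

z_stat = b / se_b
ci = (math.exp(b - z_crit * se_b), math.exp(b + z_crit * se_b))
print(f"b = {b:.4f}, z = {z_stat:.2f}, 95% CI = [{ci[0]:.4f}, {ci[1]:.4f}]")
print(f"IRR implies a {100 * (irr - 1):.1f}% higher expected outcome")
```

The reconstructed z (about 4.59) and interval (about 1.1324 to 1.3628) match the printed output, which confirms that the interval is symmetric on the log scale rather than around 1.24.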

As expected, we got a positive treatment effect, with an incidence-rate ratio (IRR) of 1.24. We can also obtain the expected positive treatment effect on the treated (ATT) using Słoczyński’s method and a linear model with interactions:

. * Słoczyński's method *
. hettreatreg i.t i.id, outcome(Y) treatment(D)

"OLS" is the estimated regression coefficient on D.

   OLS  =  -25.93   

P(d=1)  =  .523
P(d=0)  =  .477

    w1  =  .443
    w0  =  .557
 delta  =  .079

   ATE  =  -7.245   
   ATT  =  105.2    
   ATU  =  -130.4   

OLS = w1*ATT + w0*ATU = -25.93   

. teffects ra (Y t i.id) (D), atet

Iteration 0:   EE criterion =  1.708e-26  
Iteration 1:   EE criterion =  1.612e-28  

Treatment-effects estimation                    Number of obs     =      1,800
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
------------------------------------------------------------------------------
             |               Robust
           Y | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
           D |
   (1 vs 0)  |   105.9309   2.963968    35.74   0.000     100.1217    111.7402
-------------+----------------------------------------------------------------
POmean       |
           D |
          0  |   55.82784   .5471447   102.03   0.000     54.75546    56.90023
------------------------------------------------------------------------------
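For intuition about what the `teffects ra ... atet` command above computes, here is a minimal sketch of regression adjustment for the ATT, simplified to a single covariate and hypothetical data (`ols_line` and `ra_att` are illustrative names, not Stata functions). The outcome model is fitted on the controls only, used to predict each treated unit’s untreated outcome, and the gaps are averaged. Fitting a separate model to the controls is equivalent to a linear model in which every coefficient is interacted with D. Stata fits an outcome model for each treatment level, but because an OLS fit with an intercept reproduces the treated group’s mean, the ATET reduces to the same quantity.

```python
def ols_line(x, y):
    """Least-squares intercept and slope of y on a single covariate x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ra_att(x, y, d):
    """Regression-adjustment ATT: fit the outcome model on controls only,
    predict each treated unit's untreated outcome, and average the gaps."""
    x0 = [xi for xi, di in zip(x, d) if di == 0]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    a, b = ols_line(x0, y0)
    gaps = [yi - (a + b * xi) for xi, yi, di in zip(x, y, d) if di == 1]
    return sum(gaps) / len(gaps)


# Hypothetical data: the controls lie exactly on y = 1 + 2x
x = [0, 1, 2, 3, 2, 3]
y = [1, 3, 5, 7, 8, 10]
d = [0, 0, 0, 0, 1, 1]
print(ra_att(x, y, d))  # 3.0: both treated units sit 3 above the control line
```

A raw difference in means would give 9 - 4 = 5 here, because the treated units have larger values of x; the adjustment removes that covariate imbalance.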

As can be seen, both methods yield a positive ATT of about 105 (105.2 and 105.93). Słoczyński’s method also shows clearly how the OLS coefficient decomposes into two parts, ATT and ATU. In simple DiD settings ATT equals ATU, so the coefficient, being a weighted average of the two, equals both, and it is fine to look at the coefficient alone. In more complicated DiD settings such as this one, ATT may differ from ATU; here the difference is very large (105.2 vs -130.4), so the coefficient needs to be decomposed.
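The decomposition can be checked by hand from the printed (rounded) `hettreatreg` output. This minimal sketch assumes OLS = w1·ATT + w0·ATU with w0 = 1 - w1, as printed by the command, and ATE = P(d=1)·ATT + P(d=0)·ATU. It also assumes δ = P(d=1) - w1, which is an inference that matches the printed values in both examples in this post. Under these definitions OLS = ATE - δ·(ATT - ATU), so the OLS coefficient departs from the ATE whenever the OLS weights differ from the group shares and ATT differs from ATU.

```python
def decompose(p1, w1, att, atu):
    """Combine ATT and ATU into the OLS coefficient and the ATE.

    p1: share of treated observations, P(d=1)
    w1: weight on ATT in the OLS coefficient (w0 = 1 - w1)
    """
    ols = w1 * att + (1 - w1) * atu  # OLS = w1*ATT + w0*ATU
    ate = p1 * att + (1 - p1) * atu  # ATE weights by group shares instead
    delta = p1 - w1                  # assumed definition of delta (see lead-in)
    return ols, ate, delta


# Rounded values printed by `hettreatreg i.t i.id, outcome(Y) treatment(D)`
att, atu = 105.2, -130.4
ols, ate, delta = decompose(p1=0.523, w1=0.443, att=att, atu=atu)
print(f"OLS ~ {ols:.2f}, ATE ~ {ate:.2f}, delta ~ {delta:.3f}")
# Identity: OLS = ATE - delta * (ATT - ATU)
print(abs(ols - (ate - delta * (att - atu))) < 1e-9)
```

The small gaps from the printed -25.93 and -7.245 come from rounding of the displayed weights and effects. Note that the smaller group (here the untreated, with P(d=0) = .477) receives the larger weight (w0 = .557), which pulls the coefficient towards the large negative ATU.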

We can also compare Słoczyński’s ATT estimator with other recently developed DiD estimators. Here we use the example on the Synthetic Difference-in-Differences (SDiD) page. We first show the results from the SDiD estimator, followed by Słoczyński’s ATT estimator. For completeness, we also look at a nonparametric (model-free) estimator: nearest-neighbour (NN) matching.
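To make “model-free” concrete, here is a minimal sketch of nearest-neighbour matching for the ATT on a single covariate with hypothetical data (`nn_att` is an illustrative name). Each treated unit is compared with its closest control(s), and the differences are averaged over the treated. Stata’s `teffects nnmatch` matches on several covariates (the output below shows it used the Mahalanobis distance) and reports Abadie-Imbens robust standard errors; this sketch uses absolute distance on one covariate and keeps every control tied at the minimum distance, which is consistent with the up to 23 matches per observation reported below.

```python
def nn_att(x, y, d):
    """ATT by 1-nearest-neighbour matching on a single covariate.
    Controls tied at the minimum distance are all used and averaged."""
    controls = [(xi, yi) for xi, yi, di in zip(x, y, d) if di == 0]
    effects = []
    for xi, yi, di in zip(x, y, d):
        if di != 1:
            continue
        dists = [abs(xi - xc) for xc, _ in controls]
        best = min(dists)
        matched = [yc for (_, yc), dist in zip(controls, dists) if dist == best]
        # Imputed untreated outcome = mean outcome of the matched controls
        effects.append(yi - sum(matched) / len(matched))
    return sum(effects) / len(effects)


# Hypothetical data: two treated units, three controls
x = [1, 3, 1, 2, 4]
y = [10, 20, 4, 6, 8]
d = [1, 1, 0, 0, 0]
print(nn_att(x, y, d))  # 9.5
```

The treated unit at x = 1 matches the control at x = 1 (effect 6); the one at x = 3 is equidistant from the controls at x = 2 and x = 4, so their mean outcome 7 is used (effect 13), giving an ATT of 9.5. No outcome model is fitted at any step.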

. sdid Y id year D, vce(bootstrap) seed(1000)
Bootstrap replications (50). This may take some time.
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................     50


Synthetic Difference-in-Differences Estimator

-----------------------------------------------------------------------------
           Y |     ATT     Std. Err.     t      P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
   treatment | 131.07490    8.42889    15.55    0.000   114.55458   147.59522
-----------------------------------------------------------------------------
95% CIs and p-values are based on Large-Sample approximations.
Refer to Arkhangelsky et al., (2020) for theoretical derivations.

. hettreatreg i.year i.id, outcome(Y) treatment(D)

"OLS" is the estimated regression coefficient on D.

   OLS  =  77.59    

P(d=1)  =  .328
P(d=0)  =  .672

    w1  =  .854
    w0  =  .146
 delta  =  -.526

   ATE  =  -119     
   ATT  =  132.2    
   ATU  =  -241.8   

OLS = w1*ATT + w0*ATU = 77.59    

. teffects nnmatch (Y year i.id) (D), atet

Treatment-effects estimation                   Number of obs      =      1,800
Estimator      : nearest-neighbor matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Distance metric: Mahalanobis                                  max =         23
------------------------------------------------------------------------------
             |              AI robust
           Y | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
           D |
   (1 vs 0)  |   146.7567   4.294783    34.17   0.000     138.3391    155.1743
------------------------------------------------------------------------------

As can be seen, the SDiD and Słoczyński estimators indeed give very close results (131.07 vs 132.2). NN matching gives a slightly higher ATT (146.76).
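A quick comparison using only the printed numbers expresses these differences in relative terms and checks both alternative estimates against the SDiD 95% confidence interval. This is a rough sketch: the estimators have different standard errors, so falling inside one interval is not a formal test of equality.

```python
def compare(est, ref, ci):
    """Relative difference from a reference estimate, and whether est lies in ci."""
    return (est - ref) / ref, ci[0] <= est <= ci[1]


# Printed ATT estimates from the three commands above
sdid_att, sdid_ci = 131.07490, (114.55458, 147.59522)
others = {"hettreatreg": 132.2, "nnmatch": 146.7567}

for name, est in others.items():
    rel, inside = compare(est, sdid_att, sdid_ci)
    print(f"{name}: {100 * rel:+.1f}% vs SDiD, inside SDiD 95% CI: {inside}")
```

Słoczyński’s ATT is within 1% of the SDiD estimate, while NN matching is about 12% higher and sits near the upper end of the SDiD interval.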

Obviously, more research is required. What we can learn from the simulated exercises so far:

A coefficient in a linear (OLS) model is not necessarily the ATU, ATT or average treatment effect (ATE).
The TWFE estimator is considered incorrect largely because of a misunderstanding of what the coefficient in a linear model is (or is not).
Poisson regression and nearest-neighbour (NN) matching may be good alternatives.

References

Rosenblum, M., van der Laan, M.J., 2010. Simple, Efficient Estimators of Treatment Effects in Randomized Trials Using Generalized Linear Models to Leverage Baseline Variables. The International Journal of Biostatistics 6. https://doi.org/10.2202/1557-4679.1138

Słoczyński, T., 2022. Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights. The Review of Economics and Statistics 104, 501–509. https://doi.org/10.1162/rest_a_00953

¹ Note: after reading more, this statement seems incorrect (May 2023).

Chao’s Substack
