DiD Toolbox for Matlab

(Dr. Ralf Elsas-Nicolle, 10/30/2025)

Table of Contents

Overview
Data Generation
TWFE and Bacon Decomposition – Ignoring Staggered Treatment
Staggered DiD Estimators
Monte Carlo Simulation
Literature

Overview

As mentioned in my previous blog, recent years have witnessed a strong increase in academic studies making causal claims (see for example Garg/Fetzer 2024), and significant methodological advancements of tools for estimating causal effects in complex, real-world settings have been made. The 2021 Nobel Prize in Economic Sciences awarded to Joshua Angrist, Guido Imbens, and David Card underscores the transformative impact.

I have developed a Matlab DiD Toolbox to make these methods also available in Matlab. The toolbox can be downloaded and installed from Matlab File Exchange.  The toolbox implements methods developed or discussed in the following key papers:

Goodman-Bacon (2021): Provides the mathematical foundation for the decomposition of the TWFE estimator, explaining how it operates as a weighted average of DiD estimates and identifying the source of bias from „negative weights“ due to treatment effect heterogeneity.
Wooldridge (2021): Establishes the algebraic equivalence between the Two-Way Fixed Effects (TWFE) estimator and the Two-Way Mundlak (TWM) regression, enabling flexible implementation using pooled OLS.
Borusyak, Jaravel, and Spiess (2024): Derives the efficient and robust imputation estimator (BJS) for staggered DiD, which estimates counterfactual outcomes using only untreated observations to calculate heterogeneous causal effects, providing efficiency and avoiding spurious identification.
de Chaisemartin and D’Haultfœuille (2020): Proposes the $ $DID_M$ $ estimator, which estimates a robust Average Treatment Effect across switching cells ($ $\delta_S$ $) and introduces robustness measures for assessing TWFE bias, particularly in designs where weights may be negative. And the unique feature is that the estimator can handle an on/off treatment, while all other estimators assume that treatment is an absorptive state.
Callaway / Sant’Anna (2021): CS focus on cohort/time based analyses, looking at $ ATT(g,k) $ with a focus on taking covariates into account. They derive identification, estimation, and inference strategies assuming parametric nuisance models (i.e. linear outcome regression, logit for propensity scores) that admit standard regularity conditions. In an extension, the interaction weighted estimator by Sun/Abraham (2021) builds on this work, showing how get unbiased event studies in this setting.

The toolbox comes with an extensive manual as well as one worked real-life example for an analysis of a staggered treatment, using the example of the introduction of the so-called „Castle Doctrine“ in the U.S., where from 2000 to 2010, more than 20 states passed the “Castle Doctrine” law which extended lethal self-defense rights.  

In this blog post, I will showcase some basic functionality of the toolbox to test whether two of the most popular estimators are unbiased using Monte Carlo simulation – the Wooldridge and Callaway/Sant’Anna estimators. Obviously, there is no doubt that these estimators under their assumptions are unbiased, but the Monte Carlo simulation shown below will test whether the code implementation is flawless.    

Data Generation

In the first step, the genDIDdata functionality is used to generate data with staggered and heterogeneous treatment. This function allows a very flexible generation of panel data with different treatment effect modeling. Specifically, we will generate data comprising 12 year-observations for 500 individuals out of which 60% are treated, starting at one out of three defined periods (year 4, 7, and 10). We also model a dependency of all variables (except for treatment effects on two exogenous variables x1 and x2 (i.e. covariates).

Treatment effects are generated such the first cohort has an ATT of 2, the second of 2.5 and the third of 3. Knowing the actual ATTs allows in the Monte Carlo simulation to compare estimated ATTs to the true ones, i.e. testing unbiasedness.  

clear;

T = did.genDIDdata(12, 500, 0.6, …

        treatType=„constantTime“, startPeriod=4, …

        cohortTimes=[4 7 10],xNum=2,                     … % creates x1, x2

        betaX=[0.6, -0.4],          … % effect of x1,x2 on y

        xUnitStd=1, xTimeStd=1, xShockStd=1, CohortIncrease=0.5, Seed=42);

ds = did.Dataset.fromTable(T,idVar=„id“, timeVar=„time“, dVar=„D“, yVar=„y“,describe=true); 

[DataDesc] Units = 500 | Periods = 12 | Nobs = 6000 | Balanced Units = 500 (100.0%)
[DataDesc] Ever-treated = 320 (64.0%) | Never-treated = 180 (36.0%) | Leavers = 0 (0.0% of ever)
Cohorts (first 6 rows):
    cohort_label    N_units    Share    N_leavers    ShareLeaversAmongCohort    FirstTreat_time
    ____________    _______    _____    _________    _______________________    _______________

        „4“           105      0.21         0                   0                      4       
        „7“            95      0.19         0                   0                      7       
        „10“          120      0.24         0                   0                     10       

Outcome Y (overall) – N/Mean/SD/Min/P25/Median/P75/Max:
     N       Mean       SD        Min        P25      Median     P75       Max  
    ____    ______    ______    _______    _______    ______    ______    ______

    6000    2.0493    2.8177    -8.1942    0.12043    2.0683    3.9571    11.147

Outcome Y by cohort (including ’never‘):
    cohort_int    cohort_time    cohort_label     N       Mean       SD        Min        P25      Median     P75       Max  
    __________    ___________    ____________    ____    ______    ______    _______    _______    ______    ______    ______

         0            NaN          „never“       2160    1.2432     2.702    -8.1942     -0.562    1.2372    3.0617    9.4748
         4              4          „4“           1260    2.9115    2.7196    -6.3011      1.197      2.98    4.7945    10.862
         7              7          „7“           1140    2.7616    2.8649    -5.2158    0.73006    2.7564    4.6977    11.147
        10             10          „10“          1440    1.9403    2.6714     -7.586    0.14993    1.9918     3.656    10.927

TWFE and Bacon Decomposition – Ignoring Staggered Treatment

As the first step of analyses, a standard two-way fixed effects regression is conducted, yielding an average estimate for the ATT. Since the Borusyiak and the Goodman-Bacon analyses, it is well known that the standard TWFE estimator is shown to be a weighted average of all possible Difference-in-Differences estimators based on the cohorts in the data. As some comparisons are not compatible with the standard DiD assumptions, they can contribute to the average estimate (even with negative) weight, thereby inducing bias.

resTWFE = did.fit(„TWFE“, ds,Covariates=[„x1“,„x2“]); 

[TWFE] ATT(D) = 2.523864
[TWFE] Standard errors clustered by id 
[TWFE]
  3×5 table

    Name    Estimate       SE        tStat       pValue   
    ____    ________    ________    _______    ___________

    „D“      2.5239      0.07748     32.575      1.32e-125
    „x1“    0.62099     0.021035     29.522    1.4785e-111
    „x2“    -0.3961     0.021286    -18.608     4.3329e-59

The Bacon-decomposition can be used to measure the contribution to the ATT estimate of 2.524 by decomposing the cohort comparisons:

Treated vs. Untreated: Compares a treated cohort to a never-treated control group.
Early vs. Late Treated (Admissible Timing): Earlier-treated groups act as treatment, later-treated groups act as controls during their pre-period.
Late vs. Early Treated (Forbidden Comparison): Later-treated groups act as treatment, already-treated earlier groups act as controls.

res = did.fit(„bacon“, ds);

—- Bacon: Overall (TWFE) —–
    2.4957

Bacon: —- Comparisons —–
               type               group1    group2     weight     estimate
    __________________________    ______    ______    ________    ________

    „Treated vs Never Treated“       4         0       0.18625     1.9489 
    „Treated vs Never Treated“       7         0       0.22469     2.3883 
    „Treated vs Never Treated“      10         0       0.21286     3.0587 
    „Treated earlier vs later“       4         7      0.032767      1.813 
    „Treated earlier vs later“       4        10      0.082779     1.9819 
    „Treated earlier vs later“       7        10      0.074895     2.5304 
    „Treated later vs earlier“       7         4      0.065533     2.2629 
    „Treated later vs earlier“      10         4      0.082779     3.1342 
    „Treated later vs earlier“      10         7      0.037448     3.3188 

Bacon: —- Comparisons agg. by Type —-
               Type               Weight     Estimate
    __________________________    _______    ________

    „Treated earlier vs later“    0.19044     2.1685 
    „Treated later vs earlier“    0.18576      2.864 
    „Treated vs Never Treated“     0.6238     2.4859 

As a result we see very different ATT contributions for the (aggregated) cohort comparisons. Still, the „clean“ comparison (treated vs. never-treated) has a weight of about 90%, indicating a not too strong bias of the average ATT. Keep in mind though that treatment effects vary over time, which is necessarily ignored when calculating an average across all periods and cohorts using TWFE.

Staggered DiD Estimators

For illustration purposes, we first demonstrate a single estimation for both the Wooldridge and Callaway/Sant’Anna estimator. Cohort aggregated ATTs for both estimators show time variability of ATTs and are close to the true ATTs of 2, 2.5 and 3, respectively. 

resWool = did.fit(„wooldridge“, ds, Covariates=[„x1“,„x2“],vcov=„clustered“, clusters=[„id“]);

[wooldridge] ATT by cohort (mean of cohorts coefficients):
    Cohort    #TimePeriods    ATT(k)       SE        tStat       pValue  
    ______    ____________    ______    _________    ______    __________

       4           9            2.04    0.0021972    928.46    0.00068567
       7           6          2.3774    0.0014479      1642    0.00038772
      10           3           3.111    0.0013336    2332.7    0.00027291

[wooldridge] Overall ATT (cohort-share weighted):
    Estimate       SE        tStat       pValue  
    ________    _________    ______    __________

     2.5418     0.0016509    1539.6    0.00041349

[wooldridge] ATT by event time (mean/median across cohorts):
    k    ATT_hat_mean    ATT_hat_median    nCells
    _    ____________    ______________    ______

    0       2.1801           2.0947          3   
    1       2.0626           1.9459          3   
    2       2.1817           2.1022          3   
    3       2.2817           2.2817          2   
    4       2.1201           2.1201          2   
    5        2.274            2.274          2   
    6       3.1457           3.1457          1   
    7       3.1153           3.1153          1   
    8       3.0721           3.0721          1   

resCS = did.fit(„cs“, ds, …

               Comparison=„notyet“, Delta=0, Approach=„dr“, Covariates=[„x1“,„x2“], …

                Seed=29, SEMethod=„multiplier“);

[CS] R=18 cells; Approach=dr; Comp=notyet; δ=0; SE=multiplier; W=treatedObs. Bootstrap B=100, crit(95%)=2.812
[CS] Overall ATT 
  1×6 table

    Estimate      SE       tStat    pValue      LB       UB  
    ________    _______    _____    ______    ______    _____

     2.2533     0.14334    15.72      0       1.9657    2.541

[CS] Estimates 
  18×11 table

          Name             Effect      Estimate    g     t       SE       tStat     pValue    gYear    tYear    refYear
    _________________    __________    ________    __    __    _______    ______    ______    _____    _____    _______

    „ATT(g=4, t=4)“      „ATT(g,t)“     1.8117      4     4    0.26518     6.832      0         4        4         3   
    „ATT(g=4, t=5)“      „ATT(g,t)“     1.4996      4     5     0.2279    6.5801      0         4        5         3   
    „ATT(g=4, t=6)“      „ATT(g,t)“     1.8315      4     6     0.2447    7.4846      0         4        6         3   
    „ATT(g=4, t=7)“      „ATT(g,t)“     1.7235      4     7    0.27726    6.2161      0         4        7         3   
    „ATT(g=4, t=8)“      „ATT(g,t)“     1.7184      4     8    0.26202    6.5583      0         4        8         3   
    „ATT(g=4, t=9)“      „ATT(g,t)“     2.1944      4     9    0.27629    7.9424      0         4        9         3   
    „ATT(g=4, t=10)“     „ATT(g,t)“     1.8391      4    10    0.35066    5.2449      0         4       10         3   
    „ATT(g=4, t=11)“     „ATT(g,t)“     1.7211      4    11    0.26725      6.44      0         4       11         3   
    „ATT(g=4, t=12)“     „ATT(g,t)“     1.7769      4    12    0.30425    5.8403      0         4       12         3   
    „ATT(g=7, t=7)“      „ATT(g,t)“     2.5435      7     7    0.27552    9.2318      0         7        7         6   
    „ATT(g=7, t=8)“      „ATT(g,t)“     2.5173      7     8    0.28184    8.9315      0         7        8         6   
    „ATT(g=7, t=9)“      „ATT(g,t)“     2.6227      7     9    0.27892     9.403      0         7        9         6   
    „ATT(g=7, t=10)“     „ATT(g,t)“     2.5916      7    10     0.2992    8.6615      0         7       10         6   
    „ATT(g=7, t=11)“     „ATT(g,t)“     2.3265      7    11    0.30694    7.5797      0         7       11         6   
    „ATT(g=7, t=12)“     „ATT(g,t)“     2.1746      7    12    0.35502    6.1254      0         7       12         6   
    „ATT(g=10, t=10)“    „ATT(g,t)“     3.2181     10    10    0.24013    13.401      0        10       10         9   
    „ATT(g=10, t=11)“    „ATT(g,t)“     3.0574     10    11    0.25898    11.805      0        10       11         9   
    „ATT(g=10, t=12)“    „ATT(g,t)“     3.1332     10    12    0.27889    11.235      0        10       12         9   

[CS] By Cohort 
  3×8 table

    g     Estimate      SE       tStat     pValue      LB         UB      gYear
    __    ________    _______    ______    ______    _______    ______    _____

     4     1.7907     0.26518    6.7528      0       0.61119    2.9702      4  
     7     2.4627     0.20778    11.852      0        1.5385    3.3869      7  
    10     3.1362     0.20041    15.649      0        2.2448    4.0276     10  

Monte Carlo Simulation

We now conduct a Monte Carlo simulation, where in 1.000 simulation runs each time new panel data (with the same structure but different random variability) are generated and the Wooldridge and CallaWay/Sant’Anna estimators are applied. These estimators will be unbiased when the mean ATT estimate across all simulation runs corresponds to the true ATT values on average.

S=1000; % num simulations

resATT=nan(S,3);

resCS=nan(S,3);

tic

parfor ii=1:S

    % gen data

    simTable = did.genDIDdata(12, 500, 0.6, …

        treatType=„constantTime“, startPeriod=4, …

        cohortTimes=[4 7 10],xNum=2,                     … % creates x1, x2

        betaX=[0.6, -0.4],          … % effect of x1,x2 on y

        xUnitStd=1, xTimeStd=1, xShockStd=1, CohortIncrease=0.5);

    % estimate Wooldridge

    resSim = did.fit(„Wooldridge“,simTable,idVar=„id“, timeVar=„time“, dVar=„D“, yVar=„y“, …

        Covariates=[„x1“,„x2“],…

        details=false,describe=false,Print=false);

    resATT(ii,:) = resSim.summaryTable.(„ATT(k)“)‘;

    % estiamte Callway/Sant’Anna

    resSimCS = did.fit(„cs“, simTable,idVar=„id“, timeVar=„time“, dVar=„D“, yVar=„y“, …

        Comparison=„never“, Delta=0, Approach=„dr“,  Covariates=[„x1“,„x2“], …

        Seed=NaN, SEMethod=„multiplier“,Print=false);

    resCS(ii,:) = resSimCS.Aggregates.byCohort.Estimate‘;

end

toc

Elapsed time is 19.267593 seconds.

% make tables of true ATTs versus average estimated ATTs per cohort over the

% simulation runs

trueTau = unique(T.ATT);

nameVar = „ATT_“+[1:size(resATT,2)];

resTbl = array2table([trueTau(2:end)‘;mean(resATT)],VariableNames=nameVar,RowNames=[„True“,„meanEstimate“]);

fprintf(„Wooldridge meanATT from „+ S + “ sims: \n“);

Wooldridge meanATT from 1000 sims: 

resTbl

resTbl = 2×3 table 
ATT_1ATT_2ATT_3
1 True22.50003
2 meanEstimate2.00152.50323.0013

	ATT_1	ATT_2	ATT_3
1 True	2	2.5000	3
2 meanEstimate	2.0015	2.5032	3.0013

resTblCS = array2table([trueTau(2:end)‘;mean(resCS)],VariableNames=nameVar,RowNames=[„True“,„meanEstimate“]);

fprintf(„Callaway/Sant’Anna(dr) meanATT from „+ S + “ sims: \n“);

Callaway/Sant’Anna(dr) meanATT from 1000 sims: 

resTblCS

resTblCS = 2×3 table 
ATT_1ATT_2ATT_3
1 True22.50003
2 meanEstimate2.00352.50822.9993

	ATT_1	ATT_2	ATT_3
1 True	2	2.5000	3
2 meanEstimate	2.0035	2.5082	2.9993

As can be seen from the tables, both estimators take the control variates correctly into account and are unbiased as average ATT estimates across the simulations reveal the true coefficients. This is good news for the implementation of these estimators in the DID Toolbox.

Literature

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation, Review of Economic Studies, 00, 1–33.
Callaway, B. / Sant’Anna, P. (2021): Difference-in-Differences with Multiple Time Periods, Journal of Econometrics 225 (2): 200–230.
de Chaisemartin, C., & D’Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects, American Economic Review, 110(9), 2964–2996.
Garg, P. / Fetzer, T. (2025): Causal Claims in Economics, arXiv:2501.06873
Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing, Journal of Econometrics, 225(2), 254–277.
Wooldridge, J. (2021). Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Event Study Estimators. Working Paper.

Schlagwörter: Difference-in-Differences, Econometrics, Matlab, Panel Data