DiD Toolbox for Matlab

(Dr. Ralf Elsas-Nicolle, 10/30/2025)

Overview

As mentioned in my previous blog, recent years have witnessed a strong increase in academic studies making causal claims (see for example Garg/Fetzer 2024), and significant methodological advancements of tools for estimating causal effects in complex, real-world settings have been made. The 2021 Nobel Prize in Economic Sciences awarded to Joshua Angrist, Guido Imbens, and David Card underscores the transformative impact.
I have developed a Matlab DiD Toolbox to make these methods also available in Matlab. The toolbox can be downloaded and installed from Matlab File Exchange. The toolbox implements methods developed or discussed in the following key papers:
  • Goodman-Bacon (2021): Provides the mathematical foundation for the decomposition of the TWFE estimator, explaining how it operates as a weighted average of DiD estimates and identifying the source of bias from „negative weights“ due to treatment effect heterogeneity.
  • Wooldridge (2021): Establishes the algebraic equivalence between the Two-Way Fixed Effects (TWFE) estimator and the Two-Way Mundlak (TWM) regression, enabling flexible implementation using pooled OLS.
  • Borusyak, Jaravel, and Spiess (2024): Derives the efficient and robust imputation estimator (BJS) for staggered DiD, which estimates counterfactual outcomes using only untreated observations to calculate heterogeneous causal effects, providing efficiency and avoiding spurious identification.
  • de Chaisemartin and D’Haultfœuille (2020): Proposes the $ $DID_M$ $ estimator, which estimates a robust Average Treatment Effect across switching cells ($ $\delta_S$ $) and introduces robustness measures for assessing TWFE bias, particularly in designs where weights may be negative. And the unique feature is that the estimator can handle an on/off treatment, while all other estimators assume that treatment is an absorptive state.
  • Callaway / Sant’Anna (2021): CS focus on cohort/time based analyses, looking at $ ATT(g,k) $ with a focus on taking covariates into account. They derive identification, estimation, and inference strategies assuming parametric nuisance models (i.e. linear outcome regression, logit for propensity scores) that admit standard regularity conditions. In an extension, the interaction weighted estimator by Sun/Abraham (2021) builds on this work, showing how get unbiased event studies in this setting.
The toolbox comes with an extensive manual as well as one worked real-life example for an analysis of a staggered treatment, using the example of the introduction of the so-called „Castle Doctrine“ in the U.S., where from 2000 to 2010, more than 20 states passed the “Castle Doctrine” law which extended lethal self-defense rights.
In this blog post, I will showcase some basic functionality of the toolbox to test whether two of the most popular estimators are unbiased using Monte Carlo simulation – the Wooldridge and Callaway/Sant’Anna estimators. Obviously, there is no doubt that these estimators under their assumptions are unbiased, but the Monte Carlo simulation shown below will test whether the code implementation is flawless.

Data Generation

In the first step, the genDIDdata functionality is used to generate data with staggered and heterogeneous treatment. This function allows a very flexible generation of panel data with different treatment effect modeling. Specifically, we will generate data comprising 12 year-observations for 500 individuals out of which 60% are treated, starting at one out of three defined periods (year 4, 7, and 10). We also model a dependency of all variables (except for treatment effects on two exogenous variables x1 and x2 (i.e. covariates).
Treatment effects are generated such the first cohort has an ATT of 2, the second of 2.5 and the third of 3. Knowing the actual ATTs allows in the Monte Carlo simulation to compare estimated ATTs to the true ones, i.e. testing unbiasedness.
clear;
 
T = did.genDIDdata(12, 500, 0.6,
treatType=„constantTime“, startPeriod=4,
cohortTimes=[4 7 10],xNum=2, % creates x1, x2
betaX=[0.6, -0.4], % effect of x1,x2 on y
xUnitStd=1, xTimeStd=1, xShockStd=1, CohortIncrease=0.5, Seed=42);
 
ds = did.Dataset.fromTable(T,idVar=„id“, timeVar=„time“, dVar=„D“, yVar=„y“,describe=true);
[DataDesc] Units = 500 | Periods = 12 | Nobs = 6000 | Balanced Units = 500 (100.0%) [DataDesc] Ever-treated = 320 (64.0%) | Never-treated = 180 (36.0%) | Leavers = 0 (0.0% of ever) Cohorts (first 6 rows): cohort_label N_units Share N_leavers ShareLeaversAmongCohort FirstTreat_time ____________ _______ _____ _________ _______________________ _______________ „4“ 105 0.21 0 0 4 „7“ 95 0.19 0 0 7 „10“ 120 0.24 0 0 10 Outcome Y (overall) – N/Mean/SD/Min/P25/Median/P75/Max: N Mean SD Min P25 Median P75 Max ____ ______ ______ _______ _______ ______ ______ ______ 6000 2.0493 2.8177 -8.1942 0.12043 2.0683 3.9571 11.147 Outcome Y by cohort (including ’never‘): cohort_int cohort_time cohort_label N Mean SD Min P25 Median P75 Max __________ ___________ ____________ ____ ______ ______ _______ _______ ______ ______ ______ 0 NaN „never“ 2160 1.2432 2.702 -8.1942 -0.562 1.2372 3.0617 9.4748 4 4 „4“ 1260 2.9115 2.7196 -6.3011 1.197 2.98 4.7945 10.862 7 7 „7“ 1140 2.7616 2.8649 -5.2158 0.73006 2.7564 4.6977 11.147 10 10 „10“ 1440 1.9403 2.6714 -7.586 0.14993 1.9918 3.656 10.927

TWFE and Bacon Decomposition – Ignoring Staggered Treatment

As the first step of analyses, a standard two-way fixed effects regression is conducted, yielding an average estimate for the ATT. Since the Borusyiak and the Goodman-Bacon analyses, it is well known that the standard TWFE estimator is shown to be a weighted average of all possible Difference-in-Differences estimators based on the cohorts in the data. As some comparisons are not compatible with the standard DiD assumptions, they can contribute to the average estimate (even with negative) weight, thereby inducing bias.
resTWFE = did.fit(„TWFE“, ds,Covariates=[„x1“,„x2“]);
[TWFE] ATT(D) = 2.523864 [TWFE] Standard errors clustered by id [TWFE] 3×5 table Name Estimate SE tStat pValue ____ ________ ________ _______ ___________ „D“ 2.5239 0.07748 32.575 1.32e-125 „x1“ 0.62099 0.021035 29.522 1.4785e-111 „x2“ -0.3961 0.021286 -18.608 4.3329e-59
The Bacon-decomposition can be used to measure the contribution to the ATT estimate of 2.524 by decomposing the cohort comparisons:
  1. Treated vs. Untreated: Compares a treated cohort to a never-treated control group.
  2. Early vs. Late Treated (Admissible Timing): Earlier-treated groups act as treatment, later-treated groups act as controls during their pre-period.
  3. Late vs. Early Treated (Forbidden Comparison): Later-treated groups act as treatment, already-treated earlier groups act as controls.
res = did.fit(„bacon“, ds);
—- Bacon: Overall (TWFE) —– 2.4957 Bacon: —- Comparisons —– type group1 group2 weight estimate __________________________ ______ ______ ________ ________ „Treated vs Never Treated“ 4 0 0.18625 1.9489 „Treated vs Never Treated“ 7 0 0.22469 2.3883 „Treated vs Never Treated“ 10 0 0.21286 3.0587 „Treated earlier vs later“ 4 7 0.032767 1.813 „Treated earlier vs later“ 4 10 0.082779 1.9819 „Treated earlier vs later“ 7 10 0.074895 2.5304 „Treated later vs earlier“ 7 4 0.065533 2.2629 „Treated later vs earlier“ 10 4 0.082779 3.1342 „Treated later vs earlier“ 10 7 0.037448 3.3188 Bacon: —- Comparisons agg. by Type —- Type Weight Estimate __________________________ _______ ________ „Treated earlier vs later“ 0.19044 2.1685 „Treated later vs earlier“ 0.18576 2.864 „Treated vs Never Treated“ 0.6238 2.4859
As a result we see very different ATT contributions for the (aggregated) cohort comparisons. Still, the „clean“ comparison (treated vs. never-treated) has a weight of about 90%, indicating a not too strong bias of the average ATT. Keep in mind though that treatment effects vary over time, which is necessarily ignored when calculating an average across all periods and cohorts using TWFE.

Staggered DiD Estimators

For illustration purposes, we first demonstrate a single estimation for both the Wooldridge and Callaway/Sant’Anna estimator. Cohort aggregated ATTs for both estimators show time variability of ATTs and are close to the true ATTs of 2, 2.5 and 3, respectively.
resWool = did.fit(„wooldridge“, ds, Covariates=[„x1“,„x2“],vcov=„clustered“, clusters=[„id“]);
[wooldridge] ATT by cohort (mean of cohorts coefficients): Cohort #TimePeriods ATT(k) SE tStat pValue ______ ____________ ______ _________ ______ __________ 4 9 2.04 0.0021972 928.46 0.00068567 7 6 2.3774 0.0014479 1642 0.00038772 10 3 3.111 0.0013336 2332.7 0.00027291 [wooldridge] Overall ATT (cohort-share weighted): Estimate SE tStat pValue ________ _________ ______ __________ 2.5418 0.0016509 1539.6 0.00041349 [wooldridge] ATT by event time (mean/median across cohorts): k ATT_hat_mean ATT_hat_median nCells _ ____________ ______________ ______ 0 2.1801 2.0947 3 1 2.0626 1.9459 3 2 2.1817 2.1022 3 3 2.2817 2.2817 2 4 2.1201 2.1201 2 5 2.274 2.274 2 6 3.1457 3.1457 1 7 3.1153 3.1153 1 8 3.0721 3.0721 1
resCS = did.fit(„cs“, ds,
Comparison=„notyet“, Delta=0, Approach=„dr“, Covariates=[„x1“,„x2“],
Seed=29, SEMethod=„multiplier“);
[CS] R=18 cells; Approach=dr; Comp=notyet; δ=0; SE=multiplier; W=treatedObs. Bootstrap B=100, crit(95%)=2.812 [CS] Overall ATT 1×6 table Estimate SE tStat pValue LB UB ________ _______ _____ ______ ______ _____ 2.2533 0.14334 15.72 0 1.9657 2.541 [CS] Estimates 18×11 table Name Effect Estimate g t SE tStat pValue gYear tYear refYear _________________ __________ ________ __ __ _______ ______ ______ _____ _____ _______ „ATT(g=4, t=4)“ „ATT(g,t)“ 1.8117 4 4 0.26518 6.832 0 4 4 3 „ATT(g=4, t=5)“ „ATT(g,t)“ 1.4996 4 5 0.2279 6.5801 0 4 5 3 „ATT(g=4, t=6)“ „ATT(g,t)“ 1.8315 4 6 0.2447 7.4846 0 4 6 3 „ATT(g=4, t=7)“ „ATT(g,t)“ 1.7235 4 7 0.27726 6.2161 0 4 7 3 „ATT(g=4, t=8)“ „ATT(g,t)“ 1.7184 4 8 0.26202 6.5583 0 4 8 3 „ATT(g=4, t=9)“ „ATT(g,t)“ 2.1944 4 9 0.27629 7.9424 0 4 9 3 „ATT(g=4, t=10)“ „ATT(g,t)“ 1.8391 4 10 0.35066 5.2449 0 4 10 3 „ATT(g=4, t=11)“ „ATT(g,t)“ 1.7211 4 11 0.26725 6.44 0 4 11 3 „ATT(g=4, t=12)“ „ATT(g,t)“ 1.7769 4 12 0.30425 5.8403 0 4 12 3 „ATT(g=7, t=7)“ „ATT(g,t)“ 2.5435 7 7 0.27552 9.2318 0 7 7 6 „ATT(g=7, t=8)“ „ATT(g,t)“ 2.5173 7 8 0.28184 8.9315 0 7 8 6 „ATT(g=7, t=9)“ „ATT(g,t)“ 2.6227 7 9 0.27892 9.403 0 7 9 6 „ATT(g=7, t=10)“ „ATT(g,t)“ 2.5916 7 10 0.2992 8.6615 0 7 10 6 „ATT(g=7, t=11)“ „ATT(g,t)“ 2.3265 7 11 0.30694 7.5797 0 7 11 6 „ATT(g=7, t=12)“ „ATT(g,t)“ 2.1746 7 12 0.35502 6.1254 0 7 12 6 „ATT(g=10, t=10)“ „ATT(g,t)“ 3.2181 10 10 0.24013 13.401 0 10 10 9 „ATT(g=10, t=11)“ „ATT(g,t)“ 3.0574 10 11 0.25898 11.805 0 10 11 9 „ATT(g=10, t=12)“ „ATT(g,t)“ 3.1332 10 12 0.27889 11.235 0 10 12 9 [CS] By Cohort 3×8 table g Estimate SE tStat pValue LB UB gYear __ ________ _______ ______ ______ _______ ______ _____ 4 1.7907 0.26518 6.7528 0 0.61119 2.9702 4 7 2.4627 0.20778 11.852 0 1.5385 3.3869 7 10 3.1362 0.20041 15.649 0 2.2448 4.0276 10

Monte Carlo Simulation

We now conduct a Monte Carlo simulation, where in 1.000 simulation runs each time new panel data (with the same structure but different random variability) are generated and the Wooldridge and CallaWay/Sant’Anna estimators are applied. These estimators will be unbiased when the mean ATT estimate across all simulation runs corresponds to the true ATT values on average.
S=1000; % num simulations
resATT=nan(S,3);
resCS=nan(S,3);
tic
parfor ii=1:S
% gen data
simTable = did.genDIDdata(12, 500, 0.6,
treatType=„constantTime“, startPeriod=4,
cohortTimes=[4 7 10],xNum=2, % creates x1, x2
betaX=[0.6, -0.4], % effect of x1,x2 on y
xUnitStd=1, xTimeStd=1, xShockStd=1, CohortIncrease=0.5);
% estimate Wooldridge
resSim = did.fit(„Wooldridge“,simTable,idVar=„id“, timeVar=„time“, dVar=„D“, yVar=„y“,
Covariates=[„x1“,„x2“],
details=false,describe=false,Print=false);
resATT(ii,:) = resSim.summaryTable.(„ATT(k)“)‘;
% estiamte Callway/Sant’Anna
resSimCS = did.fit(„cs“, simTable,idVar=„id“, timeVar=„time“, dVar=„D“, yVar=„y“,
Comparison=„never“, Delta=0, Approach=„dr“, Covariates=[„x1“,„x2“],
Seed=NaN, SEMethod=„multiplier“,Print=false);
 
resCS(ii,:) = resSimCS.Aggregates.byCohort.Estimate‘;
end
toc
Elapsed time is 19.267593 seconds.
% make tables of true ATTs versus average estimated ATTs per cohort over the
% simulation runs
trueTau = unique(T.ATT);
nameVar = „ATT_“+[1:size(resATT,2)];
resTbl = array2table([trueTau(2:end)‘;mean(resATT)],VariableNames=nameVar,RowNames=[„True“,„meanEstimate“]);
fprintf(„Wooldridge meanATT from „+ S + “ sims: \n“);
Wooldridge meanATT from 1000 sims:
resTbl
resTbl = 2×3 table
ATT_1ATT_2ATT_3
1 True22.50003
2 meanEstimate2.00152.50323.0013
resTblCS = array2table([trueTau(2:end)‘;mean(resCS)],VariableNames=nameVar,RowNames=[„True“,„meanEstimate“]);
fprintf(„Callaway/Sant’Anna(dr) meanATT from „+ S + “ sims: \n“);
Callaway/Sant’Anna(dr) meanATT from 1000 sims:
resTblCS
resTblCS = 2×3 table
ATT_1ATT_2ATT_3
1 True22.50003
2 meanEstimate2.00352.50822.9993
As can be seen from the tables, both estimators take the control variates correctly into account and are unbiased as average ATT estimates across the simulations reveal the true coefficients. This is good news for the implementation of these estimators in the DID Toolbox.

Literature

  • Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation, Review of Economic Studies, 00, 1–33.
  • Callaway, B. / Sant’Anna, P. (2021): Difference-in-Differences with Multiple Time Periods, Journal of Econometrics 225 (2): 200–230.
  • de Chaisemartin, C., & D’Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects, American Economic Review, 110(9), 2964–2996.
  • Garg, P. / Fetzer, T. (2025): Causal Claims in Economics, arXiv:2501.06873
  • Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing, Journal of Econometrics, 225(2), 254–277.
  • Wooldridge, J. (2021). Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Event Study Estimators. Working Paper.
Schlagwörter: , , ,