Data and Tools

Here you can find links to and descriptions of some tools and data repositories that I have created and try to maintain and develop further.

Data

Market Abuse Regulation Article 17:

  • Regulation (EU) No 596/2014 sets rules intended to protect market integrity across EU financial markets. Article 17 is especially relevant here because it concerns the mandatory public disclosure of inside information by issuers.
  • The database uses ESMA FIRDS instrument and venue reference data to identify firms and issuers that currently appear to be subject to MAR Article 17 disclosure obligations. It turns instrument-level records into an issuer-level overview, separating strictly identified stock and bond issuers from review and excluded cases.
  • The result is a research-oriented map of current Article 17 issuer scope: who the issuer is, which stock or bond instruments support the classification, which OAM/competent-authority jurisdiction is indicated by ESMA, and which cases require further legal or source review. It is not a substitute for legal advice or a formal regulatory determination.
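
The step from instrument-level records to an issuer-level overview can be illustrated with a small sketch. This is not the database's actual pipeline: the field names (`isin`, `issuer_lei`, `cfi`), the placeholder identifiers, and the three-way status rule are assumptions made purely for illustration. The only external fact used is that ISO 10962 CFI codes start with "E" for equities and "D" for debt instruments.

```python
from collections import defaultdict

# Hypothetical instrument-level records. Field names and identifiers are
# placeholders for illustration, not the real FIRDS schema or real securities.
records = [
    {"isin": "XX0000000001", "issuer_lei": "LEI_A", "cfi": "ESVUFR"},
    {"isin": "XX0000000002", "issuer_lei": "LEI_A", "cfi": "DBFUFR"},
    {"isin": "XX0000000003", "issuer_lei": "LEI_B", "cfi": "DBFUFR"},
    {"isin": "XX0000000004", "issuer_lei": "LEI_C", "cfi": "OCASPS"},
    {"isin": "XX0000000005", "issuer_lei": "", "cfi": "ESVUFR"},
]


def asset_class(cfi):
    """Map a CFI code to a coarse class via its first letter (ISO 10962)."""
    if cfi.startswith("E"):
        return "stock"
    if cfi.startswith("D"):
        return "bond"
    return "other"


def issuer_overview(instrument_records):
    """Collapse instrument-level rows into one entry per issuer.

    The status rule is an assumed placeholder: issuers without an identifier
    go to "review", issuers with at least one stock or bond are "strict",
    and everything else is "excluded".
    """
    by_issuer = defaultdict(lambda: {"stock": [], "bond": [], "other": []})
    for rec in instrument_records:
        key = rec["issuer_lei"] or "UNKNOWN"
        by_issuer[key][asset_class(rec["cfi"])].append(rec["isin"])

    overview = {}
    for lei, instruments in by_issuer.items():
        if lei == "UNKNOWN":
            status = "review"
        elif instruments["stock"] or instruments["bond"]:
            status = "strict"
        else:
            status = "excluded"
        overview[lei] = {
            "status": status,
            "stock": instruments["stock"],
            "bond": instruments["bond"],
        }
    return overview


overview = issuer_overview(records)
for lei in sorted(overview):
    print(lei, overview[lei])
```

Keeping the supporting ISINs next to each issuer's status mirrors the idea that every classification should be traceable to the instruments behind it.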

Tools

Panel Econometrics Toolbox for MATLAB

  • The Panel Econometrics Toolbox is an object-oriented MATLAB package (R2021a+) designed for the estimation and simulation of advanced panel data models. It provides a unified workflow for handling static, dynamic, and limited dependent variable panel data structures. 
  • Key Features:
    • Unified Data Handling: A smart PanelData container handles sorting, unbalanced panels, and missing values automatically.
    • Simulation Engine: Built-in Monte Carlo generator for testing estimator validity under various data generating processes (DGPs).
    • Advanced Estimators: Includes Fixed Effects, Random Effects, Correlated Random Effects (Mundlak), IV-2SLS, Arellano-Bond (Diff-GMM), Blundell-Bond (Sys-GMM), and Panel Logit/Probit, as well as the DPF estimator for dynamic panels with fractional (limited) dependent variables such as debt ratios.
    • Specification Testing: Automated diagnostic tests (Sargan, AR(m), Hausman, Weak Instruments) included by default.
    • Worked Examples: The toolbox contains a number of worked examples, including estimations on real-world data for comparison with Stata and simulation evidence on the Nickell bias that arises when the endogeneity of a lagged dependent variable is ignored. These can be found in the examples subfolder.
    • Code Validation: All estimators are backtested in Monte Carlo simulations; the script verify_monte_carlo.m can be found in the tests subfolder.
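
To illustrate the Nickell bias mentioned in the worked examples, here is a minimal stand-alone simulation. It is written in Python with the standard library only and does not use the toolbox or its API; the function names, the AR(1) data generating process with unit fixed effects, and all parameter values are assumptions chosen for illustration. It compares the within (fixed effects) estimate of the autoregressive coefficient in a short panel with that in a long panel.

```python
import random


def simulate_panel(n_units, n_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + alpha_i + eps_it for each unit.

    Returns one list per unit holding n_periods + 1 consecutive observations,
    so that n_periods (current, lagged) pairs are available for estimation.
    """
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)      # unit fixed effect
        y = alpha / (1.0 - rho)          # start at the unit-specific mean
        path = []
        for t in range(burn_in + n_periods + 1):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            if t >= burn_in:             # discard the burn-in draws
                path.append(y)
        panel.append(path)
    return panel


def within_estimate(panel):
    """Fixed effects (within) estimate of rho: demean y_t and y_{t-1} by unit,
    then compute the pooled OLS slope on the demeaned data."""
    num = 0.0
    den = 0.0
    for path in panel:
        y_cur, y_lag = path[1:], path[:-1]
        mean_cur = sum(y_cur) / len(y_cur)
        mean_lag = sum(y_lag) / len(y_lag)
        for yc, yl in zip(y_cur, y_lag):
            num += (yl - mean_lag) * (yc - mean_cur)
            den += (yl - mean_lag) ** 2
    return num / den


rng = random.Random(12345)
rho_true = 0.5
short_T = within_estimate(simulate_panel(500, 5, rho_true, rng))
long_T = within_estimate(simulate_panel(500, 50, rho_true, rng))
print("true rho:", rho_true)
print("within estimate, T = 5: ", round(short_T, 3))
print("within estimate, T = 50:", round(long_T, 3))
```

With few time periods the within estimate falls well below the true coefficient, and the gap shrinks as the time dimension grows. This is the motivation for GMM-type estimators such as Arellano-Bond and Blundell-Bond in short dynamic panels.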

Difference-in-Differences (DiD) Toolbox for MATLAB

  • The DiD Toolbox is a set of MATLAB tools for applied statisticians and econometricians conducting Difference-in-Differences (DiD) analyses, with a particular focus on designs with staggered treatment timing. In such designs the conventional two-way fixed effects (TWFE) estimator can be biased when treatment effects are heterogeneous; the toolbox addresses this by providing modern, robust estimators that yield valid causal estimates in multi-period, staggered adoption settings.
  • The implemented estimators are based on:
    • Goodman-Bacon (2021): Provides the mathematical foundation for the decomposition of the TWFE estimator, explaining how it operates as a weighted average of DiD estimates and identifying the source of bias from “negative weights” due to treatment effect heterogeneity.
    • Wooldridge (2021): Establishes the algebraic equivalence between the Two-Way Fixed Effects (TWFE) estimator and the Two-Way Mundlak (TWM) regression, enabling flexible implementation using pooled OLS.
    • Borusyak, Jaravel, and Spiess (2024): Derives the efficient and robust imputation estimator (BJS) for staggered DiD, which estimates counterfactual outcomes using only untreated observations to calculate heterogeneous causal effects, providing efficiency and avoiding spurious identification.
    • de Chaisemartin and D’Haultfœuille (2020): Proposes the DID_M estimator, which estimates a robust average treatment effect across switching cells (δ^S), and introduces robustness measures for assessing TWFE bias, particularly in designs where weights may be negative. A distinctive feature is that the estimator can handle treatments that switch on and off, whereas the other estimators listed here assume that treatment is an absorbing state.
    • Callaway/Sant’Anna (2021): CS focus on cohort- and time-based analyses, examining group-time average treatment effects ATT(g,t) and taking covariates into account. They derive identification, estimation, and inference strategies assuming parametric nuisance models (i.e. linear outcome regression, logit for propensity scores) that admit standard regularity conditions. In an extension, the interaction-weighted estimator by Sun/Abraham (2021) builds on this work, showing how to obtain unbiased event-study estimates in this setting.
    • Rambachan/Roth (2023): The study suggests a sensitivity analysis for the parallel trends assumption of DiD analyses by asking “how large could a violation be before our conclusion changes?” They propose bounding the treatment effect estimates under various degrees of pre-trend differences.
    • Arkhangelsky et al. (2021): The study discusses synthetic difference-in-differences (SDID) methods, which try to generate parallel trends by reweighting units to match their pre-exposure trends.
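
As a concrete illustration of the cohort/time logic behind group-time effects, the sketch below computes ATT(g,t) in its simplest form: no covariates, never-treated units as the comparison group, and the last pre-treatment period g-1 as the baseline. It is a Python sketch of the idea only, not the toolbox's implementation. The function names and the toy data (common trend of +1 per period, a growing effect for cohort 2, a constant effect for cohort 3) are assumptions made for illustration.

```python
def make_toy_outcomes():
    """Build a small staggered-adoption panel with known treatment effects.

    cohort[u] is the first treated period of unit u, or None if never treated.
    Outcomes follow: unit level + common trend (+1 per period) + effect.
    """
    cohort = {"a": 2, "b": 2, "c": 3, "d": 3, "e": None, "f": None}
    level = {"a": 10.0, "b": 12.0, "c": 5.0, "d": 7.0, "e": 1.0, "f": 3.0}

    def effect(g, t):
        if g is None or t < g:
            return 0.0
        if g == 2:
            return 2.0 * (t - g + 1)   # effect grows with time since treatment
        return 1.0                     # constant effect for cohort 3

    outcomes = {
        u: {t: level[u] + 1.0 * t + effect(cohort[u], t) for t in range(1, 5)}
        for u in cohort
    }
    return outcomes, cohort


def att_gt(outcomes, cohort, g, t):
    """ATT(g,t) without covariates, using never-treated units as controls:

    mean(Y_t - Y_{g-1} | cohort g) - mean(Y_t - Y_{g-1} | never treated)
    """
    base = g - 1
    treated = [y[t] - y[base] for u, y in outcomes.items() if cohort[u] == g]
    control = [y[t] - y[base] for u, y in outcomes.items() if cohort[u] is None]
    return sum(treated) / len(treated) - sum(control) / len(control)


outcomes, cohort = make_toy_outcomes()
for g in (2, 3):
    for t in range(g, 5):
        print("ATT(g=%d, t=%d) = %.1f" % (g, t, att_gt(outcomes, cohort, g, t)))
```

Because each ATT(g,t) compares one cohort only with never-treated units, already-treated units never serve as controls. This avoids the "forbidden comparisons" that produce negative weights in a pooled TWFE regression.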