% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/att_gt.R
\name{att_gt}
\alias{att_gt}
\title{Group-Time Average Treatment Effects}
\usage{
att_gt(
  yname,
  tname,
  idname = NULL,
  gname,
  xformla = NULL,
  data,
  panel = TRUE,
  allow_unbalanced_panel = FALSE,
  control_group = c("nevertreated", "notyettreated"),
  anticipation = 0,
  weightsname = NULL,
  alp = 0.05,
  bstrap = TRUE,
  cband = TRUE,
  biters = 1000,
  clustervars = NULL,
  est_method = "dr",
  base_period = "varying",
  faster_mode = TRUE,
  print_details = FALSE,
  pl = FALSE,
  cores = 1
)
}
\arguments{
\item{yname}{The name of the outcome variable}

\item{tname}{The name of the column containing the time periods}

\item{idname}{The individual (cross-sectional unit) id name}

\item{gname}{The name of the variable in \code{data} that
contains the first period when a particular observation is treated.
This should be a positive number for all observations in treated groups.
It defines which "group" a unit belongs to.  It should be 0 for units
in the untreated group.}

\item{xformla}{A formula for the covariates to include in the
model.  It should be of the form \code{~ X1 + X2}.  Default
is NULL which is equivalent to \code{xformla=~1}.  This is
used to create a matrix of covariates which is then passed
to the 2x2 DID estimator chosen in \code{est_method}.

For time-varying covariates: (1) With balanced panel data,
in each 2x2 comparison, the covariates
are taken to be the value of the covariates in the earlier time
period, and all of the underlying computations involve the change in Y
as a function of those values of covariates.  (2) With repeated cross
sections data and unbalanced panel data, the covariates are taken
from each time period and computations involve Y_post conditional
on X_post minus Y_pre conditional on X_pre.  A byproduct of this
is that, with balanced panel data and in the presence of
time-varying covariates, it is possible to get different numerical
results depending on whether \code{allow_unbalanced_panel} is
\code{TRUE} or \code{FALSE}.}

\item{data}{The name of the data.frame that contains the data}

\item{panel}{Whether or not the data is a panel dataset.
The panel dataset should be provided in long format -- that
is, where each row corresponds to a unit observed at a
particular point in time.  The default is TRUE.  When
using a panel dataset, the variable \code{idname} must
be set.  When \code{panel=FALSE}, the data is treated
as repeated cross sections.}

\item{allow_unbalanced_panel}{Whether or not to allow an unbalanced
panel, i.e., to skip "balancing" the panel with respect to time and id.
The default value is \code{FALSE}, which means that \code{\link[=att_gt]{att_gt()}} will drop
all units where data is not observed in all periods.
The advantage of this is that the computations are faster
(sometimes substantially).}

\item{control_group}{Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment.  This group does not change across groups or
time periods.  The other option is to set
\code{control_group="notyettreated"}.  In this case, the control group
is set to the group of units that have not yet participated
in the treatment in that time period.  This includes all
never treated units, but it includes additional units that
eventually participate in the treatment, but have not
participated yet.}

\item{anticipation}{The number of time periods before participating
in the treatment in which units can anticipate participating in the
treatment, so that it can affect their untreated potential outcomes.
The default is 0.}

\item{weightsname}{The name of the column containing the sampling weights.
If not set, all observations have the same weight.}

\item{alp}{The significance level.  The default is 0.05.}

\item{bstrap}{Boolean for whether or not to compute standard errors using
the multiplier bootstrap.  If standard errors are clustered, then one
must set \code{bstrap=TRUE}.  Default is \code{TRUE} (in addition, \code{cband}
is also \code{TRUE} by default, indicating that uniform confidence bands
will be returned).  If \code{bstrap} is \code{FALSE}, then analytical
standard errors are reported.}

\item{cband}{Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability \code{1-alp}.  In order to compute uniform confidence
bands, \code{bstrap} must also be set to \code{TRUE}.  The default is
\code{TRUE}.}

\item{biters}{The number of bootstrap iterations to use.  The default is 1000,
and this is only applicable if \code{bstrap=TRUE}.}

\item{clustervars}{A vector of variable names to cluster on.  At most, there
can be two variables (otherwise an error is thrown) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when \code{bstrap=TRUE}).}

\item{est_method}{The method to compute group-time average treatment
effects.  The default is "dr" which uses the doubly robust
approach in the \code{DRDID} package.  Other built-in methods
include "ipw" for inverse probability weighting and "reg" for
first step regression estimators.  The user can also pass their
own function for estimating group time average treatment
effects.  This should be a function
\code{f(Y1,Y0,treat,covariates)} where \code{Y1} is an
\code{n} x \code{1} vector of outcomes in the post-treatment
period, \code{Y0} is an \code{n} x \code{1} vector of
pre-treatment outcomes, \code{treat} is a vector indicating
whether or not an individual participates in the treatment,
and \code{covariates} is an \code{n} x \code{k} matrix of
covariates.  The function should return a list that includes
\code{ATT} (an estimated average treatment effect), and
\code{inf.func} (an \code{n} x \code{1} influence function).
The function can return other things as well, but these are
the only two that are required. \code{est_method} is only used
if covariates are included.}

\item{base_period}{Whether to use a "varying" base period or a
"universal" base period.  Either choice results in the same
post-treatment estimates of ATT(g,t)'s.  In pre-treatment
periods, using a varying base period amounts to computing a
pseudo-ATT in each treatment period by comparing the change
in outcomes for a particular group relative to its comparison
group in the pre-treatment periods (i.e., in pre-treatment
periods this setting computes changes from period t-1 to period
t, but repeatedly changes the value of t).

A universal base period fixes the base period to always be
(g-anticipation-1).  This does not compute
pseudo-ATT(g,t)'s in pre-treatment periods, but rather
reports average changes in outcomes from period t to
(g-anticipation-1) for a particular group relative to its comparison
group.  This is analogous to what is often reported in event
study regressions.

Using a varying base period results in an estimate of
ATT(g,t) being reported in the period immediately before
treatment.  Using a universal base period normalizes the
estimate in the period right before treatment (or earlier when
the user allows for anticipation) to be equal to 0, but reports one
extra estimate in an earlier period.}

\item{faster_mode}{This option enables a faster version of \code{did}, optimizing
computation time for large datasets by improving data management within the package.
The default is set to \code{TRUE}. While the difference is minimal for small datasets,
it is recommended for use with large datasets.}

\item{print_details}{Whether or not to show details/progress of computations.
Default is \code{FALSE}.}

\item{pl}{Whether or not to use parallel processing.  Default is \code{FALSE}.}

\item{cores}{The number of cores to use for parallel processing.  Default is 1.}
}
\value{
An \code{\link{MP}} object containing all the results for group-time average
treatment effects.
}
\description{
\code{att_gt} computes group-time average treatment effects in DID
setups where there are more than two periods of data, treatment can
occur at different points in time, and treatment effects can be
heterogeneous and dynamic.
See Callaway and Sant'Anna (2021) for a detailed description.
}
\section{Examples:}{
\strong{Basic \code{\link[=att_gt]{att_gt()}} call:}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Example data
data(mpdta)
set.seed(09152024)
out1 <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=NULL,
               data=mpdta)
summary(out1)
#> 
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal", 
#>     gname = "first.treat", xformla = NULL, data = mpdta)
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95\% Simult.  Conf. Band]  
#>   2004 2004  -0.0105     0.0258       -0.0809      0.0599  
#>   2004 2005  -0.0704     0.0341       -0.1635      0.0227  
#>   2004 2006  -0.1373     0.0384       -0.2423     -0.0322 *
#>   2004 2007  -0.1008     0.0354       -0.1976     -0.0040 *
#>   2006 2004   0.0065     0.0235       -0.0578      0.0708  
#>   2006 2005  -0.0028     0.0192       -0.0554      0.0499  
#>   2006 2006  -0.0046     0.0184       -0.0548      0.0456  
#>   2006 2007  -0.0412     0.0207       -0.0977      0.0153  
#>   2007 2004   0.0305     0.0161       -0.0135      0.0746  
#>   2007 2005  -0.0027     0.0157       -0.0456      0.0401  
#>   2007 2006  -0.0311     0.0184       -0.0815      0.0193  
#>   2007 2007  -0.0261     0.0176       -0.0741      0.0220  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.16812
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
}\if{html}{\out{</div>}}
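
\strong{Analytical standard errors (illustrative sketch):}

The following is a minimal sketch, assuming the same \code{mpdta} data as the basic call.  It only changes the documented \code{bstrap} and \code{cband} arguments so that analytical standard errors are reported instead of bootstrap-based uniform confidence bands.  The object name \code{out1_analytical} is chosen here for illustration, and output is omitted.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Sketch only: same specification as out1, but without the multiplier bootstrap
library(did)
data(mpdta)
out1_analytical <- att_gt(yname="lemp",
                          tname="year",
                          idname="countyreal",
                          gname="first.treat",
                          xformla=NULL,
                          bstrap=FALSE,  # report analytical standard errors
                          cband=FALSE,   # uniform bands require bstrap=TRUE
                          data=mpdta)
summary(out1_analytical)
}\if{html}{\out{</div>}}

Because uniform confidence bands require \code{bstrap=TRUE}, \code{cband} is switched off explicitly here rather than left at its default.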

\strong{Using covariates:}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{out2 <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=~lpop,
               data=mpdta)
summary(out2)
#> 
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal", 
#>     gname = "first.treat", xformla = ~lpop, data = mpdta)
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95\% Simult.  Conf. Band]  
#>   2004 2004  -0.0145     0.0222       -0.0737      0.0446  
#>   2004 2005  -0.0764     0.0303       -0.1570      0.0041  
#>   2004 2006  -0.1404     0.0382       -0.2421     -0.0388 *
#>   2004 2007  -0.1069     0.0358       -0.2021     -0.0117 *
#>   2006 2004  -0.0005     0.0231       -0.0618      0.0609  
#>   2006 2005  -0.0062     0.0188       -0.0561      0.0437  
#>   2006 2006   0.0010     0.0204       -0.0534      0.0553  
#>   2006 2007  -0.0413     0.0210       -0.0971      0.0145  
#>   2007 2004   0.0267     0.0140       -0.0104      0.0639  
#>   2007 2005  -0.0046     0.0170       -0.0498      0.0407  
#>   2007 2006  -0.0284     0.0187       -0.0782      0.0213  
#>   2007 2007  -0.0288     0.0161       -0.0715      0.0140  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.23267
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
}\if{html}{\out{</div>}}

\strong{Specify comparison units:}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{out3 <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=~lpop,
               control_group = "notyettreated",
               data=mpdta)
summary(out3)
#> 
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal", 
#>     gname = "first.treat", xformla = ~lpop, data = mpdta, control_group = "notyettreated")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95\% Simult.  Conf. Band]  
#>   2004 2004  -0.0212     0.0217       -0.0788      0.0365  
#>   2004 2005  -0.0816     0.0324       -0.1676      0.0044  
#>   2004 2006  -0.1382     0.0368       -0.2361     -0.0403 *
#>   2004 2007  -0.1069     0.0344       -0.1984     -0.0154 *
#>   2006 2004  -0.0075     0.0233       -0.0693      0.0544  
#>   2006 2005  -0.0046     0.0184       -0.0533      0.0442  
#>   2006 2006   0.0087     0.0182       -0.0397      0.0570  
#>   2006 2007  -0.0413     0.0205       -0.0956      0.0130  
#>   2007 2004   0.0269     0.0136       -0.0091      0.0630  
#>   2007 2005  -0.0042     0.0153       -0.0448      0.0364  
#>   2007 2006  -0.0284     0.0191       -0.0792      0.0223  
#>   2007 2007  -0.0288     0.0167       -0.0732      0.0157  
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.23326
#> Control Group:  Not Yet Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
}\if{html}{\out{</div>}}
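
\strong{Universal base period (illustrative sketch):}

The following is a minimal sketch, assuming the same \code{mpdta} data as the examples above.  It only changes the documented \code{base_period} argument, so that pre-treatment estimates are reported relative to the period (g-anticipation-1) rather than relative to the preceding period.  The object name \code{out4} is chosen here for illustration, and output is omitted because the reported bands depend on the bootstrap draws.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Sketch only: same specification as out2, but with a universal base period
library(did)
data(mpdta)
set.seed(09152024)
out4 <- att_gt(yname="lemp",
               tname="year",
               idname="countyreal",
               gname="first.treat",
               xformla=~lpop,
               base_period="universal",  # fix the base period at g-anticipation-1
               data=mpdta)
summary(out4)
}\if{html}{\out{</div>}}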
}

\references{
Callaway, Brantly and Pedro H.C. Sant'Anna.  \"Difference-in-Differences with Multiple Time Periods.\" Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. \doi{10.1016/j.jeconom.2020.12.001}, \url{https://arxiv.org/abs/1803.09015}
}
