% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/csregion.R
\name{csregion}
\alias{csregion}
\title{Filter the data based on common support region}
\usage{
csregion(gps_matrix, borders = "include", refit = TRUE)
}
\arguments{
\item{gps_matrix}{An object of classes \code{gps} and \code{data.frame} (e.g., created
by the \code{estimate_gps()} function). The first column corresponds to the
treatment or grouping variable, while the other columns represent the
treatment assignment probabilities calculated separately for each
hypothetical treatment group. The number of columns should therefore be
equal to the number of unique levels of the treatment variable plus one
(for the treatment variable itself). The number of rows should correspond
to the number of subjects for which generalized propensity scores were
estimated.}

\item{borders}{A character string specifying how to handle observations at
the edges of the Common Support Region (CSR). Acceptable values are
\code{"include"} and \code{"exclude"}. If \code{"include"} is selected (default),
observations with Generalized Propensity Scores (GPS) exactly equal to the
CSR boundaries are retained for further analysis. This corresponds to a
non-strict inequality: \code{lower_bound <= GPS <= upper_bound}. If
\code{"exclude"} is selected, observations lying exactly on the CSR boundaries
are removed. This corresponds to a strict inequality: \code{lower_bound <
  GPS < upper_bound}. Using \code{"exclude"} will typically result in a slightly
smaller matched sample size compared to \code{"include"}, but may be preferred
for more conservative matching.}

\item{refit}{Logical. If \code{TRUE} (default), the model used to estimate the GPS
is refitted after excluding samples outside the common support region,
using the same formula and method as in the original \code{estimate_gps()} call.
If \code{FALSE}, the model is not refitted, but still only samples within the
CSR are retained. Refitting is recommended, as suggested by Lopez and
Gutman (2017).}
}
\value{
A numeric matrix similar to the one returned by \code{estimate_gps()},
but with the number of rows reduced to exclude those observations that do
not fit within the common support region (CSR) boundaries. The returned
object also possesses additional attributes that summarize the calculation
process of the CSR boundaries:
\itemize{
\item \code{filter_matrix} - A logical matrix with the same dimensions as the
gps-part of \code{gps_matrix}, indicating which treatment assignment
probabilities fall within the CSR boundaries,
\item \code{filter_vector} - A vector indicating whether each observation was kept
(\code{TRUE}) or removed (\code{FALSE}), essentially a row-wise
sum of \code{filter_matrix},
\item \code{csr_summary} - A summary of the CSR calculation process, including
details of the boundaries and the number of observations filtered,
\item \code{csr_data} - The original dataset used for the estimation of generalized
propensity scores (\code{original_data} attribute of the \code{gps} object) filtered
by the \code{filter_vector}.
}
}
\description{
The \code{csregion()} function estimates the boundaries of the
rectangular common support region, as defined by Lopez and Gutman (2017),
and filters the matrix of generalized propensity scores based on these
boundaries. The function returns a matrix of observations whose generalized
propensity scores lie within the treatment group-specific boundaries.
}
\examples{
# We can estimate simple generalized propensity scores for the `iris`
# dataset
gps <- estimate_gps(Species ~ Sepal.Length, data = iris)

# And then define the common support region boundaries using `csregion()`
gps_csr <- csregion(gps)
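
# A stricter variant, using only the arguments shown in the usage section:
# drop observations lying exactly on the CSR boundaries and skip refitting.
# (`gps_csr_strict` is just an illustrative name.)
gps_csr_strict <- csregion(gps, borders = "exclude", refit = FALSE)

# Going by the description of `borders`, the strict variant should retain
# no more rows than the default call above
nrow(gps_csr_strict) <= nrow(gps_csr)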

# Additional information about the CSR calculation process is
# accessible through the attributes described in the `*Value*` section
attr(gps_csr, "filter_matrix")
attr(gps_csr, "csr_summary")
attr(gps_csr, "csr_data")
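
# A rough base-R sketch of the rectangular CSR idea from Lopez and Gutman
# (2017), run on hypothetical toy data. It illustrates the concept only and
# is not necessarily how `csregion()` computes the boundaries internally;
# all `toy_*` names and the random scores below are assumptions.
set.seed(1)
toy_treat <- factor(rep(c("a", "b", "c"), each = 20))
toy_raw <- matrix(runif(60 * 3), ncol = 3,
                  dimnames = list(NULL, levels(toy_treat)))
toy_gps <- toy_raw / rowSums(toy_raw)  # each row sums to 1, like a GPS row

# For each GPS column: the lower bound is the largest group-wise minimum,
# the upper bound is the smallest group-wise maximum
toy_lower <- apply(toy_gps, 2, function(p) max(tapply(p, toy_treat, min)))
toy_upper <- apply(toy_gps, 2, function(p) min(tapply(p, toy_treat, max)))

# Non-strict inequalities, analogous to borders = "include"
toy_inside <- sweep(toy_gps, 2, toy_lower, ">=") &
  sweep(toy_gps, 2, toy_upper, "<=")

# An observation is kept only if all of its scores lie within the bounds
toy_keep <- rowSums(toy_inside) == ncol(toy_inside)
table(toy_keep)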

}
