% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ard_categorical.R
\name{ard_categorical}
\alias{ard_categorical}
\alias{ard_categorical.data.frame}
\title{Categorical ARD Statistics}
\usage{
ard_categorical(data, ...)

\method{ard_categorical}{data.frame}(
  data,
  variables,
  by = dplyr::group_vars(data),
  strata = NULL,
  statistic = everything() ~ c("n", "p", "N"),
  denominator = "column",
  fmt_fun = NULL,
  stat_label = everything() ~ default_stat_labels(),
  fmt_fn = deprecated(),
  ...
)
}
\arguments{
\item{data}{(\code{data.frame})\cr
a data frame}

\item{...}{Arguments passed to methods.}

\item{variables}{(\code{\link[dplyr:dplyr_tidy_select]{tidy-select}})\cr
columns to include in summaries. Default is \code{everything()}.}

\item{by, strata}{(\code{\link[dplyr:dplyr_tidy_select]{tidy-select}})\cr
columns to use for grouping or stratifying the table output.
Arguments are similar, but with an important distinction:

\code{by}: results are tabulated by \strong{all combinations} of the columns specified,
including unobserved combinations and unobserved factor levels.

\code{strata}: results are tabulated by \strong{all \emph{observed} combinations} of the
columns specified.

Arguments may be used in conjunction with one another.}

\item{statistic}{(\code{\link[=syntax]{formula-list-selector}})\cr
a named list, a list of formulas,
or a single formula where the list element is one or more of \code{c("n", "N", "p", "n_cum", "p_cum")}
(on the RHS of a formula).}

\item{denominator}{(\code{string}, \code{data.frame}, \code{integer})\cr
Specify this argument to change the denominator used to calculate percentages,
i.e. the \code{"N"} statistic. Default is \code{"column"}. See below for details.}

\item{fmt_fun}{(\code{\link[=syntax]{formula-list-selector}})\cr
a named list, a list of formulas,
or a single formula where the list element is a named list of functions
(or the RHS of a formula),
e.g. \verb{list(mpg = list(mean = \\(x) round(x, digits = 2) |> as.character()))}.}

\item{stat_label}{(\code{\link[=syntax]{formula-list-selector}})\cr
a named list, a list of formulas, or a single formula where
the list element is either a named list or a list of formulas defining the
statistic labels, e.g. \code{everything() ~ list(n = "n", p = "pct")} or
\code{everything() ~ list(n ~ "n", p ~ "pct")}.}

\item{fmt_fn}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} Use \code{fmt_fun} instead.}
}
\value{
an ARD data frame of class 'card'
}
\description{
Compute Analysis Results Data (ARD) for categorical summary statistics.
}
\section{Denominators}{

By default, the \code{ard_categorical()} function returns the statistics \code{"n"}, \code{"N"}, and
\code{"p"}, where \code{"n"} is the count for each variable level and \code{"N"} is
the number of non-missing observations. The proportion is calculated as
\code{p = n/N} unless a different denominator is supplied.

However, it is sometimes necessary to provide a different \code{"N"} to use
as the denominator in this calculation. For example, in a calculation
of the rates of various observed adverse events, you may need to update the
denominator to the number of enrolled subjects.

In such cases, use the \code{denominator} argument to specify a new definition
of \code{"N"}, and consequently of \code{"p"}.
The argument expects one of the following inputs:
\itemize{
\item a string: one of \code{"column"}, \code{"row"}, or \code{"cell"}.
\itemize{
\item \code{"column"}, the default, returns percentages where the sum is equal to
one within the variable after the data frame has been subset with \code{by}/\code{strata}.
\item \code{"row"} gives 'row' percentages where \code{by}/\code{strata} columns are the 'top'
of a cross table, and the variables are the rows. This is well-defined
for a single \code{by} or \code{strata} variable; with more than one, take care
to ensure the results are as you expect.
\item \code{"cell"} gives percentages where the denominator is the number of non-missing
rows in the source data frame.
}
\item a data frame. Any columns in the data frame that overlap with the \code{by}/\code{strata}
columns will be used to calculate the new \code{"N"}.
\item an integer. This single integer will be used as the new \code{"N"}.
\item a structured data frame. The data frame will include columns from \code{by}/\code{strata}.
The last column must be named \code{"...ard_N..."}. The integers in this column will
be used as the updated \code{"N"} in the calculations.
}
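
As a minimal sketch of the inputs listed above, the calls below use the \code{ADSL} data
from the examples. The value \code{300L} and the every-other-row subset are arbitrary,
illustrative choices and not a recommended analysis.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# 'row' percentages: proportions sum to one across ARM within each AGEGR1 level
ard_categorical(ADSL, by = "ARM", variables = "AGEGR1", denominator = "row")

# 'cell' percentages: the denominator is the number of non-missing rows
ard_categorical(ADSL, by = "ARM", variables = "AGEGR1", denominator = "cell")

# an integer: the same value is used as "N" in every calculation
ard_categorical(ADSL, variables = "AGEGR1", denominator = 300L)

# a data frame: "N" is the number of rows of ADSL within each ARM,
# while "n" is counted in the (illustrative) subset passed as `data`
adsl_subset <- ADSL[seq(1, nrow(ADSL), by = 2), ]
ard_categorical(adsl_subset, by = "ARM", variables = "AGEGR1", denominator = ADSL)
}\if{html}{\out{</div>}}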

Lastly, the \code{p} statistic is returned as a proportion bounded by \verb{[0, 1]}.
However, the default formatting function scales the proportion by 100, so the
formatted value is a percentage, which matches the default statistic label of \code{'\%'}.
To get the formatted values, pass the ARD to \code{apply_fmt_fun()}.
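
For instance, a minimal sketch of formatting the default statistics, assuming the
\code{ADSL} data from the examples:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# apply the default formatting functions to the ARD;
# the proportion "p" is scaled by 100 when it is formatted
ard_categorical(ADSL, variables = "AGEGR1") |>
  apply_fmt_fun()
}\if{html}{\out{</div>}}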
}

\section{Other Statistics}{

In some cases, you may need other kinds of statistics for categorical variables.
Despite the name, \code{ard_continuous()} can be used to obtain these statistics.

In the example below, we calculate the mode of a categorical variable.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{get_mode <- function(x) \{
  table(x) |> sort(decreasing = TRUE) |> names() |> getElement(1L)
\}

ADSL |>
  ard_continuous(
    variables = AGEGR1,
    statistic = list(AGEGR1 = list(mode = get_mode))
  )
#> \{cards\} data frame: 1 x 8
#>   variable   context stat_name stat_label  stat fmt_fun
#> 1   AGEGR1 continuo…      mode       mode 65-80    <fn>
#> i 2 more variables: warning, error
}\if{html}{\out{</div>}}
}

\examples{
ard_categorical(ADSL, by = "ARM", variables = "AGEGR1")

ADSL |>
  dplyr::group_by(ARM) |>
  ard_categorical(
    variables = "AGEGR1",
    statistic = everything() ~ "n"
  )
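
# A minimal sketch of the `strata` argument described above: unlike `by`,
# which tabulates all combinations (including unobserved ones),
# `strata` tabulates only the observed combinations
ard_categorical(ADSL, strata = "ARM", variables = "AGEGR1")

# A sketch of requesting the cumulative count and proportion
# alongside "n" and "p", using the statistic names listed in the arguments
ard_categorical(
  ADSL,
  variables = "AGEGR1",
  statistic = everything() ~ c("n", "p", "n_cum", "p_cum")
)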
}
