% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/prepare.R
\name{brs_prep}
\alias{brs_prep}
\title{Pre-process analyst data for beta interval regression}
\usage{
brs_prep(
  data,
  y = "y",
  delta = "delta",
  left = "left",
  right = "right",
  ncuts = 100L,
  lim = 0.5
)
}
\arguments{
\item{data}{Data frame containing the response variable and
covariates.}

\item{y}{Character: name of the score column (default \code{"y"}).}

\item{delta}{Character: name of the censoring indicator column
(default \code{"delta"}). Values must be in \code{{0, 1, 2, 3}}.}

\item{left}{Character: name of the left-endpoint column
(default \code{"left"}).}

\item{right}{Character: name of the right-endpoint column
(default \code{"right"}).}

\item{ncuts}{Integer: number of scale categories (default 100).}

\item{lim}{Numeric: half-width of the uncertainty region
(default 0.5). Used only when endpoints are constructed from \code{y}
(no analyst-supplied \code{left}/\code{right}).}
}
\value{
A \code{data.frame} with the following columns appended or
  replaced:
  \describe{
    \item{\code{left}}{Lower endpoint on \eqn{(0, 1)}.}
    \item{\code{right}}{Upper endpoint on \eqn{(0, 1)}.}
    \item{\code{yt}}{Midpoint approximation on \eqn{(0, 1)}.}
    \item{\code{y}}{Original scale value (preserved for reference).}
    \item{\code{delta}}{Censoring indicator: 0 = exact, 1 = left,
      2 = right, 3 = interval.}
  }
  Covariate columns are preserved.
  The output carries attributes \code{"is_prepared"} (\code{TRUE}),
  \code{"ncuts"} and \code{"lim"} so that
  \code{\link{brs}} can detect prepared data and skip the
  internal \code{\link{brs_check}} call.
}
\description{
Validates and transforms raw data into the format required by
\code{\link{brs}}.
The analyst can supply data in one of four ways:

\enumerate{
  \item \strong{Minimal (Mode 1)}: only the score \code{y}.
    Censoring is inferred automatically:
    \eqn{y = 0 \to \delta = 1}, \eqn{y = K \to \delta = 2},
    \eqn{0 < y < K \to \delta = 3},
    \eqn{y \in (0, 1) \to \delta = 0}.
  \item \strong{Classic (Mode 2)}: \code{y} + explicit
    \code{delta}. The analyst declares the censoring type;
    interval endpoints are computed using the actual \code{y}
    value.
  \item \strong{Interval (Mode 3)}: \code{left} and/or
    \code{right} columns (on the original scale). Censoring is
    inferred from the NA pattern.
  \item \strong{Full (Mode 4)}: \code{y}, \code{left}, and
    \code{right} together. The analyst's own endpoints are
    rescaled directly to \eqn{(0, 1)}.
}

All covariate columns are preserved unchanged in the output.
}
\details{
\strong{Priority rule}: if \code{delta} is provided (non-\code{NA}),
it takes precedence over all automatic classification rules.
When \code{delta} is \code{NA}, the function infers the censoring type
from the pattern of \code{left}, \code{right}, and \code{y}:

\tabular{llllll}{
  \code{left} \tab \code{right} \tab \code{y} \tab \code{delta}
  \tab Interpretation \tab Inferred \eqn{\delta} \cr
  \code{NA}   \tab  5  \tab \code{NA} \tab \code{NA}
  \tab Left-censored (below 5) \tab 1 \cr
  20          \tab \code{NA} \tab \code{NA} \tab \code{NA}
  \tab Right-censored (above 20) \tab 2 \cr
  30          \tab 45  \tab \code{NA} \tab \code{NA}
  \tab Interval-censored [30, 45] \tab 3 \cr
  \code{NA}   \tab \code{NA} \tab 50 \tab \code{NA}
  \tab Exact observation \tab 0 \cr
  \code{NA}   \tab \code{NA} \tab 50 \tab 3
  \tab Analyst says interval \tab 3 \cr
  \code{NA}   \tab \code{NA} \tab 0  \tab 1
  \tab Analyst says left-censored \tab 1 \cr
  \code{NA}   \tab \code{NA} \tab 99 \tab 2
  \tab Analyst says right-censored \tab 2 \cr
}

When \code{y}, \code{left}, and \code{right} are all present for the
same observation, the analyst's \code{left}/\code{right} values are
used directly (rescaled by \eqn{K =} \code{ncuts}) and \code{delta}
is set to 3 (interval-censored) unless the analyst supplied
\code{delta} explicitly.
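
For instance, assuming that rescaling means division by \eqn{K}, an
observation with \code{left = 48} and \code{right = 52} under
\eqn{K = 100} would give
\deqn{l_i = 48 / 100 = 0.48, \qquad u_i = 52 / 100 = 0.52.}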

\strong{Endpoint formulas for Mode 2 (y + explicit delta)}:

When the analyst supplies \code{delta} explicitly, the endpoint
computation uses the actual \code{y} value to produce
observation-specific bounds.  This is the same logic used by
\code{\link{brs_check}} with a user-supplied \code{delta}
vector:

\tabular{llll}{
  \eqn{\delta} \tab Condition \tab \eqn{l_i} (left)
    \tab \eqn{u_i} (right) \cr
  0 \tab (any) \tab \eqn{y / K} \tab \eqn{y / K} \cr
  1 \tab \eqn{y = 0} \tab \eqn{\epsilon}
    \tab \eqn{\mathrm{lim} / K} \cr
  1 \tab \eqn{y \neq 0} \tab \eqn{\epsilon}
    \tab \eqn{(y + \mathrm{lim}) / K} \cr
  2 \tab \eqn{y = K} \tab \eqn{(K - \mathrm{lim}) / K}
    \tab \eqn{1 - \epsilon} \cr
  2 \tab \eqn{y \neq K} \tab \eqn{(y - \mathrm{lim}) / K}
    \tab \eqn{1 - \epsilon} \cr
  3 \tab type \code{"m"} \tab \eqn{(y - \mathrm{lim}) / K}
    \tab \eqn{(y + \mathrm{lim}) / K} \cr
}
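
As a worked instance of the table above, take \eqn{K = 100} and
\eqn{\mathrm{lim} = 0.5} (the defaults). An interior score
\eqn{y = 50} declared interval-censored (\eqn{\delta = 3}) gives
\deqn{l_i = (50 - 0.5) / 100 = 0.495, \qquad u_i = (50 + 0.5) / 100 = 0.505,}
while \eqn{y = 0} declared left-censored (\eqn{\delta = 1}) gives
\deqn{l_i = \epsilon, \qquad u_i = 0.5 / 100 = 0.005.}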

\strong{Consistency warnings}: when the analyst supplies a \code{delta}
value that is unusual for the given \code{y} (e.g.,
\eqn{\delta = 1} with \eqn{y \neq 0}), the function emits a warning
and proceeds.  This is by design: in Monte Carlo workflows, forcing
\code{delta} on non-boundary observations is intentional.

All endpoints are clamped to \eqn{[\epsilon, 1 - \epsilon]} with
\eqn{\epsilon = 10^{-5}}.
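
Written out, with \eqn{p} standing for any computed endpoint (a
restatement of the sentence above, not a quote of the implementation):
\deqn{p \leftarrow \min(\max(p, \epsilon), 1 - \epsilon).}
For example, an exact observation with \eqn{y = 0} has
\eqn{y / K = 0}, which is raised to \eqn{\epsilon = 10^{-5}}.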
}
\examples{
# --- Mode 1: y only (automatic classification, like brs_check) ---
set.seed(1) # make the simulated covariate reproducible
d1 <- data.frame(y = c(0, 3, 5, 7, 10), x1 = rnorm(5))
brs_prep(d1, ncuts = 10)

# --- Mode 2: y + explicit delta ---
d2 <- data.frame(
  y = d1$y,
  delta = c(0, 3, 3, 3, 0), # force interval censoring for y = 3, 5, 7
  x1 = d1$x1
)
brs_prep(d2, ncuts = 100)
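
# Hand-computed sketch of the Mode 2 endpoint formulas from Details, in
# base R, for comparison with the output above. Assumptions: eps = 1e-5 and
# the clamping step are taken from the documentation; `clamp01`, `y2`,
# `delta2` and `manual2` are illustrative names, not part of the package.
K <- 100
lim <- 0.5
eps <- 1e-5
y2 <- c(0, 3, 5, 7, 10)      # same scores as d2$y
delta2 <- c(0, 3, 3, 3, 0)   # same indicators as d2$delta
clamp01 <- function(p) pmin(pmax(p, eps), 1 - eps)
manual2 <- data.frame(
  # delta 0: y / K; delta 1: eps; delta 2 and 3: (y - lim) / K
  left = clamp01(
    ifelse(delta2 == 0, y2 / K, ifelse(delta2 == 1, eps, (y2 - lim) / K))
  ),
  # delta 0: y / K; delta 2: 1 - eps; delta 1 and 3: (y + lim) / K
  right = clamp01(
    ifelse(delta2 == 0, y2 / K, ifelse(delta2 == 2, 1 - eps, (y2 + lim) / K))
  )
)
manual2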

# --- Mode 3: left/right with NA patterns ---
d3 <- data.frame(
  left = c(NA, 20, 30, NA),
  right = c(5, NA, 45, NA),
  y = c(NA, NA, NA, 50),
  x1 = d1$x1[1:4]
)
brs_prep(d3, ncuts = 100)
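
# Sketch of the NA-pattern rule from the Details table, in base R.
# `infer_delta` is a hypothetical helper for illustration only; it mirrors
# the documented table, not necessarily the package internals.
infer_delta <- function(left, right, y) {
  ifelse(!is.na(left) & !is.na(right), 3L,        # both endpoints: interval
    ifelse(is.na(left) & !is.na(right), 1L,       # only right: left-censored
      ifelse(!is.na(left) & is.na(right), 2L,     # only left: right-censored
        ifelse(!is.na(y), 0L, NA_integer_))))     # only y: exact
}
delta3 <- infer_delta(
  left = c(NA, 20, 30, NA),
  right = c(5, NA, 45, NA),
  y = c(NA, NA, NA, 50)
)
delta3 # 1 2 3 0, matching the first four rows of the table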

# --- Mode 4: y + left + right (analyst-supplied intervals) ---
d4 <- data.frame(
  y = c(50, 75),
  left = c(48, 73),
  right = c(52, 77),
  x1 = rnorm(2)
)
brs_prep(d4, ncuts = 100)

# --- Fitting after prep ---
\donttest{
dat5 <- data.frame(
  y = c(
    0, 5, 20, 50, 75, 90, 100, 30, 60, 45,
    10, 40, 55, 70, 85, 25, 35, 65, 80, 15
  ),
  x1 = rep(c(1, 2), 10)
)
prep5 <- brs_prep(dat5, ncuts = 100)
fit5 <- brs(y ~ x1, data = prep5)
summary(fit5)
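
# The attributes described under Value can be inspected with base R's
# attr(); per the documentation, "is_prepared" should be TRUE and the
# other two should echo the ncuts and lim arguments.
attr(prep5, "is_prepared")
attr(prep5, "ncuts")
attr(prep5, "lim")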
}

}
\references{
Lopes, J. E. (2023). \emph{Modelos de regressao beta para dados de escala}.
Master's dissertation, Universidade Federal do Parana, Curitiba.
URI: \url{https://hdl.handle.net/1884/86624}.

Hawker, G. A., Mian, S., Kendzerska, T., and French, M. (2011).
Measures of adult pain: Visual Analog Scale for Pain (VAS Pain),
Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ),
Short-Form McGill Pain Questionnaire (SF-MPQ), Chronic Pain Grade Scale
(CPGS), Short Form-36 Bodily Pain Scale (SF-36 BPS), and Measure of
Intermittent and Constant Osteoarthritis Pain (ICOAP).
\emph{Arthritis Care and Research}, 63(S11), S240-S252.
\doi{10.1002/acr.20543}

Hjermstad, M. J., Fayers, P. M., Haugen, D. F., et al. (2011).
Studies comparing Numerical Rating Scales, Verbal Rating Scales, and
Visual Analogue Scales for assessment of pain intensity in adults:
a systematic literature review.
\emph{Journal of Pain and Symptom Management}, 41(6), 1073-1093.
\doi{10.1016/j.jpainsymman.2010.08.016}
}
\seealso{
\code{\link{brs_check}} for the automatic
  classification of raw scale scores;
  \code{\link{brs}} for fitting the model.
}
