% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/temporal_forest.R
\name{temporal_forest}
\alias{temporal_forest}
\title{Temporal Forest for Longitudinal Feature Selection}
\usage{
temporal_forest(
  X = NULL,
  Y,
  id,
  time,
  dissimilarity_matrix = NULL,
  n_features_to_select = 10,
  min_module_size = 4,
  n_boot_screen = 50,
  keep_fraction_screen = 0.25,
  n_boot_select = 100,
  alpha_screen = 0.2,
  alpha_select = 0.05
)
}
\arguments{
\item{X}{A list of numeric matrices, one for each time point. The rows of each
matrix should be subjects and columns should be predictors. Required unless
\code{dissimilarity_matrix} is provided.}

\item{Y}{A numeric vector holding the longitudinal outcome, one value per observation.}

\item{id}{A vector of subject identifiers, the same length as \code{Y}.}

\item{time}{A vector of time point indicators, the same length as \code{Y}.}

\item{dissimilarity_matrix}{An optional pre-computed dissimilarity matrix (e.g., \code{1 - TOM},
where TOM is the topological overlap matrix). If provided, the network construction step
(Stage 1) is skipped. The matrix must be square, with predictor names as both rownames and
colnames. Defaults to \code{NULL}.}

\item{n_features_to_select}{The number of top features to return in the final selection.
This is passed to the \code{number_selected_final} argument of the internal function.
Defaults to 10.}

\item{min_module_size}{The minimum number of features in a module. Passed to the
\code{minClusterSize} argument of the internal function. Defaults to 4.}

\item{n_boot_screen}{The number of bootstrap repetitions for the initial screening
stage within modules. Defaults to 50.}

\item{keep_fraction_screen}{The proportion of features to keep from each module
during the screening stage. Defaults to 0.25.}

\item{n_boot_select}{The number of bootstrap repetitions for the final stability
selection stage. Defaults to 100.}

\item{alpha_screen}{The significance level for splitting in the screening stage trees.
Defaults to 0.2.}

\item{alpha_select}{The significance level for splitting in the selection stage trees.
Defaults to 0.05.}
}
\value{
An object of class \code{TemporalForest} with:
\itemize{
\item \code{top_features} (\strong{character}): the K selected features
(K = \code{n_features_to_select}), in descending order of bootstrap selection frequency.
\item \code{candidate_features} (\strong{character}): all features that
entered the final (second-stage) selection.
}
}
\description{
The main user-facing function for the \code{TemporalForest} package. It performs the
complete three-stage algorithm to select a top set of features from
high-dimensional longitudinal data.
}
\details{
The function executes a three-stage process:
\enumerate{
\item \strong{Time-Aware Module Construction:} Builds a consensus network across time points to identify modules of features that remain correlated over time.
\item \strong{Within-Module Screening:} Uses bootstrapped mixed-effects model trees (\code{glmertree}) to screen for important predictors within each module.
\item \strong{Stability Selection:} Performs a final stability selection step on the surviving features to yield a reproducible final set.
}

\strong{Unbalanced Panels:} The algorithm tolerates unbalanced panel data (i.e., subjects with missing time points). The consensus TOM is built from the time points that are available, and the mixed-effects models accommodate subjects observed at different numbers of time points.

\strong{Outcome Family:} The current version is designed for \strong{Gaussian (continuous) outcomes}, as it relies on \code{glmertree::lmertree}. Support for other outcome families is not yet implemented.

\strong{Reproducibility (Determinism):} For reproducible results, call \code{set.seed()} before running the function. The algorithm has both stochastic and deterministic components:
\itemize{
\item \strong{Stochastic} (depends on \code{set.seed()}): The bootstrap resampling of subjects in both the screening and selection stages.
\item \strong{Deterministic} (does not depend on \code{set.seed()}): The network construction process (correlation, adjacency, and TOM calculation).
}
}
\note{
The current API does not expose selection probabilities, module labels,
or a parameter snapshot; these may be added in a future version.
}
\section{Input contract}{

\itemize{
\item \strong{X}: list of numeric matrices, one per time point; \strong{columns (names and
order) must be identical across all time points}. The function does not
reorder or reconcile columns.
\item \strong{Row order / binding rule}: when rows from \code{X} are stacked internally,
they are assumed to already be in \strong{subject-major × time-minor} order in
the user's data. The function does \strong{not} re-order subjects or time.
\item \strong{Y, id, time}: vectors of equal length. \code{id} and \code{time} may be
integer/character/factor; \code{time} is coerced to a numeric sequence
via \code{as.numeric(as.factor(time))}.
\item \strong{Missing values}: this function does \strong{not} perform NA filtering or
imputation. Users should pre-clean the data before calling it (e.g., compute
\code{keep <- complete.cases(Y, id, time)} and subset the inputs with \code{keep}).
}
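
The pre-cleaning and time-coercion rules above can be sketched in base R. The
values below are toy data invented for illustration, and the subsetting pattern
is only one plausible way to prepare inputs, not something the package does for you.

\preformatted{
# Toy long-format inputs (hypothetical values, for illustration only)
Y    <- c(1.2, NA, 0.7, 2.1, 1.9, 0.3)
id   <- c(1, 1, 2, 2, 3, 3)
time <- c(0, 4, 0, 4, 0, 4)

# Keep only observations with no missing outcome, id, or time
keep <- complete.cases(Y, id, time)
Y    <- Y[keep]
id   <- id[keep]
time <- time[keep]
# Any rows of X that correspond to dropped observations would
# need to be removed as well (not shown here)

# The documented coercion maps time labels to 1, 2, ... by factor level
time_num <- as.numeric(as.factor(time))
time_num  # 1 1 2 1 2
}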
}

\section{Unbalanced panels}{

Missing time points per subject are allowed \strong{provided that the supplied
\code{X}, \code{Y}, \code{id}, and \code{time} are already aligned under the binding rule above}.
Stage 1 builds a TOM at the feature level for each available time-point
matrix; the \strong{consensus TOM} is the element-wise minimum across time points.
Subject-level missingness at a given time does not prevent feature-wise
similarity from being computed at other times. This function does not perform
any subject-level alignment across time.
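
The element-wise minimum can be illustrated with base R. This is a sketch on
made-up 3 x 3 similarity matrices, not the internal code of the package;
\code{tom_t1}, \code{tom_t2}, and \code{consensus} are hypothetical names.

\preformatted{
feat <- c("V1", "V2", "V3")

# Hypothetical feature-level similarity matrices at two time points
tom_t1 <- matrix(c(1.0, 0.8, 0.2,
                   0.8, 1.0, 0.5,
                   0.2, 0.5, 1.0), 3, 3, dimnames = list(feat, feat))
tom_t2 <- matrix(c(1.0, 0.6, 0.4,
                   0.6, 1.0, 0.1,
                   0.4, 0.1, 1.0), 3, 3, dimnames = list(feat, feat))

# pmin() takes the parallel (element-wise) minimum and keeps the
# dim and dimnames of its first argument
consensus <- Reduce(pmin, list(tom_t1, tom_t2))

# Converted to a dissimilarity, as in the 1 - TOM form
dissim <- 1 - consensus
}

Taking the minimum means a pair of features scores as similar only if it is
similar at every time point.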
}

\section{Outcome family}{

Current version targets \strong{Gaussian} outcomes via \code{glmertree::lmertree}.
Other families (e.g., binomial/Poisson) are not supported in this version.
}

\section{Stability selection and thresholds}{

Final selection is \strong{top-K} by bootstrap frequency (K = \code{n_features_to_select}).
A probability cutoff (e.g., \code{pi_thr}) is \strong{not} used and selection
probabilities are \strong{not returned} in the current API.
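
As a rough illustration of top-K selection by bootstrap frequency, the sketch
below counts how often each feature is picked across a handful of hypothetical
bootstrap fits. The \code{picked} list is invented, and the actual bookkeeping
inside the package may differ.

\preformatted{
# Hypothetical record of the features picked in each of 5 bootstrap fits
picked <- list(c("V1", "V2"), c("V1", "V3"), c("V1", "V2"),
               c("V1", "V4"), c("V1", "V2"))

# Count how often each feature was picked, most frequent first
freq <- sort(table(unlist(picked)), decreasing = TRUE)

# Keep the K most frequent features (K plays the role of n_features_to_select)
K <- 2
top_k <- names(freq)[seq_len(K)]
top_k  # "V1" "V2"
}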
}

\section{Reproducibility (determinism)}{

\itemize{
\item \strong{Stochastic} (affected by \code{set.seed()}): bootstrap resampling and tree
partitioning.
\item \strong{Deterministic}: correlation/adjacency/TOM and consensus-TOM given fixed inputs.
}
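
The effect of the seed on the stochastic part can be seen with a generic
bootstrap draw of subject identifiers. This is a stand-alone illustration using
\code{sample()}, not the resampling code used by the package.

\preformatted{
ids <- 1:10

# Same seed, same bootstrap resample of subjects
set.seed(42)
boot_a <- sample(ids, replace = TRUE)
set.seed(42)
boot_b <- sample(ids, replace = TRUE)

identical(boot_a, boot_b)  # TRUE
}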
}

\section{Internal validation}{

An internal helper \code{\link{check_temporal_consistency}} is called
automatically at the start (whenever \code{dissimilarity_matrix} is \code{NULL}).
It throws an error if column names across time points are not identical
(names and order).
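
The kind of check described here can be sketched in base R. The helper below is
an illustrative stand-in with a hypothetical name (\code{same_columns}); it is
not the source of \code{check_temporal_consistency} and returns a logical
instead of throwing an error.

\preformatted{
# Hypothetical helper: TRUE when every matrix in X has the same
# column names in the same order as the first one
same_columns <- function(X) {
  ref <- colnames(X[[1]])
  all(vapply(X, function(m) identical(colnames(m), ref), logical(1)))
}

X_ok  <- list(matrix(0, 2, 2, dimnames = list(NULL, c("V1", "V2"))),
              matrix(0, 2, 2, dimnames = list(NULL, c("V1", "V2"))))
X_bad <- list(X_ok[[1]],
              matrix(0, 2, 2, dimnames = list(NULL, c("V2", "V1"))))

same_columns(X_ok)   # TRUE
same_columns(X_bad)  # FALSE: same names, different order
}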
}

\examples{
\donttest{
# Tiny demo: selects V1, V2, V3 quickly (skips Stage 1 via precomputed A)
set.seed(11)
n_subjects <- 60; n_timepoints <- 2; p <- 20
# One n_subjects x p predictor matrix per time point, with identical column names
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)
X_long <- do.call(rbind, X)
id   <- rep(seq_len(n_subjects), each = n_timepoints)
time <- rep(seq_len(n_timepoints), times = n_subjects)
# Outcome: V1-V3 carry the signal; u is a subject-level random intercept, eps is noise
u <- rnorm(n_subjects, 0, 0.7)
eps <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[,"V1"] + 3.5*X_long[,"V2"] + 3.2*X_long[,"V3"] + rep(u, each = n_timepoints) + eps
# Precomputed dissimilarity (1 - |correlation|) with a zero diagonal and feature dimnames
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))
fit <- temporal_forest(
  X, Y, id, time,
  dissimilarity_matrix = A,
  n_features_to_select = 3,
  n_boot_screen = 6, n_boot_select = 18,
  keep_fraction_screen = 1, min_module_size = 2,
  alpha_screen = 0.5, alpha_select = 0.6
)
print(fit$top_features)
}
}
\references{
Shao, S., Moore, J.H., Ramirez, C.M. (2025). Network-Guided
Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data.
\emph{Journal of Statistical Software}.
}
\seealso{
\code{\link{select_soft_power}}, \code{\link{calculate_fs_metrics_cv}},
\code{\link{calculate_pred_metrics_cv}}, \code{\link{check_temporal_consistency}}
}
\author{
Sisi Shao, Jason H. Moore, Christina M. Ramirez
}
