% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/createriskset.R
\name{create_riskset}
\alias{create_riskset}
\title{Process and Create Risk Sets for One- and Two-Mode Relational Event Sequences}
\usage{
create_riskset(
  type = c("two-mode", "one-mode"),
  time,
  eventID,
  sender,
  receiver,
  p_samplingobserved = 1,
  n_controls,
  combine = TRUE,
  seed = 9999
)
}
\arguments{
\item{type}{"two-mode" indicates that this is a two-mode event sequence. "one-mode" indicates that the event sequence is one-mode.}

\item{time}{The vector of event time values from the observed event sequence.}

\item{eventID}{The vector of event IDs from the observed event sequence (typically a numerical event sequence that goes from 1 to \emph{n}).}

\item{sender}{The vector of event senders from the observed event sequence.}

\item{receiver}{The vector of event receivers from the observed event sequence.}

\item{p_samplingobserved}{The numerical value for the probability of selecting each observed event when sampling from the observed event sequence. Defaults to 1, meaning that all observed events from the event sequence are included in the post-processing event sequence.}

\item{n_controls}{The numerical value for the number of null event controls for each (sampled) observed event.}

\item{combine}{TRUE/FALSE. TRUE indicates that the post-sampling (processing) event sequence should be merged with the pre-processing dataset. FALSE only returns the post-processing event sequence (that is, only the sampled events).}

\item{seed}{The random number seed for user replication.}
}
\value{
A post-processing data.table object with the following columns:
\itemize{
\item \code{time} - The event time for the sampled and observed events.
\item \code{eventID} - The numerical event sequence ID for the sampled and observed events.
\item \code{sender} - The event senders of the sampled and observed events.
\item \code{receiver} - The event targets (receivers) of the sampled and observed events.
\item \code{observed} - Indicator of whether the event is an observed or a control event (1 = observed; 0 = control).
\item \code{sampled} - Indicator of whether the event is sampled or not sampled (1 = sampled; 0 = not sampled).
}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#stable}{\figure{lifecycle-stable.svg}{options: alt='[Stable]'}}}{\strong{[Stable]}}

This function creates a post-sampling event set for one- and two-mode relational event sequences, with options
for case-control sampling (Vu et al. 2015) and sampling from the observed event sequence (Lerner and Lomi 2020).
Case-control sampling draws an arbitrary number \emph{m} of controls from the risk set for each event
(Vu et al. 2015). Lerner and Lomi (2020) proposed sampling from the observed event sequence,
where observed events are sampled with probability \emph{p}. Importantly, this function assumes that the
risk set is fixed across all time points, that is, every actor who is active at any time point in the
event sequence is treated as a potential sender or receiver throughout. Users interested in
generating time- or event-varying risk sets should consult the \code{\link[dream]{processOMEventSeq}} function
for one-mode event sequences and the \code{\link[dream]{processTMEventSeq}} function for two-mode event
sequences. Future versions of the \code{dream} package will incorporate this option into this function in a
principled manner.
}
\details{
This function processes observed events from the set \eqn{E}, where each event \eqn{e_i \in E} is
defined as:
\deqn{e_{i} = (s_i, r_i, t_i, G[E;t])}
where:
\itemize{
\item \eqn{s_i} is the sender of the event.
\item \eqn{r_i} is the receiver of the event.
\item \eqn{t_i} represents the time of the event.
\item \eqn{G[E;t] = \{e_j \in E \mid t_j < t\}} is the network of past events, that is, all events that occurred prior to the current event, \eqn{e_i}.
}

Following Butts (2008) and Butts and Marcum (2017), for one-mode event sequences, the risk (support)
set \eqn{A_t} is defined as the set of all possible events at time \eqn{t}, that is, the full Cartesian
product of the prior senders and receivers in the set \eqn{G[E;t]} that could have
taken part in an event at time \eqn{t}. Formally:
\deqn{A_t = \{ (s, r) \mid (s, r) \in S \times R \}}
where \eqn{S} is the set of potential event senders and \eqn{R} is the set of potential event receivers. In this function,
the full risk set is considered fixed across all time points.

For two-mode event sequences, the risk (support) set \eqn{A_t} is defined as the set of all possible
events at time \eqn{t}, that is, the cross product of two disjoint sets, namely, the prior senders and
the prior receivers in the set \eqn{G[E;t]} that could have taken part in an event at time \eqn{t}. Formally:
\deqn{A_t = \{ (s, r) \mid (s, r) \in S \times R \}}
where \eqn{S} is the set of potential event senders and \eqn{R} is the set of potential event receivers. In this function,
the full risk set is considered fixed across all time points.
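
As a small worked illustration with hypothetical labels (not taken from any dataset), suppose a two-mode
sequence has senders \eqn{S = \{u_1, u_2\}} and receivers \eqn{R = \{a_1, a_2, a_3\}}. The fixed risk set
then contains every sender-receiver pair:
\deqn{A_t = \{(u_1, a_1), (u_1, a_2), (u_1, a_3), (u_2, a_1), (u_2, a_2), (u_2, a_3)\}, \quad |A_t| = |S| \cdot |R| = 6}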

Case-control sampling maintains the full set of observed events, that is, all events in \eqn{E}, and
samples an arbitrary number \eqn{m} of non-events from the support set \eqn{A_t} (Vu et al. 2015; Lerner
and Lomi 2020). This process generates a new support set, \eqn{SA_t}, for any relational event
\eqn{e_i} contained in \eqn{E} given a network of past events \eqn{G[E;t]}. \eqn{SA_t} is formally defined as:
\deqn{SA_t \subseteq \{ (s, r) \mid (s, r) \in S \times R \}}
When sampling from the observed events, \eqn{n} observed events are
sampled from the set \eqn{E} with known probability \eqn{0 < p \le 1}. More formally, sampling from
the observed set generates a new set \eqn{SE \subseteq E}.
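
As a rough worked illustration, assume (for exposition only) that each observed event is sampled
independently with probability \eqn{p}, and let \eqn{N = |E|} denote the number of observed events.
The expected number of sampled observed events is then \eqn{pN}, and because \eqn{m} controls are drawn
for each sampled event, the expected number of sampled observed events plus their controls is:
\deqn{pN + pNm = pN(1 + m)}
For instance, with the hypothetical values \eqn{N = 100000}, \eqn{p = 0.01}, and \eqn{m = 10}, this gives
\eqn{0.01 \times 100000 \times (1 + 10) = 11000}.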
}
\examples{

data("WikiEvent2018.first100k")
WikiEvent2018.first100k$time <- as.numeric(WikiEvent2018.first100k$time)
### Creating the EventSet By Employing Case-Control Sampling With M = 10 and
### Sampling from the Observed Event Sequence with P = 0.01
EventSet <- create_riskset(
  type = "two-mode",
  time = WikiEvent2018.first100k$time, # The Time Variable
  eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
  sender = WikiEvent2018.first100k$user, # The Sender Variable
  receiver = WikiEvent2018.first100k$article, # The Receiver Variable
  p_samplingobserved = 0.01, # The Probability of Selection
  n_controls = 10, # The Number of Controls to Sample from the Full Risk Set
  seed = 9999) # The Seed for Replication


### Creating A New EventSet with more observed events and fewer control events
### Sampling from the Observed Event Sequence with P = 0.02
### Employing Case-Control Sampling With M = 2
EventSet1 <- create_riskset(
  type = "two-mode",
  time = WikiEvent2018.first100k$time, # The Time Variable
  eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
  sender = WikiEvent2018.first100k$user, # The Sender Variable
  receiver = WikiEvent2018.first100k$article, # The Receiver Variable
  p_samplingobserved = 0.02, # The Probability of Selection
  n_controls = 2, # The Number of Controls to Sample from the Full Risk Set
  seed = 9999) # The Seed for Replication
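
### A minimal base-R sketch of the sampling scheme described in Details, on toy data.
### This is an illustration under simplifying assumptions and is NOT the internal
### implementation of create_riskset(). All object names below (toy, riskset,
### cases, controls, toy_set) are hypothetical.
set.seed(9999)
toy <- data.frame(
  time = 1:6,
  eventID = 1:6,
  sender = c("u1", "u2", "u1", "u3", "u2", "u3"),
  receiver = c("a1", "a1", "a2", "a3", "a2", "a1"),
  stringsAsFactors = FALSE
)
p <- 0.5 # Probability of sampling each observed event
m <- 2   # Number of controls per sampled observed event

# Fixed two-mode risk set: every sender paired with every receiver
riskset <- expand.grid(
  sender = unique(toy$sender),
  receiver = unique(toy$receiver),
  stringsAsFactors = FALSE
)

# Sample observed events (assumption: independent Bernoulli draws with probability p)
toy$observed <- 1
toy$sampled <- rbinom(nrow(toy), size = 1, prob = p)
cases <- toy[toy$sampled == 1, ]

# For each sampled event, draw m controls from the risk set, excluding the
# observed dyad (assumption: controls inherit the time and eventID of their case)
controls <- do.call(rbind, lapply(seq_len(nrow(cases)), function(i) {
  is_case <- riskset$sender == cases$sender[i] &
    riskset$receiver == cases$receiver[i]
  pool <- riskset[!is_case, ]
  picked <- pool[sample(nrow(pool), m), ]
  data.frame(
    time = cases$time[i], eventID = cases$eventID[i],
    sender = picked$sender, receiver = picked$receiver,
    observed = 0, sampled = 1, stringsAsFactors = FALSE
  )
}))

# Stack cases and controls, with each observed event listed before its controls
toy_set <- rbind(cases, controls)
toy_set <- toy_set[order(toy_set$eventID, -toy_set$observed), ]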

}
\references{
Butts, Carter T. 2008. "A Relational Event Framework for Social Action." \emph{Sociological Methodology} 38(1): 155-200.

Butts, Carter T. and Christopher Steven Marcum. 2017. "A Relational Event Approach to Modeling Behavioral Dynamics." In A.
Pilny & M. S. Poole (Eds.), \emph{Group processes: Data-driven computational approaches}. Springer International Publishing.

Lerner, Jürgen and Alessandro Lomi. 2020. "Reliability of relational event model estimates under sampling: How to
fit a relational event model to 360 million dyadic events." \emph{Network Science} 8(1): 97-135.

Vu, Duy, Philippa Pattison, and Garry Robins. 2015. "Relational event models for social learning in MOOCs." \emph{Social Networks} 43: 121-135.
}
\author{
Kevin A. Carson \href{mailto:kacarson@arizona.edu}{kacarson@arizona.edu}, Diego F. Leal \href{mailto:dflc@arizona.edu}{dflc@arizona.edu}
}
