% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/setup-import.R
\name{import_src}
\alias{import_src}
\alias{import_src.src_cfg}
\alias{import_src.aumc_cfg}
\alias{import_src.character}
\alias{import_tbl}
\alias{import_tbl.tbl_cfg}
\title{Data import utilities}
\usage{
import_src(x, ...)

\method{import_src}{src_cfg}(
  x,
  data_dir = src_data_dir(x),
  tables = NULL,
  force = FALSE,
  verbose = TRUE,
  ...
)

\method{import_src}{aumc_cfg}(x, ...)

\method{import_src}{character}(
  x,
  data_dir = src_data_dir(x),
  tables = NULL,
  force = FALSE,
  verbose = TRUE,
  cleanup = FALSE,
  ...
)

import_tbl(x, ...)

\method{import_tbl}{tbl_cfg}(
  x,
  data_dir = src_data_dir(x),
  progress = NULL,
  cleanup = FALSE,
  ...
)
}
\arguments{
\item{x}{Object specifying the source configuration}

\item{...}{Passed to downstream methods (ultimately to
\link[readr:read_delim]{readr::read_csv} or \link[readr:read_delim_chunked]{readr::read_csv_chunked}); also present for generic consistency}

\item{data_dir}{Directory containing the downloaded data (see
\code{\link[=download_src]{download_src()}}).}

\item{tables}{Character vector specifying the tables to import. If
\code{NULL}, all available tables are imported.}

\item{force}{Logical flag; if \code{TRUE}, tables that have already been
imported are imported again}

\item{verbose}{Logical flag indicating whether to print progress information}

\item{cleanup}{Logical flag indicating whether to remove raw csv files after
conversion to fst}

\item{progress}{Either \code{NULL} or a progress bar as created by
\code{\link[progress:progress_bar]{progress::progress_bar()}}}
}
\value{
Called for side effects and returns \code{NULL} invisibly.
}
\description{
Making a dataset available to \code{ricu} consists of three steps: downloading
(\code{\link[=download_src]{download_src()}}), importing (\code{\link[=import_src]{import_src()}}) and attaching
(\code{\link[=attach_src]{attach_src()}}). While downloading and importing are one-time procedures,
attaching the dataset is repeated every time the package is loaded.
Briefly, downloading fetches the raw dataset from the internet (most likely
in \code{.csv} format), importing preprocesses the data so that it can be
accessed more efficiently, and attaching sets up the data for use by
the package.
}
\details{
In order to speed up data access operations, \code{ricu} does not directly use
the PhysioNet-provided CSV files, but converts all data to \code{\link[fst:fst]{fst::fst()}}
format, which allows for random row and column access. Large tables are
split into chunks in order to keep memory requirements reasonably low.
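
The random access mentioned above can be illustrated with a minimal sketch.
It is not part of \code{ricu}: the data frame and file name are made up for
illustration, and only \code{fst::write_fst()} and \code{fst::read_fst()} are used.

\preformatted{
# hypothetical example data, not taken from any ricu data source
dat <- data.frame(id = 1:100, val = seq(0.5, 50, by = 0.5))
tmp <- tempfile(fileext = ".fst")
fst::write_fst(dat, tmp)

# read only column `val` for rows 11 through 20, without loading the rest
sub <- fst::read_fst(tmp, columns = "val", from = 11, to = 20)
nrow(sub)

unlink(tmp)
}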

Data import is a one-time step per dataset, but it is fairly resource
intensive. Depending on CPU and available storage system, it takes on the
order of an hour to run to completion. Depending on the dataset, somewhere
between 50 GB and 75 GB of temporary disk space is required, as tables are
uncompressed, rows of partitioned tables are reordered and the data is then
saved again in a storage-efficient format.

The S3 generic function \code{import_src()} imports an entire data source,
internally calling the S3 generic function \code{import_tbl()} to import
individual tables. Method dispatch is intended to occur on objects
inheriting from \code{src_cfg} and \code{tbl_cfg}, respectively. Such objects can be
generated from JSON-based configuration files, which contain information
such as table names, column types or row numbers, in order to provide
safety in parsing of \code{.csv} files. For more information on data
source configuration, refer to \code{\link[=load_src_cfg]{load_src_cfg()}}.
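
The division of labour between a source-level and a table-level generic can
be sketched as follows. This is a toy illustration only and not the \code{ricu}
implementation: all names prefixed with \code{demo_}, the helper \code{new_tbl()} and
the list-based configuration objects are hypothetical.

\preformatted{
# hypothetical generics mirroring the import_src()/import_tbl() split
demo_import_src <- function(x, ...) UseMethod("demo_import_src")
demo_import_tbl <- function(x, ...) UseMethod("demo_import_tbl")

# table-level method: a real implementation would convert a csv file here
demo_import_tbl.demo_tbl_cfg <- function(x, ...) {
  message("importing table ", x$name)
  invisible(NULL)
}

# source-level method: loops over the table configurations
demo_import_src.demo_src_cfg <- function(x, ...) {
  for (tbl in x$tables) demo_import_tbl(tbl, ...)
  invisible(NULL)
}

# hypothetical constructor for a table configuration
new_tbl <- function(name) structure(list(name = name), class = "demo_tbl_cfg")

cfg <- structure(
  list(tables = list(new_tbl("admissions"), new_tbl("patients"))),
  class = "demo_src_cfg"
)

demo_import_src(cfg)
}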

Three import strategies are currently available: re-saving a \code{.csv} file
as \code{.fst} in a single pass (used for smaller tables); reading a large
\code{.csv} file in chunks using the
\code{\link[readr:read_delim_chunked]{readr::read_csv_chunked()}} API, partitioning each chunk and reassembling
the resulting sub-partitions (used for splitting a large file into
partitions); and re-partitioning an already partitioned table according to
a new partitioning scheme. Care has been taken to keep peak memory
requirements reasonably low, such that data import is feasible on
laptop-class hardware.
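
The idea of chunked reading combined with partitioning can be sketched as
follows. This is a simplified illustration under stated assumptions, not the
code used by \code{ricu}: the input file, the partitioning rule and the
\code{callback()} function are hypothetical, and partitions are collected in
memory here rather than written to disk.

\preformatted{
# hypothetical input: a small csv file standing in for a large table
csv <- tempfile(fileext = ".csv")
readr::write_csv(data.frame(id = rep(1:4, each = 5), val = 1:20), csv)

# assumed partitioning rule: ids 1-2 go to partition 1, ids 3-4 to partition 2
parts <- list()
callback <- function(chunk, pos) {
  grp <- split(chunk, findInterval(chunk$id, c(1, 3)))
  for (nme in names(grp)) {
    # append the rows of this chunk to the matching partition
    parts[[nme]] <<- rbind(parts[[nme]], grp[[nme]])
  }
}

# read 6 rows at a time, handing each chunk to the callback
readr::read_csv_chunked(
  csv, readr::SideEffectChunkCallback$new(callback), chunk_size = 6,
  col_types = "ii"
)

# number of rows collected per partition
vapply(parts, nrow, integer(1))

unlink(csv)
}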
}
\examples{
\dontrun{

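# use a dedicated subdirectory so that cleanup does not remove tempdir() itself
dir <- file.path(tempdir(), "mimic_demo")
dir.create(dir)
list.files(dir)

# fetch the raw files
download_src("mimic_demo", dir)
list.files(dir)

# convert the downloaded files to fst format
import_src("mimic_demo", dir)
list.files(dir)

unlink(dir, recursive = TRUE)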

}

}
