% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/transforms.R
\name{tspTransform}
\alias{tspTransform}
\title{Top-scoring pairs transformation}
\usage{
tspTransform(x, sep = "__", sample, check.names = FALSE)
}
\arguments{
\item{x}{Data frame to be converted; it must consist of at least two features of a single type, so that their values are comparable.}

\item{sep}{Separator string used to join original feature names to generate names for transformed features. 
Can be set to \code{NULL} to generate generic names instead, which is faster.}

\item{sample}{Number of features to generate.
If set, the function generates only a random subset of all \eqn{m\cdot (m-1)/2} possible feature pairs, where \eqn{m} is the number of features in \code{x}.}

\item{check.names}{Passed to the underlying call to \code{\link{data.frame}}; if set to \code{TRUE}, feature names are coerced into syntactically valid, unique names.}
}
\value{
A logical \code{data.frame}.
}
\description{
Applies a top-scoring pairs (TSP) transformation; that is, it creates \eqn{m\cdot (m-1)/2} logical features, one for each two-element subset of the \eqn{m} original features.
Each new feature is \code{TRUE} where the value of the first feature of the pair is greater than or equal to the value of the second, and \code{FALSE} otherwise; first and second refer to the order of features in the input.
}
\details{
This transformation can be used to recreate top-scoring pairs methods using information theory concepts, for instance using \code{\link{MIM}}.
The main gain from TSP is that it is resilient to calibration errors, in particular to some sample batch biases; it also generates a robust and parameter-less discrete representation of the continuous input.
It is lossy, however, and the generated scores for feature pairs may be hard to interpret; the inflation of the feature count can also pose practical problems, which is why this function offers a way to efficiently and randomly under-sample the output.

For TSP to work well, it is crucial that the input features have approximately identical distributions, so that the output features have enough entropy to be informative about a decision or when compared with each other; to this end, re-scaling may be required, for instance with \code{\link{scale}}.
}
\note{
\code{NA}s are accepted and treated as incomparable values.
}
\examples{
tspTransform(data.frame(a=1:3,b=1:3,c=rep(2,3)),sep='>=')
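#A base-R sketch of the comparison behind a single output feature;
#it only illustrates the definition and is not how tspTransform is implemented
#(the names toy, a and b are made up for this example)
toy<-data.frame(a=c(1,2,3),b=c(3,2,1))
#TRUE where a is greater than or equal to b
toy$a>=toy$b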
#Converting iris data
tspIris<-tspTransform(data.frame(scale(iris[,-5])))
#Feature selection
MIM(tspIris,iris$Species)
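#How quickly the output grows: number of pairs, m*(m-1)/2, for m input features
#(base R only; the values of m are arbitrary, 4 matches the numeric iris columns)
m<-c(4,100,1000)
pairCount<-m*(m-1)/2
pairCount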
}
