
An R package for Plasmodium vivax molecular correction via statistical genetic inference of relapse, recrudescence, and reinfection.
The core function, compute_posterior(), computes
per-person posterior probabilities of relapse, recrudescence, and
reinfection (recurrence states) using P. vivax genetic data on
two or more episodes. To better understand the core function, in
addition to this README, we recommend reading:

- documentation accessed via ?compute_posterior()
- the vignette “Demonstrate Pv3Rs usage” accessed via vignette("demonstrate-usage", "Pv3Rs")
- Understand posterior probabilities
- the article “Pv3Rs: Plasmodium vivax relapse, recrudescence, and reinfection statistical genetic inference” published in Bioinformatics

Two other important features:

- plot_data() visualises genetic data for molecular correction, regardless of the analytical method (e.g., it can be used to visualise Plasmodium falciparum data intended for analysis using a WHO match-counting algorithm).
- plot_simplex() can be used to visualise per-recurrence probabilities of relapse, recrudescence, and reinfection (or any other probability triplet summing to one).

The statistical model is described in the supplement of the Bioinformatics article “Pv3Rs: Plasmodium vivax relapse, recrudescence, and reinfection statistical genetic inference”. It builds on the prototype that features in Taylor & Watson et al. 2019.
Genetic data are modelled using a Bayesian model, whose prior is ideally informative (in Taylor & Watson et al. 2019, priors were generated by a time-to-event model built by James Watson). An informative prior matters because the cause of recurrent P. vivax malaria is not always identifiable from genetic data alone. When the data are consistent with recurrent parasites that are relatively unrelated to those in all preceding infections, both reinfection and relapse are plausible. When the data are compatible with recurrent parasites that are clones of those in the preceding infection, both recrudescence and relapse are plausible.
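
The role of the prior can be illustrated with a minimal Bayes-rule sketch in base R. The numbers below are hypothetical and the likelihoods are not computed by Pv3Rs; the sketch only assumes a case where the genetic data are equally likely under relapse and reinfection and impossible under recrudescence.

```r
# Hypothetical prior over recrudescence (C), relapse (L), and reinfection (I)
prior <- c(C = 0.1, L = 0.6, I = 0.3)

# Hypothetical likelihoods: the data cannot separate relapse from reinfection
lik <- c(C = 0, L = 0.02, I = 0.02)

# Bayes rule: posterior is proportional to prior times likelihood
posterior <- prior * lik / sum(prior * lik)
print(round(posterior, 3))  # C = 0, L = 0.667, I = 0.333
```

Because the likelihoods of relapse and reinfection are equal, the posterior ratio of relapse to reinfection equals the prior ratio (2 to 1 here), which is why an informative prior is valuable in this situation.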
The main Pv3Rs function, compute_posterior(), could
be applied to P. falciparum by setting the prior probability of
relapse to zero, but genotyping errors, which are not accounted for
under the current Pv3Rs model, are liable to lead to the
misclassification of recrudescence as reinfection when the prior
probability of relapse is zero (and of recrudescence as relapse when the
prior probability of relapse exceeds zero). A post hoc diagnostic to
identify misclassified recrudescence is presented in Understand
genotyping errors.
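
A minimal sketch of setting the prior probability of relapse to zero follows. The toy data and allele frequencies are invented for illustration. The sketch assumes that the prior argument takes a matrix with one row per recurrence and columns named C, L, and I (recrudescence, relapse, reinfection); please confirm the expected format in ?compute_posterior before relying on it.

```r
# Toy data (hypothetical): one enrolment episode and one recurrence,
# each a list of markers with the alleles observed at that marker
y <- list(
  enrolment  = list(m1 = c("A", "B"), m2 = "C"),
  recurrence = list(m1 = "A", m2 = "C")
)

# Hypothetical population-level allele frequencies, one named vector per marker
fs <- list(
  m1 = c(A = 0.4, B = 0.6),
  m2 = c(C = 0.3, D = 0.7)
)

# Prior with zero probability of relapse (assumed column names: C, L, I)
prior <- matrix(c(0.5, 0, 0.5), nrow = 1,
                dimnames = list(NULL, c("C", "L", "I")))

# Only call the package if it is installed
if (requireNamespace("Pv3Rs", quietly = TRUE)) {
  post <- Pv3Rs::compute_posterior(y, fs, prior = prior)
  print(post)
}
```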
As with any model, Pv3Rs makes various assumptions that limit its capabilities in some settings.
Recurrence states are modelled as mutually exclusive, suitable for studies where participants are actively followed up frequently and where all detected infections are treated to the extent that parasitaemia drops below some detectable level before recurrence if it occurs. In studies with untreated or accumulated infections, outputs may not be meaningful.
We do not model all the complexities around molecular correction. Examples of unmodelled complexities include population structure, including household effects; failure to capture low-density clones in a blood sample of limited volume [Snounou & Beck, 1998]; and hidden biomass in the spleen and bone marrow [Markus, 2019]. Users must interpret outputs in light of these limitations and in the context of the study and its methods. For example, we expect Pv3Rs to output probable relapse if a person is reinfected by a new mosquito but with parasites that are recently related to those that caused a previous infection, as might happen in household transmission chains.
Relapsing parasites that are siblings of parasites in previous infections can be meiotic, parent-child-like, regular or half siblings, but we model all sibling parasites as regular siblings via the following assumptions:
In our experience, half-sibling misspecification leads to some misclassification of relapses as reinfections; see Understand half-sibling misspecification and Understand posterior probabilities, where half siblings lead to probabilities that behave erratically with increasing marker counts. A descriptive study to explore the extent of half-sibling misspecification is recommended.
We do not model undetected alleles, other genotyping errors, or de novo mutations. Recrudescent parasites are modelled as perfect clones under Pv3Rs. As such, the posterior probability of recrudescence is rendered zero by errors and mutations. This becomes more likely when there are data on more markers. Understand genotyping errors explores the impact of errors and mutations on recurrence state probabilities.
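
To see why a single error can rule out recrudescence under a perfect-clone model, consider the following base R sketch. The helper consistent_with_clones() is hypothetical (it is not part of Pv3Rs, nor the diagnostic described in Understand genotyping errors). It simply checks whether every allele in a recurrence was also observed in the preceding episode, which is a necessary condition for the recurrent parasites to be perfect clones.

```r
# Hypothetical helper: TRUE if, at every marker typed in both episodes, all
# alleles observed in the recurrence were also observed in the previous episode
consistent_with_clones <- function(previous, recurrent) {
  markers <- intersect(names(previous), names(recurrent))
  all(vapply(markers, function(m) {
    alleles <- recurrent[[m]]
    alleles <- alleles[!is.na(alleles)]  # ignore missing calls
    all(alleles %in% previous[[m]])
  }, logical(1)))
}

previous        <- list(m1 = c("A", "B"), m2 = "C", m3 = "G")
recurrent_clone <- list(m1 = "A", m2 = "C", m3 = "G")
recurrent_error <- list(m1 = "A", m2 = "C", m3 = "T")  # one mismatch at m3

consistent_with_clones(previous, recurrent_clone)  # TRUE
consistent_with_clones(previous, recurrent_error)  # FALSE
```

With more markers there are more opportunities for a mismatch like the one at m3, so a true recrudescence is more likely to fail this condition.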
When genetic data alone are insufficient to distinguish between recrudescence and relapse (or reinfection and relapse), the posterior probabilities of recrudescence and relapse (or reinfection and relapse) are heavily influenced by our a priori uniform assumption over relationship graphs; see Understand graph-prior ramifications. The development of a more biologically-principled generative model on parasite relationships is merited.
| Limitation | Reason |
|---|---|
| Possible misclassification of persistent and/or accumulated states | Modelling recurrent states as mutually exclusive |
| Possible inconsistency with data on more-and-more markers | Not modelling errors |
| Possible misclassification of relapse as reinfection | Half-sibling misspecification |
| Possible misclassification of recrudescence as relapse | Not modelling errors |
| Possible misclassification of reinfection | Not modelling population structure |
| Strong prior impact on posterior | Recurrent states are not always identifiable from genetic data alone |
Pv3Rs scales to hundreds of markers but not whole-genome sequence (WGS) data.
We do not recommend running compute_posterior() for
data whose total genotype count (sum of per-episode multiplicities of
infection) exceeds eight. If the total genotype count exceeds eight but
there are multiple recurrences, it might be possible to compute
posterior probabilities by analysing episodes pairwise (this approach
was used in Taylor & Watson et al. 2019; we are currently working on an
improved version).
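
A rough way to check the total genotype count before running compute_posterior() is sketched below in base R. The sketch assumes that each episode's multiplicity of infection (MOI) is taken to be the smallest value compatible with the data, namely the maximum number of distinct alleles observed at any marker; the helper episode_moi() and the toy data are hypothetical.

```r
# Hypothetical helper: smallest MOI compatible with one episode's data
# (maximum number of distinct non-missing alleles across markers)
episode_moi <- function(episode) {
  max(vapply(episode, function(alleles) {
    length(unique(alleles[!is.na(alleles)]))
  }, integer(1)))
}

# Toy data (hypothetical): three episodes typed at two markers
y <- list(
  enrolment   = list(m1 = c("A", "B", "C"), m2 = c("D", "E")),
  recurrence1 = list(m1 = c("A", "B"), m2 = "D"),
  recurrence2 = list(m1 = "C", m2 = "E")
)

mois <- vapply(y, episode_moi, integer(1))
total_genotype_count <- sum(mois)
print(mois)                  # 3, 2, 1
print(total_genotype_count)  # 6

if (total_genotype_count > 8) {
  message("Total genotype count exceeds eight; consider analysing episodes pairwise.")
}
```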
The per-marker allele limit of compute_posterior()
is untested. Very high marker cardinalities could lead to very small
allele frequencies and thus some underflow problems.
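
The kind of underflow at issue can be shown with a generic base R sketch. It does not reproduce the internals of compute_posterior(); it only illustrates that a product of many very small frequencies can round to zero in double precision, whereas the equivalent sum of logarithms stays finite.

```r
# One hundred hypothetical allele frequencies of 1e-5 each
p <- rep(1e-5, 100)

prod(p)      # 0: the true value, 1e-500, is below the smallest positive double
sum(log(p))  # finite: the same quantity on the log scale
```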
In addition to P. vivax allelic data on two or more
episodes, compute_posterior() requires as input
population-level allele frequencies. To minimise bias due to within-host
selection of recrudescent parasites, we recommend using only enrolment
episodes to estimate population-level allele frequencies, and ideally
enrolment episodes from study participants selected at random, not only
study participants who experience recurrence. That said, if there is
strong prior reason to believe most recurrences are either reinfections
or relapses, both of which are draws from the mosquito population
(albeit a delayed draw in the case of a relapse), assuming there is no
systematic within-patient selection (as might occur when infections
encounter lingering drug pressure), estimates based on all episodes
should be unbiased and more precise than those based on enrolment
episodes only.
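
As a minimal sketch of the recommendation above, allele frequencies could be estimated from enrolment episodes as follows. The helper estimate_fs() and the toy data are hypothetical. The estimator is deliberately naive: it counts each distinct allele once per episode, ignores multiplicity of infection, and assigns no frequency to alleles that were not observed at enrolment.

```r
# Toy enrolment episodes (hypothetical), one per participant
enrolment <- list(
  id1 = list(m1 = c("A", "B"), m2 = "C"),
  id2 = list(m1 = "A",         m2 = c("C", "D")),
  id3 = list(m1 = "B",         m2 = "D"),
  id4 = list(m1 = "A",         m2 = "D")
)

# Hypothetical helper: per-marker allele frequencies from a list of episodes
estimate_fs <- function(episodes) {
  markers <- unique(unlist(lapply(episodes, names)))
  fs <- lapply(markers, function(m) {
    # Pool the distinct alleles observed at marker m in each episode
    alleles <- unlist(lapply(episodes, function(ep) unique(ep[[m]])))
    alleles <- alleles[!is.na(alleles)]
    counts <- table(alleles)
    freqs <- as.numeric(counts) / sum(counts)
    names(freqs) <- names(counts)
    freqs
  })
  names(fs) <- markers
  fs
}

fs <- estimate_fs(enrolment)
print(fs)  # m1: A = 0.6, B = 0.4; m2: C = 0.4, D = 0.6
```

The result is a list of named numeric vectors, one per marker, each summing to one.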
Unfortunately, the Pv3Rs model does not exploit data on read counts at present. However, read-count data could be used to compute population-level allele frequencies, assuming they are not biased by experimental artefacts.
```r
# Install Pv3Rs from CRAN:
install.packages("Pv3Rs")

# Load and attach Pv3Rs
library(Pv3Rs)

# List links to all available documentation
help(package = "Pv3Rs")

# List links to vignettes
vignette(package = "Pv3Rs")

# View function documentation including examples, e.g.,
?compute_posterior

#===============================================================================
# To install the development version of Pv3Rs:
#===============================================================================
# Doing this in RStudio ensures pandoc, required for vignette building, is
# installed. If you're not working in RStudio, you might need to install pandoc
# and check its path (or set build_vignettes = FALSE)
install.packages("devtools") # Install or update devtools from CRAN
devtools::install_github("aimeertaylor/Pv3Rs", build_vignettes = TRUE)
```