The Child Health and Mortality Prevention Surveillance (CHAMPS) network collects valuable information for identifying causes of death across multiple sites in Africa and South Asia. Verbal autopsy (VA) interviews are part of the constellation of CHAMPS data, which Pramanik et al. (2025) use to develop a method for calibrating computer-coded algorithms that assign causes of death (CoD) to VA data. Their method is implemented in the R package vacalibration, and an instructive introduction can be found in the package’s vignette. VA-Calibration is a natural extension of the workflow for analyzing VA data, and thus we have integrated this package into openVA, as illustrated in this vignette.
We begin by loading the openVA package, which attaches the core packages and, if they are installed, the optional packages, and prints the version of each. vacalibration is one of the optional packages, so it must be installed explicitly alongside openVA, e.g.,
install.packages(c("openVA", "vacalibration"))
library(openVA)
## ────────────────────── Attaching packages for openVA 1.2.0 ─────────────────────
## ✔ InSilicoVA 1.4.2
## ✔ InterVA4 1.7.6
## ✔ InterVA5 1.1.3
## ✔ Tariff 1.0.5
## ── Optional packages (require manual installation if not attached) ─────────────
## ✔ nbc4va 1.2
## ✔ vacalibration 2.1
## ✔ EAVA 1.0.0
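Because vacalibration is optional, a script may want to confirm that it is available before running any calibration code. The following is a minimal sketch using base R's requireNamespace(); the object name has_vacalibration and the message wording are our own and are not produced by openVA.

```r
# Check whether the optional vacalibration package is installed
# without attaching it (requireNamespace() is part of base R).
has_vacalibration <- requireNamespace("vacalibration", quietly = TRUE)

if (!has_vacalibration) {
  # Illustrative message only; openVA prints its own startup summary.
  message("vacalibration is not installed; run ",
          "install.packages(\"vacalibration\") to enable calibration.")
}
```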
In the example code to follow we use the example data set “NeonatesVA5” that contains 200 (simulated) neonate VAs and is included in openVA. These data are simulated from the 2016 WHO VA instrument, and can be loaded with:
data(NeonatesVA5)
dim(NeonatesVA5)
## [1] 200 354
We start by assigning CoDs to the example data set using the InSilicoVA and InterVA5 algorithms. The simulated data are based on the 2016 WHO VA instrument, which we specify by passing “WHO2016” to the data.type parameter.
fit_insilicova <- codeVA(NeonatesVA5, data.type = "WHO2016")
fit_interva <- codeVA(NeonatesVA5, model = "InterVA", version = "5",
                      HIV = "l", Malaria = "l", write = FALSE)
# omitting messages about the data checks, record processing, and posterior sampling...
Note: In the example code for InterVA5 (shown above), we set the write parameter to “FALSE”, which prevents the function from producing the log file with information about the data consistency checks and the VA records excluded from the analysis (due to missing data). In real analyses we recommend setting write to “TRUE” and passing the path of the folder where the log file should be written to the directory parameter.
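As a rough sketch of that recommendation, the call below repeats the InterVA5 fit with write = TRUE and a directory argument. The folder log_dir and the object name fit_interva_logged are hypothetical choices for this illustration, and the call is guarded so that it is skipped when openVA is not installed.

```r
# Sketch only: write the InterVA5 log to a dedicated folder.
# `log_dir` is a hypothetical location chosen for this illustration.
log_dir <- file.path(tempdir(), "interva5_logs")
dir.create(log_dir, showWarnings = FALSE, recursive = TRUE)

# Guarded so the snippet is skipped when openVA is not installed.
if (requireNamespace("openVA", quietly = TRUE)) {
  library(openVA)
  data(NeonatesVA5)
  fit_interva_logged <- codeVA(NeonatesVA5, data.type = "WHO2016",
                               model = "InterVA", version = "5",
                               HIV = "l", Malaria = "l",
                               write = TRUE, directory = log_dir)
  # Inspect what was written; the file names are chosen by InterVA5.
  list.files(log_dir)
}
```

Using a dedicated folder keeps the log separate from the working directory, which makes it easier to review the excluded records after each run.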
Before calibrating the results, we must convert the fitted objects into the format expected by the vacalibration() function. Specifically, we need to prepare a list of data frames, each with two columns: (1) the ID for the individual deaths, and (2) the CoD assigned by the algorithm. openVA includes a helper function, prepCalibration(), that performs this step for us:
insilicova_prep <- prepCalibration(fit_insilicova)
interva_prep <- prepCalibration(fit_interva)
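To make the described format concrete, here is a toy stand-in built in base R. The column names ("ID", "cause"), the list name, and all values are assumptions made for illustration; they are not output from prepCalibration().

```r
# Toy stand-in for the structure described above: a named list with one
# data frame per algorithm, each holding a death ID and an assigned cause.
# Column names and values are illustrative assumptions only.
toy_prep <- list(
  insilicova = data.frame(
    ID    = c("d1", "d2", "d3", "d4"),
    cause = c("pneumonia", "prematurity", "pneumonia", "other")
  )
)

# Tabulating the assigned causes gives an uncalibrated CSMF,
# i.e., the share of deaths attributed to each cause.
toy_csmf <- prop.table(table(toy_prep$insilicova$cause))
toy_csmf
```

With the objects created in this vignette, str(insilicova_prep) is a quick way to inspect the actual structure.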
As we will see below, the vacalibration() tool can combine results across algorithms to produce an ensemble estimate of the cause-specific mortality fractions (CSMFs). To obtain the ensemble estimate, we simply pass multiple fitted objects from codeVA() to the prepCalibration() function as follows:
two_fits <- prepCalibration(fit_insilicova, fit_interva)
The results can now be passed to the vacalibration() function. In the following example, we do this separately for each algorithm as well as for the combined list needed to produce the ensemble estimate of the calibrated CSMF. To limit the length of this vignette, we do not include the diagnostic and summary plots of the results, and we omit the output detailing the posterior sampling.
calib_insilicova <- vacalibration::vacalibration(va_data = insilicova_prep,
                                                 age_group = "neonate",
                                                 country = "Mozambique",
                                                 plot_it = FALSE)
calib_interva <- vacalibration::vacalibration(va_data = interva_prep,
                                              age_group = "neonate",
                                              country = "Mozambique",
                                              plot_it = FALSE)
calib_ensemble <- vacalibration::vacalibration(va_data = two_fits,
                                               age_group = "neonate",
                                               country = "Mozambique",
                                               plot_it = FALSE)
# omitting messages about posterior sampling...
openVA implements some basic S3 methods to make your VA data analysis experience a bit more enjoyable. For example, the basic print method for a vacalibration fitted object provides a quick summary of the posterior sampling, the algorithm(s) included in the calibration, and the input data.
calib_insilicova
## vacalibration fitted object:
## 10000 iterations performed, with first 5000 iterations discarded.
## 5000 iterations saved after thinning
##
## Results for: insilicova (calibrated): 81 neonate deaths
More useful is the familiar summary() method, which (in the VA space) prints out a summary of the posterior sampling, the number of VA records processed by the algorithm, and the ordered CSMF (with credible intervals where applicable). Users can specify the number of causes to include in the summary through the top parameter. Seasoned VA analysts may notice that the CoDs differ from those that InSilicoVA assigns (i.e., the causes from the WHO VA cause list). vacalibration() employs a mapping to a “broad cause of death” list via the vacalibration::cause_map() function (as described in the vacalibration vignette).
summary(calib_insilicova, top = 5)
## VA Calibration
## 10000 iterations performed, with first 5000 iterations discarded.
## 5000 iterations saved after thinning
##
## insilicova (calibrated)
## 81 neonate deaths
## Top 5 CSMFs:
##
## cause mean lower upper
## 1 sepsis_meningitis_inf 0.4804 0.3431 0.6358
## 2 pneumonia 0.2211 0.0766 0.3605
## 3 other 0.1235 0.1235 0.1235
## 4 prematurity 0.0718 0.0048 0.1670
## 5 congenital_malformation 0.0617 0.0617 0.0617
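The idea behind mapping detailed causes to a broad cause list can be pictured as a simple lookup. The sketch below is a base R illustration only: the entries in broad_lookup are assumptions made for this example and should not be read as the table used by vacalibration::cause_map().

```r
# Illustration of the idea only: a hypothetical lookup from detailed causes
# to broad causes. The entries below are assumptions made for this sketch;
# the actual mapping is defined by vacalibration::cause_map().
broad_lookup <- c(
  "Neonatal pneumonia" = "pneumonia",
  "Neonatal sepsis"    = "sepsis_meningitis_inf",
  "Prematurity"        = "prematurity",
  "Birth asphyxia"     = "ipre"
)

detailed <- c("Neonatal sepsis", "Birth asphyxia", "Neonatal sepsis",
              "Prematurity")

# Index the lookup by the detailed cause names to obtain broad causes.
broad <- unname(broad_lookup[detailed])
table(broad)
```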
It is also worth noting that summary() returns a list of objects that can be useful for further analysis; one example is the ordered CSMF, which is returned as a data frame (as shown below).
summ_calib_interva <- summary(calib_interva)
names(summ_calib_interva)
## [1] "nBurn" "nIterations" "nMCMC"
## [4] "nThin" "age_group" "algorithms"
## [7] "n" "show_top" "ensemble"
## [10] "ensemble_algorithms" "uncalibrated" "pcalib_postsumm"
## [13] "interva"
is.data.frame(summ_calib_interva$interva)
## [1] TRUE
summ_calib_interva$interva
## cause mean lower upper
## 1 ipre 0.7453 0.6403 0.8433
## 2 congenital_malformation 0.1207 0.0506 0.2059
## 3 sepsis_meningitis_inf 0.0632 0.0197 0.1264
## 4 pneumonia 0.0364 0.0047 0.0890
## 5 prematurity 0.0213 0.0009 0.0649
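Because the CSMF is an ordinary data frame, it can be manipulated with standard R tools. The sketch below recreates the values printed above so that it runs on its own; the interval-width column and the 5% cutoff are our own illustrative choices.

```r
# Recreate the data frame printed above so the sketch runs on its own;
# in a live session, use summ_calib_interva$interva directly.
csmf_interva <- data.frame(
  cause = c("ipre", "congenital_malformation", "sepsis_meningitis_inf",
            "pneumonia", "prematurity"),
  mean  = c(0.7453, 0.1207, 0.0632, 0.0364, 0.0213),
  lower = c(0.6403, 0.0506, 0.0197, 0.0047, 0.0009),
  upper = c(0.8433, 0.2059, 0.1264, 0.0890, 0.0649)
)

# Width of each credible interval as a rough measure of uncertainty.
csmf_interva$width <- csmf_interva$upper - csmf_interva$lower

# Keep causes whose calibrated mean exceeds an arbitrary 5% cutoff
# (the threshold is our choice for illustration).
major_causes <- csmf_interva[csmf_interva$mean > 0.05, ]
major_causes$cause
```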
When working with a vacalibration fitted object involving results from multiple algorithms (e.g., an ensemble calibration of the CSMF), it is possible to limit the summarized results to a subset by passing the algorithm name or “ensemble” to the algorithm parameter. (The default behavior is to print out summaries for all inputs, including the “ensemble” if applicable.)
summary(calib_ensemble, algorithm = "ensemble")
## VA Calibration
## 10000 iterations performed, with first 5000 iterations discarded.
## 5000 iterations saved after thinning
##
## Ensemble of: insilicova interva
## Top 5 CSMFs:
##
## cause mean lower upper
## 1 sepsis_meningitis_inf 0.6132 0.3355 0.7660
## 2 ipre 0.1467 0.0220 0.3979
## 3 congenital_malformation 0.0922 0.0507 0.1439
## 4 other 0.0701 0.0701 0.0701
## 5 pneumonia 0.0535 0.0085 0.1305
Visual displays of VA results are available in openVA through the plotVA() function. There are several options to control the layout and type of plot. The default is a horizontal plot of error bars:
plotVA(calib_insilicova, title = "Vignette results",
xlab = "CoD", ylab = "Proportions")
## $insilicova
vacalibrated fitted objects include the uncalibrated results as well. A comparison can be made by setting the uncalibrated parameter to “TRUE”. This option is illustrated below in the form of a basic bar graph with the horizontal option turned off, i.e., horiz = FALSE:
plotVA(calib_interva, type = "bar", uncalibrated = TRUE, horiz = FALSE)
## $interva
When working with a vacalibrated object that includes results from multiple algorithms, like our ensemble example, it is worth noting that the plotVA() function returns a list of ggplot objects, one for each algorithm and one for the ensemble if applicable, so it is possible to make further customizations to the plots. (This is also the case if the vacalibrated object only includes calibrated results for a single algorithm, which is why the algorithm name or “ensemble” is printed after the call to plotVA().)
ensemble_plots <- plotVA(calib_ensemble)
names(ensemble_plots)
## [1] "ensemble" "insilicova" "interva"
ensemble_plots$ensemble
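Since the returned elements are ggplot objects, ordinary ggplot2 layers can be added to them. The sketch below uses a stand-in plot built from the ensemble summary printed earlier so that it runs without the fitted objects; the stand-in is not what plotVA() draws, and the labels and theme are arbitrary choices for illustration.

```r
library(ggplot2)

# Stand-in plot built from the ensemble summary printed earlier, so the
# sketch runs without the fitted objects. It is NOT what plotVA() draws;
# with the vignette's objects, start from ensemble_plots$ensemble instead.
ensemble_csmf <- data.frame(
  cause = c("sepsis_meningitis_inf", "ipre", "congenital_malformation",
            "other", "pneumonia"),
  mean  = c(0.6132, 0.1467, 0.0922, 0.0701, 0.0535),
  lower = c(0.3355, 0.0220, 0.0507, 0.0701, 0.0085),
  upper = c(0.7660, 0.3979, 0.1439, 0.0701, 0.1305)
)
p <- ggplot(ensemble_csmf,
            aes(x = mean, y = cause, xmin = lower, xmax = upper)) +
  geom_pointrange()

# Ordinary ggplot2 layers can be added to any ggplot object.
p_custom <- p +
  labs(title = "Ensemble calibrated CSMF", x = "CSMF", y = "Cause") +
  theme_minimal()
p_custom
```

In a live session the same additions would be applied to ensemble_plots$ensemble in place of the stand-in p.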
For our last example, we illustrate the “compare” type of plot, which combines all of the calibrated results in a single plot:
plotVA(calib_ensemble, type = "compare", horiz = FALSE)
## $compare
A typical analysis using computer-coded verbal autopsy methods involves transforming (hopefully cleaned) VA data into a particular format and then employing an algorithm to assign causes of death and produce an estimated CSMF. In this vignette we have skipped the first step, but we would like to point users to the Python package pycrossva for more information. vacalibration provides a valuable extension to the analysis of neonate and child deaths that improves the accuracy of the population CSMF and can also leverage results from multiple algorithms. The openVA Team has taken some steps to integrate the vacalibration package into openVA, but we welcome further suggestions to improve the interoperability of this growing VA ecosystem of software. To contribute, please submit suggestions (and bug reports) via the GitHub issue tracker for the openVA package.
Pramanik, Sandipan, Scott Zeger, Dianna Blau, and Abhirup Datta. 2025. “Modeling structure and country-specific heterogeneity in misclassification matrices of verbal autopsy-based cause of death classifiers.” The Annals of Applied Statistics 19(2): 1214-1239.