Type: Package
Title: Optimize PTSD Diagnostic Criteria
Version: 0.1.0
Description: Provides tools for analyzing and optimizing PTSD (Post-Traumatic Stress Disorder) diagnostic criteria using PCL-5 (PTSD Checklist for DSM-5) data. Functions identify optimal subsets of PCL-5 items that maintain diagnostic accuracy while reducing assessment burden. Includes tools for both hierarchical (cluster-based) and non-hierarchical symptom combinations, calculation of diagnostic metrics, and comparison with standard DSM-5 criteria. Model validation is conducted using holdout and cross-validation methods to assess robustness and generalizability of the results. For more details see Weidmann et al. (2025) <doi:10.31219/osf.io/6rk72_v1>.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.1
Imports: dplyr, magrittr, rlang, stats, utils, modelr
Depends: R (≥ 3.5.0)
Suggests: DT, knitr, lattice, psych, rmarkdown, testthat (≥ 3.0.0)
VignetteBuilder: knitr
Config/testthat/edition: 3
URL: https://github.com/WeidmannL/PTSDdiag
BugReports: https://github.com/WeidmannL/PTSDdiag/issues
NeedsCompilation: no
Packaged: 2026-02-10 23:39:48 UTC; trs
Author: Laura Weidmann [aut], Tobias R. Spiller [aut, cre], Flavio A. Schüepp [aut]
Maintainer: Tobias R. Spiller <tobias.spiller@access.uzh.ch>
Repository: CRAN
Date/Publication: 2026-02-13 07:50:02 UTC

Find optimal non-hierarchical six-symptom combinations for PTSD diagnosis

Description

Identifies the three best six-symptom combinations for PTSD diagnosis where any four symptoms must be present, regardless of their cluster membership. This function implements a simplified diagnostic approach compared to the full DSM-5 criteria.

Usage

analyze_best_six_symptoms_four_required(data, score_by = "false_cases")

Arguments

data

A dataframe containing exactly 20 columns with PCL-5 item scores (output of rename_ptsd_columns). Each symptom should be scored on a 0-4 scale where:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

score_by

Character string specifying optimization criterion:

  • "false_cases": Minimize total misclassifications

  • "newly_nondiagnosed": Minimize false negatives only

Details

The function:

  1. Tests all possible combinations of 6 symptoms from the 20 PCL-5 items

  2. Requires 4 symptoms to be present (>=2 on original 0-4 scale) for diagnosis

  3. Identifies the three combinations that best match the original DSM-5 diagnosis
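The steps above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the `reference` vector is a stand-in for the DSM-5 diagnosis, the helper `diagnose()` is hypothetical, and only the first 200 of the candidate combinations are scored to keep the example fast.

```r
set.seed(1)
scores  <- matrix(sample(0:4, 50 * 20, replace = TRUE), ncol = 20)
present <- scores >= 2                  # a symptom is present if rated 2 or higher

# Stand-in reference diagnosis (assumption); the function compares against DSM-5.
reference <- rowSums(present) >= 12

# Step 1: every 6-item subset of the 20 items, one combination per column
combos <- combn(20, 6)
ncol(combos)                            # choose(20, 6) = 38760 candidates

# Step 2: under a candidate combination, at least 4 of its 6 items must be present
diagnose <- function(items) rowSums(present[, items, drop = FALSE]) >= 4

# Step 3: score candidates by both criteria (a full search would score all columns)
candidate_ids <- 1:200
false_cases <- apply(combos[, candidate_ids], 2,
                     function(items) sum(diagnose(items) != reference))
newly_nondiagnosed <- apply(combos[, candidate_ids], 2,
                            function(items) sum(reference & !diagnose(items)))

# Keep the three candidates with the fewest total misclassifications
best <- candidate_ids[order(false_cases)[1:3]]
combos[, best]
```

Ranking by `newly_nondiagnosed` instead of `false_cases` would correspond to the second optimization criterion.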

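Optimization can be based on either:

  • Minimizing total misclassifications ("false_cases")

  • Minimizing false negatives only ("newly_nondiagnosed")

The symptom clusters in PCL-5 are:

  • Items 1-5: Intrusion symptoms (Criterion B)

  • Items 6-7: Avoidance symptoms (Criterion C)

  • Items 8-14: Negative alterations in cognitions and mood (Criterion D)

  • Items 15-20: Alterations in arousal and reactivity (Criterion E)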

Value

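A list containing:

  • best_symptoms: The symptom numbers of the three best combinations

  • diagnosis_comparison: The raw data comparing the original DSM-5 diagnosis with the diagnosis under each combination

  • summary: Summary statistics for each combination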

Examples

# Create example data
ptsd_data <- data.frame(matrix(sample(0:4, 200, replace=TRUE), ncol=20))
names(ptsd_data) <- paste0("symptom_", 1:20)


# Find best combinations minimizing false cases
results <- analyze_best_six_symptoms_four_required(ptsd_data, score_by = "false_cases")

# Get symptom numbers
results$best_symptoms

# View raw comparison data
results$diagnosis_comparison

# View summary statistics
results$summary



Find optimal hierarchical six-symptom combinations for PTSD diagnosis

Description

Identifies the three best six-symptom combinations for PTSD diagnosis where four symptoms must be present and must include at least one symptom from each DSM-5 criterion cluster. This approach maintains the hierarchical structure of PTSD diagnosis while reducing the total number of required symptoms.

Usage

analyze_best_six_symptoms_four_required_clusters(
  data,
  score_by = "false_cases"
)

Arguments

data

A dataframe containing exactly 20 columns with PCL-5 item scores (output of rename_ptsd_columns). Each symptom should be scored on a 0-4 scale where:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

score_by

Character string specifying optimization criterion:

  • "false_cases": Minimize total misclassifications

  • "newly_nondiagnosed": Minimize false negatives only

Details

The function:

  1. Generates valid combinations ensuring representation from all clusters

  2. Requires 4 symptoms to be present (>=2 on original 0-4 scale) for diagnosis

  3. Validates that present symptoms include at least one from each cluster

  4. Identifies the three combinations that best match the original DSM-5 diagnosis
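The cluster checks in the steps above can be sketched as follows. This is a minimal base R illustration, not the package's code: the helpers `covers_all_clusters()` and `meets_rule()` are hypothetical, and the item-to-cluster mapping assumes the standard DSM-5 assignment (items 1-5, 6-7, 8-14, 15-20).

```r
# Assumed cluster membership of the 20 PCL-5 items (DSM-5 Criteria B to E)
clusters <- list(B = 1:5, C = 6:7, D = 8:14, E = 15:20)

# TRUE if `items` contains at least one item from every cluster
covers_all_clusters <- function(items) {
  all(vapply(clusters, function(cl) any(items %in% cl), logical(1)))
}

# Diagnosis for one respondent under a six-item combination:
# at least 4 of the items are present and the present items span all clusters
meets_rule <- function(scores, items) {
  present_items <- items[scores[items] >= 2]
  length(present_items) >= 4 && covers_all_clusters(present_items)
}

combo <- c(1, 6, 8, 15, 16, 20)
covers_all_clusters(combo)                   # valid: all four clusters represented
covers_all_clusters(c(1, 2, 3, 8, 15, 16))   # invalid: no avoidance item (6 or 7)

# One respondent with items 1, 6, 8 and 15 rated 2 or higher
respondent <- c(3, 0, 0, 0, 0, 2, 0, 4, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0)
meets_rule(respondent, combo)
```

Filtering candidate combinations with a check like `covers_all_clusters()` before scoring reduces the search space compared with the non-hierarchical variant.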

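DSM-5 PTSD symptom clusters:

  • Items 1-5: Intrusion symptoms (Criterion B)

  • Items 6-7: Avoidance symptoms (Criterion C)

  • Items 8-14: Negative alterations in cognitions and mood (Criterion D)

  • Items 15-20: Alterations in arousal and reactivity (Criterion E)

Optimization can be based on either:

  • Minimizing total misclassifications ("false_cases")

  • Minimizing false negatives only ("newly_nondiagnosed")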

Value

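A list containing:

  • best_symptoms: The symptom numbers of the three best combinations

  • diagnosis_comparison: The raw data comparing the original DSM-5 diagnosis with the diagnosis under each combination

  • summary: Summary statistics for each combination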

Examples

# Create example data
ptsd_data <- data.frame(matrix(sample(0:4, 200, replace=TRUE), ncol=20))
names(ptsd_data) <- paste0("symptom_", 1:20)


# Find best hierarchical combinations minimizing false cases
results <- analyze_best_six_symptoms_four_required_clusters(ptsd_data, score_by = "false_cases")

# Get symptom numbers
results$best_symptoms

# View raw comparison data
results$diagnosis_comparison

# View summary statistics
results$summary



Binarize PCL-5 symptom scores

Description

Converts PCL-5 symptom scores from their original 0-4 scale to binary values (0/1) based on the clinical threshold for symptom presence (>=2).

Usage

binarize_data(data)

Arguments

data

A dataframe containing exactly 20 columns with PCL-5 item scores (output of rename_ptsd_columns). Each symptom should be scored on a 0-4 scale where:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

Note: This function should only be used with raw symptom scores before calculating the total score, as it will convert all values in the dataframe to 0/1, which would invalidate any total score column if present.

Details

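The function implements the standard clinical threshold for PTSD symptom presence where:

  • Scores of 0-1 ("Not at all", "A little bit") are coded as 0 (symptom absent)

  • Scores of 2-4 ("Moderately" to "Extremely") are coded as 1 (symptom present)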
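The thresholding itself is a one-line comparison. The following base R sketch shows the idea on a tiny hypothetical data frame; it illustrates the rule and is not necessarily how binarize_data() is written.

```r
# Two hypothetical symptom columns on the original 0-4 scale
scores <- data.frame(symptom_1 = c(0, 1, 2), symptom_2 = c(3, 4, 1))

# Code each value as 1 if it is 2 or higher, otherwise 0
binary <- as.data.frame(lapply(scores, function(x) as.integer(x >= 2)))
binary
```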

Value

A dataframe with the same structure as input but with all symptom scores converted to binary values:

  • 0 = Symptom absent (original score 0-1)

  • 1 = Symptom present (original score 2-4)

Examples

# Create sample data
sample_data <- data.frame(
  matrix(sample(0:4, 20 * 10, replace = TRUE),
         nrow = 10,
         ncol = 20)
)
colnames(sample_data) <- paste0("symptom_", 1:20)

# Binarize scores
binary_data <- binarize_data(sample_data)
binary_data # Should only show 0s and 1s


Calculate PTSD total score

Description

Calculates the total PCL-5 (PTSD Checklist for DSM-5) score by summing all 20 symptom scores. The total score ranges from 0 to 80, with higher scores indicating greater symptom severity.

Usage

calculate_ptsd_total(data)

Arguments

data

A dataframe containing standardized PCL-5 item scores (output of rename_ptsd_columns). Each symptom should be scored on a 0-4 scale where:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

Details

The total score is the row-wise sum of the 20 PCL-5 item scores.
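For illustration, the same total can be computed in base R with rowSums(). This is a sketch on random example data and assumes the columns are already named symptom_1 to symptom_20.

```r
set.seed(1)
scores <- data.frame(matrix(sample(0:4, 20 * 5, replace = TRUE), ncol = 20))
names(scores) <- paste0("symptom_", 1:20)

# Sum the 20 item columns per row and store the result in a new column
scores$total <- rowSums(scores[paste0("symptom_", 1:20)])
range(scores$total)   # always within 0 and 80
```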

Value

A dataframe with all original columns plus an additional column "total" containing the sum of all 20 symptom scores (range: 0-80)

Examples

# Create sample data
sample_data <- data.frame(
  matrix(sample(0:4, 20 * 10, replace = TRUE),
         nrow = 10,
         ncol = 20)
)
colnames(sample_data) <- paste0("symptom_", 1:20)

# Calculate total scores
scores_with_total <- calculate_ptsd_total(sample_data)
print(scores_with_total$total)


Determine PTSD diagnosis based on DSM-5 criteria using binarized scores

Description

Determines whether DSM-5 diagnostic criteria for PTSD are met using binarized symptom scores (0/1) for PCL-5 items. This is an alternative to create_ptsd_diagnosis_nonbinarized() that binarizes the item scores internally before applying the criteria.

Usage

create_ptsd_diagnosis_binarized(data)

Arguments

data

A dataframe containing exactly 20 columns of PCL-5 item scores (output of rename_ptsd_columns) named symptom_1 to symptom_20. Each symptom should be scored on a 0-4 scale where:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

Note: This function should only be used with raw symptom scores (output of rename_ptsd_columns) and not with data containing a total score column, as the internal binarization process would invalidate the total score.

Details

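The function applies the DSM-5 diagnostic criteria for PTSD using binary indicators of symptom presence:

  • Criterion B (Intrusion): At least 1 present symptom from items 1-5

  • Criterion C (Avoidance): At least 1 present symptom from items 6-7

  • Criterion D (Negative alterations in cognitions and mood): At least 2 present symptoms from items 8-14

  • Criterion E (Arousal and reactivity): At least 2 present symptoms from items 15-20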

Value

A dataframe with a single column "PTSD_orig" containing TRUE/FALSE values indicating whether DSM-5 diagnostic criteria are met based on binarized scores

Examples

# Create sample data
sample_data <- data.frame(
  matrix(sample(0:4, 20 * 10, replace = TRUE),
         nrow = 10,
         ncol = 20)
)
colnames(sample_data) <- paste0("symptom_", 1:20)

# Get diagnosis using binarized approach
diagnosis_results <- create_ptsd_diagnosis_binarized(sample_data)
diagnosis_results$PTSD_orig


Determine PTSD diagnosis based on DSM-5 criteria using non-binarized scores

Description

Determines whether DSM-5 diagnostic criteria for PTSD are met based on PCL-5 item scores, using the original non-binarized values (0-4 scale).

Usage

create_ptsd_diagnosis_nonbinarized(data)

Arguments

data

A dataframe that can be either:

  • Output of rename_ptsd_columns(): 20 columns named symptom_1 to symptom_20

  • Output of calculate_ptsd_total(): 21 columns including symptom_1 to symptom_20 plus a 'total' column

Each symptom should be scored on a 0-4 scale where:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

Details

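The function applies the DSM-5 diagnostic criteria for PTSD:

  • Criterion B (Intrusion): At least 1 symptom from items 1-5

  • Criterion C (Avoidance): At least 1 symptom from items 6-7

  • Criterion D (Negative alterations in cognitions and mood): At least 2 symptoms from items 8-14

  • Criterion E (Arousal and reactivity): At least 2 symptoms from items 15-20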

A symptom is considered present when rated 2 (Moderately) or higher.
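The rule can be sketched in base R as below. The helper `dsm5_rule()` is hypothetical and assumes the standard DSM-5 thresholds (at least one symptom from items 1-5, one from items 6-7, two from items 8-14, and two from items 15-20); it is meant as an illustration rather than the package's own code.

```r
dsm5_rule <- function(data) {
  # Select the item columns by name so that an extra 'total' column is ignored
  present <- as.matrix(data[paste0("symptom_", 1:20)]) >= 2
  rowSums(present[, 1:5,   drop = FALSE]) >= 1 &   # Criterion B
    rowSums(present[, 6:7,   drop = FALSE]) >= 1 & # Criterion C
    rowSums(present[, 8:14,  drop = FALSE]) >= 2 & # Criterion D
    rowSums(present[, 15:20, drop = FALSE]) >= 2   # Criterion E
}

# Three hypothetical respondents: no symptoms, all items rated 2,
# and one respondent with only a single Criterion E symptom
third <- rep(0, 20)
third[c(1, 6, 8, 9, 15)] <- 3
example <- as.data.frame(rbind(rep(0, 20), rep(2, 20), third))
names(example) <- paste0("symptom_", 1:20)

example$PTSD_Diagnosis <- dsm5_rule(example)
example$PTSD_Diagnosis
```

The third respondent is not diagnosed because Criterion E requires two present symptoms and only item 15 is rated 2 or higher.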

Value

A dataframe with all original columns (including 'total' if present) plus an additional column "PTSD_Diagnosis" containing TRUE/FALSE values indicating whether DSM-5 diagnostic criteria are met

Examples

# Example with output from rename_ptsd_columns
sample_data1 <- data.frame(
  matrix(sample(0:4, 20 * 10, replace = TRUE),
         nrow = 10,
         ncol = 20)
)
colnames(sample_data1) <- paste0("symptom_", 1:20)
diagnosed_data1 <- create_ptsd_diagnosis_nonbinarized(sample_data1)

# Check diagnosis results
diagnosed_data1$PTSD_Diagnosis

# Example with output from calculate_ptsd_total
sample_data2 <- calculate_ptsd_total(sample_data1)
diagnosed_data2 <- create_ptsd_diagnosis_nonbinarized(sample_data2)

# Check diagnosis results
diagnosed_data2$PTSD_Diagnosis


Create readable summary of PTSD diagnostic changes

Description

Formats the output of summarize_ptsd_changes() into a more readable table with proper labels and formatting of percentages and metrics.

Usage

create_readable_summary(summary_stats)

Arguments

summary_stats

A dataframe output from summarize_ptsd_changes() containing raw diagnostic metrics and counts

Details

Reformats the diagnostic metrics into a presentation-ready format:

Value

A formatted dataframe with the following columns:

Examples

# Using the output from summarize_ptsd_changes
n_cases <- 100
sample_data <- data.frame(
  PTSD_orig = sample(c(TRUE, FALSE), n_cases, replace = TRUE),
  PTSD_alt1 = sample(c(TRUE, FALSE), n_cases, replace = TRUE)
)

# Generate and format summary
diagnostic_metrics <- summarize_ptsd_changes(sample_data)
readable_summary <- create_readable_summary(diagnostic_metrics)
print(readable_summary)


Perform k-fold cross-validation for PTSD diagnostic models

Description

Validates PTSD diagnostic models using k-fold cross-validation to assess generalization performance and identify stable symptom combinations.

Usage

cross_validation(data, k = 5, score_by = "newly_nondiagnosed", seed = 123)

Arguments

data

A dataframe containing exactly 20 columns with PCL-5 item scores (output of rename_ptsd_columns). Each symptom should be scored on a 0-4 scale.

k

Number of folds for cross-validation (default: 5)

score_by

Character string specifying optimization criterion:

  • "false_cases": Minimize total misclassifications

  • "newly_nondiagnosed": Minimize false negatives only (default)

seed

Integer for random number generation reproducibility (default: 123)

Details

The function:

  1. Splits data into k folds

  2. For each fold, trains on k-1 folds and tests on the held-out fold

  3. Identifies symptom combinations that appear across multiple folds

  4. Calculates average performance metrics for repeated combinations
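The fold logic above can be sketched in base R. This is an illustration under assumptions: the package may use a different splitting routine, and `best_per_fold` holds made-up combinations standing in for the winners found in each training set.

```r
set.seed(123)
n <- 200
k <- 5

# Step 1: assign each row to one of k folds of (nearly) equal size
fold_id <- sample(rep(seq_len(k), length.out = n))
table(fold_id)

# Step 2: for each fold, the remaining k-1 folds form the training set
test_sizes <- integer(k)
for (fold in seq_len(k)) {
  train_rows <- which(fold_id != fold)
  test_rows  <- which(fold_id == fold)
  # ... search for the best combinations on train_rows, evaluate on test_rows ...
  test_sizes[fold] <- length(test_rows)
}

# Step 3: find combinations that win in more than one fold (hypothetical winners)
best_per_fold <- list(
  c(1, 6, 8, 15, 16, 20),
  c(2, 6, 9, 15, 17, 20),
  c(1, 6, 8, 15, 16, 20),
  c(3, 7, 10, 12, 18, 19),
  c(1, 6, 8, 15, 16, 20)
)
keys <- vapply(best_per_fold, function(x) paste(sort(x), collapse = "_"), character(1))
repeated <- names(which(table(keys) > 1))
repeated
```

Performance metrics would then be averaged over the folds in which each repeated combination was selected.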

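Two models are evaluated:

  • Model without cluster representation: any four of six symptoms must be present (see analyze_best_six_symptoms_four_required)

  • Model with cluster representation: four of six symptoms must be present, including at least one from each DSM-5 cluster (see analyze_best_six_symptoms_four_required_clusters)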

Value

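A list containing:

  • without_clusters: Results for the model without cluster representation, including summary_by_fold (summary for each fold) and combinations_summary (combinations that appeared in multiple folds)

  • with_clusters: The corresponding results for the model with cluster representation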

Examples

# Create sample data
set.seed(42)
sample_data <- data.frame(
  matrix(sample(0:4, 20 * 200, replace = TRUE),
         nrow = 200,
         ncol = 20)
)
colnames(sample_data) <- paste0("symptom_", 1:20)


# Perform 5-fold cross-validation
cv_results <- cross_validation(sample_data, k = 5)

# View summary for each fold
cv_results$without_clusters$summary_by_fold

# View combinations that appeared multiple times
cv_results$without_clusters$combinations_summary



Perform holdout validation for PTSD diagnostic models

Description

Validates PTSD diagnostic models using a train-test split approach (holdout validation). Trains the model on a portion of the data and evaluates performance on the held-out test set.

Usage

holdout_validation(
  data,
  train_ratio = 0.7,
  score_by = "newly_nondiagnosed",
  seed = 123
)

Arguments

data

A dataframe containing exactly 20 columns with PCL-5 item scores (output of rename_ptsd_columns). Each symptom should be scored on a 0-4 scale.

train_ratio

Numeric between 0 and 1 indicating proportion of data for training (default: 0.7 for 70/30 split)

score_by

Character string specifying optimization criterion:

  • "false_cases": Minimize total misclassifications

  • "newly_nondiagnosed": Minimize false negatives only (default)

seed

Integer for random number generation reproducibility (default: 123)

Details

The function:

  1. Splits data into training (70% by default) and test (30%) sets

  2. Finds optimal symptom combinations on training data

  3. Evaluates these combinations on test data

  4. Compares results to original DSM-5 diagnoses
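The split in step 1 can be sketched in base R as follows. The variable names are illustrative and the package may draw its split differently; the point is that the test rows are never seen while searching for combinations.

```r
set.seed(123)
n <- 200
train_ratio <- 0.7

# Randomly pick the training rows; everything else is held out for testing
train_rows <- sample(seq_len(n), size = round(train_ratio * n))
test_rows  <- setdiff(seq_len(n), train_rows)

length(train_rows)   # 140 rows used to find the best combinations
length(test_rows)    # 60 rows used only for evaluation
```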

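Two models are evaluated:

  • Model without cluster representation: any four of six symptoms must be present (see analyze_best_six_symptoms_four_required)

  • Model with cluster representation: four of six symptoms must be present, including at least one from each DSM-5 cluster (see analyze_best_six_symptoms_four_required_clusters)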

Value

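A list containing:

  • without_clusters: Results for the model without cluster representation, including a summary element

  • with_clusters: Results for the model with cluster representation, including a summary element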

Examples

# Create sample data
set.seed(42)
sample_data <- data.frame(
  matrix(sample(0:4, 20 * 200, replace = TRUE),
         nrow = 200,
         ncol = 20)
)
colnames(sample_data) <- paste0("symptom_", 1:20)


# Perform holdout validation
validation_results <- holdout_validation(sample_data, train_ratio = 0.7)

# Access results
validation_results$without_clusters$summary
validation_results$with_clusters$summary



Rename PTSD symptom (= PCL-5 item) columns

Description

Standardizes column names in PCL-5 (PTSD Checklist for DSM-5) data by renaming them to a consistent format (symptom_1 through symptom_20). This standardization is essential for subsequent analyses using other functions in the package.

Usage

rename_ptsd_columns(data)

Arguments

data

A dataframe containing exactly 20 columns, where each column represents a PCL-5 item score. The scores should be on a 0-4 scale where:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

Details

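The function assumes the input data contains exactly 20 columns corresponding to the 20 items of the PCL-5. The columns are renamed sequentially from symptom_1 to symptom_20, maintaining their original order. The PCL-5 items correspond to different symptom clusters:

  • symptom_1 to symptom_5: Intrusion symptoms (Criterion B)

  • symptom_6 to symptom_7: Avoidance symptoms (Criterion C)

  • symptom_8 to symptom_14: Negative alterations in cognitions and mood (Criterion D)

  • symptom_15 to symptom_20: Alterations in arousal and reactivity (Criterion E)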

Value

A dataframe with the same data but renamed columns following the pattern 'symptom_1' through 'symptom_20'

Examples

# Example with a sample PCL-5 dataset
sample_data <- data.frame(
  matrix(sample(0:4, 20 * 10, replace = TRUE),
         nrow = 10,
         ncol = 20)
)
renamed_data <- rename_ptsd_columns(sample_data)
colnames(renamed_data)  # Shows new column names


Simulated PCL-5 (PTSD Checklist) Data

Description

A dataset containing simulated responses from 5,000 patients on the PCL-5 (PTSD Checklist for DSM-5). Each patient rated 20 PTSD symptoms on a scale from 0 to 4.

Usage

simulated_ptsd

Format

A data frame with 5,000 rows and 20 columns:

S1

Intrusive memories

S2

Nightmares

S3

Flashbacks

S4

Emotional reactivity to reminders

S5

Physical reactions to reminders

S6

Avoiding memories/thoughts/feelings

S7

Avoiding external reminders

S8

Amnesia

S9

Strong negative beliefs

S10

Distorted blame

S11

Negative trauma-related emotions

S12

Decreased interest in activities

S13

Detachment or estrangement

S14

Trouble experiencing positive emotions

S15

Irritability/aggression

S16

Risk-taking behavior

S17

Hypervigilance

S18

Heightened startle reaction

S19

Difficulty concentrating

S20

Sleep problems

Details

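The symptoms are rated on a 5-point scale:

  • 0 = Not at all

  • 1 = A little bit

  • 2 = Moderately

  • 3 = Quite a bit

  • 4 = Extremely

The symptoms correspond to DSM-5 PTSD criteria:

  • Criterion B (Intrusion): S1-S5

  • Criterion C (Avoidance): S6-S7

  • Criterion D (Negative alterations in cognitions and mood): S8-S14

  • Criterion E (Arousal and reactivity): S15-S20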

Source

Simulated data for demonstration purposes


Summarize PTSD scores and diagnoses

Description

Creates a summary of PCL-5 total scores and PTSD diagnoses, including mean total score, standard deviation, and number of positive diagnoses.

Usage

summarize_ptsd(data)

Arguments

data

A dataframe containing at minimum:

  • A 'total' column with PCL-5 total scores (from calculate_ptsd_total)

  • A 'PTSD_Diagnosis' column with TRUE/FALSE values (from create_ptsd_diagnosis_nonbinarized)

Details

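This function calculates key summary statistics for PCL-5 data:

  • Mean of the PCL-5 total score

  • Standard deviation of the PCL-5 total score

  • Number of positive PTSD diagnoses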

Value

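A dataframe with one row containing the mean total score, the standard deviation of the total score, and the number of positive PTSD diagnoses.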

Examples

# Create sample data
sample_data <- data.frame(
  total = sample(0:80, 100, replace = TRUE),
  PTSD_Diagnosis = sample(c(TRUE, FALSE), 100, replace = TRUE)
)

# Generate summary statistics
summary_stats <- summarize_ptsd(sample_data)
print(summary_stats)


Summarize changes in PTSD diagnostic metrics

Description

Compares different PTSD diagnostic criteria by calculating diagnostic accuracy metrics and changes in diagnosis status relative to a baseline criterion.

Usage

summarize_ptsd_changes(data)

Arguments

data

A dataframe where:

  • Each column represents a different diagnostic criterion

  • Must include a column named "PTSD_orig" as the baseline criterion

  • Values are logical (TRUE/FALSE) indicating whether PTSD criteria are met

  • Each row represents one case/participant

Details

The function calculates multiple diagnostic metrics comparing each diagnostic criterion to a baseline criterion (PTSD_orig):

Basic counts:

Diagnostic accuracy metrics:
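As an illustration of how counts and accuracy metrics of this kind are commonly derived from two logical columns, consider the base R sketch below. The variable names and the choice of metrics (sensitivity, specificity, PPV, NPV) are assumptions for the example and may differ from the columns the function actually returns.

```r
# Baseline diagnosis and one alternative criterion for eight hypothetical cases
baseline <- c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, FALSE, TRUE)
alt      <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,  FALSE, TRUE)

# Basic counts relative to the baseline
true_positive      <- sum(baseline & alt)
true_negative      <- sum(!baseline & !alt)
newly_diagnosed    <- sum(!baseline & alt)    # false positives
newly_nondiagnosed <- sum(baseline & !alt)    # false negatives
false_cases        <- newly_diagnosed + newly_nondiagnosed

# Common accuracy metrics derived from the counts
sensitivity <- true_positive / (true_positive + newly_nondiagnosed)
specificity <- true_negative / (true_negative + newly_diagnosed)
ppv         <- true_positive / (true_positive + newly_diagnosed)
npv         <- true_negative / (true_negative + newly_nondiagnosed)

c(sensitivity = sensitivity, specificity = specificity, ppv = ppv, npv = npv)
```

The two quantities `false_cases` and `newly_nondiagnosed` correspond to the two `score_by` options used by the optimization functions.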

Value

A dataframe containing the following columns for each diagnostic criterion:

Examples

# Create sample diagnostic data
set.seed(123)
n_cases <- 100
sample_data <- data.frame(
  PTSD_orig = sample(c(TRUE, FALSE), n_cases, replace = TRUE),
  PTSD_alt1 = sample(c(TRUE, FALSE), n_cases, replace = TRUE),
  PTSD_alt2 = sample(c(TRUE, FALSE), n_cases, replace = TRUE)
)

# Calculate diagnostic metrics
diagnostic_metrics <- summarize_ptsd_changes(sample_data)
diagnostic_metrics