Word Pairs

A sample dataframe, Word_Pairs, is included in the package. Word pairs are arrayed in two columns, one pair per row. The two columns need not be adjacent within your dataframe.

word1	word2
Dog	trumpet
the	BANANA
rat	astronaut
*&^%	lizard
bird	bird
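
If you want to try the functions on your own data, a dataframe with the same layout can be built by hand. The sketch below reproduces the five rows shown above in base R. The object name my_pairs is hypothetical, and the packaged sample may contain additional rows.

```r
# Hand-built dataframe mirroring the rows shown above.
# 'my_pairs' is a hypothetical name, not an object shipped with the package.
my_pairs <- data.frame(
  word1 = c("Dog", "the", "rat", "*&^%", "bird"),
  word2 = c("trumpet", "BANANA", "astronaut", "lizard", "bird"),
  stringsAsFactors = FALSE
)

# Inspect the structure: two character columns, one word pair per row
str(my_pairs)
```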

Clean Word Pairs in Columns Transcript

Arguments to clean_paired_cols are:
dat your raw dataframe with two columns of paired text
wordcol1 quoted column name where the first word of each pair lives
wordcol2 quoted column name where the second word of each pair lives
lemmatize whether to transform each raw word to its lemmatized form (TRUE/FALSE); default is TRUE

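library(SemanticDistance)
# Clean both word columns; id_row_orig in the output tracks the row number in the raw data
WordPairs_Clean <- clean_paired_cols(dat = Word_Pairs, wordcol1 = 'word1', wordcol2 = 'word2', lemmatize = TRUE)
knitr::kable(head(WordPairs_Clean, 6), format = "simple", digits = 2)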

id_row_orig	word1_clean	word2_clean	word1	word2
1	dog	trumpet	Dog	trumpet
2	the	banana	the	BANANA
3	rat	astronaut	rat	astronaut
5	bird	bird	bird	bird
6	shark	shark	shark	shark
8	dog	leash	Dog	leash
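
To give a rough sense of what the cleaning step does to the sample rows, here is a base R sketch. It is not the package's actual implementation. It assumes cleaning amounts to lowercasing, stripping non-letter characters, and dropping rows where either word ends up empty. Lemmatization is omitted, and the helper name toy_clean_pairs is hypothetical.

```r
# Illustrative only: a simplified stand-in for the cleaning step.
# Assumptions (not taken from the package source): lowercase, keep letters
# only, drop rows where either cleaned word is empty. No lemmatization.
toy_clean_pairs <- function(dat, wordcol1, wordcol2) {
  clean <- function(x) gsub("[^a-z]", "", tolower(x))
  out <- data.frame(
    id_row_orig = seq_len(nrow(dat)),
    word1_clean = clean(dat[[wordcol1]]),
    word2_clean = clean(dat[[wordcol2]]),
    word1 = dat[[wordcol1]],
    word2 = dat[[wordcol2]],
    stringsAsFactors = FALSE
  )
  # Keep only rows where both cleaned words are non-empty
  out[nzchar(out$word1_clean) & nzchar(out$word2_clean), ]
}

# The five sample rows shown at the top of this page
raw <- data.frame(
  word1 = c("Dog", "the", "rat", "*&^%", "bird"),
  word2 = c("trumpet", "BANANA", "astronaut", "lizard", "bird"),
  stringsAsFactors = FALSE
)
cleaned <- toy_clean_pairs(raw, "word1", "word2")
print(cleaned)
```

On the five sample rows this drops row 4 (*&^%, lizard), which matches the gap in id_row_orig in the table above.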

Word Pairs Semantic Distance

Generates semantic distances (GloVe and SD15) between the two words in each row. ‘dist_paired_cols’ takes one argument, dat: a dataframe with word pairs arrayed in columns, cleaned and prepped using ‘clean_paired_cols’. The output keeps the cleaned dataframe and adds two cosine distance columns, CosDist_SD15 and CosDist_GLO. Identical words (e.g. bird, bird) have a distance of 0.

# Append CosDist_SD15 and CosDist_GLO to the cleaned word pairs
Columns_Dists <- dist_paired_cols(dat = WordPairs_Clean)
knitr::kable(head(Columns_Dists, 6), format = "simple", digits = 2)

id_row_orig	word1_clean	word2_clean	word1	word2	CosDist_SD15	CosDist_GLO
1	dog	trumpet	Dog	trumpet	0.45	0.84
2	the	banana	the	BANANA	1.18	0.77
3	rat	astronaut	rat	astronaut	1.22	0.93
5	bird	bird	bird	bird	0.00	0.00
6	shark	shark	shark	shark	0.00	0.00
8	dog	leash	Dog	leash	0.68	0.50
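
The CosDist columns are cosine distances between word vectors. As a minimal sketch of that arithmetic, assuming cosine distance is defined as 1 minus cosine similarity, the base R function below computes it for toy vectors. The vectors are made up for illustration and are not real GloVe or SD15 embeddings. cos_dist is a hypothetical helper, not a package function.

```r
# Cosine distance = 1 - cosine similarity (assumed definition).
# 'cos_dist' is a hypothetical helper, not part of the package.
cos_dist <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy 3-dimensional vectors, invented for illustration only
v_same  <- c(0.2, 0.5, 0.1)
v_orth1 <- c(1, 0, 0)
v_orth2 <- c(0, 1, 0)

cos_dist(v_same, v_same)     # identical vectors: distance 0 (up to rounding)
cos_dist(v_orth1, v_orth2)   # orthogonal vectors: distance 1
cos_dist(v_orth1, -v_orth1)  # opposite vectors: distance 2
```

Under this assumed definition, distances range from 0 to 2, so values above 1 (such as 1.18 and 1.22 in the table) would correspond to a negative cosine similarity between the two word vectors.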

SemanticDistance_Word_Pairs

Jamie Reilly, Hannah Mechtenberg, Emily Myers, Jonathan E. Peelle

2025-08-27
