Detecting Environmental Outliers in Data Analysis Pipelines



Documentation for package ‘specleanr’ version 1.0.0

Help Pages

abdata Alburnoides bipunctatus species data from GBIF and iNaturalist
adjustboxplots Adjust the boxplots bounding fences using medcouple to flag suspicious outliers.
bestmethod Identifies the best method for outlier detection for a single species.
boots Implements bootstrapping procedures (sampling with replacement).
broad_classify Outlier detection method broad classification.
check.exclude Indicate excluded columns.
checks Post checks for PCA and bootstrapping
check_names Check species names for inconsistencies
check_packages Check for packages to install and respond to the user
classify_data Extract final clean data using either absolute or best method generated outliers.
cosine Cosine similarity index based on (Gautam & Kulkarni 2014; Joy & Renumol 2020)
datacleaner-class Outlier detection class for multiple methods
distboxplot Distribution boxplot
ecological_ranges Check for environmental outliers using species optimal ranges.
efidata EFIPLUS data used to develop ecological sensitivity parameters for riverine species in European streams and rivers.
eif Computes the empirical influence function for each value in the dataset
extentvalues To check for a bounding box
extractMethods List of outlier detection methods implemented in this package.
extractoutliers Extract outliers for one species
extract_clean_data Extract final clean data using either absolute or best method generated outliers.
geo_ranges Checks for geographic ranges from FishBase
getdata Download species records from online database.
getdiff Get dataframe from the large dataframe.
ggenvironmentalspace Plot the quality-controlled data in environmental space.
ggoutlieraccum Identify if enough methods are selected for the outlier detection.
ggoutliers Visualize the outliers identified by each method
hamming Identify best outlier detection method using Hamming distance.
hampel Flag suspicious outliers based on the Hampel filter method.
handle_true_errors Catch errors during methods implementation.
interquartile Computes interquartile range to flag environmental outliers
isoforest Identify outliers using isolation forest model.
jaccard Identifies the best outlier detection method using Jaccard coefficient.
jdsdata Joint Danube Survey Data
jknife Identifies outliers using the reverse jackknifing method based on Chapman et al. (2005).
kdat Sequential fences constants
logboxplot Log boxplot-based outlier detection.
mahal Flags outliers based on Mahalanobis distance matrix for all records.
match.argc Customized match function
match_datasets Data harmonizing for offline data based on Darwin Core terms.
medianrule Median rule method
mixediqr Mixed interquartile range and semi-interquartile range (Walker et al., 2018)
mth mth datasets with constants at each confidence interval level.
multiabsolute Identifies absolute outliers for multiple species.
multibestmethod Identify best method for outlier removal for multiple species using majority votes.
multidetect Ensemble multiple outlier detection methods.
ocindex Identifies absolute outliers and their proportions for a single species.
onesvm Identify outliers using One Class Support Vector Machines
optimal_threshold Optimize threshold for clean data extraction.
overlap Identifies best outlier detection method using Overlap coefficient.
pca Implement principal component analysis for dimension reduction
pcboot To package both principal component analysis and bootstrapping.
pred_extract Preliminary data cleaning including removing duplicates, records outside a particular basin, and NAs.
search_threshold Determine the threshold using Locally estimated or weighted Scatterplot Smoothing.
semiIQR Computes semi-interquartile range to flag suspicious outliers
seqfences Sequential fences method
show-method Set method for displaying output details after outlier detection.
smc Identify best outlier detection method using simple matching coefficient.
sorensen Identifies best outlier detection method using Sorensen Similarity Index.
thermal_ranges Collates minimum, maximum, and preferable temperatures from FishBase.
ttdata Thymallus thymallus species data from GBIF and iNaturalist
xglosh Global-Local Outlier Score from Hierarchies
xkmeans Flags outliers using kmeans clustering method
xknn k-nearest neighbors for outlier detection
xlof Flags suspicious outliers using the local outlier factor or Density-Based Spatial Clustering of Applications with Noise.
zscore Computes z-scores to flag environmental outliers.
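
Illustrative sketch

Several entries above name univariate rules (interquartile, hampel, zscore), and multidetect combines methods into an ensemble. The base R sketch below illustrates that general idea only. It does not call specleanr and is not the package's implementation: the helper names (flag_iqr, flag_hampel, flag_zscore), the thresholds, and the toy temperature values are all assumptions made for illustration.

```r
# Toy water-temperature values (hypothetical data); 35.0 is a planted anomaly.
temp <- c(11.2, 12.5, 10.8, 13.1, 12.0, 11.7, 35.0, 12.3, 11.9, 12.8)

# Interquartile rule: flag values more than k * IQR beyond the quartiles.
flag_iqr <- function(x, k = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Hampel filter: flag values more than k scaled MADs from the median.
flag_hampel <- function(x, k = 3) {
  abs(x - stats::median(x)) > k * stats::mad(x)
}

# Z-score: flag values more than `cutoff` standard deviations from the mean.
# The cutoff is kept below 3 here because, in a sample of 10, a single
# extreme value inflates the standard deviation and caps its own z-score.
flag_zscore <- function(x, cutoff = 2.5) {
  abs(x - mean(x)) / stats::sd(x) > cutoff
}

# One logical column per method, one row per record.
flags <- cbind(
  iqr    = flag_iqr(temp),
  hampel = flag_hampel(temp),
  zscore = flag_zscore(temp)
)

# Count how many methods flag each record and keep those flagged by all.
votes <- rowSums(flags)
suspect <- temp[votes == ncol(flags)]
print(suspect)
```

Counting votes across methods, rather than trusting a single rule, makes the result less sensitive to any one method's assumptions. The median-based Hampel rule, for example, is less affected by the anomaly itself than the mean-based z-score.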