Type: Package
Title: Convert and Impute Dates to ISO Standard ("International Organization for Standardization")
Version: 1.1.1
URL: https://github.com/andzoluk
Language: en-US
Description: Provides functions to convert and impute date values to the ISO 8601 standard format. The package automatically recognizes date patterns within a data frame and transforms them into consistent ISO-formatted dates. It also supports imputing missing month or day components in partial date strings using user-defined rules. Only one date format can be applied within a single data frame column.
License: MIT + file LICENSE
Encoding: UTF-8
Imports: stringr, lubridate, data.table, dplyr
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
RoxygenNote: 7.3.1
NeedsCompilation: no
Packaged: 2025-11-18 11:18:31 UTC; Andrzejewski
Author: Lukasz Andrzejewski [aut, cre]
Maintainer: Lukasz Andrzejewski <lukasz.coding@gmail.com>
Repository: CRAN
Date/Publication: 2025-11-18 12:30:02 UTC

Check if Day Component is Valid in DMY Dates

Description

This function checks whether the day component in a vector of date strings is valid, i.e., not exceeding the maximum number of days for the given month and year. It returns a logical vector indicating which elements have a correctly specified day.
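The validity rule above can be illustrated with a minimal base-R sketch. It assumes day-month-year strings whose three components are numeric and split by a fixed separator, and it works on a plain character vector rather than on data_frame and column_name. The helper names days_in_month() and check_day_dmy_sketch() are hypothetical and are not part of the package.

```r
# Hypothetical helper: number of days in a month, accounting for leap years.
days_in_month <- function(year, month) {
  leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
  c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)[month] + (month == 2 & leap)
}

# Hypothetical sketch: TRUE where the day of a "dd-mm-yyyy" string is valid.
check_day_dmy_sketch <- function(x, separator = "-") {
  parts <- strsplit(as.character(x), separator, fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3) return(FALSE)
    nums <- suppressWarnings(as.integer(p))  # placeholders such as "UN" become NA
    if (anyNA(nums)) return(FALSE)
    day <- nums[1]; month <- nums[2]; year <- nums[3]
    if (month < 1 || month > 12) return(FALSE)
    day >= 1 && day <= days_in_month(year, month)
  }, logical(1))
}

check_day_dmy_sketch(c("29-02-2024", "29-02-2023", "31-04-2024", "15-10-2024"))
```

For a YMD column the same idea applies with the year taken from the first component and the day from the third.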

Usage

check_day_correctly_entered_dmy(data_frame, column_name, separator = "-")

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

Value

A logical vector

Author(s)

Lukasz Andrzejewski


Check if Day Component is Valid in YMD Dates

Description

This function checks whether the day component in a vector of date strings is valid, i.e., not exceeding the maximum number of days for the given month and year. It returns a logical vector indicating which elements have a correctly specified day.

Usage

check_day_correctly_entered_ymd(data_frame, column_name, separator = "-")

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

Value

A logical vector

Author(s)

Lukasz Andrzejewski


Check if a Vector Contains a Month and Year

Description

This function determines whether the elements of a vector contain a **month** and **year** in the specified order. It returns a logical vector indicating which elements meet this criterion.

Usage

check_if_month_year_entered(data_frame, column_name, separator = "-")

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

Value

A logical vector


Check if a Vector Contains only Year

Description

This function determines whether the elements of a vector contain only a **year**. It returns a logical vector indicating which elements meet this criterion.

Usage

check_if_only_year_entered(
  data_frame,
  column_name,
  separator = "-",
  month = "UNK",
  day = "UN"
)

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

month

placeholder that marks an unknown month; by default "UNK"

day

placeholder that marks an unknown day; by default "UN"

Value

A logical vector
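One way to read the year-only rule is sketched below in base R: after splitting on separator, exactly one component must be a four-digit year and every other component must equal the month or day placeholder. That reading, and the name is_only_year_sketch(), are assumptions; the package may treat other layouts differently.

```r
# Hypothetical sketch: TRUE where only the year is known.
is_only_year_sketch <- function(x, separator = "-", month = "UNK", day = "UN") {
  parts <- strsplit(as.character(x), separator, fixed = TRUE)
  vapply(parts, function(p) {
    is_year <- grepl("^[0-9]{4}$", p)
    is_placeholder <- p %in% c(month, day)
    # Exactly one four-digit year; every other component is a placeholder.
    sum(is_year) == 1 && all(is_year | is_placeholder)
  }, logical(1))
}

is_only_year_sketch(c("2024", "2024-UNK-UN", "UN-UNK-2024", "2024-10-UN"))
```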

Author(s)

Lukasz Andrzejewski


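Check if a Vector Contains a Complete Date

Description

This function determines whether the elements of a vector contain a complete date, i.e., a **year**, **month** and **day** in the order given by date_format. It returns a logical vector indicating which elements meet this criterion.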

Usage

check_if_year_month_day_entered(
  data_frame,
  column_name,
  separator = "-",
  date_format = "ymd",
  year = "UNKN",
  month = "UNK",
  day = "UN"
)

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

date_format

order of the date components: "ymd" (year, month, day; the default) or "dmy" (day, month, year)

year

placeholder that marks an unknown year; by default "UNKN"

month

placeholder that marks an unknown month; by default "UNK"

day

placeholder that marks an unknown day; by default "UN"

Value

A logical vector
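A presence check of this kind can be sketched in base R. It assumes that a complete date has three components separated by separator, a four-digit year in the position implied by date_format, and one- or two-digit numeric month and day values. is_complete_date_sketch() is a hypothetical name; it does not take the year, month and day placeholders, because any placeholder simply fails the numeric test.

```r
# Hypothetical sketch: TRUE where year, month and day are all present.
is_complete_date_sketch <- function(x, separator = "-", date_format = "ymd") {
  date_format <- match.arg(date_format, c("ymd", "dmy"))
  year_pos <- if (date_format == "ymd") 1 else 3  # where the year sits
  parts <- strsplit(as.character(x), separator, fixed = TRUE)
  vapply(parts, function(p) {
    length(p) == 3 &&
      grepl("^[0-9]{4}$", p[year_pos]) &&
      all(grepl("^[0-9]{1,2}$", p[-year_pos]))
  }, logical(1))
}

is_complete_date_sketch(c("2024-10-21", "2024-UNK-21", "2024-10"))
```

It checks the shape of each value only; whether the month is at most 12 and the day exists in that month is a separate check.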

Author(s)

Lukasz Andrzejewski


Check if a Vector Contains a Year and Month

Description

This function determines whether the elements of a vector contain a **year** and **month** in the specified order. It returns a logical vector indicating which elements meet this criterion.

Usage

check_if_year_month_entered(data_frame, column_name, separator = "-")

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

Value

A logical vector

Author(s)

Lukasz Andrzejewski


Return TRUE if the date format is DMY

Description

Return TRUE if DMY (day-month-year) is the most probable date format.

Usage

choose_dmy_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if DMY is the most probable date format

Author(s)

Lukasz Andrzejewski


Return TRUE if the date format is DYM

Description

Return TRUE if DYM (day-year-month) is the most probable date format.

Usage

choose_dym_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if DYM is the most probable date format

Author(s)

Lukasz Andrzejewski


Return TRUE if the date format is MDY

Description

Return TRUE if MDY (month-day-year) is the most probable date format.

Usage

choose_mdy_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if MDY is the most probable date format

Author(s)

Lukasz Andrzejewski


Return TRUE if the date format is MYD

Description

Return TRUE if MYD (month-year-day) is the most probable date format.

Usage

choose_myd_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if MYD is the most probable date format

Author(s)

Lukasz Andrzejewski


Return TRUE if the date format is YDM

Description

Return TRUE if YDM (year-day-month) is the most probable date format.

Usage

choose_ydm_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if YDM is the most probable date format

Author(s)

Lukasz Andrzejewski


Return TRUE if the date format is YMD

Description

Return TRUE if YMD (year-month-day) is the most probable date format.

Usage

choose_ymd_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if YMD is the most probable date format

Author(s)

Lukasz Andrzejewski


Prepare and normalize date-like strings before YMD conversion

Description

This function applies a series of cleaning and normalization steps to strings representing dates. It is intended for use before parsing dates into a YMD (year–month–day) format. The function standardizes month names, trims whitespace, removes invalid characters, and handles strings that contain a letter "T" (common in timestamp formats).
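The normalization steps described above can be approximated in base R as follows. This is a sketch, not the package's implementation: it lowercases the input, assumes English month names (base R's month.name and month.abb), strips a time part only when a "T" sits between two digits, and keeps only letters, digits, "/", "-" and spaces. The function name clean_date_sketch() is hypothetical.

```r
# Hypothetical sketch of a date-string cleaner.
clean_date_sketch <- function(df_column) {
  # Coerce, trim surrounding whitespace and lower-case the input.
  x <- tolower(trimws(as.character(df_column)))
  # Replace full English month names with their three-letter abbreviations.
  for (i in seq_along(month.name)) {
    x <- gsub(tolower(month.name[i]), tolower(month.abb[i]), x, fixed = TRUE)
  }
  # Drop a time part only when "t" sits between digits, as in ISO timestamps.
  x <- sub("([0-9])t[0-9].*$", "\\1", x)
  # Keep only letters, digits, "/", space and "-".
  x <- gsub("[^a-z0-9/ -]", "", x)
  # Cap the length at 12 characters.
  substr(x, 1, 12)
}

clean_date_sketch(c("2024-01-10T15:30:00", "2024 AUGUST 12", "20250101"))
```

Requiring a digit on each side of "T" keeps month names such as "October", which also contain a "t", intact.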

Usage

clean_date(df_column)

Arguments

df_column

A character vector or data frame column containing raw date-like strings to be cleaned.

Details

The processing includes:

Value

A character vector of cleaned date strings, with a maximum length of 12 characters, trimmed of whitespace, and with any timestamp-like "T" components removed when appropriate.

Author(s)

Lukasz Andrzejewski

Examples

clean_date(c("2024-01-10T15:30:00", "2024 AUGUST 12", "20250101"))

Recognize Date Variables and Convert Them to the ISO Standard ("International Organization for Standardization")

Description

Recognizes date variables and converts them to the ISO 8601 format (yyyy-mm-dd).

Usage

dfiso(df)

Arguments

df

a data frame or one or more variables, for example data.frame(date=c("12-Mar-2021","01-Jan-2023"))

Value

dates formatted to ISO standard (yyyy-mm-dd)

Author(s)

Lukasz Andrzejewski

Examples

# data frame with different formatted dates
dfiso(data.frame(date1=c("13-02-2022","13/Feb/2022","13-Feb-2022")))


Find DMY dates only

Description

Find DMY dates only

Usage

find_dmy_date_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if date format is DMY

Author(s)

Lukasz Andrzejewski


Find DYM dates only

Description

Find DYM dates only

Usage

find_dym_date_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if date format is DYM

Author(s)

Lukasz Andrzejewski


Find MDY dates only

Description

Find MDY dates only

Usage

find_mdy_date_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if date format is MDY

Author(s)

Lukasz Andrzejewski


Find MYD dates only

Description

Find MYD dates only

Usage

find_myd_date_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if date format is MYD

Author(s)

Lukasz Andrzejewski


Return TRUE if a data frame column or vector contains dates

Description

Return TRUE if a data frame column or vector contains dates

Usage

find_only_dates(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if the number of characters is greater than 5 and the value contains digits and special characters or month names

Author(s)

Lukasz Andrzejewski
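The documented rule can be written as a vectorised base-R predicate. The grouping used here (more than 5 characters, at least one digit, and either a dash, slash or space or an English month name) is one reading of the Value sentence above, and looks_like_date_sketch() is a hypothetical name.

```r
# Hypothetical sketch of the "looks like a date" rule.
looks_like_date_sketch <- function(df_column) {
  x <- tolower(as.character(df_column))
  months <- paste(c(tolower(month.name), tolower(month.abb)), collapse = "|")
  long_enough <- !is.na(x) & nchar(x) > 5
  has_digit <- grepl("[0-9]", x)
  has_separator_or_month <- grepl("[-/ ]", x) | grepl(months, x)
  long_enough & has_digit & has_separator_or_month
}

looks_like_date_sketch(c("12-Mar-2021", "12Mar2021", "20250101", "abc", NA))
```

Under this reading a compact numeric string such as "20250101" is not flagged, because it has neither a separator nor a month name.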


Find unknown dates, marked as UN or UNK

Description

Find unknown dates, marked as UN or UNK

Usage

find_unknow_date(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if the string "un" is found other than as part of "jun"

Author(s)

Lukasz Andrzejewski
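Reading the rule as matching any "un" that is not part of "jun", it can be sketched in base R as below. Deleting "jun" before the search is an assumption about the intended behaviour; it keeps a value such as "UN-Jun-2024" (unknown day in June) flagged. is_unknown_date_sketch() is a hypothetical name.

```r
# Hypothetical sketch: TRUE where an "un" placeholder occurs outside "jun".
is_unknown_date_sketch <- function(df_column) {
  x <- tolower(as.character(df_column))
  # Remove "jun" first so that June is not mistaken for an unknown component.
  grepl("un", gsub("jun", "", x, fixed = TRUE), fixed = TRUE)
}

is_unknown_date_sketch(c("2024-UNK-UN", "UN-Jun-2024", "12-Jun-2024", "2024-10-21"))
```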


Find YDM dates only

Description

Find YDM dates only

Usage

find_ydm_date_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if date format is YDM

Author(s)

Lukasz Andrzejewski


Find YMD dates only

Description

Find YMD dates only

Usage

find_ymd_date_format(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

logical vector, TRUE if date format is YMD

Author(s)

Lukasz Andrzejewski


Replace full month names with abbreviated month names

Description

Replace full month names with abbreviated month names

Usage

get_abbreviated_month_name(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

vector in which any full-length month names are replaced with abbreviated month names

Author(s)

Lukasz Andrzejewski


Get a vector with full month names separated by vertical bars

Description

Get a vector with full month names separated by vertical bars

Usage

get_full_name_months_sep_by_vertical_bar()

Value

full names of months separated by vertical bars

Author(s)

Lukasz Andrzejewski


Score each of the date formats ymd, ydm, dmy, dym, mdy and myd and return only the highest score

Description

Score each of the date formats ymd, ydm, dmy, dym, mdy and myd and return only the highest score

Usage

get_max_score_within_data_formats(df_column)

Arguments

df_column

data frame date column or vector with dates

Value

the score of the most probable date format

Author(s)

Lukasz Andrzejewski
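How the scores are computed is not documented here. A plausible sketch, assuming that a format's score is the number of elements that form a valid calendar date when read in that order, is shown below. It handles numeric components only and splits on "-", "/", "." or spaces; the name score_formats_sketch() is hypothetical.

```r
# Hypothetical sketch: score each component order by the number of valid dates.
score_formats_sketch <- function(x) {
  orders <- c("ymd", "ydm", "dmy", "dym", "mdy", "myd")
  parts <- strsplit(as.character(x), "[-/. ]+")  # assumed set of separators
  is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0

  # TRUE if the three numeric components form a valid date in the given order.
  valid_in_order <- function(p, order) {
    if (length(p) != 3 || !all(grepl("^[0-9]+$", p))) return(FALSE)
    roles <- strsplit(order, "")[[1]]  # e.g. c("d", "m", "y")
    y_txt <- p[roles == "y"]
    y <- as.integer(y_txt)
    m <- as.integer(p[roles == "m"])
    d <- as.integer(p[roles == "d"])
    if (nchar(y_txt) != 4 || m < 1 || m > 12) return(FALSE)
    last_day <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)[m] + (m == 2 && is_leap(y))
    d >= 1 && d <= last_day
  }

  # Score = number of elements that are valid dates in each order.
  vapply(orders, function(o) sum(vapply(parts, valid_in_order, logical(1), order = o)),
         integer(1))
}

scores <- score_formats_sketch(c("21-10-2024", "05-03-2024", "31-12-2023"))
scores
max(scores)               # the highest score
names(which.max(scores))  # the most probable order
```

The most probable order obtained this way is comparable to what the choose_*_format() functions describe, although their exact scoring may differ.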


List month names: full names and abbreviated names in lower case

Description

List month names: full names and abbreviated names in lower case

Usage

get_months()

Value

full names and abbreviations of months

Author(s)

Lukasz Andrzejewski


List month names: full names in lower case

Description

List month names: full names in lower case

Usage

get_months_full_names()

Value

full names of months

Author(s)

Lukasz Andrzejewski


Get a vector with full and abbreviated month names separated by vertical bars

Description

Get a vector with full and abbreviated month names separated by vertical bars

Usage

get_months_sep_by_vertical_bar()

Value

full names and abbreviations of months separated by vertical bars

Author(s)

Lukasz Andrzejewski


Function to find the number of occurrences of a symbol in dates

Description

Function to find the number of occurrences of a symbol in dates

Usage

get_number_of_symbols_in_string(df_column, symbol = "T")

Arguments

df_column

data frame date column or vector with dates

symbol

symbol that needs to be found, by default "T"

Value

number of occurrences of symbol

Author(s)

Lukasz Andrzejewski
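Counting occurrences can be done with base R's gregexpr(). The sketch below assumes that symbol is matched literally and case-sensitively; count_symbol_sketch() is a hypothetical name.

```r
# Hypothetical sketch: count literal occurrences of `symbol` in each string.
count_symbol_sketch <- function(df_column, symbol = "T") {
  hits <- gregexpr(symbol, as.character(df_column), fixed = TRUE)
  # gregexpr() reports -1 when there is no match, so count positive positions only.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

count_symbol_sketch(c("2024-01-10T15:30:00", "2024-10-21", "T1T2"))
```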


Function returning observations with up to 12 characters

Description

Function returning observations with up to 12 characters

Usage

get_up_to_12_char(df_column)

Arguments

df_column

data frame column or vector from which observations of up to 12 characters are extracted

Value

return up to 12 characters

Author(s)

Lukasz Andrzejewski


Function returning special characters and month names separated by vertical bars

Description

Function returning special characters and month names separated by vertical bars

Usage

has_dash_or_slash_or_white_space_characters_or_months_separated_by_vertical_bar()

Value

special characters and months: "-|\/|\w+\s+|january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
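The value above is a regular expression formed by joining the separators with the full and abbreviated month names. Assuming it is assembled from English month names in this order (the construction itself is not shown here), base R's month.name and month.abb reproduce the same string; the variable names below are hypothetical.

```r
# Base R ships English month names in month.name and month.abb.
months_pattern <- paste(c(tolower(month.name), tolower(month.abb)), collapse = "|")
# Separators first (dash, slash, word followed by whitespace), then month names.
date_hint_pattern <- paste(c("-", "\\/", "\\w+\\s+", months_pattern), collapse = "|")

cat(date_hint_pattern, "\n")
grepl(date_hint_pattern, c("12-mar-2021", "20250101", "12 march 2021"))
```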

Author(s)

Lukasz Andrzejewski


Function returning special characters separated by vertical bars

Description

Function returning special characters separated by vertical bars

Usage

has_dash_or_slash_or_white_space_characters_separated_by_vertical_bar(
  special_characters = c("-", "\\/", "\\w+\\s+")
)

Arguments

special_characters

by default a dash, a slash and whitespace characters

Value

special characters: "-|\/|\w+\s+"

Author(s)

Lukasz Andrzejewski


Impute Missing Components in Partial Date Strings

Description

This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in either the *dmy* format (day-month-year) **or** the *ymd* format (year-month-day) and does not process datetime values or strings containing time components or non-date characters.

Usage

impute_date(
  data_frame,
  column_name,
  date_format = "ymd",
  separator = "-",
  year = "UNKN",
  month = "UNK",
  day = "UN",
  min_max = "min",
  suffix = "_DT"
)

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

date_format

order of the date components: "ymd" (year, month, day; the default) or "dmy" (day, month, year)

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

year

placeholder that marks an unknown year; by default "UNKN"

month

placeholder that marks an unknown month; by default "UNK"

day

placeholder that marks an unknown day; by default "UN"

min_max

direction of imputation; by default "min". "min" imputes the earliest possible date and "max" the latest possible date

suffix

suffix appended to the source variable name to form the name of the imputed-date column; by default "_DT"

Details

If the **year** is missing or explicitly marked as unknown (e.g., "UNKN"), the function returns NA. With the default min_max = "min", a missing **month** is imputed as **January (01)** and a missing **day** as the **first day of the month (01)**; with min_max = "max", the latest possible month and day are imputed instead.

Any datetime strings (e.g., "NA-01-2025T11:10:00") must be preprocessed to remove the time component before applying this function (e.g., convert them to "NA-01-2025").

In addition to imputing the date, the function creates an accompanying **flag variable** named "<source_variable>_<suffix>F". This flag variable indicates the type of imputation performed:

Value

A data frame identical to the input, with additional columns holding the imputed values and the imputation flag. The imputed column name is constructed by appending suffix (by default "_DT") to the source variable name.

Author(s)

Lukasz Andrzejewski

Examples

impute_date(data_frame = data.frame(K = c('2025 11 UN', '2025 UNK 23')),
column_name = "K", separator = " ")
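The min/max imputation rules described for impute_date() can be sketched in base R for year-month-day strings. This is an illustration, not the package's code: it returns a small data frame instead of adding "<source><suffix>" columns, it ignores the dmy layout, and the flag codes "M" and "D" are hypothetical, because the list of flag values is not shown in this section. impute_ymd_sketch() is likewise a hypothetical name.

```r
# Hypothetical sketch of min/max imputation for partial YMD strings.
impute_ymd_sketch <- function(x, separator = "-", year = "UNKN", month = "UNK",
                              day = "UN", min_max = "min") {
  min_max <- match.arg(min_max, c("min", "max"))
  is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0

  impute_one <- function(s) {
    p <- strsplit(s, separator, fixed = TRUE)[[1]]
    # An unknown or missing year cannot be imputed.
    if (length(p) != 3 || is.na(p[1]) || p[1] == year) return(c(NA_character_, NA_character_))
    y <- as.integer(p[1])
    flag <- NA_character_
    if (p[2] == month) {
      m <- if (min_max == "min") 1L else 12L
      flag <- "M"  # hypothetical code: month (and possibly day) imputed
    } else {
      m <- as.integer(p[2])
    }
    if (p[3] == day) {
      last_day <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)[m] + (m == 2 && is_leap(y))
      d <- if (min_max == "min") 1L else as.integer(last_day)
      if (is.na(flag)) flag <- "D"  # hypothetical code: only the day imputed
    } else {
      d <- as.integer(p[3])
    }
    c(sprintf("%04d-%02d-%02d", y, m, d), flag)
  }

  res <- vapply(as.character(x), impute_one, character(2), USE.NAMES = FALSE)
  data.frame(source = x, imputed = res[1, ], flag = res[2, ], stringsAsFactors = FALSE)
}

impute_ymd_sketch(c("2025 11 UN", "2025 UNK 23"), separator = " ")
impute_ymd_sketch(c("2024 02 UN", "2023 UNK UN"), separator = " ", min_max = "max")
```

With min_max = "max" the last day of the imputed month is used, so leap years yield 29 February.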

Impute Missing Components in Partial DMY Date Strings

Description

This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in the *dmy* format (day-month-year) and does not process datetime values or strings containing time components or non-date characters.

Usage

impute_date_dmy(
  data_frame,
  column_name,
  separator = "-",
  year = "UNKN",
  month = "UNK",
  day = "UN",
  min_max = "min",
  suffix = "_DT"
)

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

year

placeholder that marks an unknown year; by default "UNKN"

month

placeholder that marks an unknown month; by default "UNK"

day

placeholder that marks an unknown day; by default "UN"

min_max

direction of imputation; by default "min". "min" imputes the earliest possible date and "max" the latest possible date

suffix

suffix appended to the source variable name to form the name of the imputed-date column; by default "_DT"

Details

If the **year** is missing or explicitly marked as unknown (e.g., "UNKN"), the function returns NA. With the default min_max = "min", a missing **month** is imputed as **January (01)** and a missing **day** as the **first day of the month (01)**; with min_max = "max", the latest possible month and day are imputed instead.

Any datetime strings (e.g., "NA-01-2025T11:10:00") must be preprocessed to remove the time component before applying this function (e.g., convert them to "NA-01-2025").

In addition to imputing the date, the function creates an accompanying **flag variable** named "<source_variable>_<suffix>F". This flag variable indicates the type of imputation performed:

Value

A data frame identical to the input, with additional columns holding the imputed values and the imputation flag. The imputed column name is constructed by appending suffix (by default "_DT") to the source variable name.

Author(s)

Lukasz Andrzejewski

Examples

impute_date_dmy(data_frame = data.frame(K = c('NA 11 2025', '23 11 2025')),
column_name = "K", separator = " ", day = "NA")

Impute Missing Components in Partial YMD Date Strings

Description

This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in the *ymd* format (year-month-day) and does not process datetime values or strings containing time components or non-date characters.

Usage

impute_date_ymd(
  data_frame,
  column_name,
  separator = "-",
  year = "UNKN",
  month = "UNK",
  day = "UN",
  min_max = "min",
  suffix = "_DT"
)

Arguments

data_frame

data frame

column_name

name of the column that holds the dates to be imputed

separator

separator between the date components; by default "-" (for example, "2024-10-21" uses "-")

year

placeholder that marks an unknown year; by default "UNKN"

month

placeholder that marks an unknown month; by default "UNK"

day

placeholder that marks an unknown day; by default "UN"

min_max

direction of imputation; by default "min". "min" imputes the earliest possible date and "max" the latest possible date

suffix

suffix appended to the source variable name to form the name of the imputed-date column; by default "_DT"

Details

If the **year** is missing or explicitly marked as unknown (e.g., "UNKN"), the function returns NA. With the default min_max = "min", a missing **month** is imputed as **January (01)** and a missing **day** as the **first day of the month (01)**; with min_max = "max", the latest possible month and day are imputed instead.

Any datetime strings (e.g., "2025-01-NAT11:10:00") must be preprocessed to remove the time component before applying this function (e.g., convert them to "2025-01-NA").

In addition to imputing the date, the function creates an accompanying **flag variable** named "<source_variable>_<suffix>F". This flag variable indicates the type of imputation performed:

Value

A data frame identical to the input, with additional columns holding the imputed values and the imputation flag. The imputed column name is constructed by appending suffix (by default "_DT") to the source variable name.

Author(s)

Lukasz Andrzejewski

Examples

impute_date_ymd(data_frame = data.frame(K = c('2025/11/UN', '2025/11/23')),
column_name = "K", separator = "/")

Remove unnecessary characters from date-like strings

Description

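This function cleans a character vector or data frame column containing date-like strings by removing all characters that are not needed for parsing or recognizing dates. It preserves digits, letters that occur in English month names (case-insensitive), and a selected set of extra symbols.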

All other characters (symbols, punctuation, letters not in month names) are removed.

Usage

remove_no_date_characters(df_column)

Arguments

df_column

A character vector (or data frame column) containing date-like strings. Factors will be coerced to character. NA values are preserved.

Details

The function works as follows:

  1. Converts input to character vector.

  2. Collects the set of letters that occur in the English month names (case-insensitive).

  3. Constructs a regex pattern to match all characters that are NOT digits, allowed letters, or allowed extra symbols.

  4. Uses stringr::str_replace_all() to remove unwanted characters.
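The four steps above can be mirrored in base R. This sketch swaps stringr::str_replace_all() for base gsub() so that it has no dependencies; the default set of extra symbols ("/", space and "-") and the function name remove_no_date_characters_sketch() are assumptions, since the package's actual extra characters are not listed here.

```r
# Hypothetical sketch of the four cleaning steps.
remove_no_date_characters_sketch <- function(df_column, extra = c("/", " ", "-")) {
  # Step 1: coerce to character (factors become their labels, NA stays NA).
  x <- as.character(df_column)
  # Step 2: letters that occur in any English month name, in both cases.
  month_letters <- unique(strsplit(tolower(paste(month.name, collapse = "")), "")[[1]])
  allowed <- c(month_letters, toupper(month_letters))
  # Step 3: negated character class; "-" is moved last so it is read literally.
  extra <- c(setdiff(extra, "-"), intersect(extra, "-"))
  pattern <- paste0("[^0-9", paste(c(allowed, extra), collapse = ""), "]")
  # Step 4: delete every character outside the allowed set.
  gsub(pattern, "", x)
}

remove_no_date_characters_sketch(c("12#Mar@2021!", "2021/03/12 xkz", NA))
```

Letters that appear in no month name are dropped too; for example, the "K" of the placeholder "UNK" does not survive this rule.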

Value

A character vector of the same length as df_column, with unwanted characters removed. Only digits, letters from month names, and selected extra characters are kept.

Author(s)

Lukasz Andrzejewski


Get a substring of a date to remove an unnecessary part

Description

Get a substring of a date to remove the unnecessary part that follows a symbol (by default "T")

Usage

remove_unnecessary_part_of_date(df_column, symbol = "T")

Arguments

df_column

date column or vector with dates

symbol

symbol that needs to be found, by default "T"

Value

substring of date from position 1 to position where last "symbol" is located
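A base-R sketch of this truncation is shown below. Whether the package keeps or drops the symbol itself is not stated; the hypothetical remove_after_last_symbol_sketch() drops it and leaves strings without the symbol unchanged.

```r
# Hypothetical sketch: keep the text before the last occurrence of `symbol`.
remove_after_last_symbol_sketch <- function(df_column, symbol = "T") {
  x <- as.character(df_column)
  # Position of the last occurrence in each string (-1 if absent, NA for NA input).
  last_pos <- vapply(gregexpr(symbol, x, fixed = TRUE), function(h) max(h), integer(1))
  ifelse(!is.na(last_pos) & last_pos > 0, substr(x, 1, last_pos - 1), x)
}

remove_after_last_symbol_sketch(c("2024-01-10T15:30:00", "2025-01-NAT11:10:00", "2024-10-21"))
```

The match is literal and case-sensitive, so an upper-case month name containing "T" (such as "OCTOBER") would also be cut at that letter; the sketch is meant for timestamp-like strings only.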

Author(s)

Lukasz Andrzejewski


Transform a date vector into a date vector in the ISO standard ("International Organization for Standardization")

Description

Transforms a date vector into a vector of dates in the ISO standard format (yyyy-mm-dd).

Usage

viso(df_column)

Arguments

df_column

a character vector or a single string

Value

dates formatted to ISO standard (yyyy-mm-dd)
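A simplified base-R sketch of such a conversion is shown below. It assumes English month names (matched on their first three letters against base R's month.abb), exactly one four-digit year per value, and a caller-chosen order for all-numeric values through the hypothetical numeric_order argument; the package's own format detection is not reproduced, and viso_sketch() is a hypothetical name.

```r
# Hypothetical sketch: convert mixed date strings to "yyyy-mm-dd".
viso_sketch <- function(x, numeric_order = "dmy") {
  # Split each string into runs of digits and runs of letters.
  tokens <- regmatches(x, gregexpr("[0-9]+|[A-Za-z]+", x))
  vapply(tokens, function(tok) {
    if (length(tok) != 3) return(NA_character_)
    is_year <- grepl("^[0-9]{4}$", tok)
    month_idx <- match(tolower(substr(tok, 1, 3)), tolower(month.abb))
    if (sum(is_year) != 1) return(NA_character_)
    y <- as.integer(tok[is_year])
    if (sum(!is.na(month_idx)) == 1) {
      # A month name fixes the month; the remaining number is the day.
      m <- month_idx[!is.na(month_idx)]
      d <- as.integer(tok[!is_year & is.na(month_idx)])
    } else {
      # All-numeric input: fall back to an assumed component order.
      roles <- strsplit(numeric_order, "")[[1]]
      m <- suppressWarnings(as.integer(tok[roles == "m"]))
      d <- suppressWarnings(as.integer(tok[roles == "d"]))
    }
    iso <- sprintf("%04d-%02d-%02d", y, m, d)
    # Reject impossible dates: the string must survive a round trip through as.Date().
    parsed <- as.Date(iso, format = "%Y-%m-%d")
    if (is.na(parsed) || format(parsed) != iso) NA_character_ else iso
  }, character(1))
}

viso_sketch(c("12Mar2022", "21-02-2022", "Mar-2022-12"))
```

The round trip through as.Date() and format() guards against strptime() ignoring trailing characters, which would otherwise let a malformed day such as "2022" slip through.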

Author(s)

Lukasz Andrzejewski

Examples

#day month year vector
viso(c("12Mar2022","21Feb2022"))

#day month year vector in different formats
viso(c("12Mar2022","21-02-2022"))

#month year day vector
viso(c("Mar-2022-12","Feb-2022-21"))