Skip to contents

This function identifies citation keys provided in a list that are missing from a specified BibTeX file. It ensures the BibTeX file exists and processes the file to match the citation keys.

This function extracts all citation keys from a given BibTeX file.

This function attempts to find the closest matching citation keys for keys that are missing from a BibTeX file, based on string distance. Users can set a threshold for the maximum allowable distance for a match, optionally exclude rows where no matches are close enough, and choose the string distance calculation method.

Usage

xct_bib_keys_missing(path_bib, citations)

xct_bib_keys_extract(path_bib)

xct_keys_guess_match(
  keys_missing,
  keys_bib,
  stringdist_threshold = 5,
  stringdist_method = "osa",
  no_match_rows_include = FALSE
)

Arguments

path_bib

A file path to the bibliography file (.bib) used for citation keys. Defaults to "references.bib" in the current working directory.

keys_missing

character A vector of citation keys that are missing from the BibTeX file. Typically obtained using xct_bib_keys_missing().

keys_bib

character A vector of citation keys extracted from the BibTeX file. Typically obtained using xct_bib_keys_extract().

stringdist_threshold

numeric The maximum allowable string distance for a match to be considered valid. This value is passed to stringdist::stringdist() for the specified method. Distances are non-negative numeric values where smaller values indicate greater similarity. For short strings (5-10 characters), use 1-3; for medium strings (10-20 characters), use 3-7; and for longer strings (>20 characters), use 7-15. Default is 5.

stringdist_method

character The method used by stringdist::stringdist() to calculate string distances. Options include "osa" (Optimal String Alignment, default), "lv" (Levenshtein), "dl" (Damerau-Levenshtein), "jw" (Jaro-Winkler), and others. See stringdist::stringdist() documentation for details.

no_match_rows_include

logical Whether to include rows for keys with no valid matches in the output. Default is FALSE, which excludes such rows.

keys

character A vector of citation keys to check for in the BibTeX file.

Value

A vector of citation keys missing from the BibTeX file. Returns an empty vector if all citation keys are found.

A character vector of citation keys extracted from the BibTeX file.

A data frame with two columns:

key_missing

The missing citation keys.

key_missing_guess_match

The guessed closest matching citation keys (or NA if no valid match is found).

If no_match_rows_include = FALSE, rows with no valid matches are excluded.

See also

xct_bib_keys_missing(), xct_bib_keys_extract(), stringdist::stringdist()

Other cite: xct_assemble(), xct_format(), xct_format_single(), xct_keys_extract(), xct_keys_to_inline(), xct_keys_to_xref()

Examples

if (FALSE) { # \dontrun{
path_bib <- system.file("extdata", "references.bib", package = "xciter")
citations <- c("Smith2020", "Jones2019")
xct_bib_keys_missing(path_bib, citations)
} # }
# Path to the example BibTeX file
path_bib <- system.file("extdata", "references.bib", package = "xciter")

# Define the citation keys to check
keys <- c("busch_etal2011LandscapeLevelModel",
          "kirsch_etal2014Fishinventoryb",
          "test2001")

# Extract keys from the BibTeX file
keys_bib <- xct_bib_keys_extract(path_bib)

# Identify missing keys
keys_missing <- xct_bib_keys_missing(path_bib, keys)
#> ! The following citations were not found in the BibTeX file:
#> [1] "busch_etal2011LandscapeLevelModel" "kirsch_etal2014Fishinventoryb"    
#> [3] "test2001"                         

# Guess matches for missing keys with a string distance threshold and method
xct_keys_guess_match(keys_missing, keys_bib)
#> ! No valid match found for the following keys (closest distance exceeds 5 ): test2001
#>                         key_missing            key_missing_guess_match
#> 1 busch_etal2011LandscapeLevelModel busch_etal2011LandscapeLevelModela
#> 2     kirsch_etal2014Fishinventoryb       kirsch_etal2014Fishinventory