
Pull the substring found between a start and an end regular-expression pattern from each element of a character vector. Matching is case-insensitive and dot-all (. also matches newlines) by default. A colon immediately following the start match (as in "Label:") is skipped, along with any whitespace after it; whitespace between the start match and the colon is not skipped. You can optionally normalize whitespace in the result.

Usage

ngr_str_extract_between(x, reg_start, reg_end, squish = FALSE)

Arguments

x

character A character vector to search.

reg_start

character A single string: the start regex pattern. A colon immediately following the match, plus any whitespace after that colon, is skipped and not included in the result.

reg_end

character A single string: the end regex pattern used in a lookahead; the matched text will end before this pattern.

squish

logical Optional. If TRUE, collapse and trim whitespace in the extracted text via stringr::str_squish(). Default is FALSE.
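For reference, squish = TRUE behaves like stringr::str_squish(): every run of whitespace (spaces, tabs, newlines) collapses to a single space and leading and trailing whitespace is dropped. Below is a minimal base-R approximation using a hypothetical helper, squish_sketch(), that is not part of this package. It is an assumption-level stand-in: stringr's ICU \s also matches some Unicode spaces that PCRE's \s does not, so results can differ on such input.

```r
# Hypothetical helper approximating stringr::str_squish() in base R.
squish_sketch <- function(s) {
  # Collapse every run of whitespace (spaces, tabs, newlines) to one space,
  # then trim both ends. NA input stays NA.
  trimws(gsub("\\s+", " ", s, perl = TRUE))
}

squish_sketch("  Fraser \n\t Basin  ")
#> [1] "Fraser Basin"
```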

Value

character A character vector the same length as x, with the extracted substrings. Elements are NA when no match is found. The underlying regex engine may throw an error if reg_start or reg_end contains an invalid regular expression.

Matching details

  • Flags used: (?i) case-insensitive, (?s) dot matches newline.

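  • Pattern built: the non-capturing group (?:reg_start), then an optional colon and any whitespace after it (:?\s*), then a non-greedy capture group (.*?) that stops just before the first position where the lookahead on (?:reg_end) matches. Only the first match in each element is returned. If squish = TRUE, leading, trailing, and internal whitespace in the result is normalized.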
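The pattern described above can be sketched in base R. This is not the package's implementation (which may use stringr and the ICU engine); it is a hypothetical helper, extract_between_sketch(), that assembles the documented pattern and runs it with regexec(perl = TRUE), so PCRE and ICU edge cases may differ.

```r
# Hypothetical sketch of the documented pattern; not the package's code.
extract_between_sketch <- function(x, reg_start, reg_end, squish = FALSE) {
  # (?i) case-insensitive, (?s) dot matches newline; optional ":" plus
  # trailing whitespace after the start match; lazy capture that ends
  # just before the first position where the reg_end lookahead matches.
  pattern <- paste0("(?is)(?:", reg_start, "):?\\s*(.*?)(?=(?:", reg_end, "))")
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit is c(full_match, group_1); a miss is character(0) -> NA.
  out <- vapply(m, function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
                character(1))
  if (squish) {
    # Base-R stand-in for stringr::str_squish().
    out <- trimws(gsub("\\s+", " ", out, perl = TRUE))
  }
  out
}

x <- c("Grant Amount: $400,000 Intake Year: 2025", "no label here")
extract_between_sketch(x, "Grant\\s*Amount", "Intake\\s*Year|$")
#> [1] "$400,000 " NA

# A space before the colon stops it from being skipped...
extract_between_sketch("Region : Fraser Basin Project Theme: Something",
                       "Region", "Project\\s*Theme|$", squish = TRUE)
#> [1] ": Fraser Basin"

# ...unless the start pattern itself consumes that whitespace.
extract_between_sketch("Region : Fraser Basin Project Theme: Something",
                       "Region\\s*", "Project\\s*Theme|$", squish = TRUE)
#> [1] "Fraser Basin"
```

Because the capture is non-greedy and the end pattern sits in a lookahead, the result stops at the first occurrence of reg_end and never includes it; adding |$ to reg_end lets the match run to the end of the string when the end label is absent.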

See also

ngr_str_df_extract() for applying multiple start/end pairs to a data-frame column.

Other string: ngr_str_dir_from_path(), ngr_str_link_url()

Examples

x <- c(
  "Grant Amount: $400,000 Intake Year: 2025",
  "Grant Amount: $150,500 Intake Year: 2024"
)
ngr_str_extract_between(x,
  reg_start = "Grant\\s*Amount",
  reg_end = "Intake\\s*Year|$"
)
#> [1] "$400,000 " "$150,500 "

# With whitespace normalization. The colon is kept here because a space
# separates it from the "Region" match.
ngr_str_extract_between(
  x = "Region : Fraser Basin Project Theme: Something",
  reg_start = "Region",
  reg_end = "Project\\s*Theme|$",
  squish = TRUE
)
#> [1] ": Fraser Basin"