Extract segments of text from a data-frame column
Source: R/ngr_str_df_extract.R
Pull substrings between regex start/end delimiters from a string column.
Each element of segment_pairs is a length-2 character vector:
c(start_regex, end_regex). The list name becomes the output column name.
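To make the start/end idea concrete, here is a minimal base-R sketch of extracting the text between two regexes from a character vector. The helper name `extract_between` is hypothetical and the approach (a lazy capture group, with the result trimmed) is an assumption for illustration; it is not the package's implementation, and the package's handling of whitespace or delimiters may differ.

```r
# Hypothetical helper for illustration only; not part of the package.
# Builds one regex with a lazy capture group between the two delimiters.
extract_between <- function(x, start_regex, end_regex) {
  pattern <- paste0("(?:", start_regex, ")(.*?)(?:", end_regex, ")")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(
    hits,
    # hit[1] is the full match, hit[2] the captured text between delimiters;
    # an empty result means the pattern did not match at all.
    function(hit) if (length(hit) >= 2) trimws(hit[2]) else NA_character_,
    character(1)
  )
}

x <- c(
  "Region: Fraser Basin Project Theme: Restoration",
  "no delimiters here"
)
extract_between(x, "Region", "Project\\s*Theme|$")
#> [1] ": Fraser Basin" NA
```

Note that the captured text starts immediately after the start match, so the colon following "Region" is part of the result in this sketch.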
Arguments
- data
data.frame or tibble::tbl_df A table containing the source text column.
- col
character A single string: the name of the text column (e.g., "details").
- segment_pairs
list A named list where each element is a length-2 character vector
c(start_regex, end_regex). List names become output column names. Names must be non-empty and unique.
Value
data.frame The input data with one new column per named segment.
Values are NA where no match is found. An error is raised if col is
missing from data, if segment_pairs is not a properly named list, or if
any element is not a length-2 character vector.
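The error conditions listed above could be checked along the following lines. This is a sketch under assumptions: the function name `check_extract_args` and the error messages are hypothetical, and the package's actual checks and wording may differ.

```r
# Hypothetical validator for illustration only; not the package's code.
check_extract_args <- function(data, col, segment_pairs) {
  if (!is.data.frame(data)) {
    stop("`data` must be a data.frame or tibble.")
  }
  if (!is.character(col) || length(col) != 1 || !col %in% names(data)) {
    stop("`col` must be a single string naming a column of `data`.")
  }
  nms <- names(segment_pairs)
  # Names must exist, be non-empty, and be unique.
  if (!is.list(segment_pairs) || is.null(nms) ||
      any(is.na(nms) | nms == "") || anyDuplicated(nms) > 0) {
    stop("`segment_pairs` must be a list with non-empty, unique names.")
  }
  # Every element must be c(start_regex, end_regex).
  ok <- vapply(
    segment_pairs,
    function(p) is.character(p) && length(p) == 2,
    logical(1)
  )
  if (!all(ok)) {
    stop("Each element of `segment_pairs` must be a length-2 character vector.")
  }
  invisible(TRUE)
}

df <- data.frame(details = "Region: Fraser Basin")
check_extract_args(df, "details", list(region = c("Region", "$")))
```

Failing fast on malformed input keeps a bad name or a missing end regex from silently producing an all-NA column.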
See also
ngr_str_extract_between() to extract a single pair from a character vector.
Other string dataframe:
ngr_str_df_col_agg(),
ngr_str_df_detect_filter()
Examples
df <- tibble::tibble(details = c(
"Grant Amount: $400,000 Intake Year: 2025 Region: Fraser Basin Project Theme: Restoration",
"Grant Amount: $150,500 Intake Year: 2024 Region: Columbia Basin Project Theme: Food Systems"
))
# In an R string, "\\s" is the regex \s (whitespace); no further escaping is needed.
segs <- list(
amount = c("Grant\\s*Amount", "Intake\\s*Year|Region|Project\\s*Theme|$"),
year = c("Intake\\s*Year", "Region|Project\\s*Theme|$"),
region = c("Region", "Project\\s*Theme|$"),
theme = c("Project\\s*Theme", "$")
)
out <- ngr_str_df_extract(df, "details", segs)
# Each new column holds the text between its delimiters, or NA where nothing matched.
out