Third primitive in fresh's point-handling family (alongside
frs_point_snap() and frs_point_match()). Given a candidates
table where multiple rows can share the same key value, optionally
compute a per-row score from a caller-supplied SQL expression,
optionally filter disqualifiers via a caller-supplied WHERE clause,
then keep one row per key by DISTINCT ON (col_key) ORDER BY ....
Usage
frs_candidates_pick(
conn,
table_in,
table_to,
col_key,
exp_score = NULL,
exp_filter = NULL,
order_by
)Arguments
- conn
A DBI::DBIConnection object.
- table_in
Character. Schema-qualified candidates table. Multiple rows per
col_keyvalue are expected (the whole point — pick one). Typically the output offrs_point_snap(num_features = N)optionally enriched with JOINs to pull in score-bearing columns.- table_to
Character. Schema-qualified destination. Dropped + recreated by this function via DDL.
- col_key
Character. Column on
table_inthat groups competing rows. After this function runs,table_tohas at most one row per distinctcol_keyvalue.- exp_score
Character or
NULL. Optional SQL fragment yielding ascorecolumn — evaluated per row. The caller writes the SQL. Example:"CASE WHEN LOWER(a.stream_name) = LOWER(b.gnis_name) THEN 100 ELSE 0 END". When supplied, the output table gains ascorecolumn; whenNULL, no score column is added.- exp_filter
Character or
NULL. Optional SQLWHEREclause for disqualifiers, evaluated againsttable_in(or the scored intermediate whenexp_scoreis set). Example:"score >= 0"to drop rows with score < 0. The caller writes the SQL.- order_by
Character vector. Sequence of
"<expr> ASC|DESC"strings (e.g.c("score DESC", "distance_to_stream ASC")) used for theDISTINCT ON (col_key) ORDER BY ...dedup tiebreak.col_keyis prepended automatically (PostgreSQL requires the leadingORDER BYcolumns to matchDISTINCT ON).
Value
conn invisibly, for piping. Side effect: drops + recreates
table_to with all table_in columns plus (when exp_score is
supplied) a score column, deduped to one row per col_key.
Details
Designed for the "score + filter + dedup per key" pattern that
shows up wherever column-to-column comparisons disambiguate matches
beyond pure geometry: stream-name matching, watershed-group
agreement, species-code overlap, assessment-date proximity, channel-
width × stream-order compatibility, etc. Composes with
frs_point_snap() (upstream — produces multi-candidate per key) and
frs_point_match() (downstream — operates on single-candidate-per-key
input).
Reproduces bcfp's 04_pscis.sql PSCIS-to-stream selection pattern
(smnorris/bcfishpass@v0.7.14-125-g6e9cf1c) when the caller supplies
the bcfp name_score and weighted_distance ORDER BY clauses.
SQL composition:
WITH scored AS (
SELECT *, (<exp_score>) AS score
FROM <table_in>
)
SELECT DISTINCT ON (<col_key>) *
FROM scored
WHERE <exp_filter>
ORDER BY <col_key>, <order_by[1]>, <order_by[2]>, ...
The WITH scored AS CTE is omitted when exp_score = NULL (and the
caller's order_by cannot reference a score column).
The WHERE clause is omitted when exp_filter = NULL.
Caller's responsibilities (NOT this primitive's):
Producing the candidates table (multi-row-per-key).
Enriching it with JOINs to pull in score-bearing columns.
Writing
exp_score/exp_filterSQL.Writing
order_byclauses.
What this primitive does:
Sanitizes
col_key,table_in,table_toidentifiers.Composes the CTE + SELECT.
Executes via
DBI::dbExecute().
See also
Other network:
frs_network_features(),
frs_point_match()
Examples
if (FALSE) { # \dontrun{
conn <- frs_db_conn()
# bcfp PSCIS-to-stream selection (reproduces 04_pscis.sql logic):
# caller has staged candidates with stream_name (from PSCIS) and
# gnis_name (from FWA) columns joined in.
frs_candidates_pick(
conn,
table_in = "working_bulk.pscis_stream_candidates",
table_to = "working_bulk.pscis",
col_key = "stream_crossing_id",
exp_score = "CASE
WHEN LOWER(stream_name) = LOWER(gnis_name) THEN 100
WHEN stream_name IS NULL OR gnis_name IS NULL THEN 0
ELSE -100
END",
exp_filter = "score >= 0",
order_by = c("score DESC", "distance_to_stream ASC")
)
# Generic field-assessed vs user-added crossings dedup:
frs_candidates_pick(
conn,
table_in = "wsg_adms.crossings_candidates",
table_to = "wsg_adms.crossings",
col_key = "site_id",
order_by = c("assessment_date DESC", "distance_to_stream ASC")
)
DBI::dbDisconnect(conn)
} # }
