
Reproducing bcfishpass with link + fresh
Source: vignettes/reproducing-bcfishpass.Rmd
bcfishpass is
the reference model for freshwater habitat classification and fish
passage prioritization in British Columbia. link + fresh provide a
configurable, reproducible R-side pipeline. The bundled
"bcfishpass" config reproduces bcfishpass’s classification
method. Other configs can express other methods; the package is
method-agnostic.
This vignette walks through what the bcfishpass configuration does,
how to run it, and how the output compares to bcfishpass reference
tables. Full per-phase pipeline detail lives in research/bcfishpass_comparison.md.
Prerequisites
The pipeline reads from a PostgreSQL database with fwapg loaded. fwapg is the
processed form of the BC Freshwater Atlas — it adds
wscode_ltree and localcode_ltree columns to
the stream-network tables (PostgreSQL ltree types encoding
watershed topology) and provides the SQL functions the pipeline uses to
traverse the network: fwa_upstream,
fwa_downstream,
fwa_watershedatmeasure,
and others. See fwapg’s repository for installation.
bcfishobs is
optional but recommended — it populates
bcfishobs.observations, the table that drives per-species
overrides of natural barriers below.
The comparison layer in the map at the end of this vignette reads from a read-only tunnel to the bcfishpass reference database. That is a validation convenience, not a requirement for running link.
How the bcfishpass configuration works
The rollup measures intrinsic habitat potential conditioned on accessibility. Intrinsic potential is a segment’s fit to per-species habitat rules (edge type, waterbody, channel width, gradient). Accessibility is whether fish can reach the segment without crossing a blocking natural barrier. bcfishpass records intrinsic classification on every segment, together with labels that name the downstream obstacles blocking it. The rollup in this vignette aggregates only the subset that is both intrinsically suitable and accessible — accessibility and intrinsic potential are separable in general, and a fuller treatment would report both.
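As a minimal sketch of what "intrinsic potential conditioned on accessibility" means for the rollup, the toy table below sums segment length only where both flags are TRUE. The column names (length_m, intrinsic, accessible) and all values are illustrative assumptions, not the pipeline's actual schema.

```r
# Toy segments; column names and values are hypothetical, for illustration only.
segments <- data.frame(
  segment_id = 1:4,
  length_m   = c(1200, 800, 500, 1500),
  intrinsic  = c(TRUE, TRUE, FALSE, TRUE),  # matches per-species habitat rules
  accessible = c(TRUE, FALSE, TRUE, TRUE)   # no blocking natural barrier downstream
)

# The rollup counts only segments that are both suitable and accessible.
in_rollup <- segments$intrinsic & segments$accessible
rollup_km <- sum(segments$length_m[in_rollup]) / 1000

# Intrinsic potential on its own, if reported separately, can only be equal or larger.
intrinsic_km <- sum(segments$length_m[segments$intrinsic]) / 1000
```

Keeping the two flags as separate columns is what would allow a fuller treatment to report intrinsic and accessible kilometres side by side.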
FWA stream network (via fwapg, ltree-enriched)
│
│ gradient thresholds detect barriers @ 15 / 20 / 25 / 30 %
▼
gradient barriers ─── falls ─── user-identified definite barriers
│
│ observations override natural barriers per access model
▼
access model per species
│
│ break positions = observations + minimal gradient barriers
│ + habitat classification endpoints + crossings
▼
segmented streams (every segment ends where a rule decision can change)
│
│ per-species rules from rules.yaml
│ edge type • waterbody type • channel width • gradient
▼
classify (spawning ? rearing ? per species per segment)
│
│ user_habitat_classification overlay flips reviewer-confirmed
│ reaches to TRUE regardless of rule predicate
▼
classify + overlay
│
│ frs_cluster for rearing-spawning connectivity;
│ connected-waterbody rules for SK
▼
streams_habitat (per-species spawning / rearing booleans per segment)
Where breaks go, and why
A break is a point where one segment ends and the next begins. Every segment is one classification unit. Breaks therefore fall at positions where the decision can change:
- Observations. bcfishpass’s per-species access models flip a natural-barrier reach (gradient barrier, falls, or user-definite barrier) to accessible when the count of upstream fish observations meets a threshold. Thresholds and species filters vary per model (see the SQL under model/access/). Per-species parameters used by link live in the bundled "bcfishpass" config’s parameters_fresh.csv (observation_threshold, observation_date_min, observation_buffer_m, observation_species). Override counting is done in SQL via fwa_upstream by lnk_barrier_overrides. For BULK (bcfishpass commit ea3c5d8):
  - BT: ≥ 1 observation of BT, CH, CM, CO, PK, SK, or ST; any date
  - CH / CM / CO / PK / SK: ≥ 5 observations in that salmon set, on or after 1990-01-01
  - ST: ≥ 5 observations of CH, CM, CO, PK, SK, or ST, on or after 1990-01-01
  - WCT: ≥ 1 observation of WCT; any date
- Minimal gradient barriers. On any flow path with multiple gradient barriers, only the downstream-most matters for access, because everything upstream is already blocked by it. The pipeline reduces to the minimal set per species-class (via fresh::frs_barriers_minimal()) so segmentation doesn’t split reaches that would end up in the same access state.
- User-identified definite barriers: positions listed in bcfishpass’s user_barriers_definite.csv (mirrored at inst/extdata/configs/bcfishpass/overrides/user_barriers_definite.csv). Each row specifies blue_line_key and downstream_route_measure for a barrier that always blocks access: reviewer-added positions covering EXCLUSION zones and MISC barriers the model doesn’t detect from gradient / falls / subsurface detection. These are always blocking, always a break position, and never eligible for observation-based override. This matches bcfishpass’s model_access_*.sql, which appends barriers_user_definite post-filter via UNION ALL so upstream observations and habitat confirmations never re-open them.
- Habitat classification endpoints: manual spawning / rearing delineations from bcfishpass’s user_habitat_classification.csv (mirrored at inst/extdata/configs/bcfishpass/overrides/user_habitat_classification.csv). Each row records blue_line_key, downstream_route_measure, upstream_route_measure, species_code, and habitat_ind. Breaks are placed at both measures so the marked reach is its own segment.
- Crossings: road, rail, and utility crossings carrying barrier_status of PASSABLE, POTENTIAL, BARRIER, or UNKNOWN. Each crossing at a distinct position gets its own segment boundary so habitat upstream of each can be attributed to it.
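The observation-override rule in the first bullet can be sketched in plain R. This is an illustration only: barrier_overridden() is a hypothetical helper, the observations are invented, and the input is assumed to contain only observations already known to be upstream of one barrier (the pipeline resolves that in SQL, which is not reproduced here). The thresholds follow the BULK values listed above.

```r
# Hypothetical helper: is a natural barrier overridden for one access model?
# `obs` is assumed to hold only observations upstream of the barrier.
barrier_overridden <- function(obs, species, threshold, date_min = NULL) {
  keep <- obs$species_code %in% species
  if (!is.null(date_min)) {
    keep <- keep & obs$observation_date >= as.Date(date_min)
  }
  sum(keep) >= threshold
}

# Made-up upstream observations for a single barrier.
obs <- data.frame(
  species_code     = c("CO", "CO", "BT", "ST", "CH", "CO"),
  observation_date = as.Date(c("1985-07-01", "1995-08-12", "2001-06-30",
                               "2004-09-15", "2010-10-01", "2012-09-20"))
)

salmon <- c("CH", "CM", "CO", "PK", "SK")

# BT model: >= 1 observation of BT, salmon, or ST; any date.
bt_open <- barrier_overridden(obs, c("BT", salmon, "ST"), threshold = 1)

# Salmon model: >= 5 salmon observations on or after 1990-01-01.
salmon_open <- barrier_overridden(obs, salmon, threshold = 5,
                                  date_min = "1990-01-01")
```

With these toy observations the barrier opens for the BT model but stays closed for the salmon model, because only three salmon observations fall on or after the date cutoff.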
Natural accessibility — gradient barriers, falls, and user-definite
barriers — is the only gate in this configuration. Crossings are
segmentation boundaries here, not access blockers: a segment upstream of
a BARRIER-status crossing stays classified on its intrinsic
rule match, so rollup kilometres are not reduced by crossings. A
different composition (same pipeline,
label_block = c("blocked", "barrier")) answers the distinct
question of what habitat would be accessible if anthropogenic barriers
were fixed — worth a separate rollup.
Where classification comes from
Once segmented, each segment is checked against the per-species rules
in rules.yaml.
The YAML is generated from dimensions.csv
via lnk_rules_build().
Top-level keys are species codes. spawn: and
rear: are lists of alternative match conditions — any match
marks the segment. Conditions combine:
- edge_types_explicit: FWA edge_type integer codes (1000 / 1100 stream, 2000 / 2300 river, 1050 / 1150 wetland, 1200 lake). Membership is a per-species decision recorded in the rules file.
- waterbody_type: R river polygon, L lake.
- channel_width: [min, max] metres.
- Gradient bounds for spawning and rearing (via parameters_fresh.csv and fresh’s thresholds CSV).
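To make the "any alternative matches" logic concrete, here is a small sketch that writes one species' spawning rules as an R list instead of parsing rules.yaml. The rule values, the segment fields, and both helper functions are hypothetical; the sketch also assumes that predicates inside a single condition are AND-ed, which the text above implies but does not spell out.

```r
# Hypothetical rules for one species; each element of `spawn` is one alternative.
rules_sp <- list(
  spawn = list(
    list(edge_types_explicit = c(1000, 1100), channel_width = c(2, 30)),
    list(waterbody_type = "R")
  )
)

# TRUE when the segment satisfies every predicate present in one condition.
matches_condition <- function(seg, cond) {
  ok <- TRUE
  if (!is.null(cond$edge_types_explicit)) {
    ok <- ok && seg$edge_type %in% cond$edge_types_explicit
  }
  if (!is.null(cond$waterbody_type)) {
    ok <- ok && identical(seg$waterbody_type, cond$waterbody_type)
  }
  if (!is.null(cond$channel_width)) {
    ok <- ok && !is.na(seg$channel_width) &&
      seg$channel_width >= cond$channel_width[1] &&
      seg$channel_width <= cond$channel_width[2]
  }
  ok
}

# Alternatives are OR-ed: any matching condition marks the segment.
matches_any <- function(seg, conds) {
  any(vapply(conds, function(cond) matches_condition(seg, cond), logical(1)))
}

stream_seg <- list(edge_type = 1000, waterbody_type = NA_character_, channel_width = 5)
narrow_seg <- list(edge_type = 1000, waterbody_type = NA_character_, channel_width = 0.8)
river_seg  <- list(edge_type = 2000, waterbody_type = "R", channel_width = 45)

spawn_flags <- c(
  stream = matches_any(stream_seg, rules_sp$spawn),
  narrow = matches_any(narrow_seg, rules_sp$spawn),
  river  = matches_any(river_seg,  rules_sp$spawn)
)
```

In this toy setup the stream segment matches the first alternative, the river segment matches the second, and the narrow segment matches neither.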
Known-habitat overlay
After the rule-based pass, lnk_pipeline_classify() calls
fresh::frs_habitat_overlay()
to layer reviewer-curated habitat on top of the model output. The
overlay reads the same user_habitat_classification.csv
that bcfishpass uses to populate its streams_habitat_known
table: each row is a
(blue_line_key, drm, urm, species_code, habitat_type, habitat_ind)
tuple flipping segments inside the range to TRUE, regardless of the rule
predicate. fresh ≥ 0.21.0 does the join via a 3-way bridge through
fresh.streams for range containment.
The result mirrors bcfishpass’s published
streams_habitat_linear.spawning_<sp> integer column —
model classifications (1 / 2) plus known-habitat overrides (3) — rather
than the model-only habitat_linear_<sp> boolean. The
overlay is opt-in per config: only bundles whose manifest declares
habitat_classification: invoke it. The bundled
"bcfishpass" and "default" configs both
do.
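The range-containment idea behind the overlay can be illustrated without a database. The sketch below is not how fresh::frs_habitat_overlay() is implemented; overlay_known() is a hypothetical function, the data are invented, and it assumes segment boundaries already coincide with the reviewed range because breaks were placed at both measures.

```r
# Model output for one stream; measures in metres. All values are made up.
segments <- data.frame(
  blue_line_key            = 1001,
  downstream_route_measure = c(0, 500, 1200),
  upstream_route_measure   = c(500, 1200, 2000),
  spawning_ch              = c(TRUE, FALSE, FALSE)
)

# One reviewer-confirmed reach (hypothetical row, fields as named in the text).
known <- data.frame(
  blue_line_key = 1001, drm = 500, urm = 1200,
  species_code = "CH", habitat_type = "spawning", habitat_ind = TRUE
)

# Flip segments fully contained in a known range to TRUE, whatever the rules said.
overlay_known <- function(segments, known, column) {
  for (i in seq_len(nrow(known))) {
    if (!isTRUE(known$habitat_ind[i])) next
    inside <- segments$blue_line_key == known$blue_line_key[i] &
      segments$downstream_route_measure >= known$drm[i] &
      segments$upstream_route_measure <= known$urm[i]
    segments[[column]][inside] <- TRUE
  }
  segments
}

result <- overlay_known(segments, known, "spawning_ch")
```

Only the middle segment changes: it failed the rule predicate but lies inside the reviewed range, so the overlay marks it TRUE.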
Stream-order bypass — not applied in this config
bcfishpass applies a rearing-side bypass on the channel-width minimum
for BT / CH / CO / ST / WCT when a first-order stream’s parent is order
≥ 5. The bundled "bcfishpass" config does not apply that
bypass. Numeric impact and the reasoning are in research/bcfishpass_comparison.md.
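For readers unfamiliar with the bypass, the following sketch shows its effect on a rearing channel-width check. The function, argument names, and the 1.5 m minimum are hypothetical; only the condition itself (first-order stream with a parent of order ≥ 5) comes from the description above.

```r
# Hypothetical rearing check on channel width, with an optional bypass.
# Names are illustrative; they are not link or bcfishpass identifiers.
rear_width_ok <- function(channel_width, width_min, stream_order,
                          parent_order, bypass = FALSE) {
  meets_min <- !is.na(channel_width) & channel_width >= width_min
  if (!bypass) {
    return(meets_min)
  }
  # Bypass: first-order tributaries of order >= 5 parents skip the minimum.
  skip_min <- stream_order == 1 & parent_order >= 5
  meets_min | skip_min
}

width        <- c(0.9, 0.9, 3.0)
strm_order   <- c(1, 1, 2)
parent_order <- c(6, 3, 4)

without_bypass <- rear_width_ok(width, 1.5, strm_order, parent_order, bypass = FALSE)
with_bypass    <- rear_width_ok(width, 1.5, strm_order, parent_order, bypass = TRUE)
```

The first toy segment qualifies only when the bypass is on, which is the direction of difference to expect between a config that omits the bypass and one that applies it.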
Running the pipeline
library(link)
library(targets)
# `_targets.R` lives in data-raw/; run from that directory.
setwd("data-raw")
tar_make() # 5 WSGs, serial
rollup <- tar_read(rollup) # per-WSG × species × habitat tibble
tar_make() runs compare_bcfishpass_wsg()
once each for Adams (ADMS), Bulkley (BULK), Babine (BABL), Elk (ELKR),
and Deadman (DEAD), binding the per-WSG tibbles into one rollup. Each
call exercises the six lnk_pipeline_* phases. ADMS/BULK/
BABL/ELKR span the species assemblages used in bcfishpass validation —
BT with CH, CO, SK on ADMS; PK and ST added on BULK and BABL; BT with
WCT on ELKR. DEAD is an end-to-end test for the
barriers_definite_control wiring: it has a single
barrier_ind = TRUE control row with enough anadromous
observations upstream to exercise the filter, which the other four WSGs
don’t. Method agreement across this spread is stronger evidence than
agreement on a single WSG.
The rollup
rollup <- readRDS(system.file("extdata", "vignette-data", "rollup.rds",
package = "link"))
link_km and bcfishpass_km are kilometres
classified as habitat (spawning or rearing, conditioned on natural
accessibility) per species × watershed group.
diff_pct = (link_km − bcfishpass_km) / bcfishpass_km × 100.
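A quick worked example of the formula, using made-up kilometre totals rather than values from the rollup:

```r
# Made-up kilometre totals, purely to show the arithmetic.
link_km       <- c(102.5, 48.0)
bcfishpass_km <- c(100.0, 50.0)

# Positive values mean link classifies more km than bcfishpass.
diff_pct <- (link_km - bcfishpass_km) / bcfishpass_km * 100
round(diff_pct, 1)
```

The first pair gives a positive difference (link higher) and the second a negative one (link lower).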
.pivot <- function(rollup, which_habitat) {
x <- rollup[rollup$habitat_type == which_habitat,
c("species", "wsg", "diff_pct")]
w <- stats::reshape(x, idvar = "species", timevar = "wsg",
direction = "wide", v.names = "diff_pct")
names(w)[-1] <- sub("diff_pct\\.", "", names(w)[-1])
cols <- intersect(c("species", "ADMS", "BULK", "BABL", "ELKR", "DEAD"),
names(w))
w <- w[order(w$species), cols]
row.names(w) <- NULL
w
}
knitr::kable(.pivot(rollup, "spawning"),
digits = 1,
caption = "Spawning parity (% diff vs bcfishpass)")
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=ADMS: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=BULK: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=BABL: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=ELKR: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=DEAD: first taken
| species | ADMS | BULK | BABL | ELKR | DEAD |
|---|---|---|---|---|---|
| BT | 1.8 | 3.1 | 4.1 | 2.8 | 2.1 |
| CH | 0.5 | 1.9 | 3.9 | — | 1.4 |
| CO | 2.2 | 4.1 | 4.8 | — | 1.3 |
| PK | — | 2.5 | — | — | 1.1 |
| RB | — | — | — | — | — |
| SK | 9.6 | 2.6 | 43.8 | — | — |
| ST | — | 2.3 | 3.9 | — | 1.3 |
| WCT | — | — | — | 3.8 | — |
knitr::kable(.pivot(rollup, "rearing"),
digits = 1,
caption = "Rearing parity (% diff vs bcfishpass)")
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=ADMS: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=BULK: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=BABL: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=ELKR: first taken
#> Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying =
#> varying, : multiple rows match for wsg=DEAD: first taken
| species | ADMS | BULK | BABL | ELKR | DEAD |
|---|---|---|---|---|---|
| BT | -1.1 | -2.2 | -1.9 | -1.2 | -0.2 |
| CH | 2.3 | 2.6 | 4.3 | — | 1.4 |
| CO | 3.4 | 5.1 | 11.6 | — | 4.1 |
| PK | — | — | — | — | — |
| RB | — | — | — | — | — |
| SK | 0.0 | 0.0 | 0.0 | — | — |
| ST | — | -0.1 | 0.6 | — | 0.0 |
| WCT | — | — | — | 1.5 | — |
Observed differences come from the stream-order bypass omission (visible
as the uniformly negative BT rearing row) and from
segmentation-boundary rounding where per-segment attributes fall near
rule thresholds. The rollup’s bcfishpass side reads the model-only
habitat_linear_<sp> boolean tables; link’s side
includes the known-habitat overlay (the overlay step), so WSGs where
reviewer-curated user_habitat_classification.csv
contributes meaningful km will show link slightly larger — most visibly
BABL SK spawning. The map section below uses the published
streams_habitat_linear integer table (model + known) on the
bcfishpass side so the layers shown are apples-to-apples. Numeric detail
is in research/bcfishpass_comparison.md.
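The "multiple rows match ... first taken" warnings printed with both tables are what stats::reshape() emits when more than one row shares the same idvar and timevar values, in which case the wide table shows the first matching row. A minimal way to inspect such repeats, shown on an invented table that only borrows the rollup's column names, might be:

```r
# Toy stand-in for the rollup; values are invented.
toy <- data.frame(
  species      = c("BT", "BT", "CH", "BT"),
  wsg          = c("ADMS", "BULK", "ADMS", "ADMS"),
  habitat_type = "spawning",
  diff_pct     = c(1.8, 3.1, 0.5, 1.8)
)

# Rows that repeat a species x wsg x habitat_type key.
key <- toy[c("species", "wsg", "habitat_type")]
dup_rows <- toy[duplicated(key) | duplicated(key, fromLast = TRUE), ]

# Keeping the first row per key mirrors what reshape() does when it warns.
toy_unique <- toy[!duplicated(key), ]
```

Whether dropping repeats is appropriate depends on why they exist in the real rollup, which this sketch does not try to determine.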
Comparison map — Neexdzii Kwa (Upper Bulkley)
The watershed upstream of the Neexdzii Kwa / Wetzin Kwa (Bulkley /
Morice) confluence, built via
FWA_WatershedAtMeasure(360873822, 166030.4). Sits inside
the BULK watershed group, so the BULK rollup above aggregates this area
along with the rest of the Bulkley.
The link pipeline layer is visible by default; toggle on the bcfishpass reference layer to compare.
sub_ch <- readRDS(system.file("extdata", "vignette-data",
"sub_ch.rds", package = "link"))
sub_ch_bcfp <- readRDS(system.file("extdata", "vignette-data",
"sub_ch_bcfp.rds", package = "link"))
if (requireNamespace("mapgl", quietly = TRUE)) {
pal_values <- c("spawning only", "rearing only", "spawning + rearing")
pal_colors <- c("#e31a1c", "#1f78b4", "#6a3d9a")
mapgl::maplibre(
bounds = sf::st_bbox(sub_ch),
style = mapgl::carto_style("positron")
) |>
mapgl::add_line_layer(
id = "bcfishpass",
source = sub_ch_bcfp,
line_color = mapgl::match_expr("habitat",
values = pal_values, stops = pal_colors, default = "#999999"),
line_width = 3,
line_opacity = 0.6,
visibility = "none"
) |>
mapgl::add_line_layer(
id = "link",
source = sub_ch,
line_color = mapgl::match_expr("habitat",
values = pal_values, stops = pal_colors, default = "#999999"),
line_width = 2
) |>
mapgl::add_legend(
"Neexdzii Kwa (Upper Bulkley) · modelled chinook habitat",
values = pal_values,
colors = pal_colors,
type = "categorical",
position = "top-right"
) |>
mapgl::add_layers_control(
collapsible = TRUE,
position = "top-left"
)
} else {
message("Install `mapgl` (pak::pak('mapgl')) to render this map.")
}
Reproducibility
The pipeline is deterministic. Two tar_make()
invocations on the same fwapg + bcfishobs state produce bit-identical
rollups. When input data shifts — a channel_width sync, new
observations loaded into bcfishobs, a bcfishpass reference
refresh — outputs will correctly differ.
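One simple way to check the determinism claim is to compare rollups from two runs with identical(). The sketch below uses small invented data frames in place of real rollups; in practice each would come from a separate tar_make() run read back with tar_read().

```r
# Two toy "runs"; stand-ins for rollups from separate pipeline runs.
run_a <- data.frame(species = c("BT", "CH"), wsg = "BULK",
                    link_km = c(102.5, 48.0))
run_b <- run_a

# identical() is strict: same values, types, and attributes.
same_run <- identical(run_a, run_b)

# A changed input (simulated here) should show up as a difference.
run_c <- run_a
run_c$link_km[2] <- 48.1
changed <- !identical(run_a, run_c)
which_rows <- which(run_a$link_km != run_c$link_km)
```

When a difference does appear, locating the affected rows first makes it easier to trace it back to the input that shifted.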
Further reading
- research/bcfishpass_comparison.md: per-phase pipeline DAG, parity numbers, documented gaps
- ?lnk_pipeline_setup / ?lnk_pipeline_load / ?lnk_pipeline_prepare / ?lnk_pipeline_break / ?lnk_pipeline_classify / ?lnk_pipeline_connect: phase helper reference
- ?lnk_config: config bundle structure
- inst/extdata/configs/bcfishpass/: bundled config (rules YAML, dimensions CSV, per-species parameters, overrides)
- data-raw/_targets.R: pipeline definition
- data-raw/compare_bcfishpass_wsg.R: per-AOI target function