Downloads a GPKG layer as it existed at two points in time and returns a tibble of the row-level changes (inserts, updates, deletes) between them. The diff is computed locally with pygeodiff after the target layer has been extracted into single-table temporary copies, which sidesteps cross-layer schema drift (schema changes in other layers of the same file).

Usage

rfp_mergin_diff(project, path, layer, from, to = NULL, key = NULL)

Arguments

project

Character. Full Mergin project name as namespace/project.

path

Character. GPKG file path within the project, e.g. "background_layers.gpkg".

layer

Character. Table/layer name inside the GPKG, e.g. "whse_basemapping.fwa_watershed_groups_poly".

from

Date, POSIXct, ISO-8601 string, or "vN". The starting point. When a date/time is given, the last project version created strictly before that time is used.

to

Optional. Same types as from. Defaults to the server HEAD.

key

Optional character. Name of the column to treat as the stable primary key for row matching. Strongly recommended. Without it, pygeodiff matches rows on SQLite's fid (rowid), which gets renumbered whenever a row is inserted, so a single real insert is often reported as one insert plus a spurious update for every row whose rowid shifted.
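
The effect of key can be reproduced offline with a small base-R sketch. The two data frames, their codes and names, and the helper count_changes() are all hypothetical and made up for illustration; the helper is a simplified stand-in for row matching, not pygeodiff's actual algorithm.

```r
# Hypothetical snapshots of one layer; fid mimics a rowid that is
# renumbered when a row is inserted in the middle.
before <- data.frame(
  fid  = 1:3,
  code = c("ALFA", "CHAR", "DELT"),
  name = c("Alpha", "Charlie", "Delta"),
  stringsAsFactors = FALSE
)
# One real insert ("BRAV"); the rows after it shift down by one fid.
after <- data.frame(
  fid  = 1:4,
  code = c("ALFA", "BRAV", "CHAR", "DELT"),
  name = c("Alpha", "Bravo", "Charlie", "Delta"),
  stringsAsFactors = FALSE
)

# Count inserts, updates and deletes when rows are matched on column `id`.
# Illustrative only: a simplified stand-in, not pygeodiff's algorithm.
count_changes <- function(before, after, id) {
  shared <- intersect(before[[id]], after[[id]])
  # Compare every column except fid and the matching column itself.
  cols <- setdiff(names(before), c("fid", id))
  b <- before[match(shared, before[[id]]), cols, drop = FALSE]
  a <- after[match(shared, after[[id]]), cols, drop = FALSE]
  changed <- rowSums(b != a) > 0
  c(
    insert = length(setdiff(after[[id]], before[[id]])),
    update = sum(changed),
    delete = length(setdiff(before[[id]], after[[id]]))
  )
}

by_fid <- count_changes(before, after, "fid")   # insert 1, update 2, delete 0
by_key <- count_changes(before, after, "code")  # insert 1, update 0, delete 0
```

Matching on fid reports two updates that never happened, because fids 2 and 3 now point at different rows. Matching on the stable code column reports only the one real insert.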

Value

Long tibble with columns project, from_version, to_version, table, row, change ("insert", "update", or "delete"), column_name, old, new. Pivot wide with tidyr::pivot_wider(names_from = column_name, values_from = new) for a row-per-change view.
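
The long-to-wide step can be tried without server access on a mock result. The tibble below is invented for illustration. It assumes that old and new are list-columns (as the sapply() call in the example suggests) and that versions are stored as "vN" strings; neither assumption is confirmed by this page.

```r
library(tibble)
library(tidyr)

# Mock diff result for two inserted rows; all values are made up.
d <- tibble(
  project      = "namespace/project",
  from_version = "v1",   # assumed representation
  to_version   = "v2",   # assumed representation
  table        = "some_layer",
  row          = c(1L, 1L, 2L, 2L),
  change       = "insert",
  column_name  = c("code", "name", "code", "name"),
  old          = list(NULL, NULL, NULL, NULL),   # assumed list-column
  new          = list("ALFA", "Alpha", "BRAV", "Bravo")
)

# Flatten the list-column so the pivoted cells are plain character values.
d$new <- vapply(d$new, as.character, character(1))

# pivot_wider() treats every column not named in names_from/values_from
# as an identifier, so keep only the columns that should identify a row.
wide <- pivot_wider(
  d[c("row", "change", "column_name", "new")],
  names_from  = column_name,
  values_from = new
)
# wide has one row per changed row, with columns row, change, code, name
```

Dropping old before pivoting matters: otherwise it would be carried along as an extra identifier column rather than being spread.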

Examples

if (FALSE) { # \dontrun{
# Which watershed groups were added to sern_fraser_2024 since 2025-03-31?
d <- rfp_mergin_diff(
  project = "newgraph/sern_fraser_2024",
  path    = "background_layers.gpkg",
  layer   = "whse_basemapping.fwa_watershed_groups_poly",
  from    = "2025-03-31",
  key     = "watershed_group_code"
)
# Inserts only: one row per new group
library(dplyr)
library(tidyr)
d |>
  filter(change == "insert") |>
  mutate(new = sapply(new, function(x) if (is.null(x)) NA
                                        else as.character(x))) |>
  select(row, column_name, new) |>
  pivot_wider(names_from = column_name, values_from = new) |>
  select(any_of(c("watershed_group_code", "watershed_group_name")))
} # }