Spatial block cross-validation
Block cross-validation splits the area of your data into a number of grid cells, or "blocks", and then assigns all data into folds based on the blocks their centroid falls into.
spatial_block_cv( data, method = c("random", "snake", "continuous"), v = 10, relevant_only = TRUE, radius = NULL, buffer = NULL, ..., repeats = 1, expand_bbox = 1e-05 )
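A minimal usage sketch may help make the signature concrete. It assumes the sf, spatialsample, and rsample packages are installed; the toy point data and the names pts, pts_sf, and folds are hypothetical and not part of the original documentation.

```r
library(sf)
library(spatialsample)

set.seed(123)

# Hypothetical toy data: 200 random points in longitude/latitude
pts <- data.frame(
  lon = runif(200, -1, 1),
  lat = runif(200, -1, 1),
  y   = rnorm(200)
)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Randomly assign grid blocks to 5 folds
folds <- spatial_block_cv(pts_sf, method = "random", v = 5)

# Each row of the result holds one split; inspect the first one
first_split <- folds$splits[[1]]
n_analysis   <- nrow(rsample::analysis(first_split))
n_assessment <- nrow(rsample::assessment(first_split))
```

With no buffer applied, every observation lands in either the analysis or the assessment set of each split, so the two counts should add up to the number of rows in the data.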
data: An object of class
method: The method used to sample blocks for cross validation folds. Currently supports
"random", which randomly assigns blocks to folds,
"snake", which labels the first row of blocks from left to right, then the next from right to left, and repeats from there, and
"continuous", which labels each row from left to right, moving from the bottom row up.
v: The number of partitions for the resampling. Set to Inf for the maximum sensible value (for leave-one-X-out cross-validation).
relevant_only: For systematic sampling, should only blocks containing data be included in fold labeling?
radius: Numeric: points within this distance of the initially-selected test points will be assigned to the assessment set. If NULL, no radius is applied.
buffer: Numeric: points within this distance of any point in the test set (after radius is applied) will be assigned to neither the analysis nor the assessment set. If NULL, no buffer is applied.
...: Arguments passed on to control how the grid blocks are built.
repeats: The number of times to repeat the V-fold partitioning.
expand_bbox: A numeric of length 1, representing a proportion to expand the bounding box of data by before building a grid. Without this expansion, grids built from data in geographic coordinates may exclude observations, and grids built from regularly spaced data might have observations fall exactly on the boundary between folds, duplicating them. In spatialsample < 0.5.0, this was 0.00001 for data in a geographic CRS and 0 for data in a planar CRS. In spatialsample >= 0.5.0, this is 0.00001 for all data.
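The difference between the "snake" and "continuous" labeling orders described under method can be illustrated with a small base R sketch. This is not the package's implementation: label_blocks() and assign_folds() are hypothetical helpers, and cycling through folds 1 to v in label order is an assumption made only for illustration.

```r
# Label an n_col x n_row grid of blocks.
# Matrix row 1 stands for the bottom row of the grid.
label_blocks <- function(n_col, n_row, method = c("snake", "continuous")) {
  method <- match.arg(method)
  labels <- matrix(NA_integer_, nrow = n_row, ncol = n_col)
  next_label <- 1L
  for (i in seq_len(n_row)) {
    cols <- seq_len(n_col)
    # "snake" reverses direction on every second row;
    # "continuous" always runs left to right
    if (method == "snake" && i %% 2 == 0) cols <- rev(cols)
    for (j in cols) {
      labels[i, j] <- next_label
      next_label <- next_label + 1L
    }
  }
  labels
}

# Assumed fold assignment: cycle through folds 1..v in label order
assign_folds <- function(labels, v) {
  (labels - 1L) %% v + 1L
}

snake      <- label_blocks(3, 2, "snake")
continuous <- label_blocks(3, 2, "continuous")

snake_folds <- assign_folds(snake, 2L)
```

In the snake ordering, the last block of one row and the first block of the next are always spatial neighbours, which is the practical difference from the continuous ordering, where each new row restarts at the left edge.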
A tibble whose classes include data.frame. The results include a column for the data split objects and an identification variable.
The grid blocks can be controlled by passing additional arguments through the ... argument. Some particularly useful arguments include:
cellsize: Target cellsize, expressed as the "diameter" (shortest straight-line distance between opposing sides; two times the apothem) of each block, in map units.
n: The number of grid blocks in the x and y direction (columns, rows).
square: A logical value indicating whether to create square (TRUE) or hexagonal (FALSE) cells.
If both cellsize and n are provided, then the number of blocks requested by n of sizes specified by cellsize will be returned, likely not lining up with the bounding box of data. If only cellsize is provided, this function will return as many blocks of size cellsize as fit inside the bounding box of data. If only n is provided, cellsize will be automatically adjusted to create the requested number of cells.
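As a rough sketch of how these grid arguments might be supplied, the example below passes n, cellsize, and square through the ... argument. It assumes the sf and spatialsample packages are installed; the toy data in a projected CRS (EPSG:32633, so map units are metres) and the object names are hypothetical.

```r
library(sf)
library(spatialsample)

set.seed(42)

# Hypothetical toy data: 200 random points in a 10 km x 10 km square
pts <- data.frame(
  x = runif(200, 500000, 510000),
  y = runif(200, 5000000, 5010000)
)
pts_sf <- st_as_sf(pts, coords = c("x", "y"), crs = 32633)

# Only n supplied: a 4 x 4 grid, with cellsize adjusted to fit
folds_by_n <- spatial_block_cv(pts_sf, v = 4, n = c(4, 4))

# Only cellsize supplied: hexagonal blocks roughly 2500 m across
folds_hex <- spatial_block_cv(pts_sf, v = 4, cellsize = 2500, square = FALSE)
```

Supplying only one of n or cellsize keeps the grid aligned with the bounding box of the data, which is usually what you want when comparing fold layouts.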
D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, S. Hauenstein, J. J. Lahoz-Monfort, B. Schröder, W. Thuiller, D. I. Warton, B. A. Wintle, F. Hartig, and C. F. Dormann. "Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure," 2016, Ecography 40(8), pp. 913-929, doi: 10.1111/ecog.02881.