V-Fold Cross-Validation with Buffering
V-fold cross-validation (also known as k-fold cross-validation) randomly
splits the data into V groups of roughly equal size (called "folds").
A resample of the analysis data consists of V-1 of the folds while the
assessment set contains the final fold.
These functions extend rsample::vfold_cv() and rsample::group_vfold_cv()
to also apply an inclusion radius and exclusion buffer to the assessment set,
ensuring that your analysis data is spatially separated from the assessment
set.
In basic V-fold cross-validation (i.e. no repeats), the number of resamples
is equal to V.
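
The effect of the inclusion radius and exclusion buffer on a single fold can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the coordinates are made-up planar points, distances are plain Euclidean (real use would involve geographic distances on an sf object), the comparison is taken to be inclusive (<=), and the variable names test_seed, in_assessment and in_buffer are hypothetical.

```r
# Toy planar coordinates (hypothetical data, arbitrary units).
pts <- data.frame(x = c(0, 1, 2, 5, 6, 10), y = 0)
d <- as.matrix(dist(pts))  # pairwise Euclidean distances

radius <- 1.5  # inclusion radius (assumed inclusive)
buffer <- 2    # exclusion buffer (assumed inclusive)

# Index of the point initially selected for the assessment set of this fold.
test_seed <- 1L

# Step 1: the radius pulls nearby points into the assessment set.
in_assessment <- apply(d[, test_seed, drop = FALSE] <= radius, 1, any)

# Step 2: the buffer drops points near the (expanded) assessment set
# from both the analysis and the assessment set.
near_assessment <- apply(d[, in_assessment, drop = FALSE] <= buffer, 1, any)
in_buffer <- near_assessment & !in_assessment

assessment <- which(in_assessment)
analysis <- which(!in_assessment & !in_buffer)
dropped <- which(in_buffer)
```

With these toy values, points 1 and 2 form the assessment set, point 3 falls in the buffer and is used by neither set, and points 4 to 6 remain as analysis data.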
spatial_buffer_vfold_cv(
  data,
  radius,
  buffer,
  v = 10,
  repeats = 1,
  strata = NULL,
  breaks = 4,
  pool = 0.1,
  ...
)

spatial_leave_location_out_cv(
  data,
  group,
  v = NULL,
  radius = NULL,
  buffer = NULL,
  ...
)
data
  A data frame.

radius
  Numeric: points within this distance of the initially selected test points will be assigned to the assessment set. If NULL, no radius is applied.

buffer
  Numeric: points within this distance of any point in the test set (after radius is applied) will be assigned to neither the analysis nor the assessment set. If NULL, no buffer is applied.

v
  The number of partitions for the resampling. Set to Inf for the maximum sensible value (for leave-one-X-out cross-validation).

repeats
  The number of times to repeat the V-fold partitioning.

strata
  A variable in data (single character or name) used to conduct stratified sampling. When not NULL, each resample is created within the stratification variable. Numeric strata are binned into quartiles.

breaks
  A single number giving the number of bins desired to stratify a numeric stratification variable.

pool
  A proportion of data used to determine if a particular group is too small and should be pooled into another group. We do not recommend decreasing this argument below its default of 0.1 because of the dangers of stratifying groups that are too small.

...
  Not currently used.

group
  A variable in data (single character or name) used to create folds. For leave-location-out CV, this should be a variable containing the locations to group observations by; for leave-time-out CV, the time blocks to group by; and for leave-location-and-time-out CV, the spatiotemporal blocks to group by.
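
The idea behind folding on a group variable can be illustrated with a small base R sketch. It assumes a made-up vector of location labels and a simple random assignment of whole groups to v folds; the names locations, fold_of_group and fold_of_obs are hypothetical, and this is not necessarily how spatial_leave_location_out_cv() assigns groups internally.

```r
set.seed(123)

# Hypothetical location label for each of seven observations.
locations <- c("A", "A", "B", "C", "C", "C", "D")
groups <- unique(locations)

v <- 2  # number of folds; v = length(groups) would give leave-one-location-out

# Assign each whole group to a fold, then map the assignment to observations.
fold_of_group <- sample(rep_len(seq_len(v), length(groups)))
names(fold_of_group) <- groups
fold_of_obs <- unname(fold_of_group[locations])

# Every observation from a given location ends up in the same fold.
folds_per_location <- tapply(fold_of_obs, locations, function(f) length(unique(f)))
```

Because folds are assigned per group rather than per row, a location never contributes observations to both the analysis and the assessment set of the same resample.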
K. Le Rest, D. Pinaud, P. Monestiez, J. Chadoeuf, and C. Bretagnolle. 2014. "Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation," Global Ecology and Biogeography 23, pp. 811-820, doi: 10.1111/geb.12161.
H. Meyer, C. Reudenbach, T. Hengl, M. Katurji, and T. Nauss. 2018. "Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation," Environmental Modelling & Software 101, pp. 1-9, doi: 10.1016/j.envsoft.2017.12.001.
# Attach the package that provides the functions documented here
library(spatialsample)

# Convert the Smithsonian data to an sf object (longitude/latitude, WGS 84)
data(Smithsonian, package = "modeldata")
Smithsonian_sf <- sf::st_as_sf(
  Smithsonian,
  coords = c("longitude", "latitude"),
  crs = 4326
)

# 10-fold CV with a 500-unit exclusion buffer and no inclusion radius
spatial_buffer_vfold_cv(
  Smithsonian_sf,
  buffer = 500,
  radius = NULL
)
#> #  10-fold spatial cross-validation
#> # A tibble: 10 × 2
#>    splits         id
#>    <list>         <chr>
#>  1 <split [11/2]> Fold01
#>  2 <split [11/2]> Fold02
#>  3 <split [11/2]> Fold03
#>  4 <split [17/2]> Fold04
#>  5 <split [17/2]> Fold05
#>  6 <split [18/2]> Fold06
#>  7 <split [13/2]> Fold07
#>  8 <split [11/2]> Fold08
#>  9 <split [10/2]> Fold09
#> 10 <split [13/2]> Fold10

# Leave-location-out CV, grouping the Ames data by neighborhood
data(ames, package = "modeldata")
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
ames_neighborhoods <- spatial_leave_location_out_cv(ames_sf, Neighborhood)