Spatial or cluster cross-validation splits the data into V disjoint groups using k-means clustering of some variables, typically spatial coordinates. Each resample uses V-1 of the folds (clusters) as the analysis set, while the assessment set contains the remaining fold. In basic spatial cross-validation (i.e., no repeats), the number of resamples is equal to V.
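The splitting scheme described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: `cluster_cv_indices` is a hypothetical helper that takes column names as a character vector (the real function selects `coords` differently) and returns row indices rather than `rsplit` objects.

```r
# Hypothetical helper, NOT part of spatialsample: sketches cluster-based V-fold CV.
cluster_cv_indices <- function(data, coords, v = 10, ...) {
  # Cluster the rows on the chosen columns; extra arguments go to stats::kmeans().
  km <- stats::kmeans(data[, coords, drop = FALSE], centers = v, ...)
  # One resample per cluster: hold that cluster out, analyze the other V - 1.
  lapply(seq_len(v), function(k) {
    list(
      analysis   = which(km$cluster != k),
      assessment = which(km$cluster == k)
    )
  })
}

# Usage on simulated (made-up) coordinates; nstart is forwarded to kmeans().
set.seed(1)
pts <- data.frame(x = runif(200), y = runif(200))
splits <- cluster_cv_indices(pts, coords = c("x", "y"), v = 5, nstart = 10)
sapply(splits, function(s) length(s$assessment))
```

Because each row belongs to exactly one cluster, every row appears in exactly one assessment set, and the analysis and assessment sets of each resample together cover the full data.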

```r
spatial_clustering_cv(data, coords, v = 10, ...)
```

## Arguments

- `data`: A data frame.
- `coords`: A vector of variable names, typically spatial coordinates, used to partition the data into disjoint sets via k-means clustering.
- `v`: The number of partitions of the data set.
- `...`: Extra arguments passed on to `stats::kmeans()`.

## Value

A tibble with classes `spatial_cv`, `rset`, `tbl_df`, `tbl`, and `data.frame`. The results include a column `splits` holding the data split objects and an identification variable `id`.

## Details

The variables in the `coords` argument are used for k-means clustering of the data into disjoint sets, as outlined in Brenning (2012). These clusters are used as the folds for cross-validation. Because fold membership is determined by the clustering rather than fixed in advance, the folds can contain different numbers of points, depending on how the data are distributed spatially.
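To see why fold sizes can differ, the sketch below runs `stats::kmeans()` directly on simulated coordinates with one dense patch and two sparse ones. The data are made up for illustration and the code does not call spatialsample; the cluster counts correspond to the assessment-set sizes the resulting folds would have.

```r
# Simulated (made-up) coordinates: 300 points in a dense patch, 60 and 20 elsewhere.
set.seed(42)
pts <- data.frame(
  x = c(rnorm(300, 0, 0.5), rnorm(60, 5, 0.5), rnorm(20, 10, 0.5)),
  y = c(rnorm(300, 0, 0.5), rnorm(60, 5, 0.5), rnorm(20, 0, 0.5))
)

# k-means follows the data density, so clusters (and hence folds) need not be equal.
km <- stats::kmeans(pts, centers = 3, nstart = 10)
fold_sizes <- table(km$cluster)  # number of held-out rows per fold
fold_sizes
```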

## References

A. Brenning, "Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest," 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, 2012, pp. 5372-5375, doi: 10.1109/IGARSS.2012.6352393.

## Examples

```r
data(ames, package = "modeldata")
spatial_clustering_cv(ames, coords = c(Latitude, Longitude), v = 5)
#> #  5-fold spatial cross-validation
#> # A tibble: 5 x 2
#>   splits             id
#>   <list>             <chr>
#> 1 <split [1980/950]> Fold1
#> 2 <split [2590/340]> Fold2
#> 3 <split [2717/213]> Fold3
#> 4 <split [2467/463]> Fold4
#> 5 <split [1966/964]> Fold5
```