<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>cross-validation | GeoDS</title><link>https://geods.netlify.app/tag/cross-validation/</link><atom:link href="https://geods.netlify.app/tag/cross-validation/index.xml" rel="self" type="application/rss+xml"/><description>cross-validation</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2021-2022 Alexander Brenning</copyright><lastBuildDate>Sun, 05 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://geods.netlify.app/media/icon_hu22a554fba6643c20c80139d4c0ffb6d4_21540_512x512_fill_lanczos_center_3.png</url><title>cross-validation</title><link>https://geods.netlify.app/tag/cross-validation/</link></image><item><title>Aligning Model Validation with Deployment</title><link>https://geods.netlify.app/post/cross-validation/</link><pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate><guid>https://geods.netlify.app/post/cross-validation/</guid><description>
&lt;p>&lt;strong>Cross-validation (CV)&lt;/strong> is routinely used to estimate predictive performance when independent test data are unavailable. In spatial and environmental applications, however, CV often evaluates the wrong quantity: it reflects the sampling design rather than the conditions under which the model is ultimately deployed.&lt;/p>
&lt;p>In &lt;strong>spatial prediction&lt;/strong>, monitoring networks are rarely representative of the full prediction domain. Air-quality stations, for example, are concentrated in urban areas, while predictions are required across entire regions. As a result, validation tasks generated by CV differ fundamentally from deployment tasks.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>Key concept: Deployment risk. Predictive performance should be evaluated as the expected loss over deployment tasks rather than sampled validation tasks.&lt;/p>
&lt;/div>
&lt;/div>
&lt;p>Prediction tasks can be represented as &lt;span class="math inline">\(T = (x, d)\)&lt;/span>, where &lt;span class="math inline">\(x\)&lt;/span> denotes covariates and &lt;span class="math inline">\(d\)&lt;/span> characterizes task difficulty such as prediction distance.&lt;/p>
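&lt;p>As a sketch in notation assumed here for illustration (the symbols \(L\), \(P_{\mathrm{dep}}\), \(P_{\mathrm{val}}\) and \(w_i\) are not taken from the text above), the deployment risk is the expected loss over deployment tasks, whereas unweighted CV averages the losses of the \(n\) tasks produced by the validation task generator:&lt;/p>
&lt;p>&lt;span class="math display">\[ R_{\mathrm{dep}} = \mathbb{E}_{T \sim P_{\mathrm{dep}}}\left[ L(T) \right], \qquad \hat{R}_{\mathrm{CV}} = \frac{1}{n} \sum_{i=1}^{n} L(T_i), \quad T_i \sim P_{\mathrm{val}} \]&lt;/span>&lt;/p>
&lt;p>If \(P_{\mathrm{val}}\) differs from \(P_{\mathrm{dep}}\), the plain average generally estimates a different quantity than \(R_{\mathrm{dep}}\). A weighted average \(\hat{R}_{w} = \sum_{i=1}^{n} w_i \, L(T_i)\) with \(\sum_{i} w_i = 1\) can target \(R_{\mathrm{dep}}\) when the weights make the weighted validation tasks resemble the deployment tasks.&lt;/p>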
&lt;p>&lt;strong>Target-Weighted Cross-Validation (TWCV)&lt;/strong> reweights validation losses so that the distribution of validation tasks aligns with that of the deployment tasks. To achieve this, TWCV uses calibration weighting to match the marginal distributions of task descriptors: the covariates, and prediction distance as a proxy for task difficulty. TWCV is related to importance-weighted CV (IWCV), which instead fits models to estimate sampling density ratios, an approach that can be unstable in high-dimensional feature spaces.&lt;/p>
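&lt;p>The exact calibration procedure is not spelled out here. As a minimal sketch, assuming the task descriptors have been binned into categories, raking (iterative proportional fitting) is one standard calibration-weighting technique for matching marginal distributions. The function names, bin labels, target shares and loss values below are hypothetical and serve only as an illustration.&lt;/p>

```python
from collections import defaultdict


def rake_weights(tasks, targets, n_iter=100):
    """Raking (iterative proportional fitting) over binned task descriptors.

    tasks   : list of dicts, one per validation task, mapping a descriptor
              name to its bin label, e.g. {"landuse": "urban", "dist": "near"}
    targets : dict mapping descriptor name to {bin label: deployment share}
    Assumes every target bin occurs at least once among the tasks.
    Returns one weight per task; the weights sum to 1.
    """
    n = len(tasks)
    weights = [1.0 / n] * n
    for _ in range(n_iter):
        for name, shares in targets.items():
            # Current weighted share of each bin of this descriptor.
            current = defaultdict(float)
            for w, task in zip(weights, tasks):
                current[task[name]] += w
            # Rescale so that this marginal matches its target share.
            weights = [
                w * shares[task[name]] / current[task[name]]
                for w, task in zip(weights, tasks)
            ]
    return weights


def weighted_mean(values, weights):
    """Weighted average of per-task validation losses."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical validation tasks: mostly urban and close to training data.
tasks = [
    {"landuse": "urban", "dist": "near"},
    {"landuse": "urban", "dist": "near"},
    {"landuse": "urban", "dist": "far"},
    {"landuse": "rural", "dist": "near"},
    {"landuse": "rural", "dist": "far"},
]
losses = [1.0, 1.2, 2.0, 1.5, 3.0]  # made-up per-task losses

# Hypothetical deployment marginals: mostly rural and far from stations.
targets = {
    "landuse": {"urban": 0.2, "rural": 0.8},
    "dist": {"near": 0.3, "far": 0.7},
}

weights = rake_weights(tasks, targets)
print("unweighted CV loss:", sum(losses) / len(losses))
print("target-weighted CV loss:", weighted_mean(losses, weights))
```

&lt;p>In this toy setting the rural and long-distance tasks carry the larger losses and are upweighted, so the weighted estimate exceeds the plain average. Raking matches only the marginals of the binned descriptors, not their joint distribution.&lt;/p>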
&lt;div class="alert alert-note">
&lt;div>
&lt;p>Take-home message: Cross-validation bias in spatial prediction is primarily driven by task distribution mismatch.&lt;/p>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Buffered task generators&lt;/strong>, such as buffered leave-one-out, produce validation tasks with a broader range of prediction distances, improving coverage of the deployment task space.&lt;/p>
&lt;div class="figure" style="text-align: center">&lt;span style="display:block;" id="fig:fig-task">&lt;/span>
&lt;img src="figures/aqfig2.png" alt="Task distribution mismatch between validation and deployment depends on the task generator being used. Case study: Air quality (NO2) in Germany. Buffered LOO gets us closer to the deployment situation, slightly better than Spatial kNNDM, but weighting is still necessary." width="1800" />
&lt;p class="caption">
Figure 1: Task distribution mismatch between validation and deployment depends on the task generator being used. Case study: Air quality (NO2) in Germany. Buffered LOO gets us closer to the deployment situation, slightly better than Spatial kNNDM, but weighting is still necessary.
&lt;/p>
&lt;/div>
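&lt;p>To make the idea of a buffered task generator concrete, here is a minimal sketch of buffered leave-one-out on planar coordinates. The function name, the toy coordinates and the buffer radius are hypothetical, and plain Euclidean distance is an assumption; a real analysis would use projected or geodesic distances as appropriate.&lt;/p>

```python
import math


def buffered_loo_tasks(coords, radius):
    """Yield (test_index, train_indices, prediction_distance) per point.

    Training points closer than `radius` to the held-out point are
    dropped, so the nearest remaining training point is at least
    `radius` away. radius=0 gives ordinary leave-one-out.
    """
    for i, p in enumerate(coords):
        train = [
            j for j, q in enumerate(coords)
            if j != i and math.dist(p, q) >= radius
        ]
        if not train:
            continue  # the buffer removed all training points; skip task
        d = min(math.dist(p, coords[j]) for j in train)
        yield i, train, d


# Toy monitoring network: three clustered stations and one remote station.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]

plain = [d for _, _, d in buffered_loo_tasks(coords, radius=0.0)]
buffered = [d for _, _, d in buffered_loo_tasks(coords, radius=1.5)]

print("LOO prediction distances:         ", plain)
print("buffered LOO prediction distances:", buffered)
```

&lt;p>With the buffer, the clustered stations are evaluated at larger prediction distances than under ordinary leave-one-out, which is the kind of task the model faces away from the network.&lt;/p>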
&lt;p>The NO&lt;span class="math inline">\(_2\)&lt;/span> case study illustrates how biased sampling distorts validation results and how TWCV corrects this. The TWCV estimates agree with those of IWCV, an alternative approach that requires &lt;em>modeling&lt;/em> of density ratios. Simulation studies confirm that conventional non-spatial and spatial CV estimators can be biased, depending on the sampling scenario.&lt;/p>
&lt;div class="figure" style="text-align: center">&lt;span style="display:block;" id="fig:fig-no2">&lt;/span>
&lt;img src="figures/aqfig3.png" alt="Air quality (NO2) case study results comparing validation strategies for two spatial prediction models: random forest and regression--kriging. A closer look at the results shows that the TWCV and IWCV estimates are plausible, while the other CV estimators are pessimistically biased. Weighted random CV performs acceptably here, but it is biased in other scenarios in our simulation studies." width="750" />
&lt;p class="caption">
Figure 2: Air quality (NO2) case study results comparing validation strategies for two spatial prediction models: random forest and regression–kriging. A closer look at the results shows that the TWCV and IWCV estimates are plausible, while the other CV estimators are pessimistically biased. Weighted random CV performs acceptably here, but it is biased in other scenarios in our simulation studies.
&lt;/p>
&lt;/div>
&lt;p>TWCV reframes cross-validation as a &lt;strong>distribution alignment problem&lt;/strong>: the validation task generator samples from one distribution, while deployment corresponds to another. Bias arises when these distributions differ, and target weighting corrects this discrepancy. TWCV still requires a task generator, such as buffered leave-one-out resampling, that ensures good coverage of the deployment task distribution’s support, because weighting cannot compensate for kinds of tasks that never occur in validation.&lt;/p>
&lt;div id="reference" class="section level2">
&lt;h2>Reference&lt;/h2>
&lt;p>Brenning, A., &amp;amp; Suesse, T. (2026). Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction. &lt;em>arXiv preprint&lt;/em>, &lt;a href="https://arxiv.org/abs/2603.29981" class="uri">https://arxiv.org/abs/2603.29981&lt;/a>&lt;/p>
&lt;/div></description></item></channel></rss>