Publication date: 29 June 2026
University: Wageningen University

From Space to Soil

Summary

Digital soil mapping (DSM) refers to the quantitative prediction of soil types or soil properties in digital form and at defined spatial and temporal resolutions. Over the past decade, DSM applications targeting Soil Organic Carbon (SOC) have expanded rapidly. This growth reflects two converging developments. First, SOC has gained prominence as a key indicator of soil functioning and a leverage point for climate change mitigation, creating a strong demand for spatially explicit and temporally dynamic information on its distribution. Second, DSM of SOC has become increasingly feasible thanks to the growing collection of SOC measurements, a rising commitment to open data, expanding Earth Observation (EO) archives, and the widespread adoption of machine learning techniques for environmental modelling. Together, these factors have transformed SOC mapping from a research-driven activity into an emerging operational capability, able to provide such information at continental scales.

Despite these developments, pan-European SOC maps that resolve three spatial dimensions and time (3D+T) are still lacking. Progress is constrained by the lack of a continent-wide harmonized SOC database, the limited availability of consistent, ready-to-use, and high-resolution EO data cubes tailored for environmental modelling, and the considerable computational resources required for large-scale spatiotemporal modelling. The overarching goal of this thesis is therefore to address these constraints by developing an operational, wall-to-wall SOC DSM pipeline for Europe at 30 m resolution, spanning multiple soil depths and time periods. In doing so, the thesis aims not only to deliver new SOC datasets, but also to identify the methodological and practical challenges encountered and to explore viable solutions that support the future growth of 3D+T DSM.

Chapter 1 is an introduction to the thesis, presenting the motivation for monitoring SOC and the developments in DSM that enable large-scale spatiotemporal SOC modelling. It reviews existing large-scale SOC mapping products and highlights opportunities to extend them toward more spatially and temporally detailed representations. It concludes with an overview of how the subsequent chapters contribute to the main thread of the work: the key efforts undertaken to achieve pan-European SOC mapping at higher spatial and temporal resolution, the challenges identified along the way, and the exploration of a strategy to address them.

Chapter 2 lays the groundwork by constructing an analysis-ready and cloud-optimized EO data cube for Europe, derived from the Landsat ARD V2 archive (2000–2022). Although Europe is richly monitored from space, its EO resources are dispersed across sensors, formats, and preprocessing standards, and often contain substantial gaps due to poor-quality pixels. This fragmentation hampers their use in environmental modelling, which relies on coherent and temporally consistent data. The cube addresses these limitations by reorganizing more than two decades of Landsat observations into a harmonized, gap-reduced, temporally consistent, and thematically rich feature space. It provides 30 m predictors at bimonthly, annual, and long-term scales, capturing vegetation dynamics, soil exposure, hydrological conditions, and tillage signals. The chapter details the reconstruction of time series affected by low-quality pixels, along with a comprehensive quality assessment of the resulting long-term record. This assessment comprises an evaluation of gap-filling accuracy using artificially introduced gaps, plausibility checks of spectral index proxies for land-surface processes at the European scale, visual inspection, and predictive tests using SOC regression and land cover classification. As the foundation for all subsequent chapters, the cube supplies the spatial and temporal context needed for meaningful and comparable SOC predictions across regions, depths, and years at high spatial resolution. Its open release further supports transparent, reproducible, and scalable DSM workflows.
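
The idea of evaluating gap-filling accuracy with artificially introduced gaps can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the function name `evaluate_gap_filling`, the `gap_fraction` parameter, and the use of linear interpolation as the gap filler are all assumptions made purely for illustration. Known values in a single pixel time series are hidden, the filler reconstructs them, and the error is measured only at the hidden positions.

```python
import numpy as np

def evaluate_gap_filling(series, gap_fraction=0.2, seed=0):
    """Hide a share of the valid observations, refill them, and return the RMSE.

    Hypothetical helper: linear interpolation stands in for whatever
    gap-filling method is actually being assessed.
    """
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)

    # Only interior valid observations may be hidden, so the first and last
    # valid values stay available as anchors for interpolation.
    valid = np.flatnonzero(~np.isnan(series))
    candidates = valid[1:-1]
    n_gaps = max(1, int(gap_fraction * candidates.size))
    held_out = rng.choice(candidates, size=n_gaps, replace=False)

    # Introduce the artificial gaps.
    masked = series.copy()
    masked[held_out] = np.nan

    # Fill every missing position from the remaining observations.
    known = ~np.isnan(masked)
    filled = masked.copy()
    filled[~known] = np.interp(t[~known], t[known], masked[known])

    # Score the reconstruction only where the true value is known.
    errors = filled[held_out] - series[held_out]
    return float(np.sqrt(np.mean(errors ** 2)))
```

Calling `evaluate_gap_filling(np.sin(np.arange(24)))` returns the reconstruction error for a synthetic series. In practice the artificial gaps would be designed to resemble real gap patterns such as consecutive cloudy periods, which purely random masking does not capture.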

Chapter 3 presents the measurements-to-maps spatiotemporal modelling framework developed to produce 3D+T high-resolution SOC density time series across Europe. Using harmonized SOC observations from across the continent together with the Landsat-based feature cube and additional EO-derived predictors, the chapter details how Random Forest (RF) and Quantile Regression Forests (QRF) were applied to generate biannual 30 m SOC density maps for multiple depth intervals down to 2 m for the period 2000–2022. In addition to predictions, the framework provides per-pixel uncertainty estimates and extrapolation risk layers to support transparent communication of map reliability and limitations to users. Independent validation shows good overall performance, with accuracy varying by land cover type, depth interval, and year. The resulting maps offer detailed spatial and temporal patterns of SOC across Europe, reflecting known soil–landscape relationships and long-term environmental gradients. Model driver analysis highlights soil depth as the dominant predictor, followed by vegetation indices, long-term bioclimate, and topography. Although pixel-level uncertainties remain substantial, particularly at greater depths, spatial aggregation reduces them. Together, these products constitute the first operational, temporally explicit SOC baseline for Europe at 30 m resolution.

Chapter 4 examines one of the major challenges in large-scale SOC mapping: the detectability of SOC change given the substantial uncertainties inherent in current modelling approaches. Detecting temporal SOC dynamics is inherently difficult because SOC changes slowly over time and exhibits high spatial variability. To quantify this challenge, the chapter introduces a model-based signal-to-noise ratio (SNR) framework, in which detectability is defined as the ratio between predicted SOC change and its associated uncertainty. Applied across Europe using RF and QRF models and evaluated with repeated SOC observations, the framework compares two modelling strategies: change first and state first. Results show that SOC change is generally undetectable at the pixel scale, with SNR values typically below one. This outcome reflects the combined limitations of current DSM–ML–EO workflows: the limited quality and short temporal span of SOC measurements, EO-related uncertainties, and simplified model assumptions. Nonetheless, spatial aggregation improves SNR, enabling more reliable assessments of SOC change at regional and national scales. The chapter argues that routine reporting of SNR diagnostics is essential to increase transparency in future SOC dynamics modelling and mapping efforts.
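
The SNR definition and the effect of spatial aggregation can be made concrete with a small sketch. It rests on assumptions that go beyond the summary: uncertainty is expressed as a standard deviation, the absolute value of the change is used as the signal, and pixel errors are treated as independent when aggregating. The last assumption is optimistic, since spatially correlated errors would shrink the gain. The function names are hypothetical.

```python
import numpy as np

def pixel_snr(change, sd):
    """Per-pixel SNR: magnitude of predicted change over its standard deviation."""
    change = np.asarray(change, dtype=float)
    sd = np.asarray(sd, dtype=float)
    return np.abs(change) / sd

def aggregated_snr(change, sd):
    """SNR of the mean change over a region.

    Assumes independent pixel errors, so the standard error of the mean
    is sqrt(sum(sd_i^2)) / n. Correlated errors would give a larger value.
    """
    change = np.asarray(change, dtype=float)
    sd = np.asarray(sd, dtype=float)
    n = change.size
    se_mean = np.sqrt(np.sum(sd ** 2)) / n
    return float(np.abs(change.mean()) / se_mean)

# Illustrative numbers only: 100 pixels with the same small change.
change = np.full(100, 0.2)   # predicted SOC change per pixel
sd = np.full(100, 1.0)       # per-pixel uncertainty of that change
print(pixel_snr(change, sd)[0])    # 0.2, below one at the pixel scale
print(aggregated_snr(change, sd))  # 2.0, above one after aggregation
```

Under these assumptions the SNR of a consistent change grows with the square root of the number of pixels, which is why a change that is invisible per pixel can become detectable for a region.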

Chapter 5 explores whether hybrid, soil-science–informed models can improve the reconstruction of SOC density (g/cm³) when bulk density (BD, g/cm³) measurements are sparse compared to the availability of SOC concentration (g/kg) data, even though both are required to compute SOC density. Conventional DSM–ML approaches typically model SOC density in a univariate manner, which not only under-utilises available information but may also overlook key physical relationships between SOC concentration and BD. Leveraging the abundant SOC concentration data from multiple survey years of the Land Use/Cover Area Frame Statistical Survey (LUCAS), this chapter compares three neural network architectures: a univariate network, a multivariate network, and a hybrid soil-informed network that embeds established SOC–BD relationships into its latent structure. The models are evaluated on their ability to predict SOC density, reproduce the joint SOC–BD distribution, and reconstruct reliable SOC density time series. Results show that, compared to the univariate model, the multivariate model and especially the hybrid soil-informed model produce more physically consistent SOC density trajectories, mitigating the noisy or implausible patterns often observed in univariate predictions. Additional plausibility checks on the soil-informed model’s latent BD components indicate that embedding domain knowledge can steer the model toward more soil-physically meaningful representations. This chapter demonstrates the potential of hybrid, soil-science–informed neural networks to enhance SOC density reconstruction in data-limited DSM contexts.
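
The dependence of SOC density on both SOC concentration and BD follows directly from the units given above. The sketch below shows that unit conversion only. It assumes the fine-earth relationship without any coarse-fragment correction, which the summary does not specify, and the function name `soc_density` is hypothetical.

```python
import numpy as np

def soc_density(soc_g_per_kg, bd_g_per_cm3):
    """SOC density (g/cm^3) from SOC concentration (g/kg) and bulk density (g/cm^3).

    (g C / kg soil) * (g soil / cm^3) / (1000 g / kg) = g C / cm^3.
    Assumption: no correction for coarse fragments is applied.
    """
    soc = np.asarray(soc_g_per_kg, dtype=float)
    bd = np.asarray(bd_g_per_cm3, dtype=float)
    return soc / 1000.0 * bd

# Illustrative values: 20 g/kg SOC at a bulk density of 1.3 g/cm^3.
print(soc_density(20.0, 1.3))  # approximately 0.026 g/cm^3
```

Because the two inputs are multiplied, a sample with SOC concentration but no BD measurement cannot yield an SOC density value on its own, which is the data limitation the chapter addresses.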

Chapter 6 reviews the gaps addressed, the challenges encountered, and the potential solutions identified. It also reflects on the broader lessons learned and outlines key questions for future research. Together, these elements position the thesis in continuity with prior developments while contributing to future advances. These contributions come at a time when DSM of SOC is shifting from producing detailed, large-scale 3D+T SOC maps toward addressing broader questions of their interpretation, usability, and improvement. At the outset, generating consistent, high-resolution SOC maps with quantified uncertainty across Europe represented both a research and operational gap, as well as a practical need for soil-health monitoring, carbon accounting, and land management. The increasing availability of relevant studies and products, however, raises further questions: the practical limits of model accuracy in data-poor settings; transparent and appropriate communication of uncertainty; the relevance of continental-scale products for local decision-making; and the methodological considerations that emerge once technical limitations are no longer the primary constraint. In several cases, this thesis confirms known challenges and raises new ones, rather than providing definitive solutions. By systematically documenting advances and limitations in data preparation, modelling, uncertainty quantification, SOC-change detectability, and integration with soil science, this work seeks to clarify the opportunities and constraints of current DSM–ML–EO approaches. In doing so, it aims to offer stepping stones for future researchers in developing more reliable, scalable, and transparent SOC monitoring frameworks, and to support the broader community in fostering open, honest, and constructive discussions.
