Publication date: 19 May 2022
University: Wageningen University
ISBN: 978-94-6447-124-3

Statistical analysis and modelling of crop yield and nitrogen use efficiency in China

Summary

Crop yields in China have increased substantially over the past decades, mainly driven by the increasing use of chemical fertilizers, improved crop varieties and better agronomic management. Nitrogen (N), a major constituent of chemical fertilizer, is applied to agricultural fields to improve crop growth and yield. However, excess N application not only decreases the economic efficiency of fertilizer application, but can also cause serious environmental problems, such as eutrophication of water bodies, greenhouse gas emissions and soil acidification. Hence, there is an urgent need to improve nitrogen use efficiency (NUE), since this would allow yield and profits to increase with minimal environmental impact.

Recent research has shown large temporal and spatial variation of crop yield and NUE in China, but existing studies have not thoroughly analyzed and interpreted this variation. Explanatory variables of NUE, such as socio-economic variables (e.g., income), agricultural management practices (e.g., irrigated area, agricultural machinery) and environmental variables (e.g., soil, climate), are crucial for explaining the variation of NUE in space and time, for developing strategies to balance crop yield, profitability and environmental sustainability, and for achieving suitability-based efficient agricultural management. Most existing research concentrated only on the influence of N application rate, crop variety and soil type on NUE by performing experiments at specific sites, which does not yield relationships between NUE and explanatory variables that are representative of the entire country. More advanced statistical methods are therefore required to explore the factors that influence NUE. Stepwise multiple linear regression (SMLR) models the linear relation between a dependent variable (here, an NUE indicator) and explanatory variables through an iterative process that adds variables to or removes them from the regression equation until the model no longer improves. Random forest (RF) is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees. Both SMLR and RF are practical and informative methods for exploring the effect of explanatory variables on NUE and for quantifying their relative importance in explaining NUE variability in a large data set. Policy makers typically focus on overall patterns and are hence more interested in general findings for aggregated crops. However, there is no established, unified scientific method to aggregate yield and NUE across different crops at regional level.
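The stepwise idea described above can be illustrated with a minimal sketch. It is not the procedure used in the thesis: it performs forward selection only (no removal step), uses AIC as the improvement criterion (an assumption, since the summary does not name one), and runs on synthetic data. The variable names `temperature`, `soil_ph` and `noise_var` and the function names are hypothetical.

```python
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit with intercept (normal equations)."""
    n = len(y)
    cols = [[1.0] * n] + columns  # prepend the intercept column
    p = len(cols)
    # Augmented system [X'X | X'y]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols]
         + [sum(a * b for a, b in zip(ci, y))]
         for ci in cols]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(p):
            if r != i:
                factor = A[r][i] / A[i][i]
                A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * col[k] for b, col in zip(beta, cols)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


def aic(rss, n, n_predictors):
    """AIC for a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * (n_predictors + 1)


def forward_stepwise(data, y):
    """Add the predictor that lowers AIC most; stop when nothing improves."""
    n = len(y)
    selected = []
    best = aic(ols_rss([], y), n, 0)
    while len(selected) < len(data):
        scores = {
            name: aic(ols_rss([data[v] for v in selected + [name]], y),
                      n, len(selected) + 1)
            for name in data if name not in selected
        }
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining variable improves the model
        selected.append(candidate)
        best = scores[candidate]
    return selected


# Synthetic example: the response depends on two of the three predictors.
rng = random.Random(0)
n = 200
data = {name: [rng.gauss(0, 1) for _ in range(n)]
        for name in ("temperature", "soil_ph", "noise_var")}
y = [2.0 * t - 1.5 * s + rng.gauss(0, 0.5)
     for t, s in zip(data["temperature"], data["soil_ph"])]

selected = forward_stepwise(data, y)
print("Selected predictors, in order of entry:", selected)
```

A full stepwise procedure would also test, after each addition, whether dropping an earlier variable improves the criterion; that step is omitted here to keep the sketch short.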

The main objective of this thesis was to apply (geo)statistical methods to analyze and explain space-time patterns of crop yield and NUE in China at two spatial scales, to support the development of effective strategies and policies. These policies should improve NUE in a sustainable way without impacting crop productivity. The objective was approached based on: 1) an overview and analysis of space-time variation of NUE and corresponding model predictions with agricultural, environmental and economic explanatory variables (Chapter 2); 2) an overview and analysis of space-time variation of different crop yields and their relation with explanatory variables (Chapter 3); 3) a finer resolution exploration and statistical modelling of space-time NUE in northeast China (Chapter 4); and 4) uncertainty quantification of NUE predictions using Monte Carlo simulation and quantile regression forests in China (Chapter 5). Chapters 2, 3 and 5 focused on analysis at provincial scale, while Chapter 4 was carried out at county scale.

In Chapter 2, I collected yield, livestock, and fertilizer data and corresponding parameters at provincial scale. The spatial and temporal variation of NUE was analyzed and presented in maps and graphs. In addition, I developed and calibrated multiple linear regression models that predict NUE indicators at provincial scale in China from explanatory variables (crop type, climate, topography, soil type and properties, economic variables and agricultural management practices (AMP)). The results showed substantial temporal and spatial variation of the partial factor productivity of N (PFPN) and the partial nutrient balance of N (PNBN). PFPN was larger in east and south China than in central and west China. It was also smaller than 30 kg kg-1 yr-1 in most provinces. PNBN was low in south China (< 0.40 kg kg-1 yr-1), moderate in most provinces (0.41-0.50 kg kg-1 yr-1), and high in northeast and southwest China. The PFPN in China decreased from 32 kg kg-1 in 1978 to 27 kg kg-1 in 1995, after which it increased to 38 kg kg-1 in 2015. PNBN decreased from 0.53 kg kg-1 in 1978 to 0.38 kg kg-1 in 2000, after which it remained constant until 2015. SMLR proved to be an effective approach to model and predict NUE and to derive the major influencing factors of the dependent variables. The models derived in Chapter 2 explained more than 70% of the variation of NUE. Crop types and various soil properties were influential factors in the PFPN model, while crop types, climate and soil properties accounted for most of the variation of PNBN. Although the models could explain a large part of the spatial and temporal variation, they may be improved by expanding the covariate set with additional relevant variables and by exploring the use of non-linear statistical models (as was done in Chapter 4). Policy makers should consider suitable crop types, temperature and soil properties when deciding how to develop agricultural land management that uses agricultural resources efficiently and balances NUE, productivity, and the environment.

In Chapter 3, I analyzed the temporal and spatial variation of yield for multiple crop aggregations. Stepwise multiple linear regression was used to explore the relationships between crop yield and agricultural, environmental and economic explanatory variables. The temporal and spatial patterns of yields differed between levels of crop aggregation. Most of the models explained more than 60% of the crop yield variance, except for rice, potato and cotton. AMP, soil and economic covariates were the most important factors in all models. Topography had an influence on the aggregate yield (provincial yield including all crops, calculated as provincial production divided by provincial cultivated land) but was not included in the staples and cash models. Instead, climatic covariates were important for the staples and cash models, but not for the aggregate yield model. Model performance for the aggregate yield differed between provinces in individual years, and the residuals of the regression model had distinct spatial and temporal patterns. Hence, a more detailed analysis of model performance and residuals is needed to explore the causes of these patterns. The models could not predict the impact of natural hazards, plant diseases and insect pests due to lack of data. This may be improved in future research using a combination of natural disaster prediction and pest diagnosis analysis. With increasing food demand and limited agricultural land resources, enhancing economic growth might be a possible solution for China to safeguard food security, if this is combined with better management practices, breeding and planting technologies, and takes account of crop suitability (i.e., the adaptability of crops to the local environment).

Since a provincial scale analysis may be too coarse for some policy decisions, I also analyzed the spatial and temporal variation of NUE at county scale in the high NUE region in northeast China, where I expected remaining potential for NUE improvement (Chapter 4). Results demonstrated that the NUE indicators decreased in most counties during the study period and were higher in Heilongjiang than in the other two provinces of northeast China. SMLR and RF models were both applied in this chapter to explore the explanatory variables of NUE from a more comprehensive perspective. The RF model performed better than the SMLR model, indicating that many covariates had a non-linear relation with NUE. Both models smoothed reality: they underpredicted high extremes and overpredicted low extremes. The relative importance of crop covariates was much higher in SMLR than in RF, while soil and climatic covariates were more important in RF, confirming a difference between linear and non-linear models of the relation between dependent and explanatory variables. These novel findings are particularly valuable for supporting land-use management and policymaking.

As we know, no model is perfect. To quantify the uncertainty of NUE predictions in RF modelling, I conducted in Chapter 5 a comprehensive uncertainty analysis using Monte Carlo simulation and quantile regression forests (QRF) that accounted for the spatial and temporal correlation of measurement errors. I used three scenarios (pessimistic, reference, and optimistic) to evaluate the sensitivity of the results to the magnitude of the measurement errors in yield, N input and N removal. The results showed, as expected, that the uncertainty of the NUE calculations in the reference scenario was larger than in the optimistic scenario and smaller than in the pessimistic scenario. The differences between scenarios were large, which indicates that proper quantification of input errors is important. For PFPN calculations, Guangxi and Shanghai had the largest probability distribution width between the 0.05 and 0.95 quantiles, while Jilin and Inner Mongolia had the smallest. For PNBN calculations, Heilongjiang and Jilin had the largest distribution width, while Beijing and Hainan had the smallest. Results also revealed that the NUE prediction uncertainty (90% prediction interval width, PIW90, and prediction interval ratio, PIR90) had a downward trend over time due to the improvement of technology and policy. In 2015, the PFPN had lower uncertainty in northeast China, while PNBN had higher uncertainty in northeast China. This was likely caused by the difference in major crop types between these regions. The input uncertainty of NUE was smaller than the model uncertainty in most provinces, except for PNBN, which showed the reverse after 2010. This means that the QRF model had a better performance for PNBN in the 2010s. Future work should focus on bookkeeping of detailed field data and accurate collection of crop parameters and explanatory variables.

The thesis synthesis is given in Chapter 6. It discusses the main findings of this thesis, presents implications and my recommendations for government and policy makers, and points out the innovations and limitations of this thesis research.

In conclusion, the relative importance of explanatory variables can differ between scales and between crops, and can differ between yield and NUE. Policy makers should make well-considered decisions on agronomic policies based on food security and environmental sustainability, and for this they require adequate information and insights, which this thesis aimed to provide. Soil, crop and climatic covariates had high relative importance for NUE, while economic variables and agricultural management practices were also important for crop production. Considering the uncertainty contributions of input data and models to NUE prediction, I encourage the government to standardize the data collection process and to inspire scientists to explore available data better using statistical tools and to develop more suitable models.
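The Monte Carlo part of the uncertainty analysis in Chapter 5 can be illustrated with a small sketch. It assumes the usual definitions PFPN = yield / N input and PNBN = N removal / N input, treats measurement errors as independent Gaussian relative errors (the thesis additionally accounts for their spatial and temporal correlation, which is ignored here), and uses made-up input values and error magnitudes for the three scenarios. The function names and all numbers are hypothetical.

```python
import random
import statistics


def simulate_nue(yield_kg, n_input_kg, n_removal_kg, rel_sd,
                 n_draws=10_000, seed=42):
    """Propagate relative measurement errors into PFPN and PNBN samples."""
    rng = random.Random(seed)
    pfpn, pnbn = [], []
    for _ in range(n_draws):
        # Perturb each input with an independent relative error (assumption).
        y = yield_kg * (1 + rng.gauss(0, rel_sd))
        f = n_input_kg * (1 + rng.gauss(0, rel_sd))
        u = n_removal_kg * (1 + rng.gauss(0, rel_sd))
        pfpn.append(y / f)  # kg yield per kg N applied
        pnbn.append(u / f)  # kg N removed per kg N applied
    return pfpn, pnbn


def interval_width_90(samples):
    """Width between the 0.05 and 0.95 quantiles of the samples."""
    cuts = statistics.quantiles(samples, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[-1] - cuts[0]


# Hypothetical relative standard deviations of the measurement errors.
scenarios = {"optimistic": 0.05, "reference": 0.10, "pessimistic": 0.20}

widths = {}
for name, rel_sd in scenarios.items():
    # Illustrative inputs: 6000 kg yield, 200 kg N input, 90 kg N removal.
    pfpn, pnbn = simulate_nue(6000.0, 200.0, 90.0, rel_sd)
    widths[name] = (interval_width_90(pfpn), interval_width_90(pnbn))
    print(f"{name:12s} PFPN width: {widths[name][0]:.2f}  "
          f"PNBN width: {widths[name][1]:.3f}")
```

Under these assumptions the interval widths grow with the assumed error magnitude, which mirrors the ordering of the optimistic, reference and pessimistic scenarios reported above.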
