A 100 m gridded population dataset of China's seventh census using ensemble learning and big geospatial data
Published: 2024-08-16
Issue: 8
Volume: 16
Pages: 3705–3718
ISSN: 1866-3516
Container-title: Earth System Science Data
Short-container-title: Earth Syst. Sci. Data
Language: en
Author:
Chen Yuehong, Xu Congcong, Ge Yong, Zhang Xiaoxiang, Zhou Ya'nan
Abstract
China has undergone rapid urbanization and internal migration in the past few years, and up-to-date gridded population datasets are essential for various applications. Existing datasets for China, however, are either outdated or do not incorporate data from the latest Seventh National Population Census of China, conducted in 2020. In this study, we develop a novel population downscaling approach that leverages stacking ensemble learning and big geospatial data to produce up-to-date population grids at a 100 m resolution for China using seventh census data at both county and town levels. The proposed approach employs stacking ensemble learning to integrate the strengths of random forest, XGBoost, and LightGBM by fusing their predictions in a training mechanism, and it delineates the inhabited areas from big geospatial data to enhance the gridded population estimation. Experimental results demonstrate that the proposed approach fits better than each of the individual base models. Meanwhile, the out-of-sample town-level test set indicates that the estimated gridded population dataset (R² = 0.8936) is more accurate than the existing WorldPop (R² = 0.7427) and LandScan (R² = 0.7165) products for China in 2020. Furthermore, with the inhabited area enhancement, the spatial distribution of population grids is intuitively more reasonable than in the two existing products. Hence, the proposed population downscaling approach provides a valuable option for producing gridded population datasets. The estimated 100 m gridded population dataset of China holds great significance for future applications, and it is publicly available at https://doi.org/10.6084/m9.figshare.24916140.v1 (Chen et al., 2024b).
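The abstract does not spell out how census totals are spread over grid cells or how R² is computed, so the following is only a minimal sketch under stated assumptions. It assumes conventional dasymetric weighting: each cell receives a share of its census unit's population proportional to a model-predicted weight, and cells outside the inhabited mask receive zero. It also assumes R² is the coefficient of determination between census counts and aggregated grid estimates. The function names `downscale` and `r_squared` and the sample numbers are hypothetical and are not taken from the paper.

```python
def downscale(census_total, weights, inhabited):
    """Distribute one census unit's population over its grid cells.

    weights   -- non-negative relative densities per cell (assumed to come
                 from a fitted model; hypothetical values here)
    inhabited -- booleans; cells marked False receive zero population
    """
    masked = [w if ok else 0.0 for w, ok in zip(weights, inhabited)]
    total_weight = sum(masked)
    if total_weight == 0:
        # No usable weight: spread evenly over inhabited cells, if any.
        n = sum(1 for ok in inhabited if ok)
        if n == 0:
            return [0.0] * len(weights)
        return [census_total / n if ok else 0.0 for ok in inhabited]
    # Proportional allocation keeps the cell sum equal to the census total.
    return [census_total * w / total_weight for w in masked]


def r_squared(observed, predicted):
    """Coefficient of determination (one common definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical unit with 1000 people and four cells, one of them uninhabited.
cells = downscale(1000, [2.0, 1.0, 5.0, 1.0], [True, True, False, True])
# cells == [500.0, 250.0, 0.0, 250.0]
```

For a town-level check of the kind the abstract reports, one would sum the cell estimates inside each town and pass those sums, together with the town census counts, to `r_squared`.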
Funder
National Key Research and Development Program of China; National Natural Science Foundation of China
Publisher
Copernicus GmbH
References (53 articles)
1. Baynes, J., Neale, A., and Hultgren, T.: Improving intelligent dasymetric mapping population density estimates at 30 m resolution for the conterminous United States by excluding uninhabited areas, Earth Syst. Sci. Data, 14, 2833–2849, https://doi.org/10.5194/essd-14-2833-2022, 2022.
2. Bright, E. A. and Coleman, P. R.: LandScan: a global population database for estimating populations at risk, Photogramm. Eng. Rem. S., 66, 849–858, 2000.
3. Chen, M., Xian, Y., Huang, Y., Zhang, X., Hu, M., Guo, S., Chen, L., and Liang, L.: Fine-scale population spatialization data of China in 2018 based on real location-based big data, Scientific Data, 9, 624, https://doi.org/10.1038/s41597-022-01740-5, 2022.
4. Chen, Q., Hou, X., Zhang, X., and Ma, C.: Improved GDP spatialization approach by combining land-use data and night-time light data: a case study in China's continental coastal area, Int. J. Remote Sens., 37, 4610–4622, 2016.
5. Chen, Q., Ye, T., Zhao, N., Ding, M., Ouyang, Z., Jia, P., Yue, W., and Yang, X.: Mapping China's regional economic activity by integrating points-of-interest and remote sensing data with random forest, Environment and Planning B: Urban Analytics and City Science, 48, 1876–1894, https://doi.org/10.1177/2399808320951580, 2021.