Integrating Spatiotemporal Parameters for Landslide Susceptibility and Hazard Prediction: A Machine Learning Framework with SHAP Interpretation in Campania, Italy
Gennaro Sequino; Luca Comegna; Roberto Greco
2025
Abstract
Landslides represent a critical geohazard in many regions worldwide, including the Campania region, Southern Italy, which is particularly vulnerable to rapid, flow-like landslides triggered by intense short-term rainfall, especially when it occurs after prolonged wet conditions. These events, most prevalent in mountainous areas covered by pyroclastic deposits, have caused severe casualties and damage in recent decades, motivating extensive research into their complex triggering mechanisms. Traditionally, physically based models have been employed to simulate landslide dynamics by solving rigorous thermo-hydro-mechanical equations. While effective at the local scale, these models often struggle to incorporate the spatial heterogeneity of geotechnical and hydraulic properties over larger areas. In parallel, machine learning (ML) techniques have emerged as powerful alternatives capable of handling complex, non-linear relationships and integrating large, heterogeneous datasets comprising geological, geomorphological, atmospheric, and vegetation-related variables. Although ML models are commonly used to generate static susceptibility maps based on spatial characteristics, few studies have addressed their potential for capturing temporal variability, especially concerning dynamic atmospheric conditions. To bridge this gap, this study proposes a comprehensive framework based on tree-based ML algorithms, including Extreme Gradient Boosting (XGBoost) (Chen and Guestrin, 2016), Light Gradient Boosting Machine (LightGBM) (Ke et al., 2017), Categorical Boosting (CatBoost) (Prokhorenkova et al., 2018), Random Forest (RF) (Breiman, 2001), and Decision Tree (DT) (Breiman et al., 2017), to predict first landslide susceptibility and then dynamic hazard indexes across space and time.
A custom spatiotemporal dataset was developed using QGIS (http://www.qgis.org) by integrating georeferenced landslide event data with relevant thematic layers, enabling the extraction of both spatial and temporal predictors for ML training. Additionally, the study investigates the impact of varying landslide-to-non-landslide area ratios in model development and aims to enhance interpretability by employing SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017) to elucidate model outputs. This ongoing research seeks to improve understanding of landslide behavior and support the integration of ML methodologies in geotechnical applications, particularly for early warning systems and regional risk mitigation strategies.
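
The abstract does not say how the landslide-to-non-landslide ratios are varied. As a rough illustration only, and not the authors' procedure, the sketch below builds training sets at several ratios by randomly undersampling non-landslide cells. The function name `sample_training_cells`, the integer cell identifiers, the ratio values, and the convention that `ratio` means "non-landslide cells per landslide cell" are all assumptions made for this example.

```python
import random


def sample_training_cells(landslide_ids, stable_ids, ratio, seed=0):
    """Return (ids, labels) with about `ratio` non-landslide cells per landslide cell.

    Hypothetical helper: all landslide cells are kept (label 1) and
    non-landslide cells are drawn at random without replacement (label 0).
    """
    n_stable = min(len(stable_ids), int(round(ratio * len(landslide_ids))))
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    chosen = rng.sample(list(stable_ids), n_stable)
    ids = list(landslide_ids) + chosen
    labels = [1] * len(landslide_ids) + [0] * n_stable
    return ids, labels


# Synthetic cell identifiers standing in for mapped terrain units.
landslide_cells = list(range(50))
stable_cells = list(range(1000, 2000))

# One training set per assumed ratio (1:1, 1:2, 1:5, 1:10).
datasets = {
    r: sample_training_cells(landslide_cells, stable_cells, r)
    for r in (1, 2, 5, 10)
}

for r, (ids, labels) in datasets.items():
    print(f"ratio 1:{r} -> {len(ids)} cells, {sum(labels)} landslide")
```

Each resulting set could then be passed to any of the tree-based classifiers named above, so that performance can be compared across ratios on a common held-out set.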


