Active-transfer learning for natural hazard assessment
Combining transferred knowledge with optimal local data acquisition
Guest post by Dr. Zhihao Wang, PhD 2023, now postdoc at UC Davis
When training machine-learning models for natural hazard assessment, we still rely on hundreds (even thousands) of labeled instances, which are very difficult, time-consuming, or expensive to obtain.
For building the training dataset at little cost, two ideas, data minimization and existing datasets, have been attracting increased attention. In terms of data minimization, active learning (AL) (Settles, 2010) is to choose the data (“informative” data) that the model is curious about as the training data for achieving high accuracy using as few labeled instances as possible. Wang and Brenning (2021) compared different active learning strategies in landslide detection assessments and confirmed the good performance of active learning. For using existing datasets as the training data, transfer learning (Pan and Yang, 2010) is to extract information from existing datasets to help a learning algorithm achieve a good performance in a new target area. Wang et al. (2022) used different transfer learning techniques in landslide susceptibility assessment and found that the similarity of environment and data characteristics are key factors for successful model transfers. However, active learning and transfer learning have disadvantages in terms of model stability and precise understanding of the governing factors in the target area, respectively.
My recent contribution (Wang and Brenning, 2023) proposes novel a natural hazards framework that addresses these issues:
When using UATL to do natural hazard assessment, it can fully take advantage of the existing datasets from geographically distant (source areas) and a small amount of data from the target area to achieve excellent predictive performances and more precise relationships between the response and predictors in the target area.
How it works
UATL is a human-in-the-loop framework for natural hazard assessments. First, it applies the case-based reasoning (CBR) transfer learning method to compare the similarity of characteristics (e.g., environmental characteristics) between source and target areas, and the source area with the highest similarity is used as the training set to building a natural hazard model (CBR model).
Next, UATL combines the CBR model with the active learning framework to obtain “informative” training data from the target area in a human-in-the-loop manner. Unlike the active learning framework, where the initial training set is randomly selected by the human, this process not only further saves the cost of creating the training set but also ensures the quality of the initial training set.
Finally, when certain selection criteria are met (e.g., time of selection), UATL stops selecting “informative” instances from the target area.
UATL can be applied to natural hazard assessments. Here we will have a brief look at open-source landslide inventories that can be found in Tanyas et al. (2021). In the paper, I also demonstrate the performance of UATL in the landslide detection assessment (Wang and Brenning, 2023).
Case Study: the open-source landslide inventories of Indonesia
The source area is located in the western Aceh province of Sumatra, and the target area is in the southern Palu province of Sulawesi, both in Indonesia (Figure 2). Landslides in both study areas occur frequently and may be triggered by seismic activity as well as heavy rainfall or their combined effect. The red polygons are the landslide inventories for the target area (Palu) and the source area (Reuleut).
For evaluating the performance of UATL, the partial AUROC is adopted (Figure 3). The partial AUROC ingrates the ROC curve only within a restricted domain, which allows us to assess the model’s ability to predict landslide occurrence while correctly classifying most non-landslide sites as stable (Brenning, 2012). Three benchmarks are used, which include only using active learning (AL benchmark), only using the transferred model (CBR benchmark), and the model trained by the target area itself (Target benchmark).
This is what we learn from the partial AUROCs obtained by different methods:
The predictive accuracy obtained by UATL was only 2% lower than that obtained by the target benchmark in a study area of 229 km² in Indonesia.
UATL reduced 80% of the efforts of building the training data compared to only using the active learning framework.
In addition, we showed the areas that are most likely to be landslides in the landslide map that was obtained with adaptive UATL using only 235 landslide and non-landslide points (Figure 4). Almost all landslides were covered by areas categorized as being very likely to present a landslide, which was consistent with the numerical performance indicators obtained above (Figure 3).
Conclusion
By taking advantage of the existing, older dataset from a geographically distant, but similar source area, UATL is effectively able to use a small number of “informative” data points from the target area for sufficiently achieving excellent predictive performances with a little help from the “human in the loop”. Moreover, due to its availability as a reference for modeling other natural hazards, UATL has the potential to become a generic framework for modeling natural hazards.
References
Brenning, A. (2012): Improved spatial analysis and prediction of landslide susceptibility: Practical recommendations. Landslides and Engineered Slopes, Protecting Society through Improved Understanding, edited by: Eberhardt, E., Froese, C., Turner, AK, and Leroueil, S., Taylor & Francis, Banff, Alberta, Canada, 789–795.
Pan, S. J. and Yang, Q. A. (2010): A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 22, 1345–1359, DOI: 10.1109/Tkde.2009.191.
Settles, B. (2010): Active learning literature survey, University of Wisconsin, Computer Sciences Technical Report.
Tanyas, H., Kirschbaum, D., Gorum, T., van Westen, C.J., Lombardo, L. (2021): New insight into post-seismic landslide evolution processes in the tropics. Frontiers in Earth Science, 9, 551, DOI: 10.3389/feart.2021.700546.
Wang, Z & Brenning, A. (2021): Active-learning approaches for landslide mapping using support vector machines. Remote Sensing, 13(13), 2588. DOI: 10.3390/rs13132588.
Wang, Z., Goetz, J., Brenning, A. (2022): Transfer learning for landslide susceptibility modeling using domain adaptation and case-based reasoning. Geoscientific Model Development, 15(23), 8765–8784. DOI: 10.5194/gmd-15-8765-2022.
Wang, Z. H. and Brenning, A. (2023): Unsupervised active-transfer learning for automated landslide mapping, Computers & Geosciences, 181, ARTN 105457, DOI: 10.1016/j.cageo.2023.105457.