Validating predictions of burial mounds with field data: the promise and reality of machine learning-Reference-Cited by-同舟云学术

Validating predictions of burial mounds with field data: the promise and reality of machine learning

Published:2024-04-26 Issue:5 Volume:80 Page:1167-1189
ISSN:0022-0418
Container-title:Journal of Documentation
language:en
Short-container-title:JD

Author:

Sobotkova Adela^ORCID,Kristensen-McLachlan Ross Deans,Mallon Orla,Ross Shawn Adrian^ORCID

Abstract

PurposeThis paper provides practical advice for archaeologists and heritage specialists wishing to use ML approaches to identify archaeological features in high-resolution satellite imagery (or other remotely sensed data sources). We seek to balance the disproportionately optimistic literature related to the application of ML to archaeological prospection through a discussion of limitations, challenges and other difficulties. We further seek to raise awareness among researchers of the time, effort, expertise and resources necessary to implement ML successfully, so that they can make an informed choice between ML and manual inspection approaches.Design/methodology/approachAutomated object detection has been the holy grail of archaeological remote sensing for the last two decades. Machine learning (ML) models have proven able to detect uniform features across a consistent background, but more variegated imagery remains a challenge. We set out to detect burial mounds in satellite imagery from a diverse landscape in Central Bulgaria using a pre-trained Convolutional Neural Network (CNN) plus additional but low-touch training to improve performance. Training was accomplished using MOUND/NOT MOUND cutouts, and the model assessed arbitrary tiles of the same size from the image. Results were assessed using field data.FindingsValidation of results against field data showed that self-reported success rates were misleadingly high, and that the model was misidentifying most features. Setting an identification threshold at 60% probability, and noting that we used an approach where the CNN assessed tiles of a fixed size, tile-based false negative rates were 95–96%, false positive rates were 87–95% of tagged tiles, while true positives were only 5–13%. Counterintuitively, the model provided with training data selected for highly visible mounds (rather than all mounds) performed worse. Development of the model, meanwhile, required approximately 135 person-hours of work.Research limitations/implicationsOur attempt to deploy a pre-trained CNN demonstrates the limitations of this approach when it is used to detect varied features of different sizes within a heterogeneous landscape that contains confounding natural and modern features, such as roads, forests and field boundaries. The model has detected incidental features rather than the mounds themselves, making external validation with field data an essential part of CNN workflows. Correcting the model would require refining the training data as well as adopting different approaches to model choice and execution, raising the computational requirements beyond the level of most cultural heritage practitioners.Practical implicationsImproving the pre-trained model’s performance would require considerable time and resources, on top of the time already invested. The degree of manual intervention required – particularly around the subsetting and annotation of training data – is so significant that it raises the question of whether it would be more efficient to identify all of the mounds manually, either through brute-force inspection by experts or by crowdsourcing the analysis to trained – or even untrained – volunteers. Researchers and heritage specialists seeking efficient methods for extracting features from remotely sensed data should weigh the costs and benefits of ML versus manual approaches carefully.Social implicationsOur literature review indicates that use of artificial intelligence (AI) and ML approaches to archaeological prospection have grown exponentially in the past decade, approaching adoption levels associated with “crossing the chasm” from innovators and early adopters to the majority of researchers. The literature itself, however, is overwhelmingly positive, reflecting some combination of publication bias and a rhetoric of unconditional success. This paper presents the failure of a good-faith attempt to utilise these approaches as a counterbalance and cautionary tale to potential adopters of the technology. Early-majority adopters may find ML difficult to implement effectively in real-life scenarios.Originality/valueUnlike many high-profile reports from well-funded projects, our paper represents a serious but modestly resourced attempt to apply an ML approach to archaeological remote sensing, using techniques like transfer learning that are promoted as solutions to time and cost problems associated with, e.g. annotating and manipulating training data. While the majority of articles uncritically promote ML, or only discuss how challenges were overcome, our paper investigates how – despite reasonable self-reported scores – the model failed to locate the target features when compared to field data. We also present time, expertise and resourcing requirements, a rarity in ML-for-archaeology publications.

Publisher

Emerald

Reference112 articles.

1. Evaluation of output embeddings for fine-grained image classification,2015

2. Albrecht, C.M., Fisher, C., Freitag, M., Hamann, H.F., Pankanti, S., Pezzutti, F. and Rossi, F. (2019), “Learning and recognizing Archeological features from LiDAR data”, in Baru, C., Huan, J., Khan, L., Hu, X.H., Ak, R., Tian, Y., Barga, R., et al. (Eds), pp. 5630-5636, doi: 10.1109/BigData47090.2019.9005548.

3. A review of artificial intelligence and remote sensing for archaeological research;Remote Sensing,2022

4. Bailey, D.W. (1998), “Bulgarian archaeology; ideology, sociopolitics and the exotic”, in Meskell, L. (Ed.), Archaeology under Fire: Nationalism, Politics and Heritage in the Eastern Mediterranean and Middle East, Routledge, London, New York, pp. 87-110.

5. The data explosion: tackling the taboo of automatic feature recognition in airborne survey data;Antiquity,2014

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Guest editorial: Artificial intelligence for cultural heritage materials;Journal of Documentation;2024-09-03