An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance

Geoscientists, as well as researchers in many fields, need to read a huge amount of literature to locate, extract, and aggregate relevant results and data to enable future research or to build a scientific database, but there is no existing system to support this use case well. In this paper, based on the findings of a formative study about how geoscientists collaboratively annotate literature and extract and aggregate data, we proposed DeepShovel, a publicly-available AI-assisted data extraction system to support their needs. DeepShovel leverages the state-of-the-art neural network models to support researcher(s) easily and accurately annotate papers (in the PDF format) and extract data from tables, figures, maps, etc. in a human-AI collaboration manner. A follow-up user evaluation with 14 researchers suggested DeepShovel improved users’ efficiency of data extraction for building scientific databases, and encouraged teams to form a larger scale but more tightly-coupled collaboration.