Geoscience Data Journal

High-Resolution Geospatial Database: National Criteria-Air-Pollutant Concentrations in the Contiguous U.S., 2016-2020
Lu T, Kim SY and Marshall JD
Concentration estimates for ambient air pollution are used widely in fields such as environmental epidemiology, health impact assessment, urban planning, environmental equity and sustainability. This study builds on previous efforts by developing an updated high-resolution geospatial database of population-weighted annual-average concentrations for six criteria air pollutants (PM2.5, PM10, CO, NO2, SO2, O3) across the contiguous U.S. during a five-year period (2016-2020). We developed Land Use Regression (LUR) models within a partial-least-squares-universal kriging framework by incorporating several land use, geospatial and satellite-based predictor variables. The LUR models were validated using conventional and clustered cross-validation, with the former consistently showing superior performance in capturing the variability of air quality. Most models demonstrated reliable performance (e.g., mean squared error-based R² > 0.8, standardised root mean squared error < 0.1). We used the best modelling approach to develop estimates by Census Block, which were then population-weighted averaged at Census Block Group, Census Tract and County geographies. Our database provides valuable insights into the dynamics of air pollution, with utility for environmental risk assessment, public health, policy and urban planning.
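The aggregation step described above, from Census Block estimates to larger geographies, can be illustrated with a minimal sketch. This is not the authors' code: the tuple layout, the identifiers and the concentration values are hypothetical, and dropping parent units with zero population is an assumption.

```python
from collections import defaultdict


def population_weighted_mean(blocks):
    """Aggregate block-level concentrations to their parent units.

    blocks: iterable of (parent_id, population, concentration) tuples.
    Returns {parent_id: population-weighted mean concentration}.
    Parents whose total population is zero are omitted (an assumption).
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for parent_id, population, concentration in blocks:
        weighted_sum[parent_id] += population * concentration
        total_pop[parent_id] += population
    return {
        parent: weighted_sum[parent] / total_pop[parent]
        for parent in total_pop
        if total_pop[parent] > 0
    }


# Hypothetical blocks: (block-group id, population, concentration in ug/m3)
example_blocks = [
    ("BG-1", 100, 8.0),
    ("BG-1", 300, 10.0),
    ("BG-2", 50, 6.0),
]
block_group_means = population_weighted_mean(example_blocks)
```

The same function could be reused for Census Tract or County aggregation by changing the parent identifier attached to each block.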
Digitizing UK analogue magnetogram records from large geomagnetic storms of the past two centuries
Beggan CD, Eaton E, Maume E, Clarke E, Williamson J and Humphries T
Continuous geomagnetic records of the strength and direction of the Earth's field at the surface extend back to the 1840s. Over the past two centuries, eight observatories have existed in the United Kingdom, which measured the daily field variations using light-sensitive photographic paper to produce analogue magnetograms. Around 350,000 magnetograms have been digitally photographed at high resolution. However, converting the traces to digital values is difficult and time-consuming, as the magnetograms can have overlapping lines, low-quality recordings and obscure metadata for conversion to SI units. We discuss our approach to digitizing the traces from large geomagnetic storms and highlight some of the issues to be aware of when capturing magnetic information from analogue measurements. These include cross-checking the final digitized values with the recorded hourly mean values from observatory year books and comparing several observatory records for the same storm to catch errors such as sign inversions or incorrect 'wrap-around' of data on the paper records.
An updated global atmospheric paleo-reanalysis covering the last 400 years
Valler V, Franke J, Brugnara Y and Brönnimann S
Data assimilation techniques are becoming increasingly popular for climate reconstruction. They estimate past climate states by combining information from observations with model simulations. The first monthly global paleo-reanalysis (EKF400) covered the period 1600 to 2005 and provides estimates of several atmospheric fields. Here we present a new, considerably improved version of EKF400 (EKF400v2). EKF400v2 uses atmospheric-only general circulation model simulations with a greatly extended observational network of early instrumental temperature and pressure data, documentary evidence and tree-ring width and density proxy records. Furthermore, new observation types such as monthly precipitation amounts, number of wet days and coral proxy records were also included in the assimilation. In the version 2 system, the assimilation process has undergone methodological improvements; for example, the background-error covariance matrix is estimated by blending a time-dependent covariance matrix with a climatological one. In general, the applied modifications resulted in enhanced reconstruction skill compared to version 1, especially in precipitation, sea-level pressure and other variables besides the mostly assimilated temperature data, which already had high quality in the previous version. Additionally, two case studies are presented to demonstrate the applicability of EKF400v2 to analyse past climate variations and extreme events, as well as to investigate large-scale climate dynamics.
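The blending of a time-dependent and a climatological background-error covariance can be illustrated with a small sketch. The abstract does not give the blending formula, so the linear combination below, the weight `beta` and the 2x2 example matrices are all assumptions made purely for illustration.

```python
def blend_covariance(p_time_dependent, p_climatological, beta):
    """Return (1 - beta) * P_td + beta * P_clim, element by element.

    Both inputs are square matrices given as nested lists of equal shape.
    beta is a hypothetical blending weight in [0, 1].
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [
        [(1.0 - beta) * a + beta * b for a, b in zip(row_td, row_clim)]
        for row_td, row_clim in zip(p_time_dependent, p_climatological)
    ]


# Hypothetical covariances for two state variables
p_td = [[1.0, 0.8], [0.8, 2.0]]      # from the current ensemble
p_clim = [[1.5, 0.2], [0.2, 1.0]]    # from a long climatological sample
p_blend = blend_covariance(p_td, p_clim, beta=0.5)
```

A convex combination of two symmetric positive semi-definite matrices is itself symmetric positive semi-definite, which is one reason this form is a natural choice for a sketch.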
Global earth mineral inventory: A data legacy
Prabhu A, Morrison SM, Eleish A, Zhong H, Huang F, Golden JJ, Perry SN, Hummer DR, Ralph J, Runyon SE, Fontaine K, Krivovichev S, Downs RT, Hazen RM and Fox P
Minerals contain important clues to understanding the complex geologic history of Earth and other planetary bodies. Therefore, geologists have been collecting mineral samples and compiling data about these samples for centuries. These data have been used to better understand the movement of continental plates, the oxidation of Earth's atmosphere and the water regime of ancient martian landscapes. Datasets found at 'RRUFF.info/Evolution' and 'mindat.org' have documented a wealth of mineral occurrences around the world. One of the main goals in geoinformatics has been to facilitate discovery by creating and merging datasets from various scientific fields and using statistical methods and visualization tools to inspire and test hypotheses applicable to modelling Earth's past environments. To help achieve this goal, we have compiled physical, chemical and geological properties of minerals and linked them to the above-mentioned mineral occurrence datasets. As a part of the Deep Time Data Infrastructure, funded by the W.M. Keck Foundation, with significant support from the Deep Carbon Observatory (DCO) and the A.P. Sloan Foundation, GEMI ('Global Earth Mineral Inventory') was developed from the need of researchers to have all of the required mineral data visible in a single portal, connected by a robust, yet easy-to-understand schema. Our data legacy integrates these resources into a digestible format for exploration and analysis and has allowed researchers to gain valuable insights from mineralogical data. GEMI can be considered a network, with every node representing some feature of the datasets; for example, a node can represent geological parameters like colour, hardness or lustre. Exploring subnetworks gives the researcher a specific view of the data required for the task at hand. GEMI is accessible through the DCO Data Portal (https://dx.deepcarbon.net/11121/6200-6954-6634-8243-CC). We describe our efforts in compiling GEMI, the Data Policies for usage and sharing, and the evaluation metrics for this data legacy.
Reconstructed monthly river flows for Irish catchments 1766-2016
O'Connor P, Murphy C, Matthews T and Wilby RL
A 250-year (1766-2016) archive of reconstructed river flows is presented for 51 catchments across Ireland. By leveraging meteorological data rescue efforts with gridded precipitation and temperature reconstructions, we develop monthly river flow reconstructions using the GR2M hydrological model and an Artificial Neural Network. Uncertainties in reconstructed flows associated with hydrological model structure and parameters are quantified. Reconstructions are evaluated by comparison with those derived from quality-assured long-term precipitation series for the period 1850-2000. Assessment of the reconstruction performance across all 51 catchments using metrics of MAE (9.3 mm/month; 13.3%), RMSE (12.6 mm/month; 18.0%) and mean bias (-1.16 mm/month; -1.7%) indicates good skill. Notable years with highest/lowest annual mean flows across all catchments were 1877/1855. Winter 2015/16 had the highest seasonal mean flows and summer 1826 the lowest, whereas autumn 1933 had notable low flows across most catchments. The reconstructed database will enable assessment of catchment-specific responses to varying climatic conditions and extremes on annual, seasonal and monthly timescales.
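The three evaluation metrics quoted above can be computed as in the following sketch. The abstract does not state how the percentage values are normalised; expressing each metric relative to the mean reference flow is an assumption here, and the flow values are hypothetical.

```python
import math


def evaluation_metrics(reference, reconstructed):
    """Return MAE, RMSE and mean bias, plus each as a percentage.

    reference, reconstructed: equal-length sequences of monthly flows (mm/month).
    Percentages are relative to the mean reference flow (an assumption).
    """
    n = len(reference)
    if n == 0 or n != len(reconstructed):
        raise ValueError("series must be non-empty and of equal length")
    errors = [rec - ref for ref, rec in zip(reference, reconstructed)]
    mean_ref = sum(reference) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n
    return {
        "mae": mae,
        "rmse": rmse,
        "bias": bias,
        "mae_pct": 100.0 * mae / mean_ref,
        "rmse_pct": 100.0 * rmse / mean_ref,
        "bias_pct": 100.0 * bias / mean_ref,
    }


# Hypothetical monthly flows (mm/month)
reference_flows = [50.0, 70.0, 90.0]
reconstructed_flows = [55.0, 65.0, 90.0]
metrics = evaluation_metrics(reference_flows, reconstructed_flows)
```

A negative bias under this sign convention means the reconstruction is, on average, lower than the reference series.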
The Ensemble Mars Atmosphere Reanalysis System (EMARS) Version 1.0
Greybush SJ, Kalnay E, Wilson RJ, Hoffman RN, Nehrkorn T, Leidner M, Eluszkiewicz J, Gillespie HE, Wespetal M, Zhao Y, Hoffman M, Dudas P, McConnochie T, Kleinböhl A, Kass D, McCleese D and Miyoshi T
The Ensemble Mars Atmosphere Reanalysis System (EMARS) dataset version 1.0 contains hourly gridded atmospheric variables for the planet Mars, spanning Mars Year (MY) 24 through 33 (1999 through 2017). A reanalysis represents the best estimate of the state of the atmosphere by combining observations that are sparse in space and time with a dynamical model and weighting them by their uncertainties. EMARS uses the Local Ensemble Transform Kalman Filter (LETKF) for data assimilation with the GFDL/NASA Mars Global Climate Model (MGCM). Observations that are assimilated include the Thermal Emission Spectrometer (TES) and Mars Climate Sounder (MCS) temperature retrievals. The dataset includes gridded fields of temperature, wind, surface pressure, as well as dust, water ice, CO2 surface ice and other atmospheric quantities. Reanalyses are useful for both science and engineering studies, including investigations of transient eddies, the polar vortex, thermal tides and dust storms, and during spacecraft operations.
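The idea of weighting a model forecast and an observation by their uncertainties can be shown with a scalar Kalman update. This is a toy illustration only and is not the LETKF used in EMARS; the function name and the temperature values are hypothetical.

```python
def scalar_analysis(background, background_var, observation, observation_var):
    """Combine a model background and an observation, weighted by uncertainty.

    Returns (analysis, analysis_variance) from the scalar Kalman update:
        gain     = var_b / (var_b + var_o)
        analysis = background + gain * (observation - background)
    """
    if background_var < 0 or observation_var < 0:
        raise ValueError("variances must be non-negative")
    if background_var + observation_var == 0:
        raise ValueError("at least one variance must be positive")
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Hypothetical temperatures in kelvin with equal error variances (K^2)
analysis_temp, analysis_var = scalar_analysis(200.0, 4.0, 204.0, 4.0)
```

The analysis always lies between the background and the observation, closer to whichever has the smaller variance, and its variance is no larger than that of the background.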
Hourly weather observations from the Scottish Highlands (1883-1904) rescued by volunteer citizen scientists
Hawkins E, Burt S, Brohan P, Lockwood M, Richardson H, Roy M and Thomas S
Weather observations taken every hour during the years 1883-1904 on the summit of Ben Nevis (1345 m above sea level) and in the town of Fort William in the Scottish Highlands have been transcribed from the original publications into digital form. More than 3,500 citizen scientist volunteers completed the digitization in less than 3 months using the http://WeatherRescue.org website. Over 1.5 million observations of atmospheric pressure, wet- and dry-bulb temperatures, precipitation and wind speed were recovered. These data have been quality controlled and are now made openly available, including hourly values of relative humidity derived from the digitized dry- and wet-bulb temperatures using modern hygrometric algorithms. These observations are one of the most detailed weather data collections available for anywhere in the UK in the Victorian era. In addition, 374 observations of aurora borealis seen by the meteorologists from the summit of Ben Nevis have been catalogued and this has improved the auroral record for studies of space weather.
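Deriving relative humidity from dry- and wet-bulb temperatures can be sketched with the standard psychrometric equation. The abstract does not specify which hygrometric algorithm was used, so this is a textbook approximation rather than the authors' method. The Magnus coefficients are for saturation over liquid water, the default psychrometer coefficient is a commonly quoted value for a ventilated instrument (the appropriate value depends on ventilation), and the example readings are hypothetical. An ice-covered wet bulb would need different coefficients.

```python
import math


def saturation_vapour_pressure_hpa(temp_c):
    """Magnus approximation for saturation vapour pressure over water (hPa)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity(dry_bulb_c, wet_bulb_c, pressure_hpa,
                      psychrometer_coeff=6.62e-4):
    """Relative humidity (%) from dry- and wet-bulb temperatures.

    psychrometer_coeff (per kelvin) is an assumed default; it varies with
    how well the wet bulb is ventilated.
    """
    if wet_bulb_c > dry_bulb_c:
        raise ValueError("wet-bulb temperature cannot exceed dry-bulb")
    vapour_pressure = (
        saturation_vapour_pressure_hpa(wet_bulb_c)
        - psychrometer_coeff * pressure_hpa * (dry_bulb_c - wet_bulb_c)
    )
    rh = 100.0 * vapour_pressure / saturation_vapour_pressure_hpa(dry_bulb_c)
    return max(0.0, min(100.0, rh))  # clip to the physical range


# Hypothetical summit reading: 5.0 C dry bulb, 3.0 C wet bulb, 860 hPa
rh_example = relative_humidity(5.0, 3.0, 860.0)
```

When the two thermometers read the same value, the function returns 100%, as expected for saturated air.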
Historical watershed stressors for the Laurentian Great Lakes
Reavie ED, Cai M and Brown TN
This report provides a detailed set of historical stressor data for 60 watersheds comprising the Laurentian Great Lakes basin. Archival records were transcribed from public records to create quantitative data on human activities: population, mining, deforestation, and agriculture. Yearly records of stressors are provided from 1780 through 2010. These data may be used to track historical impacts on Great Lakes coastal and open water conditions. They may further be used to examine corresponding effects on response variables such as biological communities quantified during monitoring and palaeoecological programmes.
Estimating active layer thickness and volumetric water content from ground penetrating radar measurements in Barrow, Alaska
Jafarov EE, Parsekian AD, Schaefer K, Liu L, Chen AC, Panda SK and Zhang T
Ground penetrating radar (GPR) has emerged as an effective tool for estimating active layer thickness (ALT) and volumetric water content (VWC) within the active layer. In August 2013, we conducted a series of GPR and probing surveys using a 500 MHz antenna and metallic probe around Barrow, Alaska. We collected about 15 km of GPR data and 1.5 km of probing data. Here, we describe the GPR data processing workflow from raw GPR data to the estimated ALT and VWC. We include the corresponding uncertainties for each measured and estimated parameter. The estimated average GPR-derived ALT was 41 cm, with a standard deviation of 9 cm. The average probed ALT was 40 cm, with a standard deviation of 12 cm. The average GPR-derived VWC was 0.65, with a standard deviation of 0.14.
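One common way to turn GPR travel times into ALT and VWC is sketched below. The abstract does not describe the workflow in detail, so this is an assumed, generic approach: the wave velocity is calibrated where a probe depth is available, thaw depth follows from the two-way travel time, and VWC comes from the Topp et al. (1980) empirical equation, which is used here as a stand-in and may differ from the relation the authors applied. All numerical inputs are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_NS = 0.299792458  # speed of light in vacuum


def wave_velocity(probed_depth_m, two_way_time_ns):
    """Radar wave velocity (m/ns) from a probed depth and two-way travel time."""
    return 2.0 * probed_depth_m / two_way_time_ns


def thaw_depth(two_way_time_ns, velocity_m_per_ns):
    """Active layer thickness (m) from two-way travel time and velocity."""
    return velocity_m_per_ns * two_way_time_ns / 2.0


def relative_permittivity(velocity_m_per_ns):
    """Bulk relative permittivity implied by the wave velocity."""
    return (SPEED_OF_LIGHT_M_PER_NS / velocity_m_per_ns) ** 2


def topp_vwc(permittivity):
    """Volumetric water content from the Topp et al. (1980) polynomial."""
    return (-5.3e-2 + 2.92e-2 * permittivity
            - 5.5e-4 * permittivity ** 2
            + 4.3e-6 * permittivity ** 3)


# Hypothetical calibration point: 0.40 m probed depth, 20 ns two-way time
velocity = wave_velocity(0.40, 20.0)
# Hypothetical nearby trace with a 22 ns reflection from the thaw front
alt_estimate = thaw_depth(22.0, velocity)
vwc_estimate = topp_vwc(relative_permittivity(velocity))
```

The Topp polynomial was fitted to mineral soils, so for the organic-rich soils typical of tundra a site-specific relation would normally be preferred.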
Datasets related to in-land water for limnology and remote sensing applications: distance-to-land, distance-to-water, water-body identifier and lake-centre co-ordinates
Carrea L, Embury O and Merchant CJ
Datasets containing information to locate and identify water bodies have been generated from a static water-bodies dataset with a resolution of about 300 m (1/360°) recently released by the Land Cover Climate Change Initiative (LC CCI) of the European Space Agency. The LC CCI water-bodies dataset has been obtained from multi-temporal metrics based on time series of the backscattered intensity recorded by ASAR on Envisat between 2005 and 2010. The new derived datasets provide, in a mutually consistent way, distance to land, distance to water, water-body identifiers and lake-centre locations. The water-body identifier dataset locates the water bodies by assigning the identifiers of the Global Lakes and Wetlands Database (GLWD), and lake centres are defined for in-land waters for which GLWD IDs were determined. The new datasets therefore link recent lake/reservoir/wetlands extent to the GLWD, together with a set of coordinates which locates unambiguously the water bodies in the database. Information on distance-to-land for each water cell and the distance-to-water for each land cell has many potential applications in remote sensing, where the applicability of geophysical retrieval algorithms may be affected by the presence of water or land within a satellite field of view (image pixel). During the generation and validation of the datasets some limitations of the GLWD database and of the LC CCI water-bodies mask have been found. Some examples of the inaccuracies/limitations are presented and discussed. Temporal change in water-body extent is common. Future versions of the LC CCI dataset are planned to represent temporal variation, and this will permit these derived datasets to be updated.
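The distance-to-land and distance-to-water fields can be illustrated on a small binary mask. The sketch below uses a multi-source breadth-first search and counts 4-neighbour grid steps, which is a simplification chosen for brevity: it is not the authors' method and does not give geodesic distances on a latitude-longitude grid. The mask values and names are hypothetical.

```python
from collections import deque

LAND, WATER = 0, 1


def distance_to_class(mask, target):
    """Grid steps from every cell to the nearest cell equal to `target`.

    mask: rectangular nested list of LAND/WATER values.
    Cells of the target class get 0; unreachable cells stay None.
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every cell of the target class.
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == target:
                dist[r][c] = 0
                queue.append((r, c))
    # Expand outwards one grid step at a time.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


# Hypothetical 5x5 mask: a 3x3 lake surrounded by land
lake_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
distance_to_land = distance_to_class(lake_mask, LAND)
distance_to_water = distance_to_class(lake_mask, WATER)
```

In this toy mask the lake-centre cell is the water cell with the largest distance to land, which hints at how a centre location might be picked from such a field.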
Bridging the gap between climate models and impact studies: the FORESEE Database
Dobor L, Barcza Z, Hlásny T, Havasi Á, Horváth F, Ittzés P and Bartholy J
Studies on climate change impacts are essential for identifying vulnerabilities and developing adaptation options. However, such studies depend crucially on the availability of reliable climate data. In this study, we introduce the climatological database called FORESEE (Open Database for Climate Change Related Impact Studies in Central Europe), which was developed to support the research of and adaptation to climate change in Central and Eastern Europe: the region where knowledge of possible climate change effects is inadequate. A questionnaire-based survey was used to specify database structure and content. FORESEE contains the seamless combination of gridded daily observation-based data (1951-2013) built on the E-OBS and CRU TS datasets, and a collection of climate projections (2014-2100). The future climate is represented by bias-corrected meteorological data from 10 regional climate models (RCMs), driven by the A1B emission scenario. These latter data were developed within the frame of the ENSEMBLES FP6 project. Although FORESEE only covers a limited area of Central and Eastern Europe, the methodology of database development, the applied bias correction techniques, and the data dissemination method, can serve as a blueprint for similar initiatives.
A daily Azores-Iceland North Atlantic Oscillation index back to 1850
Cropper T, Hanna E, Valente MA and Jónsson T
We present the construction of a continuous, daily (09:00 UTC), station-based (Azores-Iceland) North Atlantic Oscillation (NAO) Index back to 1871 which is extended back to 1850 with additional daily mean data. The constructed index more than doubles the length of previously existing, widely available, daily NAO time series. The index is created using entirely observational sea-level pressure (SLP) data from Iceland and 73.5% of observational SLP data from the Azores - the remainder being filled in via reanalysis (Twentieth Century Reanalysis Project and European Mean Sea Level Pressure) SLP data. Icelandic data are taken from the Southwest Iceland pressure series. We construct and document a new Ponta Delgada SLP time series based on recently digitized and newly available data that extend back to 1872. The Ponta Delgada time series is created by splicing together several fractured records (from Ponta Delgada, Lajes, and Santa Maria) and filling in the major gaps (pre-1872, 1888-1905, and 1940-1941) and occasional days (145) with reanalysis data. Further homogeneity corrections are applied to the Azores record, and the daily (09:00 UTC) NAO index is then calculated. The resulting index, with its extended temporal length and daily resolution, is the first reconstruction of daily NAO back into the 19th Century and therefore is useful for researchers across multiple disciplines.
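A station-based NAO index is conventionally the difference between standardised sea-level pressure at the southern and northern stations. The sketch below shows that calculation in its simplest form. The authors' exact normalisation (for example the reference period, or standardising by calendar day) is not given in the abstract, so standardising over the whole input series is an assumption, and the pressure values are hypothetical.

```python
import statistics


def standardise(series):
    """Return the series as anomalies divided by its standard deviation."""
    mean = statistics.mean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        raise ValueError("series has zero variance")
    return [(value - mean) / sd for value in series]


def nao_index(azores_slp, iceland_slp):
    """Standardised Azores SLP minus standardised Iceland SLP, day by day."""
    if len(azores_slp) != len(iceland_slp):
        raise ValueError("series must have equal length")
    z_azores = standardise(azores_slp)
    z_iceland = standardise(iceland_slp)
    return [a - i for a, i in zip(z_azores, z_iceland)]


# Hypothetical daily SLP values (hPa)
azores = [1020.0, 1025.0, 1015.0, 1020.0]
iceland = [1005.0, 995.0, 1010.0, 1010.0]
nao = nao_index(azores, iceland)
```

Positive values correspond to a stronger than usual pressure gradient between the two stations, and negative values to a weaker one.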