Reports: DNI855837-DNI8: Using Shape to Track Open Ocean Community Structure since the Late Cretaceous

Pincelli M. Hull, PhD, Yale University

In Year 2 of ACS funding, additional progress was made in all four areas outlined in our Year 1 report towards the major goal of the proposal: the generation of a 72-million-year-long record of planktonic foraminiferal morphology. The four major areas of effort are:

i) Technical improvements to software and software usability

ii) Completion of modern dataset

iii) Ecological calibration and investigation of modern dataset

iv) Generation of Cenozoic spanning foraminiferal slides and images

Despite considerable progress to date, including two published papers, one paper in press, one manuscript in the final rounds of revision, another in review, several more in preparation, and substantial headway on a major international collaboration to identify more than 60,000 planktonic foraminifera images to species level, additional time is needed to complete the goals of this grant. I will discuss progress to date before outlining the final steps needed.

i) Technical improvements to software and software usability

Over the course of ACS Yr1 and Yr2 funding, we made substantial improvements to our automated image processing pipeline. This work was carried out by Allison Hsiang, a paid post-doctoral researcher on the grant, and Kaylea Nelson, a scientific computing expert at Yale University, and was tested by Leanne Elder, a paid post-doctoral researcher on the grant.

This work resulted in two technical papers on the methodological aspects of this research (Hsiang et al. 2016 and Hsiang et al. 2017).

A third paper, focusing on the comparability of size measurements made with different methods, is in review:

Brombacher A., Elder L.E., Hull P.M., Wilson P.A., Ezard T.H.G. in review. Calibrations of test diameter and area measured from two-dimensional images as proxies for body size in the planktonic foraminifer Globoconella puncticulata. In review at the Journal of Foraminiferal Research.

The software is freely available through GitHub on the Hull Lab page, and our data can be found in the Zenodo data repository.

We have also tested, and published on, the efficacy of this software for high-throughput imaging in other fossil groups. Our dataset on limpets is currently in press at Scientific Data:

Kahanamoku S.S., Hull P.M., Lindberg D.R., Hsiang A.Y., Clites E.C., Finnegan S. in press. Twelve thousand recent limpets (Mollusca, Patellogastropoda) from a northeastern Pacific latitudinal gradient. In press at Scientific Data.

Two additional manuscripts in preparation use this software to generate large datasets: a fish tooth dataset led by E. Sibert (postdoctoral researcher at Harvard University) and a modern planktonic foraminiferal porosity dataset led by J.E. Burke (graduate student at Yale University).

Although we view the AutoMorph software as a work in progress (with targeted updates underway and planned for the coming years), the publication of multiple methodological and calibration datasets attests to its utility as a new tool for morphological applications.

ii) Completion of modern dataset

Paleontological inferences over the past 72 million years depend on strong anchors in modern morphologies and ecologies. To this end, we collected and cleaned an extensive, spatially explicit dataset of modern planktonic foraminifera. These data were collected by Leanne Elder, a paid post-doctoral researcher on the project, and include sites spread throughout the Atlantic Ocean. Of the more than 124,000 objects imaged, we identified 61,000 planktonic foraminifera. This extensive dataset was completed and classified to the level of major group during the first year of funding, and the manuscript is now in final revision at Scientific Data:

Elder L.E., Hsiang A.Y., Nelson K., Strotz L.C., Kahanamoku S.S., Hull P.M. in revision. Sixty-one thousand recent planktonic foraminifera from the Atlantic Ocean. In revision for Scientific Data.

We are currently identifying the planktonic foraminifera using a combination of community expertise and machine learning. Specifically, a portal has been set up on Zooniverse to allow modern planktonic foraminiferal experts to classify approximately 25,000 images. More than 20 experts have signed on to the project, and there have been 29,746 classifications to date. Once these images are fully classified (i.e., 100,000 classifications attached to 25,000 images), we will combine this image dataset with another 22,000 images segmented from the Henry Buckley slide collection at the Natural History Museum in London. Together, these 47,000 identified images will provide by far the largest taxonomic library of modern planktonic foraminiferal species. Identification and segmentation are on track to be finished in March, and we plan to serve these images through several portals to allow the community at large to use this dataset for taxonomic purposes.
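The classification targets quoted above imply a few simple figures that are worth making explicit. The following is a minimal sketch of that arithmetic only, using the numbers stated in this report; the variable and function names are illustrative and are not part of AutoMorph or of the Zooniverse project.

```python
# Figures taken directly from the report text.
TARGET_CLASSIFICATIONS = 100_000   # classifications needed for full coverage
ZOONIVERSE_IMAGES = 25_000         # images served through the Zooniverse portal
CLASSIFICATIONS_TO_DATE = 29_746   # classifications recorded so far
BUCKLEY_IMAGES = 22_000            # images segmented from the Buckley collection


def classifications_per_image(target: int, images: int) -> float:
    """Average number of independent classifications per image at completion."""
    return target / images


def fraction_complete(done: int, target: int) -> float:
    """Share of the classification target reached so far."""
    return done / target


per_image = classifications_per_image(TARGET_CLASSIFICATIONS, ZOONIVERSE_IMAGES)
progress = fraction_complete(CLASSIFICATIONS_TO_DATE, TARGET_CLASSIFICATIONS)
remaining = TARGET_CLASSIFICATIONS - CLASSIFICATIONS_TO_DATE
library_size = ZOONIVERSE_IMAGES + BUCKLEY_IMAGES

print(f"Classifications per image at completion: {per_image:.0f}")
print(f"Progress to date: {progress:.1%}")
print(f"Classifications remaining: {remaining:,}")
print(f"Combined identified library: {library_size:,} images")
```

Under these figures, each Zooniverse image receives four classifications on average at completion, and roughly 30% of the classification target has been reached.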

Within our research group, A. Hsiang is prepared to use the 47,000 identified images as a machine learning training set to automatically identify the remaining ~40,000 images from the AutoMorph collection.

iii) Ecological calibration and investigation of modern dataset

With an eye toward interpreting the Cenozoic results, two major projects were begun to interpret the modern morphological data: a comparison of body size across pelagic taxa and a detailed investigation of body size trends within species of extant foraminifera. Major progress was made only on the first of these, during Year 1, and was described in the previous grant reporting period. We did not make additional progress on this portion of the work during Yr2, and writing and submitting these two manuscripts is one major goal of a requested no-cost extension.

iv) Generation of Cenozoic spanning foraminiferal slides and images

Post-doctoral researcher L. Elder also led the data collection efforts for the Cenozoic record in Yr1. Her accomplishments in this year include accumulating and sorting the potential sample set, and preparing and imaging a very low resolution (1 sample per 10 million years) slide collection. In order to finish collecting the 72-million-year-long record originally proposed, I have secured funding from the Invertebrate Paleontology Collections at the Yale Peabody Museum to support L. Elder for an additional 4 months devoted entirely to sample preparation and imaging in 2018. With the community identification portal and machine learning routines set up, these images will be classified and provided under the guidance of former post-doc A. Hsiang in the fall of 2018.