Funded Seed Grants

Rolling Seed Grants 5

Application Deadline: Rolling and prior to April 1, 2023

Broadening Diversity in the North American Plant Phenotyping Community

Noah Fahlgren, Malia Gehan
Download Narrative

The 2023 NAPPN annual conference leverages the successes of the previous Phenome conference series (2016-2020) and the previous NAPPN annual conferences (2021-2022). The conference will assemble a range of sciences and technologies in engineering, agronomy, ecology, and plant systems biology from among academic, federal, and commercial entities to address the plant-environment interface. A key goal of the conference is to invite and support a new generation and a broad community of scientists from diverse scientific and cultural backgrounds. By participating in these types of exchanges, attendees will accelerate the rate of advance in phenotyping by populating the field with innovative and neurodiverse thinkers. As part of our effort to diversify the plant phenomics research community, we request funding from AG2PI to support the participation of students, postdoctoral scholars, and faculty members from minority serving institutions (MSIs) including Historically Black Colleges and Universities (HBCUs), Hispanic Serving Institutions (HSIs), and Primarily Undergraduate Institutions (PUIs). We have announced this year's conference and our intent to provide support through emails and direct contact with faculty and administrators at MSIs.

Enabling Collaborations and Interdisciplinary Engagement between the Agricultural Genome to Phenome and Computational Biology Communities

Noah Fahlgren, Camilo Valdes, Iddo Friedberg
Download Narrative

The 30th Conference on Intelligent Systems for Molecular Biology (ISMB) is the premier international conference for computational biology and bioinformatics. Hosted by the International Society for Computational Biology (ISCB), this annual conference is expected to assemble over 2,000 participants from a range of biological and computational disciplines with interests in bioinformatics, computational biology, imaging, high-throughput sequencing, AI/machine learning, and systems biology as applied to the life sciences. This year the conference will be a hybrid event with in-person activities held from July


Special session on Digital Agriculture at the Intelligent Systems for Molecular Biology 2022 conference (July 12, 2022).

Session 1

Harnessing population-based patterns and inferring ecological signal from complex foodborne pathogen whole-genome datasets
Joao Carlos Gomes Neto
Department of Food Science and Technology
University of Nebraska-Lincoln

Cross-kingdom Interactions in the Porcine Gut: Implications in Health and Performance
Katie Lynn Summers
Animal Biosciences and Biotechnology Laboratory
United States Department of Agriculture - Agricultural Research Service (USDA-ARS)

Combining Two Analytical Techniques with Chemometric Analysis to Characterize Wine by Vineyard, Region, and Vintage
Alexandra Crook
Graduate Research Assistant
University of Nebraska-Lincoln

Porcine Reproductive and Respiratory Syndrome Virus Infection Upregulates Negative Immune Regulators and T-Cell Exhaustion Markers
Chia Sin Liew
Bioinformatics Core Research Facility
University of Nebraska-Lincoln

Developing a Low-Cost Digital Imaging System for Plant Phenotyping Using Raspberry Pi Computers
Manoj Natarajan
Graduate Research Assistant
Dalhousie University Faculty of Agriculture

An On-Site Feces Image Classifier System for Poultry Health Assessment
Guoming Li
Postdoc Research Associate
Iowa State University

Session 2

Digital Agriculture at Scale
Addie Thompson
Department of Plant, Soil and Microbial Sciences
Michigan State University

Session 3

PlantifyAI: A Novel Convolutional Neural Network Based Mobile Application for Efficient Crop Disease Detection and Treatment
Samyak Shrimali
High School Student
Jesuit High School, Portland, Oregon

DNA Stable Isotope Probing Reveals Beneficial Effects of Plant Associated Fungi on Bacterial Communities in Drought Affected Soil
Rachel Hestrin
Stockbridge School of Agriculture
University of Massachusetts, Amherst

Integration of Epigenomic and Transcriptomic Data to Identify Regulatory Elements and Networks Controlling Immune Cell-Type Gene Expression in the Pig
Christopher Tuggle
Department of Animal Science
Iowa State University

Learning the Grammar of Plant Regulatory DNA
Tobias Jones
Postdoc Research Associate
University of Washington

Big Data Applications in Strawberry Breeding
Zhen Fan
Postdoc Research Associate
University of Florida

Speech-Based Genotype to Phenotype Analysis for Association Genetics in Maize: A Proof of Concept
Colleen F. Yanarella
Graduate Research Assistant
Iowa State University

Hands-on training in high-throughput phenotyping

Margaret Krause, Jessica Rutkoski
Download Narrative

Many breeders, researchers, and students lack the expertise necessary to deploy high-throughput phenotyping (HTP) to benefit cultivar development and research. To address this community-wide gap in capacity, we plan to host an in-person workshop that will engage attendees in hands-on, step-by-step procedures to deploy HTP and make meaningful use of HTP data. Training materials will be made publicly available, and attendees will gain skills to serve as an HTP resource at their home institutions.


Broadening diversity in the North American Plant Phenotyping Network

David LeBauer, Alexander Bucksch, Jennifer Clarke
Download Narrative

The 2022 NAPPN annual conference leverages the successes of the previous Phenome conference series (2016-2020) and the virtual 2021 NAPPN annual conference. The conference will assemble a range of sciences and technologies in engineering, agronomy, ecology, and plant systems biology from among academic, federal and commercial entities to address the plant-environment interface. A key goal of the conference is to invite and support a new generation of scientists from diverse scientific and cultural backgrounds into this field in hopes that participating in these types of exchanges early in their careers will accelerate the rate of advance by populating the field with innovative thinkers. As part of our effort to diversify the phenomics research community, we request funding from AG2PI to cover the costs of participation for minority participants from minority serving institutions (MSIs) including Historically Black Colleges and Universities (HBCUs). This funding will cover registration and travel support.

Cross Training Future Workforce on Data-Driven Decision Support Tools for Precision Phenotyping

Mahendra Bhandari, Sushil Paudyal, Lucy Huang
Download Narrative

The overall goal of this project is to enhance transdisciplinary learning with a major focus on digital agriculture. The specific objectives of this project include: Objective 1: Organize a summer training internship program for undergraduate students with backgrounds in animal science, crop science, and computer science: Undergraduate students from the Departments of Animal Science, Plant Science, and Computer Science at Texas A&M University-College Station, TAMU-Kingsville, and TAMU-Corpus Christi, respectively, will be enrolled as interns during Summer 2022. Students will have opportunities to receive course credits for the ANSC 494 internship course coordinated by the Co-PI at TAMU. Similarly, students from TAMU-K will also receive course credits for an internship course.

Two datasets will be used as case studies: data from precision dairy monitoring tools (pedometers) and Unmanned Aerial Systems (UAS) data collected from cotton fields. Students will learn about the data collection procedures in both systems, create a standardized database, and use the database to train and validate machine learning models that predict disease events in dairy cows and yield in cotton. The goal is to develop student competency in understanding data-generating systems and the database, and in using them to develop data-driven tools for precision phenotyping.

An inaugural cohort of three students will be selected for this 10-week program from each of the three representative campuses. Students will attend weekly training sessions on best practices for handling, managing, and curating big data obtained from the two systems. Students will be assigned tasks each week that they are required to complete individually. Each week will end with a debrief session where students reflect on their progress and plan adjustments to their approach for the upcoming weeks. This presents an opportunity for shared learning as members of the cross-disciplinary team learn from each other.

Objective 2: To develop a training manual on big data management in agriculture: A training manual on the best practices of database management, utilizing the resources created during this internship, will be developed and tested by the cohort. Based on student experience and recommendations, the final version of the manual will be published for use in future cohorts.

  • A Training Guide for New Users for Database Management in MySQL: Because the students had limited programming experience, we decided to start by developing a database in MySQL. A training manual for beginners on using MySQL was created in this program and uploaded to GitHub for public access.
  • A Publicly Available Standardized Database with Pedometer Data from Dairy Cows and UAS Data from Cotton: Easily accessible and understandable standardized datasets have been created for the pedometer data and UAS data and uploaded to GitHub.
  • Recording of the Training Sessions Available Through Online Platforms: All the recordings have been uploaded and made publicly available on our program's YouTube channel.
  • Programming Codes Developed for Data Analytics Shared on Public Platform: The students developed and tested a few exercises for treatment comparisons in MySQL. The code, along with the database, is shared on GitHub.
  • Cross Training Future Workforce on Data-Driven Decision Support Tools: A summer training program on data management in animal and plant systems was conducted from June 3, 2022 to August 25, 2022, to develop student competency in understanding data-generating systems and the database, and in using them to develop data-driven tools for precision phenotyping.
  • Presentation: Cross Training Future Workforce on Data Handling and Interpretation for Precision Agriculture Systems. American Society of Animal Science Conference: Southern Section Meeting (January 21-24, 2023)
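
As a rough illustration of the kind of treatment-comparison exercise mentioned above, the sketch below groups pedometer records by treatment and reports the mean step count for each group. Everything in it is an assumption made for illustration: the table name `pedometer`, its columns, and the four rows of data are invented, and Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a database server. The `GROUP BY` query itself is standard SQL and should work in MySQL as well.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the program's MySQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical table layout; the real standardized database may differ.
conn.execute(
    "CREATE TABLE pedometer (cow_id TEXT, treatment TEXT, daily_steps INTEGER)"
)

# Made-up example rows, for illustration only.
conn.executemany(
    "INSERT INTO pedometer VALUES (?, ?, ?)",
    [
        ("c1", "control", 3000),
        ("c2", "control", 3200),
        ("c3", "treated", 2500),
        ("c4", "treated", 2700),
    ],
)

# Compare treatments: number of animals and mean daily steps per group.
rows = conn.execute(
    "SELECT treatment, COUNT(*) AS n, AVG(daily_steps) AS mean_steps "
    "FROM pedometer GROUP BY treatment ORDER BY treatment"
).fetchall()

for treatment, n, mean_steps in rows:
    print(f"{treatment}: n={n}, mean daily steps={mean_steps:.1f}")

conn.close()
```

A summary like this is only a first step; a real comparison would also need a measure of variability and an appropriate statistical test.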

Round 3 Funded Seed Grants 9

Application Deadline: March 15, 2022

Homomorphic encryption to enable sharing of confidential data

Hao Cheng, Jack C.M. Dekkers, Christopher K. Tuggle, Richard Mott
Download Narrative

The overall goal of this proposal is to evaluate the ability of a recently proposed homomorphic data encryption method to address the privacy and intellectual property issues that prevent data sharing, and so to enable research and industry data to adhere to, and capitalize on the benefits of, the FAIR (Findable, Accessible, Interoperable and Reusable) principles. A recent review of issues and methods related to safeguarding the privacy of genomic data in human genetics is given in Wang et al. (2022).

Standardizing data management and terminology for increased adoption of virtual fence systems

Jameson Brennan, Logan Vandermark, Krista Ehlert, Hector Menendez, Ryan Reuter, Mitchell Stephenson, Dana Hoag, Paul Meiman, Joslyn Beard, Rory Charles O'Connor
Download Narrative

Advancement of precision land management technologies enables producers to manage the landscape with grazing animals to strategically improve ecosystem health and sustainability. Among the more novel of these technologies is virtual fencing (VF) - borders without physical barriers - to implement precision grazing management (Anderson 2007; Umstatter 2011). VF systems operate via GPS-enabled collars on each animal. There is a three-way interaction between the collars, a base station in the field, and a user interface (software) on a computer that allows users to 'draw' their pasture boundaries. These boundaries are transmitted to the base station (operated via cellular service and solar power), which 'pushes' the virtual fence instructions to the collars. Livestock are kept within the virtual pasture by an auditory stimulus, followed by an electrical pulse if the animal goes farther into the virtual boundary. The system is designed such that animals learn the association between the auditory cues and the electrical pulse and come to respond to the auditory cues alone.
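
The two-stage cue described above (sound first, pulse only if the animal keeps going) can be sketched in a few lines. This is a simplified illustration and not any vendor's implementation: the rectangular pasture, the planar coordinates in meters, the `Pasture` and `collar_response` names, and the 5 m width of the audio-only band are all assumptions. Real VF systems work with GPS coordinates and arbitrary boundary shapes, and may escalate based on time or heading rather than distance alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pasture:
    """Hypothetical axis-aligned virtual pasture in local planar coordinates (meters)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def distance_outside(self, x: float, y: float) -> float:
        """Distance in meters from (x, y) to the pasture; 0.0 if the point is inside."""
        dx = max(self.x_min - x, 0.0, x - self.x_max)
        dy = max(self.y_min - y, 0.0, y - self.y_max)
        return (dx * dx + dy * dy) ** 0.5


def collar_response(pasture: Pasture, x: float, y: float,
                    audio_zone_m: float = 5.0) -> str:
    """Return the cue a collar would issue for one position fix.

    audio_zone_m is an assumed width for the band in which only sound is used.
    """
    d = pasture.distance_outside(x, y)
    if d == 0.0:
        return "none"    # inside the virtual pasture: no cue
    if d <= audio_zone_m:
        return "audio"   # just past the boundary: auditory stimulus only
    return "pulse"       # farther past the boundary: electrical pulse


pasture = Pasture(0.0, 0.0, 100.0, 50.0)
for fix in [(50.0, 25.0), (102.0, 25.0), (110.0, 25.0)]:
    print(fix, collar_response(pasture, *fix))
```

Keeping the boundary geometry separate from the cue policy means either part could be swapped out, for example replacing the rectangle with a user-drawn polygon.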

Understanding emergent agricultural phenomena through Big Data Analytics: creating frameworks for understanding using Physics-guided Machine Learning and agent-based models

Michael Kantar, Diane R. Wang, Bryan Runck, Barath Raghavan, Adam Streed, Patrick Ewing
Download Narrative

Agriculture has the greatest footprint of any human activity, and much work has gone into improving its sustainability (Harwood, 2020). In modern conventional agriculture, some hope to mitigate impacts/costs through optimization, while in agroecology some hope to create holistic, resource-conserving methodologies for management. However, these two approaches to sustainable agriculture often come from different epistemological viewpoints; as a result, it is difficult both intellectually and practically to determine the best or even a good course of action in sustainable farming today (Jordan and Davis, 2015). While much work has gone into exploring complex cropping systems that provide more ecosystem services while producing the same amount of food, feed, fiber, and fuel as simpler systems (Tamburini et al. 2020), these systems are often idiotypic (Shaffer et al., 2000) and not transferable outside of the farms where they were trialed (Robertson et al., 2012). As computing has penetrated nearly all aspects of modern society (e.g., transportation, health and medicine, and human interaction), many have proposed to leverage computing to improve the sustainability and productivity of agriculture (Raghavan et al., 2016). We propose a way of merging individual farm-based solutions and accommodating different epistemological frameworks by borrowing tools from computer science, in particular the notion of a state space (e.g., plant traits, cropping system) that can be explored by an artificial agent.


Developing education, research, and extension training on precision agriculture phenotyping tools at HBCU

Jingqiu Chen, Wei-zhen Liang, Violeta M. Tsolova, Jian Jin
Download Narrative

Despite advances in machine learning and artificial intelligence in digital agriculture, especially in precision agriculture phenotyping sensors and tools, there is a gap between education, research, and outreach at HBCUs (Historically Black Colleges and Universities) and the advances in precision agriculture phenotyping technologies. Florida Agricultural and Mechanical University (FAMU) is an 1890 land-grant institution (#1 Public HBCU by U.S. News & World Report) dedicated to the advancement of knowledge, resolution of complex issues, and the empowerment of citizens and communities. As the land-grant arm of FAMU, the College of Agriculture and Food Sciences (CAFS), PI Chen's home college, plays a vital role in providing research-based information and resources directly to Florida's farmers, individuals, producers, communities, and agri-businesses. The FAMU CAFS Center for Viticulture and Small Fruit Research is recognized internationally for excellence in warm climate grape research and as a facilitator of outstanding academic programs for experiential learning and student training. The Viticulture Center maintains the most extensive muscadine grape germplasm collection in the world and serves as one of the five National Clean Plant Centers for Grapes. Biological Systems Engineering (BSE), PI Chen's home program, is a branch of engineering that integrates agricultural, biological, chemical, and engineering sciences. The BSE program is one of the two ABET (Accreditation Board for Engineering and Technology) accredited BSE programs among the nineteen 1890 HBCUs in the U.S. Currently, there is a critical need for CAFS, especially the BSE program, to develop education, research, and extension training on precision agriculture phenotyping tools.

  • Poster: Experiential Learning on Precision Agriculture Phenotyping Tool in Muscadine Vineyards and Data Analytics
  • Presentation: Developing Education, Research, and Extension Training on Precision Agriculture Phenotyping Tools at HBCU Communities
  • Video: Introduction to Precision Ag and Plant Phenotyping
  • Precision Agriculture: What is precision agriculture? Why is it a likely answer to climate change and food security?
  • Plant Phenotyping: What is plant phenotyping?

A genetic data portal to enable discovery of deleterious genetic variants in farmed animals

Theodore S. Kalbfleisch
Download Narrative

Recessive lethal alleles exist benignly in breeding populations, until a sire and dam carrying them are mated. One quarter of the resulting pregnancies will be homozygous for the lethal allele and will result in an aborted pregnancy. Missed breeding opportunities are expensive. These recessive lethal alleles will increase in frequency within the population, distributed as heterozygotes, until ultimately manifesting themselves as lethal when two heterozygous carriers are mated. If it is possible to identify these lethal alleles, then farm managers can mitigate the problem by ensuring that two carriers are never mated to one another, thus boosting the likelihood of a successful pregnancy by 25% for any carrier.
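
The one-quarter figure follows from simple Mendelian segregation, which the sketch below enumerates. It assumes a single locus with a fully penetrant recessive lethal allele, written here as `a`, and a normal allele `A`; the function names are illustrative only. For a carrier-by-carrier mating, the expected share of pregnancies lost to the allele is 1/4, and it drops to zero when a carrier is mated to a non-carrier, a difference of 25 percentage points.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(sire: str, dam: str) -> dict:
    """Genotype probabilities at one locus; each parent passes on one allele at random."""
    probs = {}
    for sire_allele, dam_allele in product(sire, dam):
        # Sort so that "Aa" and "aA" are counted as the same genotype.
        genotype = "".join(sorted(sire_allele + dam_allele))
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


def lethal_fraction(sire: str, dam: str) -> Fraction:
    """Expected share of pregnancies homozygous for the recessive lethal allele 'a'."""
    return offspring_genotypes(sire, dam).get("aa", Fraction(0))


print("carrier x carrier:    ", lethal_fraction("Aa", "Aa"))
print("carrier x non-carrier:", lethal_fraction("Aa", "AA"))
```

Note that half of the offspring of a carrier-by-non-carrier mating are still carriers, so avoiding carrier-by-carrier matings prevents losses without removing the allele from the population.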

Leveraging single-cell genomics in QTL mapping

Susanta Kumar Behura, Jared Egan Decker
Download Narrative

This project seeks to develop an information hub for training agricultural researchers, with the aim of facilitating the application of single-cell functional genomics in quantitative trait loci (QTL) mapping of agriculturally important traits. Integration of genetic variation data with cellular and molecular data has been used to map expression QTL (eQTL), methylation QTL (mQTL), or chromatin accessibility QTL (caQTL) linked to diverse phenotypes (Kumasaka et al., 2016; Volkov et al., 2016; Ciuculete et al., 2017; Benaglio et al., 2020a; Keele et al., 2020; Zhao et al., 2020). Such approaches have also been applied to map QTLs linked to traits of agricultural importance (Long et al., 2011; Liu et al., 2020; Kushanov et al., 2021; Yuan et al., 2021). However, these studies have been performed with functional genomics data derived from bulk tissues, which cannot determine whether the phenotype is influenced by specific cell types in the tissue. The recent surge in single-cell RNA sequencing (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) has generated new opportunities to integrate genetic variation with changes in gene expression and open chromatin profiles to identify single-cell eQTLs and caQTLs (Benaglio et al., 2020b; van der Wijst et al., 2020; Neavin et al., 2021). These methods have great potential to untangle cellular and molecular links to major phenotypic traits such as crop yield, animal production, and plant resistance to insect pests to sustain agricultural productivity (Cole et al., 2021; Tripathi and Wilkins, 2021; Zhang et al., 2021; Zhu et al., 2021; Nyyssölä et al., 2022).

Using unmanned aerial vehicles to detect nitrogen stress in alfalfa (Medicago sativa L.)

Anju Biswas, Esteban F. Rios, Aditya Singh
Download Narrative

The resurgence in sustainable farming practices in recent years is driven mostly by interests in improving soil health, nutrient cycling, and carbon sequestration. However, most of the research has focused on utilizing annual cover crops, which are often terminated at the end of the season, and the benefits of alfalfa (Medicago sativa L.) in cropping systems have been largely overlooked. Due to its perennial nature, alfalfa can improve soil structure, decrease erosion, and increase carbon sequestration in soil. Increased utilization of alfalfa will not only help to reach ecological goals, but it will also help in improving wildlife habitat and biodiversity, while providing a highly nutritious feedstuff for livestock.

An AI toolkit for video phenotyping in livestock

Samantha A. Brooks, Madelyn Smythe, Kyle Allen, Adam H. Biedrzycki, João Bittar
Download Narrative

Lameness presents a major animal welfare concern and is a significant economic burden for the livestock industry. For example, lameness costs the dairy industry alone around $52 million a year. Current methods of assessing lameness, conformation, and locomotion phenotypes are often plagued by a lack of repeatability and accuracy, yielding heritability values for lameness of just 0.01 and 0.22, indicating the need for a more accurate phenotyping approach for locomotor traits. Visual assessment, the most common approach, lacks repeatability [6]; accelerometer and gyroscope methods alter natural gait patterns; and reflective 3D markers are not feasible in less tractable livestock systems, where applying reflectors and using multiple-camera detector arrays is impractical and costly. StepMetrix technology in cattle has documented stance time and ground reaction force but has yet to consider diverse locomotor phenotypes indicative of lameness, such as back posture. This project will utilize a published machine learning package (DeepLabCut, DLC version 2.2b7) in combination with a custom gait analysis pipeline to produce quantitative locomotor phenotyping protocols specifically for livestock. Previous work demonstrated that the deep neural network (DNN) approach employed by DLC can label landmarks on an animal with the same accuracy as the human eye but in far less time. For example, labeling the 77,000 frames of data in the pilot project described below would take one full-time operator about four months of continuous work, but only a few hours with the DLC machine learning approach.

Creating a FAIR data ecosystem for incorporating single cell genomics data into agricultural G2P research

Christopher K. Tuggle, Peter W. Harrison, Christine Elsik, Nicholas Provart
Download Narrative

The analysis of how genome information creates phenotypes at the level of the single cell, the fundamental unit of biology, is a powerful approach for understanding genome function, and is rapidly becoming the gold standard for human genetics research predicting phenotype from genotype. The multicellular complexity of plant and animal agricultural species limits our understanding of the regulation and organization of their genomes, and of the expression patterns of their genes in each cell composing these species. To make the enormous promise of single-cell (SC) genomics a reality for the agricultural genome to phenome community, we need to develop Findable, Accessible, Interoperable, and Reusable (FAIR) SC data resources and informatic tools for storing, sharing, and analyzing such data, which are currently accumulating in crop and livestock research groups. We believe this Enabling seed grant proposal addresses topic areas #1 and 2 in the AG2PI RFP. The lack of FAIR SC data, and of the computational skills researchers need to use such data, currently prevents the adoption of this powerful method within the AG2PI community.

  • Toward a Data Infrastructure for the Plant Cell Atlas
    Noah Fahlgren, Muskan Kapoor, Galabina Yordanova, Irene Papatheodorou, Jamie Waese, Benjamin Cole, Peter Harrison, Doreen Ware, Timothy Tickle, Benedict Paten, Tony Burdett, Christine G Elsik, Christopher K Tuggle, Nicholas J Provart
    Plant Physiology. kiac468 | October 6, 2022
  • Presentation: Creating a FAIR data ecosystem for incorporating single cell genomics data into agricultural G2P research

Round 2 Funded Seed Grants 11

Application Deadline: September 19, 2021

Creation of a database designed to promote dairy cow welfare using non-invasive phenotypic indicators of heat stress

Courtney Daigle

There is a need to characterize the variability of the dairy cow heat stress response and to identify non-invasive phenotypic indicators of heat stress that can be automatically detected using existing technologies. We aim to:

  1. Characterize regional variability in productivity responses to heat stress
  2. Identify and collate the needed phenotyping data to characterize the heat stress response
  3. Develop strategies for integrating disparate data types from animal monitoring systems to create a relational database of non-invasive phenotypic indicators of heat stress

GPS collars as precision agriculture tools for managing extensive rangeland production systems

Andrew Hess, Scott Huber
Download Narrative

Animals in extensive production systems are faced with many environmental challenges which may impact their ability to perform. We will use GPS units as a precision agriculture management tool to make land-use management decisions to maintain a healthy ecosystem, track animal behavior, and develop novel trait definitions for individual performance in a rangeland landscape. We expect GPS collars will provide a means to address the economic and environmental costs of an extensive sheep operation by providing quantitative measures of animal behavior in a rangeland environment.

Harnessing Ag Genomics Data to link genotype to phenotype

James Koltes, Chris Tuggle, Peter Harrison, Alenka Hafner

Plant and animal communities are accelerating the creation of functional genomics data. To capitalize on these investments, better methods and data standards are needed to integrate disparate datatypes, predict regulatory elements and glean new insights. We propose to bring together experts in plant and animal functional genomic data reuse and sustainability for a workshop and on-going discussion groups to identify and prioritize shared needs for data re-use tools to link genotype to phenotype.

  • Workshop: Harnessing the Ag Genomics Data Torrent: A Community-driven Discussion on Best Practices for Using and Reusing Genomics Data

Community engagement to improve standards and integration for genotype, phenotype, and environmental data for model and non-model plants

Irene Cobo, Meg Staton, Jill Wegrzyn

While the technical aspects of data integration are achievable, the metadata collection required for robust meta-analysis of G2P and G2E studies, especially for non-model plant systems, remains a hurdle. We will develop a fully FAIR data submission module (TPPS) that can be implemented by all Tripal plant databases, integrate the WildType mobile application to collect trait data for landscape-based studies, and train the scientific community from biocurators to Tripal database administrators.


  • Tripal Plant Popgen Pipeline (TPPS) has now adopted MIAPPE (Minimal Information About a Plant Phenotyping Experiment) standards to improve its interoperability across a wider range of experimental designs and systems. This new release of TPPS and the training materials are available online.
  • Seed grant objectives were successfully completed with TreeSnap (the partner application) rather than WildType. The application has been updated and is publicly available for Apple and Android platforms.
  • Workshop at PAG 30 International conference entitled The AgBioData Consortium: Challenges and Recommendations for FAIR Genetic, Genomic and Breeding Data. Session entitled: Challenges and Opportunities in Connecting Genotype to Phenotype Data
  • AgBioData monthly webinar presentation (March 1, 2023)
  • PAG 2022 Poster: Integrating, Visualizing and Analyzing Plant Environments, Phenotypes and Genotypes Using CartograPlant, WildType and Tripal Galaxy
  • CartograPlant YouTube Channel
  • Seed Grant Project Final Report

Democratizing the access to artificial intelligence solutions for underrepresented and non-expert communities

Joao Dorea, Tiago Bresolin

Currently, the biggest challenge for underrepresented and non-expert users is gaining access to AI techniques through a user-friendly interface that lets them perform basic AI tasks without extensive expertise in the corresponding areas. In this project, we will develop open-source software to democratize access to AI techniques. This software will be used to perform image classification by training new and customized deep learning algorithms on datasets provided by the user. Our goal is to create more accessible user interfaces so that coding ability is not a barrier.

  • Workshop: Train Your Network: Simplifying Computer Vision Application Development

Event-based plant phenotyping using deep learning: Algorithms, tools and datasets

Sruti Das Choudhury, Ashok Samal, Srinidhi Bashyam, Yufeng Ge

Event-based phenotyping analysis refers to detecting the timing of important events in a plant's life. This research will develop deep learning-based algorithms for detecting the emergence timing and tracking the growth of seedlings using time-lapse image sequences, and for detecting flowers and fruits from multi-view images to compute reproductive stage phenotypes. We will publicly release a benchmark dataset to develop and evaluate the algorithms. A software tool called iPlantSeg+ will be released to allow non-experts to perform segmentation and compute common phenotypes.

  • Dataset for Dynamic Plant Phenotypes: 1) maize emergence dataset; 2) flower pheno dataset; 3) fruit pheno dataset

Developing a cost-effective method for collecting informative, population-level molecular phenotypes

Troy Rowan, Jon Beever, Kurt Lamour, Liesel Schneider
Download Narrative

We propose using a sub-$5 targeted gene expression approach as a molecular phenotype in beef cattle. This proposal aims to computationally identify 500 high-information genes that will be assayed in ~1,500 beef stocker calves. We will explore the utility of using these expression counts both to predict future calf performance and health outcomes and as a latent phenotype. While we focus on beef cattle, we expect that this technology could be applied across species and genome to phenome applications.

Developing a new machine learning tool for improved genomic selection in non-model systems

James Polashock, Joseph Kawash

Identifying genetic adaptation in non-model systems (NMS) often cannot be sufficiently addressed using standard marker assisted selection (MAS) methodologies. While MAS is indispensable for the selective breeding of traits, NMS have not been able to take advantage of advancements in MAS due to the high overhead and imperfect data they often face. NMS contend with imperfect pedigrees, smaller populations, missing phenotypic/genotypic information, and complex interacting genetic components. Machine learning (ML) methods have been shown to be tolerant of these biases and offer an alternative means of providing markers for genome selection. In spite of the potential for ML to vastly improve MAS in NMS, few tools are available for researchers to use in breeding programs. We plan to address these problems through:

  1. Development of an effective ML-based algorithm tailored for genome selection
  2. A simple-to-implement tool for use by those who are familiar with breeding/MAS

This tool will identify variant locations that are contributing to phenotypic variation of a dataset without adding to user workload by utilizing high throughput genotypic and phenotypic information that is common to a MAS breeding program. The selected variant sites will be utilized as the basis of genetic markers for population screening and selection towards the improvement of germplasm. This methodology would reduce the resources needed for MAS in non-model crop species or those with complex phenotypic targets.

Sharing Unoccupied Aerial System (UAS) based high-throughput plant phenotyping data via public cloud

Jinha Jung, Zhou Zhang
Download Narrative

The main goal of this project is to develop online educational material for managing and sharing UAS-based HTP data using public cloud services. Although the importance of the FAIR principles in data-intensive science has been addressed multiple times, little attention has yet been paid to the management and sharing of the large geospatial datasets generated from UAS imagery. We propose to develop online tutorials on how to share geospatial data products generated from UAS data with the general public as web services.


Workshop with Tutorials that cover the three modules (Module 1: Web Server configuration, Module 2: Raster data sharing, and Module 3: Point cloud data sharing)

Raster and point cloud data sharing available at:

Workshop Recordings:

Cross-species genomic analysis of photosystem II: Building connections from molecular structure to phenotype

Carmela Rosaria Guadagno, Marilyn Gunner
Download Narrative

The use of scale-invariant properties can improve our understanding of genome to phenome associations. This project uses the first principles of biophysics to develop cross-scale correlations between photosystem II related genes and drought phenotypes for agricultural species. We will perform a comparative genomic study searching for sequence changes to be imported into protein crystal structures for molecular modeling. Modeled water affinity across species will be correlated with existing phenotypic information to build associations with the ability of plants to grow and thrive under water limitations.

Impact of breed type on beef production and sustainability

Kara Thornton-Kurth, Sulaiman Matarneh, Brenda Murdoch, Gordon Murdoch

Despite years of research, there is no clear understanding of how genotype contributes to phenotype in beef cattle. Our long-term goal is to determine how underlying genetic differences between cattle of different breed types translate to differences in animal performance, carcass quality, environmental impact and therefore economic viability. The overall objective of this proposal is to gather preliminary data to better understand how genetic differences relate to economically important traits and establish a resource for future genome and phenome comparison and manipulation.

Round 1 Seed Grants 7

Application Deadline: March 19, 2021

Empowering High-Throughput Phenotyping using Unoccupied Aerial Vehicles (UAVs)

Max Feldman, Filipe Matias, Jennifer Lachowiec, David LeBauer
Download Narrative

The AG2P Initiative strives to connect genotype to phenotype in many environments. Unoccupied Aerial Vehicles (UAVs) are a growing source of phenome data in agricultural research across animals and plants. In this proposal we build a foundation to empower more researchers to use UAVs. Our objectives are to complete an international survey of UAV use among agricultural animal science and plant science researchers and to build a community-informed web resource that provides instructional information and benchmarking tools to support the growing number of UAV users.

Expected Outcomes and Deliverables: In summary, this project will provide the foundation to accelerate high-throughput phenotyping in agriculture to support the mission of AG2PI. A survey will assess how UAV imagery has been applied to plant and animal sciences in agriculture and identify obstacles common among research groups that will pinpoint potential solutions. Simultaneously, developing standardized best practices will support UAV adoption and reduce barriers to entry. These efforts will be shared through workshops, videos, conferences, and manuscripts.


Ethics, Diversity and Inclusivity in G2P Research

Cassandra Dorius, Shawn Dorius, Kelsey Van Selous, Rachael Voas
Download Narrative

Sustainably supporting worldwide food production is a wicked problem of immense scale and complexity. As such, there is an urgent need for novel ideas and technological innovations in agricultural research. One strategy for rapidly infusing new ideas into existing knowledge networks is to make these person-centric networks more diverse, make data more accessible to relevant stakeholders, and infuse current practices with the kinds of Ethical, Legal, Social, Ecological, and Economic (ELSEE) considerations that will transform agricultural genome and phenome research practices. Bringing underrepresented groups to the table is one way to infuse AG2PI with new ideas. Improving and expediting knowledge transfer (e.g., data sharing), more effectively communicating research findings to the general public, policy makers, and funding agencies, and developing new science practices can also help AG2PI achieve sustainable genetic improvements. We propose to advance the aims of the AG2PI by conducting social science research to encourage cross-fertilization of AG2P data and ideas and motivate agriculture-focused analysis from an ELSEE perspective. We will also create human-centered


Seeding public-private partnerships for AG2P training

Addie Thompson, Tammy Long, Jyothi Kumar
Download Narrative

The main goal of this project is to form public-private partnerships to expand meaningful interactions between industry and the public sector. Outcomes will include:

  1. Enhancing graduate student training through the use of current real-life AG2P project scenarios and datasets;
  2. Generating public educational resources including datasets and code for use by other scientific communities for AG2P training;
  3. Serving as a model and test bed for public-private partnership (PPP) engagement and seeding meaningful ongoing collaborations in interdisciplinary groups.

Optimizing 3D canopy architecture for better crops

Bedrich Benes, Duke Pauli, Fiona McCarthy, James Schnable

Implementing machine learning approaches to connect genotype to phenotype has been hindered by the lack of available, labeled training datasets. To overcome this limitation, we will use geometric modeling and object reconstruction, using point cloud data, to develop simulated datasets of organismal development. These simulated growth models will generate labeled datasets that machine learning algorithms can use as training data to study the complexities of phenotypic diversity. Moreover, we will develop and test a system for illumination estimation of the virtual crops.

Aim 1: We will provide a large, labeled dataset of point cloud data for sorghum plants with varying precision for ML and evaluation by the plant science community. We will develop the generative procedural model of sorghum that will be parameterized by branching angles, plant age, etc.
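To make the idea of a parameterized procedural model concrete, the following is a minimal sketch that emits a labeled point cloud for a toy plant: a vertical stem plus straight leaves that leave the stem at a given angle. It is not the project's sorghum model; the function name, parameters, and geometry are assumptions chosen only to show how organ labels come for free when the data are simulated.

```python
import math
import random


def generate_plant(n_leaves=4, leaf_angle_deg=45.0, stem_height=1.0,
                   leaf_length=0.4, points_per_organ=20, seed=0):
    """Return a list of (x, y, z, label) points for one toy plant.

    points_per_organ must be at least 2. All parameters are hypothetical
    stand-ins for quantities such as branching angle or plant age.
    """
    rng = random.Random(seed)
    points = []

    # Stem: evenly spaced points along the vertical axis.
    for i in range(points_per_organ):
        z = stem_height * i / (points_per_organ - 1)
        points.append((0.0, 0.0, z, "stem"))

    # Leaves: straight segments tilted leaf_angle_deg away from vertical,
    # attached at evenly spaced heights with a random azimuth each.
    angle = math.radians(leaf_angle_deg)
    for leaf in range(n_leaves):
        base_z = stem_height * (leaf + 1) / (n_leaves + 1)
        azimuth = rng.uniform(0.0, 2.0 * math.pi)
        for i in range(1, points_per_organ + 1):
            d = leaf_length * i / points_per_organ  # distance along the leaf
            r = d * math.sin(angle)                 # horizontal offset
            points.append((r * math.cos(azimuth),
                           r * math.sin(azimuth),
                           base_z + d * math.cos(angle),
                           f"leaf_{leaf}"))
    return points


cloud = generate_plant(n_leaves=3, leaf_angle_deg=60.0, points_per_organ=10)
```

Because every point carries its organ label, a cloud like this can serve directly as segmentation training data, with the fixed seed making each simulated plant reproducible.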

Aim 2: We will provide a highly parallel path tracer simulating sorghum illumination at a photonic level at varying wavelengths. This simulator will be expandable for further BRDF and BTDF values and will be parameterizable for varying longitude and latitude.

  1. Novel 3D simulations of sorghum grown at high-density with plant parts segmented & labeled,
  2. Plant and organ labeled simulated point cloud data for sorghum at high planting densities,
  3. ML models using simulated segmented plant data tested on greenhouse and field LIDAR data,
  4. Improved understanding of light interception and interaction dynamics with plant canopy,
  5. Engagement with animal researchers to assess how to extend approaches to animal science,
  6. Hosted AG2PI workshop to engage with plant and animal scientists and expand community.

Machine learning competitions for G2P and end-of-season phenotype prediction

Abby Stylianou
Download Narrative

In this project, we will organize competitions to engage with the broader machine learning community to produce models that can answer phenomic questions, using curated datasets from the Department of Energy ARPA-E Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) program [Burnette et al., 2018]. The TERRA-REF program aimed to transform plant breeding by using remote sensing from a state-of-the-art field scanner gantry system, seen in Figure 1 (top), to increase the speed at which plant traits can be measured. The field scanner includes a number of sensors, including a millimeter-resolution laser 3D scanner, high-resolution stereo-RGB cameras, multiple hyperspectral sensors, and a thermal camera, among others (example data products are shown at the bottom of Figure 1). Over the course of several seasons, the field scanner collected over a petabyte of sensor data for bioenergy sorghum lines, along with their corresponding genetic data, a large volume of ground truth measurements of plant phenotypes and growing conditions, and a baseline set of algorithmic approaches for extracting phenotypic data.

We propose that structured machine learning contests are an excellent way to share data and ensure that the desired scientific questions are actually the ones that are answered. Contests, where a specific problem is shared, training and testing data are provided, and a specific evaluation protocol is defined, are a frequent and popular means of advancing results to specific questions within the machine learning community. For example, the Fine-Grained Visual Classification community hosts annual contests on difficult visual recognition problems, such as the iNaturalist contest to recognize different species in image data, which had 1,477 submissions last year from 249 different competitors [Van Horn et al., 2018]. These communities are hungry for well organized datasets with specific scientific questions.
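The contest structure described above (a fixed problem, held-out test data, and a defined evaluation protocol) can be sketched as a small scoring routine. The example below assumes root-mean-square error as the metric and uses hypothetical plot identifiers, team names, and values; the actual competitions may use different metrics and data formats.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                / len(actual))


def score_submission(submission, ground_truth):
    """Score one submission (plot_id -> predicted value) against the
    held-out ground truth. Every test plot must have a prediction."""
    missing = sorted(set(ground_truth) - set(submission))
    if missing:
        raise ValueError(f"missing predictions for: {missing}")
    plot_ids = sorted(ground_truth)
    return rmse([submission[i] for i in plot_ids],
                [ground_truth[i] for i in plot_ids])


def leaderboard(submissions, ground_truth):
    """Return (team, score) pairs ordered from best (lowest RMSE) to worst."""
    scores = {team: score_submission(pred, ground_truth)
              for team, pred in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical end-of-season measurements for two held-out plots.
held_out = {"plot_1": 100.0, "plot_2": 120.0}
entries = {
    "team_a": {"plot_1": 101.0, "plot_2": 119.0},
    "team_b": {"plot_1": 110.0, "plot_2": 120.0},
}
ranking = leaderboard(entries, held_out)
```

Fixing the metric and the held-out set before the contest opens is what keeps submissions comparable and ties them to the scientific question being asked.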


Identifying Educational Resources and Gaps in AG2P Data Science Across Plant and Animal Agriculture Genomics

Breno Fragomeni, Cedric Gondro, Margaret Young, Gabriella Dodd, Tasia Taxis
Download Narrative

Recent developments in genomics and the implementation of new technologies in agriculture have enabled a new research horizon in the field (Harper et al., 2018; Morota et al., 2018). The quality and quantity of genomic data, including microbiome, gene expression, high density SNP markers, and sequence data, among others, have great potential to improve both plant and animal science enterprises. However, with these new developments a new challenge has arisen: practitioners must be able to manage the large data sets that result from these technologies – a skill that has traditionally not been taught during the training of agricultural scientists (Eisen, 2008).

The goal of the Seed Grant is to catalog the available resources and resource gaps in data science to support the Agricultural Genome to Phenome (AG2P) initiative and to outline solutions to fill the gaps. We will develop surveys to identify how aware students and researchers are of the available resources. Additionally, we will create an online repository linking to available training materials in both plant and animal agricultural data science. In this repository we aim to provide the community a unified access point to information about workshops, seminars, online and in-person classes, and course curricula for different career stages. We will prepare a white paper describing our findings based on the survey and the catalog, focusing on how to advance data science education in AG2P. We will fund a graduate student to carry out this work. This student will work in the laboratory of PI Fragomeni with additional support from our team of investigators. We will use the results of the seed grant as preliminary data to apply for a large educational project to develop solutions to the needs identified in this project.


Cattle Genome to Herd Phenotyping for Precision Agriculture

Stephanie McKay, Darren Hagen, Robert Schnabel, Brenda Murdoch

The overarching goal of the Cattle Genome to Herd Phenotyping for Precision Ag initiative is to exploit new phenotyping technologies and high throughput genomics to improve cattle productivity and profitability. To accomplish this, we will establish a network of researchers from a variety of disciplines and agencies to facilitate generation and implementation of next generation phenotyping technologies in cattle and ascertain existing resources.

Expected Outcomes and Deliverables: PIs McKay, Hagen, Schnabel and Murdoch expect to form a CG2HP working group consisting of scientists from a variety of disciplines (e.g., nutrition, physiology, engineering and economics). The scientists will include leaders from universities, government, industry and breed associations. This working group will be expected to attend an initial virtual meeting in September and meet in person in San Diego in January 2022. A CG2HP concept presentation will be given at the Plant and Animal Genome (PAG) conference, and a concept paper addressing the phenotyping data and technologies necessary to implement CG2HP in cattle will be generated and published. Further directions and future action items will be discussed, including future funding opportunities. Additionally, a presentation will be given at an AG2PI conference or workshop.