AG2PI Field Day #30 - Wednesday March 20, 2024

Phenomic Prediction & Bioinformatic Workflows - AG2PI "Coconut" Grant Outcomes

Wednesday March 20, 2024 @ 10:30 AM - 12:00 PM (US Central Time)


Discussion of outcomes from two AG2PI Coconut seed grant projects as one addresses the use of phenomic prediction and the other develops new bioinformatic workflows.


(Virtual Zoom Meeting)

Register for the virtual event by clicking the link below. Upon registration, you will receive a confirmation email with information about joining the meeting.

Field Day Registration

Field Day Recording

The Field Day recording is now available. Click the button below to watch it.

Watch Field Day Recording

Chat Questions

Questions that attendees placed in the chat for the speakers can be viewed by clicking the button below.

See Chat Questions

When Are Models Too Good to be True? Accurately Evaluating Phenomic Prediction as a Tool for Plant Breeding

High-throughput phenotyping (HTP) describes a rapidly developing and expanding toolkit for plant breeders to collect large-scale phenotype data on their candidate varieties. HTP tools like near-infrared spectroscopy (NIRS) can rapidly and cheaply collect dozens to hundreds of data points per plant. Many recent studies have shown that these phenomics features can be used by machine learning models to accurately predict other phenotypes of plants such as grain quality or yield, either earlier in the season or more cheaply than these target traits can be measured directly. In some cases, Phenomic Prediction appears competitive with Genomic Prediction in accuracy, and therefore has been proposed as a viable tool to increase the accuracy and rate of genetic gain in breeding. Here, we discuss a key complication with this approach: directly comparing Phenomic Prediction accuracy and Genomic Prediction accuracy is not an appropriate way to evaluate whether Phenomic Prediction is useful for breeding. We show how such comparisons can give the misleading impression that Phenomic Prediction will increase rates of gain, even in cases when it will not. We end by discussing ways to appropriately use and evaluate the benefit of Phenomic Prediction in breeding programs.


Dr. Daniel Runcie

Dr. Daniel Runcie is an Associate Professor in the Department of Plant Sciences at the University of California Davis. His group studies the genetic basis of plant adaptations to their environments, developing statistical and mechanistic models linking genotype to phenotype in both crop and natural systems. He has developed several statistical software tools for quantitative genetics including GridLMM and MegaLMM that enable genome-scale analyses of large datasets.

Developing Bioinformatics Workflows to Support Agricultural Genomics

Agricultural genotype to phenotype (G2P) applications are hindered by the lack of accessible bioinformatic workflows. Well documented workflows support those new to genomic data analysis, enable G2P integration and allow benchmarking to compare workflows. I will demonstrate how we supported students to develop and provide common genomics workflows as documented, freely available resources for the agricultural research community. I will also provide the initial findings from a discussion of the agricultural community's bioinformatic needs to support genomics analyses.


Dr. Fiona McCarthy

Dr. Fiona McCarthy is a Professor in the School of Animal and Comparative Biomedical Sciences at the University of Arizona. She served as an NRSP-8 Bioinformatics Co-coordinator for 12 years and co-chaired the writing committee for the new NRSP-8 project.

Chat Questions

To gain phenomic data you need to create individuals/line crosses and measure them; for genomic prediction you can predict performance of lines that don't exist and which crosses to make to maximize probability of top performance. Is that right?
Dr. Daniel Runcie

Yes, that's right. For phenomic data you have to wait until you can grow the plants. And most successful phenomic prediction models have used phenomic data collected late in the plant's development.

Phenomic prediction for variety release as it contains non-additive genetic effect, but GS for selecting parents, is that right?
Dr. Daniel Runcie

Yes, that's right.

In genomic selection, having dense markers is good for prediction. Is it appropriate to carry out data quality control by filtering out markers with low minor allele frequency prior to fitting a genomic prediction model?
Dr. Daniel Runcie

Low frequency markers tend not to be very useful because their effects can't be measured accurately due to the low sample size in any experiment. This means that they don't tend to help at all. But they don't tend to hurt much either. In my experience, it doesn't make a lot of difference if you leave them in, except it will slow down the analysis.
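The filtering step asked about here can be sketched as follows. This is a minimal illustration, assuming a diploid marker matrix coded as allele dosages 0/1/2; the function name `filter_by_maf`, the 0.05 threshold, and the toy data are hypothetical and are not taken from the talk.

```python
import numpy as np

def filter_by_maf(genotypes, min_maf=0.05):
    """Drop markers whose minor allele frequency (MAF) is below min_maf.

    genotypes: (n_lines, n_markers) array of allele dosages coded 0, 1, 2
               (missing calls may be given as NaN).
    Returns the filtered matrix and a boolean mask of the markers kept.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    # Frequency of the counted allele at each marker (diploid: mean dosage / 2).
    allele_freq = np.nanmean(genotypes, axis=0) / 2.0
    # The minor allele is whichever of the two alleles is rarer.
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    keep = maf >= min_maf
    return genotypes[:, keep], keep

# Toy data (assumption): 200 lines, 1000 markers, many with rare alleles.
rng = np.random.default_rng(0)
true_freq = rng.beta(0.5, 0.5, size=1000)
geno = rng.binomial(2, true_freq, size=(200, 1000))

filtered, keep = filter_by_maf(geno, min_maf=0.05)
print(f"kept {keep.sum()} of {keep.size} markers")
```

Consistent with the answer above, the main practical effect of such a filter is a smaller marker matrix and therefore a faster analysis.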

Why can't phenomic prediction be used to improve the speed like GS?
Dr. Daniel Runcie

Basically for the reason discussed above - you have to wait for the plants to grow. If you can collect your phenomic data on seeds or seedlings, then yes, speed can go up a lot. But in all the successful phenomic prediction models we've seen for interesting traits like quality or yield, the phenomics traits have been measured close in time to the target trait, which means waiting as long as you'd have to wait for phenotypic selection. So in these applications, the cycle length will remain slow.

(From Audience): Most of the reports are based on NIRS. Very few have used temporal data, but these seem promising.

(Daniel Runcie): Yes, I agree these data are likely useful, but again collecting temporal data requires waiting for most (or all) of the season before you can plug values into your prediction model, so you won't gain in speed.

I am not clear on how you determined that genomic selection has a 2x rate of gain? I have not seen any actual results near that high. Isn't it only faster if you have seed chipping technology and can pre-select what is planted? Similar on genomic selection, isn't there greater potential for a genetic bottleneck/ erosion?
Dr. Daniel Runcie

First question: The 2x was supposed to refer to speed (1/2 cycle length) as an example of a realistic shortening of cycles relative to phenotyping selection. But you're right this won't necessarily translate to gain for various reasons including potentially reduced accuracy or reduced variance. My point was that it's much easier to construct Genomic Selection (GS) schemes with fold-changes in speed than Phenomic Selection (PS) schemes with fold-changes in speed. As discussed above, most successful phenomic prediction models have required phenomics data from late in the field seasons. One use of genomic selection even without seed chipping is speed breeding, where you just grow plants long enough to make crosses without worrying about trying to phenotype them.
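The difference between a 2x change in speed and a 2x change in gain can be illustrated with the standard breeder's equation, in which expected gain per year is selection intensity times selection accuracy times additive genetic standard deviation, divided by cycle length. The sketch below uses hypothetical numbers chosen only for illustration; they are not results from the talk.

```python
def gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Expected genetic gain per year from the breeder's equation: i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / cycle_years

# Hypothetical scenarios (all values are assumptions for illustration).
phenotypic = gain_per_year(intensity=1.4, accuracy=0.7, sigma_a=1.0, cycle_years=4)
gs_same_accuracy = gain_per_year(intensity=1.4, accuracy=0.7, sigma_a=1.0, cycle_years=2)
gs_lower_accuracy = gain_per_year(intensity=1.4, accuracy=0.5, sigma_a=1.0, cycle_years=2)

# Halving the cycle length doubles gain only if accuracy and variance are unchanged.
print("same accuracy, half cycle: ", gs_same_accuracy / phenotypic)
print("lower accuracy, half cycle:", gs_lower_accuracy / phenotypic)
```

With these made-up inputs, the halved cycle gives exactly twice the gain when accuracy is held fixed, and noticeably less than twice when accuracy drops, which is the caveat raised in the answer above.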

(Follow-Up Question): I agree with Dr. Dekkers that genomic selection with 2x gains has been seen in livestock and is transformative when you have to wait for years in sexual maturity to evaluate progeny, same with trees. But in terms of something like strawberries or annual crops, I have not seen an actual breeding program show 2x gain in elite material, have you found reports of this?

(Daniel Runcie): I have not! But I'm not sure what gains large commercial corn companies are achieving using genomic selection. And even in annual crops, varieties are often screened over multiple field seasons, and sometimes require early-year growouts to increase seed. This means it still takes several years to complete a cycle.

Second question: There is some concern that GS will erode genetic variation. But I think this is not necessarily the case. Some discussion is here:, showing that GS is expected to maintain greater genetic variation than pedigree selection. One possible advantage of genomic selection schemes is that you can directly measure genetic relatedness and make selection decisions that explicitly try to maintain diversity which is harder to do with phenotypic/phenomic selection schemes.

(Response from Audience): Re: Second question: It has been a while since I read this article, but I think you are referring to the discussion at the end (e.g. Goddard). I am not aware of anyone implementing these considerations, as it would be more difficult and likely would not choose the top individuals. Are you? In PS, I feel we will likely be selecting more on favorable genomic epistatic combinations (additive and others, which we know are pervasive but hard to measure; see Trudy Mackay's review) over pure additive effects that don't show epistasis (which are rare). If this is the case, PS would likely maintain genetic diversity better than GS. But this is a hypothesis, and one that would be difficult to test.

(Daniel Runcie): Good points! But that applies mostly to variety release, I think, because these epistatic combinations aren't going to be very helpful for improving the breeding population across generations because they don't contribute to breeding values.

If the phenomics contains a lot of noise, will it affect the predictive ability of phenotypic selection?
Dr. Daniel Runcie

Yes, but that equally applies to genomic data as well. Markers that are not in high LD with causal markers are effectively noise, and genotyping errors are possible too.

Is the gap between accuracy and ability in phenomic selection not ultimately due to the GLM used for prediction, which does not account for GxExM? Would an ecophysiological model not eliminate this problem?
Dr. Daniel Runcie

The ecophysiological model would get better measures of GxExM, but I think would not necessarily help estimate breeding values directly. Breeding values average over ExM, so predicting GxExM itself isn't really the right goal for population advancement. You'll still need a genetic inheritance model to get estimated breeding values.
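The gap between predictive ability (correlation of predictions with observed phenotypes) and accuracy (correlation with true breeding values) mentioned in this question can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the analysis presented in the talk: a single hypothetical phenomic feature `x` shares a non-genetic plot effect with the target trait `y`, while a stand-in genomic predictor `g` is a noisy estimate of breeding value that does not see the plot effect. All variances are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

a = rng.normal(0.0, 1.0, n)       # true breeding values (unknown in practice)
plot = rng.normal(0.0, 1.0, n)    # non-genetic plot effect shared by x and y

y = a + plot + rng.normal(0.0, 0.5, n)   # observed target trait
x = a + plot + rng.normal(0.0, 0.5, n)   # phenomic feature measured on the same plot
g = a + rng.normal(0.0, 1.0, n)          # stand-in genomic prediction of breeding value

def cor(u, v):
    """Pearson correlation between two vectors."""
    return np.corrcoef(u, v)[0, 1]

ability_pp, accuracy_pp = cor(x, y), cor(x, a)   # phenomic predictor
ability_gp, accuracy_gp = cor(g, y), cor(g, a)   # genomic predictor

print(f"phenomic: ability={ability_pp:.2f}, accuracy={accuracy_pp:.2f}")
print(f"genomic:  ability={ability_gp:.2f}, accuracy={accuracy_gp:.2f}")
```

Under these assumptions the phenomic predictor tracks the observed phenotype much more closely than the genomic predictor does, yet it is slightly less correlated with the breeding values, because part of what it predicts is the shared plot effect rather than heritable signal.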

Thanks for your presentation. A couple of times you mentioned increased heritability in the context of phenomic selection. Does this refer to decreasing environmental variability?
Dr. Daniel Runcie

I'm referring to narrow-sense heritability, the correlation between observed traits and true breeding values. Part of the reason that traits differ from breeding values is environmental variability (say microenvironmental variation among plots with different genotypes across a single field). But non-additive genetic variation among the lines, and GxE across different environments in the TPE also reduces narrow-sense heritability. If we knew true breeding values, we'd use them to train our genomic prediction models. We don't so we have to use trait values instead. But the more correlated those trait values are to the actual breeding values, the better the genomic prediction models will be. This means the higher the narrow-sense heritability, the better the genomic prediction model.
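The relationship described here can be checked numerically. The sketch below assumes a simple model in which a phenotype is the sum of independent additive, non-additive, and environmental components, with made-up variances. Under the usual convention, narrow-sense heritability h² is the additive share of phenotypic variance, which equals the squared correlation between phenotypes and true breeding values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical variance components (assumptions for illustration only).
var_a, var_d, var_e = 1.0, 0.5, 1.5

a = rng.normal(0.0, np.sqrt(var_a), n)   # additive (breeding) values
d = rng.normal(0.0, np.sqrt(var_d), n)   # non-additive genetic effects
e = rng.normal(0.0, np.sqrt(var_e), n)   # environmental / plot-level noise
p = a + d + e                            # observed phenotype

h2_expected = var_a / (var_a + var_d + var_e)   # additive share of phenotypic variance
r = np.corrcoef(p, a)[0, 1]                     # correlation of phenotype with breeding value

print(f"expected h2 = {h2_expected:.3f}, simulated r^2 = {r**2:.3f}")
```

Increasing either `var_d` or `var_e` in this sketch lowers the correlation, which mirrors the point above that both environmental noise and non-additive genetic variation pull trait values away from breeding values.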

Can you discuss a bit more about pleiotropy and selection of traits to phenotype? My understanding is that breeders often don't know which traits actually contribute to gain/yield — other than some single genes.
Dr. Daniel Runcie

Yes, that's something that I've wondered about when trying to turn phenomics data into breeding decisions. The idea of Phenomic Prediction is to let a machine learning model figure out what traits are useful. As long as you do the model training/testing carefully, it should work. This means that your training data should be a fair sample of the whole population you're trying to predict, which in practice means that the correlations between the phenomic traits and with the target traits should be the same between the training and target populations.

Considering genotype evaluation in a network of experiments (various environments). Is it possible to measure all the characteristics of interest in a few environments and, on the other hand, measure a few characteristics in other environments and train models to predict the missing data in environments where there was no phenotyping?
Dr. Daniel Runcie

Yes, this should be possible. I think this should be a reasonable way to increase the heritability of traits you can't measure in every environment. Having predictions in every environment would allow you to do better genomic (or pedigree-based) selection on the average of these traits (i.e. their genetic values).

Two more (probably biased) questions on what you considered:
  1. Phenomic data can be more predictive across environments (in part because it captures GxE, epistasis, dominance, in part because it can measure numerous traits correlated with the trait of interest [e.g. indirect selection]) even early in growth, so you may need less training data to develop a robust prediction model than genomic selection?
  2. Genomic selection is mature and has been around for 20 years with few major additional breakthroughs occurring in the last 10. Phenomic selection has only been attempted by a few groups for ~5 years (grain/product NIRS based, but temporal based <3 years) and so future breakthroughs seem likely (and highly publishable, which is part of the goals for a public program).
  3. Do you see some potential here, or are you just considering what has been reported in this short time for immediate use in genetic gain?
Dr. Daniel Runcie

I agree that Phenomic Prediction should continue to improve and there is an opportunity for innovation there. I expect that eventually it will be possible to predict the final performance of a plant very early in its development, and probably to predict its performance in other environments as well. More generally, I think phenomic models should be possible that greatly reduce the amount of difficult phenotyping that's needed. But I also think that at the end, this is still just phenotyping. We need good phenotypes to do breeding. But you don't want to select on phenotypes themselves, regardless of how good they are. If you have a pedigree or better genetic marker data, you should use them to turn your good phenotypes into better-estimated breeding values.

Can you provide the link to the survey mentioned during your presentation?
Dr. Fiona McCarthy

Certainly, the Qualtrics survey is located at: