AG2PI Field Day #30 - Wednesday March 20, 2024
Phenomic Prediction & Bioinformatic Workflows - AG2PI "Coconut" Grant Outcomes
Wednesday March 20, 2024
10:30 AM - 12:00 PM
(US Central Time)
Purpose
Discussion of outcomes from two AG2PI "Coconut" seed grant projects: one addresses the use of phenomic prediction, and the other develops new bioinformatic workflows.
Registration
Register for the virtual event by clicking the link below. Upon registration, you will receive a confirmation email with information about joining the meeting.
Field Day Registration
Field Day Recording
The field day recording is now available; click the button below to launch and watch the recording.
Watch Field Day Recording
Chat Questions
Questions directed to the speakers in the chat can be viewed by clicking the button below.
See Chat Questions
When Are Models Too Good to be True? Accurately Evaluating Phenomic Prediction as a Tool for Plant Breeding
High-throughput phenotyping (HTP) describes a rapidly developing and expanding toolkit for plant breeders
to collect large-scale phenotype data on their candidate varieties. HTP tools like Near-infrared
spectroscopy (NIRS) can rapidly and cheaply collect dozens to hundreds of data points per plant.
Many recent studies have shown that these phenomic features can be used by machine learning models
to accurately predict other phenotypes of plants such as grain quality or yield, either earlier
in the season or more cheaply than these target traits can be measured directly. In some cases,
Phenomic Prediction appears competitive with Genomic Prediction in accuracy, and
therefore has been proposed as a viable tool to increase the accuracy and rate of genetic gain
in breeding. Here, we discuss a key complication with this approach – that directly comparing
Phenomic Prediction accuracy and Genomic Prediction accuracy is not an appropriate way to evaluate
whether Phenomic Prediction is useful for breeding. We show how such comparisons can give the misleading
impression that Phenomic Prediction will increase rates of gain, even in cases when it will not.
We end by discussing ways to appropriately use and evaluate the benefit of Phenomic Prediction
in breeding programs.
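One way a direct accuracy comparison can mislead is easy to simulate. The sketch below assumes (our assumption, not necessarily the presenter's analysis) that a phenomic feature is measured on the same plot as the target trait, so it shares that plot's environmental deviation as well as the line's breeding value. Such a feature can correlate strongly with the observed phenotype, which is how prediction accuracy is commonly reported, while tracking breeding values, which drive genetic gain, much less well. The `simulate` helper and all variances are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_lines=5000, var_a=1.0, var_e=1.0, seed=1):
    """Hypothetical model: phenotype = breeding value + plot environment.

    The phenomic feature is measured on the same plot, so it shares the
    plot's environmental deviation, plus its own measurement noise.
    """
    rng = random.Random(seed)
    bv, pheno, feature = [], [], []
    for _ in range(n_lines):
        a = rng.gauss(0.0, math.sqrt(var_a))  # true breeding value
        e = rng.gauss(0.0, math.sqrt(var_e))  # plot environmental deviation
        bv.append(a)
        pheno.append(a + e)
        feature.append(a + e + rng.gauss(0.0, 0.3))  # same-plot phenomic reading
    return bv, pheno, feature


bv, pheno, feature = simulate()
acc_vs_pheno = pearson(feature, pheno)      # "accuracy" as often reported
acc_vs_bv = pearson(feature, bv)            # accuracy that matters for gain
perfect_gp_vs_pheno = pearson(bv, pheno)    # a perfect genomic predictor, scored vs phenotype
print(f"r(feature, phenotype)        = {acc_vs_pheno:.2f}")
print(f"r(feature, breeding value)   = {acc_vs_bv:.2f}")
print(f"r(true BV, phenotype)        = {perfect_gp_vs_pheno:.2f}")
```

With these settings (h² = 0.5), even a genomic predictor that recovered breeding values perfectly would score only about √0.5 ≈ 0.71 against the phenotype, so the phenomic feature looks better on paper although its correlation with breeding values is expected to be only about 0.69.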
Presenter
Dr. Daniel Runcie is an Associate Professor in the Department of Plant Sciences at the University of California, Davis. His group studies the genetic basis of plant adaptations to their environments, developing statistical and mechanistic models linking genotype to phenotype in both crop and natural systems. He has developed several statistical software tools for quantitative genetics, including GridLMM and MegaLMM, that enable genome-scale analyses of large datasets.
Developing Bioinformatics Workflows to Support Agricultural Genomics
Agricultural genotype-to-phenotype (G2P) applications are hindered by the lack of accessible bioinformatic workflows. Well-documented workflows support those new to genomic data analysis, enable G2P integration, and allow benchmarking to compare workflows. I will demonstrate how we supported students to develop and provide common genomics workflows as documented, freely available resources for the agricultural research community. I will also provide the initial findings from a discussion of the agricultural community's bioinformatic needs to support genomics analyses.
Presenter
Dr. Fiona McCarthy is a Professor in the School of Animal and Comparative Biomedical Sciences at the University of Arizona. She served as an NRSP-8 Bioinformatics Co-coordinator for 12 years and co-chaired the writing committee for the new NRSP-8 project.
Chat Questions
Low frequency markers tend not to be very useful because their effects can't be measured accurately due to the low sample size in any experiment. This means that they don't tend to help at all. But they don't tend to hurt much either. In my experience, it doesn't make a lot of difference if you leave them in, except it will slow down the analysis.
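If you do decide to drop rare markers to speed up analysis, a minor allele frequency (MAF) filter is the usual approach. This is a minimal sketch, assuming biallelic markers coded 0/1/2 (copies of the alternate allele) with missing calls as `None`; the function names and the thresholds are illustrative assumptions, not a recommendation from the speaker.

```python
def minor_allele_frequency(genotypes):
    """MAF for one biallelic marker coded 0/1/2; missing calls (None) are skipped."""
    calls = [g for g in genotypes if g is not None]
    if not calls:
        return 0.0
    p = sum(calls) / (2 * len(calls))  # alternate-allele frequency
    return min(p, 1.0 - p)


def filter_by_maf(markers, threshold=0.05):
    """Keep markers whose MAF is at least `threshold`.

    `markers` maps a marker name to a list of 0/1/2 calls, one per line.
    """
    return {name: calls for name, calls in markers.items()
            if minor_allele_frequency(calls) >= threshold}


markers = {
    "m1": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2],  # common variant (MAF 0.5)
    "m2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # one heterozygous carrier (MAF 0.05)
    "m3": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],  # monomorphic (MAF 0)
}
# With only 10 lines a single carrier already gives MAF 0.05, so use 0.1 here.
kept = filter_by_maf(markers, threshold=0.1)
print(sorted(kept))  # ['m1']
```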
Basically for the reason discussed above: you have to wait for the plants to grow. If you can collect your phenomic data on seeds or seedlings, then yes, speed can go up a lot. But in all the successful phenomic prediction models we've seen for interesting traits like quality or yield, the phenomic traits have been measured close in time to the target trait, which means waiting as long as you'd have to wait for phenotypic selection. So in these applications, cycles will remain long.
(From Audience): Most of the reports are based on NIRS. Very few have used temporal data, but these seem promising.
(Daniel Runcie): Yes, I agree these data are likely useful, but again collecting temporal data requires waiting for most (or all) of the season before you can plug values into your prediction model, so you won't gain in speed.
First question: The 2x was supposed to refer to speed (1/2 cycle length) as an example of a realistic shortening of cycles relative to phenotypic selection. But you're right that this won't necessarily translate to gain, for various reasons including potentially reduced accuracy or reduced variance. My point was that it's much easier to construct Genomic Selection (GS) schemes with fold-changes in speed than Phenomic Selection (PS) schemes with fold-changes in speed. As discussed above, most successful phenomic prediction models have required phenomic data from late in the field season. One use of genomic selection, even without seed chipping, is speed breeding, where you just grow plants long enough to make crosses without worrying about trying to phenotype them.
(Follow-Up Question): I agree with Dr. Dekkers that genomic selection with 2x gains has been seen in livestock and is transformative when you have to wait years for sexual maturity to evaluate progeny; the same is true of trees. But in something like strawberries or annual crops, I have not seen an actual breeding program show 2x gain in elite material. Have you found reports of this?
(Daniel Runcie): I have not! But I'm not sure what gains large commercial corn companies are achieving using genomic selection. Also, even in annual crops, varieties are often screened over multiple field seasons and sometimes require early-year growouts to increase seed. This means it still takes several years to complete a cycle.
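The trade-off in this exchange between accuracy and cycle length is commonly framed with the breeder's equation expressed per year, R = i · r · σ_A / L, where i is selection intensity, r the accuracy (correlation between the selection criterion and the true breeding value), σ_A the additive genetic standard deviation, and L the cycle length in years. The sketch below plugs in hypothetical numbers, chosen only to show how a less accurate but faster scheme can still deliver more gain per year; it ignores the caveats raised above, such as reduced variance.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    name: str
    intensity: float    # i, standardized selection differential
    accuracy: float     # r, corr(selection criterion, true breeding value)
    cycle_years: float  # L, years per breeding cycle


def gain_per_year(scheme, sigma_a=1.0):
    """Breeder's equation per year: i * r * sigma_A / L."""
    return scheme.intensity * scheme.accuracy * sigma_a / scheme.cycle_years


# Hypothetical values for illustration only.
schemes = [
    Scheme("phenotypic selection", 1.4, 0.70, 4.0),
    Scheme("phenomic selection (late-season data)", 1.4, 0.75, 4.0),
    Scheme("genomic selection (half the cycle length)", 1.4, 0.50, 2.0),
]
for s in schemes:
    print(f"{s.name:45s} {gain_per_year(s):.3f} sigma_A / year")
```

Because the phenomic scheme here keeps the same cycle length as phenotypic selection, its advantage comes only from accuracy, whereas halving L doubles gain per year at fixed accuracy.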
Second question: There is some concern that GS will erode genetic variation. But I think this is not necessarily the case. Some discussion is here: https://doi.org/10.1093/bfgp/elq001, showing that GS is expected to maintain greater genetic variation than pedigree selection. One possible advantage of genomic selection schemes is that you can directly measure genetic relatedness and make selection decisions that explicitly try to maintain diversity, which is harder to do with phenotypic/phenomic selection schemes.
(Response from Audience): Re: Second question: It has been a while since I read this article, but I think you are referring to the discussion at the end (e.g., Goddard). I am not aware of anyone implementing these considerations, as it would be more difficult and likely would not choose the top individuals; are you? In PS, I feel we will likely be selecting more on favorable genomic epistatic combinations (additive and others, which we know are pervasive but hard to measure; see Trudy Mackay's review) over pure additive effects that don't show epistasis (which are rare). If this is the case, PS would likely maintain genetic diversity better than GS. But this is a hypothesis, and one that would be difficult to test.
(Daniel Runcie): Good points! But that applies mostly to variety release, I think, because these epistatic combinations aren't going to be very helpful for improving the breeding population across generations because they don't contribute to breeding values.
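Dr. Runcie's point that genomic schemes let you measure relatedness directly and select to maintain diversity can be sketched as follows. This builds a genomic relationship matrix with VanRaden's (2008) first method, then greedily picks candidates, penalizing each pick by its mean relationship to those already chosen. The greedy rule, the `penalty` weight, the GEBVs and the toy genotypes are all hypothetical; formal approaches such as optimal contribution selection treat this trade-off more rigorously.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix, VanRaden (2008) method 1.

    `genotypes` is a list of individuals, each a list of 0/1/2 marker calls.
    Assumes at least one polymorphic marker (otherwise the scaling is zero).
    """
    n_ind, n_mark = len(genotypes), len(genotypes[0])
    p = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_mark)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    z = [[ind[j] - 2 * p[j] for j in range(n_mark)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def select_with_diversity(gebv, grm, k, penalty):
    """Greedily pick k candidates, trading GEBV against mean relationship
    to those already chosen (a heuristic, not optimal contribution selection)."""
    chosen = []
    remaining = list(range(len(gebv)))
    while len(chosen) < k and remaining:
        def score(i):
            if not chosen:
                return gebv[i]
            mean_rel = sum(grm[i][j] for j in chosen) / len(chosen)
            return gebv[i] - penalty * mean_rel
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: lines 0 and 1 are genetically identical; 2 and 3 are not.
genotypes = [
    [2, 0, 2, 0, 2, 0],
    [2, 0, 2, 0, 2, 0],
    [0, 2, 0, 2, 1, 1],
    [1, 1, 0, 2, 0, 2],
]
gebv = [2.0, 1.9, 1.5, 1.0]  # hypothetical genomic estimated breeding values
grm = vanraden_grm(genotypes)
print(select_with_diversity(gebv, grm, k=2, penalty=0.0))  # [0, 1]
print(select_with_diversity(gebv, grm, k=2, penalty=0.5))  # [0, 2]
```

With no penalty, the two identical top lines are both selected; a modest penalty swaps the duplicate for an unrelated line at a small cost in mean GEBV.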
Yes, but that applies equally to genomic data. Markers that are not in high LD with causal markers are effectively noise, and genotyping errors are possible too.
The ecophysiological model would get better measures of GxExM, but I think would not necessarily help estimate breeding values directly. Breeding values average over ExM, so predicting GxExM itself isn't really the right goal for population advancement. You'll still need a genetic inheritance model to get estimated breeding values.
Re: "increase heritability" in the context of phenomic selection: does this refer to decreasing environmental variability?
I'm referring to narrow-sense heritability, which measures how closely observed trait values track true breeding values (formally, the correlation between them is the square root of the narrow-sense heritability). Part of the reason that traits differ from breeding values is environmental variability (say, microenvironmental variation among plots with different genotypes across a single field). But non-additive genetic variation among the lines, and GxE across different environments in the TPE, also reduce narrow-sense heritability. If we knew true breeding values, we'd use them to train our genomic prediction models. We don't, so we have to use trait values instead. But the more correlated those trait values are with the actual breeding values, the better the genomic prediction models will be. This means the higher the narrow-sense heritability, the better the genomic prediction model.
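The link between narrow-sense heritability and how well trait values stand in for breeding values can be checked with a short simulation. Under the simplest additive model, phenotype P = A + R, where A is the breeding value and R lumps together everything else (micro-environment, non-additive genetic effects, GxE across the TPE); then h² = Var(A) / Var(P) and corr(P, A) = √h². The model, the normal distributions and the sample size are idealized assumptions.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def phenotype_bv_correlation(h2, n=20000, seed=7):
    """Simulate P = A + R with Var(A) = h2 and Var(R) = 1 - h2; return corr(P, A)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, math.sqrt(h2)) for _ in range(n)]
    p = [ai + rng.gauss(0.0, math.sqrt(1.0 - h2)) for ai in a]
    return corr(p, a)


for h2 in (0.2, 0.5, 0.8):
    r = phenotype_bv_correlation(h2)
    print(f"h2={h2:.1f}  corr(P, A)={r:.3f}  sqrt(h2)={math.sqrt(h2):.3f}")
```

Raising h² (by reducing any component of R) makes the trait values a better proxy for the breeding values used to train genomic prediction models.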
Yes, that's something that I've wondered about when trying to turn phenomics data into breeding decisions. The idea of Phenomic Prediction is to let a machine learning model figure out what traits are useful. As long as you do the model training/testing carefully, it should work. This means that your training data should be a fair sample of the whole population you're trying to predict, which in practice means that the correlations among the phenomic traits, and between them and the target traits, should be the same in the training and target populations.
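One practical screen for the condition described above is to compare, feature by feature, the correlation with the target trait in the training set and in a held-out set (with measured target values) that resembles the population you intend to predict. This is a minimal sketch: the helper names, the 0.2 tolerance and the toy data are hypothetical, and it only checks feature-target correlations, not the correlations among the phenomic features themselves.

```python
import math


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def correlation_shift(train_feats, train_target, test_feats, test_target):
    """For each phenomic feature: corr(feature, target) in the held-out set
    minus the same correlation in the training set."""
    return {
        name: corr(test_feats[name], test_target) - corr(values, train_target)
        for name, values in train_feats.items()
    }


def flag_unstable(shifts, tolerance=0.2):
    """Features whose correlation with the target shifted by more than `tolerance`."""
    return sorted(name for name, d in shifts.items() if abs(d) > tolerance)


# Toy data: "nir_2" relates to the target in opposite directions in the two sets.
train_target = [1.0, 2.0, 3.0, 4.0, 5.0]
train_feats = {"nir_1": [1.0, 2.0, 3.0, 4.0, 5.0], "nir_2": [2.0, 4.0, 6.0, 8.0, 10.0]}
test_target = [1.0, 2.0, 3.0, 4.0, 5.0]
test_feats = {"nir_1": [1.1, 2.0, 2.9, 4.2, 5.0], "nir_2": [5.0, 4.0, 3.0, 2.0, 1.0]}

shifts = correlation_shift(train_feats, train_target, test_feats, test_target)
print(flag_unstable(shifts))  # ['nir_2']
```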
Yes, this should be possible. I think this should be a reasonable way to increase the heritability of traits you can't measure in every environment. Having predictions in every environment would allow you to do better genomic (or pedigree-based) selection on the average of these traits (i.e. their genetic values).
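A minimal sketch of the averaging step: given predicted trait values for each genotype in each environment (however they were produced), average them per genotype before genomic or pedigree-based selection. The tuple layout and names are hypothetical, and the sketch assumes a balanced design; with genotypes missing from some environments, a mixed model that accounts for environment effects would be more appropriate than raw means.

```python
from collections import defaultdict


def genotype_means(predictions):
    """Average predicted values across environments for each genotype.

    `predictions` is an iterable of (genotype, environment, value) tuples,
    assumed balanced (every genotype predicted in every environment).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genotype, _environment, value in predictions:
        totals[genotype] += value
        counts[genotype] += 1
    return {g: totals[g] / counts[g] for g in totals}


predictions = [
    ("G1", "E1", 5.0), ("G1", "E2", 7.0), ("G1", "E3", 6.0),
    ("G2", "E1", 6.5), ("G2", "E2", 5.5), ("G2", "E3", 5.0),
]
means = genotype_means(predictions)
ranking = sorted(means, key=means.get, reverse=True)
print(ranking)  # ['G1', 'G2']
```

Here G2 has the better prediction in E1, but G1 ranks first on its average across environments, which is the quantity relevant to population advancement.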
- Phenomic data can be more predictive across environments (in part because it captures GxE, epistasis, dominance, in part because it can measure numerous traits correlated with the trait of interest [e.g. indirect selection]) even early in growth, so you may need less training data to develop a robust prediction model than genomic selection?
- Genomic selection is mature and has been around for 20 years with few major additional breakthroughs occurring in the last 10. Phenomic selection has only been attempted by a few groups for ~5 years (grain/product NIRS based, but temporal based <3 years) and so future breakthroughs seem likely (and highly publishable, which is part of the goals for a public program).
I agree that Phenomic Prediction should continue to improve and there is an opportunity for innovation there. I expect that eventually it will be possible to predict the final performance of a plant very early in its development, and probably to predict its performance in other environments as well. More generally, I think phenomic models should be possible that greatly reduce the amount of difficult phenotyping that's needed. But I also think that in the end, this is still just phenotyping. We need good phenotypes to do breeding. But you don't want to select on phenotypes themselves, regardless of how good they are. If you have a pedigree or, better, genetic marker data, you should use them to turn your good phenotypes into better-estimated breeding values.
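Using marker data to turn good phenotypes into better-estimated breeding values can be sketched with ridge-regression marker effects, the idea behind RR-BLUP. This pure-Python version solves the ridge normal equations (X'X + λI)β = X'y by Gauss-Seidel iteration, with marker columns and phenotypes centered. The fixed shrinkage `lam`, the helper names and the toy data are assumptions for illustration; real programs estimate variance components with dedicated mixed-model software and use far more lines and markers.

```python
def column_means(matrix):
    """Mean of each column of a list-of-lists matrix."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def ridge_marker_effects(genotypes, phenotypes, lam, n_iter=500):
    """Ridge-regression (RR-BLUP-style) marker effects by Gauss-Seidel.

    Solves (X'X + lam*I) beta = X'y with column-centered X and centered y.
    `lam` stands in for sigma_e^2 / sigma_beta^2 and is fixed, not estimated.
    Returns the effects and the training-set marker means.
    """
    means = column_means(genotypes)
    x = [[g - m for g, m in zip(row, means)] for row in genotypes]
    y_mean = sum(phenotypes) / len(phenotypes)
    resid = [yi - y_mean for yi in phenotypes]  # y - X beta, with beta = 0
    n_mark = len(means)
    beta = [0.0] * n_mark
    xtx = [sum(row[j] ** 2 for row in x) for j in range(n_mark)]
    for _ in range(n_iter):
        for j in range(n_mark):
            # x_j'y minus the fitted contributions of all other markers
            rhs = sum(row[j] * r for row, r in zip(x, resid)) + xtx[j] * beta[j]
            new = rhs / (xtx[j] + lam)
            delta = new - beta[j]
            for i, row in enumerate(x):
                resid[i] -= row[j] * delta
            beta[j] = new
    return beta, means


def gebv(genotypes, beta, means):
    """Genomic estimated breeding values, centering with training-set means."""
    return [sum((g - m) * b for g, m, b in zip(row, means, beta))
            for row in genotypes]


true_beta = [0.5, -0.3, 0.2]  # hypothetical marker effects
geno = [[i % 3, (i // 3) % 3, (2 * i + 1) % 3] for i in range(12)]
pheno = [10.0 + sum(g * b for g, b in zip(row, true_beta)) for row in geno]

beta, means = ridge_marker_effects(geno, pheno, lam=1.0)
print([round(b, 3) for b in beta])
print([round(v, 2) for v in gebv(geno, beta, means)])
```

Larger `lam` shrinks the estimated effects more strongly toward zero; as `lam` approaches zero on noise-free, full-rank data like this toy set, the estimates approach ordinary least squares.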
Certainly, the Qualtrics survey is located at: https://bit.ly/3Tl7IFO