Phenotypic Prediction Workshop brings first overseas speaker

When Patricio Muñoz and Marcio Resende were still just doctoral students, batting around the idea of a workshop focused on phenotype prediction, they never imagined that hundreds of people from around the world might want to participate.

“Originally, the idea was to make it local only, to help Florida breeders to understand and finally apply these new methods,” said Muñoz, assistant professor of agronomy. “However, later we realize[d] that many of our colleagues in different countries will never have the chance to listen [to] some of these speakers, so we decide[d] to pursue the online delivery [in] real time.”

The third Phenotypic Prediction Workshop will be held August 25 in Auditorium 101 of the Cancer and Genetics Research Complex. Register to attend here.

Muñoz and Resende, a Genetics Institute doctoral graduate, started the workshop as a way to bring members of their field together to discuss their research.

“We realized that this area has a lot of potential, and a lot of people were interested,” Muñoz said. “[We] wanted to bring respected people in the field that other people might never have the chance to hear speak.”

This year marks the first time the workshop will host a speaker from across the Atlantic. John Hickey, a quantitative geneticist, is coming from The Roslin Institute in Edinburgh, Scotland.

Also speaking is Mattia Prosperi, associate professor of epidemiology at the University of Florida. His research covers big data mining, biomedical modeling and translational science.

“Big data is not only a huge amount of data,” Prosperi said. “It’s also when you have many variables that have to be considered with interactions.”

While at the University of Manchester, Prosperi applied a machine learning framework to predict the development of asthma and eczema in children. The project analyzed a random selection of 3-year-olds from various hospitals in Manchester, England.

Prosperi’s work analyzed over 1,000 variables using information domains integrated from various sources: demographics, clinical exams such as lung capacity tests, analysis of home environment, allergen sensitivity and genetics (using high-throughput sequencing).

For the genetics domain, Prosperi was provided with curated data sets of over 200 single nucleotide polymorphisms (SNPs). His data mining program compared the capability of different domains, individually and collectively, to explain the variance in asthma and eczema diagnoses.

His results revealed that genetics alone offered little power to predict whether a child would develop asthma. Better indicators included clinical visits, examinations of the home for dust mites or other toxins, and allergen tests.

“My analysis demonstrated that integration of these domains improves the confidence in diagnosis,” Prosperi said.

Combining this quantity of data and selecting a powerful, yet interpretable, prediction model is no small task. It involves beginning with less complex models, assessing their reliability, and then increasing complexity until performance is satisfactory, without overfitting.

“Also, you need a statistical model that accepts heterogeneous data,” Prosperi said.

“When doing work like this,” Prosperi said, “it is essential to assemble an interdisciplinary team.” Projects like this draw on such a broad range of disciplines and methodologies that they are prone to error if the team is not equally multi-skilled.

Fostering collaboration is a major goal of the workshop.

Muñoz, who is presenting at the workshop, studies alfalfa. His talk is titled “Towards Genomic Selection in Autotetraploids.”

Alfalfa, blueberries and potatoes are all polyploids, which means they have more than two copies of each chromosome. Humans are diploids, which means they have only two.

Scientists began rigorously studying diploids decades before polyploids, so research methodologies for diploids are much more advanced. The first quantitative genetics book on diploids was published in 1960, while the first for autopolyploids was published only in 2003.

During his talk, Muñoz will present his team's advances in accelerating cultivar development by improving the methodologies used for tetraploid species.

“We are developing the capacity to analyze the data the same way that diploids have been doing,” Muñoz said. “All the analytical tools and methodologies we are developing could be applied to any polyploidy species.”