Summary of PhD subject
Dating within gene trees: speciations, duplications, losses
My PhD work applies the molecular clock concept at the scale of the gene tree. A gene tree does not exactly match the species tree, because the number of functional copies of a gene in a genome varies, either through duplication to a new locus or through loss (pseudogenisation or deletion). These gain and loss events are frequent and, by providing genetic plasticity, crucial to organismal adaptation. I therefore worked on the complete set of gene trees from twenty primate species, with the aim of dating duplications.
In contrast to alignment concatenation, dating from a single gene family limits statistical power. We first quantify this limit with a control that compares the speciation dates inferred in gene trees with the reference ages of the taxa. This control lets us select an accurate dating procedure, which includes optimizing the quality of the alignment upstream. We determine the distribution of the dating accuracy and then relate it to various measurable characteristics of the gene trees and alignments. Our analysis confirms that accuracy depends on alignment length, but also on the heterogeneity of substitution rates between branches, which molecular clock models struggle to accommodate. In concrete terms, our strategy allows us to predict the level of accuracy to expect on new data, and we apply it to the duplication dates.
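As an illustration of the control described above, the comparison between inferred speciation dates and reference ages can be sketched as follows. This is a minimal sketch and not the procedure used in the thesis: the function names, the clade labels and the ages are hypothetical placeholders, and the mean absolute error is only one possible accuracy summary.

```python
from statistics import mean

def dating_errors(estimated_ages, reference_ages):
    """Signed error (estimated minus reference) for each clade present in both inputs."""
    return {
        clade: estimated_ages[clade] - reference_ages[clade]
        for clade in estimated_ages
        if clade in reference_ages
    }

def mean_absolute_error(errors):
    """Average magnitude of the dating errors of one gene tree."""
    return mean(abs(e) for e in errors.values())

# Placeholder ages in million years (illustrative only, not real estimates).
reference_ages = {"cladeA": 10.0, "cladeB": 30.0}
estimated_ages = {"cladeA": 12.0, "cladeB": 27.0}

errors = dating_errors(estimated_ages, reference_ages)
print(errors)
print(mean_absolute_error(errors))
```

Computing one such summary per gene tree gives a distribution of accuracy, which can then be related to characteristics such as alignment length.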
Using this predicted confidence for the dating of trees with duplications, we select the highest-quality subset to establish the temporal distribution of duplications along lineages. In addition to the dates, we calculate duplication rates and characterise their variation: rates differ substantially between gene trees, with many trees showing no duplication and a small proportion duplicating frequently, a pattern that can be modeled by a Gamma distribution. The duplication rate also varies between organism lineages. We test for a phylogenetic correlation between the average genomic duplication rate of a lineage and the diversification of that lineage.
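To make the Gamma model of per-tree duplication rates concrete, the sketch below fits a Gamma distribution by the method of moments. The choice of estimator is an assumption made for illustration (the summary does not say how the model was fitted), the function name is hypothetical, and the rates are invented placeholder values.

```python
from statistics import mean, variance

def fit_gamma_moments(rates):
    """Method-of-moments estimates (shape, scale) of a Gamma distribution.

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    m = mean(rates)
    v = variance(rates)  # sample variance, needs at least two values
    if m <= 0 or v <= 0:
        raise ValueError("rates must have a positive mean and variance")
    shape = m * m / v
    scale = v / m
    return shape, scale

# Placeholder duplication rates per gene tree (illustrative only):
# most trees have no duplication, a few duplicate a lot.
rates = [0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.5, 2.0]

shape, scale = fit_gamma_moments(rates)
print(shape, scale)
```

A fitted shape below 1 corresponds to a density concentrated near zero with a long right tail, which is the qualitative pattern described: many trees without duplications and a few with many.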
Finally, the loss of genes involved in the lateralization of the embryo is characteristic of certain vertebrate taxa. We therefore use correlation to identify new sequences that are potentially functional in humans, screening for genes and enhancers that show similar loss patterns.
Thus, after evaluating the appropriate methods for reliable inference, we have characterised the dynamics of gene turnover. This paves the way to understanding the association between these genomic dynamics and the adaptation and diversification of organisms.