Guillaume Louvel

Evolutionary genomics & bioinformatics

Postdoc project on eukaryogenesis in Laura Eme’s lab, Université Paris-Saclay.

Causes for discord in Eukaryotic protein domains inherited from Asgard or TACK archaea

Eukaryotes descend from the fusion of bacterial and archaeal ancestors: an alphaproteobacterial endosymbiont that became the mitochondrion, possibly earlier bacterial mergers, and an archaeal parent that contributed core components of the genetic machinery. The precise relationship of these core components to their archaeal homologs has been debated, with each answer implying a different view of the vertical ancestor of the proto-eukaryote. Earlier analyses placed eukaryotes either outside all Archaea or within them, as sister to Euryarchaeota, Crenarchaeota or TACK; the latter placements are supported by more complex phylogenetic models that can deal with long-term molecular evolution. Theories on the first eukaryotic common ancestor (FECA) thus depend on new metagenome-assembled genomes (MAGs) in addition to genomes from cultivated organisms, and on phylogenomic methods able to recover homology signal from distant relatives. With such distant homologs, the chosen tree-building methods must account for heterogeneity in protein composition between lineages in order to avoid artefacts, in particular long branch attraction (LBA). Asgard archaea, a superphylum discovered from metagenomes, appear to be the closest known lineage to eukaryotes.

However, when species relationships are inferred from sequences, discord among genes is pervasive: different genes support different trees. This can be due to real incongruence caused by horizontal gene transfer, hybridization and incomplete lineage sorting, or to methodological errors such as sequence contamination, hidden patterns of duplications and losses, incorrect ortholog grouping, statistical uncertainty or an inadequate model of sequence evolution.

In this project we aim to identify causes of incongruence in a set of eukaryotic protein domains originating from either Asgard or TACK archaea, as identified by a previous study 1.

Starting from the 248 Pfam protein domain family profiles identified in that study, we built phylogenetic trees across 185 published Asgard, TACK and eukaryote genomes. We then located the sister groups of eukaryote clades and investigated tree characteristics associated with an Asgard or a TACK origin. An Asgard origin is inferred twice as often as a TACK origin and correlates with shorter branch lengths, suggesting that this inference is less prone to LBA. Other characteristics, such as alignment quality metrics and compositional heterogeneity, are currently being tested.
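As an illustration of the sister-group step, here is a minimal sketch rather than the pipeline actually used in the project. It assumes a rooted tree written as nested tuples, with leaves labelled "Group|sequence_id"; the toy tree, the labels and the function names are hypothetical, and branch lengths are ignored.

```python
from collections import Counter

# Hypothetical toy tree (rooted, nested tuples); leaves are "Group|sequence_id".
TREE = (
    ("TACK|t1", "TACK|t2"),
    (
        ("Asgard|a1", ("Euk|e1", "Euk|e2")),
        ("Asgard|a2", "Asgard|a3"),
    ),
)


def leaf_groups(node):
    """Count the group labels of all leaves below `node`."""
    if isinstance(node, str):
        return Counter([node.split("|")[0]])
    total = Counter()
    for child in node:
        total += leaf_groups(child)
    return total


def euk_sisters(node, target="Euk"):
    """For each maximal clade made only of `target` leaves, return the
    group composition of its sister (all other children of its parent)."""
    # A leaf, or a subtree made only of target leaves, has no sister inside it.
    if isinstance(node, str) or set(leaf_groups(node)) == {target}:
        return []
    results = []
    for i, child in enumerate(node):
        if set(leaf_groups(child)) == {target}:
            # `child` is a maximal target clade: pool its siblings.
            sister = Counter()
            for j, other in enumerate(node):
                if j != i:
                    sister += leaf_groups(other)
            results.append(dict(sister))
        else:
            results.extend(euk_sisters(child, target))
    return results


print(euk_sisters(TREE))
```

In this toy tree the single eukaryote clade has one Asgard sequence as its sister. On real gene trees the result depends on how the tree is rooted, so the rooting has to be chosen before sister groups are read off.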

By analysing separate protein domains instead of full proteins, we also characterize intra-protein conflicts and the evolution of protein architectures.
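One simple way to flag intra-protein conflicts, once an origin has been inferred for each domain, is to group the calls by protein and keep the proteins whose domains disagree. The sketch below assumes a hypothetical table of (protein, domain, inferred origin) records; the identifiers and the function name are placeholders, not data or code from the project.

```python
from collections import defaultdict

# Hypothetical per-domain origin calls: (protein_id, domain_id, inferred_origin).
CALLS = [
    ("protein_A", "domain_1", "Asgard"),
    ("protein_A", "domain_2", "TACK"),
    ("protein_B", "domain_3", "Asgard"),
    ("protein_B", "domain_4", "Asgard"),
]


def conflicting_proteins(calls):
    """Return proteins whose domains were assigned more than one origin."""
    origins = defaultdict(set)
    for protein, _domain, origin in calls:
        origins[protein].add(origin)
    # Keep only proteins with at least two distinct inferred origins.
    return {p: sorted(o) for p, o in origins.items() if len(o) > 1}


print(conflicting_proteins(CALLS))
```

Here only protein_A is reported, because its two domains point to different archaeal origins. A disagreement of this kind is a candidate conflict only, since it can also result from the methodological errors discussed earlier.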

Finally, our dataset, tailored to the relationships between eukaryotes, Asgard and TACK archaea, provides a new selection of sequences for building a species tree while taking phylogenetic artefacts into account.


  1. Vosseberg et al. 2021