Bacterial genome annotation
The European Union supports research through grants that allow the development of research activities associating several European partners, according to the principle of subsidiarity. The BioSapiens programme was meant to improve the annotation of genomes, in particular that of the human genome. The bulk of the activity was devoted to in silico research, but experimental validation was required in some cases. Furthermore, as animals are the result of symbiotic associations with bacteria, which have evolved since the very early times of the development of life and resulted in the cell's energy factories, the mitochondria, it was essential to annotate bacterial genomes as well. The effort presented here corresponds to grant LSHG CT-2003-503265, meant to improve bacterial genome annotation via experimental validation.
This page aims to inform the general public (in particular the citizens of the European countries that steer the European Union) about the most recent developments of this research.
The life of the BioSapiens network spanned slightly more than five years, at a time of huge changes in genomics. Our specific work was meant to provide and validate annotation profiles of bacterial genomes, with the genome of Mycoplasma pneumoniae as the final end point. In parallel, emphasis was to be placed on the identification and annotation of unexpected metabolic pathways.
While the programme did not explicitly involve phylogenetic studies, it appeared to us (in a way this is a trivial statement) that, evolution being central to understanding life, this feature had to be implemented somehow, at least in bacterial genome annotations. Indeed, most extant annotations derive by inference from annotations obtained experimentally (whatever that means) from model organisms. We therefore set up a general in silico comparison of bacterial genomes in order to find out whether ubiquitous functions could be identified, and to see whether the corresponding genes displayed a particular organisation.
Our rationale was not the standard approach. We did not look for the elusive minimal genome. Indeed, because there is no one-to-one relationship between sequence, structure and function, we could not use a simple overlap of orthologs shared by different genomes to identify those functions. This led us to create the concept of gene persistence as a means to trace back at least some of the ubiquitous functions [6].
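As an illustration only, gene persistence can be sketched as the fraction of compared genomes in which a gene has a detectable ortholog. The snippet below is a minimal sketch under that assumption. The function name, the genome labels and the toy presence table are hypothetical and are not taken from the programme's data, and the published analysis [6] may rely on different ortholog criteria.

```python
from typing import Dict, Set


def persistence_index(orthologs: Dict[str, Set[str]],
                      genomes: Set[str]) -> Dict[str, float]:
    """Return, for each gene, the fraction of genomes carrying an ortholog.

    `orthologs` maps a gene name to the set of genomes in which an
    ortholog was detected (how orthologs are detected is outside the
    scope of this sketch).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    return {
        gene: len(present & genomes) / len(genomes)
        for gene, present in orthologs.items()
    }


# Hypothetical toy data: four genomes, three genes.
genomes = {"g1", "g2", "g3", "g4"}
orthologs = {
    "geneA": {"g1", "g2", "g3", "g4"},  # found everywhere
    "geneB": {"g1", "g3"},              # found in half of the genomes
    "geneC": {"g2"},                    # found in a single genome
}
scores = persistence_index(orthologs, genomes)
```

Unlike a strict intersection of orthologs across all genomes, a score of this kind still ranks a gene highly when it is missing from a few genomes, which is the point of not relying on a simple overlap.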
This effort led us to show that the genome is divided into two major parts: a set of persistent genes that we named the paleome, because the spread of its organisation within bacterial genomes reflects what could be a scenario of the origin of life, and an unlimited number of genes coding for functions permitting life in context, which we named the cenome (after the word biocenose, created by Karl Möbius in 1877 to define all members of a particular ecological niche). These two sets are separated by a large twilight zone, containing genes involved in remarkable metabolic pathways that are fairly widespread but not present in the majority of bacterial clades. This twilight zone, which we could name the (proto)mixome, corresponds to pathways that are often consistently distributed within particular clades, and specific to those clades.
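The three-way split described above can be pictured as a partition of genes by their persistence, taken here (as an assumption) to be the fraction of compared genomes carrying an ortholog. The following is a minimal sketch: the function name and both cut-off values are hypothetical and chosen only for illustration, since the text does not state numerical thresholds.

```python
def classify_gene(persistence: float,
                  paleome_min: float = 0.8,
                  cenome_max: float = 0.2) -> str:
    """Assign a gene to the paleome, the twilight zone or the cenome.

    `persistence` is a value between 0 and 1. The two thresholds are
    hypothetical illustration values, not figures from the source.
    """
    if not 0.0 <= persistence <= 1.0:
        raise ValueError("persistence must lie between 0 and 1")
    if persistence >= paleome_min:
        return "paleome"
    if persistence <= cenome_max:
        return "cenome"
    return "twilight zone"


# Hypothetical persistence values for three genes.
labels = {
    gene: classify_gene(score)
    for gene, score in {"geneA": 0.95, "geneB": 0.5, "geneC": 0.05}.items()
}
```

A fixed threshold is the simplest possible choice. A clade-aware criterion would be closer to the description of the twilight zone, where pathways are consistently present within particular clades only.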
Our work was therefore divided into three parts:
1/ Experimental annotation of unexpected metabolic features
The explicit separation between the cell soma and the genetic
program suggests that living organisms can be seen as
computers-making-computers [3, 16].
This permitted us to distinguish between a fairly conserved network of
functions and functions that permit cells to live in context. In the
former category, the paleome, we tried to apply a type of reasoning
common in engineering. If we wish to construct a living machine, what
are the essential functions that we should not omit? This approach
requires investigation of all central biochemical activities in their
fine details. Among those, RNA degradation is essential [17].
RNases are of two major types, endonucleases and exonucleases. The
former often have sequence or structure specificity, the latter are
usually processive enzymes. This has an unwanted consequence: starting
with fairly large pieces of RNA, which need to be somehow linked to
the enzyme, there comes a moment when a small leftover, usually shorter
than 5 nucleotides, has to be hydrolysed or phosphorolysed. Its
binding to the enzyme becomes quite weak, and the result is that a
great many very short RNA oligonucleotides ("nanoRNAs") begin to flood
the cell cytoplasm. Being short, they can enter transcription and
replication bubbles and interfere with these essential processes. We
inferred, therefore, that the nanoRNA degradation function should be
ubiquitous. We showed that the function is coded by orthologs in
Escherichia coli and in humans, and that the saying "what is true for
E. coli is true for the elephant" can be reversed, in that the E. coli
enzyme, exactly like its human counterpart, is extremely sensitive to
lithium. We also demonstrated that this enzyme is encoded by orn in
E. coli and by REXO2 (SFN) in humans [9].
Yet, we did not find a counterpart of orn in A+T-rich Firmicutes, nor
in alpha-proteobacteria. This triggered a search for the corresponding
enzymes, as the function, if our line of reasoning is correct, should
be ubiquitous. In brief, we uncovered in Bacillus subtilis (and in
other Gram-positive bacteria, including Mycoplasma pneumoniae) a gene,
ytqI, which, when expressed in E. coli, could complement a defect in
orn. We further showed that this enzyme also has 3',5'-adenosine
bisphosphate phosphatase activity, coupling sulfur metabolism to RNA
degradation, via completely different phylogenetic pathways, showing
convergent evolution of this interesting connection [12].
Work in progress in alpha-proteobacteria (the work will be completed
after the closing of the BioSapiens network) shows yet another
structure responsible for the same activity!
In the cenome fraction of all genomes we find a very large number of
operons coding for unknown functions. The general features of the
corresponding proteome permit us, however, to assign these functions
to different cell compartments, according to their amino acid content
[5, 10]. Furthermore, many features suggest that the corresponding
genes code for enzymes. At the beginning of our involvement in the
BioSapiens network we completed our work on the methionine salvage
pathway, extending it from B. subtilis to several other bacteria and
Homo sapiens [2]. We have uncovered several other pathways involving
sulfur which should be explored experimentally in depth. The
interaction between RNA degradation, the degradosome and sulfur
metabolism is a case in point [17].
The best way to demonstrate our contribution to genome annotation
was to initiate a new genome programme and put into practice our
annotation procedure, as initiated in collaboration with Claudine
Médigue at the Genoscope in Paris (MaGe platform), while providing the
European Bioinformatics Institute with the entirely annotated genome
data. Noting that the physico-chemical constraints of cold conditions
had not been explored in depth, we sequenced, annotated, and developed
some experimental validation of annotation predictions for the genome
of the Antarctic psychrophile Pseudoalteromonas haloplanktis TAC125,
in collaboration with the BioSapiens team of Gunnar von Heijne [8].
This gave us a further handle to understand the amino acid
distribution of bacterial proteomes [5, 10].
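The amino acid content mentioned here can be made concrete as a composition vector, that is, the relative frequency of each of the 20 standard residues in a protein. The snippet below is a minimal sketch of that first step only. The function name and the example peptide are hypothetical, and how such vectors were analysed to separate cell compartments in [5, 10] is not reproduced here.

```python
from collections import Counter
from typing import Dict

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the relative frequency of each standard amino acid.

    Characters outside the 20 standard one-letter codes (for example
    'X' or '*') are ignored when computing the frequencies.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acid")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical six-residue peptide used only as an example.
composition = amino_acid_composition("MKKLLA")
```

Averaging such vectors over all proteins of a genome gives a proteome-level composition, which is the kind of quantity that can then be compared across organisms or across predicted compartments.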
During this work, while annotating the genome, we discovered that some
of the annotations and data we extracted from the model organism
B. subtilis were inaccurate. We therefore decided to entirely
re-sequence and re-annotate the reference genome of strain 168. This
led us to discover a significant number of errors in the published
sequence, and permitted us to provide the international community, via
the EBI and reference databases, with an up-to-date annotated sequence
[15].
This work provided us with further support for the split of bacterial
genomes into a paleome, permitting life, and a cenome, permitting life
in context (occupation of a particular niche: B. subtilis is
clearly an epiphyte, with special relationships with the phylloplane
of hay plants). Furthermore, Mollicutes are clearly derived from
ancestral A+T-rich Firmicutes, and the annotation of the B.
subtilis genome can be used as a secure background for the
annotation of M. pneumoniae.
All publications bearing the BioSapiens logo (stars indicate
experimental work) are displayed on our bibliography pages.
Further work, based on recent experiments, is planned for later
publication.
1. A Danchin
The bag or the spindle: the cell factory at the time of system's
biology
Microb Cell Fact (2004) 3: 13
2. * A Sekowska, V Dénervaud, H Ashida, K Michoud, D Haas, A Yokota, A Danchin
Bacterial variations on the methionine salvage pathway
BMC Microbiology (2004) 4: 9
3. A Danchin
Genome diversity: A grammar of microbial genomes
ComPlexUs (2004/2005) 2: 61-70
4. G Fang, C Ho, YW Qiu, V Cubas, Z Yu, C Cabau, F Cheung, I Moszer, A Danchin
Specialized microbial databases for inductive exploration of microbial
genome sequences
BMC Genomics (2005) 6: 14
5. G Pascal, C Médigue, A Danchin
Universal biases in protein composition of model prokaryotes
Proteins (2005) 60: 27-35
6. G Fang, EPC Rocha, A Danchin
How essential are non-essential genes?
Mol Biol Evol (2005) 22: 2147-2156
7. J Thornton, A Tramontano, U Roma, A
Valencia, S Brunak, S Antonarakis, P Bork, R Casadio, A Danchin, R
Durbin, D Frishman, R Guigo, G von Heijne, J van Helden, T Lengauer,
M Linial, R Mott, C Orengo, D Jones, L Patthy, L Rychlewski, V
Schachter, D Schomburg, E Ukkonen, AL Veuthey, M Vingron, G Vriend,
K Nyberg
BioSapiens: a European network for integrated genome annotation
Eur J Hum Genet (2005) 13: 994-997
8. * C Médigue, E Krin, G Pascal, V Barbe, A Bernsel, PN Bertin, F Cheung,
S Cruveiller, S D'Amico, A Duilio, G Fang, G Feller, C Ho, S Mangenot,
G Marino, J Nilsson, E Parrilli, EPC Rocha, Z Rouy, A Sekowska, ML
Tutino, D Vallenet, G von Heijne, A Danchin
Coping with cold: the genome of the versatile marine Antarctica
bacterium Pseudoalteromonas haloplanktis TAC125
Genome Res (2005) 15: 1325-1335
9. * U Mechold, V Ogryzko, S Ngo, A Danchin
Oligoribonuclease is a common downstream target of
lithium-induced pAp accumulation in Escherichia coli and
human cells
Nucleic Acids Res (2006) 34: 2364-2373
10. G Pascal, C Médigue, A Danchin
Persistent biases in the amino-acid composition of prokaryotic
proteins
Bioessays (2006) 28: 726-738
11. Y Makita, MJL de Hoon, A Danchin
Hon-yaku: a biology-driven Bayesian methodology for identifying
translation initiation sites in prokaryotes
BMC Bioinformatics (2007) 8: 47
12. * U Mechold, G Fang, S Ngo, V Ogryzko, A Danchin
YtqI from Bacillus subtilis has both oligoribonuclease and
pAp-phosphatase activity
Nucleic Acids Res (2007) 35: 4552-4561
13. A Danchin
Natural selection and immortality
Biogerontology (2009) 10: 503-516
14. G Fang, EP Rocha, A Danchin
Persistence drives gene clustering in bacterial genomes
BMC Genomics (2008) 9: 4
15. * V Barbe, S Cruveiller, F Kunst, P Lenoble, G Meurice, A Sekowska,
D Vallenet, TZ Wang, I Moszer, C Médigue, A Danchin
From a consortium sequence to a unified sequence: The Bacillus
subtilis 168 reference genome a decade later
Microbiology (2009) 155: 1758-1775
16. A Danchin
Bacteria as computers making computers
FEMS Microbiol Rev (2009) 33: 3-26
17. A Danchin
A phylogenetic view of bacterial ribonucleases
Prog Nucleic Acid Res Mol Biol (2009) 85: 1-41
Book Chapter
C Médigue, A Danchin
Annotating bacterial genomes (Chapter 4.2)
In: Modern Genome Annotation. The BioSapiens Network (D Frishman, A
Valencia, Editors), Springer: Wien NewYork, NY (2009) pp 165-190