百灵唱了,春天来了。
Larks sing, spring arrives
Marmots screech, orchids bloom
Grey storks bawl, the rain broods
Young wolves howl, the moon rises

Mongol song transcribed by
姜戎  (呂嘉民)
Jiang Rong (Lü Jiamin)

Bacterial genome annotation

The European Union supports research via grants that permit the development of research activities associating several European partners, according to the principle of subsidiarity. The BioSapiens programme was meant to improve the annotation of genomes, in particular that of the human genome. The bulk of the activity was devoted to in silico research, but experimental validation was required in some cases. Furthermore, animals are the result of symbiotic associations with bacteria, which began very early in the development of life and gave rise to the cell's energy factories, the mitochondria; it was therefore essential to annotate bacterial genomes as well. The effort presented here corresponds to grant LSHG CT-2003-503265, meant to improve bacterial genome annotation via experimental validation.

This page aims to inform the general public (in particular citizens of the European countries that steer the European Union) about the final developments of this research.

The life of the BioSapiens network spanned slightly more than five years, at a time of huge changes in genomics. Our specific work was meant to provide and validate annotation profiles of bacterial genomes, with the genome of Mycoplasma pneumoniae as the final end point. In parallel, emphasis was to be placed on the identification and annotation of unexpected metabolic pathways.

While the programme did not explicitly involve phylogenetic studies, it appeared to us (in a way this is a trivial statement) that, evolution being central to understanding life, this feature had to be implemented somehow, at least in bacterial genome annotations. Indeed, most extant annotations are derived by inference from annotations obtained experimentally (whatever that means) from model organisms. We therefore set up a general in silico comparison of bacterial genomes to find out whether ubiquitous functions could be identified, and to see whether the corresponding genes displayed a particular organisation.

Our rationale was not the standard approach. We did not look for the elusive minimal genome. Indeed, because there is no one-to-one relationship between sequence, structure and function, we could not use a simple overlap of orthologs shared by different genomes to identify those functions. This led us to create the concept of gene persistence, as a means to trace back at least some of the ubiquitous functions [6].

This effort led us to show that the genome is divided into two major parts: a set of persistent genes that we named the paleome, because the spread of its organisation within bacterial genomes reflects what could be a scenario of the origin of life, and an unlimited number of genes coding for functions permitting life in context, that we named the cenome (after the word biocenose, created by Karl Möbius in 1877 to define all members of a particular ecological niche). These two sets are separated by a large twilight zone, with genes involved in remarkable metabolic pathways that are fairly widespread, but not present in the majority of bacterial clades. This twilight zone, which we could name the (proto)mixome, corresponds to pathways that are often consistently distributed in particular clades, and specific to the clade.
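As a rough illustration of gene persistence and of the split described above, here is a minimal Python sketch. It assumes, for illustration only, that persistence can be approximated as the fraction of genomes in which a gene family has an ortholog. The family names, the genome labels and the two thresholds (0.8 and 0.2) are hypothetical; they are not values taken from the work cited in [6].

```python
from typing import Dict, Set


def persistence(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, float]:
    """Return, for each gene family, the fraction of genomes carrying an ortholog."""
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    return {family: len(genomes) / n_genomes for family, genomes in presence.items()}


def classify(score: float, high: float = 0.8, low: float = 0.2) -> str:
    """Assign a family to one of three sets using illustrative (assumed) thresholds."""
    if score >= high:
        return "paleome"
    if score <= low:
        return "cenome"
    return "twilight zone"


# Toy presence table: hypothetical gene families mapped to the genomes where they occur.
presence = {
    "family_1": {"g1", "g2", "g3", "g4", "g5"},  # found everywhere
    "family_2": {"g1", "g2", "g3"},              # found in part of the genomes
    "family_3": {"g4"},                          # found in a single genome
}

scores = persistence(presence, n_genomes=5)
labels = {family: classify(score) for family, score in scores.items()}

for family in sorted(labels):
    print(family, round(scores[family], 2), labels[family])
```

A real analysis would also have to deal with the difficulty raised in the text, namely that sequence similarity alone does not establish a shared function, so the presence table itself is the hard part.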


Our work was therefore divided into three parts:

1/ Experimental annotation of unexpected metabolic features

The explicit separation between the cell soma and the genetic program suggests that living organisms can be seen as computers-making-computers [3, 16]. This permitted us to distinguish between a fairly conserved network of functions and functions that permit cells to live in context. In the former category, the paleome, we tried to apply a type of reasoning common in engineering: if we wish to construct a living machine, what are the essential functions that we should not omit? This approach requires investigation of all central biochemical activities in fine detail. Among those, RNA degradation is essential [17]. RNases are of two major types, endonucleases and exonucleases. The former often have sequence or structure specificity; the latter are usually processive enzymes. This has an unwanted consequence: starting with fairly large pieces of RNA, which need to be somehow bound to the enzyme, there comes a moment when a small leftover, usually shorter than 5 nucleotides, has to be hydrolysed or phosphorolysed. Its binding to the enzyme becomes quite weak, and the result is that a great many very short RNA oligonucleotides ("nanoRNAs") begin to flood the cell cytoplasm. Being short, they can enter transcription and replication bubbles and interfere with these essential processes. We inferred, therefore, that the nanoRNA degradation function should be ubiquitous. We showed that the function is coded by orthologs in Escherichia coli and in humans, and that the saying "what is true for E. coli is true for the elephant" can be reversed, in that the E. coli enzyme, exactly like its human counterpart, is extremely sensitive to lithium. The enzyme in question is Orn in E. coli and REXO2 (SFN) in humans [9].
Yet we did not find a counterpart of orn in A+T-rich Firmicutes, nor in alpha-proteobacteria. This triggered a search for the corresponding enzymes, as the function, if our line of reasoning is correct, should be ubiquitous. In brief, we uncovered in Bacillus subtilis (and in other Gram-positive bacteria, including Mycoplasma pneumoniae) a gene, ytqI, which, when expressed in E. coli, could complement a defect in orn. We further showed that this enzyme also had an activity as a 3',5'-adenosine bisphosphate phosphatase, coupling sulfur metabolism to RNA degradation via completely different phylogenetic pathways, showing convergent evolution of this interesting connection [12]. Work in progress in alpha-proteobacteria (the work will be completed after the closing of the BioSapiens network) shows yet another structure responsible for the same activity!

2/ A metabolic cycle in the twilight zone: acquisitive evolution

In the cenome fraction of all genomes we find a very large number of operons coding for unknown functions. The general features of the corresponding proteome permit us, however, to classify these functions in different cell compartments, according to their amino acid content [5, 10]. Furthermore, many features suggest that the corresponding genes code for enzymes. At the beginning of our involvement in the BioSapiens network we completed our work on the methionine salvage pathway, with extension from B. subtilis to several other bacteria and Homo sapiens [2]. We have uncovered several other pathways involving sulfur which should be explored experimentally in depth. The interaction between RNA degradation, the degradosome and sulfur metabolism is a case in point [17].
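The idea of sorting proteins by amino acid content can be illustrated with a small Python sketch. It only computes the composition of a sequence and one crude summary feature; it is not the statistical procedure used in [5, 10]. The residue set chosen as "hydrophobic" and the function names are assumptions made for this example.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter code
HYDROPHOBIC = set("AILMFVW")          # assumed grouping, for illustration only


def aa_composition(sequence: str) -> Dict[str, float]:
    """Return the relative frequency of each standard amino acid in a protein sequence."""
    counts = Counter(residue for residue in sequence.upper() if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def hydrophobic_fraction(sequence: str) -> float:
    """Return the summed frequency of the residues in the assumed hydrophobic set."""
    composition = aa_composition(sequence)
    return sum(composition[aa] for aa in HYDROPHOBIC)


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKLLVVAAKKDE"
print(aa_composition(toy_sequence))
print(hydrophobic_fraction(toy_sequence))
```

Composition vectors of this kind, computed over a whole proteome, are the sort of input on which a multivariate analysis could separate groups of proteins. A high hydrophobic fraction, for instance, would be one possible hint of a membrane location, though that reading is an assumption here.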

3/ Complete annotation of bacterial genomes

The best way to demonstrate our contribution to genome annotation was to initiate a new genome programme and put into practice our annotation procedure, as initiated in collaboration with Claudine Médigue at the Genoscope in Paris (MaGe platform), while providing the European Bioinformatics Institute with the entirely annotated genome data. Noting that the physico-chemical constraints of cold conditions had not been explored in depth, we sequenced, annotated, and developed some experimental validation of annotation predictions for the genome of the Antarctic psychrophile Pseudoalteromonas haloplanktis TAC125, in collaboration with the BioSapiens team of Gunnar von Heijne [8]. This gave us a further handle to understand the amino acid distribution of the bacterial proteomes [5, 10].
During this work, while annotating the genome, we discovered that some of the annotations and data we extracted from the model organism B. subtilis were inaccurate. We therefore decided to re-sequence and entirely re-annotate the reference genome of strain 168. This led us to discover a significant number of errors in the published sequence, and permitted us to provide the international community, via the EBI and reference databases, with an up-to-date annotated sequence [15].
This work provided us with further support for the split of bacterial genomes into a paleome, permitting life, and a cenome, permitting life in context (occupation of a particular niche: B. subtilis is clearly an epiphyte, with special relationships with the phylloplane of hay plants). Furthermore, Mollicutes are clearly derived from ancestral A+T-rich Firmicutes, and the annotation of the B. subtilis genome can be used as a secure background for the annotation of M. pneumoniae.



All publications (stars indicate experimental work) bearing the BioSapiens logo are displayed on our bibliography pages.
Further work, based on recent experiments, is to be published later.

1. A Danchin
The bag or the spindle: the cell factory at the time of systems biology
Microb Cell Fact (2004) 3: 13
2. * A Sekowska, V Dénervaud, H Ashida, K Michoud, D Haas, A Yokota, A Danchin
Bacterial variations on the methionine salvage pathway
BMC Microbiology (2004) 4: 9
3. A Danchin
Genome diversity: A grammar of microbial genomes
ComPlexUs (2004/2005) 2: 61-70
4. G Fang, C Ho, YW Qiu, V Cubas, Z Yu, C Cabau, F Cheung, I Moszer, A Danchin
Specialized microbial databases for inductive exploration of microbial genome sequences
BMC Genomics (2005) 6: 14
5. G Pascal, C Médigue, A Danchin
Universal biases in protein composition of model prokaryotes
Proteins (2005) 60: 27-35
6. G Fang, EPC Rocha, A Danchin
How essential are non-essential genes?
Mol Biol Evol (2005) 22: 2147-2156
7. J Thornton, A Tramontano, U Roma, A Valencia, S Brunak, S Antonarakis, P Bork, R Casadio, A Danchin, R Durbin, D Frishman, R Guigo, G von Heijne, J van Helden, T Lengauer, M Linial, R Mott, C Orengo, D Jones, L Patthy, L Rychlewski, V Schachter, D Schomburg, E Ukkonen, AL Veuthey, M Vingron, G Vriend, K Nyberg
BioSapiens: a European network for integrated genome annotation
Eur J Hum Genet (2005) 13: 994-997
8. * C Médigue, E Krin, G Pascal, V Barbe, A Bernsel, PN Bertin, F Cheung, S Cruveiller, S D'Amico, A Duilio, G Fang, G Feller, C Ho, S Mangenot, G Marino, J Nilsson, E Parrilli, EPC Rocha, Z Rouy, A Sekowska, ML Tutino, D Vallenet, G von Heijne, A Danchin
Coping with cold: the genome of the versatile marine Antarctica bacterium Pseudoalteromonas haloplanktis TAC125
Genome Res (2005) 15: 1325-1335
9. * U Mechold, V Ogryzko, S Ngo, A Danchin
Oligoribonuclease is a common downstream target of lithium-induced pAp accumulation in Escherichia coli and human cells
Nucleic Acids Res (2006) 34: 2364-2373
10. G Pascal, C Médigue, A Danchin
Persistent biases in the amino-acid composition of prokaryotic proteins
Bioessays (2006) 28: 726-738
11. Y Makita, MJL de Hoon, A Danchin
Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
BMC Bioinformatics (2007) 8: 47
12. * U Mechold, G Fang, S Ngo, V Ogryzko, A Danchin
YtqI from Bacillus subtilis has both oligoribonuclease and pAp-phosphatase activity
Nucleic Acids Res (2007) 35: 4552-4561
13. A Danchin
Natural selection and immortality
Biogerontology (2009) 10: 503-516
14. G Fang, EPC Rocha, A Danchin
Persistence drives gene clustering in bacterial genomes
BMC Genomics (2008) 9: 4
15. * V Barbe, S Cruveiller, F Kunst, P Lenoble, G Meurice, A Sekowska, D Vallenet, TZ Wang, I Moszer, C Médigue, A Danchin
From a consortium sequence to a unified sequence: The Bacillus subtilis 168 reference genome a decade later
Microbiology (2009) 155: 1758-1775
16. A Danchin
Bacteria as computers making computers
FEMS Microbiol Rev (2009) 33: 3-26
17. A Danchin
A phylogenetic view of bacterial ribonucleases
Prog Nucleic Acid Res Mol Biol (2009) 85: 1-41

Book Chapter

C Médigue, A Danchin
Annotating bacterial genomes (Chapter 4.2)
In: Modern Genome Annotation. The BioSapiens Network (D Frishman, A Valencia, Editors), Springer: Wien NewYork, NY (2009) pp 165-190