Genetics of Bacterial Genomes

History of the Genetics of Bacterial Genomes Unit
(formerly, Regulation of Gene Expression Unit 1986-2000)







Research developed at the Genetics of Bacterial Genomes Unit is based on the idea that the unit of life, the cell, even when it is a bacterium, is not a tiny test tube in which genes are expressed simply as mutual competitors (as selfish genes, in particular, would be), but that there exists a coordinated set of interactions between the genes, their products, the architecture of the cell and the dynamics of the cell's processes. The main bacteria studied as references in the Unit are Escherichia coli K12 and Bacillus subtilis 168. They were chosen because they are the best-known unicellular organisms and, of course, because of the historical idiosyncrasies of the training of the director of the Unit (who was trained first as a pure mathematician and as a physicist before he became a geneticist). When other bacteria are studied, they are conveniently related to these models. The main contributions of the work are summarized here.

Briefly, A. Danchin began his life as a biologist with a study of the structure and dynamics of transfer RNA molecules, at a time when, following the hypothesis of Francis Crick, these "adaptors" were considered to be extremely rigid molecules. To this end, he used magnetic resonance techniques (NMR and ESR), with the manganous ion as a probe. This work was performed at the Institut de Biologie Physico-chimique and at the Ecole Polytechnique. It brought about conceptual results in physical chemistry, showing that the main contribution of the manganese ion to the relaxation of water protons is due to relaxation of the spin of its d electron shell and not, as previously assumed, to movements of the complex or to exchange of the water molecule of the coordination shell. In biochemistry, it showed mainly that the tRNA molecule is made of two halves that move with respect to each other. During his post-doctoral stay at the Institut Pasteur in H. Buc's laboratory, A. Danchin developed a technique for affinity labelling of the ion binding sites of macromolecules with covalent analogs of magnesium (d6 low spin and d3 transition metal ions). He first applied this affinity labelling technique to nucleic acids, then to the AMP binding site of glycogen phosphorylase b as well as to the ATP binding site of myosin. An interesting observation pertaining to muscle contraction was that changes at the catalytic (ATP hydrolysis) site of the myosin molecule are sensed more than 40 nanometers away, at the hinge of the coiled-coil helix of the myosin tail. Perhaps because it is unusual to consider mineral chemistry as important in biology, this technique has not been followed up, in spite of its interest for biochemistry as well as for the pharmaceutical industry (for its potential in the development of entirely new drugs). This work is summarized in a list of publications.

The central theme of the work developed at the Regulation of Gene Expression Unit, created at the Pasteur Institute in 1986, was the identification of the regulatory processes that involve small metabolic molecules and allow coordination of gene expression, both in Escherichia coli and in Bacillus subtilis (see Publications 1986-1999). As stated above, the underlying hypothesis was that a genome is not a collection of genes competing with each other but that there is generally, if not always, cooperation between genes to allow for a harmonious development of the cell processes in an ever-changing environment. At that time the research undertaken did not make any particular assumption about the causes of the interactions between genes. In mid-2006, work at the Unit that succeeded the former Unit proposed that, for a cell, the very fact of existing tends to group a certain category of genes together. Briefly, genes that make an important contribution to survival will avoid being uniformly distributed in genomes and will instead cluster together.

First results (1976-1978) demonstrated a special effect of serine and of one-carbon metabolism, coupled to isoleucine biosynthesis (fully understood only thirty years later, but not yet published), on the coordination between transcription and translation in E. coli. This was followed by the identification of several molecules involved in the control of the corresponding processes, such as ppGpp, 2-ketobutyrate and cyclic AMP. Several control complexes were found to be involved in the serine effect, in particular 3',5'-cyclic AMP and its receptor CAP, and the protein H-NS. Alpha-ketobutyrate was shown to play a role as an integrative signal, acting (perhaps indirectly) on the phosphoenolpyruvate-dependent carbohydrate transport system (PTS). This molecule is important during the shift from anaerobic growth conditions to growth in the presence of oxygen. It could therefore be one of the mediators of the "Pasteur effect". The pathways in which it is involved (apart, of course, from isoleucine biosynthesis) are not yet fully identified, although it appears that they must be linked to pyruvate dehydrogenase and/or its cofactors, thiamine and lipoic acid. In collaboration with A. Ullmann (until 1996), the work at the Regulation of Gene Expression Unit demonstrated that cyclic AMP controls not only the start of transcription in certain operons (and is therefore one of the mediators of the glucose-mediated repression of transcription discovered by Jacques Monod) but also transcription termination. This permitted the discovery of a previously unknown step in the coupling between transcription and translation: the modulation of premature intercistronic transcription termination by the availability of one-carbon residues carried by tetrahydrofolate (through formylation of the methionine residue carried by Met-tRNAMetF).

This is, in part, what led scientists in the Unit to try to isolate the cya genes (as well as core pts genes), first in E. coli and subsequently from many other organisms (including the TSM085 gene from a eukaryote, the yeast Saccharomyces cerevisiae). This made it possible to understand some of the structure and regulation of these genes and enzymes, and to develop their comparative analysis. In particular, in enterobacteria and related organisms, it became clear that several signals, such as the promoter itself and the region encompassing the unusual translation start site, were highly conserved, while the length and sequence of the leader mRNA molecule were not. In general it was found that adenylyl cyclases were large or very large proteins, comprising at least two domains: a catalytic domain coupled to a regulatory domain, often of unknown function. Five totally different classes of such enzymes have been identified so far (the fourth class was discovered in the Unit and published in 1998, and a further class was found later by another laboratory). This raises an interesting question about the nature of this presumably convergent evolution, which would have led life to discover cAMP several times. In enterobacteria and related families such as Vibrio sp., Pasteurella sp., or Aeromonas sp., adenylyl cyclases are composed of two domains, the amino-terminal domain harbouring the catalytic site. Toxic adenylyl cyclases make up the second class of such enzymes. They have been isolated from two unrelated pathogens, Bordetella pertussis (the etiological agent of whooping cough) and Bacillus anthracis (the agent of anthrax). Both cyclases are activated by the host calmodulin. Their genes have been cloned and expressed in E. coli (in collaboration with A. Ullmann and M. Mock, respectively) by an original technique that uses E. coli cells expressing one of the components of the activation system to screen a library of the pathogen's genome.
Accordingly, this technique can be considered as the first example of the "double hybrid" technique: it has for example been used to clone the cDNA for calmodulin from human or mouse cell lines, by functional complementation of the B. pertussis adenylyl cyclase gene. At that time this triggered an ethical reflection on the use of these toxins by military powers, but with no significant reaction from our colleagues. These toxic enzymes form a group that can be easily identified and that has, as yet, no other counterparts except in other Bordetella species. They are extremely active enzymes, and the fine detail of their activation and catalytic activity has been studied in collaboration with O. Barzu and colleagues. The study of the secretion of the pertussis cyclase has been particularly revealing. The toxic cyclase is fused to a protein phylogenetically related to the E. coli hemolysins, which allows it to be transported and secreted into the medium (and the host cell) by the complex type I secretion system. For this reason the hybrid adenylyl cyclase-hemolysin protein was named cyclolysin. A third class of adenylyl cyclases is ubiquitous. It can be found both in Eukarya and in Bacteria. Its first instances were isolated from yeast and from very distant bacteria, Rhizobium meliloti and Brevibacterium liquefaciens (collaboration with F. O'Gara, Ireland, and E. Peters, UK), then from the differentiated bacteria Stigmatella aurantiaca and Streptomyces coelicolor. This class, which comprises the adenylyl and guanylyl cyclases of the higher eukaryotes, predates the separation of Bacteria from Eukarya, and this raises very interesting questions about evolution. These questions were among the reasons that led the Unit to get involved in the fascinating adventure of sequencing the complete genome of a living organism, as will be seen later on. Using a genetic screening procedure, it became possible to make the catalytic domain of the R. meliloti enzyme evolve from its natural substrate, ATP, to a new one, GTP. This permitted the identification of a variety of structural features of the active site of the enzyme. Finally, a fourth, enigmatic class of adenylyl cyclases was discovered in Aeromonas hydrophila. It has a high temperature and pH optimum and it seems to be related to unknown gene products found in Archaea. It was also found (unpublished) to be present in Yersinia pestis. This discovery raised the question of convergent evolution in the synthesis of cyclic AMP, since this protein has been found to be highly similar to thiamine diphosphate kinase, a novel enzyme discovered in the central nervous system of vertebrates that seems to be involved in a new process of protein phosphorylation.

Early work by P. Lejeune discovered that hns mutants, in contrast to cya mutants, which were resistant, were highly sensitive to serine. The scientists in the Unit developed a thorough study of its product, H-NS, and discovered several new puzzling properties. First of all, it appeared that hns strains are mutators, but in a very unusual way: they produce only deletions, sometimes large ones. They also discovered that, in addition to being a negative regulator of gene expression (as in the involvement of hns in bacterial virulence), H-NS could act as a positive effector of gene expression. This led the scientists of the Unit to study the global effects of H-NS (especially with the two-dimensional gel electrophoresis technique) and to investigate its action on specific regulons. In parallel, a phylogenetic study was initiated to gain insight into the action of the cognate gene in other organisms. The general effects of H-NS now need to be re-investigated, in particular in relation to the idea that the chromosome of bacteria is a highly organised structure, again making use of its involvement in the serine effect. Motility and/or flagellum biosynthesis have frequently been associated with virulence in various microorganisms. In enterobacteria, this process requires the expression of numerous genes scattered on the chromosome and organised in an ordered cascade. The fliC mRNA coding for flagellin and the FliC protein itself are absent in an hns mutant, which results in a loss of motility. Moreover, using transcriptional fusions, it was shown that an hns mutation results in decreased expression of flhDC, the master operon that controls all other flagellar genes. This was the first example of positive control by H-NS to be described.
To find out whether a similar mechanism of flhDC regulation could be extrapolated to other organisms, the promoter region of a homologous operon identified in Photorhabdus luminescens was analysed, using a method allowing direct determination of the nucleotide sequence from genomic DNA. This demonstrated the presence of a cAMP/CAP binding site and of a non-translated region and suggested that, in this organism, the mechanism of flhDC regulation could be similar to that in E. coli.

The pleiotropic effect of the hns mutation led the H-NS group to analyse the role of H-NS in bacterial physiology using large-scale expression profiling. In collaboration with J.P. LeCaer (Laboratoire de Neurobiologie et Diversité Cellulaire, ESPCI, Paris), it was demonstrated on two-dimensional gels that the synthesis and/or the accumulation of about 60 proteins was specifically altered in an hns mutant. Many of them were identified by microsequencing or by mass spectrometry. They were found to be involved in the bacterial response to various stresses (pH, osmolarity, ...). Moreover, to study the global effect of H-NS on gene expression in E. coli, we analysed, in collaboration with A. Malpertuy (Unité de Génétique Moléculaire des Levures), the transcriptome of an hns strain using DNA arrays. These experiments showed that the expression level of 200 genes was modified in the mutant strain. Again, most of them are known to be involved in stress response. In particular, the high expression level of several genes induced by high osmolarity or low pH resulted in a strongly increased resistance to both stresses in the hns strain. Moreover, many H-NS target genes of unknown function were predicted to encode fimbriae, which could play a major role in virulence processes. These observations provide evidence that an hns mutation cannot be considered simply as a loss of function but can provide a selective advantage to the cell under some stressful conditions. The main conclusion stemming from this work is that hns controls proton availability in the periplasm of many Gram-negative bacteria. Until recently, H-NS had been characterised only in enterobacteria. A phylogenetic study was undertaken, allowing characterisation of the general features of the H-NS protein. This work provided the first demonstration that proteins structurally and functionally related to H-NS are widespread in Gram-negative bacteria.
Moreover, by complementation of the serine susceptibility of hns mutants in E. coli, we recently isolated and characterised an hns-like gene in Vibrio cholerae, the agent of cholera. Similarly, in collaboration with P. Glaser (Laboratoire de Génomique des Microorganismes Pathogènes), we identified two H-NS-like proteins in P. luminescens, an entomopathogenic bacterium whose genome sequencing is currently in progress at the Pasteur Institute. These results further support the existence of a large family of H-NS-like proteins in microorganisms. Future work will explore the functions associated with H-NS, when it exists, in psychrophilic bacteria.

The general study of the collective regulation of the expression of groups of genes led A. Danchin, in 1986, to investigate the feasibility of sequencing whole bacterial genomes, so that knowledge of the sequence could be used to analyse globally the self-consistency of gene organisation and functions in a whole genome. It appeared immediately that a project of such importance would require a no less important investment in computer science approaches. In particular it appeared necessary to develop a trend in computer science that did not yet exist on a large scale, and that would be needed to deal with the acquisition, analysis and management of the large quantity of data generated by the genome programs (in silico analysis). A similar interest was expressed a year later by Raymond Dedonder (then Director of the Institut Pasteur de Paris), and this led, in a collaborative effort, to the creation of the Bacillus subtilis genome program. Between 1988 and 1997, under Philippe Glaser (recruited to the Unit for this endeavour), a partially automated laboratory was organised in the Unit, where a large segment of the B. subtilis chromosome DNA was sequenced and annotated. One should note here that cloning DNA from A+T-rich Gram-positive bacteria for sequencing is not a trivial task, and it required the construction of special E. coli strains. The general program was finally set up as a collaborative effort, with F. Kunst as coordinator for Europe and N. Ogasawara and H. Yoshikawa as coordinators for Japan. The scientific coordination of annotation and management of the genome data was performed by the Unit, and in particular by I. Moszer, who constructed the reference database, SubtiList, regularly maintained and curated in the Unit.

In parallel, in collaboration with Alain Hénaut, then at the University of Versailles Saint-Quentin, and Alain Viari, then at the University Paris VI (Atelier de BioInformatique), and their colleagues, the Unit created a research consortium aimed at developing research in computer science devoted to the study of genomes. This collaboration materialised with the creation, among other structures, of a Research Group of the Centre National de la Recherche Scientifique, the GDR 1029 (Groupement de Recherche, Génomes et Informatique), grouping together some fifty scientists. The GDR 1029 was headed by A. Danchin and F. Rechenmann (INRIA, Grenoble) from the beginning of 1992 until the end of 1995. This structure made it possible to carry out studies that validated some of the concepts of Artificial Intelligence in the analysis of nucleic acid and protein sequences, to develop new methods for genome analysis and to create a general platform for genome annotation, ImaGene™. Among the studies inspired by approaches in AI, an interesting result was obtained for the generation of descriptors of secretion signal peptides in E. coli. The ImaGene platform was used to predict errors in the genome sequence of B. subtilis, and the sequencing consortium resequenced dubious regions, resulting in a high-quality corrected sequence. In another study, the kinship between the biosynthesis of tryptophan and that of cysteine indicated that metabolic pathways (metabolism reconstruction) should be taken into account in the applications of computer science to genomics. This is a strong incentive for a large investment of time and effort in the annotation of genomes and the management of the associated biological knowledge. In particular it soon appeared that about one half of the genes uncovered in the genome programs did not have a counterpart of known function in data libraries. To illustrate this point, a thorough analysis of sulfur and polyamine metabolism in E. coli and B. subtilis has been undertaken in the Unit, resulting in the identification of wrong assignments in data libraries. This demonstrated that a much larger fraction of genetics than previously thought has still to be explored. It also demonstrates that in silico analysis should always be coupled to in vivo and in vitro experiments to validate the predictions made in silico.

Finally, an important part of the activity of the director of the Unit is in epistemology and the history of science (including ethics and the spreading of scientific concepts). It should be stressed at this point that, because these aspects of human knowledge are deeply rooted in each type of civilisation, an important fraction of their deep meaning can only be expressed in languages other than English (especially in its americanised version). The interested reader is therefore invited to look into the summary of this activity (in French...: cf. Antoine Danchin). This work has been summarised in many publications (mostly in French and in Italian) and in four books. The first (Ordre et Dynamique du Vivant, 1978) dealt with the basic concepts of molecular biology. The second, L'Oeuf et la Poule (The Egg and the Hen), 1983, was centered on the concept of the genetic code. A third book presented a general view of the main theories about the origin of life, trying to relate the past with the present-day knowledge about metabolism that is accumulating with the advent of genome programs (Une Aurore de Pierres; see the concept of homeotopy for a summary). A fourth book, The Delphic Boat (2003, Harvard University Press), endeavours to reinvestigate the concept of information and to link it with the abstract meaning of genomes. Among the main conclusions of this work is the observation that living organisms can be both deterministic and unpredictable. The alphabetic metaphor that underlies the conceptual description of what genomes are allows us to understand this apparently paradoxical behaviour of life and gives us a positive view of evolution. A. Danchin also tries to develop certain aspects of the Greek philosophical tradition, which insists more on the quest for explanations, in contrast to Anglo-Saxon radical empiricism, which insists more on the collection of data.
The adventure of the creation of the HKU-Pasteur Research Centre demonstrated that the Chinese way is another approach, complementary both to the Greco-Latin hypothesis-driven exploration of reality and to the Anglo-American data-driven approach, which provides the wealth of data that we will keep mining for decades.