Summary 1995-2000 of GBG lab activity

In 2000, after 14 years, the Regulation of Gene Expression Unit ended its activity, which had been mostly devoted to the Bacillus subtilis genome programme. In April 2000 the Director of the Unit went to Hong Kong (China) to create the HKU-Pasteur Research Centre Ltd, a joint venture between the University of Hong Kong and the Institut Pasteur. At the end of that year a new Unit was created in Paris: Genetics of Bacterial Genomes, centred on the functional genomics of bacteria of general interest.

Summary of the 1995-2000 activity

Created in 1986, the Regulation of Gene Expression Unit analyses the nature of heredity, which stores the information required to generate life, and focuses on the determination of gene functions in reference genomes, coupling computer-based prediction (experiments in silico) with experiments in vivo. The scientists in the Unit investigate how the thousands of genes in the chromosome of a cell co-operate in an organised manner in an ever-changing environment. Their studies have been guided both by the results of experiments in vivo, which allowed the scientists to identify genes placed higher and higher in the hierarchy of genetic controls of the cell's life, and by the spectacular progress of molecular biology. Two reference micro-organisms are used: Escherichia coli, the longest-standing genetic model; and Bacillus subtilis, a source of numerous enzymes used by industry, often found on the surface of leaves, and abundant in soil. The studies developed in the Unit identify the genes which are critical to the overall adaptation of the bacterium to its environment, and investigate in particular the metabolism of molecules essential for the cell's construction, namely sulfur and polyamines. Two-dimensional gel electrophoresis of all the proteins in the bacteria is used to describe the co-variations in the concentrations of particular proteins as a function of the growth status of the bacteria, their environment and changes in the genes studied. Combined with the analysis of the whole set of transcripts (expression profiling) and with analysis of the genome sequences, typical of the new field of research now called "genomics", this approach provides a wealth of information. It has shown that there are large groups of genes within the cell that are regulated in the same way. A mass of information is generated by sequencing genomes, and many of the newly identified genes are enigmatic in nature.
To contribute to their understanding, molecular genetic studies in the Unit are being complemented by research involving the most up-to-date techniques in computer data management, statistics and mathematics.
Two specialized databases have been constructed in collaboration with the University Paris 6 (Atelier de Bioinformatique) and the University of Versailles; both are available on the World-Wide Web (SubtiList and Indigo). Biological and naturalist aspects of the work are being emphasised, to identify the major functions of living organisms. In particular, the first analyses of the genomes have led to a remarkable observation: the order of genes on the chromosome relates to the cell's architecture. Indeed, the gene order in genomes is not random, and there are experimental hints suggesting that the map of the cell may be directly related to the chromosome structure. The first results of the in vivo, in vitro and in silico investigations aimed at understanding the selection pressure that underlies these architectural constraints suggest the systematic existence of supra-macromolecular complexes. The genes of their components are distributed in a non-uniform way along the chromosome, and the complexes probably constitute structures of 10 to 50 nanometers that form the core of the cell's organization.

A genome view of the coordination of gene expression

1. The Bacillus subtilis genome sequence (P. Glaser, MF Hullo, with several students for variable periods of time)

In 1996, the very short genomes of two bacteria had been published by TIGR, and the yeast genome sequence was about to be completed. Started almost ten years earlier, the B. subtilis genome program was well on its way, and a BIOTECH grant from the European Union was supporting a consortium of European laboratories for completing the sequence, expected to be finished at the end of 1998. The Japanese consortium was also well on the way. However, it appeared important to speed up our efforts in order to be present on the international scene at a moment when many laboratories were becoming interested in the outcome of genome programs. Together with Frank Kunst, the European coordinator of the program, we decided to accelerate the procedure by involving in the sequencing effort laboratories which had been part of the yeast genome program. This was made somewhat difficult because many regions of B. subtilis DNA, as with all A+T-rich Gram-positive bacteria, are impossible to clone in standard E. coli recipients. We therefore combined standard cloning procedures in a special E. coli strain constructed for this purpose (TP611), cloning into B. subtilis itself, and Long Range PCR (without cloning) for the most difficult regions. This allowed us to obtain the complete genome sequence in April 1997, well before the time expected, and to distribute it to the members of the consortium. In addition, we chose to distribute to the yeast teams some regions where we suspected the presence of errors, so that they would be sequenced again, resulting in excellent accuracy (of course, this was not said to the relevant sequencing groups, to avoid useless conflicts inside an exemplary collaborative effort). The complete sequence was presented at the International Bacillus Meeting in Lausanne in mid-July 1997, and the sequence was made public in parallel with its publication in November of that same year.

2. Databases and genome annotation platforms (Maude Klaerr-Blanchard, Claudine Médigue, Ivan Moszer, Eduardo Rocha, in collaboration with Louis Jones at the Service Informatique Scientifique, and several laboratories external to the Institut Pasteur)

Derived from Colibri, its prototype built for the E. coli genome, the relational database SubtiList displays the sequence and its annotation. It still answers several thousand queries per day, more than two years after the sequence was published.
Annotating a genome is a never-ending process. Indeed, SubtiList is regularly updated, and the last update, just after the Genome 2000 International Meeting at the Institut Pasteur in April 2000, provided identification for several hundred new genes. To prevent misannotation and the propagation of errors, we have assigned a special code name to all genes whose function has not been explicitly identified (i.e. experimentally, in vivo or in vitro). In agreement with Amos Bairoch (SwissProt), we chose that these gene names all begin with the letter "y". The code we have used follows Demerec's rule for gene nomenclature as closely as possible, despite much discussion from the community of B. subtilis scientists, who often stick to old names for reasons that are purely anecdotal. We think that harmonizing nomenclature is very important for the future of the genetics of genomes.
Careful annotation called for an elaborate approach in terms of computer science. In collaboration with Alain Hénaut and Jean-Loup Risler from the Université de Versailles Saint-Quentin, François Rechenmann from INRIA Rhône-Alpes and Alain Viari from the Atelier de BioInformatique at the University Paris 6, the Unit created an original strategy allowing genome annotation in silico. In this strategy the concept of "neighbourhood" has been favoured as a way to help discovery.
This strategy developed as a succession of three relatively independent levels. Each level comprised a generic and a specific component. The goal for the creation of the process was conceptual. It aimed at the prediction of essential biological functions using the genomic text, together with the associated biological knowledge distributed in scientific publications and data libraries. The precise goal was to identify crucial experiments (to be performed in "wet" laboratories) able to confirm or to falsify the prediction. The approach was illustrated (see below) in the case of polyamine metabolism.

The three levels of the process were:
o 1. sequence data and annotation management: SubtiList and Colibri
o 2. a platform for sequence annotation: Imagene
o 3. a platform to aid discovery (technique of "neighbourhoods"): Indigo

Each level was made of a generic computer software engine, together with specific data. The aim of the process was to define a set of three coupled software engines. Each specific application of the process yields as many valuable results as there are sequences, annotations or predictions created.
1. SubtiList is constructed from an engine for the management of genomic databases. It is composed of three parts:
1.1. A data scheme structure, GenoList;
1.2. A data base management system (4th Dimension for stand alone applications and Sybase for the WWW database);
1.3. A user interface, possibly with specific procedures for data exploitation (e.g. Blast, Fasta, and other rapid methods for sequence analysis). The interface can be reconstructed knowing simply the World-Wide Web access to it, but this can only be done properly knowing 1.1. This explains why, to our knowledge, there are as yet no equivalent specialized bacterial databases.

To construct a specialized database (SubtiList, Colibri, TubercuList, and recently PyloriGene) it was necessary to introduce sequence data and their annotations into the GenoList engine. This input required a generic procedure. The value of the specialized databases has two sources: on the one hand the genericity of the GenoList engine and its user-friendly, biologically-oriented construction, and on the other hand the quality of the sequences (and, above all, of their annotation). This value increases as time elapses if the annotations are curated, and diminishes if they are not. Curating a set of annotations considerably increases its value, and in parallel creates a know-how that is extremely difficult to reproduce. Our Unit is curating the B. subtilis annotations.

2. Imagene is a generic engine which allows management and strategic organisation of both biological objects (sequences, annotations, images, …) and methods for analysis or management, within the same platform. It is meant to make an in-depth analysis of genomes locally, for a fine description of their properties. It has been validated on the specific example of B. subtilis, by permitting identification of all its coding sequences and regions of transcription termination. It was used to predict regions that carried errors due to the sequencing process. These regions were amplified by PCR from the chromosome and resequenced by an independent team.
2.1. The platform was constructed in such a way as to allow one to plug in easily any method for genome analysis (even methods for which the source code is not available, or methods located far away but available through the Internet);
2.2. It allows the chaining of methods, the definition of strategies, and, if needed, backtracking during the chaining of methods;
2.3. It possesses a generic visual interface (APIC) permitting one to start and control the progress of methods and of their results. APIC allows one to superimpose the results of entirely independent methods on the same screen. It permits direct access to their results.

Two special features give added value to Imagene as time elapses. On the one hand the data can be organised in such a way as to construct efficient specialised strategies for genome annotation. On the other hand, the number of methods for analysis that are plugged in can increase without limitation. If they are accessed through the Internet the engine will know how to start them and recover their results. It will also know how to integrate them into its strategies. As a consequence the rational integration of old and new methods into new strategies will increase with time and use. Among the methods, data management is a special case: it is quite possible to think about plugging specialised databases into Imagene, which will then see them as a special task, data management. One can also think about creating relationships between sequence and annotation data (neighbourhoods, see Indigo, next paragraph).

3. Indigo is a prototype platform used as an aid to discovery, meant to find difficult-to-predict neighbours between the various functions related to genes (P. Nitschké, C. Hénaut, in collaboration with P. Guerdoux-Jamet and A. Hénaut).
Indigo is organized in a simple way around a hierarchy of flat files, all centered around gene names, and corresponding to homogeneous classes of features (such as codon usage bias, proximity in the chromosome, in metabolism, in isoelectric point of the gene products, in functional class, in literature articles, etc). It is clear that many other types of neighbourhoods should be considered as well, including quite elaborate ones. As an immediate goal for the improvement of Indigo one must create a data structure for the neighbourhood relationships. The published prototype is only meant to demonstrate the feasibility of the approach. It illustrates the possibility of making interesting discoveries, even with the limited means allocated at present. Indigo is superficially organised in the same way as GenoList. It possesses an engine (written in Java) that overlaps with the user interface. It is applied to specific data (at the present time E. coli, B. subtilis and Arabidopsis thaliana). One must therefore note that, even more than in the case of specialised databases, the value of a specialised Indigo is directly linked to the quality of the data included. The corresponding information results from annotation steps (statistical analysis for example, that could of course be produced by a strategy included in Imagene), but also from the extraction, at this time manual, of literature neighbours. The creation of appropriate files of this type could rapidly acquire a great value (if they are not publicly available). Finally, because Indigo is a method, it can, in principle, be plugged into Imagene.
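The idea of gene-centred neighbourhoods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Indigo engine (which is written in Java): the table names, the function names and every numerical value are hypothetical, and chromosome positions are treated as linear, ignoring the circularity of the chromosome.

```python
from typing import Dict, List, Sequence

# Hypothetical gene-centred tables, one per feature class (one "flat file" each).
# The gene names exist in B. subtilis, but every value below is invented.
POSITION_KB: Dict[str, float] = {
    "dnaA": 0.4, "dnaN": 1.9, "recF": 3.4, "gyrB": 4.9, "speE": 3850.0,
}
ISOELECTRIC_POINT: Dict[str, float] = {
    "dnaA": 6.1, "dnaN": 4.8, "recF": 6.3, "gyrB": 5.2, "speE": 6.0,
}


def neighbours(gene: str, table: Dict[str, float], radius: float) -> List[str]:
    """Genes whose value in `table` lies within `radius` of the value for `gene`."""
    ref = table[gene]
    return sorted(g for g, v in table.items() if g != gene and abs(v - ref) <= radius)


def shared_neighbours(
    gene: str, tables: Sequence[Dict[str, float]], radii: Sequence[float]
) -> List[str]:
    """Genes that are neighbours of `gene` in every feature class at once."""
    sets = [set(neighbours(gene, t, r)) for t, r in zip(tables, radii)]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    print(neighbours("dnaA", POSITION_KB, 5.0))
    print(shared_neighbours("dnaA", [POSITION_KB, ISOELECTRIC_POINT], [5.0, 0.5]))
```

Intersecting neighbourhoods from independent feature classes is one simple way to surface unexpected associations; a real data structure for neighbourhood relationships would need to handle non-numerical features (literature, metabolism) as well.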
This set of coordinated approaches has been used to set up an international network financed by the European Science Foundation.

3. The map of the cell is in the chromosome (I. Moszer, E. Rocha, in collaboration with A. Hénaut and A. Viari)

Knowledge of whole genome sequences is a unique opportunity to study the relationships between genes and gene products at the global level of the cell's architecture. Part of the difficulty of this study comes from the fact that, contrary to a generally accepted intuitive idea, there is often no predictable link between structure and function in biological objects. However, as the outcome of natural selection pressure, there must exist some fitness between genes, gene products and the survival of the organism. This indicates that a bias observed in a feature that would conceptually be expected to be unbiased is the hallmark of some selection pressure.
This prompted us to study global properties of complete genomes. A first analysis of the word content of genome texts suggested that they are not all managed in the same way. We therefore concentrated on long exact repeats, and discovered that, contrary to what could be expected, the shortest genomes (the Mycoplasmas) had the highest repeat frequency. Also, genomes of comparable sizes such as those of E. coli and B. subtilis manage repeats in entirely different ways: repeats are present everywhere in the former genome, while in the latter they are very rare and in close proximity to each other (ca 10 kb). In contrast, when we studied the distribution of words, bases or codons in the leading strand as compared to the lagging strand, we made an extremely surprising discovery. There is such a strong bias in one strand as compared to the other (the leading strand is G+T-rich, while the lagging strand is A+C-rich), that the bias is reflected in the amino-acid composition of the proteins encoded by each strand (valine-rich for the leading strand, isoleucine+threonine-rich for the lagging strand)! This bias is not present in all genomes (it seems to be absent from genomes of bacteria having an important proportion of membranes, such as the methanogens or the cyanobacteria), but, when present, it is universally the same.
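The strand bias described above can be quantified with a simple composition index. The sketch below is an assumption-laden illustration rather than the analysis actually performed in the Unit: the function names are hypothetical, and the index (G+T minus A+C, divided by the total) is just one convenient way to express a G+T-rich versus A+C-rich strand.

```python
# Complementing a strand swaps G with C and T with A, so a G+T excess on one
# strand is necessarily an A+C excess on the complementary strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gt_skew(seq: str) -> float:
    """(G+T - A-C) / (G+T + A+C); positive for a G+T-rich strand."""
    seq = seq.upper()
    gt = seq.count("G") + seq.count("T")
    ac = seq.count("A") + seq.count("C")
    total = gt + ac
    return (gt - ac) / total if total else 0.0


if __name__ == "__main__":
    toy = "GGTTAC"  # toy sequence, not genomic data
    print(gt_skew(toy), gt_skew(reverse_complement(toy)))
```

Applied to sliding windows along a chromosome, an index of this kind would be expected to change sign where the leading strand switches, but the window size and the interpretation are left to the reader.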
Among other consequences, all these observations tell us that genes do not move as frequently, or as easily, as is often implicitly assumed. There must exist, therefore, constraints on the gene organisation of a chromosome.
Because the genetic code is redundant, coding sequences can be studied by analysing their codon usage. If there were no bias, all codons for a given amino-acid should be used more or less equally. In contrast, it has long been observed in E. coli that genes could be split into three classes according to the way they use codons. The same was true for B. subtilis. Yet, random mutations should somehow smooth out differences. This is not the case: indeed, for leucine, which is encoded by six codons, we find that the CUG codon is used in more than 70% of the cases in genes that are expressed at a high level during exponential growth conditions, while CUA is used in less than 2% of the cases. What is the source of such biases? There might exist a systematic effect of context, some DNA sequences being favoured or selected against. While this could be true for some codons, this cannot be generalized. We know that translation of mRNA into proteins requires the action of transfer RNA adaptor molecules. Because there are fewer tRNAs specific for a given amino-acid than there are codons, some tRNAs must read several codons. A bias in the concentration of tRNAs might thus result in a bias in codon usage. Therefore we must analyse selection pressure occurring at the level of tRNA synthesis. This is the generally accepted explanation for codon usage biases. Unfortunately, two reasons go against this interpretation. Firstly, in much the same way as there would be every reason to smooth out biases in codon usage, similar constraints would smooth out biases in tRNA synthesis. For example if a tRNA gene had a strong promoter, spontaneous mutations would tend to lower its efficiency, making transcription of this particular tRNA similar to that of its counterparts. This is true, unless there is selection pressure for the converse.
The second reason is that, while the strong bias in a given class of genes could be explained in this way, the same explanation cannot hold for a strong bias in another class of genes. However we know, both from the study of the E. coli and B. subtilis genomes, that two classes of genes display extremely strong, but different biases. And the same tRNA molecule cannot be both expressed at a high level, and not expressed at a high level…
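The kind of measurement behind figures such as "CUG in more than 70% of the cases" can be sketched as follows. This is a minimal illustration, assuming an in-frame coding sequence given as DNA or RNA text; the function name and the toy sequence are hypothetical, and only the six leucine codons of the standard genetic code are taken as given.

```python
from collections import Counter
from typing import Dict, Sequence

# The six leucine codons of the standard genetic code, written as DNA
# (CUU, CUC, CUA, CUG, UUA and UUG in the mRNA).
LEUCINE_CODONS = ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG")


def synonymous_usage(cds: str, codons: Sequence[str] = LEUCINE_CODONS) -> Dict[str, float]:
    """Fraction of each synonymous codon among all occurrences of the family.

    `cds` is assumed to be an in-frame coding sequence; a trailing partial
    codon, if any, is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in codons)
    return {c: (counts[c] / total if total else 0.0) for c in codons}


if __name__ == "__main__":
    toy_cds = "ATGCTGCTGCTGTTACTGTAA"  # invented sequence, for illustration only
    for codon, fraction in synonymous_usage(toy_cds).items():
        print(codon, round(fraction, 2))
```

Computing such fractions separately for each class of genes is what reveals that different classes display strong but different biases.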
This requires looking for another explanation. The cytoplasm of a cell is not a tiny test tube. One of the most puzzling features of the organisation of the cell cytoplasm is that it must accommodate the presence of a very long thread molecule, DNA, and that this molecule must be transcribed as a multitude of RNA threads that usually have a length of the same order of magnitude as the length of the whole cell. This calls for some organisation of transcription, translation and replication so that mRNA molecules and DNA are not mixed up together all the time. The volume occupied by a ribosome is that of a cube with a 200 Å edge. In an E. coli cell growing exponentially in a rich medium there are at least 15,000 ribosomes. Thus, the fraction of the cell volume occupied by ribosomes is at least 12 %. The actual volume of the cell free of ribosomes is in fact significantly smaller if one takes into account the volume occupied by the chromosome and by the transcription and the replication machineries. If one now considers that the translation machinery requires an appropriate pool of elongation factors, tRNA synthetases and tRNAs, it becomes clear that the cytoplasm behaves like a gel. In addition, simply counting the number of tRNA molecules sitting around a ribosome, it appears that one cannot speak about the concentration of such molecules, but only about a small, finite number. Compartmentalisation has been demonstrated to be important even for small molecules, despite the fact that they could diffuse quickly. As a consequence, a translating ribosome acts as an attractor of a certain pool of tRNA molecules. In such a case diffusion should only be considered locally. The cytoplasm therefore becomes a ribosome lattice, displaying relatively slow movements with respect to local diffusion of small molecules as well as macromolecules.
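The 12 % estimate can be checked with a short calculation. The text gives the ribosome count and the cube edge but not the cell volume; the value of about 1 cubic micrometre used below is an assumption chosen because it is the volume for which the stated figures yield 12 %. The function and constant names are hypothetical.

```python
ANGSTROM_PER_MICROMETRE = 1e4  # 1 micrometre = 10,000 angstroms


def ribosome_volume_fraction(
    n_ribosomes: int, edge_angstrom: float, cell_volume_um3: float
) -> float:
    """Fraction of the cell volume filled by ribosomes modelled as cubes."""
    edge_um = edge_angstrom / ANGSTROM_PER_MICROMETRE
    return n_ribosomes * edge_um ** 3 / cell_volume_um3


if __name__ == "__main__":
    # 15,000 ribosomes, 200 angstrom edge (both from the text);
    # cell volume of 1 cubic micrometre is an assumption.
    fraction = ribosome_volume_fraction(15_000, 200.0, 1.0)
    print(f"{fraction:.0%}")
```

Each cube occupies 8 x 10^-6 cubic micrometres, so 15,000 of them occupy 0.12 cubic micrometres; a smaller assumed cell volume would make the fraction correspondingly larger.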
This provides an efficient selection pressure leading to adaptation of the codon usage of the translated message as a function of its position in the cell's cytoplasm. If the codon usage changes from mRNA to mRNA, this indicates that these different molecules do not see the same ribosomes in the usual life cycle of the organism. In particular if two genes have very different codon usage this indicates that the corresponding mRNAs are not made from the same part of the cell (it is indeed difficult to see how ribosomes sitting next to each other could attract different tRNA molecules).
Several models of transcription account for a process where the transcribed regions are present at the surface of the chromoid, so that RNA polymerase does not have to circle the double helix it is unwinding and transcribing. Thus mRNA threads, usually structured at their 5' end, are pulled off DNA by the lattice of ribosomes, going from one ribosome to the next one, as does a thread in a wiredrawing machine (this is exactly the opposite of the textbook view of translation, where ribosomes are supposed to travel along fixed mRNA molecules). In this process a nascent protein is synthesized on each ribosome, spread throughout the cytoplasm by the linear diffusion of the mRNA molecule from one ribosome to the next one, avoiding the requirement for the much slower 3D diffusion of the protein. Polycistronic operons ensure that proteins with related functions are co-expressed locally, permitting channelling of the corresponding substrates and products. It seems likely that the structure of mRNA molecules is coupled to their fate in the cell, and to their function in compartmentalisation. The fate of mRNA is therefore an important feature of gene regulation. We have therefore investigated the degradation process of mRNAs, comparing data extracted from the genomes of B. subtilis and E. coli. This led us to identify a main function of the elusive enzyme polynucleotide phosphorylase, as producing CDP needed for DNA synthesis, thus coupling translation, transcription and replication together. If we consider genes translated sequentially in operons as physiologically and structurally relevant, we should also analyse mRNAs that are translated parallel to each other. Indeed if there is correlation of function and/or localisation in one dimension, there should also exist a similar constraint in the orthogonal directions. How would this be seen? This is where codon usage comes in again.
Indeed if ribosomes act as attractors of tRNA molecules, this implies a local coupling between these molecules and the codons they can use in the message they read. Obviously, this requires that the same ribosome mostly translates mRNAs having similar codon usage. This has the consequence that as one goes away from a strongly biased ribosome, there is less and less availability of the most biased tRNAs. In turn, there would be selection pressure for a gradient of codon usage bias as one goes away from the most biased messages and ribosomes. Transcripts are nested around central core(s), formed of transcripts for highly biased genes. This fits with what is seen of the general organisation of genes in the chromosome. In particular this agrees with the observation that the distance between E. coli genes oriented in the same direction on the chromosome is positively correlated to the expression level of the downstream gene.
Finally, the chromosomes must separate from each other and migrate into each of the daughter cells. There must exist some kind of repulsive force that pushes DNA strands away from each other. While there are probably gene products involved in this process, ribosome synthesis, in particular from regions near the origin of replication, performs exactly what is needed, by continuously creating new ribosomes. Continuous synthesis of ribosomes in between the replicating forks would also provide a mechanical stress on the bacterial wall in the middle of the cell. Koch has convincingly argued that the bacterial wall is indeed a stress-bearing fabric. If ribosome sources are organisers of the cell, mRNAs for genes highly expressed under exponential growth conditions should be located near the center of these organisers, while other mRNAs should be translated in nested layers, all the way to the ribosomes that are located near the cytoplasmic membrane, and that would be involved in cotranslational membrane protein localisation. Organisation of the genes in the chromosome should therefore show regularities that are linked to this architecture, as we have indeed observed. This gives us strong reasons to propose that genes along the chromosome specify the map of the cell, a kind of celluloculus.

A geneticist's view: master genes and intermediary metabolism

1. Cyclic AMP and adenylate cyclases: the discovery of a fourth cyclase class (M.-P. Coudart-Cavalli, P. Trotot, P. Biville, O. Sismeiro)

Cyclic AMP is a mediator of catabolite repression in bacteria. Curiously, despite the interest in this important process, not much was known about the rather elusive enzymes, adenylate cyclases, which make this molecule from ATP. By 1996, the work in the Unit had already discovered three main classes of these enzymes, which were apparently unrelated phylogenetically. Very remarkably, this work demonstrated that Gram-negative bacteria could differ in the nature of the adenylate cyclase they harboured: enterobacteria had one type, while myxobacteria or rhizobia had another type (a more ancestral form, presumably, since it is phylogenetically similar to the enzymes found in Eukarya). In the course of a screening for adenylate cyclases in bacteria related to enterobacteria, but differing from them, we made the surprising discovery that A. hydrophila harboured a fourth adenylate cyclase type, an enzyme closely related to proteins found in Archaea. This protein was found in all strains of A. hydrophila investigated, but not in other Aeromonas sp. The counterpart of the gene was found in the Y. pestis genome, and shown to express adenylate cyclase activity (unpublished). The reason for this extraordinary variety in adenylate cyclases is not known.

2. Global analysis of the H-NS protein function (P. Bertin, F. Hommais, O. Soutourina, C. Tendeng and several trainees)

To study the global regulation of bacterial metabolism, in particular in pathogenic microorganisms, we used the hns mutation in Escherichia coli as a reference system. Indeed, the H-NS protein is known to be involved in numerous functions in the cell and to affect the expression of genes regulated by environmental factors (temperature, osmolarity, ...). Three main topics have been developed since 1996.
Motility and/or flagellum biosynthesis have been frequently associated with virulence in various microorganisms. In enterobacteria, this process requires the expression of numerous genes scattered on the chromosome and organised in an ordered cascade. The fliC mRNA coding for flagellin and the FliC protein itself are absent in an hns mutant, which results in a loss of motility. Moreover, using transcriptional fusions, we showed that an hns mutation results in a 3-fold decrease in the expression of flhDC, the master operon which controls all other flagellar genes. This was the first example of positive control by H-NS so far described. Similar observations were made in a crp mutant, providing evidence that, like H-NS, the cAMP/CAP complex acts as an activator of flagellar gene expression. To determine whether these regulators could affect flhDC expression by interacting with its promoter, we performed gel shift experiments using purified proteins. The results demonstrated that the flhDC promoter region is preferentially retarded in the presence of H-NS or CAP. Moreover, DNase footprinting experiments allowed us to determine precisely their binding sites on the flhDC regulatory region. In vitro transcription assays were performed in collaboration with S. Rimsky and A. Kolb (Unité de Physico-Chimie des Macromolécules Biologiques). Surprisingly, H-NS seems to repress flhDC transcription while the cAMP/CAP complex activates its expression. Finally, in a crp mutant, motility is restored in the presence of wild-type CAP protein but not in the presence of a protein mutated in region I, which is involved in the interaction with RNA polymerase. This suggests that the cAMP/CAP complex positively regulates flagellum synthesis by a direct interaction with the C-terminal part of the RNA polymerase α subunit. In contrast, the binding of H-NS to the same region cannot explain its positive control observed in vivo on flagellum synthesis.
In this respect, the existence of a long non-coding region between the +1 transcriptional start site and the ATG translational start codon seems to play a crucial role in the control of the master operon by H-NS. Finally, to determine whether a similar mechanism of flhDC regulation could be extrapolated to other organisms, we analysed the promoter region of a homologous operon recently identified in Photorhabdus luminescens, using a method allowing direct determination of the nucleotide sequence from genomic DNA. Our results demonstrated the presence of a cAMP/CAP binding site and of a non-translated region (unpublished observations). This suggests that, in this organism, the mechanism of flhDC regulation could be similar to that in E. coli.
The pleiotropic effect of the hns mutation led us to analyse the role of H-NS in bacterial physiology using large scale technologies. In collaboration with C. Laurent-Winter (Laboratoire de Physico-Chimie des Macromolécules) and J.P. LeCaer (Laboratoire de Neurobiologie et Diversité Cellulaire, ESPCI, Paris), we demonstrated by two-dimensional gel electrophoresis that the synthesis and/or the accumulation of about 60 proteins was specifically altered in an hns mutant. Many of them were identified by microsequencing or by mass spectrometry. They were found to be involved in the bacterial response to various stresses (pH, osmolarity, ...). Moreover, to study the global effect of H-NS on gene expression in E. coli, we analysed, in collaboration with A. Malpertuy (Unité de Génétique Moléculaire des Levures), the transcriptome of an hns strain using DNA arrays. These experiments showed that the expression level of 200 genes was modified in the mutant strain (unpublished). Again, most of them are known to be involved in stress response. In particular, the high expression level of several genes induced by high osmolarity or low pH resulted in a strongly increased resistance to both stresses in the hns strain. Moreover, many H-NS target genes with unknown function were predicted to encode fimbriae, which could play a major role in virulence processes. These observations provide evidence that an hns mutation cannot be simply considered as a loss of function but can provide a selective advantage to the cell with respect to some stressful conditions. Finally, these observations suggest that the main role of hns could be to control the proton availability in the periplasm of many Gram-negative bacteria.
Until recently, H-NS had only been characterised in enterobacteria. In collaboration with S. Goyard (Unité de Biochimie des Régulations Cellulaires), we identified an H-NS-like protein in Bordetella pertussis, the aetiological agent of whooping cough. Its structural gene was isolated and sequenced. Its product showed significant similarity with H-NS, in particular in the C-terminal domain. Moreover, database screening allowed us to identify a related protein in Rhodobacter capsulatus. In silico analysis of their amino-acid sequences (secondary structure prediction, presence of hydrophobic clusters, ...) in collaboration with R. Brasseur (Centre de Biophysique Moléculaire Numérique, Gembloux, Belgium) suggests that these proteins are structurally related. Moreover, amino-acid sequence alignment demonstrated the existence of a consensus in their DNA-binding domain. The structural genes of these proteins were cloned after PCR amplification and the proteins were expressed in an hns strain of E. coli. These experiments showed that all the proteins are able to complement the phenotypic alterations of such a strain (loss of motility, reduction in growth rate, serine susceptibility, ...). Gel retardation experiments performed with purified proteins revealed a preferential binding to curved DNA similar to that of H-NS. Cross-linking experiments showed that, despite low amino-acid conservation in their N-terminal domain, these proteins are able to dimerise in vitro. These observations are the first demonstration that proteins structurally and functionally related to H-NS are widespread in Gram-negative bacteria. Moreover, by complementation of the serine susceptibility of hns mutants in E. coli, we recently isolated and characterised an hns-like gene in Vibrio cholerae, the agent of cholera. Similarly, in collaboration with P. Glaser (Laboratoire de Génomique des Microorganismes Pathogènes), we identified two H-NS-like proteins in P. luminescens, an entomopathogenic bacterium whose genome sequencing is currently in progress at the Pasteur Institute. These results further support the existence of a large family of H-NS-like proteins in microorganisms.

3. Pyrophosphate effects on Escherichia coli: a link with iron metabolism (F. Biville, E. Turlin, M. Perrotte, C.-K. Wun, and several trainees)

In the course of the study of cAMP synthesis in E. coli, the effect of pyrophosphate, a product of the reaction producing cAMP from ATP, was investigated. A first series of experiments demonstrated that, in a phosphate-rich minimal medium, pyrophosphate had a surprising growth-stimulating effect. This effect was accompanied by a significant modification of the expressed proteome pattern of the cells. It could not be due to phosphate starvation, and the first hypothesis that came to mind was that energy from the energy-rich bond of the molecule was somehow recovered by the cell. However, all experiments meant to explore this hypothesis were unsuccessful. In particular, the non-hydrolysable analog methylene diphosphate had an effect similar to that of pyrophosphate. Analysis of the metabolic activities that varied upon pyrophosphate addition suggested that the tricarboxylic acid cycle was somehow involved. Further exploration demonstrated that the pyrophosphate effect is mimicked by addition of excess iron to the medium. This demonstrated first that, even in a medium supplemented with 5 mM iron, there is still some iron deficiency in a phosphate-rich minimal medium, and, second, that the pyrophosphate molecule somehow helps the cell to scavenge existing iron from the environment in a way that permits it to thrive at a low iron level (M. Perrotte thesis). Work in progress demonstrates that a phosphorelay system (two-component regulator) of unknown function is involved in this process. Once unravelled, this will add interesting information on a set of genes of unknown function in the genome of E. coli and will contribute to improving its annotation.

4. Functional analysis of the B. subtilis genome: polyamines and sulfur metabolism (J.-Y. Coppée, P. Glaser, M.-F. Hullo, I. Martin-Verstraete, E. Presecan, A. Sekowska, C.-K. Wun)

Among the aims of functional genome analysis is the possibility to rapidly reconstruct entire metabolic pathways. This cannot be done using in silico analysis alone, because many proteins have a common descent. As a result, related activities often share similar sequences (e.g. a decarboxylase specific for a given amino acid is likely to be similar to its counterpart specific for another amino acid). We have therefore constructed relatively rapid tests on plates with molecules or ions that could help us to trace as efficiently as possible genes involved in integrated metabolic pathways. Amino-acid metabolism is not well described in B. subtilis, and although quite a few gene similarities point to expected enzyme activities, it is necessary to validate the hypotheses derived from these similarities. Using amino-acid analogs or certain types of antibiotics is a way to achieve this goal. In addition, we set up several growth condition tests (in particular for swarming or gliding on plates) to test for more subtle phenotypes (A. Sekowska, thesis dissertation).
In the course of this systematic analysis, we noted the importance of intermediary metabolism activities. In particular, polyamines, although dispensable under routinely used laboratory growth conditions, are extremely important for the cell. They are involved in macromolecular syntheses, and in particular in modulating the accuracy of translation, at steps which may be essential for the survival of cell populations. Their importance is reflected by the fact that their biosynthesis is energetically costly. This is especially true for the larger molecules, such as spermidine, spermine and their analogues. In particular, spermidine synthesis requires S-adenosylmethionine (AdoMet) as a precursor. Surprisingly, AdoMet is not used as such in the reaction but is first decarboxylated to 3-aminopropyl-S-adenosine (dAdoMet). The aminopropyl moiety of the substrate is subsequently transferred onto one of the terminal amino groups of putrescine, to generate spermidine. A further transfer onto spermidine yields spermine in some organisms.
Transamination and decarboxylation are ubiquitous steps in intermediary metabolism. They are generally achieved by enzymes carrying pyridoxal phosphate as a co-enzyme. However, a noteworthy feature of the known AdoMet decarboxylation reaction is that it is achieved by an enzyme carrying not a pyridoxal but a pyruvoyl group as the catalytic residue. Pyruvoyl enzymes perform a limited number of varied decarboxylation reactions, including the decarboxylation of AdoMet in Eukarya and Gram-negative bacteria. Combining gene disruption experiments and biochemical identification of polyamines, we unravelled the main features of polyamine biosynthesis in B. subtilis, showing that the predominant pathway proceeds from arginine via agmatine. We also observed that, in contrast to E. coli, B. subtilis does not maintain a significant intracellular pool of putrescine under conditions where the level of spermidine is similar to that found in E. coli. We further identified the pathway leading to the addition of an N-propylamine group to putrescine, creating spermidine. This reaction yields the sulfur-rich molecule methylthioadenosine (MTA) as a by-product. We identified the nucleosidase encoded by the mtn (yrrU) gene as the first enzyme implicated in its recycling. By gene disruption, in vitro mutagenesis, cell-free protein synthesis and biochemical analysis of polyamines, we showed that the unknown gene ytcF, renamed speD, codes for the decarboxylase. Analysis of the phylogenetic relationships among bacterial enzymes demonstrated that the B. subtilis enzyme is very similar to several predicted proteins of unknown function from Archaea. The MJ0315 gene, which presumably encodes an AdoMet decarboxylase of Methanococcus jannaschii, was used to complement B. subtilis ytcF and E. coli speD mutants and was expressed in a cell-free system. We could thus identify for the first time the nature of the corresponding gene and protein in Archaea.
While the number of genome sequences increases exponentially, it remains difficult to identify gene functions explicitly. Automatic annotation procedures rest mostly on sequence comparisons. They are used to build phylogenetic trees, where reference activities are assumed to spread to neighbours by contiguity. The corresponding functions are thus described tentatively as identical to that of the known reference. However, these methods do not address the central question of enzyme recruitment for new activities. Furthermore, genes and proteins are not simply sequences of letters: they are made from chemicals deriving from the cell metabolism, and a single gene alteration may result in a general base or amino-acid content bias, changing the "style" of an organism, possibly altering its place in calculated phylogenies, and thus leading to wrong assignments of enzyme activities. Ouzounis and Kyrpides constructed an interesting evolutionary tree of agmatinases, with emphasis on their universal presence. Since this seminal work, many new sequences have been obtained and annotated by their similarity with the known sequences. We undertook a comparative analysis of the corresponding set of sequences. Genes that were deemed important were cloned and attempts were made to identify their functions. We first considered the usual types of phylogenetic trees constructed on the variation of the amino-acid sequence in these proteins, without taking into account the presence of gaps in the sequences. Several discrepancies with respect to the expected position of some organisms in the trees were found. In a second approach, we reconstructed trees based only on the presence and evolution of gap-containing regions in the sequences, because gaps should be much less sensitive to genetic drift or to amino-acid metabolism. The crucial enzyme activities that presumably evolved from ancestral ureohydrolases were validated by cloning, expressing and measuring the activity of the corresponding enzymes.
The emerging picture is consistent with a bacterial origin of hydrolases (ureohydrolases and related activities), which later evolved to those of the Archaea and the Eukarya. Our experiments therefore validate the use of gap-trees in the prediction of gene function.
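The idea of comparing sequences only by their gap-containing regions can be illustrated with a minimal Python sketch. It is not the procedure used in the Unit, whose details are not given here: it simply assumes a pre-computed multiple alignment, encodes each sequence as a presence/absence profile of gaps, and computes a pairwise distance that ignores which amino acids are present. The function names, the gap character and the toy alignment are all hypothetical.

```python
from itertools import combinations


def gap_profile(aligned_seq, gap_char="-"):
    """Encode an aligned sequence as 0/1 flags: 1 where the column is a gap."""
    return tuple(1 if c == gap_char else 0 for c in aligned_seq)


def gap_distance(seq_a, seq_b, gap_char="-"):
    """Fraction of alignment columns where exactly one sequence has a gap.

    Residue identity is ignored on purpose, so amino-acid composition
    bias does not contribute to the distance.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    profile_a = gap_profile(seq_a, gap_char)
    profile_b = gap_profile(seq_b, gap_char)
    differing = sum(x != y for x, y in zip(profile_a, profile_b))
    return differing / len(profile_a)


def gap_distance_matrix(alignment):
    """Pairwise gap distances for a dict mapping name -> aligned sequence."""
    names = sorted(alignment)
    return {
        (a, b): gap_distance(alignment[a], alignment[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical toy alignment (not real ureohydrolase sequences).
toy_alignment = {
    "seq_A": "MK--LVAG",
    "seq_B": "MK--LIAG",
    "seq_C": "MKTTLV-G",
}

if __name__ == "__main__":
    for pair, dist in gap_distance_matrix(toy_alignment).items():
        print(pair, dist)
```

In this toy case seq_A and seq_B differ by one residue but share the same gap pattern, so their gap distance is zero, whereas seq_C differs from both in its gap pattern. A distance matrix of this kind could then be passed to any standard distance-based tree-building method.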
All this work prompted us to analyse the related metabolism of sulfur (A. Sekowska, thesis dissertation), still poorly described in most organisms, and this will be a central area of the research in functional genomics developed in the next few years.