Research activity 2000-2005
In 2000, after 14 years, the Regulation of Gene Expression Unit ended its activity, which had been mostly devoted to the Bacillus subtilis genome programme. In April 2000 the Director of the Unit went to Hong Kong (China) to create the HKU-Pasteur Research Centre Ltd, a joint venture between the University of Hong Kong and the Institut Pasteur. At the end of that year a new Unit was created in Paris, Genetics of Bacterial Genomes, focused on the functional genomics of bacteria of general interest.
Summary of the 1995-2000 activity
Created in 1986, the Regulation of Gene Expression Unit analyses the
nature of heredity, which stores the information required to generate
life, and focuses on the determination of gene functions in reference
genomes, coupling prediction with computers (experiments in silico)
to experiments in vivo. The scientists in the Unit investigate how
the thousands of genes in the chromosome of a cell co-operate in an
organised manner in an ever-changing environment. Their studies have been
guided both by the results of experiments in vivo, which allowed the
scientists to identify genes placed higher and higher in the hierarchy of
genetic controls of the cell's life, and by the spectacular progress of
molecular biology. Two reference micro-organisms are used: Escherichia
coli, the most long-standing genetic model; and Bacillus
subtilis, a source of numerous enzymes used by industry, often found
on the surface of leaves, and abundant in soil. The studies developed in
the Unit identify the genes which are critical to the overall adaptation
of the bacterium to its environment, and investigate in particular
the metabolism of molecules essential for the cell's construction, namely
that of sulfur and polyamines. Two-dimensional gel electrophoresis of all the
proteins in the bacteria is used to describe the co-variations in the
concentrations of particular proteins as a function of the growth status
of the bacteria, their environment and changes in the genes studied.
Combined with the analysis of the whole set of transcripts (expression
profiling) and with analysis of the genome sequences, typical of the
new field of research now called "genomics", this approach provides a
wealth of information. It has shown that there are large groups of genes
within the cell that are regulated in the same way. A mass of information
is generated by sequencing genomes, and many of the newly identified genes
are enigmatic in nature. To contribute to their understanding, molecular
genetic studies in the Unit are being complemented by research involving
the most up-to-date techniques in computer data management, statistics and
mathematics.
Two specialized databases have been constructed in collaboration with the
University Paris 6 (Atelier de Bioinformatique) and the University of
Versailles (they are available on the World-Wide Web: SubtiList
and Indigo).
Biological and naturalist aspects of the work are being emphasised, to
identify the major functions of the living organisms. In particular, the
first analyses of the genomes have led to a remarkable observation: the
order of genes on the chromosome relates to the cell's architecture.
Indeed, the gene order in genomes is not random, and there are
experimental hints suggesting that the map of the cell may be directly
related to the chromosome structure. The first results of the in vivo,
in vitro and in silico investigations aimed at understanding the selection
pressure that underlies these architectural constraints suggest the
systematic existence of supra-macromolecular complexes. Their components
have their genes distributed in a non-uniform way along the chromosome,
and they probably constitute structures of 10 to 50 nanometers that form
the core of the cell's organization.
A genome view of the coordination of gene expression
1. The Bacillus subtilis genome sequence (P. Glaser, MF Hullo, with several students for variable periods of time)
In 1996, the very short genomes of two bacteria had been published by TIGR, and the yeast genome sequence was about to be completed. Started almost ten years earlier, the B. subtilis genome program was well under way, and a BIOTECH grant from the European Union was supporting a consortium of European laboratories for completing the sequence, which was expected to be finished by the end of 1998. The Japanese consortium was also well under way. However, it appeared important to speed up our efforts in order to be present on the international scene at a moment when many laboratories were becoming interested in the outcome of genome programs. Together with Frank Kunst, the European coordinator of the program, we decided to accelerate the work by involving in the sequencing effort laboratories which had been part of the yeast genome program. This was made somewhat difficult because many regions of B. subtilis DNA, as with all A+T-rich Gram-positive bacteria, are impossible to clone in standard E. coli hosts. We therefore combined standard cloning procedures in a special E. coli strain constructed for this purpose (TP611), cloning into B. subtilis itself, and Long Range PCR (without cloning) for the most difficult regions. This allowed us to obtain the complete genome sequence in April 1997, well ahead of schedule, and to distribute it to the members of the consortium. In addition, we chose to distribute to the yeast teams some regions where we suspected the presence of errors, so that they would be sequenced again, resulting in excellent accuracy (of course, this was not said to the relevant sequencing groups, to avoid useless conflicts inside an exemplary collaborative effort). The complete sequence was presented at the International Bacillus Meeting in Lausanne in mid-July 1997, and the sequence was made public in parallel with its publication in November of that same year.
2. Databases and genome annotation platforms (Maude Klaerr-Blanchard, Claudine Médigue, Ivan Moszer, Eduardo Rocha, in collaboration with Louis Jones at the Service Informatique Scientifique, and several laboratories external to the Institut Pasteur)
Derived from its prototype, Colibri, built for the E. coli
genome, the relational database SubtiList displays the sequence and its
annotation. It answers several thousand queries per day, more
than two years after the sequence was published.
Annotating a genome is a never-ending process. Indeed, SubtiList is
regularly updated, and the last update, just after the Genome 2000
International Meeting at the Institut Pasteur in April 2000, provided
identification for several hundred new genes. To prevent misannotation
and the propagation of errors, we have assigned a special code name to all
genes whose function has not been explicitly identified (i.e.
experimentally, in vivo or in vitro). In agreement with
Amos Bairoch (SwissProt), we chose to have all these gene names
begin with the letter "y". The code we have used follows as closely as
possible Demerec's rule for gene nomenclature, despite much discussion
from the community of B. subtilis scientists, who often stick to
old names, frequently for purely anecdotal reasons. We think
that harmonizing nomenclature is very important for the future of the
genetics of genomes.
Careful annotation required an elaborate approach in terms of computer
science. In collaboration with Alain Hénaut and Jean-Loup Risler from the
Université de Versailles Saint-Quentin, François Rechenmann from INRIA
Rhône-Alpes and Alain Viari from the Atelier de BioInformatique at the
University Paris 6, the Unit created an original strategy allowing genome
annotation in silico. In this strategy the concept of "neighbourhood"
has been favoured as a way to help discovery.
This strategy was developed as a succession of three relatively independent
levels. Each level comprised a generic and a specific component. The goal
for the creation of the process was conceptual. It aimed at the prediction
of essential biological functions using the genomic text, together with
the associated biological knowledge distributed in scientific publications
and data libraries. The precise goal was to identify crucial experiments
(to be performed in "wet" laboratories) able to confirm or falsify the
predictions. This was illustrated (see below) in the case of polyamine metabolism.
The three levels of the process were:
o 1. sequence data and annotation management: SubtiList and Colibri
o 2. a platform for sequence annotation: Imagene
o 3. a platform to assist discovery (technique of "neighbourhoods"):
Indigo
Each level was made of a generic computer software engine, together with
specific data. The aim of the process was to define a set of three coupled
software engines. Each specific application of the process yields as many
valuable results as there are sequences, annotations or predictions
created.
1. SubtiList is constructed from an engine for the management of
genomic databases. It is composed of three parts:
1.1. A data scheme structure, GenoList;
1.2. A database management system (4th Dimension for stand-alone
applications and Sybase for the WWW database);
1.3. A user interface, possibly with specific procedures for data
exploitation (e.g. Blast, Fasta, and other rapid methods for sequence
analysis). The interface can be reconstructed simply from its
World-Wide Web access, but this can only be done properly with knowledge of
1.1. This explains why, to our knowledge, there are not yet equivalent
specialized bacterial databases.
To construct a specialized database (SubtiList, Colibri, TubercuList, and recently PyloriGene) it was necessary to introduce sequence data and their annotations into the GenoList engine. This input required a generic procedure. The value of the specialized databases has two sources: on the one hand the generic nature of the GenoList engine and its user-friendly, biologically oriented construction, and on the other hand the quality of the sequences (above all, of their annotation). This value diminishes as time elapses if the annotations are not curated, and increases if they are. Curating a set of annotations adds considerable value, and at the same time builds know-how that is extremely difficult to reproduce. Our Unit is curating the B. subtilis annotations.
2. Imagene is a
generic engine which allows management and strategic organisation of both
biological objects (sequences, annotations, images, …) and methods for
analysis or management, within the same platform. It is meant to make an
in-depth analysis of genomes locally, for a fine description of their
properties. It has been validated on the specific example of B.
subtilis, by permitting identification of all its coding sequences
and regions of transcription termination. It was used
to predict regions that carried errors due to the sequencing process.
These regions were amplified by PCR from the chromosome and resequenced by an
independent team.
2.1. The platform was constructed in such a way as to allow one to plug in
easily any method for genome analysis (even methods for which the source
code is not available, or methods located far away but available through
the Internet);
2.2. It allows the chaining of methods, the definition of strategies, and,
if needed, the ability to backtrack during the chaining of methods;
2.3. It possesses a generic visual interface (APIC) permitting one to
start and control the progress of methods and of their results. APIC
allows one to superimpose the results of entirely independent methods on
the same screen. It permits direct access to their results.
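As a purely illustrative sketch of what chaining methods and backtracking can mean in practice, the short Python example below keeps a history of intermediate results. It is not Imagene's actual design, which is not described here: the class name Strategy, its methods, and the toy "methods" applied to the sequence are all hypothetical.

```python
class Strategy:
    """Hypothetical illustration: chain analysis methods and allow backtracking."""

    def __init__(self, data):
        # history[0] is the input; each later entry is the result of one method
        self.history = [data]

    def apply(self, method):
        """Run `method` on the latest result and record its output."""
        self.history.append(method(self.history[-1]))
        return self

    def undo(self):
        """Backtrack one step; the original input is never discarded."""
        if len(self.history) > 1:
            self.history.pop()
        return self

    @property
    def result(self):
        return self.history[-1]


def split_codons(sequence):
    """Toy 'method': cut a sequence into consecutive triplets."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


# Chain two methods on an invented toy sequence, then step back once.
strategy = Strategy("atgaaatga").apply(str.upper).apply(split_codons)
codons = strategy.result
previous = strategy.undo().result
```

Because every intermediate result is kept, stepping back costs nothing and a different method can be tried from the same point.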
Two special features give added value to Imagene as time elapses. On the one hand the data can be organised in such a way as to construct efficient specialised strategies for genome annotation. On the other hand, the number of analysis methods that are plugged in can increase without limitation. If they are accessed through the Internet the engine will know how to start them and recover their results. It will also know how to integrate them into its strategies. As a consequence the rational integration of old and new methods into new strategies will increase with time and use. Among the methods, data management is a special case: it is therefore quite possible to consider plugging specialised databases into Imagene. This will be seen by Imagene as a special task, data management. One can also consider creating relationships between sequence and annotation data (neighbourhoods, see Indigo, next paragraph).
3. Indigo is a prototype platform designed to assist discovery, meant to
find hard-to-predict neighbours among the various functions related to
genes (P. Nitschké, C. Hénaut, in collaboration with P. Guerdoux-Jamet and
A. Hénaut).
Indigo is organized in a simple way around a hierarchy of flat files, all
centered around gene names, and corresponding to homogeneous classes of
features (such as codon usage bias, proximity in the chromosome, in
metabolism, in isoelectric point of the gene products, in functional
class, in literature articles, etc). It is clear that many other types of
neighbourhoods should be considered as well, including quite elaborate
ones. As an immediate goal for the improvement of Indigo one must create a
data structure for the neighbourhood relationships. The published
prototype is only meant to demonstrate the feasibility of the approach. It
illustrates the possibility of making interesting discoveries, even with the
limited means allocated at present. Indigo is superficially organised as
is GenoList. It possesses an engine (written in Java), that overlaps with
the user interface. It is applied to specific data (at the present time E.
coli, B. subtilis and Arabidopsis thaliana). One
must therefore note that, even more than in the case of specialised
databases, the value of a specialised Indigo is directly linked to the
quality of the data included. The corresponding information results from
annotation steps (statistical analysis for example, that could of course
be produced by a strategy included in Imagene), but also from the
extraction, at this time manual, of literature neighbours. The creation of
appropriate files of this type could rapidly acquire a great value (if
they are not publicly available). Finally, because Indigo is a method, it
can be, in principle, plugged-in to Imagene.
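The idea of neighbourhoods built from flat files keyed by gene name can be pictured with the minimal Python sketch below. It is an assumption-laden illustration, not Indigo's code (Indigo itself is written in Java): the function names, the gene names and all numerical values are invented, and each dictionary stands in for one flat file describing one class of features.

```python
from collections import Counter


def nearest(values, gene, k=2):
    """Return the k genes whose value is closest to that of `gene`."""
    reference = values[gene]
    others = [g for g in values if g != gene]
    # Ties are broken alphabetically so that the result is deterministic.
    return sorted(others, key=lambda g: (abs(values[g] - reference), g))[:k]


def recurrent_neighbours(features, gene, k=2, min_features=2):
    """Genes among the k nearest of `gene` in at least `min_features` features."""
    hits = Counter()
    for values in features.values():
        hits.update(nearest(values, gene, k))
    return sorted(g for g, n in hits.items() if n >= min_features)


# Invented toy data: one dictionary per "flat file", keyed by gene name.
features = {
    "chromosome_position_kb": {"geneA": 100, "geneB": 110, "geneC": 500, "geneD": 105},
    "isoelectric_point": {"geneA": 5.0, "geneB": 9.0, "geneC": 5.1, "geneD": 5.2},
}

close_on_map = nearest(features["chromosome_position_kb"], "geneA")
shared = recurrent_neighbours(features, "geneA")
```

A gene that turns up as a neighbour in several unrelated features is the kind of hard-to-predict association such a platform is meant to bring to the biologist's attention.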
This set of coordinated approaches has been used to set up an
international network financed by the European Science Foundation.
3. The map of the cell is in the chromosome (I. Moszer, E. Rocha, in collaboration with A. Hénaut and A. Viari)
Knowledge of whole genome sequences is a unique opportunity to study the
relationships between genes and gene products at the global level of the
cell's architecture. Part of the difficulty of this study comes from the
fact that, contrary to a generally accepted intuitive idea, there is
often no predictable link between structure and function in biological
objects. However, as the outcome of
natural selection pressure, there must exist some fit between genes,
gene products and the survival of the organism. This implies that
a bias observed in features that one would expect to be
unbiased is the hallmark of some selection pressure.
This prompted us to study global properties of complete genomes. A first
analysis on the word content of genome texts suggested that they are not
all managed in the same way. We therefore concentrated on long exact
repeats, and discovered that, in contrast to what could be expected,
the shortest genomes (the Mycoplasmas) had the highest repeat frequency.
Also, genomes of comparable sizes such as those of E. coli and B.
subtilis have an entirely different way to manage repeats. They are
present everywhere in the former genome, while they are very rare, and in
close proximity (ca 10 kb) in the latter. In contrast, when we studied
the distribution of words, bases or codons in the leading strand as
compared to the lagging strand, we made an extremely surprising discovery.
There is such a strong
bias in one strand as compared to the other (the leading strand is
G+T-rich, while the lagging strand is A+C-rich), that the bias is
reflected in the amino-acid composition of the proteins encoded by each
strand (valine-rich for the leading strand, isoleucine+threonine-rich for
the lagging strand)! This bias is not present in all genomes (it seems to
be absent from genomes of bacteria having an important proportion of
membranes, such as the methanogens or the cyanobacteria), but, when
present, it is universally the same.
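A strand bias of this kind can be measured with a very simple statistic. The Python sketch below is a minimal illustration, not the method used in the Unit: the function names and the toy sequence are invented. It also makes explicit a consequence of base pairing: since G pairs with C and T with A, any G+T excess on one strand appears as an equal A+C excess on the complementary strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Sequence of the complementary strand, read 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def gt_excess(sequence):
    """(G+T minus A+C) divided by their total; positive means G+T-rich."""
    seq = sequence.upper()
    gt = seq.count("G") + seq.count("T")
    ac = seq.count("A") + seq.count("C")
    total = gt + ac
    return (gt - ac) / total if total else 0.0


leading = "GGTTGTAC"  # invented toy sequence, not real genomic data
lagging = reverse_complement(leading)

leading_bias = gt_excess(leading)
lagging_bias = gt_excess(lagging)
```

On a real genome the same statistic would be computed separately on the stretches replicated as leading and as lagging strand, which requires knowing the positions of the origin and terminus of replication.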
Among other consequences, all these observations tell us that genes do not
move as frequently, or as easily as it is often implicitly assumed. There
must exist, therefore, constraints in the gene organisation of a
chromosome.
Because the genetic code is redundant, coding sequences can be studied by
analysing their codon usage. If there were no bias, all codons for a given
amino-acid should be used more or less equally. In contrast, it has long
been observed in E. coli that genes could be split into three
classes according to the way they use codons. The same was true for B.
subtilis. Yet, random mutations should somehow smooth out
differences. This is not the case: indeed, for leucine, where six codons
are used, we find that the CUG codon is used in more than 70% of the cases in
genes that are expressed at a high level during exponential growth
conditions, while CUA is used in less than 2% of the cases. What is
the source of such biases? There might exist a systematic effect of
context, some DNA sequences being favoured or selected against. While this
could be true for some codons, this cannot be generalized. We know that
translation of mRNA into proteins requires the action of transfer RNA
adaptor molecules. Because there are fewer tRNAs specific for a given
amino acid than there are codons, some tRNAs must read several codons.
A bias in the concentration of tRNAs might thus result in a bias in codon
usage. Therefore we must analyse selection pressure occurring at the level
of tRNA synthesis. This is the generally accepted reason to account for
the codon usage biases. Unfortunately, two reasons go against this
interpretation. Firstly, just as there would be every reason for
biases in codon usage to be smoothed out, similar constraints would
smooth out biases in tRNA synthesis. For example if a tRNA gene had a
strong promoter, spontaneous mutations would tend to lower its efficiency,
making transcription of this particular tRNA similar to its other
counterparts. This is true, unless there is selection pressure for the
converse. The second reason is that, while the strong bias
in a given class of genes could be explained in this way, the same
explanation cannot hold for a strong bias in another class of genes.
However we know, both from the study of the E. coli and B.
subtilis genomes, that two classes of genes display extremely
strong, but different biases. And the same tRNA molecule cannot be both
expressed at a high level, and not expressed at a high level…
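The codon usage figures quoted above for leucine are simple proportions among synonymous codons. The Python sketch below shows how such proportions can be computed for a single coding sequence. It is a minimal illustration under stated assumptions: the function name and the toy sequence are invented, and the sequence is assumed to be in frame from its first base.

```python
from collections import Counter

# The six leucine codons, written in DNA form (CUG in mRNA is CTG in DNA).
LEUCINE_CODONS = ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG")


def codon_fractions(coding_sequence, codons=LEUCINE_CODONS):
    """Fraction of each synonymous codon among all occurrences of the set."""
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}


# Invented toy gene: ATG CTG CTG CTG TTA CTG TAA
toy_gene = "ATGCTGCTGCTGTTACTGTAA"
usage = codon_fractions(toy_gene)
```

Grouping genes into classes, as described in the text, would then amount to comparing such usage profiles across all genes of a genome; the statistical method used for that grouping is not specified here.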
This requires looking for another explanation. The cytoplasm of a cell is
not a tiny test tube. One of the most puzzling features of the organisation
of the cell cytoplasm is that it must accommodate the presence of a very
long thread molecule, DNA, and that this molecule must be transcribed as a
multitude of RNA threads that usually have a length of the same order of
magnitude as the length of the whole cell. This requires some organisation
of transcription, translation and replication so that mRNA molecules and
DNA are not mixed up together all the time. The volume occupied by a
ribosome is a cube with a 200 Å edge. In an E. coli cell growing
exponentially in a rich medium there are at least 15,000 ribosomes. Thus,
the fraction of the cell volume occupied by ribosomes is at least 12 %.
The actual volume of the cell free of ribosomes is in fact significantly
smaller if one takes into account the volume occupied by the chromosome
and by the transcription and the replication machineries. If one now
considers that the translation machinery requires an appropriate pool of
elongation factors, tRNA synthetases and tRNAs, it becomes clear that the
cytoplasm behaves like a gel. In addition, simply counting the number of
tRNA molecules sitting around a ribosome, it appears that one cannot speak
about the concentration of such molecules, but only about a small, finite
number. Compartmentalisation has been demonstrated to be important even
for small molecules, despite the fact that they could diffuse quickly. As
a consequence, a translating ribosome acts as an attractor of a certain
pool of tRNA molecules. In such a case diffusion should only be considered
locally. The cytoplasm becomes therefore a ribosome lattice, displaying
relatively slow movements with respect to local diffusion of small
molecules as well as macromolecules. This provides an efficient selection
pressure leading to adaptation of the codon usage of the translated
message as a function of its position in the cell's cytoplasm. If the
codon usage changes from mRNA to mRNA, this indicates that these different
molecules do not see the same ribosomes in the usual life cycle of the
organism. In particular if two genes have very different codon usage this
indicates that the corresponding mRNAs are not made from the same part of
the cell (it is indeed difficult to see how ribosomes sitting next to each
other could attract different tRNA molecules).
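The figure of 12% given above for the volume occupied by ribosomes can be checked with a short calculation. The Python sketch below assumes a cell volume of about 1 µm³. This value is an assumption: the text does not state the cell volume, but it is the one that reproduces the quoted fraction from the 200 Å edge and the 15,000 ribosomes. The function name is invented for the illustration.

```python
ANGSTROM_PER_NM = 10.0
NM3_PER_UM3 = 1e9  # 1 µm = 1000 nm, so 1 µm³ = 1e9 nm³


def ribosome_volume_fraction(edge_angstrom, n_ribosomes, cell_volume_um3):
    """Fraction of the cell volume filled by cube-shaped ribosomes."""
    edge_nm = edge_angstrom / ANGSTROM_PER_NM       # 200 Å -> 20 nm
    ribosome_volume_nm3 = edge_nm ** 3              # 20 nm cube -> 8000 nm³
    total_nm3 = ribosome_volume_nm3 * n_ribosomes   # all ribosomes together
    return total_nm3 / (cell_volume_um3 * NM3_PER_UM3)


# Values from the text, plus the assumed cell volume of 1 µm³.
fraction = ribosome_volume_fraction(200.0, 15000, 1.0)
```

With these inputs the ribosomes together fill 0.12 µm³, that is 12% of the assumed cell volume; a smaller cell or a larger ribosome count would raise the fraction accordingly.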
Several models of transcription account for a process where the
transcribed regions are present at the surface of the chromoid, so that
RNA polymerase does not have to circle the double helix it is unwinding
and transcribing. Thus mRNA threads, usually structured at their 5' end,
are pulled off DNA by the lattice of ribosomes, going from one ribosome to
the next one, as does a thread in a wiredrawing machine (this is exactly
the opposite of the textbook view of translation, where ribosomes are supposed
to travel along fixed mRNA molecules). In this process a nascent protein
is synthesized on each ribosome, spread throughout the cytoplasm by the
linear diffusion of the mRNA molecule from one ribosome to the next one,
avoiding the requirement for the much slower 3D diffusion of the protein.
Polycistronic operons ensure that proteins with related functions are
co-expressed locally, permitting channelling of the corresponding
substrates and products. It seems likely that the structure of mRNA
molecules is coupled to their fate in the cell, and to their function in
compartmentalisation. The fate of mRNA is therefore an important feature
of gene regulation. We have therefore investigated the degradation process
of mRNAs, comparing data extracted from the genomes of B. subtilis
and E. coli. This led us to identify a main function of the
elusive enzyme polynucleotide phosphorylase, as producing CDP needed for
DNA synthesis, thus coupling translation, transcription and replication
together. If we consider genes translated sequentially in operons as
physiologically and structurally relevant, we should also analyse mRNAs
that are translated parallel to each other. Indeed if there is correlation
of function and/or localisation in one dimension, there should also exist
a similar constraint in the orthogonal directions. How would this be seen?
This is where codon usage comes in again. Indeed if ribosomes act as
attractors of tRNA molecules, this implies a local coupling between these
molecules and the codons they can use in the message they read. Obviously,
this requires that the same ribosome mostly translates mRNAs having
similar codon usage. This has the consequence that as one goes away from a
strongly biased ribosome, there is less and less availability of the most
biased tRNAs. In turn, there would be selection pressure for a gradient of
codon usage bias as one goes away from the most biased messages and
ribosomes. Transcripts are nested around central core(s), formed of
transcripts for highly biased genes. This fits with what is seen of the
general organisation of genes in the chromosome. In particular this agrees
with the observation that the distance between E. coli genes
oriented in the same direction on the chromosome is positively correlated
to the expression level of the downstream gene.
Finally, the chromosomes must separate from each other and migrate into each
of the daughter cells. There must exist some kind of repulsive force that
pushes DNA strands away from each other. While there are probably gene
products involved in this process, ribosome synthesis, in particular from
regions near the origin of replication, performs exactly what is needed,
by continuously creating new ribosomes. Continuous synthesis of ribosomes
in between the replicating forks would also provide a mechanical stress on
the bacterial wall in the middle of the cell. Koch has convincingly argued
that the bacterial wall is indeed a stress-bearing fabric. If ribosome
sources are organisers of the cell, mRNA for genes highly expressed under
exponential growth conditions should be located near the center of these
organisers, while other mRNAs should be translated in nested layers, all
the way to the ribosomes that are located near the cytoplasmic membrane,
and that would be involved in cotranslational membrane protein
localisation. Organisation of the genes in the chromosome should therefore
show regularities that are linked to this architecture, as we have indeed
observed. This gives us strong reasons to propose that genes along the
chromosome specify the map of the cell, a kind of celluloculus.
A geneticist's view: master genes and intermediary metabolism
1. Cyclic AMP and adenylate cyclases: the discovery of a fourth cyclase class (M.-P. Coudart-Cavalli, P. Trotot, P. Biville, O. Sismeiro)
Cyclic AMP is a mediator of catabolite repression in bacteria. Curiously, despite the interest in this important process, not much was known about the rather elusive enzymes, adenylate cyclases, which make this molecule from ATP. By 1996, work in the Unit had already discovered three main classes of these enzymes, which were apparently unrelated phylogenetically. Very remarkably, this work demonstrated that Gram-negative bacteria could differ in the nature of the adenylate cyclase they harboured: enterobacteria had one type, while myxobacteria or rhizobia had another type (presumably a more ancestral form, since it is phylogenetically similar to the enzymes found in Eukarya). In the course of a screen for adenylate cyclases in bacteria related to enterobacteria, but differing from them, we made the surprising discovery that A. hydrophila harboured a fourth adenylate cyclase type, an enzyme closely related to proteins found in Archaea. This protein was found in all species of A. hydrophila investigated, but not in other Aeromonas sp. The counterpart of the gene was found in the Y. pestis genome, and shown to express adenylate cyclase activity (unpublished). The reason for this extraordinary variety in adenylate cyclases is not known.
2. Global analysis of the H-NS protein function (P. Bertin, F. Hommais, O. Soutourina, C. Tendeng and several trainees)
To study the global regulation of bacterial metabolism, in particular in
pathogenic microorganisms, we used the hns mutation in Escherichia
coli as a reference system. Indeed, the H-NS protein is known to be
involved in numerous functions in the cell and to affect the expression of
genes regulated by environmental factors (temperature, osmolarity, ...).
Three main topics have been developed since 1996.
Motility and/or flagellum biosynthesis have been frequently associated
with virulence in various microorganisms. In enterobacteria, this process
requires the expression of numerous genes scattered on the chromosome and
organised in an ordered cascade. The fliC mRNA coding for
flagellin and the FliC protein itself are absent in an hns mutant, which
results in a loss of motility. Moreover, using transcriptional fusions, we
showed that an hns mutation results in a 3-fold decrease in the
expression of flhDC, the master operon which controls all other
flagellar genes. This was the first example of positive control by H-NS
described so far. Similar observations were made in a crp mutant,
providing evidence that, like H-NS, the cAMP/CAP complex acts as an
activator of flagellar gene expression. To determine whether these regulators
could affect flhDC expression by interacting with its promoter, we
performed gel shift experiments using purified proteins. The results
demonstrated that the flhDC promoter region is preferentially
retarded in the presence of H-NS or CAP. Moreover, DNase footprinting
experiments allowed us to determine precisely their binding sites on the
flhDC regulatory region. In vitro transcription assays were performed in
collaboration with S. Rimsky and A. Kolb (Unité de Physico-Chimie des
Macromolécules Biologiques). Surprisingly, H-NS seems to repress
flhDC transcription while the cAMP/CAP complex activates its expression.
Finally, in a crp mutant, motility is restored in the presence of
wild-type CAP protein but not in the presence of protein mutated in region
I involved in the interaction with RNA polymerase. This suggests that the
cAMP/CAP complex positively regulates flagellum synthesis by a direct
interaction with the C-terminal part of the RNA polymerase α subunit. In
contrast, the binding of H-NS to the same region cannot explain its
positive control observed in vivo on flagellum synthesis. In this respect,
the existence of a long non-coding region between the +1 transcriptional
start site and the ATG translational codon seems to play a crucial role in
the control of the master operon by H-NS. Finally, to determine whether a
similar mechanism of flhDC regulation could be extrapolated to
other organisms, we analysed the promoter region of a homologous operon
recently identified in Photorhabdus luminescens, using a method allowing
direct determination of the nucleotide sequence from genomic DNA.
Our results demonstrated the presence of a cAMP/CAP binding site and of a
non-translated region (unpublished observations). This suggests that, in
this organism, the mechanism of flhDC regulation could be similar
to that in E. coli.
The pleiotropic effect of the hns mutation led us to analyse the
role of H-NS on bacterial physiology using large scale technologies. In
collaboration with C. Laurent-Winter (Laboratoire de Physico-Chimie des
Macromolécules) and J.P. LeCaer (Laboratoire de Neurobiologie et Diversité
Cellulaire, ESPCI, Paris), we demonstrated that the synthesis and/or the
accumulation of about 60 proteins was specifically altered in an hns
mutant on two-dimensional gel electrophoresis. Many of them were identified
by microsequencing or by mass spectrometry. They are found to be involved
in bacterial response to various stresses (pH, osmolarity, ...). Moreover,
to study the global effect of H-NS on gene expression in E. coli,
we analysed, in collaboration with A. Malpertuy (Unité de Génétique
Moléculaire des Levures), the transcriptome of an hns strain using
DNA arrays. These experiments showed that the expression level of 200
genes was modified in a mutant strain (unpublished). Again, most of them
are known to be involved in stress response. In particular, the high
expression level of several genes induced by high osmolarity or low pH
resulted in a strongly increased resistance to both stresses in the hns
strain. Moreover, many H-NS target genes with unknown function were
predicted to encode fimbriae which could play a major role in virulence
processes. These observations provide evidence that an hns
mutation cannot simply be considered a loss of function but can provide
a selective advantage to the cell under some stressful
conditions. Finally, these observations suggest that the main role of H-NS
could be to control proton availability in the periplasm of many
Gram-negative bacteria.
Until recently, H-NS had only been characterised in enterobacteria. In
collaboration with S. Goyard (Unité de Biochimie des Régulations
Cellulaires), an H-NS-like protein was identified in Bordetella
pertussis, the aetiological agent of whooping-cough. Its structural
gene was isolated and sequenced. Its product showed a significant
similarity with H-NS, in particular in the C-terminal domain. Moreover,
the screening of databases allowed us to identify a related protein in Rhodobacter
capsulatus. In silico analysis of their amino-acid sequence
(secondary structure prediction, presence of hydrophobic clusters,
...) in collaboration with R. Brasseur (Centre de Biophysique Moléculaire
Numérique, Gembloux, Belgium) suggests that these proteins are
structurally related. Moreover, amino-acid sequence alignment demonstrated
the existence of a consensus in their DNA-binding domain. The structural
genes of these proteins were cloned after PCR amplification and the proteins
were expressed in an hns strain of E. coli. These
experiments showed that all proteins are able to complement the phenotypic
alterations in such a strain (loss of motility, reduction in growth rate,
serine susceptibility, ...). Gel retardation experiments performed with
purified proteins revealed a preferential binding to curved DNA similar to
that of H-NS. Cross-linking experiments showed that, despite a low
amino-acid conservation in their N-terminal domain, these proteins are
able to dimerise in vitro. These observations are the first demonstration
that proteins structurally and functionally related to H-NS are
widespread in Gram-negative bacteria. Moreover, by complementation of the
serine susceptibility of hns mutants in E. coli, we recently isolated and
characterised an hns-like gene in Vibrio cholerae, the agent of cholera.
Similarly, in collaboration with P. Glaser (Laboratoire de Génomique des
Microorganismes Pathogènes), we identified two H-NS-like proteins in P.
luminescens, an entomopathogenic bacterium whose genome sequencing
is currently in progress at the Pasteur Institute. These results further
support the existence of a large family of H-NS-like proteins in
microorganisms.
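The notion of a consensus within an aligned domain, as invoked above for the DNA-binding domain, can be illustrated with a minimal sketch. This is not the procedure used in the Unit; the function name, the 75% threshold and the toy sequences are all assumptions made purely for illustration, and the sequences are invented rather than taken from any H-NS-like protein.

```python
from collections import Counter


def consensus(aligned_seqs, threshold=0.75, gap_char="-", unknown="x"):
    """Column-wise consensus of equal-length aligned sequences.

    A residue is reported at a column when it occupies at least
    `threshold` of the sequences (hypothetical cut-off); otherwise
    `unknown` is written. Columns made only of gaps yield `gap_char`.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    out = []
    for column in zip(*aligned_seqs):
        residues = [c for c in column if c != gap_char]
        if not residues:
            out.append(gap_char)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(column) >= threshold else unknown)
    return "".join(out)


# Invented toy alignment (not real protein data).
toy = ["ACDEFGH", "ACDEYGH", "SCDEFGH", "ACNEFGK"]
print(consensus(toy))                 # majority consensus
print(consensus(toy, threshold=1.0))  # strictly invariant positions only
```

Raising the threshold to 1.0 keeps only the strictly invariant positions, which is one simple way to separate a conserved core from variable flanking residues.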
3. Pyrophosphate effects on Escherichia coli: a link with iron metabolism (F. Biville, E. Turlin, M. Perrotte, C.-K. Wun, and several trainees)
In the course of the study of cAMP synthesis in E. coli, the effect of pyrophosphate, a product of the reaction producing cAMP from ATP, was investigated. A first series of experiments demonstrated that, in a phosphate-rich minimal medium, pyrophosphate had a surprising growth-stimulating effect. This effect resulted in a significant modification of the expressed proteome pattern of the cells. It could not be due to phosphate starvation, and the first hypothesis that came to mind was that energy from the energy-rich bond of the molecule was somehow recovered by the cell. However, all experiments meant to explore this hypothesis were unsuccessful. In particular, the non-hydrolysable analogue methylene diphosphate had an effect similar to that of pyrophosphate. Analysis of the metabolic activities that varied upon pyrophosphate addition suggested that the tricarboxylic acid cycle was somehow involved. Further exploration demonstrated that the pyrophosphate effect is mimicked by addition of excess iron to the medium. This demonstrated, first, that even in a medium supplemented with 5 mM iron there is still some iron deficiency in a phosphate-rich minimal medium and, second, that the pyrophosphate molecule somehow helps the cell to scavenge existing iron from the environment in a way that permits it to thrive on a low iron level (M. Perrotte thesis). Work in progress demonstrates that a phosphorelay system (two-component regulator) of unknown function is involved in this process. Once unravelled, this will add interesting information on a set of genes of unknown function in the genome of E. coli and will help to improve its annotation.
4. Functional analysis of the B. subtilis genome: polyamines and sulfur metabolism (J.-Y. Coppée, P. Glaser, M.-F. Hullo, I. Martin-Verstraete, E. Presecan, A. Sekowska, C.-K. Wun)
One aim of functional genome analysis is to reconstruct entire
metabolic pathways rapidly. This cannot be done using in
silico analysis alone, because many proteins share a common descent.
As a result, related activities often share similar
sequences (e.g. a decarboxylase specific for a given amino-acid will
resemble its counterpart specific for another amino-acid). We have
therefore constructed relatively rapid tests on plates with molecules or
ions that could help us to trace as efficiently as possible genes involved
in integrated metabolic pathways. Amino acid metabolism is not well
described in B. subtilis, and although quite a few gene
similarities point to expected enzyme activities, it is necessary to
validate the hypotheses derived from these similarities. Using
amino-acid analogues or certain types of antibiotics is one way to achieve
this goal. In addition, we set up several growth condition tests (in
particular for swarming or gliding on plates) to test for more subtle
phenotypes (A. Sekowska, thesis dissertation).
In the course of this systematic analysis, we noted the importance of
intermediary metabolism activities. In particular, polyamines, although
dispensable under routinely used laboratory growth conditions, are
extremely important for the cell. They are involved in macromolecular
syntheses, and in particular in modulating the accuracy of translation, at
steps which may be essential for survival of the cell populations. Their
importance is reflected by the fact that their biosynthesis is energy
costly. This is especially true for the larger molecules, such as
spermidine, spermine and their analogues. In particular, spermidine
synthesis requires S-adenosylmethionine (AdoMet) as a precursor.
Surprisingly, AdoMet is not used as such in the reaction but is first
decarboxylated to 3-aminopropyl-S-adenosine (dAdoMet). The aminopropyl
moiety of the substrate is subsequently transferred onto one of the
amino-terminal ends of putrescine, to generate spermidine. A further
transfer on spermidine yields spermine in some organisms.
Transamination and decarboxylation are ubiquitous steps in intermediary
metabolism. They are generally achieved by enzymes carrying pyridoxal
phosphate as a co-enzyme. However, a noteworthy feature of the known
AdoMet decarboxylation reaction is that it is achieved by an enzyme
carrying not a pyridoxal but a pyruvoyl group as the catalytic residue.
Pyruvoyl enzymes perform a limited number of varied decarboxylation
reactions, including the decarboxylation of AdoMet in Eukarya and
Gram-negative bacteria. Combining gene disruption experiments and
biochemical identification of polyamines, we unravelled the main features
of polyamine biosynthesis in B. subtilis, showing that the
predominant pathway proceeds from arginine via agmatine. We also observed
that, in contrast to E. coli, B. subtilis does not
maintain a significant intracellular pool of putrescine under conditions
where the level of spermidine is similar to that found in E. coli.
We further identified the pathway leading to the addition of an
N-propylamine group to putrescine, creating spermidine. This reaction
yields the sulfur-rich molecule methylthioadenosine (MTA) as a
by-product. We identified the nucleosidase encoded by the mtn (yrrU)
gene as the first enzyme implicated in its recycling. By gene disruption,
in vitro mutagenesis, cell-free protein synthesis and biochemical analysis
of polyamines, we showed that the unknown gene ytcF, renamed speD,
codes for the AdoMet decarboxylase. Analysis of the phylogenetic relationships
among bacterial enzymes demonstrated that the B. subtilis enzyme
is very similar to several predicted proteins of unknown function from
Archaea. The MJ0315 gene, which presumably encodes an AdoMet decarboxylase
of Methanococcus jannaschii, was used to complement B.
subtilis ytcF and E. coli speD mutants and was
expressed in a cell-free system. We could thus identify, for the first
time, the nature of the corresponding gene and protein in Archaea.
While the number of genome sequences increases exponentially, it remains
difficult to identify gene functions explicitly. Automatic annotation
procedures rest mostly on sequence comparisons. They are used to build up
phylogenetic trees, where reference activities are assumed to spread to
neighbours by contiguity. The corresponding functions are thus described
tentatively as identical to that of the known reference. However, these
methods do not address the central question of enzyme recruitment for new
activities. Furthermore, genes and proteins are not simply sequences of
letters; they are made from chemicals derived from the cell metabolism,
and a single gene alteration may result in a general base or amino-acid
content bias, changing the "style" of an organism, possibly altering its
place in calculated phylogenies, thus leading to wrong assignments in
enzyme activities. Ouzounis and Kyrpides constructed an interesting
evolutionary tree of agmatinases, with emphasis on their universal
presence. Since this seminal work, many new sequences have been obtained
and annotated by their similarity with the known sequences. We undertook a
comparative analysis of the corresponding set of sequences. Genes that
were deemed important were cloned and attempts were made to identify their
functions. We first considered the usual types of phylogenetic trees
constructed from the variation of the amino-acid sequence in these proteins,
without taking into account the presence of gaps in the sequences. Several
discrepancies with respect to the expected position of some organisms in
the trees were found. In a second approach, we reconstructed trees based
only on the presence and evolution of gap-containing regions in the
sequences, because gaps would be much less sensitive to genetic drift or
amino-acid metabolism. The crucial enzyme activities that presumably
evolved from ancestral ureohydrolases were validated by cloning,
expressing and measuring activity of the corresponding enzymes. The
emerging picture is consistent with a bacterial origin of hydrolases
(ureohydrolases and related activities), which later evolved into those of
the Archaea and the Eukarya. Our experiments therefore validate the use of
gap-trees in the prediction of gene function.
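The idea of comparing sequences only through their gap-containing regions can be sketched as follows. This is a minimal illustration under stated assumptions, not the method actually used for the gap-trees described above: the function names, the choice of a Jaccard distance and the toy alignment are all hypothetical. Each aligned sequence is reduced to the set of column intervals where it carries a run of gaps, and two sequences are considered close when they share the same gap regions, whatever their residues.

```python
from itertools import combinations


def gap_regions(aligned_seq, gap_char="-"):
    """Return the set of (start, end) column intervals of contiguous gaps."""
    regions = set()
    start = None
    for i, ch in enumerate(aligned_seq):
        if ch == gap_char:
            if start is None:
                start = i
        elif start is not None:
            regions.add((start, i))
            start = None
    if start is not None:
        regions.add((start, len(aligned_seq)))
    return regions


def gap_distance(seq_a, seq_b):
    """Jaccard distance between gap-region sets (an assumed metric)."""
    a, b = gap_regions(seq_a), gap_regions(seq_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def gap_distance_matrix(alignment):
    """Pairwise gap distances for a {name: aligned sequence} mapping."""
    names = sorted(alignment)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = gap_distance(alignment[x], alignment[y])
    return dist


# Invented toy alignment; names and residues are illustrative only.
toy_alignment = {
    "bact_1": "MKT--LVAGG---HE",
    "bact_2": "MKS--LVAGG---HD",
    "arch_1": "MRTAALV-GGSTPHE",
}
matrix = gap_distance_matrix(toy_alignment)
print(matrix[("bact_1", "bact_2")], matrix[("bact_1", "arch_1")])
```

In this toy case the two "bact" sequences differ in residues but share identical gap regions, so their gap distance is zero. The resulting matrix could then be passed to any standard distance-based tree-building method.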
All this work prompted us to analyse the related metabolism
of sulfur (A. Sekowska, thesis dissertation), still poorly described
in most organisms, and this will be a central area of the research in
functional genomics developed in the next few years.