From deciphering genomes to synthetic biology: an embodiment of the Landauer principle
Before extracting part of the contribution to general knowledge of the work I developed for several decades, let me recall the last word of The Delphic Boat, to try and prevent misunderstandings: I am well aware that, in contrast to Art, Science should not have names. This short presentation is a way of explaining the « style » of a scientist's work, not to promote a narcissistic view.
The central question I have been exploring over the last few decades is this: is there a general principle that explains why biological chemistry seems to be somehow 'animate'? This led to a second question: is it possible to discover rules that explain how genes function as a whole in the cell and contribute to its coherent and reproducible development? If we isolate some of the important trends in this research, we get a picture that culminates in what can be called « symplectic biology », a biology in which the relations between objects have greater conceptual importance than the objects themselves (see a view independently proposed by Murray Gell-Mann). This means that the embodiment of abstraction is essential for understanding what life is. A critical consequence of this constraint is that, because the atoms of life have intrinsic properties, reflected in Mendeleev's periodic table, that have nothing to do with the abstract world to which they are related, many features of life will look like anecdotes. Consequently, life forms are very diverse, which makes it quite difficult to identify the underlying laws. Once this is understood, the idea that it will be possible to reconstruct life, and even to construct material objects with living properties from building blocks different from those in existing organisms, will gain ground. Synthetic biology is no longer a dream; it is becoming an unprecedented achievement. Yet it is becoming essential to identify what makes life so special.
Living organisms produce young offspring. However, their offspring come from parents who have already grown old. This implies that, somehow, the parents have recruited or created some kind of new information. Information is an essential currency of our physical world: it is physical. In 1961, Rolf Landauer established that it is the logically irreversible operations, such as the erasure of information, that necessarily dissipate energy (at least kT ln 2 per erased bit). Since computation can in principle be made reversible, the creation of information does not, as such, require energy dissipation. However, resetting the process so that it can create information again requires erasing the memory of past events, and this is energy intensive. Charles Bennett, in 1988, illustrated reversible computing with a simple arithmetic operation, division. He showed that the result of a division is obtained when the intermediate states are erased, leaving the remainder of the division as the prominent and 'valuable' result of the calculation. In this process, the erasure of memory dissipates energy. With this description, however, Bennett did not explain how the remainder of the division could be distinguished from the remaining bits that were to be erased. Making that choice requires some kind of additional (contextual) information, and I have proposed that this is where energy dissipation comes in: energy is used to prevent the remainder of the division from being erased, while erasing the rest of the memory. This puts the memory back into a state that can be used for other calculations. How does this process take place? The work developed here is an attempt to understand it within cells, after explicitly identifying the two steps of the Landauer principle:
1/ An information-laden (or 'tense') step, associated with the capture of an energy source, without energy dissipation, and providing a quantum of information (typically the selection of a specific molecule in an environment containing many related molecules). In the case of enzymes, this typically manifests itself as a functional step triggered upon binding of a non-hydrolysable ATP analogue (AMP-PNP or related molecules).
2/ A reset step, where energy is dissipated (usually through the hydrolysis of ATP or another nucleoside triphosphate to the corresponding diphosphate and phosphate), so as to return the system to its ground state, allowing the process to start again.
These types of functions, which must be identified among the critical functions encoded in all genomes, play a role similar to that of Maxwell's demons (MxDs): they can discriminate a substrate among similar ones, identify a specific position in a 3D structure, or pick out a particular time in a smooth flow of events. Some fifty functions of this type could be identified in the minimal set required for autonomous life.
G Boël, O Danot, V de Lorenzo, A Danchin
Omnipresent Maxwell’s demons orchestrate information management in living cells
Microb Biotechnol. (2019) 12: 210-242 doi: 10.1111/1751-7915.13378
A dozen of them direct the correct folding and assembly of the reading head of the genetic message, the ribosome. This nanomachine relies on the spontaneous folding of very long RNA molecules in water, and because the number of incorrect conformations is very large, it requires agents that retain only the conformations that are eventually functional, discarding or refolding the others. Other functions of this kind repair broken DNA molecules, calibrate the supercoiling of the double helix, or export toxic components out of the cell while preserving essential ones.
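To make the two steps concrete, the sketch below caricatures a demon as a small state machine and puts a number on the reset step. It is an illustration under stated assumptions, not a model of any particular enzyme: the `ToyDemon` class, the pool of look-alike molecules (isoleucine versus valine is used only as a familiar pair of near-identical substrates), the accounting of exactly one ATP per reset, a temperature of 310 K and a free energy of about 50 kJ/mol for ATP hydrolysis under cellular conditions are all assumptions. The only physical law used is the Landauer bound, kT ln 2 per erased bit.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

BOLTZMANN = 1.380649e-23   # J/K (exact SI value)
AVOGADRO = 6.02214076e23   # 1/mol (exact SI value)


def landauer_bound(temperature_k: float) -> float:
    """Minimum heat (J) dissipated when one bit of memory is erased."""
    return BOLTZMANN * temperature_k * math.log(2)


def bits_per_atp(delta_g_kj_per_mol: float, temperature_k: float) -> float:
    """Number of bit erasures one ATP hydrolysis could pay for at the Landauer limit."""
    joules_per_molecule = delta_g_kj_per_mol * 1e3 / AVOGADRO
    return joules_per_molecule / landauer_bound(temperature_k)


@dataclass
class ToyDemon:
    """Hypothetical two-step demon: a 'tense' selection step, then a dissipative reset."""
    target: str
    loaded: Optional[str] = None   # the demon's memory: what it currently holds
    atp_spent: int = 0             # assumption: exactly one ATP per reset
    delivered: List[str] = field(default_factory=list)

    def tense_step(self, candidate: str) -> bool:
        # Step 1: bind the molecule only if it is the target; no dissipation is counted here.
        if self.loaded is None and candidate == self.target:
            self.loaded = candidate
            return True
        return False

    def reset_step(self) -> None:
        # Step 2: hand over the product and erase the memory, paying one ATP,
        # so that the demon is back in its ground state for the next cycle.
        if self.loaded is not None:
            self.delivered.append(self.loaded)
            self.loaded = None
            self.atp_spent += 1


def run(demon: ToyDemon, pool: List[str]) -> ToyDemon:
    """Present each molecule of the pool to the demon in turn."""
    for molecule in pool:
        if demon.tense_step(molecule):
            demon.reset_step()
    return demon


if __name__ == "__main__":
    demon = run(ToyDemon(target="Ile"), ["Ile", "Val", "Ile", "Leu", "Ile", "Val"])
    print(demon.delivered, demon.atp_spent)
    print(f"kT ln 2 at 310 K: {landauer_bound(310.0):.2e} J")
    print(f"bit erasures payable by one ATP at the limit: {bits_per_atp(50.0, 310.0):.0f}")
```

With these assumptions one ATP could in principle pay for roughly 28 bit erasures, so a one-ATP reset lies well above the thermodynamic minimum. The sketch only shows where the cost is booked (entirely in the reset step), not how real demons spend it.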
Before I arrived at this hypothesis about the cause of the animation of life, my research followed several tracks. They are summarised here to show how they led to the hypothesis.
Main exploration tracks
A summary of our view of the minimal functions required to make a cell alive revealed many genes coding for unknown functions.
A Danchin, G Fang
Unknown unknowns: essential genes in quest for function
Microb Biotechnol. (2016) 9: 530-540 doi: 10.1111/1751-7915.12384
Among these functions, we have highlighted three neglected functional categories. Cells are prone to errors and ageing, so a first key function is discrimination between clean and altered entities. Discrimination requires the management of information, a genuine, but abstract, currency of reality. For example, proteins age, sometimes very quickly: the cell must identify old proteins and get rid of them without destroying the young ones. Implementing discrimination in cells leads to the second set of functions, usually ignored. Although abstract, 'information' must be embodied in material entities, with unavoidable idiosyncratic properties, and this gives rise to new unmet functional needs. Thus, the growth of cells requires specific but clumsy material implementations, "kludges". Although difficult to identify, this tinkering becomes essential in particular situations. Finally, a third functional category reflects the needs of growth: metabolic implementations that allow the cell to organise the growth of its cytoplasm, membranes and genome, in different spatial dimensions. Solving this metabolic dilemma, which is essential for the engineering of new synthetic biology chassis, has led to the discovery of an unexpected role for CTP synthetase as a coordinator of non-homothetic growth.
Ou Z, Ouzounis C, Wang D, Sun W, Li J, Chen W, Marlière P, Danchin A
A path toward SARS-CoV-2 attenuation: Metabolic pressure on CTP synthesis rules the virus evolution
Genome Biol Evol (2020) 12: 2467-2485 doi: 10.1093/gbe/evaa229
Three overlooked key functional classes for building up minimal synthetic cells
Synthetic Biology (Oxford) (2021) 6: ysab010 doi: 10.1093/synbio/ysab010
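The growth dilemma of the third category can be illustrated with a back-of-the-envelope calculation, assuming for simplicity a spherical cell (the model bacteria discussed here are rods, so this is only a caricature). Going from one cell to two identical daughters requires doubling the cytoplasm and the genome, but a sphere that simply doubles its volume gains only 2^(2/3), about 1.59 times, its surface, whereas the two daughters need twice the original membrane. The function names are illustrative.

```python
import math
from typing import Dict


def sphere_surface(volume: float) -> float:
    """Surface area of a sphere with the given volume."""
    radius = (3 * volume / (4 * math.pi)) ** (1 / 3)
    return 4 * math.pi * radius ** 2


def growth_demands(initial_volume: float = 1.0) -> Dict[str, float]:
    """Fold increase needed in each 'dimension' to go from one cell to two identical daughters."""
    s0 = sphere_surface(initial_volume)
    homothetic_surface = sphere_surface(2 * initial_volume)  # mother simply swells, same shape
    daughters_surface = 2 * s0                               # what two daughters actually need
    return {
        "cytoplasm (3D)": 2.0,
        "genome (1D)": 2.0,
        "membrane, homothetic growth (2D)": homothetic_surface / s0,
        "membrane, needed at division (2D)": daughters_surface / s0,
    }


if __name__ == "__main__":
    for part, fold in growth_demands().items():
        print(f"{part:36s} x{fold:.2f}")
```

The gap between the two membrane figures is one way of seeing why the synthesis of cytoplasm, membrane and DNA cannot simply be scaled together and has to be coordinated; the sketch says nothing about how the cell achieves this.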
In 1986, I chose to explore the sequencing of a whole bacterial genome in an attempt to understand the basic principles of both its construction and its role. At the time, most biologists saw this undertaking as a waste of time and resources, unlikely to yield much new information. My idea was to explore the coupling between the coordination of gene expression and the physical organisation of the genome, on the basis that a genome is not just a collection of genes. After a complex set of political obstacles, impossible to summarise here (see Why sequence genomes? The Escherichia coli imbroglio, or The Delphic Boat), I eventually led the sequencing of a large segment of the Bacillus subtilis genome and took part, together with the late Frank Kunst, in the scientific coordination of the international team that sequenced the genome of strain 168 of this organism. This led me to try to organise genome bioinformatics in France with the help of several colleagues at universities and national research agencies, first through the creation of a nationwide group, GDR 1029 (1991-1995), then through the coordination of the bioinformatics programme of the Groupement de Recherche et d'Etudes des Genomes (1992-1996, headed by Piotr Slonimski), and later at the Comité de Coordination des Sciences du Vivant (1998-2000). As director of the Department Genomes and Genetics at the Institut Pasteur until June 2009, I brought the project to a close by re-sequencing the reference genome of B. subtilis and re-annotating it afresh, as a tribute to the whole international community working on this model organism. This endeavour was renewed in 2013 and in 2018, and I update the annotation on a monthly basis, in parallel with those of Escherichia coli and Pseudomonas putida. In 1991, the B. subtilis programme, in parallel with that of yeast, discovered that a very large number of the genes making up genomes were still of unknown function.
That same year, the analysis of what was known of the E. coli genome led us to demonstrate that, rather than being an anecdotal feature as previously thought, horizontal gene transfer (HGT) accounted for a large proportion of bacterial genomes. Since then, HGT has been found to be an essential component of the construction of most, if not all, genomes, and the number of articles in the field keeps increasing at a fast pace.
This paper shows, for the first time, that in the genome of the best known bacterium, Escherichia coli, one sixth of the genes originate elsewhere. This result, which demonstrates the considerable importance of horizontal gene transfer (HGT) in bacteria, also shows that replication fidelity is not the primary characteristic of species, but that error-correcting genes spread by horizontal transfer. This work highlights the role of mutator strains in the environment, giving horizontal gene transfer a prominent role in adaptation to a new niche, particularly during the evolution from commensalism to pathogenicity.
Amusingly, the Assistant Editor of Nature justified rejecting the manuscript as follows: "I have discussed your manuscript with my colleagues, and while we appreciate the interest of your observation suggesting the existence of an apparent 'third class of genes', the inference of horizontal gene transfer seems rather tentative, and for this reason we feel that the manuscript would be better suited to publication in a rather more specialized molecular evolution or microbiology journal." This is probably why the same popular magazine later had to publish an updated view of HGT, in 2000, under the name of "lateral gene transfer"... Nominalism is still relevant.
This observation led me to propose that, contrary to popular belief, there is a strong pressure for genomes to be long, not streamlined. Modelling the consequences of this hypothesis is underway:
Three overlooked key functional classes for building up minimal synthetic cells
Synthetic Biology (Oxford) (2021) 6: ysab010
In particular, it explains the surprising observation that deoxyribonucleotide synthesis starts from ribonucleoside diphosphates, not triphosphates.
S Noria, A Danchin
Just so genome stories: what does my neighbor tell me
Proceedings of the Uehara Memorial Foundation Symposium: Genome Science: towards a new paradigm? H Yoshikawa, N Ogasawara, N Satoh, eds. Elsevier Science BV (2002) International congress series 1246: 3-13
This lecture inductively explores the neighbourhood of several genes in model bacteria. The main illustration shows a strong coupling between DNA synthesis and RNA degradation by the degradosome. The main cause of this coupling is the way pyrimidine metabolism is set up in most cells: the CDP required for dCDP synthesis is absent from the de novo synthesis pathway. Instead, this pathway creates UDP, which is thought to be a dangerous precursor of dUDP. Uridylate kinase must therefore be compartmentalized, which is indeed observed. But the corresponding gene (pyrH) is cotranscribed with the gene coding for ribosome recycling factor. The study therefore predicts that the function of this factor is regulated by the presence of UTP in the cell, and that local UTP starvation, caused by the synthesis of the stem-loops that terminate transcription at the end of operons, is involved in ribosome recycling.
EPC Rocha, A Danchin
Base composition bias might result from competition for metabolic resources
Trends Genet (2002) 18: 291-294
The GC content of bacterial genomes varies from 25 to 75%, but the reason for this variation is unclear. Here, we show that genomes of bacteria that rely on their host for survival (obligatory pathogens or symbionts) tend to be AT rich. Furthermore, we have analysed bacterial phages, plasmids and insertion sequences, which might also be regarded as 'intracellular pathogens', and show that they too are significantly richer in AT than their hosts. We suggest that the higher energy cost and limited availability of C and G, compared with T/U and A, could be a basis for understanding these differences. The same constraints apply to eukaryotic viruses such as influenza virus or HIV.
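The measure at the heart of this abstract, GC content, is simple to compute. The sketch below is a minimal version; the two sequences are invented toy strings standing for a 'host' and a 'mobile element', not real genomes, and are there only to show the calculation.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T or U)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in sequence")
    return gc / (gc + at)


host = "ATGGCGCTGGCCGGTACCGCGAAGCTG"      # hypothetical GC-rich 'host' fragment
element = "ATGAAATTAATAAAAGATTTATCTAAA"   # hypothetical AT-rich 'intracellular' element

if __name__ == "__main__":
    for name, s in [("host", host), ("element", element)]:
        print(f"{name}: GC = {100 * gc_content(s):.1f}%")
```

Ambiguity codes such as N are ignored rather than counted, so that gaps in an assembly do not dilute the estimate.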
Our very early in silico genomics work demonstrated for the first time that a fraction (at least one sixth) of the genes of E. coli are derived from horizontal gene transfer. It also showed that antimutator genes were likely to be propagated by HGT, suggesting that bacteria in the environment are often in a highly mutable state, becoming fixed in a much more rigid (invariable) form when they meet a stable biotope. Another observation from this study was the clustering of HGT genes in relation to particular cell processes, suggesting that genomes are organised entities:
Guerdoux-Jamet, A Hénaut, P Nitschké, JL Risler, A Danchin
Using codon usage to predict genes origin: is the Escherichia coli outer membrane a patchwork of products from different genomes?
DNA Research (1997) 4: 257-265
That this observation is general was later demonstrated with Bacillus subtilis. The importance of HGT is so well accepted nowadays that it has become common knowledge:
I Moszer, EPC Rocha, A Danchin
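The idea behind the DNA Research reference above, that genes of foreign origin can betray themselves through atypical codon usage, can be sketched as follows. This is not the method of the cited paper: as a simple stand-in, each gene's codon frequencies are compared with the genome-wide average by a plain Euclidean distance, and the genes farthest from the average are flagged. The gene names and sequences are hypothetical toy data.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> List[float]:
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS) or 1
    return [counts[c] / total for c in CODONS]


def atypical_genes(genes: Dict[str, str], top: int = 1) -> List[str]:
    """Names of the `top` genes whose codon usage lies farthest from the genome mean."""
    profiles = {name: codon_frequencies(seq) for name, seq in genes.items()}
    mean = [sum(column) / len(profiles) for column in zip(*profiles.values())]
    distance = {name: math.dist(p, mean) for name, p in profiles.items()}
    return sorted(distance, key=distance.get, reverse=True)[:top]


# Hypothetical genome: three 'native' genes with GC-ending codons, one odd gene.
genes = {
    "geneA": "ATG" + "CTGGCGAAA" * 10 + "TAA",
    "geneB": "ATG" + "CTGGCGAAG" * 10 + "TAA",
    "geneC": "ATG" + "CTGGCCAAA" * 10 + "TAA",
    "geneX": "ATG" + "TTAGCAAAA" * 10 + "TAA",
}

if __name__ == "__main__":
    print(atypical_genes(genes))
```

A real analysis would at least separate amino-acid composition from synonymous codon choice (for instance by working on relative synonymous codon usage) and use a multivariate method rather than a single distance; this toy does neither.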
Can we discover rules in the organisation of the genome? Our efforts have led to the identification of several rules, linked to the particularities of the building blocks of life:
1/ There is a universal bias in the composition of the genes present in the leading and lagging strands of DNA;
2/ Remarkably, the essential genes (identified experimentally after the B. subtilis sequencing project) are preferentially encoded in the leading DNA strand. Furthermore, the nature of DNA polymerase III plays a role in the overall organisation of the genome: Firmicutes, which have two such polymerases (DnaE and PolC), show a strong bias in gene distribution. Analysis of the genes co-evolving with these polymerases shows that the different bacterial clades have different origins. This has a consequence of considerable importance for the question of the origins of life, as it shows that there is no single ancestor, no LUCA, but a population of progenotes that merged and divided several times before giving rise to the species we know today.
EPC Rocha, A Danchin
Ongoing evolution of strand composition in bacterial genomes
Mol Biol Evol (2001) 18: 1789-1799
EPC Rocha, A Danchin
Essentiality, not expressiveness, drives gene-strand bias in bacteria
Nature Genetics (2003) 34: 377-378
EPC Rocha, A Danchin
Gene essentiality determines chromosome organisation in bacteria
Nucleic Acids Res (2003) 31: 6570-6577
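The strand bias of point 1/ is commonly visualised with the GC skew, (G - C)/(G + C), computed in windows, or with its running sum along the chromosome, whose minimum and maximum are conventionally used as hints of the replication origin and terminus. The sketch below is only an illustration on an invented sequence (a C-rich half followed by a G-rich half), so the minimum is expected near the junction; the function names are hypothetical, and real analyses work on complete genomes with sliding windows.

```python
from typing import List, Tuple


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one window; 0.0 if the window contains neither base."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running sum of +1 for each G and -1 for each C along the sequence."""
    total, curve = 0, []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        curve.append(total)
    return curve


def skew_extremes(seq: str) -> Tuple[int, int]:
    """Positions (first occurrence) of the minimum and maximum of the cumulative skew."""
    curve = cumulative_gc_skew(seq)
    lowest = min(range(len(curve)), key=curve.__getitem__)
    highest = max(range(len(curve)), key=curve.__getitem__)
    return lowest, highest


toy = "CCATCGAC" * 50 + "GGATGCAG" * 50   # hypothetical: C-rich half, then G-rich half
origin_guess, terminus_guess = skew_extremes(toy)

if __name__ == "__main__":
    print(gc_skew(toy[:400]), gc_skew(toy[400:]), origin_guess)
```

In this toy the maximum falls at the very end of the string, which on a circular chromosome is adjacent to its start, so only the minimum is informative here.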
Looking at genomes as whole entities, we have long known that there is a 10 to 11.5 bp period in the distribution of nucleotides, from prokaryotes to eukaryotes. This bias is present throughout a given genome, in both coding and non-coding sequences. Using a linear projection-based autocorrelation analysis technique, the sequences responsible for this bias were identified. These ubiquitous motifs were termed "flexible class A motifs". Each motif consists of up to ten conserved nucleotides or dinucleotides distributed in a discontinuous pattern, and each occurrence spans a region of up to 50 bp. The distances between the nucleotides making up each occurrence of a given motif fluctuate little, suggesting that they are constrained by supercoiling and/or bending of the DNA. Taken together, these motifs cover up to half the genome in most prokaryotes, and they generate the previously recognised 11 bp periodic bias. Based on the structure of the motifs, it has been suggested that they may define a dense network of protein interaction sites in chromosomes:
E Larsabal, A Danchin
Genomes are covered with ubiquitous 11bp periodic patterns, the "class A flexible patterns"
BMC Bioinformatics (2005) 6: 206
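A period of this kind can be detected with a plain autocorrelation of a nucleotide indicator. The sketch below is not the linear projection-based method used to find the class A motifs: it simply correlates the presence of A at position i with its presence at position i + k and reports the lag k with the strongest signal. The test sequence is synthetic (random bases with an A forced every 11 positions), so the expected answer is built in.

```python
import random
from typing import List


def indicator(seq: str, base: str = "A") -> List[float]:
    """1.0 where the sequence carries `base`, 0.0 elsewhere."""
    return [1.0 if b == base else 0.0 for b in seq.upper()]


def autocorrelation(x: List[float], lag: int) -> float:
    """Mean product of deviations from the mean, for positions `lag` apart."""
    mean = sum(x) / len(x)
    n = len(x) - lag
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n)) / n


def dominant_period(seq: str, base: str = "A", min_lag: int = 2, max_lag: int = 20) -> int:
    """Lag between min_lag and max_lag with the largest autocorrelation."""
    x = indicator(seq, base)
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorrelation(x, k))


# Synthetic sequence: random bases, with an A planted every 11 positions.
rng = random.Random(42)
bases = [rng.choice("ACGT") for _ in range(2200)]
for i in range(0, len(bases), 11):
    bases[i] = "A"
toy = "".join(bases)

if __name__ == "__main__":
    print(dominant_period(toy))
```

Lags are capped at 20 so that the multiple at 22 does not compete with the fundamental period.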
The corresponding constraints are visible in the amino-acid sequence of the proteins, suggesting that the sequence is more constrained by the genome organisation than by the protein function. These novel observations have considerable implications in terms of phylogenetic profiles when one analyses protein sequences:
EPC Rocha, A Danchin
Base composition bias might result from competition for metabolic resources
Trends Genet (2002) 18: 291-294
G Pascal, C Médigue, A Danchin
Universal biases in protein composition of model prokaryotes
Proteins (2005) 60: 27-35
This latter work characterises “orphan” proteins, which make up approximately 10% of the genes of any new species. These proteins are enriched in aromatic amino acids. The work proposes that many of them represent the "self" of the species, behaving as “gluons” that make an extra contribution to the stability of multiprotein complexes in the cell, and thus an essential contribution to the functional stabilisation of complex intracellular structures. More generally, the approach allowed the investigators to define the essentiality of a gene in a real context, by measuring its persistence in many species, not only in sequence but also in its place in the genome.
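Persistence, as used here, can be computed as the fraction of genomes in which a gene family is present. The sketch below assumes that orthology has already been resolved into a presence/absence table, and sorts families into persistent, rare and intermediate groups (the persistent and rare sets are those the text goes on to call paleome and cenome) using two thresholds, 0.8 and 0.2, that are arbitrary choices for illustration. The genome and family labels are hypothetical, and the sketch ignores gene order, which the analyses described here also take into account.

```python
from typing import Dict, List, Set


def persistence(presence: Dict[str, Set[str]], genomes: List[str]) -> Dict[str, float]:
    """Fraction of the listed genomes in which each gene family is present."""
    pool = set(genomes)
    return {family: len(found & pool) / len(pool) for family, found in presence.items()}


def classify(presence: Dict[str, Set[str]], genomes: List[str],
             high: float = 0.8, low: float = 0.2) -> Dict[str, List[str]]:
    """Split families into persistent, rare and intermediate groups (arbitrary thresholds)."""
    groups: Dict[str, List[str]] = {"persistent": [], "rare": [], "intermediate": []}
    for family, p in sorted(persistence(presence, genomes).items()):
        if p >= high:
            groups["persistent"].append(family)
        elif p <= low:
            groups["rare"].append(family)
        else:
            groups["intermediate"].append(family)
    return groups


genomes = [f"genome{i}" for i in range(1, 11)]   # ten hypothetical genomes
presence = {
    "famA": set(genomes),          # present in 10/10
    "famB": set(genomes[:9]),      # 9/10
    "famC": set(genomes[:5]),      # 5/10
    "famD": {"genome3"},           # 1/10
}

if __name__ == "__main__":
    print(classify(presence, genomes))
```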
In summary, it appears that bacterial genomes are highly organised entities, contrary to a widely spread idea of a random « fluidity » of genomes. What are the selective constraints that support this organisation?
A general analysis of the conservation of synteny in a large number of complete bacterial genomes has shown that two classes of genes tend to stay together. The way persistent genes remain grouped is reminiscent of a scenario for the origin of life, which is why the corresponding set has been named the paleome. Similarly, genes that are rarely found in genomes form clusters that are easily transferred horizontally. These genes allow the bacteria to live in a specific niche; for this reason they are named the cenome (to indicate that they are shared by a community living in a particular environment, and prone to be transferred):
A Danchin, A Sekowska
Physico-chemical prerequisites for the construction of a synthetic cell
in: Synthetic Chemistry, May 26th - 30th, 2008, in Bozen, Italy
Beilstein Institut for the Advancement of Chemical Sciences (2009) 1-13
In his seventeenth-century classic, Novum Organum, Francis Bacon wrote, “we cannot command nature except by obeying her” (Bacon, 2010). Although our knowledge of living systems is much improved since Bacon’s time, we are still far from understanding, or commanding, all the complex mechanisms of life. To take full advantage of living organisms for the benefit of mankind, we will need to understand those mechanisms to the furthest possible extent. To do so will require that the concept of information and the theories of information science take a more prominent role in the understanding of living systems...
A Danchin, PM Binder, S Noria
Antifragility and tinkering in biology (and in business): Flexibility provides an efficient epigenetic way to manage risk
Genes (2011), 2: 998-1016; doi:10.3390/genes2040998
M Porcar, A Danchin, V de Lorenzo, VA dos Santos, N Krasnogor, S Rasmussen, A Moya
The ten grand challenges of synthetic life
Systems and Synthetic Biology (2011) 5:1-9. doi: 10.1007/s11693-011-9084-5.
CG Acevedo-Rocha, G Fang, M Schmidt, DW Ussery, A Danchin
From essential to persistent genes: a functional approach to constructing synthetic life
Trends Genet. (2013) 29: 273-279. doi: 10.1016/j.tig.2012.11.001
A Danchin, A Sekowska
Constraints in the design of the synthetic bacterial chassis
Methods in Microbiology (2013) 40: 39-68
A Danchin, A Sekowska, S Noria
Chapter 5. Functional requirements in the program and the cell chassis for next-generation synthetic biology pp.
In line with my involvement at the Centre Royaumont pour une Science de l'Homme, I organized in 1971 a weekly seminar, every Wednesday afternoon, at the Institut de Biologie Physico-Chimique, where, together with Philippe Courrège and Jean-Pierre Changeux, we tried to delineate the limits of selection in biological processes. Our work explored the role of selective stabilization in learning and memory in the nervous system and in the immune system. This exploration predated the fashion for neuronal networks, but with a specific feature: synapses evolved in such a way that they could degenerate irreversibly. The outcome of the process was the carving of an image of the environment within the neuronal network.
JP Changeux, P Courrège, A Danchin
A theory of the epigenesis of neuronal networks by selective stabilization of synapses
Proc Natl Acad Sci U S A (1973) 70: 2974-2978
Abstract: A formalism is introduced to represent the connective organization of an evolving neuronal network and the effects of environment on this organization by stabilization or degeneration of labile synapses associated with functioning. Learning, or the acquisition of an associative property, is related to a characteristic variability of the connective organization: the interaction of the environment with the genetic program is printed as a particular pattern of such organization through neuronal functioning. An application of the theory to the development of the neuromuscular junction is proposed and the basic selective aspect of learning emphasized.
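The selective-stabilization idea in this abstract can be caricatured in a few lines. This is a toy, not the formalism of the 1973 paper: every possible input-to-output connection starts as a labile synapse, the 'environment' is a list of hypothetical episodes of joint input/output activity, synapses whose pre- and postsynaptic neurons are co-active often enough are stabilized, and all the others degenerate irreversibly at the end of the critical period. The neuron names, the episodes and the threshold are assumptions.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Synapse = Tuple[str, str]


def selective_stabilization(inputs: Iterable[str], outputs: Iterable[str],
                            episodes: List[Tuple[Set[str], Set[str]]],
                            threshold: int = 2) -> Set[Synapse]:
    """Return the synapses (pre, post) that survive the critical period."""
    labile = set(product(inputs, outputs))      # initial, redundant connectivity
    coactivity = {syn: 0 for syn in labile}
    for active_in, active_out in episodes:      # activity imposed by the environment
        for pre, post in labile:
            if pre in active_in and post in active_out:
                coactivity[(pre, post)] += 1
    # End of the critical period: stabilize or degenerate, with no way back.
    return {syn for syn in labile if coactivity[syn] >= threshold}


episodes = [
    ({"i1"}, {"o1"}),
    ({"i1", "i2"}, {"o1"}),
    ({"i2"}, {"o2"}),
    ({"i2"}, {"o2"}),
]

if __name__ == "__main__":
    print(sorted(selective_stabilization(["i1", "i2", "i3"], ["o1", "o2"], episodes)))
```

The surviving connections form a pattern that reflects the statistics of the episodes, which is the sense in which the environment is "printed" on the network; the unused input i3 loses all its synapses.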
Subsequently, I explored the general process of selective stabilization in the building up of cells, seen as computers making computers. This process allows functional properties to be embodied within material entities that are progressively associated as networks or organized in space, in operons, for example. This view led me to emphasise the role of information as an authentic currency of the physical world, and to discover the key role of Landauer's principle in biology.
Bacteria as computers making computers
FEMS Microbiol Rev (2009) 33: 3-26
Understanding what life is has been the main quest of philosophers, in particular since the time of the Presocratics. In his Lives of Illustrious Men, Plutarch described the return of Theseus from Crete to Athens (Theseus's relationship with the temple of Apollo at Delphi is well known, hence my Delphic Boat) and the fate of his ship, preserved by the Athenians. To keep the ship operational, the Athenians kept replacing its rotting boards with new ones. Philosophers subsequently used this example to discuss permanence and change, some claiming that the ship was no longer the same, while others said the opposite: "The ship wherein Theseus and the youth of Athens returned had thirty oars, and was preserved by the Athenians down even to the time of Demetrius Phalereus, for they took away the old planks as they decayed, putting in new and stronger timber in their place, insomuch that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same." (translated by John Dryden).
Following the trend set by this profound question, the study of life must never be limited to the study of objects, but must study their relationships. This is why genomes certainly cannot be considered as mere collections of genes: they are much more. How can we access this information? Looking at the current flow of published genomic sequences, two contrasting pictures emerge. At first glance, genes appear to be randomly distributed along the chromosome; on the other hand, their organisation into operons (or pathogenicity islands) suggests that, at least locally, related functions are physically close. To understand the organisation of the genome, it is therefore necessary to explore the distribution of genes along the chromosome, while generalising the concept of neighbourhood to many kinds of vicinity beyond the simple succession of genes in the genomic text.
The first observations of my laboratories at the Institut Pasteur (Regulation of Gene Expression, and Genetics of Bacterial Genomes) were interpreted as establishing that this order was far from random, and was linked to the function of genes, in relation to the cell's architecture. These results were fragmentary, so they needed to be experimentally substantiated by combining in silico analysis (bioinformatics) of the genomes of model organisms, such as Escherichia coli or Bacillus subtilis, with their study in vivo (reverse genetics and physiological biochemistry, in particular transcription profiling and two-dimensional protein electrophoresis), as well as comparative studies with other genomes and biochemical and structural analyses. If indeed the map of the cell is in the chromosome, this calls for some physical principle linking the succession of the genes (a symbolic text, carrying information) and the cell's architecture (concrete, i.e. massive or inert, matter). We do not need to resort to the existence of a divine principle: this should be the consequence of a simple physical principle. The winning trio of Darwinian natural selection (variation / selection / amplification) shows that evolution creates functions, and that functions « capture » (recruit) structures (acquisitive evolution), so that structural analysis only becomes important once functions are understood.
The simplest way to evolve is to follow the arrow of time, increasing the overall entropy of the system. In water, this is the driving force behind the construction of many biological structures: it is the origin of the universal formation of helices, and it is what allows proteins to fold and viral capsids to form. It should not escape us that the greatest increase in entropy of a molecular complex in water occurs when its surface/volume ratio is highest: a planar structure orders the water molecules on both of its sides. Consequently, if this plane meets another, it will lose a layer of water molecules and stick to it. The formation of flat layers should therefore be a very strong organising principle. Is it possible to know, simply from the genomic text, whether a gene product will form such layers, for example simple hexagonal arrays? This is even more unlikely than an amino acid sequence telling us exactly how a protein folds without knowledge of pre-existing folds: pancreatic RNase does indeed fold on its own, because selection has isolated it to do so (it is secreted into the intestine, where it meets bile salts), but this should never have been accepted as the paradigm for protein folding.
In silico analyses allow us to organise knowledge. To generate new knowledge, why not explore the neighbourhoods of biological objects, taking genes as starting points and stressing that each object exists in relation to other objects? Inductive exploration will consist in finding all the neighbours of each given gene. "Neighbour" here has the widest possible meaning: it is not simply a geometrical or structural notion. Each neighbourhood is meant to shed specific light on a gene, looking for its function as what brings together the objects of the neighbourhood. A natural neighbourhood is proximity on the chromosome: operons and pathogenicity islands show that genes that are neighbours of each other can be functionally related. Another interesting neighbourhood is similarity between genes or gene products. The isoelectric point often gives a first idea of the compartmentalisation of a gene product. Also, a gene may have been studied by scientists in laboratories all over the world, and it can display features that refer to other genes: its neighbours will then be the genes found together with it in the literature. Finally, there exist more complex neighbourhoods, whose study gives particularly revealing results: two genes may be neighbours because they use the genetic code in the same way. One can also study all the genes that belong to the same neighbourhood in the cloud of points describing the codon usage of all the genes of the organism. I proposed this approach at the symposium for the 20th anniversary of the EMBO at the EMBL in Heidelberg in 1994, with the example of the possible role of a major enzyme, polynucleotide phosphorylase.
From the methodological standpoint, this view of inductive research requires the construction of neighbourhood tables (conveniently made available to scientists in databases: a field of choice for bioinformatics). Systematic investigation of the history recorded in the literature will in turn identify literature neighbourhoods, using not only titles and abstracts but the whole content of articles: "in biblio" analysis is an essential component of inductive reasoning. We do not possess heuristics permitting direct access to unknown functions, and apart from preliminary studies there are not many places where such in silico work is developed. There is, however, an excellent illustration of the concept of neighbourhood: the software Entrez, created by David Lipman and colleagues at the NCBI.
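A literature neighbourhood table can be sketched as a co-citation count: two genes are neighbours when the same texts mention them. The sketch below assumes plain-text articles and a given list of gene names; exact, case-sensitive whole-word matching is a simplification (real gene nomenclature has synonyms and ambiguous names), and all names here, including the toy texts, are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def literature_pairs(texts: list[str], gene_names: list[str]) -> Counter:
    """Count, for each unordered pair of genes, how many texts mention both."""
    patterns = {g: re.compile(r"\b" + re.escape(g) + r"\b") for g in gene_names}
    pairs: Counter = Counter()
    for text in texts:
        present = sorted(g for g, pattern in patterns.items() if pattern.search(text))
        for a, b in combinations(present, 2):   # pairs are stored in sorted order
            pairs[(a, b)] += 1
    return pairs

def literature_neighbours(gene: str, pairs: Counter) -> list[str]:
    """Genes co-cited with `gene`, most frequently co-cited first."""
    scores: Counter = Counter()
    for (a, b), n in pairs.items():
        if a == gene:
            scores[b] += n
        elif b == gene:
            scores[a] += n
    return [name for name, _ in scores.most_common()]

# Toy placeholder texts, not real abstracts.
texts = [
    "Toy abstract 1 mentions cyaA and crp.",
    "Toy abstract 2 mentions crp, cyaA and relA.",
    "Toy abstract 3 mentions relA and spoT.",
]
pairs = literature_pairs(texts, ["cyaA", "crp", "relA", "spoT"])
print(literature_neighbours("cyaA", pairs))  # ['crp', 'relA']
```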
All this has some flavour of a once fashionable field, Artificial Intelligence, a highly contentious but fascinating domain. It should also make clear that in silico analysis will never replace validation in vivo and in vitro: let us hope that the propagation of erroneous function assignments by automatic interpretation of genomic texts will not hinder discoveries. Knowing genome sequences is a marvellous feat, but it is the starting point, not the end.
To answer this very general question, a genetic selection and screening procedure was set up in the model bacterium Escherichia coli to isolate mutants that would orient future experiments along a rewarding track. The idea was to explore whether some signals in macromolecular syntheses that appear redundant to us (i.e. that look somewhat "useless" to the unprepared human mind) could be separated, by selecting mutants able to grow with only one active signal instead of several. The underlying hypothesis was that there exists some "secondary punctuation" in the expression of the genetic message, coupling macromolecular syntheses to the bulk metabolism of the cell. Emphasis on this linguistic analogy came from my contribution to the reflection on the role of selective processes at the root of memory and learning. The study of translation initiation, which in Bacteria associates two independent signals (a metabolic signal that labels the first methionine of the nascent polypeptide with a one-carbon residue, and the structure of a special transfer RNA), led me, through genetic experiments, to the discovery of a ubiquitous anomaly in metabolism, coupling replication, transcription, translation and cell division. The mutants affected in that process were analysed in succession. They involved transcription termination, translation initiation, the “stringent” coupling between these processes, one-carbon metabolism, cyclic AMP synthesis, a protein long proposed to be a bacterial histone (H-NS), and the biosynthesis pathway of branched-chain amino acids.
This apparently haphazard list, derived from the outcome of genetic experiments, reflects the threads followed one by one in an attempt to unravel this complicated network of interactions, which was finally understood in January 2006 through the role of the amino acid serine (this common amino acid is toxic in excess because of at least two processes: the production of hydroxypyruvate, which forms dead-end products with thiamine, and the production of aminoacrylate/iminopropionate when serine enters pathways such as cysteine and tryptophan biosynthesis). By the mid-1980s, the time was ripe to explore the same question not through the study of individual genes but through a global study of genes based on knowledge of complete genome texts. This required introducing a large component of computer science, and experiments “in silico” were proposed to complement in vivo and in vitro experiments (the term was first used in 1988-1989, in discussions with the European Commission meant to justify the setting up of genome projects). The question then became a simple conjecture, based on an earlier reflection by von Neumann on Turing machines: is there a link between the architecture of the cell and that of the genome? Work from the Genetics of Bacterial Genomes Unit showed that genes are indeed not randomly distributed in genomes. Whether this reflects a link with the architecture of the cell, however, of course remains an open question.
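Whether genes of a given functional class are distributed at random can be asked in many ways. One simple illustration, not necessarily the analysis used by the Unit, is a permutation test: measure how close the members of the class lie to one another along a circular chromosome, then compare with the same number of genes placed at random. Gene positions are assumed here to be indices in gene order (0 to n_genes - 1); the function names are hypothetical.

```python
import random

def mean_nearest_distance(positions: list[int], n_genes: int) -> float:
    """Mean distance (in genes) from each class member to its closest fellow member,
    on a circular chromosome of n_genes genes."""
    pos = sorted(positions)
    m = len(pos)
    total = 0
    for i in range(m):
        before = (pos[i] - pos[i - 1]) % n_genes      # gap to previous member, wrapping around
        after = (pos[(i + 1) % m] - pos[i]) % n_genes  # gap to next member, wrapping around
        total += min(before, after)
    return total / m

def clustering_test(positions: list[int], n_genes: int, trials: int = 10_000, seed: int = 0):
    """Permutation test: how often do randomly placed genes look at least as clustered?"""
    if len(positions) < 2:
        raise ValueError("need at least two genes in the class")
    rng = random.Random(seed)
    observed = mean_nearest_distance(positions, n_genes)
    as_clustered = 0
    for _ in range(trials):
        random_positions = rng.sample(range(n_genes), len(positions))
        if mean_nearest_distance(random_positions, n_genes) <= observed:
            as_clustered += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (as_clustered + 1) / (trials + 1)

# Hypothetical class of six adjacent genes in a 4000-gene chromosome.
observed, p = clustering_test([100, 101, 102, 103, 104, 105], n_genes=4000, trials=1000)
print(observed, p)  # observed is 1.0; p is very small
```

A small p-value says only that the class is more tightly grouped than chance placement would predict; it says nothing by itself about why.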
The involvement of cyclic AMP in the "serine effect" (wild-type strains are sensitive to serine, but cya and crp mutants are more resistant) led us to a thorough genetic and biochemical study of adenylate cyclases. After being the first laboratory to isolate and fully characterise the gene of an adenylate cyclase (that of Escherichia coli), we extended the work to the identification of the adenylate cyclase toxins present in the etiologic agents of whooping cough and anthrax. Using a multipartner cloning technique we had invented, the ancestor of the technique now known as “double hybrid” (patent EP0301954), we isolated and sequenced the genes of the corresponding toxins, analysed the proteins biochemically and characterised the secretion process of the cyclases:
P Glaser, D Ladant, O Sezer, F Pichot, A Ullmann, A Danchin
The calmodulin-sensitive adenylate cyclase of Bordetella pertussis: cloning and expression in Escherichia coli
Mol Microbiol (1988) 2: 19-30
P Glaser, H Sakamoto, J Bellalou, A Ullmann, A Danchin
Secretion of cyclolysin, the calmodulin-sensitive adenylate cyclase-haemolysin bifunctional protein of Bordetella pertussis
EMBO J (1988) 7: 3997-4004
A symmetrical approach was used to clone the cDNA of mammalian calmodulins, showing that the method (double hybrid) is widely applicable:
As early as 1988, this work raised a series of ethical problems (recently revived under the name of « bioterrorism »), discussed in:
Not every truth is good. The dangers of publishing knowledge about potential bioweapons
EMBO Rep (2002) 3: 102-104
This led to my appointment as a member of the Centre Consultatif National pour la Biosécurité (CNCB).
An overview of this first work on adenylate cyclases is summarised in:
This article established the international reference classification of adenylate cyclases. Three classes of different phylogenetic descent (convergent evolution) were first identified: Class I, cyclases from enterobacteria and related bacteria; Class II, secreted toxic cyclases; Class III, the "universal" class present in Bacteria and in Eukarya (including higher vertebrates). A fourth class, also of a completely different phylogenetic origin and perhaps involved in promiscuous activities, was discovered several years later in our research Unit:
O Sismeiro, P Trotot, F Biville, C Vivarès, A Danchin
Aeromonas hydrophila adenylyl cyclase 2: a new class of adenylyl cyclases with thermophilic properties and sequence similarities to proteins from hyperthermophilic archaebacteria
J Bacteriol (1998) 180: 3339-3344
The "universal" class of cyclases (class III) groups adenylate and guanylyl cyclases together, and an original selection procedure allows one to switch from one specificity to the other (this was one of the very first experiments showing that it is possible to change the substrate specificity of an enzyme):
The project to sequence the genome of Bacillus subtilis, the first project of this type launched for conceptual rather than technological reasons, was publicly proposed at the beginning of 1987. In parallel with the same result obtained by the consortium sequencing the genome of Saccharomyces cerevisiae, it led to the first discovery of genomics: many genes were completely unknown, not only in their sequence but also in their function and in the structure of their product:
P Glaser, F Kunst, M Arnaud, M-P Coudart, W Gonzales, M-F Hullo, M Ionescu, B Lubochinsky, L Marcelino, I Moszer, E Presecan, M Santana, E Schneider, J Schweizer, A Vertes, G Rapoport, A Danchin
Bacillus subtilis genome project: cloning and sequencing of the 97 Kb region from 325° to 333°
Mol Microbiol (1993) 10: 371-384 [it is amusing to note that this article is listed in PubMed with a truncated author list: biologists were not, at the time, familiar with the long author lists that are common in physics]
This article showed that in a long DNA fragment sequenced in full, half of the genes did not resemble anything known until then. This utterly unexpected result (the opponents of genome sequencing projects had « demonstrated » that we knew at least 95% of all possible gene classes, and had published this demonstration in the most fashionable journals), presented together with a similar conclusion from the sequencing of yeast chromosome III at the first genomics symposium organised by the Commission of the European Communities in Elounda, Crete, in 1991, was the first major discovery obtained by genome projects.
Carried out by a consortium associating Europe and Japan, the sequencing of the B. subtilis genome was completed in 1997, at the same time as that of E. coli. As early as 1995, the total length of contiguous fragments from the organism was significantly larger than that of the genomes sequenced by that date. This went largely unnoticed, however: science has now become an activity in the domain of show business and advertisement. Nevertheless, this genome remained for five years the only example of its group (the genomes of the Firmicutes were particularly difficult to sequence, because their DNA is usually toxic in E. coli, the universal host then used to construct DNA libraries, for biochemical reasons well understood by the authors of this project):
F Kunst, N Ogasawara, I Moszer, AM Albertini, G Alloni, V Azevedo, MG Bertero, P Bessières, A Bolotin, S Borchert, R Borriss, L Boursier, A Brans, M Braun, SC Brignell, S Bron, S Brouillet, CV Bruschi, B Caldwell, V Capuano, NM Carter, SK Choi, JJ Codani, IF Connerton, NJ Cummings, RA Daniel, F Denizot, KM Devine, A Düsterhöft, SD Ehrlich, PT Emmerson, KD Entian, J Errington, C Fabret, E Ferrari, D Foulger, C Fritz, M Fujita, Y Fujita, S Fuma, A Galizzi, N Galleron, SY Ghim, P Glaser, A Goffeau, EJ Golightly, G Grandi, G Guiseppi, BJ Guy, K Haga, J Haiech, CR Harwood, A Hénaut, H Hilbert, S Holsappel, S Hosono, MF Hullo, M Itaya, L Jones, B Joris, D Karamata, Y Kasahara, M Klaerr-Blanchard, C Klein, Y Kobayashi, P Koetter, G Koningstein, S Krogh, M Kumano, K Kurita, A Lapidus, S Lardinois, J Lauber, V Lazarevic, SM Lee, A Levine, H Liu, S Masuda, C Mauël, C Médigue, N Medina, RP Mellado, M Mizuno, D Moesti, S Nakai, M Noback, D Noone, M O'Reilly, K Ogawa, A Ogiwara, B Oudega, SH Park, V Parro, TM Pohl, D Portetelle, S Porwollik, AM Prescott, E Presecan, P Pujic, B Purnelle, G Rapoport, M Rey, S Reynolds, M Rieger, C Rivolta, E Rocha, B Roche, M Rose, Y Sadaie, T Sato, E Scalan, S Schleich, R Schroeter, F Scoffone, J Sekiguchi, A Sekowska, SJ Seror, P Serror, BS Shin, B Soldo, A Sorokin, E Tacconi, T Takagi, H Takahashi, K Takemaru, M Takeuchi, A Tamakoshi, T Tanaka, P Terpstra, A Tognoni, V Tosato, S Uchiyama, M Vandenbol, F Vannier, A Vassarotti, A Viari, R Wambutt, E Wedler, T Weitzenegger, P Winters, A Wipat, H Yamamoto, K Yamane, K Yasumoto, K Yata, K Yoshida, HF Yoshikawa, E Zumstein, H Yoshikawa, A Danchin
The complete genome sequence of the gram-positive bacterium Bacillus subtilis
Nature (1997) 390: 249-256
The sequence was further updated three times:
E Belda, A Sekowska, F Le Fèvre, A Morgat, D Mornico, C Ouzounis, D Vallenet, C Médigue, A
An updated metabolic view of the Bacillus subtilis 168 genome
Microbiology (2013) 159: 757-770. doi: 10.1099/mic.0.064691-0
R Borriss, A Danchin, CR Harwood, C Médigue, EPC Rocha, A Sekowska, D Vallenet
Bacillus subtilis, the model Gram-positive bacterium: 20 years of annotation refinement
Microb Biotechnol. (2018) 11: 3-17 doi: 10.1111/1751-7915.12461
The sequence and its annotations were distributed to the international community through a specialised database that still has no exact counterpart. Unfortunately, this endeavour was recently suspended.
I Moszer, LM Jones, S Moreira, C Fabry, A
SubtiList: the reference database for the Bacillus subtilis genome
Nucleic Acids Res (2002) 30: 62-65
Several genome projects followed: Leptospira interrogans and Staphylococcus epidermidis, in collaboration with the Shanghai Genome Center; Photorhabdus luminescens, at the Institut Pasteur; and, to try and understand the impact of temperature constraints on genomes, the genome of the Antarctic bacterium Pseudoalteromonas haloplanktis TAC125, in collaboration with the Genoscope and several universities around the world. Sequencing of the genome of Psychromonas ingrahamii followed, in collaboration with Monica Riley and her colleagues.
The functional organisation of genes in genomes must result from the selection pressure of simple physico-chemical principles. Besides physical causes such as the structure of water (the study of the P. haloplanktis genome was meant to give access to some of these), gases and radicals, because they are highly diffusible, may play a major role in cellular compartmentalisation and might account for some of the organisation of genes in genomes. Sulfur metabolism is particularly sensitive to gases and radicals, so it is important to understand how it is organised. A first study demonstrated that sulfur-related genes are organised into islands:
and a detailed analysis, developed mainly during the creation of the HKU-Pasteur Research Centre in Hong Kong, made it possible to uncover the details of the “methionine salvage pathway”:
A Sekowska, HF Kung, A Danchin
Sulfur metabolism in Escherichia coli and related bacteria: facts and fiction
J Mol Microbiol Biotechnol (2000) 2: 145-177
A Sekowska, JY Coppée, JP Le Caer, I Martin-Verstraete, A Danchin
S-adenosylmethionine decarboxylase of Bacillus subtilis is closely related to archaebacterial counterparts
Mol Microbiol (2000) 36: 1135-1147
A Sekowska, L Mulard, S Krogh, JK Tse, A Danchin
MtnK, methylthioribose kinase, is a starvation-induced protein in Bacillus subtilis
BMC Microbiol (2001) 1: 15
A Sekowska, S Robin, JJ Daudin, A Hénaut, A Danchin
Extracting biological information from DNA arrays: an unexpected link between arginine and methionine metabolism in Bacillus subtilis
Genome Biol (2001) 2: RESEARCH0019
A Sekowska, A Danchin
The methionine salvage pathway in Bacillus subtilis
BMC Microbiol (2002) 2: 8
The following work summarises the catalytic activities involved in this ubiquitous cycle (it is also present in humans and plants), which has the interesting feature of having systematically recruited proteins of diverse structures to complete the cycle. One of these proteins is likely to be related to the ancestor of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), the most abundant enzyme on the planet (this opens fascinating questions on the origin of catalytic activities):
A Sekowska, V Dénervaud, H Ashida, K Michoud, D Haas, A Yokota, A Danchin
Bacterial variations on the methionine salvage pathway
BMC Microbiol (2004) 4: 9
Ashida, A Danchin, A Yokota
Was photosynthetic RuBisCO recruited by acquisitive evolution from RuBisCO-like proteins involved in sulfur metabolism?
Res Microbiol (2005) 156: 611-618
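The cyclic nature of the pathway can be made explicit with a small data structure. The sketch below is a simplified reading of the Bacillus subtilis version of the methionine salvage cycle, written as (substrate, enzyme, product) triples and omitting cofactors and side products such as adenine and formate; the triple layout and the `is_closed_cycle` helper are hypothetical. In B. subtilis, the enolase step is carried out by MtnW, a RuBisCO-like protein.

```python
# Simplified B. subtilis methionine salvage cycle as (substrate, enzyme, product).
# Abbreviations: SAM, S-adenosylmethionine; dcSAM, decarboxylated SAM;
# MTA, 5'-methylthioadenosine; MTR, 5-methylthioribose;
# MTRu-1-P, methylthioribulose-1-phosphate;
# DK-MTP-1-P, 2,3-diketo-5-methylthiopentyl-1-phosphate;
# HK-MTPenyl-1-P, 2-hydroxy-3-keto-5-methylthiopentenyl-1-phosphate;
# DHK-MTPene, 1,2-dihydroxy-3-keto-5-methylthiopentene;
# KMTB, 2-keto-4-methylthiobutyrate.
STEPS = [
    ("methionine", "MetK", "SAM"),               # SAM synthetase
    ("SAM", "SpeD", "dcSAM"),                    # SAM decarboxylase
    ("dcSAM", "SpeE", "MTA"),                    # spermidine synthase releases MTA
    ("MTA", "MtnN", "MTR"),                      # MTA nucleosidase
    ("MTR", "MtnK", "MTR-1-P"),                  # methylthioribose kinase
    ("MTR-1-P", "MtnA", "MTRu-1-P"),             # isomerase
    ("MTRu-1-P", "MtnB", "DK-MTP-1-P"),          # dehydratase
    ("DK-MTP-1-P", "MtnW", "HK-MTPenyl-1-P"),    # enolase (RuBisCO-like protein)
    ("HK-MTPenyl-1-P", "MtnX", "DHK-MTPene"),    # phosphatase
    ("DHK-MTPene", "MtnD", "KMTB"),              # dioxygenase (formate also released)
    ("KMTB", "MtnE", "methionine"),              # aminotransferase
]

def is_closed_cycle(steps: list[tuple[str, str, str]]) -> bool:
    """True if every product is the next step's substrate and the last
    product regenerates the first substrate."""
    following = steps[1:] + steps[:1]   # step i paired with step i+1, wrapping around
    return all(product == nxt[0] for (_, _, product), nxt in zip(steps, following))

print(is_closed_cycle(STEPS))  # True
print([enzyme for _, enzyme, _ in STEPS])
```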
As shown in our work, this remarkable metabolic cycle has the surprising property that, under particular conditions, it leads the cell to synthesise carbon monoxide. As this cycle exists in humans, this opens interesting perspectives on possible controls mediated by CO, a gaseous messenger distinct from nitric oxide, in the immune and nervous systems.
A Sekowska, H Ashida, A Danchin
Revisiting the methionine salvage pathway and its paralogues
Microb Biotechnol. (2019) 12: 77-97 doi: 10.1111/1751-7915.13324
Metabolism can be seen as a prerequisite for any scenario of the origins of life. I have explored several features of the question, based on surface metabolism, as advocated by Samuel Granick, Freeman Dyson and Günter Wächtershäuser.
Archives or palimpsests? Bacterial genomes unveil a scenario for the origin of life
Biological Theory (2007) 2: 52-61
From chemical metabolism to life: the origin of the genetic coding process
Beilstein J Org Chem. (2017) 13: 1119-1135 doi:10.3762/bjoc.13.111
Multiple clocks in the evolution of living organisms pp. 101-118
In: Molecular Mechanisms of Microbial Evolution (2018)
edited by Pabulo H. Rampelotto, Springer
The following text, which strictly follows the rules we discussed in an article published in 2021, does not seem to be available elsewhere, which justifies making it available here.
Murray Gell-Mann [© Complexus 1 (5) 1995-1996 (Out of