Natura nusquam magis est tota quam in minimis (Nature is nowhere more complete than in her smallest works)


Genome Studies as a Change of Paradigm in Biology

This text, written in 2003, is regularly updated

Training and technology transfer

The philosophy underlying genome studies is gaining momentum, and it is worth helping its development in the Hong Kong region. This requires the setting up of integrated research associating predictive work using computers (in silico experiments) with more usual experiments at the bench (in vivo and in vitro). Integration of in vivo, in vitro and in silico approaches requires team work, and permanent conceptual technology transfer between disciplines and between environments. An important point is to make it properly understood that technology transfer always requires explicit acknowledgement of the sources of the technology, be they conceptual or technical, so that the background is well understood (in addition, of course, to the standards of scientific ethics).

Work in genomics therefore requires permanent training of the personnel of the Centre with reference to work at other places, in particular at the Institut Pasteur in Paris. This means that we both have to be trained at the appropriate places in the world (including in-house), and have to help train others. Several Faculties at the University of Hong Kong (in particular highly conceptual departments, such as the Department of Mathematics) contribute to the overall efficacy of the training. Since the end of 2001, a working seminar has been run at the Department of Mathematics (Pr Ngaiming Mok), where students and scientists from the Hong Kong region gather around Antoine Danchin to reflect on Conceptual Biology (Meng Wah Building, Wednesday, 2:30 pm). An account of each meeting is sent to the participants as well as to all those belonging to the Stanislas Noria network (Causeries du jeudi), which will resume its activities in Paris in April 2003.

In terms of experimental biology, the "Jacques Monod" practical courses organized each year at the Institut Pasteur in Paris are the model we aim to implement in the future. The contribution of the Institut Pasteur to the University of Hong Kong will draw on the resources maintained at the Institut Pasteur, with scientists coming for several weeks to organize lectures and practical courses at the Centre.

Once the activity of the Centre is firmly established, we shall build a training module which will use, as much as possible, the possibilities offered by the very existence of the World-Wide Web, in particular by providing E-education facilities.

(1) .... to create knowledge- and education-related information resources based on laboratory experiments and "in silico" analysis ....

(2) .... to create a culture of basic research allowing scientists to monitor the surveillance, prevention and cure of emerging diseases ....

(3) .... to offer new areas for technology development used in large-scale industrial applications, fostering future research ....

Methods and Techniques

Science evolves through an intricate association between the creation of concepts and techniques (see A Western Imbroglio) and a constant back-and-forth dialogue between discoveries and applications. The future of genomics is therefore impossible to separate from the future of the associated techniques, among which the development of computer science and the mathematics of integers will play a key role:

[Diagram: Discoveries and Theorems emerge from explicit Hypotheses and implicit Heuristics]

The normal way by which Science proceeds is the hypothetico-deductive method, which confronts a model of Reality with the actual outcome of experiments. While this method is efficient for setting the stage and producing a strong theoretical background for the progress of science, used alone it cannot lead to discovery. Discovery cannot be planned. Discovery-driven research therefore has to combine this standard (Greco-Latin) way with the more Anglo-American data-driven and the Chinese context-driven approaches.

A metaphor: the Delphic boat

Amongst the questions asked by the Oracle of Delphi, Pythia, was a fundamental question directly related to the nature of the artefacts produced and used by living organisms – an enigma, as the Oracle’s questions always were. If we consider a boat made of planks, carefully fitted together, we may well ask, what is it that makes the boat a boat? This question is more than just a mind game, as is clear from the fact that as time passes, some of the planks begin to rot and have to be replaced. There comes a time when not one of the original planks is left.

The boat still looks like the original one, but in material terms it has changed. Is it still the same boat? The owner would certainly say yes, this is my boat. Yet none of the material it was originally built from is still there. If we were to analyse the components of the boat, the planks, we would not learn very much. We can see this if we take the boat to pieces: it is reduced to a pile of planks – but they are not the same ones as at the beginning! The physical nature of these objects plays some role of course – a boat made from planks of oak is different from a boat made from planks of pine – but this is fairly incidental. (It is very important to remember this when we think about the possibility of life existing elsewhere in the universe – there is absolutely no reason why it should be made of the same molecules as life on Earth.) What is important about the material of the planks, apart from their relative stability over time, is the fact that it allows them to be shaped, so that they relate to each other in a certain way. The boat is not the material it is made from, but something else, much more interesting, which organizes the material of the planks: the boat is the relationship between the planks. Similarly, the study of life should never be restricted to objects, but must look into their relationships. This is why a genome cannot be regarded as simply a collection of genes. It is much more than that.

Studying relationships is essentially what Georges Cuvier was doing – and what paleontologists still do – when he took a few bones of a long-extinct animal, or even sometimes a single tooth, and proposed a reconstruction of the entire creature. This importance of relationships is not a trivial property, to be noted in passing, but a hard fact with considerable practical and theoretical implications, and we will come back to this at length when we look at theories of biological information, in the next chapter. The fundamental importance of relationships, which represent a particular interpretation of form, was noticed more than 2,500 years ago by Empedocles and many of the pre-Socratic philosophers. St Thomas Aquinas also refers to it when he analyses the philosophical status of the concept of creation: “when motion is taken away, only different relations remain.”

A renewed future for Darwinism

When Darwin wrote The Origin of Species, the concept of the gene did not yet exist. The idea that species evolve by progressive transformation had been developed by Lamarck, but within the pre-atomist paradigm of the four elements (Fire, Air, Water and Earth). Darwin reinvented the selective theory proposed by Empedocles, adding to the combination of variation and selection the biological power of amplification through the multiplication of individuals, as recently developed at the time by Malthus.

The Empedoclean, Maupertuisian, Darwinian trio

Variation / Selection / Amplification

states that material systems evolve, creating functions which, to be implemented, capture (or recruit) existing structures (hence the "tinkering" aspect of the development of life). Molecular genetics, then genomics, added to this general driving pattern the algorithmic nature of DNA sequences. A consequence is that, in general, the structure does not tell the function. Therefore, to understand what life is using the genome texts, we must add biological knowledge (including the lifestyle of the organisms of interest) to our knowledge of genomes.

A new paradigm: genetics of genomes

Until the first genome sequences were deciphered, life was studied as bits and pieces: organisms, organs, cells, genes, transcripts, proteins, metabolites... This analytic attitude, often called "reductionist", was similar to that of a clockmaker disassembling a clock: the heap of pieces neither makes the clock work nor allows it to be understood. The first analyses of genome texts have shown that the order of genes in genomes is not random (see these references for first examples - A, B - of non-random distribution of sequences in bacterial DNA). It is therefore no longer possible to study individual genes or proteins alone if one wishes to understand the processes of life. We need large-scale techniques, where one monitors simultaneously the fate of many cells, genes, transcripts, proteins or metabolites. It then becomes necessary to integrate these data into a consistent picture leading to an explanation of what we witness. This is the goal of genomics (the widely used "post-genomics" is a useless oxymoron, meaning in fact "post-sequencing").
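
As a toy illustration of how one can test whether gene order is non-random, the sketch below runs a permutation test on made-up gene positions (the positions, genome size and gap-variance statistic are all illustrative assumptions, not data from the references mentioned above). Clustered genes produce very unequal gaps along a circular chromosome, so a gap variance much higher than in randomly shuffled placements signals non-random order.

```python
import random

# Made-up data: positions (gene indices) of one functional class on a
# circular chromosome of 4000 genes; real data would come from annotation.
GENOME_SIZE = 4000
CLASS_POSITIONS = [12, 15, 18, 410, 415, 1200, 1205, 1210, 3600, 3605]

def gap_variance(positions, genome_size):
    """Variance of the gaps between consecutive genes on a circular map."""
    ps = sorted(positions)
    gaps = [ps[i + 1] - ps[i] for i in range(len(ps) - 1)]
    gaps.append(genome_size - ps[-1] + ps[0])  # closing the circle
    mean = sum(gaps) / len(gaps)
    return sum((g - mean) ** 2 for g in gaps) / len(gaps)

def permutation_p_value(positions, genome_size, n_shuffles=2000, seed=0):
    """Fraction of random placements at least as clustered as observed."""
    rng = random.Random(seed)
    observed = gap_variance(positions, genome_size)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = rng.sample(range(genome_size), len(positions))
        if gap_variance(shuffled, genome_size) >= observed:
            hits += 1
    return hits / n_shuffles

p = permutation_p_value(CLASS_POSITIONS, GENOME_SIZE)
print(f"clustering p-value ~ {p:.3f}")  # a small p suggests non-random order
```

A real analysis would of course use richer statistics and genuine annotation data; the point is only that "non-random order" is a testable, quantitative claim.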

Since genomes, not genes, have become the objects of interest, we have to study them as wholes, and to compare genomes with other genomes, not simply genes with genes or proteins with proteins. We now have the catalog of basic metabolites (a few hundred are needed to make a cell work: not much more than Mendeleev's table of the elements), and of basic functions. However, one of the main predictions of the genetics of genomes is that it is algorithmic, therefore able to create new products (metabolites, genes, functions...) as time elapses. Life is open-ended, not self-limiting.

Genomics integrates the study of the organisms in their life condition:

in vivo

it requires discovering all the components they are made of, and studying their structure and dynamics:

in vitro

finally it must now use experiments with computers to study the genome as an enciphered text written in an unknown language:

in silico

This starts a cycle where biological knowledge is integrated with the genome text analysis, permitting the scientist to make predictions, which must be tested in vivo by reverse genetics (where altered genes are made to replace their normal parent in situ) and in vitro by characterizing the gene products and their interactions.

Genome analysis will yield major insights into the chemical definition of the nucleic acids and proteins involved in the construction of a living organism. Further insight comes from the chemical definition of the small molecules that are the building blocks of organisms, generated through intermediary metabolism. And, because the definition of life also involves the processes of metabolism and compartmentalisation, it is important to relate intermediary metabolism to genome structure, function and evolution. This requires elaborating systems for constructing actual metabolic pathways, dynamic modelling of metabolism when possible, and correlating pathways with genes and gene expression. In fact, much of the corresponding work cannot be of great use, because the data on which it rests have been collected from extremely heterogeneous sources, most often obtained by in vitro studies.

The initial need is to collect, organize and update the existing data. To be effective and lasting, data collection should proceed through the creation of specialized databases. Specialized databases have been developed to manage the flood of data issuing from whole-genome sequencing programs. They make it possible not only to bypass meticulous and time-consuming literature searches, but also to organize data into self-consistent patterns, through appropriate procedures aimed at illustrating collective properties of genes or sequences. In addition to sequence databases, it has therefore become important to create databases where the knowledge progressively acquired on intermediary metabolism can be symbolised, organized and made available for interrogation according to multiple criteria.
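
To make the idea of a queryable metabolic database concrete, here is a minimal sketch, with made-up, heavily lumped reactions (real pathway databases hold thousands of curated entries): reactions are stored as substrate-to-product edges, and a plausible chain of reactions between two metabolites is recovered by breadth-first search over the resulting graph.

```python
from collections import deque

# Toy reaction table: (substrate, product, enzyme). Illustrative only.
REACTIONS = [
    ("glucose", "glucose-6-P", "hexokinase"),
    ("glucose-6-P", "fructose-6-P", "phosphoglucose isomerase"),
    ("fructose-6-P", "fructose-1,6-bisP", "phosphofructokinase"),
    ("fructose-1,6-bisP", "pyruvate", "lower glycolysis (lumped)"),
    ("pyruvate", "acetyl-CoA", "pyruvate dehydrogenase"),
    ("pyruvate", "lactate", "lactate dehydrogenase"),
]

def build_graph(reactions):
    """Index reactions as substrate -> [(product, enzyme), ...]."""
    graph = {}
    for substrate, product, enzyme in reactions:
        graph.setdefault(substrate, []).append((product, enzyme))
    return graph

def find_pathway(graph, start, goal):
    """Shortest chain of enzymes from start to goal (None if unreachable)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        metabolite, path = queue.popleft()
        if metabolite == goal:
            return path
        for product, enzyme in graph.get(metabolite, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, path + [enzyme]))
    return None

graph = build_graph(REACTIONS)
pathway = find_pathway(graph, "glucose", "acetyl-CoA")
print(" -> ".join(pathway))
```

Such a structure is what makes "in silico assays of plausible pathways" possible before any bench test: the query is a graph search, and the answer is a candidate pathway to be validated in vivo.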

Using organized data, it will become possible to make in silico assays of plausible pathways or regulation before being in a position to make the actual test in vivo. Such well-constructed databases will also permit investigation of properties of life which go back to the origin of life, placing biology in a situation which is not unlike that of cosmology.

Fortunately, in silico analysis permits one to organise knowledge. To generate new knowledge, why not explore the neighborhoods of biological objects, taking genes as starting points and stressing that each object exists in relation with other objects? Inductive and abductive exploration consists in finding all the neighbors of each given gene. "Neighbor" has here the largest possible meaning: it is not simply a geometrical or structural notion. Each neighborhood sheds a specific light on a gene, its function being sought by bringing together the objects of the neighborhood. A natural neighborhood is proximity on the chromosome. Another interesting neighborhood is similarity between genes or gene products. Genes can have a similar codon usage bias: this is another neighborhood, as is similarity in molecular mass or isoelectric point. Also, a gene may have been studied by scientists in laboratories all over the world, and it can display features that refer to other genes: its neighbors will then be the genes found together with it in the literature (genomics "in libro", which will require new software able to extract information from articles automatically). We do not possess heuristics permitting direct access to unknown functions, and this should make clear to us that in silico analysis will never replace validation in vivo and in vitro: let us hope that propagation of erroneous assignments of functions by automatic interpretation of the genome texts will not hinder discoveries. Knowing genome sequences is a marvelous feat, but it is the starting point, not the end.
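
One of the neighborhoods named above, similarity of codon usage bias, can be sketched in a few lines. The sequences below are short, made-up coding sequences (real analyses use whole genes and more refined measures): each gene is turned into a codon-frequency vector, and its neighbors are the genes at smallest Euclidean distance from that vector.

```python
from collections import Counter
from math import sqrt

# Toy coding sequences (start codon, a few codons, stop codon). Made up.
GENES = {
    "geneA": "ATGAAAGAAGAAAAAGAATAA",
    "geneB": "ATGAAAGAAAAAGAAGAATAA",  # same codon composition as geneA
    "geneC": "ATGCTGCTGCTGCTGCTGTAA",  # very different codon usage
}

def codon_frequencies(seq):
    """Relative frequency of each codon in an in-frame coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def distance(f1, f2):
    """Euclidean distance between two codon-frequency vectors."""
    keys = set(f1) | set(f2)
    return sqrt(sum((f1.get(k, 0) - f2.get(k, 0)) ** 2 for k in keys))

def codon_usage_neighbors(genes, name):
    """All other genes, ranked by closeness of codon usage to `name`."""
    freqs = {g: codon_frequencies(s) for g, s in genes.items()}
    others = [g for g in genes if g != name]
    return sorted(others, key=lambda g: distance(freqs[name], freqs[g]))

print(codon_usage_neighbors(GENES, "geneA"))
```

The same ranking pattern applies to the other neighborhoods the text lists (chromosomal proximity, molecular mass, isoelectric point, literature co-occurrence): only the distance function changes.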

This global view of genomes means that it becomes impossible to study genes in isolation. Once genomes are known, it is necessary to understand the collective behavior of gene products. This means expression-profiling experiments both at the level of transcripts and at the level of protein products. In parallel, the system dual to the genes is the metabolite system: one must monitor the fate of all metabolites in the cell, as a function of its genetic and physico-chemical environment. This corresponds to the development of "functional genomics" techniques, all of which require highly evolved mathematical and statistical approaches.
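
The simplest of those statistical approaches can be illustrated on made-up transcript levels measured across four conditions (the genes and values below are hypothetical): Pearson correlation between expression profiles flags genes whose levels rise and fall together, making them candidates for shared regulation.

```python
from math import sqrt

# Hypothetical transcript levels for three genes across four conditions.
PROFILES = {
    "geneX": [1.0, 2.0, 4.0, 8.0],
    "geneY": [2.1, 4.2, 7.9, 16.0],  # roughly proportional to geneX
    "geneZ": [8.0, 4.0, 2.0, 1.0],   # anti-correlated with geneX
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

for gene in ("geneY", "geneZ"):
    r = pearson(PROFILES["geneX"], PROFILES[gene])
    print(f"r(geneX, {gene}) = {r:+.2f}")
```

Real expression-profiling work layers clustering, normalisation and multiple-testing corrections on top of such pairwise measures, which is precisely why the text insists on highly evolved mathematical and statistical approaches.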




