Natura nusquam magis est tota quam in minimis
Plinius
Genome Studies as a Change of Paradigm in Biology
This text, written in 2003, is now regularly updated.
Training and technology transfer
The philosophy underlying genome studies is gaining momentum, and it is worthwhile to help it develop in the Hong Kong region. This requires setting up integrated research that associates predictive work using computers (in silico experiments) with more usual experiments at the bench (in vivo and in vitro). Integrating in vivo, in vitro and in silico approaches requires teamwork and a permanent transfer of concepts and technology from various disciplines and environments. An important point is to make it properly understood that technology transfer always requires explicit acknowledgement of the sources of the technology, be they conceptual or technical, so that the background is well understood (in addition, of course, to the standards of scientific ethics). Work in genomics therefore needs permanent training of the personnel of the Centre with reference to work at other places, in particular at the Institut Pasteur in Paris. This means that we both have to be trained at the appropriate places in the world (including in-house) and have to help train others. Several Faculties at the University of Hong Kong (in particular highly conceptual departments, such as the Department of Mathematics) contribute to the overall efficacy of the training. Since the end of 2001, a working seminar has been held at the Department of Mathematics (Pr Ngaiming Mok), where students and scientists from the Hong Kong region gather around Antoine Danchin to reflect on Conceptual Biology (Meng Wah Building, Wednesday, 2:30 pm). An account of each meeting is sent to the participants as well as to all those belonging to the Stanislas Noria network (Causeries du jeudi), which will resume its current activities in Paris in April 2003.
In terms of experimental biology, the "Jacques Monod" practical courses organized each year at the Institut Pasteur in Paris are the model we aim to implement in the future. The contribution from the Institut Pasteur to the University of Hong Kong will draw on the resources maintained at the Institut Pasteur, with scientists coming for several weeks to organize lectures and practical courses at the Centre.
Once the Centre's activity is firmly established, we shall build a training module that will use, as much as possible, the possibilities offered by the World-Wide Web, in particular by providing E-education facilities.
(1) to create knowledge- and education-related information resources based on laboratory experiments and "in silico" analysis
(2) to create a culture of basic research allowing scientists to monitor the surveillance, prevention and cure of emerging diseases
(3) to offer new areas for technology development used in large-scale industrial applications, fostering future research
Science evolves through an intricate association between the creation of concepts and techniques (see A Western Imbroglio) and a constant dialogue back and forth between discoveries and applications. The future of genomics is therefore impossible to separate from the future of the associated techniques, among which the development of computer science and the mathematics of integers will play a key role:
[Diagram: Discoveries, Theorems, Predictions, Techniques, Applications]
[Table: Given: Facts, Hypotheses (explicit), Heuristics (implicit). Approach: Hypothesis-driven, Data-driven, Context-driven. Constructed: Deduction, Induction, Abduction.]
Science normally proceeds by the hypothetico-deductive method, which confronts a model of Reality with the results of actual experiments. While this method is efficient at setting the stage and producing a strong theoretical background for the progress of science, used alone it cannot lead to discovery. Discovery cannot be planned. Discovery-driven research therefore has to combine this standard (Greco-Latin) way with the more Anglo-American Data-driven and the Chinese Context-driven approaches.
A metaphor: the Delphic boat
Amongst the questions asked by the Oracle of Delphi,
Pythia, was a fundamental question directly related to the nature of the
artefacts produced and used by living organisms – an enigma, as the
Oracle’s questions always were. If we consider a boat made of planks,
carefully fitted together, we may well ask, what is it that makes the boat
a boat? This question is more than just a mind game, as is clear from the
fact that as time passes, some of the planks begin to rot and have to be
replaced. There comes a time when not one of the original planks is left.
The boat still looks like the original one, but in material
terms it has changed. Is it still the same boat? The owner would certainly
say yes, this is my boat. Yet none of the material it was originally built
from is still there. If we were to analyse the components of the boat, the
planks, we would not learn very much. We can see this if we take the boat
to pieces: it is reduced to a pile of planks – but they are not the same
ones as at the beginning! The physical nature of these objects plays some
role of course – a boat made from planks of oak is different from a boat
made from planks of pine – but this is fairly incidental. (It is very
important to remember this when we think about the possibility of life
existing elsewhere in the universe – there is absolutely no reason why it
should be made of the same molecules as life on Earth.) What is important
about the material of the planks, apart from their relative stability over
time, is the fact that it allows them to be shaped, so that they relate to
each other in a certain way. The boat is not the material it is made from,
but something else, much more interesting, which organizes the material of
the planks: the boat is the relationship between the planks. Similarly,
the study of life should never be restricted to objects, but must look
into their relationships. This is why a genome cannot be regarded as
simply a collection of genes. It is much more than that.
Studying relationships is essentially what Georges
Cuvier was doing – and what paleontologists still do – when he took
a few bones of a long-extinct animal, or even sometimes a single tooth,
and proposed a reconstruction of the entire creature. This importance of
relationships is not a trivial property, to be noted in passing, but a
hard fact with considerable practical and theoretical implications, and we
will come back to this at length when we look at theories of biological
information, in the next chapter. The fundamental importance of
relationships, which represent a particular interpretation of form, was
noticed more than 2,500 years ago by Empedocles and many of the pre-Socratic
philosophers. St Thomas Aquinas also refers to it when he analyses
the philosophical status of the concept of creation: “when motion is
taken away, only different relations remain.”
A renewed future for Darwinism
When Darwin wrote The Origin of Species, the concept of the gene did not yet exist. The idea that the evolution of species occurred by progressive transformation had been developed by Lamarck, but within the pre-atomist paradigm of the four elements (Fire, Air, Water and Earth). Darwin reinvented the selective theory proposed by Empedocles, adding to the combination of variation and selection the biological power of amplification through the multiplication of individuals, as then recently developed by Malthus.
The Empedoclean, Maupertuisian, Darwinian trio
Variation / Selection / Amplification
Evolution
Function
Structure
Sequence
states that material systems evolve, creating functions which, to be implemented, capture (or recruit) existing structures (hence the "tinkering" aspect of the development of life). Molecular genetics, and then genomics, added to this general driving pattern the algorithmic nature of DNA sequences. A consequence is that, in general, structure does not tell function. Therefore, to understand what life is using the genome texts, we must add biological knowledge (including the lifestyle of the organisms of interest) to our knowledge of genomes.
A new paradigm: genetics of genomes
Until the first genome sequences were deciphered, life was studied in bits and pieces: organisms, organs, cells, genes, transcripts, proteins, metabolites... This analytic attitude, often called "reductionist", was similar to that of a clockmaker disassembling a clock: the heap of pieces neither makes the clock work nor allows it to be understood. The first analyses of the genome text showed that the order of genes in genomes is not random (see these references for first examples - A, B - of non-random distribution of sequences in bacterial DNA). It is therefore no longer possible to study only individual genes or proteins if one wishes to understand the processes of life. We need to use large-scale techniques that monitor simultaneously the fate of many cells, genes, transcripts, proteins or metabolites. It then becomes necessary to integrate these data into a consistent picture that leads to an explanation of what we witness. This is the goal of genomics (the widely used "post-genomics" is a useless oxymoron, meaning in fact "post-sequencing").
Since genomes, not genes, have become the objects of interest, we have to study them as a whole, and to compare genomes with other genomes, not simply genes with genes or proteins with proteins. We now have the catalog of basic metabolites (a few hundred are needed to make a cell work: this is not much more than Mendeleev's table of atoms), and of basic functions. However, one of the main predictions of the genetics of genomes is that it is algorithmic, and therefore able to create new products (metabolites, genes, functions...) as time elapses. Life is open-ended, not self-limiting.
Genomics integrates the study of organisms in their living conditions:
in vivo
it requires discovering all the components they are made of, and studying their structure and dynamics:
in vitro
finally, it must now use experiments with computers to study the genome as a ciphered text written in an unknown language:
in silico
This starts a cycle where biological knowledge is integrated with the genome text analysis, permitting the scientist to make predictions, which must be tested in vivo by reverse genetics (where altered genes are made to replace their normal parent in situ) and in vitro by characterizing the gene products and their interactions.
Genome analysis will yield major insights into the
chemical definition of the nucleic acids and proteins involved in the
construction of a living organism. Further insight comes from the
chemical definition of the small molecules that are the building blocks
of organisms, through the generation of intermediary metabolism. And,
because life requires also in its definition the processes of metabolism
and compartmentalisation, it is important to relate intermediary
metabolism to genome structure, function and evolution. This requires
elaborating systems for constructing actual metabolic pathways and,
when possible, dynamic models of metabolism, and correlating
pathways to genes and gene expression. In fact, most of the
corresponding work cannot be of much use because the data on which it
rests have been collected from extremely heterogeneous sources, and most
often are obtained by in vitro studies. The initial need is to collect,
organize and update the existing data. In order to be effective and
lasting, data collection should proceed through the creation of specialized
databases. To manage the flood of data issuing from the programs
that aim at sequencing whole genomes, specialized databases have been
developed. They make it possible not only to bypass meticulous and
time-consuming literature searches, but also to organize data into
self-consistent patterns through the use of appropriate procedures that
aim to illustrate collective properties of genes or
sequences. In addition to sequence databases, it has then become
important to create databases where the knowledge progressively acquired
on intermediary metabolism could be symbolised, organized and made
available for interrogation according to multiple criteria.
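As a rough illustration of what "interrogation according to multiple criteria" could look like, here is a minimal sketch using an in-memory SQLite database. The schema, the gene names, the compounds and the pathway labels are all hypothetical placeholders chosen for the example, not curated data and not the design of any existing database.

```python
import sqlite3

# Hypothetical toy schema: each row links a gene to the reaction its product
# catalyses. Gene, compound and pathway names are illustrative placeholders.
def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reaction ("
        " gene TEXT, substrate TEXT, product TEXT, pathway TEXT)"
    )
    rows = [
        ("geneA", "compound1", "compound2", "pathwayX"),
        ("geneB", "compound2", "compound3", "pathwayX"),
        ("geneC", "compound2", "compound4", "pathwayY"),
    ]
    conn.executemany("INSERT INTO reaction VALUES (?, ?, ?, ?)", rows)
    return conn

def query_reactions(conn, **criteria):
    """Return the genes whose reaction matches every column=value criterion."""
    allowed = {"gene", "substrate", "product", "pathway"}
    unknown = set(criteria) - allowed
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    # Column names come from the fixed whitelist above; values are bound
    # as parameters rather than pasted into the SQL string.
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1 = 1"
    sql = f"SELECT gene FROM reaction WHERE {where} ORDER BY gene"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]

conn = build_demo_db()
print(query_reactions(conn, substrate="compound2"))
print(query_reactions(conn, substrate="compound2", pathway="pathwayY"))
```

The same table can be queried by gene, by metabolite or by pathway, alone or in combination, which is the point of symbolising the knowledge rather than leaving it scattered in the literature.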
Using organized data, it will become possible to make in
silico assays of plausible pathways or regulation before being in
a position to make the actual test in vivo. Such well-constructed
databases will also permit investigation of properties of life which go
back to the origin of life, placing biology in a situation which is not
unlike that of cosmology.
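An in silico assay of a plausible pathway can be sketched, under strong simplifying assumptions, as a search through known reactions: given a starting metabolite and a target, is there a chain of gene-encoded reactions connecting them? The reaction list, the gene names and the function `plausible_pathway` below are hypothetical, and the breadth-first search is only one possible choice of method. A positive answer is a prediction that still has to be tested in vivo.

```python
from collections import deque

# Hypothetical reactions as (gene, substrate, product). Placeholders only.
REACTIONS = [
    ("geneA", "m1", "m2"),
    ("geneB", "m2", "m3"),
    ("geneC", "m2", "m4"),
    ("geneD", "m4", "m5"),
]

def plausible_pathway(reactions, start, target):
    """Breadth-first search over reactions.

    Returns the shortest list of genes leading from `start` to `target`,
    an empty list if they are the same metabolite, or None if no chain
    of known reactions connects them.
    """
    by_substrate = {}
    for gene, substrate, product in reactions:
        by_substrate.setdefault(substrate, []).append((gene, product))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        metabolite, genes = queue.popleft()
        if metabolite == target:
            return genes
        for gene, product in by_substrate.get(metabolite, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, genes + [gene]))
    return None

print(plausible_pathway(REACTIONS, "m1", "m5"))
```

The sketch ignores reversibility, cofactors, stoichiometry and regulation, all of which a real metabolic model would have to take into account.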
Fortunately, in silico analysis permits one to organise knowledge. To generate new knowledge, why not explore the neighborhoods of biological objects, taking genes as starting points and stressing that each object exists in relation to other objects? Inductive and abductive exploration will consist in finding all the neighbors of each given gene. "Neighbor" has here the broadest possible meaning: it is not simply a geometrical or structural notion. Each neighborhood sheds a specific light on a gene, whose function is sought as what brings together the objects of the neighborhood. A natural neighborhood is proximity on the chromosome. Another interesting neighborhood is similarity between genes or gene products. Genes can have similar codon usage bias: this is another neighborhood, as is similarity in molecular mass or isoelectric point. Also, a gene may have been studied by scientists in laboratories all over the world, and it can display features that refer to other genes: its neighbors will then be the genes found together with it in the literature (genomics "in libro", which will require the construction of new software able to extract information from articles automatically). We do not possess heuristics permitting direct access to unknown functions, and this should make clear to us that in silico analysis will never replace validation in vivo and in vitro: let us hope that the propagation of erroneous function assignments by automatic interpretation of the genome texts will not hinder discoveries. Knowing genome sequences is a marvelous feat, but it is the starting point, not the end.
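Two of the neighborhoods mentioned above, proximity on the chromosome and similarity of codon usage, can be sketched in a few lines. Everything named here is an assumption made for illustration: the gene names, coordinates and sequences are invented, the chromosome is treated as linear (circularity is ignored), and cosine similarity between codon frequency vectors is simply one convenient measure, not an established definition of codon usage bias.

```python
from collections import Counter
from math import sqrt

def chromosome_neighbors(positions, gene, window):
    """Genes whose start coordinate lies within `window` bases of `gene`."""
    origin = positions[gene]
    return sorted(
        other for other, pos in positions.items()
        if other != gene and abs(pos - origin) <= window
    )

def codon_usage(sequence):
    """Relative frequency of each codon in a coding sequence read in frame."""
    usable = len(sequence) - len(sequence) % 3  # drop a trailing partial codon
    counts = Counter(sequence[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def codon_neighbors(sequences, gene, threshold):
    """Genes whose codon usage is at least `threshold` similar to `gene`."""
    ref = codon_usage(sequences[gene])
    return sorted(
        other for other, seq in sequences.items()
        if other != gene
        and cosine_similarity(ref, codon_usage(seq)) >= threshold
    )

# Invented toy data, for illustration only.
POSITIONS = {"g1": 1000, "g2": 2500, "g3": 90000}
SEQUENCES = {"g1": "ATGAAAAAAGCG", "g2": "ATGAAAGCGAAA", "g3": "ATGCCCCCCGGG"}

print(chromosome_neighbors(POSITIONS, "g1", 5000))
print(codon_neighbors(SEQUENCES, "g1", 0.9))
```

Each function returns a different set of neighbors for the same gene. Comparing those sets, rather than trusting any single one, is closer to the spirit of the exploration described in the text.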
This global view of genomes means that it becomes impossible to study genes in isolation. Once genomes are known, it is necessary to understand the collective behavior of gene products. This means expression-profiling experiments at the level of both transcripts and protein products. In parallel, the dual of the gene system is the metabolite system: one must monitor the fate of all metabolites in the cell as a function of its genetic and physico-chemical environment. This corresponds to the development of "functional genomics" techniques, all of which require highly sophisticated mathematical and statistical approaches.
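As a very simple example of the statistical side of expression profiling, the sketch below groups genes whose expression levels rise and fall together across a series of conditions, using the Pearson correlation coefficient. The profiles are invented numbers, the function names are hypothetical, and real analyses would need normalisation, replicates and correction for multiple testing, none of which is attempted here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat profile carries no co-variation information
    return cov / (spread_x * spread_y)

def coexpressed(profiles, gene, threshold):
    """Genes whose profile correlates with `gene` at or above `threshold`."""
    ref = profiles[gene]
    return sorted(
        other for other, values in profiles.items()
        if other != gene and pearson(ref, values) >= threshold
    )

# Invented expression levels over four conditions, for illustration only.
PROFILES = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}

print(coexpressed(PROFILES, "g1", 0.9))
```

Co-expression defines yet another neighborhood of a gene. Like the others, it suggests hypotheses about function that must then be validated in vivo and in vitro.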