'Αρχαἱ οὖν ἰατρικῆς τρεῖς ἡ μὲν εὑρσεως, ἡ δὲ εκ τοῦ συστησασθαι τὴν τεχνην, ἡ δὲ ὑφηγησεως.


The principles of medicine are therefore three: that of the discovery, that of the formation of the technique and that of the explanation.

Introduction to The Physician by GALEN

This page follows a page first developed in Hong Kong at the HKU-Pasteur Research Centre, then at the Unit Genetics of Bacterial Genomes in Paris. It provides information, some of which is original (this site and this page are free to use but, like open-access programs, are Copyleft-protected to guarantee this freedom), as well as links that may help you trace other relevant information and insight into the topics you are interested in. Understanding biology requires being able to write or speak about biological facts and concepts, so some reading may be useful. Links to the World-Wide Web are provided to help you find relevant information. In addition, we refer to our own publications, meant to communicate both basic and highly specialized knowledge. A page is devoted to genomics, but broader information can be found in The Delphic Boat (2003, Harvard University Press, Cambridge, USA) and in popularisation articles, which are cited as needed.

Questions about genes, genetics and genomes

Last update: 5 July 2012

Note that this page is sometimes modified, following reactions of the public. You can be informed of changes by using facilities at this link.

What is a gene?
A gene is a piece of a long molecule, DNA (standing for DeoxyriboNucleic Acid), which is made by chaining together four similar chemicals named nucleotides (often abbreviated to the common but somewhat misleading expression "bases"). Four types of nucleotides, noted A, T, G and C, form the DNA molecule. The DNA molecule is made of two strands twisted around each other as in a helical staircase.


The chain makes a long thread, which can be written as a text, for example: TAATTGCCGCTTAAAACTTCTTGACGGCAA etc.
This is because the building blocks (which can be referred to as "letters") are of four, and only four, types.
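The idea that DNA can be written as a text can be made concrete with a small sketch. It checks that a text uses only the four letters and derives the text of the second strand. The pairing rule (A with T, G with C) is standard biology but is not stated in the text above, and the function names are only illustrative.

```python
# Toy illustration: a DNA strand treated as a text over a four-letter alphabet.
ALPHABET = set("ATGC")

# Assumption (standard base pairing, not stated in the text): A pairs with T, G with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_dna(text: str) -> bool:
    """Return True if the text is non-empty and uses only the letters A, T, G, C."""
    return len(text) > 0 and set(text) <= ALPHABET


def reverse_complement(text: str) -> str:
    """Return the text of the opposite strand, read in its own direction."""
    return "".join(PAIRING[letter] for letter in reversed(text))


strand = "TAATTGCCGCTTAAAACTTCTTGACGGCAA"
print(is_dna(strand))
print(reverse_complement(strand))
```

Applying `reverse_complement` twice gives back the original text, which reflects the fact that each strand fully determines the other.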

With this definition of DNA, one can propose an abstract (formal) definition of a gene:
A gene is a piece of DNA with identified borders which, when altered (that is, when the text is changed by transposition, inversion, substitution, insertion or deletion of one or more letters), will change, either immediately or in the future, the general properties of the organism which has the new (mutated) gene.
For example, genes define the colour of the eyes, skin, hair etc.
This definition is the one used in formal genetics. In practice, scientists tend to restrict the use of the term "gene" to a part of the DNA text, for the following reasons. Changes in the DNA sequence do not always result in visible modifications of the organism as compared to the "normal" type. In some cases nothing is visible, and this is the basis of the conjecture, proposed by some, that certain modifications are "neutral", i.e. have no effect whatsoever. In fact, there is polymorphism in the DNA sequence (i.e. relatively small variations, on average one letter in one thousand in human genes). This polymorphism is not without consequences, however, but the consequences are often only visible after many generations, during the evolution of the species. A gene is therefore usually thought of as a collection of related DNA sequences, alleles, which can usually perform equivalently in organisms of the same species. It thus becomes an abstract notion, including not only the population of existing genes, but also those which would control the same aspect or behaviour of the organism.
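The five kinds of change named in the formal definition (substitution, insertion, deletion, inversion, transposition) can be sketched as operations on a text. This is only a toy model on a single strand: in real double-stranded DNA an inversion yields the reverse complement of the segment, whereas here the letters are simply reversed. The function names and positions are illustrative.

```python
def substitute(text: str, pos: int, letter: str) -> str:
    """Replace the letter at position pos."""
    return text[:pos] + letter + text[pos + 1:]


def insert(text: str, pos: int, piece: str) -> str:
    """Insert a piece of text before position pos."""
    return text[:pos] + piece + text[pos:]


def delete(text: str, start: int, end: int) -> str:
    """Remove the letters text[start:end]."""
    return text[:start] + text[end:]


def invert(text: str, start: int, end: int) -> str:
    """Reverse the letters text[start:end] in place (simplified, single strand)."""
    return text[:start] + text[start:end][::-1] + text[end:]


def transpose(text: str, start: int, end: int, dest: int) -> str:
    """Cut text[start:end] and paste it at position dest of the remaining text."""
    piece = text[start:end]
    rest = text[:start] + text[end:]
    return rest[:dest] + piece + rest[dest:]


gene = "TAATTGCC"
print(substitute(gene, 0, "G"))
print(insert(gene, 4, "AAA"))
print(delete(gene, 2, 4))
print(invert(gene, 4, 8))
print(transpose(gene, 0, 2, 6))
```

Each function returns a new text and leaves the original unchanged, so the "normal" and "mutated" versions can be compared side by side.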
Furthermore, not all of the DNA plays the same role in the organism. Parts of the sequence in fact specify agents, named RNAs (another type of nucleotide polymer, similar, but not identical, to DNA) or proteins (chains made of 20 different types of building blocks, named amino acids), which perform all the tasks the cell has to deal with (constructing its architecture, managing its nutrients, controlling its behaviour...). There is a general tendency to consider only these sequences as genes. They are those named "genes" by journalists. However, the DNA text may have many roles which do not fit with this restricted definition. In particular, some regions may be important for controlling the synthesis of this or that product, while other regions would work as spacers (for the folding of the DNA into the nucleus of the cell) or timers (for the correct time control of syntheses). Therefore, one more appropriately speaks of CoDing Sequences (CDSs) for those regions of DNA which code for proteins. It is certainly misleading to restrict genes to CDSs, and this explains quite a number of quarrels in which philosophers interested in biology are easily caught. We shall nevertheless indulge in this simplification, which has the merit of providing a clear-cut, usually unambiguous definition. A more appropriate working metaphor is to see the DNA text as a computer program, where the CDSs act as routines. Of course, it is well known that a program consists of many more elements: control sections, comments, etc. Deleting a control region, for example, may prevent the action of a routine, and the behaviour of the program would then be identical to that of a program lacking the routine. The typical number of genes (in the restricted sense of CDSs, which is the sense used in databases) in the usual microbes of our environment (bacteria) is between 2,000 and 5,000.
The number of human genes is unknown, and there is currently much controversy about it. The majority of scientists favour a number around 30,000, counting from the number of genes identified in sequenced chromosomes. In contrast, if one assumes that genes having more or less the same function differ from each other in each cell type (there are more than 200 cell types in the human body), then the actual number may be higher. It will still take some time to be sure of the right figure. In any case, of course, it is not the number of genes that makes the complexity of an organism, but the relationships between them.
What is a CDS?
One speaks of CoDing Sequences (CDSs) for those regions of DNA which code for proteins. It is certainly misleading to restrict genes to CDSs, and this explains quite a number of quarrels in which philosophers interested in biology are easily caught. We shall nevertheless indulge in this simplification, which has the merit of providing a clear-cut, usually unambiguous definition. A more appropriate working metaphor is to see the DNA text as a computer program, where the CDSs act as routines.
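To give a feel for what "a region of DNA which codes for a protein" looks like in the text, here is a deliberately naive sketch that scans one strand for stretches beginning with the usual start signal (ATG) and ending, three letters at a time, at a stop signal (TAA, TAG or TGA). These signals belong to the standard genetic code, which is not described in the text above. Real gene finding is far more involved (both strands, interrupted genes, control regions), so this should be read as an illustration only. The function name and the `min_codons` parameter are hypothetical.

```python
START = "ATG"                    # usual start signal (standard genetic code)
STOPS = {"TAA", "TAG", "TGA"}    # stop signals (standard genetic code)


def find_cds_candidates(dna: str, min_codons: int = 2):
    """Return (start, end) positions of ATG...stop stretches on the given strand.

    The stretch dna[start:end] includes the start and the stop signal.
    min_codons is the minimum number of three-letter words before the stop.
    """
    found = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != START:
            continue
        # Read the text three letters at a time until a stop signal appears.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOPS:
                if (pos - start) // 3 >= min_codons:
                    found.append((start, pos + 3))
                break
    return found


text = "CCATGAAATTTTAGGC"
for start, end in find_cds_candidates(text):
    print(start, end, text[start:end])
```

In the program metaphor, each stretch found in this way plays the part of a routine, while the letters before and after it may hold the control sections that decide whether the routine is ever run.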
What is a genome?
The standard definition of a genome is that it is the whole collection of the genes of an organism. However, a genome is much more. It is not only the set of all genes present in the nucleus of the cells of the organism, but also the way these genes are organised next to each other. If you consider a recipe for your favourite dish, you can easily understand that having only the list of ingredients would not be enough to allow you to prepare the dish! A genome is the whole text of the DNA present in every cell of a living organism. It therefore comprises regions corresponding to CDSs, as well as all the regions in between ("intergenic regions").
The genome organisation is so important that in fact one can often tell something about the order of the genes in the genome, genes with related functions being often in some sort of proximity.
When is a genome completely sequenced?
When, on June 26th, 2000, the Human Genome Project (HGP) representatives, together with the leaders of the company Celera, announced that the project was completed, they stated that they had succeeded in producing a draft of the human genome sequence. What does this mean? Is the genome completely sequenced? Will it really be completely sequenced? In fact, not at all. And this explains many of the questions asked about the real reasons underlying this announcement. A draft is simply a random coverage of the genome sequence, determining the local sequence of pieces of the genome on the basis of a 4- to 5-fold redundancy (on average, the same base has been sequenced in 4 to 5 different, more or less overlapping fragments). The declaration referred to draft sequence data mostly in the form of 10,000 base pair fragments, which have been approximately located on each chromosome of the human genome. This is very far from having completed the Human Genome Sequence.
A genome is said to have been completed when high-quality sequences, covering most of the genome, have been obtained. It is very important here to differentiate between the bacterial genome programmes, where a complete genome sequence is usually announced only when the sequence is really completed, and all other programmes, where scientists announce completion well before the whole sequence is known! As a matter of fact, when in December 1999 the 56 Mb sequence of human chromosome 22 was declared completed, only 33.5 Mb had really been sequenced, in several pieces separated by gaps.

2022: The sequencing of the human genome is now truly complete, twenty years after the announcement of its completion.

In the same way, when in April 2000 it was announced that the complete sequence of the Drosophila melanogaster genome was known, only 120 Mb had been sequenced, in discontinuous pieces, out of more than 160 Mb for the whole genome. The same is true for the Caenorhabditis elegans genome, and even for the Saccharomyces cerevisiae genome sequence, which lacks the highly repetitive region containing about 150 to 200 genes coding for ribosomal RNA.
In order to obtain such a "complete" genome sequence, one has to start from the draft, close gaps, reduce ambiguities and order sequences with respect to each other. It is generally accepted today that a complete sequence does not tolerate more than one error in 10,000 bases. At this stage, with the standard random sequencing approaches (named "shotgun" approaches), one usually has a coverage of between 10 and 15 per base. But these sequences lack many regions where the DNA either cannot be cloned, or is so repetitive that it is impossible to assemble the sequenced fragments with respect to each other. In animal genomes these regions include the central parts of the chromosomes (the centromeres) and their ends (the telomeres). There are also, distributed along the chromosomes, other highly repetitive regions, usually visible in light microscopy as differently coloured (heterochromatic) when compared with the regions comprising most of the genes. One cannot, however, exclude the presence of some genes (CDSs) in these regions. It is therefore clear that there will be much discussion about the time when the Human Genome sequence will be completed!
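The figures quoted above can be put into a short calculation. The sketch below computes the fraction actually sequenced for the examples given, the number of errors tolerated at one error in 10,000 bases, and, under an idealised model that is not part of the text (fragments falling at random and independently, so that the chance of a base being missed is exp(-coverage)), the fraction of bases one would expect to remain unsequenced at a given coverage. The human genome size of about 3 billion bases is also an assumption not stated above, and the model ignores unclonable and repetitive regions, which is precisely why real gaps are larger.

```python
import math


def fraction_sequenced(sequenced_mb: float, total_mb: float) -> float:
    """Share of a genome (or chromosome) actually covered by sequence."""
    return sequenced_mb / total_mb


def expected_uncovered_fraction(coverage: float) -> float:
    """Idealised random-sampling model: chance that a given base is in no fragment."""
    return math.exp(-coverage)


def tolerated_errors(genome_bases: int, error_rate: float = 1 / 10_000) -> float:
    """Number of errors allowed at the quoted limit of one error in 10,000 bases."""
    return genome_bases * error_rate


print(f"Human chromosome 22: {fraction_sequenced(33.5, 56):.1%} sequenced")
print(f"Drosophila (at most): {fraction_sequenced(120, 160):.1%} sequenced")

for coverage in (4, 5, 10, 15):
    missed = expected_uncovered_fraction(coverage)
    print(f"coverage {coverage:>2}: about {missed:.5%} of bases expected to be missed")

# Assumption: a human genome of roughly 3 billion bases.
print(f"errors tolerated in a 'complete' human genome: {tolerated_errors(3_000_000_000):,.0f}")
```

Even in this optimistic model a 4- to 5-fold draft leaves a noticeable share of bases unread, whereas a 10- to 15-fold coverage leaves almost none, so the remaining gaps in "complete" sequences come mainly from regions that cannot be cloned or assembled.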
For updated reflections on the completion of the genome sequence, see here. Note that second-generation sequencing machines have entirely changed the way genomes are sequenced, with coverage frequently exceeding 100x.
What is a genetic disease?
Diseases, in the usual sense, are caused by foreign agents: chemicals (poisons), microbes or viruses. Some diseases, however, appear spontaneously, without an external causative agent. In a way, the simple fact of aging is such a disease (although it is perceived as a normal process). Unlucky persons sometimes see their health suddenly deteriorate, for example because their muscles progressively appear to wear out, until they can no longer walk. Others suddenly become deaf or blind (or can even be deaf or blind at birth). This can appear spontaneously in a sporadic way, but one often observes that other persons in the same family were affected at some point in their lives by the same ailments. This is an indication (but not a proof!) that the disease is transmitted in an inherited way.
Because we inherit our genes from each of our parents, we normally have two copies of each gene (except for the genes which are found on the sex chromosomes, X and Y). In most cases (not always) one functional copy is sufficient to make our body behave normally. However, it may happen that both copies have been affected by some mutation, so that the gene can no longer function in its normal way. In some other cases (fortunately rarer) a mutated copy of a gene can override the action of its normal counterpart. In such situations, we are facing a genetic disease. Because the potential disease can be hidden by the normal copy throughout many generations, genetic diseases may skip generations and appear after one had thought they were no longer present. In fact, the situation is even more surprising: because we have so many genes (more than 5,000 genes leading to genetic diseases have already been identified or suspected in Man), each of us is the carrier of several putative genetic diseases! But our progeny is generally unaffected, because the probability of finding the same gene mutated in our spouse is usually very low.
This is not so, however, if we marry close cousins. Nor is it so in small populations which intermarry often. And indeed, this is where genetic diseases are most frequently found. This also explains the observation that some genetic diseases are more frequent in Asians or in Caucasians, for example. As an interesting consequence, interethnic marriages are usually much less prone to genetic diseases than more inbred marriages... This observation may account for many marriage rules in human societies, where incest is generally prohibited and strong rules tend to balance endogamy (marriage inside the community) and exogamy (marriage outside the community).
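The reasoning above, where a disease appears only when both copies of a gene are mutated, can be checked by simply listing the possible children of two parents. In this sketch each parent is written with two letters, "A" for a functional copy and "a" for a mutated copy. The carrier frequency of 1 in 100 is purely hypothetical, chosen for illustration, and the 1 in 8 chance that a first cousin carries the same mutated copy is a standard result of genetics that is not stated in the text.

```python
from fractions import Fraction
from itertools import product


def affected_probability(parent1: str, parent2: str) -> Fraction:
    """Chance that a child receives two mutated copies ('a'), one from each parent."""
    children = [copy1 + copy2 for copy1, copy2 in product(parent1, parent2)]
    affected = sum(1 for child in children if child == "aa")
    return Fraction(affected, len(children))


both_carriers = affected_probability("Aa", "Aa")
one_carrier = affected_probability("Aa", "AA")

# Hypothetical figure, for illustration only: 1 person in 100 carries the mutated copy.
CARRIER_FREQUENCY = Fraction(1, 100)
# Standard genetics (not from the text): a first cousin of a carrier has about
# 1 chance in 8 of carrying the same mutated copy, inherited from a shared grandparent.
COUSIN_SHARES_COPY = Fraction(1, 8)

risk_unrelated_spouse = CARRIER_FREQUENCY * both_carriers
risk_first_cousin = COUSIN_SHARES_COPY * both_carriers

print("two carriers:", both_carriers)
print("one carrier only:", one_carrier)
print("carrier with unrelated spouse:", risk_unrelated_spouse)
print("carrier with first cousin:", risk_first_cousin)
```

With these illustrative numbers the risk for a child is much higher in a marriage between first cousins than with an unrelated spouse, which is the effect described above for close cousins and small intermarrying populations.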
What is pharmacogenomics?
The definition of a gene has told us that it is an abstract concept comprising a collection of related sequences, named alleles. This means that there is variation from individual to individual of the same species, corresponding to some polymorphism in the gene sequence. Most often this polymorphism has no known or visible effect. However, it may correspond to a subtle difference in the behaviour of a gene product depending on the environment. This may result, for example, in different persons responding differently to nutrients. In general this is not a problem, because everybody tends to adapt their environment (food in particular) to their needs. There is one case, however, when this cannot be controlled easily: when one has to take drugs. In fact the fate of drugs differs somewhat from person to person (this is why, on the label of drugs, there are instructions for use, and words of caution describing contra-indications). Sometimes this yields very dangerous situations: the drug may be highly toxic to some persons, while it is quite innocuous for others (this is the situation when one is allergic to a drug, for instance). Because this is due to the presence of some gene allele in a person, it would be useful to know beforehand how each person would react to a given drug. The main purpose of pharmacogenomics is precisely to link gene polymorphism with drug use, so that one could choose the proper treatment for a person, minimizing or counteracting the possible adverse effects.
What is biotechnology?
Biotechnology is a hybrid concept created to summarize progress in technologies that use living organisms or biological systems as the basis of any type of technological process (usually in the agro-food industry or in medicine, but also in other kinds of technologies). Man has been using biotechnology since the discovery of fire but, as a reasoned technology, it probably began with the Neolithic age, some ten thousand years ago. At that time Man invented the domestication of cattle and the planned growth of herbaceous plants and of trees, and began consciously to produce fermented food (dairy products, beer and wine), as well as plant-derived textiles, paper, and wood processed from planted trees.
Most familiar food items are derived from biotechnological processes: this is the case of cheese and yoghourt, of dry sausage, of fermented cabbage (sauerkraut, and the fermented cabbage found in northern China), of bread, of sauces and vinegar, and of many types of beverages, beer and wine in particular. All of these products are produced with the help of cultured microorganisms: bacteria and fungi (yeasts for beer and wine, and molds for cheese). However, the term biotechnology has been taken by many, journalists in particular, to mean only the use of genetic engineering and associated techniques, in a variety of applications from medicine to agriculture. With this connotation people usually forget that biotechnology is one of the oldest technologies used by Man.
In the process of fermentation, single-cell organisms, such as yeasts, molds or bacteria, grow in media enriched with sugars, starches or wastes derived from plants (such as corn steep). During this process they produce alcohol, lactic acid, acetic acid and carbon dioxide, as well as a large variety of other by-products specific to the species and growth media used. The foam and alcohol in beer are the result of this process, as are the holes in bread and some cheeses, or the acidity of yoghourt.
What is genetic engineering?
Genetic engineering is the process by which the normal genome of an organism is modified by inactivation or alteration of some of its own genes, as well as by introduction of other natural or artificial genes. A well-known application of genetic engineering is the production of natural components of the body in a foreign host, in quantities sufficient for therapeutic use, after appropriate gene sequences have been placed in the production organism (often bacteria or yeasts). The pharmaceuticals produced in this way are meant to be identical to the naturally occurring materials. In contrast to genetically engineered products, traditional drugs are produced through synthetic organic chemistry and are often less adapted to the human host in their activity. This can result in numerous side effects limiting the utility of the drug. Genetically engineered products are usually proteins and have a very specific physiological role. They are to be preferred to artificial products because they will have fewer undesirable side effects associated with them. This is the case, for example, of human insulin used by diabetic patients and produced by bacteria or yeasts, which is better tolerated than the insulin formerly isolated from pigs.
To make these products, scientists use techniques of DNA recombination to introduce into bacteria, yeast or cultured animal cells, in the form of DNA pieces, the information needed to produce a human protein that has therapeutic potential. This is done by isolating the DNA and identifying the sequence of the gene of interest, cloning it (i.e. recovering it in high quantities from appropriate cells) or synthesizing it chemically, and then placing it back into the cultured cells by an appropriate "cut and paste" process.
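The "cut and paste" step can be pictured on DNA written as text. The sketch below opens a host sequence at a recognition site and pastes a piece of DNA into the opening. It assumes the enzyme EcoRI, which recognises GAATTC and cuts after the first G; the text does not name any particular enzyme. The host and insert sequences are invented for illustration, and the model ignores the second strand and the sealing step carried out by other enzymes.

```python
SITE = "GAATTC"   # assumption: recognition site of the enzyme EcoRI
CUT_OFFSET = 1    # EcoRI cuts after the first letter of its site: G^AATTC


def cut(dna: str):
    """Cut a linear DNA text at its first recognition site; return the two pieces."""
    pos = dna.find(SITE)
    if pos == -1:
        raise ValueError("no recognition site found")
    return dna[:pos + CUT_OFFSET], dna[pos + CUT_OFFSET:]


def paste(host: str, piece: str) -> str:
    """Open the host at its recognition site and paste the piece into the opening."""
    left, right = cut(host)
    return left + piece + right


# Invented sequences, for illustration only.
host = "TTGAATTCGG"
gene_of_interest = "ATGAAATAA"
# A piece cut out with the same enzyme starts with AATTC and ends with G.
piece = "AATTC" + gene_of_interest + "G"

recombinant = paste(host, piece)
print(recombinant)
print(recombinant.count(SITE))
```

Because the piece was prepared with the same enzyme, the recognition site is rebuilt on both sides of the pasted gene, which is what allows the same piece to be cut out again later.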
Once engineered, the reprogrammed cells can be grown in large quantities often using the technique of fermentation. Engineered cells will thus produce the protein of interest in large quantities. This recombinant protein will either be found inside the cells or in the surrounding medium depending upon the way in which the cell was engineered. It is subsequently extracted and used as a medicine or for other industrial processes.
What is a drug target?
Life is defined by the processes of metabolism and compartmentalization. This means that molecules are constantly transformed inside cells: some are being built up while others are destroyed, generating both energy and building blocks for construction. A poison is a molecule which interferes with one of these processes. A drug is a category of poison which is meant to interfere only with the metabolism of a special set of cells, which may be the patient's own cells (for example tumor cells) or microbes. This interference, which blocks some part of metabolism, is what gives the drug its specificity. It takes place where the drug interacts with some object, usually an enzyme, a receptor or an enzyme complex, needed for the cell's metabolism. A drug target is a particular subset of these places, chosen because interference there will both effectively prevent the metabolism of the target cell and be relatively innocuous to the host. The most common and efficient situation is that of pathogenic microbes. Indeed, their metabolism is often so different from that of man or animals that many enzymes exist which have no counterpart in the latter, while being essential for the microbe to multiply or to survive. This is the basis of most antibiotics. For example, the molecules of the penicillin family interact with enzymes essential for building the microbe's envelope, so that the microbe is unable to survive in body fluids. These enzymes do not exist in man. This is why the antibiotics of this family have been so successful and so widely used. Obviously the situation is much more difficult with cancer cells because these cells, being host cells, are not very different from normal ones. This is why anticancer drugs are extremely toxic and dangerous, requiring highly specific protocols (chemotherapy).
Fortunately these protocols are becoming more and more efficient as scientists discover better ways to deliver the drugs and to identify appropriate drugs for appropriate targets. As can be understood, a drug target is therefore a specific biological object which has been studied in context, so that it can be selected as something to interfere with specifically under the normal living conditions of the patient. In fact there are not many possible drug targets, precisely because life is so uniformly similar from one species to the other (this may look strange, but it is so, and this is the reason why it is possible to synthesize human proteins in bacteria, for example). At the present time, with our current knowledge, there are probably no more than one hundred and fifty possible targets. However, new concepts are emerging, where one will no longer use a single target, but rather several at the same time, each one being entirely ineffective if considered alone. Another effective new concept is not to poison the target cell, but to make it innocuous, either because it will fit into its environment (turning a cancerous tumor into a benign tumor) or because it will fail to interact with tissues (for example, pathogenic bacteria will no longer be able to colonize an epithelium if they are fed an appropriate diet). This latter concept fits well with the popular view that eating appropriate food may have a role in preventing diseases.
How do new diseases emerge?
You may think that the ordinary flu is not a dangerous disease. This is not so. It is one of the diseases which kill the most people in the world. In 1918-1919, the flu virus H1N1 killed tens of millions of people throughout the whole world. Since then medical doctors have been afraid that a similar form may appear again. For this reason a surveillance network has been set up by the World Health Organisation, using the contribution of many research centres in the world. Here in Hong Kong, Dr WL Lim of Queen Mary Hospital fortunately isolated in humans the dangerous H5N1 mutant, which was already known to be spread among poultry, triggering a series of measures (in which Prof. KY Yuen participated with other scientists at Queen Mary Hospital and at the WHO) which prevented the spread of this lethal disease. One can say that this contribution, and the very fast and efficient action of the Hong Kong Government, saved the lives of thousands if not millions of people in China (and probably elsewhere in the world as well).
The real goal of any living organism is to occupy some place on the Earth (its biotope). A way to do so is to multiply and either get rid of competing organisms or cooperate with them. The human body is an interesting biotope, since it is regularly provided with food, is at a more or less constant temperature, and is well protected from most of the harshness of the environment. For this reason the human body (like the bodies of plants and other animals) is invaded by a variety of organisms which tend to thrive in it. As a matter of fact, we harbour in our gut (not to mention the surface of our skin) a large number of microbes (in fact, the normal situation is that we host ten times more microbial cells than the number of our own cells!). Normally, these inhabitants are innocuous, and even beneficial (they make nutrients that are rare in food). But, occasionally, some may turn out to be invasive and then cause diseases. A fine balance between our immune system (which keeps them at bay) and their virulence properties usually protects us against diseases. However, from time to time, microbes can behave as pathogens and multiply in an uncontrolled way. If they are to persist once they have infected a person, they must propagate from that person to another one. This tells us immediately that contact between people, or between people and animals, is important for diseases to manifest. This contact may be direct (skin contact, butchery, aerosols...) or indirect (requiring an intermediate vector, such as a flea, a louse, a fly or a mosquito).
It is therefore evident that human behaviour is at the root of (re)emerging diseases. At some point it was believed that tuberculosis appeared when men domesticated cows (it is a cow disease), but the truth is rather the reverse: the disease long existed in humans or pre-humans, who may have communicated it to cattle. Flu, which is a bird disease, quite innocuous for birds, usually goes first to pigs, then to humans. And it usually originates in China, where for a long time peasants used to breed swine and ducks more or less together. The disease, although highly contagious (by aerosol), did not propagate very much until big cities or large human concentrations allowed it to spread like wildfire. AIDS is a very poorly contagious disease, which should have stayed where it was, probably in apes. But the practice of butchery infected humans (some people ate ape meat). There it did not spread much, until human behaviour changed, destroying the former social rules and introducing a widespread use of blood products and needles, together with a sexual promiscuity which had had no general counterpart in the past. At this point, one may predict that the destruction of most of the biosphere will liberate, in an uncontrolled way, animals, vectors, microbes and viruses which, until recently, were confined to restricted areas. Human concentrations in cities will favour the fast spread of any type of disease. It can also be expected that the growing sexual promiscuity which seems to pervade all countries will create a new biotope where live experiments made by microbes (exchanging genes and recombining with each other) will produce new virulent variants... Diseases are indeed very much linked to the structure of societies. It is therefore very important to monitor effectively the presence of pathogens in the general population and to begin, early on, to take prophylactic measures.
However, curiously enough, while this was done for tuberculosis or syphilis a century ago, it has now fallen out of fashion. Even worse, it could be that human beings do not react in a sensible way to the progress of science (see Nature and Artifice), and that, for example, people are not reluctant to use animal organs in transplantation: this would pave the way for the construction of new viruses which would then propagate to Mankind!
After this page had been set up, a dangerous disease, the Severe Acute Respiratory Syndrome (SARS), spread from Hong Kong to the rest of the world, demonstrating, unfortunately, that the lines above are quite relevant.
For a site monitoring (re)emerging diseases in the Hong Kong region, see the public Health and Disease Surveillance SAR HK site.
Is there a risk of bioterrorism?
This text was written at the creation of the Centre, well before the suspected terrorist attacks using anthrax, which demonstrated, alas, that our world is not as good as it should be...
Unfortunately, human beings are often inflamed by wrong or even plainly destructive ideas. This gives rise to aggressive behaviour. Biological warfare has existed for a long time. The Spanish invaders of America used it and, more recently, there were quite a few attempts in China by the Japanese military. One cannot, therefore, dismiss the idea as far-fetched. However, it must be understood that biological warfare, in contrast to all other means used in wars, is difficult, if not impossible, to control once it has been launched. In particular, agents such as Bacillus anthracis, which have been used for that purpose, remain in the environment for decades or even centuries. A test bomb full of B. anthracis spores, exploded by the British on Gruinard Island, resulted in contamination of the island for 50 years, and one had to scrape the surface of its soil and spread strong bactericides to get rid of most of the spores! This does not mean, of course, that this type of warfare will not be used, but it is clearly a very mad way to proceed. Please note here the symptoms of anthrax.
Can we trust the mental sanity of human beings? To counteract this possibility, the best course is certainly to control information on the matter as much as possible, and to refrain from spreading it. Clearly, freedom does not mean freedom to kill. Misinformation is another possibility, but it is difficult to control.
How did Life originate on Earth?
See information here.
How do scientists make discoveries?
(under discussion) (for further discussion see A Western Imbroglio)
