A Rattling Good History: The Story of the Human Genome Project

Man is the measure of all things: of those that are, that they are; of those that are not, that they are not.

Protagoras

by Antoine Danchin 唐善 • 安東 (translated by Alison Quayle)

A Spanish translation is available on a commercial site that tries to help spread scientific knowledge by drawing on users' interest in computer games.

The Human Genome programme was born of a political initiative, but very soon became inseparable from the commercial issues which surround it. Since 1995 the spotlight has been on that thorn in the flesh (and the mind) of the research community, Craig Venter. We certainly haven’t yet seen the last of this "Joker in the pack".

Five years later the Synthetic Biology effort was launched at MIT, and in parallel Craig Venter developed several studies that may lead to the explicit demonstration that, in a living cell, the genetic program is separate from the machine which runs it (the "chassis" in the Synthetic Biology vocabulary). Cells can be seen as computers making computers, and information can be proposed as an authentic currency of reality.

On March 14th 2000, Tony Blair and Bill Clinton published a short joint declaration in which they "applaud the decision by scientists working on the Human Genome Project to release raw fundamental information about the human DNA sequence and its variants rapidly into the public domain". The declaration ends with an enigmatic phrase in which Blair and Clinton "commend other scientists around the world to adopt this policy" of rapid publication. It goes without saying that it is unusual for heads of state to intervene in scientists’ decisions to publish. Incongruous as this is, the declaration is a salutary if stern reminder that the Human Genome Project is based on a political initiative, not a scientific one.

Immediately after Japan had been crushed by the atomic bombs, the USA initiated a policy of intensive co-operation with the defeated country, to hold off the growing threat of communism. Genetics had a central place in the arena of scientific collaboration. Amongst other aims, this allowed the Americans to salve their conscience by showing an interest in the future of the residents of Hiroshima and Nagasaki. This explains how the US Department of Energy (DOE), the federal agency (equivalent to a ministry) responsible for the USA’s nuclear programmes, very soon became involved in research which at first sight appears well outside its natural jurisdiction. The main areas of research were the mechanisms of mutagenesis and the effects of radiation on the genes. In 1947 this led to the creation of the Atomic Bomb Casualty Commission (ABCC), financed by the Atomic Energy Commission (which later became the DOE). Genetics made up an important part of its research.

The mutagenic effects of radiation had been discovered by Hermann Joseph Muller in 1927. For this work, which led him to make appallingly alarmist predictions, he was awarded the Nobel Prize in 1946. In 1954, the ABCC published a report by James Neel and William Schull on the first genetic findings on more than 75 000 births in Hiroshima and Nagasaki. The results were reassuring, but they dealt with only the first generation of children born since the bomb, and were based on an analysis which was still rudimentary. 1954 was only a year after the structure and mode of replication of the DNA molecule had been discovered. A generation later, more sophisticated studies which analysed protein mobility in an electric field did not contradict this early work (1,2). But to be really sure of what kind of mutations radiation might have caused, it was necessary to find out what happens in the DNA sequence, right down to the level of the nitrogen bases.

However, in the meantime the political context had changed beyond recognition. In the mid-1980s the Cold War rhetoric gave way to concern about a new adversary. Japan’s economic power threatened America’s leadership in technology. The federal agencies were mobilised to encourage the setting up of new companies and to protect intellectual and industrial property (3).

It is impossible to give a precise date for the beginning of the Human Genome Project. Some writers date it from the Alta summit in Utah in December 1984, organised by the DOE. The aim of this summit, in which James Neel took part, was to discuss what strategies should be used to detect mutations in the generations after Hiroshima and Nagasaki, in the context of the DOE’s vision for life sciences. Discussion focused on the state-of-the-art technologies which the DOE would be able to deploy, and all sorts of potential models for identifying mutations were reviewed. Direct sequencing of the DNA involved was already considered to be one of the most obvious methods. The original motives were soon forgotten.

In fact, the Human Genome Project could not have been imagined without efficient DNA sequencing and the constant progress that has been made in this technique. Neither would it have been possible without the systematic development of computer science, both in terms of hardware and software. This is another aspect where the DOE’s contribution is most obvious. In the summer of 1975, Frederick Sanger of the Medical Research Council (MRC) in Cambridge had announced that he had found a way to identify a gene’s sequence (the chain of bases which make it up) by reproducing DNA replication in a test tube. Immediately several laboratories in Europe, the USA and Japan tried their hand at automating these methods. "Fluorescent" sequencing, introduced by Leroy Hood’s team at Caltech in 1986, was a remarkable improvement.

In 1981 Hood had set up Applied BioSystems, which specialised in laboratory equipment for molecular biology. This company developed at remarkable speed, thanks to sales of its DNA sequencers, until it was bought by Perkin-Elmer in 1997, just as its model 3700 capillary sequencer was coming onto the market. This sequencer was behind the considerable acceleration in sequencing speed worldwide. The technique, imitated elsewhere in the world, has continued to be improved and developed both by its promoters and by its competitors. It led to a ten-fold improvement in laboratory performance between 1995 and the end of 1997, and to a further ten-fold improvement by the end of the century.

The DOE’s investigators contributed to another improvement – the use of "cell sorter" methods, where in a mixture of cells, those marked by the presence of a fluorescent molecule can be separated from unmarked cells. This method was extended to chromosome sorting, and it thus became possible to purify human chromosomes and to establish specific DNA banks for each chromosome. As there are 22 autosomes, plus the two sex chromosomes, this meant a considerable reduction in the size of individual sequencing projects. Using this method, the French national sequencing centre at Evry, near Paris, is now finishing the sequencing of chromosome 14, which at just under 100 megabases (1 megabase = 1 million bases) will represent France’s contribution (only 3%) to the international project.

This progress would not have been possible without parallel developments in computer memory and calculating speed. As early as 1978, it had been clear that computer support would rapidly become necessary, to allow the scientific community to build the sequences into a continuous text which they could then interpret. A study undertaken by Rockefeller University and the European Molecular Biology Laboratory (EMBL) at Heidelberg led to the idea of creating a databank for gene sequences. It became clear very early on that the possession of this information was of vital importance, with political implications. Frequent discussions, sometimes heated, took place between Europe and the USA, to decide where these databanks would be, and how they would be structured. Who would be responsible for sequence quality – its producer or the database? Who would produce the annotations? This is clearly no small matter – a bad annotation is tantamount to disinformation. It is unfortunately now clear that major annotation errors have spread via data banks through the entire scientific community. Two banks were established, in competition but also in touch with each other – one at Heidelberg, the other, the first GenBank, at one of the DOE’s laboratories, the Los Alamos National Laboratory (LANL).

After the Alta summit, Robert Sinsheimer, then Chancellor of the University of California at Santa Cruz, proposed a human genome sequencing project as the basis of an appeal for funds. He brought together a group of well-known investigators to discuss the idea in May of the following year (1985), but he was unable to raise the funds needed. Independently, Renato Dulbecco, of the famous Salk Institute, proposed using the human genome sequence to discover the causes of cancer. He published this idea in Science in 1986 (4).

The same idea was being developed at the same time at France’s Centre for the Study of Human Polymorphism (the Centre d'étude du polymorphisme humain, or CEPH), set up by Jean Dausset to collect the entire genetic blueprint of families whose genealogy was well known. Daniel Cohen, a very active investigator at Dausset’s laboratory, who had realised the value of the genetic heritage that this unique collection represented, developed an industrial-scale approach which would result in the sequencing of large segments of the genome. Finally, Charles DeLisi proposed, independently, that this project should be carried out at the DOE (5). DeLisi, who had worked on computational models of biology at the National Cancer Institute, one of the National Institutes of Health (NIH), had taken on the task of understanding the meaning of the sequences, and had worked on this with investigators from LANL.

DeLisi was at the time one of the project leaders in biological research at the DOE, which enabled him to cost the project, and make the first practical propositions. In 1987 he persuaded the DOE to redirect 5.5 million dollars intended for other projects to his programme. In 1988, under the influence of Pete Domenici, the Senator for New Mexico, the programme was considered by the American Senate and brought into the White House discussions on large-scale scientific projects. David Galas, a pioneer in molecular genetics, soon became a keen supporter.

In France, Daniel Cohen and Jean Dausset obtained a preliminary budget heading under which to explore the feasibility of the project, using the CEPH’s human DNA libraries. More importantly, Cohen managed to persuade the Minister of Research that the CEPH, with its private structure, could begin a sequencing programme more easily than public bodies could, if it had direct help from the ministry. From 1989 onwards, the CEPH was recruiting scientists and engineers, and purchasing robots and industrial equipment, to begin to map and sequence the human genome on a large scale. At the same time, the EEC granted Eureka funds to the CEPH and Bertin, a private company (in association with two British partners), with the aim of creating an industrial supplier for the necessary equipment. This project, called Labimap, was to supply oligonucleotide synthesisers, robots and reactors for automatic plasmid preparation, sets for large-scale molecular hybridisation, and miniature electrophoresis gels for sequencing. Daniel Cohen had already seen quite clearly that genome projects would have to develop molecular biology techniques on a large scale. It would be interesting to analyse Labimap’s total failure, as it could have given Europe the equivalent of what Applied BioSystems and Perkin-Elmer gave the USA.

Progress was too slow to suit Daniel Cohen. By a happy coincidence, Bernard Barataud, the energetic president of the French Muscular Dystrophy Association (l’Association française contre les myopathies), had organised an unexpectedly successful Telethon in France in 1987. He planned to use the money collected each year to finance an ambitious programme in human genetics. Cohen realised just how far he could turn this to his advantage, and he convinced Barataud that sequencing the human genome would speed up the identification of genetic diseases considerably. Barataud chose Evry, not far from where he lived, as the site for the substantial laboratories which would be needed. The first Genethon was established at the end of 1990, with the first prototypes built by Bertin for Labimap. It very soon became clear that it was too early to sequence the human genome, given the size of the task (a huge number of large chromosome segments have to be cloned, which is very difficult). So to begin with, both in France and elsewhere, the projects were reoriented towards gene mapping (locating markers spread out along the chromosomes).

Genethon had three major programmes: under Daniel Cohen, the construction of Yeast Artificial Chromosome (YAC) banks carrying random fragments of human chromosomes; under Jean Weissenbach, then at the Pasteur Institute, the construction of a detailed physical map; and under Charles Auffray, the creation of a complete set of human complementary DNA. To international astonishment, in spring 1992 Daniel Cohen presented the first complete map of chromosome 21 at the annual meeting of the Cold Spring Harbor Laboratory in the USA, and in the autumn of the same year he published the first contiguous sequence map of YACs, containing up to 1 megabase of human DNA. This map, made using the computer facilities of INRIA (with Guy Vaysseix and Jean-Jacques Codani), placed France at the forefront of genomics (6). This is not the place to discuss the reasons for the rapid collapse of the French lead, except to say that it was largely the result of a serious error of scientific judgement on the part of certain decision-makers, acting behind the scenes, and of a skilful manipulation of the ministerial structure at the time (7, 8).

At the same time, a fierce struggle was going on for the ownership, administration and scientific management of GenBank, the database which holds all the data on DNA sequences worldwide and which had been taken over by the National Institutes of Health (NIH). This was between the DOE, which had founded GenBank at the Los Alamos laboratory which it financed, and the NIH, which financed the National Center for Biotechnology Information (NCBI). The DOE went as far as to finance a rival bank, Genome Sequence Data Base (GSDB). This bank was managed by the National Center for Genome Resources, a non-profit-making foundation created at the end of 1992 on the initiative of Senator Domenici. The fact that data entry into the different banks was not synchronised, and the inconsistent labelling of the data they held, put scientists all over the world in an almost impossible situation.

Clearly it is not possible to look into the detail of these power struggles here. As often happens, they appeared as the dominant players began to lose ground. This was the case with the DOE, which was witnessing a slowdown in research programmes based on nuclear energy, and ran the risk of soon finding itself bled dry financially if it could not put forward to the federal government a long-term programme which would be expensive in terms of manpower and funds. So the evaluation of its projects took place in a highly charged atmosphere, not very conducive to the national and international collaboration which would certainly have led to the success of the project in a much shorter time. Luckily the situation improved in 1997 when the bank financed by the DOE turned commercial, ending its position as a competitor to GenBank. The informal association between GenBank and its European and Japanese counterparts, which had existed since 1990 and which later became official, also brought stability. On the European side were the EMBL, first at Heidelberg, then at its outstation at Hinxton, south of Cambridge, and the European Bioinformatics Institute (EBI), and on the Japanese side the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics (NIG) at Mishima. Effectively, there is now one single DNA sequence data bank for the whole world, with three entry points at the NCBI, the EBI and the NIG.

In reality, it was not the end of the 1980s but 1995 which was the most significant turning point for the Human Genome programme, not through its creation in the form of the Human Genome Initiative, but because of an outsider who burst onto the scene. This turning point stemmed from a method similar to that used by Daniel Cohen, but more successful. In that year, Craig Venter and his colleagues at The Institute for Genomic Research (TIGR) near Washington published the sequences of two very small bacterial genomes one after the other in Science. Craig Venter was not particularly interested in bacteria. He had been an NIH investigator. With his interest in technological progress, he was tempted by the challenge of sequencing the human genome very early on, after having been involved in locating the gene for a neurotransmitter receptor on human chromosome 15, right at the beginning of the 1990s. He immediately realised that the scale on which molecular biologists were used to working would have to change if projects of this kind were to be successful. They would have to "think big", on an industrial scale. Craig Venter also understood that working with public bodies involved a long and difficult struggle with red tape, even in the USA, and that if he wanted rapid success that route was out of the question. He would have to create a tailor-made organisation, from scratch. Cleverly, instead of setting up just one, he created two, together with his colleague William Haseltine. Venter was to manage the non-profit-making organisation, TIGR, while Haseltine would manage the commercial organisation, Human Genome Sciences (HGS), which had first industrial property rights over the whole of TIGR’s work. TIGR would thus benefit not only from advances of funds from HGS’s capital, but also from the contracts it entered into with those two old rivals the NIH and the DOE.

Craig Venter also understood intuitively that, faced with a riot of different genome sequencing projects and all the battles and ego-trips they brought with them, it was essential to establish a presence, and a reputation for reliability, very quickly. After the first meeting on the sequencing of micro-organisms organised by David Galas, he understood that he needed a powerful computer infrastructure. He also realised that TIGR’s industrial-scale set-up meant he could contemplate sequencing the whole of a bacterial genome, provided it was not too big, by using a random fragmentation procedure called the "shotgun" technique. Hamilton Smith of Johns Hopkins University in Baltimore, close to TIGR, also realised this. He had shared the Nobel prize with Werner Arber and Daniel Nathans, for their discovery of restriction enzymes, the enzymes which had made the birth of genetic engineering possible. These enzymes enable scientists to cut DNA at specific points, and thus to juggle the "cut and paste" methods which are the basis of molecular biology. As a bacteriologist and biochemist, Smith was familiar with a pathogenic bacterium, Haemophilus influenzae, which produces restriction enzymes. With his usual flair, Craig Venter realised that he could soon be the first to have sequenced a complete genome!

And so, at a meeting organised by the Wellcome Trust in April 1995 at Dormy House near Oxford, Craig Venter announced that he and his team of about forty had succeeded in sequencing the entire genome of H. influenzae. He also announced that he had practically finished the sequence of the smallest known genome of any living organism, that of Mycoplasma genitalium. Even though these were very small genomes, it was still quite an achievement.

Meanwhile, the Human Genome Project was getting organised. It involved not only the two principal American federal agencies, the DOE and the NIH, but also many other countries from around the world. In Britain, the powerful Wellcome Trust, a private charitable foundation, had founded the Sanger Centre at Hinxton, south of Cambridge, in 1994, where later an outstation of the EMBL was set up. An informal international association, the Human Genome Organization, shared out as best it could the task of sequencing the human genome, chromosome by chromosome, between laboratories around the world, with a target date of late 2005. The story of the power struggles and dramatic exploits which, one after the other, left their mark on the way the programme was organised, would fill a book. A look at the comments in almost every issue of Science and Nature over the last five years, as well as the information given on various websites will show not only the struggle between the federal agencies, but also between personalities within those agencies, and between countries.

In 1998, in one of those dramatic coups he is so good at, Craig Venter once again changed the face of genomics. He is the reason behind the unexpected Blair-Clinton declaration. On June 24th 1997, Venter broke off the agreement between TIGR and HGS. He freed himself to scale up his approach to genomics and in early 1998 he announced that together with Perkin-Elmer he had created a new company, Celera ("fast" in Latin), with the aim of sequencing the human genome within three years. The plan was to use the "shotgun" approach, without preliminary separation of the chromosomes, using supercomputers to reassemble the fragments, thanks to a high-speed algorithm invented by Gene Myers. The sequencing was to be carried out using several hundred of Perkin-Elmer’s capillary sequencers, and the planned "coverage" of the genome was ten-fold, that is 30 billion bases. A quick calculation shows that this figure is not impossible, but is difficult to reach. One machine can sequence 96 templates in three hours, reading more than 500 bases from each (these machines now routinely go up to 650 to 700 bases). Allowing for poor-quality templates, this means about 300 000 bases per machine per day, or roughly 300 megabases per machine in three years, so that some hundred machines running continuously would be needed to reach 30 billion bases. In addition, Venter proposed to demonstrate the feasibility of his approach by sequencing the entire genome (nearly 150 megabases) of the geneticist’s favourite subject, the fruit-fly Drosophila, in collaboration with Gerry Rubin’s group at Berkeley, by the end of 1999. They pulled it off. It is worth pointing out again Venter’s remarkably clear thinking. It has long been clear that the drosophila genome should have been chosen in the first place as the model organism. Not only are the genetics of this insect by far the best known in the world, but also its development is, strange as it might seem, remarkably similar to that of vertebrates such as the mouse or man.
Venter could thus rely on data obtained from the drosophila to help him identify many of the most important human genes, at least as a first approximation. At the same time as he perfected the scaling up of his shotgun technique, he could be preparing to annotate the human genome. Celera is a private company and its aim is obviously to make a profit. Venter therefore announced that he would not immediately release his sequences into the public domain, and that in any case any use of his sequences for profit would attract royalties. In the circumstances, the organisations engaged in the Human Genome Project, the Sanger Centre, which planned to produce a third of the sequences, and the groups involved in Europe and Japan, reacted strongly. They began by speeding up sequence production considerably, aiming to produce a "working draft", a "coverage" of the unassembled genome, by summer 2000, and the complete sequence by 2003, two years before the date originally proposed. Very soon, the consortium published the sequence of chromosome 22 (9). They also launched a high-profile public debate about the fact that in assembling its sequences, Celera made extensive use of public domain sequences, and that for the company to want to make a profit from this constituted an abuse. In March 2000, letters between Francis Collins and Craig Venter were passed to the national newspapers, in an attempt to force Venter to cooperate with the public project and to make his sequences available to investigators throughout the world without charge. It is to this exchange of letters that the Blair-Clinton declaration alludes.
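
The throughput estimate given earlier for Celera's plan can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope illustration: the run size, run time, read length, daily yield and coverage target are the figures quoted in the text, while round-the-clock operation and a 365-day year are assumptions made here, and the function names are purely illustrative.

```python
# Back-of-envelope check of the capillary-sequencer figures quoted in the text.
# From the text: 96 templates per run, 3 hours per run, about 500 bases per read,
# about 300 000 usable bases per machine per day, a 30-billion-base target, 3 years.
# Assumptions (not in the text): machines run 24 hours a day, 365 days a year.

TEMPLATES_PER_RUN = 96
HOURS_PER_RUN = 3
BASES_PER_READ = 500
USABLE_BASES_PER_DAY = 300_000    # the text's figure, after poor templates are discounted
TARGET_BASES = 30_000_000_000     # ten-fold coverage of a genome of about 3 billion bases
YEARS = 3
DAYS_PER_YEAR = 365               # assumption


def raw_bases_per_day():
    """Bases read per machine per day if every template succeeded."""
    runs_per_day = 24 // HOURS_PER_RUN
    return runs_per_day * TEMPLATES_PER_RUN * BASES_PER_READ


def bases_per_machine(years=YEARS):
    """Usable bases produced by one machine over the whole period."""
    return USABLE_BASES_PER_DAY * DAYS_PER_YEAR * years


def machines_needed(target=TARGET_BASES, years=YEARS):
    """Smallest whole number of machines that reaches the target in time."""
    per_machine = bases_per_machine(years)
    return -(-target // per_machine)  # ceiling division on integers


if __name__ == "__main__":
    print("raw bases per machine per day:", raw_bases_per_day())
    print("usable bases per machine in 3 years:", bases_per_machine())
    print("machines needed for ten-fold coverage:", machines_needed())
```

Under these assumptions a single machine yields about 330 megabases in three years, so a little over ninety machines working without interruption would be needed for ten-fold coverage; the several hundred sequencers planned leave a margin for downtime and failed runs.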

At the end of this all-too-short account, what do we find? An explosive mixture of the values which make up science – not only the love of knowledge of course, but also political rivalry, the search for glory, and the intrusion of the commercial world. In the beginning it was an entirely American game, inspired by the struggle against communism, then against the technological supremacy, real or imagined, of Japan. It led to twenty years of support for innovative private business, in a policy which is nowadays imitated on this side of the Atlantic. This makes the Blair-Clinton declaration, which seems to take the opposite view to the previously established position, all the more surprising, as if suddenly the free market and its corollary, the protection of intellectual property, were considered a threat to free access to knowledge.

The most widely shared value today is the profit motive. There is already an area in which Perkin-Elmer is quietly piling up the profits – the sale of its sequencers and other laboratory equipment. The stir that Celera has caused has been an immense success in that respect, if in no other. From this point of view, it is not the gene sequences themselves which are valuable, but their annotation, the discovery of their meaning, and the inventions which may result from all this. Patenting genes does not make sense, not for moral reasons – after all, we patent arms, which does not necessarily mean that we agree with their use – but because they are not something which has been "invented". On the other hand, understanding a biological function can lead to the discovery of a therapeutic target and thus to a treatment. Equally, awareness of a function can lead to the discovery of a basis for diagnosis, and, yes, the use of this could be patented. Gaining time means gaining a better chance to make intelligent annotations on the genome texts, and this is what Celera is doing. The motivation of those who prepared the Blair-Clinton declaration is not sound. What really needs to be monitored is how knowledge of the genomes will be used in the future. That is where the real moral problem lies, but who is paying any attention?

A. D.

Further reading

• Text of the exchanges between the NIH and Celera, plus the Blair-Clinton Declaration

• For the scientific reasons for genome sequencing, see Antoine Danchin, La Barque de Delphes, Odile Jacob, 1998, updated and adapted in The Delphic Boat: What Genomes Tell Us, Harvard University Press, 2003
• A tribute to Hiroshi Yoshikawa

References

(1) J. Neel, Physician to the Gene Pool: Genetic Lessons and Other Stories, Wiley, 1994.
(2) W.J. Schull, Song among the Ruins, Harvard University Press, 1990.
(3) L. Roberts, "Watson versus Japan", Science, 246, 576, 1989.
(4) R. Dulbecco, "A turning point in cancer research: sequencing the human genome", Science, 231, 1055, 1986.
(5) C. DeLisi, "The Human Genome Project", American Scientist, 76, 488, 1988.
(6) P. Rabinow, French DNA: Trouble in Purgatory, The University of Chicago Press, 1999.
(7) Read the 6 issues of La Lettre du Greg, published by the Groupement de recherche et d'études des génomes (GREG).
(8) A. Danchin, "A brief history of genome research and bioinformatics in France", Bioinformatics, 16, 65, 2000.
(9) I. Dunham, N. Shimizu, B.A. Roe, S. Chissoe, et al., "The DNA sequence of human chromosome 22", Nature, 402, 489, 1999.