This text is adapted from an article published
in a brochure in honour of Prof. Hiroshi Yoshikawa, of the
Nara Institute of Science and Technology, Nara, Japan: For
the Love of Genome, pp. 4-9. We wish to associate our late
colleague Frank Kunst, who passed away
at the end of the re-sequencing project, with this tribute.
The prehistory of the sequencing of the Bacillus subtilis genome
goes back to 1985, at a time when the idea of sequencing the human
genome began to be discussed in the United States. Robert Sinsheimer,
Renato Dulbecco and Charles DeLisi, each in his own way, proposed
sequencing the human genome in 1985-1986. Their project was,
at that early time, presented as a technical program whose outcome
might be to help solve some of the problems of human health.
Immediately after Japan had been crushed by the atomic bombs,
the USA initiated a policy of intensive co-operation with the
defeated country, in order to hold off the growing threat of
communism. Genetics had a central place in the arena of scientific
collaboration. Amongst other aims, this allowed the Americans
to salve their conscience by showing an interest in the future
of the residents of Hiroshima and Nagasaki. This explains how
the US Department of Energy (DOE), the federal agency responsible
for the USA’s nuclear programmes, very soon became involved in
research which at first sight appears well outside its natural
jurisdiction. The main areas of research were the mechanisms
of mutagenesis and the effects of radiation on
genes. In 1947 this led to the creation of the Atomic Bomb Casualty
Commission (ABCC), financed by the Atomic Energy Commission (a
forerunner of the DOE). Genetics made up an important part of its
research.
The mutagenic effects of radiation had been discovered by Hermann
Joseph Muller in 1927. For this work, which led him to make appallingly
alarmist predictions, he was awarded the Nobel Prize in 1946.
In 1954, the ABCC published a report by James Neel and William
Schull on the first genetic findings on more than 75,000 births
in Hiroshima and Nagasaki. The results were reassuring, but they
dealt with only the first generation of children born since the
bomb, and were based on an analysis which was still rudimentary.
1954 was only a year after the structure and mode of replication
of the DNA molecule had been discovered. A generation later,
more sophisticated studies which analysed protein mobility in
an electric field did not contradict this early work. But to
be really sure of what kind of mutations radiation might have
caused, it was necessary to find out what happens in the DNA
sequence, right down to the level of the DNA bases. This was
not yet possible then.
However, in the meantime the political context had changed beyond
recognition. In the mid-1980s, Cold War rhetoric gave way to
concern about a new adversary: Japan’s economic power threatened
America’s leadership in technology. The federal agencies were
mobilised to encourage the setting up of new companies and to
protect intellectual and industrial property. We can see
today the consequences of this policy in the large number of computer,
software and biotech companies in the USA.
It is impossible to give a precise date for the beginning of
the Human Genome Project. Some writers date it from the Alta
summit in Utah in December 1984, organised by the DOE. The aim
of this summit, in which James Neel took part, was to discuss
what strategies should be used to detect mutations in the generations
after Hiroshima and Nagasaki. The summit succeeded in fulfilling
the DOE’s vision for the life sciences. In discussions of state-of-the-art
technologies and of the DOE’s capability for using them, all sorts of
potential methods for identifying mutations were reviewed. Direct
sequencing of the DNA involved was already considered to be one
of the most obvious methods. The original motives were soon forgotten.
In fact, the Human Genome Project could not have been imagined
without efficient DNA sequencing and the constant progress that
has been made in this technique. Neither would it have been possible
without the systematic development of computer science, both
in terms of hardware and software. This is another area where
the DOE’s contribution is most obvious. In the summer of 1975,
Frederick Sanger of the Medical Research Council (MRC) in Cambridge
had announced that he had found a way to determine a gene’s sequence
(the chain of bases which make it up) by reproducing DNA replication
in a test tube. Several laboratories in Europe, the USA and Japan
immediately tried their hand at automating these methods. "Fluorescent" sequencing,
introduced by Leroy Hood’s team at Caltech in 1986, was a remarkable
improvement.
In 1981 Hood had set up Applied BioSystems, which specialised
in laboratory equipment for molecular biology. This company grew
at remarkable speed, thanks to sales of its DNA sequencers, until
it was bought by PE Biosystems in 1997, just as its model 3700
capillary sequencer was coming onto the market. This sequencer
was behind the considerable acceleration in sequencing speed
worldwide. The technique, imitated elsewhere in the world, has
continued to be improved and developed both by its promoters
and by its competitors. It led to a ten-fold improvement in laboratory
performance between 1995 and the end of 1997, and to a further
ten-fold improvement by the end of the century.
This progress would not have been possible without parallel
developments in computer memory and calculating speed. As early
as 1978, it had been clear that computer support would rapidly
become necessary, to allow the scientific community to build
the sequences into a continuous text which they could then interpret.
A study undertaken by Rockefeller University and the European
Molecular Biology Laboratory (EMBL) at Heidelberg led to the
idea of creating a databank for gene sequences. It became
clear very early on that possession of this information was
of vital importance, with political implications. Frequent discussions,
sometimes heated, took place between Europe and the USA to decide
where these databanks would be located and how they would be structured.
Two banks were established, in competition but also in touch
with each other: one at Heidelberg, the other, the first GenBank,
at one of the DOE’s laboratories, the Los Alamos National Laboratory
(LANL). After the Alta summit, Robert Sinsheimer, then Chancellor
of the University of California at Santa Cruz, proposed sequencing
the human genome as a project around which to raise funds. He brought
together a group of well-known researchers to discuss the idea in May
of the following year (1985), but he was unable to raise the funds needed.
Independently, Renato Dulbecco, of the famous Salk Institute, proposed using
the human genome sequence to discover the causes of cancer. He
published this idea in Science in 1986.
In the same way, in 1986 André Goffeau had proposed to the European
Commission a program aimed at sequencing the yeast genome as
a typical illustration of the principle of subsidiarity (i.e. a demonstration
of synergy between different European countries, needed to obtain
support from the European Commission). In April 1996, the sequence
of the baker's yeast genome was made public: 16 chromosomes,
representing more than 12 megabases, had been sequenced by a consortium
of more than 100 laboratories and 641 scientists throughout the
world. This was all the more remarkable because the feat had been
achieved two years ahead of schedule, and because this genome was
much larger than the two bacterial genomes already deciphered,
those of Haemophilus influenzae (1.8 Mb) and Mycoplasma
genitalium (0.58 Mb), sequenced by TIGR the year before. However,
in all these cases, the longest contiguous DNA sequences remained
shorter than 2 Mb. This was because obtaining large continuous
DNA segments without gaps is an extremely hard task: the difficulty
of assembling sequences and the probability of meeting unclonable
DNA regions and repeated sequences increase with length. Yet
the two model bacteria most widely used in the world, Escherichia coli
and Bacillus subtilis, each had a genome more than 4 Mb long. Obtaining
their complete sequences would therefore be a much more difficult endeavour,
in particular towards the end of the sequencing process, when the final
gaps remained to be closed.
At that early time, the central question addressed by the research
in my laboratory was to understand how genes are expressed
collectively, in a harmonious fashion.
As I witnessed the first successes in sequencing viral genomes,
it appeared to me quite natural, and even necessary, to attempt
to understand this fundamental feature of living cells by analysing
the complete text of genomes. This presupposed that one would be
able to obtain the whole genome sequence. It also required fulfilment
of two technical prerequisites: one at the bench
(one had to determine experimentally the sequence of
several millions of base pairs) and one in computer science
(one needed to assemble and analyse this sequence, which would
certainly be impossible manually, without automated means).
In order to fulfil the first condition, I had met, during the
summer of 1986, Pierre Prentki, a young and brilliant scientist
who worked in the United States with David Galas (who later
became responsible for the genome programs at the DOE). Pierre
and I had agreed that he would come and set up in my research
unit a laboratory for sequencing a model genome and analysing
its gene functions. I did not know at that time that he would
soon meet a tragic end…
As for the second prerequisite, things were also difficult to
set up. Existing genome programs did not aim at answering a specific
biological question; they were purely descriptive. In contrast,
my motivation was different from the outset: the reason why
I proposed to sequence the genome of B. subtilis, at
the spring meeting of the Société Française de Microbiologie
in 1987, was a conceptual one, based on the computer-mediated
analysis I had recently performed with Olivier Gascuel. Indeed,
for several years I had been regularly meeting computer scientists
and biochemists who had long been involved in the computer-mediated
analysis of DNA and protein sequences, and more generally of
biological knowledge. This had convinced us of the importance
of computer science for setting up genome programs. The underlying
assumption of research in my laboratory was that collective gene
behaviour should be revealed as a prominent feature of the genome
(which could therefore never be perceived as a simple collection
of genes), and that this could be tackled with "in silico" approaches.
As expected, the reaction to my proposal to sequence a bacterial
genome was almost universally sceptical, when not plainly negative,
in particular when I proposed to sequence the genome of the second
model bacterium, Bacillus subtilis (I did not propose to sequence
the E. coli genome because, at that time, rumours were spreading
that its sequence would very soon be completed). Fortunately,
Simon Wain-Hobson, who had recently sequenced the genome of HIV,
the AIDS virus, was interested and ready to start a sequencing
program. Together, we proposed to our local and ministry authorities
to sequence the genome of Chlamydia trachomatis, the agent of a
sexually transmitted disease found throughout the world, but we did
not meet with any success. In June of the same year Raymond Dedonder, then
the director of the Institut Pasteur de Paris, attended the regular meeting
on the biology of B. subtilis in California. There,
James Hoch proposed to the community of specialists of this bacterium
that its genome be sequenced. Back from the San Diego meeting, Dedonder
remembered my proposal of the beginning of the year and asked
me whether I was still interested. He was willing to create a
program if I would take charge of it.
Philippe Glaser was just completing the sequence of a long piece
of DNA which we had identified as coding for the toxic adenylate
cyclase of Bordetella pertussis. He was recruited to
set up a sequencing laboratory in my research unit.
This is how I, an E. coli geneticist, became
involved in working on another model bacterium, B. subtilis.
Here, I must open a parenthesis. Looking at the situation
fifteen years later (the company Genset announced, at the end
of 1997, that it had sequenced the genome of two Chlamydia species,
including C. trachomatis), it is clear that this
story is not a simple anecdote but should be analysed as
an interesting feature of the sociology of science. On this occasion
I discovered that science is so strongly compartmentalized
that scientists working in a narrow domain do not welcome
the intrusion of outsiders into their field,
and in fact try to deter it by all means. But, above all, it
showed that sociological constraints strongly orient
research. Chlamydia trachomatis is the agent of the most widespread
sexually transmitted disease in many countries, and it is the leading
cause of female sterility. It is easy to cure (with a generic antibiotic,
therefore not profitable to companies), but its diagnosis is
difficult. Unfortunately, the venal interest in using in
vitro fertilization techniques (the only way
to circumvent C. trachomatis-induced sterility) is such
that nobody seems to care about curing the disease…
Let us come back to B. subtilis. Thanks to his talent
as an organizer, Dedonder rapidly set up an international meeting
where it was decided that a consortium of five European and five
American laboratories would cooperate to sequence the B.
subtilis genome as soon as appropriate funds had
been raised. The adventure started well. By chance, in November
of the same year, in Gif-sur-Yvette near Paris, at a meeting
of the scientific council of the Centre de Génétique Moléculaire
of the French national research agency, directed by
Piotr Slonimski, I met André Goffeau, who had already begun
in earnest to set up the yeast genome sequencing project.
After a moment of hesitation (European funds were limited,
so the two projects were implicitly in competition),
we both became convinced that they were complementary,
provided both could be financed by the European Community. André Goffeau
promised his support. Early in 1988, I was commissioned by the
Biotechnology division of the Biology directorate of the Commission
of the European Communities to write the introductory text for
their white paper presenting the Biotechnology Action Program
for sequencing genomes. This text was to provide a conceptual
justification for research in genome sequencing to the politicians
of the European governments. All this triggered the creation of
the B. subtilis genome program, a story in itself:
starting from a collaboration between five European and five
American laboratories, it ended as a collaboration between Europe
and Japan, with a tiny participation of two American groups!
Indeed, as we shall now see, this is where Hiroshi Yoshikawa
enters the scene, in a truly seminal way: without him, I am afraid
that the B. subtilis genome program would never have existed.
With praiseworthy tenacity, and without ever being discouraged,
Dedonder had, for his part, approached the appropriate directorate
at the Commission, and he obtained some funding for an exploratory
phase. This phase would be financed by the EEC program named "Science".
Raymond Dedonder was its administrative coordinator
and I created the sequencing laboratory, within my research unit,
under the direction of Philippe Glaser. Unfortunately, things
were not going so well in the United States: fights about the priorities
to give to genome programs, stirred up by personal animosities
and by the emphasis placed on the sequencing of the E. coli genome,
meant that the federal agencies approached withheld their support.
This was not without consequences for the European project: under
these conditions, a single query from the German adviser
on the grant committee was enough for the B. subtilis program
not to be retained for the "Biotechnology" action, which
followed the "Science" program! European support
stopped at the end of 1990, in spite of the many efforts of Dedonder,
who tried all kinds of interventions to obtain some support.
Fortunately, as is often the case, it was possible to extend
the Science program for one year (without supplementary
funds, of course) and also to maintain contacts with the yeast BAP
program, in which I was invited to participate as an observer.
Luckily, an unexpected event entirely changed the course
of history. At the international meeting on B. subtilis,
held that year in July, our Japanese
colleague Hiroshi Yoshikawa took the floor to say vehemently
(this is quite unusual for a Japanese person!) that he did
not understand why Japan had not been considered from the start
as a possible partner in the project. This very healthy reaction
decided the future: instead of the United States, why not
attempt the adventure with Japan? Hiroshi Yoshikawa knew he could
obtain the appropriate support. And this is how a new project
was submitted to the EEC Biotechnology Program, stating that
a team of European laboratories (including a Swiss
one, with support from the Helvetic Confederation!) would sequence
two thirds of the genome, while Japan would sequence the remaining
third.
How did the B. subtilis program fare subsequently?
The first long sequence of the B. subtilis genome (almost
100 kb long) was presented at Elounda, in Crete, in 1991, at the same
time as the complete sequence of yeast chromosome III. The main
observation then was that about half of the newly identified genes
had no counterpart among known sequences and were of unknown function,
a truly surprising discovery. At the meeting, Piotr Slonimski joked about
the European advance compared with the situation in the USA
(where there had been much talk about sequencing the human genome
or the E. coli genome, but few results so far)
by proposing to call the enigmatic
genes that had just been discovered in large numbers "Elusive,
Esoteric, Conspicuous" genes (that is, genes that had gone
unnoticed, yet were genuinely expressed and typical), or "EEC
genes", in which one immediately recognizes the acronym by which
the European Union was then known. The representative of the
National Institutes of Health, which financed a large part
of the genome programs in the USA, reacted in a way that is not
unusual, superbly ignoring the European success and
listing what was about to happen in the United States (which
indeed happened four years later).
From then on, progress on the sequence was regularly
made public, both at meetings of the consortium and at international
genome meetings (mostly in the United States and in Japan). When
Dedonder retired, Frank Kunst, from his laboratory at the Institut Pasteur,
took the helm and coordinated the program of the consortium
through three successive Biotechnology contracts (which scheduled
the end of the sequencing program for December 1998). Naotake Ogasawara,
from the Nara Institute of Science and Technology, coordinated
the team of Japanese laboratories, with the constant support
of Hiroshi Yoshikawa. This allowed the consortium to sequence
the whole genome and, as in the case of the yeast genome program,
to accelerate sequencing. In fact the sequence was completed
in April 1997, twenty months ahead of schedule (the Japanese
team being the first to complete its part), and presented
publicly in July 1997 in Lausanne (Switzerland) at a meeting
where Craig Venter described the numerous successes of TIGR.
It was finally published in November of the same year. The remaining
time was used to check the quality of the data and to resequence,
by direct PCR amplification of the chromosome, the regions where errors
were suspected. This allowed the consortium to obtain a sequence
with an error level thought to be no higher than one base in
ten thousand (*).
Much more could be said about this program and about the many
people who brought it to completion, but I think that it is time
to emphasize again the immense contribution of Hiroshi
Yoshikawa. As I am neither a specialist in replication nor a long-standing
specialist of B. subtilis, I shall not dwell on the role
of Hiroshi Yoshikawa in these domains: his list of publications
speaks for itself. However, I wish to say that, contrary to the
common Western belief that it is difficult to work with Japanese
scientists (like all such beliefs, it rests on hearsay, spread
by people who, of course, have no experience of the matter), I had
exactly the opposite impression. This is not the place to discuss
the difficulties of international scientific collaborations (there
are countries which act in ways that I would not hesitate to call
"unethical", as if this were a matter of course), but I must say that I feel
the European-Japanese collaboration has been a model for future
collaborations, and that I certainly wish to take part in as many
such collaborations as possible. This is certainly due
to the atmosphere created by Hiroshi Yoshikawa. Not only did we
freely exchange information and materials, but each of us tried
his best to work as fast as possible and to make the project a
real team effort. There have not been many successes in genomics aside
from those in the USA, but I feel that the Bacillus subtilis program
is one of those rare successes, and that it owes much to the scientific
insight and the strong and kind personality of Hiroshi Yoshikawa.
Let us salute him for that and wish him many more years of positive
impact in Science (and why not in entomology?)!

* A sequel: Sequencing the Bacillus subtilis genome
was a very difficult task at the time, because it required cloning
DNA fragments in an Escherichia coli host, where B. subtilis DNA
is often so highly expressed that it behaves as if it were toxic. As said above,
the genome project was the result of the work of a consortium.
Taken together, these constraints resulted in a sequence which
could not be error-free. The genome was entirely resequenced
and reannotated in 2007-2008, with novel techniques which
do not require cloning. The present sequence is thought to contain
only a very low level of errors. It is accessible through the INSDC
(entry point EMBL-EBI) under accession number AL009126.
A few days after this article was accepted for publication,
our colleague and friend Frank Kunst passed away (2 April 2009),
and this article is dedicated to his memory.
V Barbe, S Cruveiller, F Kunst, P
Lenoble, G Meurice, A Sekowska, D Vallenet,
TZ Wang, I Moszer, C Médigue, A Danchin
From a consortium sequence to a unified sequence: The Bacillus
subtilis 168 reference genome a decade later
Microbiology-SGM (2009) in press 
Frank Kunst was instrumental in the setting up and management of the Bacillus
subtilis genome project, and his tenacity
enabled us to keep the project on track, despite multiple administrative
and technical hurdles. Without him, the project would not have
come to completion. Illustrating the difficulty of the enterprise,
the Bacillus subtilis genome
sequence, completed in 1997, remained for five years the only
sequence of an A+T-rich model Firmicute. Frank Kunst was at the
origin of many other genome projects, and he also remained interested
in the outcome of the B. subtilis re-sequencing project.
Among his last contributions was his noteworthy role in organising
a sequencing team for the fast identification of a new variant
of the chikungunya virus which invaded the French island of
La Réunion, in the southern Indian Ocean. This made his retirement
particularly untimely and unwise, at a time when we are certainly
not finished with (re-)emerging diseases. Compulsory retirement triggered
a dangerous depression that finally killed him. It is essential
to remember, when using knowledge created by investigators we
often tend to ignore, that discoveries are the result of a collective
enterprise. Probably more than many fashionable scientists, Frank
Kunst, in his own way, is at the origin of many more discoveries
than many would care to admit, including discoveries that others
tend to attribute to their own merits.
Some references in genomics in which
Frank Kunst was a leader:
F Kunst, A Vassarotti, A Danchin
Organization of the European Bacillus subtilis genome
sequencing project
Microbiology (1995) 141:249-255
F Kunst,
N Ogasawara, I Moszer, AM Albertini, G Alloni, V Azevedo, MG
Bertero, P Bessières, A Bolotin, S Borchert, R Borriss,
L Boursier, A Brans, M Braun, SC Brignell, S Bron, S Brouillet,
CV Bruschi, B Caldwell, V Capuano, NM Carter, SK Choi, JJ Codani,
IF Connerton, NJ Cummings, RA Daniel, F Denizot, KM Devine,
A Düsterhöft, SD Ehrlich, PT Emmerson, KD Entian,
J Errington, C Fabret, E Ferrari, D Foulger, C Fritz, M Fujita,
Y Fujita, S Fuma, A Galizzi, N Galleron, SY Ghim, P Glaser,
A Goffeau, EJ Golightly, G Grandi, G Guiseppi, BJ Guy, K Haga,
J Haiech, CR Harwood, A Hénaut, H Hilbert, S Holsappel,
S Hosono, MF Hullo, M Itaya, L Jones, B Joris, D Karamata,
Y Kasahara, M Klaerr-Blanchard, C Klein, Y Kobayashi, P Koetter,
G Koningstein, S Krogh, M Kumano, K Kurita, A Lapidus, S Lardinois,
J Lauber, V Lazarevic, SM Lee, A Levine, H Liu, S Masuda, C
Mauël, C Médigue, N Medina, RP Mellado, M Mizuno,
D Moesti, S Nakai, M Noback, D Noone, M O'Reilly, K Ogawa,
A Ogiwara, B Oudega, SH Park, V Parro, TM Pohl, D Portetelle,
S Porwollik, AM Prescott, E Presecan, P Pujic, B Purnelle,
G Rapoport, M Rey, S Reynolds, M Rieger, C Rivolta, E Rocha,
B Roche, M Rose, Y Sadaie, T Sato, E Scanlan, S Schleich, R
Schroeter, F Scoffone, J Sekiguchi, A Sekowska, SJ Seror, P
Serror, BS Shin, B Soldo, A Sorokin, E Tacconi, T Takagi, H
Takahashi, K Takemaru, M Takeuchi, A Tamakoshi, T Tanaka, P
Terpstra, A Tognoni, V Tosato, S Uchiyama, M Vandenbol, F Vannier,
A Vassarotti, A Viari, R Wambutt, E Wedler, T Weitzenegger,
P Winters, A Wipat, H Yamamoto, K Yamane, K Yasumoto, K Yata,
K Yoshida, HF Yoshikawa, E Zumstein, H Yoshikawa, A Danchin
The complete genome sequence of the gram-positive bacterium Bacillus subtilis
Nature (1997) 390: 249-256
Comment by Chet
Raymo in the Boston Globe
F Chetouani, P Glaser, F Kunst
DiffTool: building, visualizing and querying protein clusters
Bioinformatics (2002) 18: 1143-1144
P Glaser, C Rusniok, C Buchrieser, F
Chevalier, L Frangeul, T Msadek, M Zouine, E Couvé, L Lalioui,
C Poyart, P Trieu-Cuot,
F Kunst
Genome sequence of Streptococcus agalactiae, a pathogen
causing invasive neonatal disease
Mol Microbiol (2002) 45: 1499-1513
E Duchaud, C Rusniok, L Frangeul, C Buchrieser, A Givaudan,
S Taourit, S Bocs, C Boursaux-Eude, M Chandler, JF Charles,
E Dassa, R Derose, S Derzelle, G Freyssinet, S Gaudriault,
C Médigue, A
Lanois, K Powell, P Siguier, R Vincent, V Wingate, M Zouine,
P Glaser, N Boemare, A Danchin, F Kunst
The genome sequence of the entomopathogenic bacterium Photorhabdus
luminescens
Nat Biotechnol (2003) 21: 1307-1313
L Frangeul, P Glaser, C Rusniok, C Buchrieser, E Duchaud,
P Dehoux,
F Kunst
CAAT-Box, Contigs-Assembly and Annotation Tool-Box for genome sequencing
projects
Bioinformatics (2004) 20: 790-797
V Barbe, S Cruveiller, F Kunst, P Lenoble,
G Meurice, A Sekowska, D Vallenet, TZ Wang, I Moszer, C Médigue,
A Danchin
From a consortium sequence to a unified sequence: The Bacillus
subtilis 168 reference genome a decade later
Microbiology (2009) 155: 1758-1775 