Glossary of terms relevant to genomics
Note: Within the glossary, words in italics have their own entry
Algorithm
A description of a method to solve a problem in terms of elementary,
precise operations. When expressed in a particular language, the
algorithm is called a program. Cellular processes to make macromolecules
are algorithmic in the form “begin, do: [if Condition then Action,
check Control Points, repeat], end”
Allele
A particular variant of a gene, found in part of a population
of individuals of the same species. Most alleles of a given gene are
functionally equivalent (and therefore invisible, except when the gene
is sequenced)
Amino acid
The basic building block of proteins. Twenty amino acids
are found to make all proteins of living organisms. They are
represented by the letters of the alphabet, except B, J, O, U, X and
Z. Amino acids carry both an amino and a carboxyl chemical group. These
groups can react together, forming a covalent bond after eliminating a
water molecule. This allows for the formation of a chain of
amino acids, named a polypeptide (oligopeptide when it is short, i.e.
shorter than approximately twenty residues). There are a few
exceptions to this universality, and proline is an imino acid (or more
precisely a cyclic secondary amine) rather than an amino acid. Each
amino acid (except glycine) has two isomers. Nineteen of the amino
acids in proteins are made up of a particular isomer, denoted
L- (from the Latin laevus, "left"; the label describes the spatial
configuration of the molecule, not necessarily the direction in which
it rotates polarized light)
Archaea (archaebacteria)
A class of prokaryotic organisms (i.e. organisms
without a well formed nucleus). Archaea often live in extreme
environments. They are only distantly related to eubacteria (now
domain Bacteria), in physico-chemical and genetic
terms. This distance, as well as certain features of their membrane
structure, defines them as a class in their own right. They make one
of the three domains (sometimes named kingdoms) of living things. The
most original feature of Archaea is that their membrane is built up on
a stereoisomer of glycerol-phosphate that differs from that in
Bacteria and Eukarya. This is the likely cause of the fact that
Archaea are never pathogens.
Automata
Abstract models of machines to perform computations from an input by
moving through a series of intermediate states. When an automaton sees
a symbol as input, it changes to another state according to an
instruction (given by a transition function). The stored-program
computer and the living cell are two concrete realizations of an
automaton
Bacteria
One of the three domains of life made of cells usually devoid of a
nucleus (i.e. where the chromosome(s) are free in the cytoplasm), with
a membrane with standard phospholipids, and frequently one or two
membranes surrounding the cytoplasm and rigidified by a complex
amino-acid/aminosugar sacculus (murein). At the present time, the
three domains of life are defined by the phylogenetic tree of the RNA
making the small subunit of the ribosome, the nanomachine
making proteins, because this tree can be split into three
major branches (Archaea, Bacteria and Eukarya).
Other nanomachines evolved in trees with different structures,
suggesting that there is no single ancestor of life as we know it
(contrary to the common "adamite" view, similar to the idea that there
is a single origin of man).
Base
The chemical component which distinguishes one nucleotide from
another and therefore carries the genetic information. It is made of
an “aromatic cycle” or ring of carbon and nitrogen atoms. There are
five main bases: two purines, adenine (A) and guanine (G), and
three pyrimidines, cytosine (C), thymine (T) (found only in DNA),
and uracil (U) (found only in RNA). In the double helix of DNA in the
chromosomes, the bases face each other in complementary pairs joined
by hydrogen bonds: A pairs with T, G with C, C with G and T with A.
The length of a gene, a chromosome or a genome is given as a number of
base pairs, kilobases (kb) or megabases (Mb)
Cenome
Genes making the genome of an organism can be split into three
categories. The core genome, which permits life as we know it,
coupling reproduction of the cell machinery and replication of the
genetic program is coded by a set of persistent genes which, when
analysed in the way they are distributed in bacterial genomes, are
consistent with a scenario of the origin of life. This set of genes
has been named the paleome. It defines the operating system of
the cells of a given species. Another set of genes corresponds to life
in context (managing the applications run by the operating system, the
paleome). When identified in the different strains of a given species
the genes permitting life in context add up and make a particular
category of genes, named for this reason the cenome (after
biocenose, a concept created by Karl Möbius in 1877, to express what
we tend to name today the ecosystem). A third component of the genome
codes for differentiated structures, such as those fixing nitrogen in
blue algae (Cyanobacteria), or spores in a multitude of organisms. It
is named the histome (from ἱστός, tissue)
Chloroplast
An organelle in the cells of the green parts of plants,
containing chlorophyll. It is responsible for fixing CO2
from the air and producing oxygen, by photosynthesis. Chloroplasts
derive from bacteria of the cyanobacteria type (also known as
blue-green bacteria, and formerly as blue-green algae), which have
become established within the cells. Their chromosome includes
only a few genes: the rest have been transferred to the nucleus
Chromosome
The nucleus of the cell (in eukaryotes) or the cell itself (in
prokaryotes) contains the physical material of heredity, in the
form of one or several chromosomes. As their Greek name indicates
(χρῶμα, chroma, means “color”), these structures can be stained to
make them visible with a microscope, and this is how they were
discovered, well before their function was understood. A chromosome is
made of a DNA molecule, whose sequence of bases
constitutes the organism’s genome. The very long molecule of
DNA is folded up in a complicated fashion, and is associated with all
sorts of proteins responsible for its compaction, replication
and transcription
Codon/Anticodon
A set of three successive nucleotides. Codons in messenger RNA
are read in phase with the start codon, and are recognized by a
complementary triplet of nucleotides (anticodon) of a transfer RNA,
which carries the specific amino-acid for that codon. With four
nucleotides, there are 4 × 4 × 4 = 64 possible ordered triplets
(codons), of which 61 code for specific amino-acids, using the rule of
the genetic code; one of these (AUG) codes for an amino-acid
(methionine) and is also a signal for the start of translation; the
remaining three (UAA, UAG and UGA) are “stop” codons, with no
corresponding amino acid
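The count of codons can be checked by enumeration, since a base may be
repeated within a codon. The following is a minimal Python sketch; the
names BASES, STOP_CODONS and all_codons are illustrative and are not
part of any standard library.

```python
from itertools import product

BASES = "ACGU"                        # the four RNA nucleotides
STOP_CODONS = {"UAA", "UAG", "UGA"}   # stop codons listed in this entry

def all_codons():
    """Return every ordered triplet of bases (repetition allowed)."""
    return ["".join(triplet) for triplet in product(BASES, repeat=3)]

codons = all_codons()
# Codons that specify an amino acid (everything except the stop codons).
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons), len(sense_codons))  # prints: 64 61
```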
Coenzyme
A molecule associated with an enzyme, allowing and
orientating its catalytic role.
Cytoplasm
The internal cell medium surrounding the nucleus and organelles.
More like a gel than a watery solution, its organisation is not very
well understood, especially in bacteria
DNA
DNA or DeoxyriboNucleic Acid is a macromolecule made up of linked nucleotides
and forming the chromosomes. It is usually a double strand but
can be single-stranded in some organisms (viruses). The backbones of
the two strands twist around each other to form the famous double
helix, with the bases on the inside of the helix, facing each other in
complementary pairs: A pairs with T, G with C, C with G and T with A.
In the backbones, a phosphate group forms a phosphodiester bond
(-C-O-P(=O)(-O⁻)-O-C-) between the deoxyribose (sugar) molecules of two
successive nucleotides
Endoplasmic reticulum
A membranous network in the cytoplasm of eukaryotic
cells; generally associated with ribosomes during translation
Enzyme
An enzyme is a catalyst which is specific to a particular chemical
reaction. The molecules found in cells can react with each other in an
infinite variety of ways. However these reactions do not usually
happen spontaneously, or only happen very slowly, because of a
particular chemical constraint: the need to overcome an activation
energy barrier. Imagine two ponds, one higher than the other, and
separated by a dam. For the water to flow from one to the other, the
dam has to be lowered. The role of an enzyme is both to lower the
activation energy barrier, and to align the molecules involved in the
reaction (the substrates) in their correct positions to make the new
products concerned. Note that enzymes are so specific that the same
two substrates can result in different products, depending on the
nature of the enzyme, because this aligns them differently in relation
to each other. Most enzyme names end in -ase (e.g. DNA polymerase), but
some end in -in (e.g. subtilisin)
Eubacteria
A class of prokaryotic organisms, including the most familiar
bacteria. One of the three domains (sometimes named kingdoms) of living
things, now named Bacteria. Originally all organisms without a nucleus
were thought to belong to the same group, but now eubacteria are
considered to be as distinct from archaebacteria as from eukaryotes.
It was long thought that a particular class of bacteria, the
Planctomycetes, had a primitive nucleus. It has recently been shown
that this is an image artefact, as the cytoplasmic membrane actually
folds inside the cell, and is seen in projection as a kind of
nuclear envelope.
Eukaryote
A living organism whose cells have a nucleus. The processes of
DNA replication, its transcription into RNA and
translation into proteins are therefore physically
separated. One of the three main domains (sometimes named kingdoms) of
living things, named Eukarya, this group includes some
single-celled organisms, such as the yeasts, and most multicellular
organisms. It was thought for some time that bacteria of the family
Planctomycetes had a primitive nucleus. This structure is actually a
complicated folding of the bacterial envelope, which, in projection,
gives the impression that a nuclear envelope is present in the cell.
Exon / Intron
The genes of eukaryotes contain sections called exons
and introns: exons are the part of the DNA which is expressed;
introns are stretches of DNA that intervene between the coding parts.
Introns are transcribed along with the exons but are later removed
during the formation of a mature messenger RNA. The role of
introns is not well understood, but they are almost certainly not
“junk”. They certainly have a regulatory role, a role in the accuracy
of the DNA replication process, and may act as timers or spacers
Gene
The gene is the unit of heredity. It defines a product (either RNA
or a protein) as well as control elements required for the
organized synthesis of its product or products. The definition of a
gene has varied considerably (the original definition based on
Mendel’s work was an operational one, and made no reference to a
physical structure). It is often easy to define the coding part of a
gene: this is the part of the chromosome which codes for a product,
usually a protein. It is much more difficult to define its physical
limits. A great deal of debate has resulted from this, often heated.
These are not just academic quarrels, but a crucial problem if
efficient databases are to be set up to manage biological knowledge.
Precise definitions are essential if information is to be handled by
computer
Genetic Code
The rule by which the triplets of ribonucleotides (codons) in RNA
are translated into the amino-acids in proteins.
Virtually universal for all organisms, the code uses sixty-one codons
to specify twenty amino-acids (the code is redundant or “degenerate”).
One codon represents both an amino-acid (methionine) and the beginning
of translation. Three codons (UAA, UAG and UGA) have no amino-acid
counterpart and cause translation to stop. The genetic code
must not be confused with the concept of “lines of code” in a computer
program
Genome
The genome is the organized collection of all an organism’s DNA,
regarded as a text written in a four-letter alphabet. It includes not
only all the genes but also a large number of intergenic
regions with multiple functions, particularly regulatory and
architectural functions
Genome transplantation
Experimental approach where a synthetic new genetic program replaces
entirely the genetic program of a bacterial cell. The fact that the
cells readily express the new chromosome, in the case of
Mycoplasma bacteria, shows that the genetic program is separated from
the cellular machine as in a computer. This provides another proof
of concept of the cell as a Turing Machine. As in a
computer, in which a program does not run if it is not properly
recognized by the machine, one cannot expect any random genome
transplantation to be always productive
Genotype
The concept of genotype was invented well before the nature of DNA,
the hereditary material, was understood, especially its nature as an
alphabetic text. An abstract concept, genotype refers just to the set
of genes of an organism, taking account of the fact that in a
given species, the same gene can have several different variants (alleles).
It does not take into account the way they are organized into a
coherent text, and for this reason it is likely to become obsolete
quite quickly (except in the genotype / phenotype distinction), to be
replaced by genome (which is however a more concrete concept, and
refers to the text of the genetic program)
Isomer
The molecules that make up living organisms are essentially composed
of six kinds of atoms: carbon (C), hydrogen (H), nitrogen (N), oxygen (O),
phosphorus (P) and sulfur (S). We can therefore give them a first
characterization by the number of atoms of each type. Yet these atoms
can combine in various ways. Two isomers are compounds that have the
same chemical composition, but a different organization. Among the
isomers, some resemble each other like an object and its image in a
mirror; they are stereoisomers. Louis Pasteur discovered that tartaric
acid isolated from wine was a particular isomer, different from
the mixture of the two isomers produced by chemistry in the
laboratory. This made him discover an original character of life and
argue against spontaneous generation
Membrane
A structure formed from a double layer of asymmetrical molecules
(lipids – hydrophobes – with a hydrophilic head) and proteins,
and which separates the cell compartments. The cytoplasmic
membrane separates the inside of the cell from the outside. It may be
enclosed in a more complex envelope, with associated structures which
give the cell a firm shape, for instance the rod shape of some bacteria
(bacilli)
Metabolism
The sum of all the physico-chemical changes which take place within a
living organism. Most of the reactions involved are produced by the
action of enzymes. Metabolism stops with death. There is an
intermediate state, which can be called dormancy, where the organism’s
vital activity is suspended. It cannot be said to be alive until
metabolism begins again
Microbiota
Plants and animals establish structured interactions with dynamic
communities of viruses, bacteria, and fungi, collectively known as the
microbial flora or microbiota. This symbiotic relationship ranges from
commensal to mutualistic or pathogenic, depending on the composition
of the microbiota or the immune status of the host. In plants
microbiota make epiphytic (on the surface of the whole plant) or
endophytic (within the plant) interactions. The same is true for
animals, where some members of the microbiota make endosymbiotic
associations
Mitochondria
These are organelles found in most eukaryotic cells, and
responsible for energy management, via the use of oxygen. Mitochondria
are symbiotic bacteria which have degenerated (Paul Portier, Les
Symbiotes (The Symbionts), 1918) and their genome has
been reduced to a very small number of genes. The rest have been
transferred to the nucleus. Their core function, in fact, is
not management of energy but synthesis of iron-sulfur prosthetic
groups, essential for catalysis in many enzymes. The singular is
“mitochondrion”
Molecular chaperone
An auxiliary protein, of a family whose members are involved
in the correct folding of the amino-acid chain of most
proteins
Nucleotide
The basic component of nucleic acids. Each nucleotide is made up of a
sugar molecule with five carbon atoms (ribose for RNA and
deoxyribose for DNA); one of five bases composed of
carbon and nitrogen; and one to three phosphate groups (each group has
one phosphorus atom in the center of a tetrahedron of four oxygen
atoms). The number of phosphate groups determines how energy-rich the
nucleotide is. There are four deoxyribonucleotides, written dA, dC, dG
and dT, and four ribonucleotides, rA, rC, rG and rU, but the “d” or
“r” is omitted when there is no ambiguity (most of the time). A string
of a few nucleotides is called an oligonucleotide; a long string is
called a polynucleotide. The origin of nucleotides is one of the most
pressing questions about the origin of life
Nucleus
An organelle found in the cells of eukaryotic
organisms, formed from a complex envelope, and containing the chromosomes
Operon
In prokaryotes, transcription can lead to the synthesis of a
messenger RNA coding for several proteins, not just
one as is almost always the case with eukaryotes. A transcription
unit like this, with its regulation system, is called an operon
Organelle
Eukaryotic cells contain organelles, structures visible
with an optical microscope and which are generally easy to isolate
using appropriate physico-chemical means (especially centrifugation).
The most important ones are the nucleus (which contains the chromosomes),
the mitochondria (which contain a chromosome whose genome
codes for only a few genes), and, in plant cells, chloroplasts
(which also contain a chromosome). There are many other more varied
types of organelles, whose functions are less universal. The
ribosomes, very small organelles made of RNA and proteins,
and which are the site of translation, have long been
recognized in all cells, not just eukaryotes. Ribosomes
are mainly visible under electron microscopy, as they have a diameter
of about 20 nm (20 millionths of a millimeter)
Paleome
Because ubiquitous functions are not necessarily the result of the
same structures, there is no ubiquitous gene. Yet, when a gene codes
for a function that needs to be present everywhere, it tends to be
conserved in the progeny of the organism which harbours it. This means
that some genes tend to persist in a large number of genomes. Analysis
of gene persistence permitted identification of two major classes of
genes: genes which cannot be inactivated without immediate loss of the
capacity to propagate life (this corresponds to most genes of the replication
/ transcription / translation machinery), and genes that appear
to be dispensable, at least for some generations. The class of
persistent genes can be organised in a network that recapitulates the
general features of a mineral scenario of the origin of life. It has
been named the paleome to emphasise this observation
Phenotype
The explicit manifestation of a genotype, in a given
individual. The phenotype is produced by all the individual’s genes
working together in combination with the effects of the environment.
Skin color, for instance, is the result of the activity of at least
eight genes (without counting those responsible for building the cells
of the epidermis), producing all the variety and gradation of color
seen in human skin. The difference between the concepts of genotype
and phenotype is illustrated when we get a suntan: the same set of
genes (the same genotype) can lead to different phenotypes – very pale
or very dark skin – depending on the amount of exposure to
ultra-violet radiation. It is therefore important not to identify a
genotype through a specific phenotype, nor to attempt to predict a
phenotype on the basis of explicit knowledge of just one gene
Prion
Diseases in the spongiform encephalopathy family (“mad cow disease”
for example) seem to be caused by an “unconventional” infectious
agent, a protein called a prion. All mammals, and even much
simpler cells such as brewer’s yeast, have a protein called PrP (prion
protein). There are two forms of this, the usual non-pathogenic form,
and an abnormal form (the prion itself), which has a different shape,
and clumps together easily, forming plaques which destroy nerve cells.
This abnormal form induces the normal form of PrP to convert to the
pathogenic form. Although the mechanism of the final stages of the
disease seems to be well understood, the contagion mechanism is not
clearly established
Proofreading (possibly, kinetic)
The proper functioning of protein synthesis depends on the ability of
the ribosome to decode the messenger RNA with high fidelity. When a ribosome
incorporates amino acids in a ratchet-like manner, it selects only about
one wrong amino acid in 10,000. It achieves this low error rate
thanks to specific proteins that, acting like Maxwell’s
demons, test (proofread) if the amino acid presented to the ribosome
is the correct one
Prokaryote
An organism without a nucleus, usually single-celled. This
group covers both eubacteria and archaebacteria. Replication,
transcription and translation take place in the same
cell compartment
Protein
A chain of amino-acids, folded up in three dimensions. The
amino-acids are linked by the expulsion of a water molecule between
the carboxylate residue (-COO⁻) and the amino residue (-NH₃⁺)
of each amino-acid. A short string of amino-acids linked in this way
is called a peptide. A polypeptide is a long string of amino-acids.
Several different levels of protein architecture can be distinguished.
The amino-acids forming the polypeptide chain make up its primary
structure. This chain then folds up, producing a small number of
different types of basic elements: helices, sheets and turns. This is
the protein’s secondary structure. These elements combine with each
other to form a three-dimensional conformation, the tertiary
structure. For instance the prion protein has two different tertiary
structures, with different helices and sheets, one of which is normal
and functional, the other toxic. Proteins often form functional
complexes made up of several individual polypeptide chains, and the
term quaternary structure describes the spatial form of these
structures made of several chains. Proteins carry out numerous
functions. As enzymes, they are first of all the catalysts in
the metabolism of both small molecules and those involved in replication,
transcription and translation. They are also
responsible for the transport of metabolites. Of course, because of
their shape, they have an essential architectural role, as can be seen
with silk proteins, or those of the hair and nails. They also play a
crucial role as control elements: it is normally proteins that
determine whether a certain gene is to be transcribed, and when.
Almost all the functions of an organism are thus based on proteins
Replication
Duplication of the chromosomal DNA molecule, by separating the
two strands of the double helix and building a new complementary
strand for each one, using the correspondence rule A -> T, T ->
A, G -> C and C -> G. Each new DNA molecule thus contains one
old strand and one new strand
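The correspondence rule can be sketched in a few lines of Python. This
is only an illustration: the names DNA_PAIRS, complement_strand and
replicate are hypothetical, and the two strands are written position by
position, ignoring the fact that in a real double helix they run in
opposite directions.

```python
# Pairing rule used in replication: A <-> T and G <-> C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand):
    """Build a new strand base by base from an old one."""
    return "".join(DNA_PAIRS[base] for base in strand)

def replicate(duplex):
    """Separate the two old strands and pair each with a new strand.

    Returns two daughter molecules, each made of one old strand and
    one newly built strand.
    """
    old_a, old_b = duplex
    return [
        (old_a, complement_strand(old_a)),   # old strand a + new strand
        (complement_strand(old_b), old_b),   # new strand + old strand b
    ]

parent = ("ATGC", "TACG")
daughters = replicate(parent)
```

Both daughter molecules carry the same sequence as the parent, which is
the point of the pairing rule.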
Restriction enzyme
An enzyme which cuts DNA at specific sites (strings of
nucleotides) – for example the enzyme EcoRI cuts after the G in
the sequence GAATTC – and is used in making recombinant DNA. A
restriction map of a chromosome is a physical map which shows
the position of sites recognized by restriction enzymes. When several
restriction enzymes are used in combination, separation of the
resulting fragments after partial restriction may make it possible to
reconstruct the map. However this is a very difficult and sometimes
impossible exercise, and now that we have direct access to the genome
sequence it is no longer very useful
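With the sequence in hand, locating restriction sites becomes a simple
text search. The sketch below is a simplified illustration in Python: it
assumes a linear molecule, looks at one strand only, and cuts after the
G of every GAATTC site, as in the EcoRI example above. The function name
digest and its parameters are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 means "cut after the first base", here the G).
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        # Look for the next site further along the sequence.
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])  # what remains after the last cut
    return fragments

fragments = digest("AAGAATTCTTGAATTCGG")
# fragments == ["AAG", "AATTCTTG", "AATTCGG"]
```

The positions of the cuts are the information a restriction map
records; the fragment lengths are what would be measured after
separation.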
Ribosome
An organelle in the cytoplasm made of several kinds of
RNA, and about fifty proteins; messenger RNA is
drawn through it and translated in protein synthesis
RNA
RNA or RiboNucleic Acid is a macromolecule of a single strand of nucleotides.
As in DNA, the nucleotides are made up of a phosphate group, a
sugar, and a base, joined together by phosphodiester bonds,
but in RNA the sugar is a ribose and the bases are A, C, G and U.
There are different kinds of RNA, among them messenger RNA (mRNA),
transfer RNA (tRNA) and the RNA that makes up the ribosome
State machine
There is a whole hierarchy of state machines. The simplest one is a
Finite State Machine (FSM) or Finite State Automaton (FSA). It
has a finite number of states with an initial state, and transitions
triggered by conditions (inputs). Apart from the states reflecting its
current situation, the FSM has no mechanism for remembering past
operations. Going up the hierarchy ladder, more sophisticated state
machines are augmented with an increasingly versatile storage
facility. The Turing Machine is at the top, with no restriction on
the size of its storage (its tape). The information treated by an FSM resides in its
states and its inputs. In the case of a Turing Machine, the
information is also stored as symbols on the tape
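A Finite State Machine can be written down as a set of states and a
transition function. As a minimal sketch in Python, the machine below
reads RNA letters one at a time and reports whether the triplet AUG has
been seen. The state names and the functions next_state and run are
illustrative choices, not a standard notation.

```python
def next_state(state, symbol):
    """Transition function: current state + input symbol -> new state."""
    if state == "found":
        return "found"            # once AUG has been seen, stay here
    if state == "AU" and symbol == "G":
        return "found"
    if state == "A" and symbol == "U":
        return "AU"
    if symbol == "A":
        return "A"                # an A can always begin a new attempt
    return "start"

def run(symbols, initial="start"):
    """Feed the symbols to the machine one by one; return the final state."""
    state = initial
    for symbol in symbols:
        state = next_state(state, symbol)
    return state

print(run("CCAUGC"))  # prints: found
```

The machine keeps no record of its input beyond the current state,
which is what distinguishes it from machines higher up the hierarchy.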
Transcription
Rewriting of a stretch of DNA into RNA, using the
correspondence rule A -> U, T -> A, C -> G and G -> C. Note
that this is not exactly the same as in replication, where A
corresponds to T
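The rule can be expressed as a lookup table. A minimal Python sketch
follows; the names TRANSCRIPTION_RULE and transcribe are illustrative,
and the orientation of the strands is ignored for simplicity.

```python
# DNA template base -> RNA base, as given in the rule above.
TRANSCRIPTION_RULE = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template):
    """Rewrite a stretch of DNA (template strand) into RNA, base by base."""
    return "".join(TRANSCRIPTION_RULE[base] for base in template)

print(transcribe("TACGGA"))  # prints: AUGCCU
```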
Translation
The rewriting of an mRNA in the form of a string of amino-acids
(a protein). Successive codons of mRNA are read in
phase with a start codon (usually AUG), and the corresponding
amino-acid is added to the string in accordance with the genetic
code
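Reading in phase can be sketched as follows in Python. The table
PARTIAL_CODE is deliberately incomplete (five of the sixty-one coding
codons) and serves only as an illustration, so a codon outside it raises
a KeyError. The names PARTIAL_CODE, STOP_CODONS and translate are
hypothetical.

```python
# A small, deliberately incomplete extract of the genetic code.
PARTIAL_CODE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def translate(mrna):
    """Translate an mRNA, starting at the first AUG, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []                 # no start codon: nothing is translated
    peptide = []
    # Step three letters at a time, in phase with the start codon.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        peptide.append(PARTIAL_CODE[codon])
    return peptide

print(translate("CCAUGUUUGGCUAAGG"))  # prints: ['Met', 'Phe', 'Gly']
```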
Turing Machine
An abstract machine with a virtual head to read and write symbols
structured in words (sequences) from an infinite tape, focusing on one
symbol at a time. A ribosome translating the information from
a messenger RNA into a protein can be described as a
Turing Machine, with striking physical similarity
Vectors, phages and plasmids
In genetic engineering, a vector is an autonomous replicating unit
into which scientists insert (clone) fragments of DNA, to
amplify them. The vector is then introduced into a host cell (usually
E. coli, but also brewer’s yeast and other organisms), where it
replicates, together with the cloned DNA it carries. There are two
main types of vectors. They may be mini-chromosomes called
plasmids, most of which are double-stranded circles of DNA. These are
not essential to the life of the cell, but they give it particular
properties, such as resistance to an antibiotic. Or they may be
viruses, and when these infect bacteria they are called
bacteriophages, or phages for short. There are two types of phages:
virulent phages infect their host and multiply until they burst the
cell open, killing it (this is a lytic cycle). Temperate phages do not
always kill their host, and many of them have an original method of
multiplication. They can choose either to act like virulent phages for
a time, setting off a lytic cycle (and killing their host), or they
can remain hidden in their host in the form of a plasmid, or even by
integrating themselves into the host chromosome in the form of a
prophage (like a Trojan horse). In this case the lytic cycle will only
be set off if particular events occur in which the survival of the
phage is paramount. For instance if the host’s survival is threatened
by its environment, the lytic cycle is set off, allowing the virus to
escape instead of dying in the prophage state