"Now, what I want is, Facts. Teach these boys and girls nothing but Facts. Plant nothing else, and root out everything else. You can only form the minds of reasoning animals upon Facts: nothing else will ever be of any service to them. This is the principle on which I bring up my own children, and this is the principle on which I bring up these children. Stick to Facts, sir!"

Hard Times
Charles Dickens

Glossary of terms relevant to genomics

A first version was extracted from The Delphic Boat, where detailed explanations of many concepts may be found. A few specific definitions were extracted from the i2CELL article exploring the model of cells as computers making computers.

Note: Within the glossary, words in italics have their own entry

A description of a method to solve a problem in terms of elementary, precise operations. When expressed in a particular language, the algorithm is called a program. Cellular processes to make macromolecules are algorithmic in the form “begin, do: [if Condition then Action, check Control Points, repeat], end”

A particular variant of a gene, found in part of a population of individuals of the same species. Most alleles of a given gene are functionally equivalent (and therefore invisible, except when the gene is sequenced)

Amino acid
The basic building block of proteins. Twenty amino acids make up all the proteins of living organisms. They are represented by the letters of the alphabet, except B, J, O, U, X and Z. Amino acids carry both an amino and a carboxyl chemical group. These groups can react together, forming a covalent bond after eliminating a water molecule. This allows for the formation of a chain of amino acids, named a polypeptide (oligopeptide when it is short, i.e. shorter than approximately twenty residues). There are a few exceptions to this universality, and proline is an imino acid (or more precisely a cyclic secondary amine) rather than an amino acid. Each amino acid (except glycine) has two mirror-image isomers. Nineteen of the amino acids in proteins are made up of a particular isomer, denoted L- (from the Latin laevus, "left"; the label refers to the spatial configuration of the molecule, not to the direction in which it rotates polarized light)

Archaea (archaebacteria)
A class of prokaryotic organisms (i.e. organisms without a well-formed nucleus). Archaea often live in extreme environments. They are only distantly related to eubacteria (now domain Bacteria), in physico-chemical and genetic terms. This distance, as well as certain features of their membrane structure, defines them as a class in their own right. They make up one of the three domains (sometimes named kingdoms) of living things. The most original feature of Archaea is that their membrane is built on a stereoisomer of glycerol-phosphate that differs from that in Bacteria and Eukarya. This is probably why no archaeon is known to be a pathogen.

Abstract models of machines to perform computations from an input by moving through a series of intermediate states. When an automaton sees a symbol as input, it changes to another state according to an instruction (given by a transition function). The stored-program computer and the living cell are two concrete realizations of an automaton

One of the three domains of life made of cells usually devoid of a nucleus (i.e. where the chromosome(s) are free in the cytoplasm), with a membrane with standard phospholipids, and frequently one or two membranes surrounding the cytoplasm and rigidified by a complex amino-acid/aminosugar sacculus (murein). At the present time, the three domains of life are defined by the phylogenetic tree of the RNA making the small subunit of the ribosome, the nanomachine making proteins, because this tree can be split into three major branches (Archaea, Bacteria and Eukarya). Other nanomachines evolved in trees with different structures, suggesting that there is no single ancestor of life as we know it (contrary to the common "adamite" view, similar to the idea that there is a single origin of man).

The chemical component which distinguishes one nucleotide from another and therefore carries the genetic information. It is made of an “aromatic cycle” or ring of carbon and nitrogen atoms. There are five main bases: two purines, adenine (A) and guanine (G), and three pyrimidines, cytosine (C), thymine (T) (found only in DNA), and uracil (U) (found only in RNA). In the double helix of DNA in the chromosomes, the bases face each other in complementary pairs joined by hydrogen bonds: A pairs with T, G with C, C with G and T with A. The length of a gene, a chromosome or a genome is given as a number of base pairs, kilobases (kb) or megabases (Mb)

Genes making the genome of an organism can be split into three categories. The core genome, which permits life as we know it, coupling reproduction of the cell machinery and replication of the genetic program, is coded by a set of persistent genes which, when analysed in the way they are distributed in bacterial genomes, are consistent with a scenario of the origin of life. This set of genes has been named the paleome. It defines the operating system of the cells of a given species. Another set of genes corresponds to life in context (managing the applications run by the operating system, the paleome). When identified in the different strains of a given species, the genes permitting life in context add up and make a particular category of genes, named for this reason the cenome (after biocenose, a concept created by Karl Möbius in 1877, to express what we tend to name today the ecosystem). A third component of the genome codes for differentiated structures, such as those fixing nitrogen in blue algae (Cyanobacteria), or spores in a multitude of organisms. It is named the histome (from ἱστός, tissue)

An organelle in the cells of the green parts of plants, containing chlorophyll. It is responsible for fixing CO2 from the air and producing oxygen, by photosynthesis. Chloroplasts derive from bacteria of the cyanobacteria type (also known as blue-green bacteria, and formerly as blue-green algae), which have become established within the cells. Their chromosome includes only a few genes: the rest have been transferred to the nucleus

The nucleus of the cell (in eukaryotes) or the cell itself (in prokaryotes) contains the physical material of heredity, in the form of one or several chromosomes. As their Greek name indicates (χρωμα, chroma, means “color”), these structures can be stained to make them visible with a microscope, and this is how they were discovered, well before their function was understood. A chromosome is made of a DNA molecule, whose sequence of bases constitutes the organism’s genome. The very long molecule of DNA is folded up in a complicated fashion, and is associated with all sorts of proteins responsible for its compaction, replication and transcription

A set of three successive nucleotides. Codons in messenger RNA are read in phase with the start codon, and are recognized by a complementary triplet of nucleotides (anticodon) of a transfer RNA, which carries the specific amino-acid for that codon. With four possible nucleotides at each of the three positions, there are 4 × 4 × 4 = 64 codons. Of these, 61 code for specific amino-acids, using the rule of the genetic code; one of the 61 (AUG) codes for an amino-acid (methionine) and is also a signal for the start of translation. The remaining three (UAA, UAG and UGA) are “stop” codons, with no corresponding amino acid
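
The codon count can be checked by simple enumeration. The following is only an illustrative sketch in Python; the variable names are chosen for this example and do not come from any library.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # the four ribonucleotides found in messenger RNA

# Every ordered triplet, repetition allowed: 4 * 4 * 4 possibilities.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

# The three stop codons named in the definition above.
STOP_CODONS = {"UAA", "UAG", "UGA"}
sense_codons = [codon for codon in codons if codon not in STOP_CODONS]

print(len(codons))             # 64
print(len(sense_codons))       # 61
print("AUG" in sense_codons)   # True: the start codon also codes for methionine
```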

A molecule associated with an enzyme, enabling and orienting its catalytic role.

The internal cell medium surrounding the nucleus and organelles. More like a gel than a watery solution, its organisation is not very well understood, especially in bacteria

DNA or DeoxyriboNucleic Acid is a macromolecule made up of linked nucleotides and forming the chromosomes. It is usually a double strand but can be single-stranded in some organisms (viruses). The backbones of the two strands twist around each other to form the famous double helix, with the bases on the inside of the helix, facing each other in complementary pairs: A pairs with T, G with C, C with G and T with A. In the backbones, a phosphate group forms a phosphodiester bond (-C-O-P(=O)(O⁻)-O-C-) between the deoxyribose (sugar) molecules of two successive nucleotides

Endoplasmic reticulum
A membranous network in the cytoplasm of eukaryotic cells; generally associated with ribosomes during translation

An enzyme is a catalyst which is specific to a particular chemical reaction. The molecules found in cells can react with each other in an infinite variety of ways. However these reactions do not usually happen spontaneously, or only happen very slowly, because of a particular chemical constraint: the need to overcome an activation energy barrier. Imagine two ponds, one higher than the other, and separated by a dam. For the water to flow from one to the other, the dam has to be lowered. The role of an enzyme is both to lower the activation energy barrier, and to align the molecules involved in the reaction (the substrates) in their correct positions to make the new products concerned. Note that enzymes are so specific that the same two substrates can result in different products, depending on the nature of the enzyme, because this aligns them differently in relation to each other. Most enzyme names end in -ase (e.g. DNA polymerase) but some end in -in (e.g. subtilisin)

A class of prokaryotic organisms, including the most familiar bacteria. One of the three kingdoms (domains) of living things, now named Bacteria. Originally all organisms without a nucleus were thought to belong to the same group, but now eubacteria are considered to be as distinct from archaebacteria as from eukaryotes. It was long thought that a particular class of bacteria, the Planctomycetes, had a primitive nucleus. It has recently been shown that this is an imaging artefact: the cytoplasmic membrane actually folds inside the cell, and is seen in projection as a kind of nuclear envelope.

A living organism whose cells have a nucleus. The processes of DNA replication, its transcription into RNA and translation into proteins are therefore physically separated. One of the three main kingdoms (domains) of living things, named Eukarya, this group includes some single-celled organisms, such as the yeasts, and most multicellular organisms. It was thought for some time that bacteria of the family Planctomycetes had a primitive nucleus. This structure is actually a complicated folding of the bacterial envelope, which, in projection, gives the impression that a nuclear envelope is present in the cell.

Exon / Intron
The genes of eukaryotes contain sections called exons and introns: exons are the part of the DNA which is expressed; introns are stretches of DNA that intervene between the coding parts. Introns are transcribed along with the exons but are later removed during the formation of a mature messenger RNA. The role of introns is not well understood, but they are almost certainly not “junk”. They certainly have a regulatory role, a role in the accuracy of the DNA replication process, and may act as timers or spacers

The gene is the unit of heredity. It defines a product (either RNA or a protein) as well as control elements required for the organized synthesis of its product or products. The definition of a gene has varied considerably (the original definition based on Mendel’s work was an operational one, and made no reference to a physical structure). It is often easy to define the coding part of a gene: this is the part of the chromosome which codes for a product, usually a protein. It is much more difficult to define its physical limits. A great deal of debate has resulted from this, often heated. These are not just academic quarrels, but a crucial problem if efficient databases are to be set up to manage biological knowledge. Precise definitions are essential if information is to be handled by computer

Genetic Code
The rule by which the triplets of ribonucleotides (codons) in RNA are translated into the amino-acids in proteins. Virtually universal for all organisms, the code uses sixty-one codons to specify twenty amino-acids (the code is redundant or “degenerate”). One codon represents both an amino-acid (methionine) and the beginning of translation. Three codons (UAA, UAG and UGA) have no amino-acid counterpart and cause translation to stop. The genetic code must not be confused with the concept of “lines of code” in a computer program

The genome is the organized collection of all an organism’s DNA, regarded as a text written in a four-letter alphabet. It includes not only all the genes but also a large number of intergenic regions with multiple functions, particularly regulatory and architectural functions

Genome transplantation
Experimental approach in which a new synthetic genetic program entirely replaces the genetic program of a bacterial cell. The fact that the cells readily express the new chromosome, in the case of Mycoplasma bacteria, shows that the genetic program is separated from the cellular machine as in a computer. This provides another proof of concept of the cell as a Turing Machine. As in a computer, in which a program does not run if it is not properly recognized by the machine, one cannot expect any random genome transplantation to be always productive

The concept of genotype was invented well before the nature of DNA, the hereditary material, was understood, especially its nature as an alphabetic text. An abstract concept, genotype refers just to the set of genes of an organism, taking account of the fact that in a given species, the same gene can have several different variants (alleles). It does not take into account the way they are organized into a coherent text, and for this reason it is likely to become obsolete quite quickly (except in the genotype / phenotype distinction), to be replaced by genome (which is however a more concrete concept, and refers to the text of the genetic program)

The molecules that make up living organisms are essentially composed of six kinds of atoms: carbon (C), hydrogen (H), nitrogen (N), oxygen (O), phosphorus (P) and sulfur (S). We can therefore give them a first characterization by the number of atoms of each type. Yet these atoms can combine in various ways. Two isomers are compounds that have the same chemical composition, but a different organization. Among the isomers, some resemble each other like an object and its image in a mirror: they are stereoisomers. Louis Pasteur discovered that tartaric acid isolated from wine was a particular isomer, different from the mixture of the two isomers produced by chemistry in the laboratory. This led him to discover an original character of life and to argue against spontaneous generation

A structure formed from a double layer of asymmetrical molecules (lipids – hydrophobes – with a hydrophilic head) and proteins, and which separates the cell compartments. The cytoplasmic membrane separates the inside of the cell from the outside. It may be enclosed in a more complex envelope, with associated structures which give the cell a firm shape, for instance the rod shape of some bacteria (bacilli)

The sum of all the physico-chemical changes which take place within a living organism. Most of the reactions involved are produced by the action of enzymes. Metabolism stops with death. There is an intermediate state, which can be called dormancy, where the organism’s vital activity is suspended. It cannot be said to be alive until metabolism begins again

Plants and animals establish structured interactions with dynamic communities of viruses, bacteria, and fungi, collectively known as the microbial flora or microbiota. This symbiotic relationship ranges from commensal to mutualistic or pathogenic, depending on the composition of the microbiota or the immune status of the host. In plants, microbiota make epiphytic (on the surface of the whole plant) or endophytic (within the plant) interactions. The same is true for animals, where some members of the microbiota make endosymbiotic associations

These are organelles found in most eukaryotic cells, and responsible for energy management, via the use of oxygen. Mitochondria are symbiotic bacteria which have degenerated (Paul Portier, Les Symbiotes (The Symbionts), 1918) and their genome has been reduced to a very small number of genes. The rest have been transferred to the nucleus. Their core function, in fact, is not management of energy but synthesis of iron-sulfur prosthetic groups, essential for catalysis in many enzymes. The singular is “mitochondrion”

Molecular chaperone
An auxiliary protein, of a family whose members are involved in the correct folding of the amino-acid chain of most proteins

The basic component of nucleic acids. Each nucleotide is made up of a sugar molecule with five carbon atoms (ribose for RNA and deoxyribose for DNA); one of five bases composed of carbon and nitrogen; and one to three phosphate groups (each group has one phosphorus atom in the center of a tetrahedron of four oxygen atoms). The number of phosphate groups determines how energy-rich the nucleotide is. There are four deoxyribonucleotides, written dA, dC, dG and dT, and four ribonucleotides, rA, rC, rG and rU, but the “d” or “r” is omitted when there is no ambiguity (most of the time). A string of a few nucleotides is called an oligonucleotide; a long string is called a polynucleotide. The origin of nucleotides is one of the most pressing questions about the origin of life

An organelle found in the cells of eukaryotic organisms, formed from a complex envelope, and containing the chromosomes

In prokaryotes, transcription can lead to the synthesis of a messenger RNA coding for several proteins, not just one as is almost always the case with eukaryotes. A transcription unit like this, with its regulation system, is called an operon

Eukaryotic cells contain organelles, structures visible with an optical microscope and which are generally easy to isolate using appropriate physico-chemical means (especially centrifugation). The most important ones are the nucleus (which contains the chromosomes), the mitochondria (which contain a chromosome whose genome codes for only a few genes), and, in plant cells, chloroplasts (which also contain a chromosome). There are many other more varied types of organelles, whose functions are less universal. The ribosomes, very small organelles made of RNA and proteins, and which are the site of translation, have long been recognized in all cells, not just eukaryotes. Ribosomes are mainly visible under electron microscopy, as they have a diameter of about 20 nm (20 millionths of a millimeter)

Because ubiquitous functions are not necessarily the result of the same structures, there is no ubiquitous gene. Yet, when a gene codes for a function that needs to be present everywhere, it tends to be conserved in the progeny of the organism which harbours it. This means that some genes tend to persist in a large number of genomes. Analysis of gene persistence permitted identification of two major classes of genes: genes which cannot be inactivated without immediate loss of the capacity to propagate life (this corresponds to most genes of the replication / transcription / translation machinery) and genes that appear to be dispensable, at least for some generations. The class of persistent genes can be organised in a network that recapitulates the general features of a mineral scenario of the origin of life. It has been named the paleome to emphasise this observation

The explicit manifestation of a genotype, in a given individual. The phenotype is produced by all the individual’s genes working together in combination with the effects of the environment. Skin color, for instance, is the result of the activity of at least eight genes (without counting those responsible for building the cells of the epidermis), producing all the variety and gradation of color seen in human skin. The difference between the concepts of genotype and phenotype is illustrated when we get a suntan: the same set of genes (the same genotype) can lead to different phenotypes – very pale or very dark skin – depending on the amount of exposure to ultra-violet radiation. It is therefore important not to identify a genotype through a specific phenotype, nor to attempt to predict a phenotype on the basis of explicit knowledge of just one gene

Diseases in the spongiform encephalopathy family (“mad cow disease” for example) seem to be caused by an “unconventional” infectious agent, a protein called a prion. All mammals, and even much simpler cells such as brewer’s yeast, have a protein called PrP (prion protein). There are two forms of this, the usual non-pathogenic form, and an abnormal form (the prion itself), which has a different shape, and clumps together easily, forming plaques which destroy nerve cells. This abnormal form induces the normal form of PrP to convert to the pathogenic form. Although the mechanism of the final stages of the disease seems to be well understood, the contagion mechanism is not clearly established

Proofreading (possibly, kinetic)
The proper functioning of protein synthesis depends on the ability of the ribosome to decode the messenger RNA with high fidelity. When a ribosome incorporates amino acids in a ratchet-like manner, it selects one wrong amino acid in 10,000. It achieves this low error rate thanks to specific proteins that, acting like Maxwell’s demons, test (proofread) if the amino acid presented to the ribosome is the correct one

An organism without a nucleus, usually single-celled. This group covers both eubacteria and archaebacteria. Replication, transcription and translation take place in the same cell compartment

A chain of amino-acids, folded up in three dimensions. The amino-acids are linked by the expulsion of a water molecule between the carboxylate residue (-COO-) and the amino residue (-NH3+) of each amino-acid. A short string of amino-acids linked in this way is called a peptide. A polypeptide is a long string of amino-acids. Several different levels of protein architecture can be distinguished. The amino-acids forming the polypeptide chain make up its primary structure. This chain then folds up, producing a small number of different types of basic elements: helices, sheets and turns. This is the protein’s secondary structure. These elements combine with each other to form a three-dimensional conformation, the tertiary structure. For instance the prion protein has two different tertiary structures, with different helices and sheets, one of which is normal and functional, the other toxic. Proteins often form functional complexes made up of several individual polypeptide chains, and the term quaternary structure describes the spatial form of these structures made of several chains. Proteins carry out numerous functions. As enzymes, they are first of all the catalysts in the metabolism of both small molecules and those involved in replication, transcription and translation. They are also responsible for the transport of metabolites. Of course, because of their shape, they have an essential architectural role, as can be seen with silk proteins, or those of the hair and nails. They also play a crucial role as control elements: it is normally proteins that determine whether a certain gene is to be transcribed, and when. Almost all the functions of an organism are thus based on proteins

Duplication of the chromosomal DNA molecule, by separating the two strands of the double helix and building a new complementary strand for each one, using the correspondence rule A -> T, T -> A, G -> C and C -> G. Each new DNA molecule thus contains one old strand and one new strand
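
The correspondence rule can be written as a small lookup table. The Python sketch below is a simplification made for illustration: it pairs the two strands position by position and ignores strand orientation and the enzymes involved. The function names are hypothetical.

```python
# Correspondence rule used in replication: A -> T, T -> A, G -> C, C -> G.
DNA_PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template: str) -> str:
    """Build the new strand that pairs, base by base, with the template."""
    return "".join(DNA_PAIRING[base] for base in template)


def replicate(strand_1: str, strand_2: str):
    """Return two daughter molecules, each made of one old and one new strand."""
    daughter_1 = (strand_1, complement_strand(strand_1))  # old strand 1 + new strand
    daughter_2 = (complement_strand(strand_2), strand_2)  # new strand + old strand 2
    return daughter_1, daughter_2


parent = ("ATGGCA", complement_strand("ATGGCA"))
first, second = replicate(*parent)
print(parent)           # ('ATGGCA', 'TACCGT')
print(first == parent)  # True
print(second == parent) # True
```

Both daughter molecules carry the same text as the parent, which is the point of the semi-conservative scheme described above.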

Restriction enzyme
An enzyme which cuts DNA at specific sites (strings of nucleotides) – for example the enzyme EcoRI cuts after the G in the sequence GAATTC – and is used in making recombinant DNA. A restriction map of a chromosome is a physical map which shows the position of sites recognized by restriction enzymes. When several restriction enzymes are used in combination, separation of the resulting fragments after partial restriction may make it possible to reconstruct the map. However this is a very difficult and sometimes impossible exercise, and now that we have direct access to the genome sequence it is no longer very useful
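
Now that genome sequences are available, locating restriction sites is a plain text search. The following Python sketch assumes a single strand given as a string and uses the EcoRI example from the definition (site GAATTC, cut after the G); the function names and the test sequence are invented for illustration.

```python
def cut_positions(sequence: str, site: str = "GAATTC", offset: int = 1) -> list:
    """Return the indices at which the strand is cut.

    offset is the number of bases of the site left before the cut
    (1 for EcoRI, which cuts after the G).
    """
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence: str, site: str = "GAATTC", offset: int = 1) -> list:
    """Split the strand into the fragments left by a complete digestion."""
    fragments = []
    previous = 0
    for cut in cut_positions(sequence, site, offset):
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


dna = "TTGAATTCAAGGGAATTCTT"  # made-up sequence containing two EcoRI sites
print(cut_positions(dna))  # [3, 13]
print(digest(dna))         # ['TTG', 'AATTCAAGGG', 'AATTCTT']
```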

An organelle in the cytoplasm made of several kinds of RNA, and about fifty proteins; messenger RNA is drawn through it and translated in protein synthesis

RNA or RiboNucleic Acid is a macromolecule of a single strand of nucleotides. As in DNA, the nucleotides are made up of a phosphate group, a sugar, and a base, joined together by phosphodiester bonds, but in RNA the sugar is a ribose and the bases are A, C, G and U. There are different kinds of RNA, among them:

State machine
There is a whole hierarchy of state machines. The simplest one is a Finite State Machine (FSM) or Finite State Automaton (FSA). It has a finite number of states with an initial state, and transitions triggered by conditions (inputs). Apart from the states reflecting its current situation, the FSM has no mechanism for remembering past operations. Going up the hierarchy ladder, more sophisticated state machines are augmented with an increasingly versatile storage facility. The Turing Machine is at the top: its number of states is still finite, but its tape provides unbounded storage. The information treated by an FSM resides in its states and its inputs. In the case of a Turing Machine, the information is also stored as symbols on the tape
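
A finite state machine can be sketched in a few lines of Python. The toy machine below is an assumption made for illustration, not a model taken from the source: it has two states, "idle" and "translating", and its inputs are codons. It remembers nothing beyond its current state.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transition(state: str, codon: str) -> str:
    """Transition function: next state from the current state and one input."""
    if state == "idle" and codon == "AUG":
        return "translating"  # a start codon switches the machine on
    if state == "translating" and codon in STOP_CODONS:
        return "idle"         # a stop codon switches it off
    return state              # any other input leaves the state unchanged


def run(codons, state: str = "idle") -> list:
    """Feed the inputs one at a time and record the state after each step."""
    history = [state]
    for codon in codons:
        state = transition(state, codon)
        history.append(state)
    return history


print(run(["GGC", "AUG", "GCU", "UAA", "AUG"]))
# ['idle', 'idle', 'translating', 'translating', 'idle', 'translating']
```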

Rewriting of a stretch of DNA into RNA, using the correspondence rule A -> U, T -> A, C -> G and G -> C. Note that this is not exactly the same as in replication, where A corresponds to T
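
As a minimal sketch, the rule can be applied letter by letter in Python. Strand orientation is ignored here, and the function name and example sequence are chosen for illustration only.

```python
# Correspondence rule used in transcription: A -> U, T -> A, C -> G, G -> C.
TRANSCRIPTION_RULE = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_dna: str) -> str:
    """Rewrite a DNA template strand into the corresponding RNA."""
    return "".join(TRANSCRIPTION_RULE[base] for base in template_dna)


print(transcribe("TACCGTATT"))  # AUGGCAUAA
```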

The rewriting of an mRNA in the form of a string of amino-acids (a protein). Successive codons of mRNA are read in phase with a start codon (usually AUG), and the corresponding amino-acid is added to the string in accordance with the genetic code
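
The reading process can be sketched in Python. The table below is the standard genetic code written in one-letter amino-acid symbols, with "*" marking the stop codons; the function name and the example mRNA are assumptions made for this illustration, and real translation involves much more (ribosome binding, transfer RNAs, proofreading).

```python
from itertools import product

# Standard genetic code, codons listed with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read codons in phase with the first AUG and stop at a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCAUGGUAAGC"))  # MAW (methionine, alanine, tryptophan)
```

The same table shows the redundancy of the code: 61 of its 64 entries specify one of only twenty amino-acids.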

Turing Machine
An abstract machine with a virtual head to read and write symbols structured in words (sequences) from an infinite tape, focusing on one symbol at a time. A ribosome translating the information from a messenger RNA into a protein can be described as a Turing Machine, with striking physical similarity

Vectors, phages and plasmids
In genetic engineering, a vector is an autonomous replicating unit into which scientists insert (clone) fragments of DNA, to amplify them. The vector is then introduced into a host cell (usually E. coli, but also brewer’s yeast and other organisms), where it replicates, together with the cloned DNA it carries. There are two main types of vectors. They may be mini-chromosomes called plasmids, most of which are double-stranded circles of DNA. These are not essential to the life of the cell, but they give it particular properties, such as resistance to an antibiotic. Or they may be viruses, and when these infect bacteria they are called bacteriophages, or phages for short. There are two types of phages: virulent phages infect their host and multiply until they burst the cell open, killing it (this is a lytic cycle). Temperate phages do not always kill their host, and many of them have an original method of multiplication. They can choose either to act like virulent phages for a time, setting off a lytic cycle (and killing their host), or they can remain hidden in their host in the form of a plasmid, or even by integrating themselves into the host chromosome in the form of a prophage (like a Trojan horse). In this case the lytic cycle will only be set off if particular events occur in which the survival of the phage is paramount. For instance if the host’s survival is threatened by its environment, the lytic cycle is set off, allowing the virus to escape instead of dying in the prophage state