Bibliometrics in biological sciences

In History Classic [Chapter I, "On Destiny"] it was said long ago that "it is not difficult to know but difficult to put it in practice."

TANG Yi-Jie 汤一介

Selected Topics

Indicators
Open access
Impact Factor
The H-Index
Other indices
Peer-review
Retractions

A Note on Bibliometrics

The number of scientists in the world is considerable (several millions), and in biology alone more than 1.5 million articles are published every year. It is therefore impossible to read most of them. Research is most often financed by public money, and this makes it essential to evaluate the overall output of scientists.

The core product of research activity is the scientific article. Obviously, because this pertains to creation of hypotheses, accumulation of hopefully significant facts, placing them in context, and discovery, evaluation cannot proceed via any type of mass process. Knowledge is not the result of a democratic vote. As in all types of highly specialised competences, only specialists well aware of the domain (would you trust a self-taught engineer to take care of the engine of the plane you are boarding?) can judge whether scientific activity is relevant and creative. This is also true for the distribution of knowledge: access should never be made through anonymous sources. Understanding the situation resulted in the widely accepted peer-review system. This system, which builds its efficiency on the direct competence of specialists, is far from perfect, but it cannot be replaced by any other system because it rests on knowledge which cannot be widely shared (except for one essential element, unfortunately often forgotten: common sense). The problem with peer-review is that it cannot avoid biases, either ideological or personal. For this reason it is generally accepted that peer-review is handled anonymously, while publications, which reveal a time-dependent status of knowledge, are not anonymous. Anonymity has many drawbacks (in particular it allows much unethical behaviour), which are usually remedied by involving several reviewers (at least two, often three or more) to judge one piece of work. Yet this process is extremely time-consuming, and financing agencies have had an (unfortunate) tendency to try to replace peer-review with automated processes. Bibliometrics is one such process.

Bibliometrics belongs to the many social sciences techniques that rest on creation of measures. In bibliometrics, a variety of measures (distances and the like) are used to evaluate the information content of articles and the performance of their authors, according to a variety of methods. Among those are the citations of work, notoriety of journals and notoriety of authors. Indices have been created, combining bibliometrics and other sociological measures to evaluate the performance of universities and countries in education and scientific domains. Particular emphasis is generally placed on the poorly defined concept of notoriety.

Interesting views on this question have been published by Jacques Ninio, by Frank Laloe and by John Ioannidis. A retraction watch has been created to identify the journals or magazines that carry the majority of retracted articles. Remarkably, there is a significant correlation between the relative number of retractions and the Impact Factor of a journal. As a matter of fact, a study published in 2011 showed that "prestigious" science journals tend to attract bad science. This is not unexpected because, unlike most journals, they apply a pre-selection step meant to promote studies that would increase the impact of the journal, and this step is carried out by editors who quite often do not have a strong scientific background. This would not be a problem if it did not have considerable consequences for the mass media and therefore for the general public, which now tends to distrust science.

A piece of advice provided by Phil Bourne:

Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia

Rule 1: Emphasize Publication Impact, Not Journal Impact
Rule 2: Quantify and Convince
Rule 3: Make Methods and Software Count
Rule 4: Make Web Sites Count
Rule 5: Make Data Deposition, Curation, and Other Related Activities Count
Rule 6: Use Modern Tools to Emphasize/Quantify Your Academic Standing
Rule 7: Make an Easily Digestible Quantified Summary of Your Accomplishments
Rule 8: Make the Reviewers’ Job Easy
Rule 9: Make the Job of Your References Easy
Rule 10: Do Not Oversell Yourself

Among the important consequences of the pressure exerted on scientists to publish fashionable results is biased thinking. This should, in fact, decrease the efficiency of research institutions, in terms of discoveries produced per capita (as well as per $ used in supporting research).

A common way to evaluate the content of research is to rely on fashion, assuming that what is fashionable (hence rapidly cited) is scientifically sound and interesting. A commonly used measure of fashion, the Impact Factor (or Fashion Index, IF) of a journal, has been created to take this popular view (easy to communicate to politicians and media) into account. The Impact Factor was invented by a commercial company, the Institute for Scientific Information (ISI), and it plays an important role in generating revenue, as scholars as well as institutions wait, every year, for the new IFs of journals, which they can only get via the company. The IF represents, for a given year, the ratio of the number of citations received to the number of articles published by a journal during a two-year reference period. It measures the average frequency with which the articles of that journal are cited during a defined period of time. It is a retrospective index of the short-term impact of a journal. It is of course not in any way a measure of the quality of the output of a scientist, but rather of his or her ability to cope with fashionable items, including lobbying. As a consequence, high impact factor journals have now created a double screening procedure for the articles they publish. A first screen is provided by journalists (in general with a very limited knowledge of any scientific domain), who decide whether the submitted work will be transmitted for further reviewing or be rejected immediately with no review. The IF is also a way, for countries hosting the corresponding journals, to capitalize on research performed by other countries without spending a single cent on the work.

Because using the IF of the journals in which an author publishes to evaluate the quality of her or his output can be very misleading (in particular it works strongly against innovation), the University of Cork (Ireland) issued the following statement in its 2009 guidelines for peer-reviewers of research:

All panels will work to an underpinning principle that all forms of research output will be assessed on a fair and equal basis. Panels will neither rank outputs, nor regard any particular form of output as of greater or lesser quality than another per se. Panels may use, as one measure of quality, evidence that the output has already been reviewed or refereed by experts (who may include users of the research), and has been judged to embody research of high quality. No panel will use journal impact factors as a proxy measure for assessing quality.

An example is the rise and fall of a discipline: driven by the ubiquitous development of genomics, the year 2002 witnessed a dramatic change in the Impact Factor of many journals. The Impact Factors of journals computed by the Institute of Scientific Information (ISI) for 2009 have been available since mid-June 2010. They are quite interesting, as they illustrate extremely well the effects of fashion in science. Indeed, after a few years of celebration, genomics and bioinformatics are now on the decline. Microbiology is clearly out of fashion. In contrast, anything publishing images (and it is well known that the visual cortex is very important but quite incapable of deep integration of concepts, for example) is now very successful. This will probably be trendy for the next few years. So, if you wish to be visible (yes, this is not simply a metaphor!) publish fine images. The content no longer seems to matter much...

The IF of genomics and bioinformatics journals has been considerably on the rise, as has that of open access journals. It is still rising, in particular with metagenome studies: big data will always be accessed, and therefore the papers that first describe the data will be heavily cited, creating an enormous upward bias for the journals which publish that data. Nevertheless, genome studies may be levelling off as new fashionable domains, such as Systems Biology, are emerging. Because of those biases it is important to check, using Google Scholar for example, that no important references from an author have been missed by the ISI. This is particularly important when analyzing the track record of young scientists, who might be discriminated against simply because they did not publish their important work in journals immediately tracked by the ISI. Furthermore, the way the ISI "analyses" the output of investigators mixes up all kinds of publications, including work that is not meant to be cited (secondary publications in popularization magazines, for example), so that a superficial use of the automatic indicators is only valid for scientists who do not communicate with the public and who follow the mainstream trends. Of course plagiarism plays an unfortunate role in distorting indicators: an interesting view of the situation can be obtained by browsing the PubPeer site. For the time being Google Scholar is more reliable than the ISI (except for papers published before the beginning of the Internet, ca 1985).

In January 2009 the Open Access Journal PLoS ONE created a series of alternative metrics to the traditional Impact Factor, at the ScienceOnline'09 conference in Research Triangle Park, North Carolina. Many other indexes have been further proposed.

Many features of scientific publications are relevant to game theory, with no direct connection with the scientific content supposed to be carried by articles. A remarkable study shows why current publication practices may distort science, demonstrating that "The current system of publication in biomedical research provides a distorted view of the reality of scientific data that are generated in the laboratory and clinic" with a considerable bias towards overestimation of the quality of work published in high IF journals... Many examples of the situation can be found. In late 2008, the retraction of a high profile study on a long sought abscisic acid receptor in plants was a further demonstration of the unfortunate situation we have now reached. For a list of high profile retractions in 2010 see The Scientist, but there are many, many more: see Retraction Watch.

A fair use of bibliometric indicators (beware of cheaters)

Open access (making access to Science free for all)

The Impact Factor (the impact of a journal, not of a work or a scientist)

The H-index (the citation level of an author)

Other indices (notoriety, immediacy, SCimago...)

Before investigating further the nature of bibliometric indicators such as the IF, it is essential to exercise common sense, and to consider that the aim of research is discovery, not making oneself known. By definition a discovery cannot be predicted, and because it is new, it often takes time to be recognized. In a world where emphasis is placed on the futile, on what is important one day and forgotten the next, where crooners make the headlines, where money has replaced moral values, it is unavoidable that many scientists are tempted by the limelight. Some scientific magazines, whose aim is profit, take the full measure of this unfortunate situation and play on the indicators which best fit their money-driven goal. We hope that the vast majority of our colleagues are still motivated by the quest for Knowledge, and that they will resist the temptation of the easy way, which would make them evaluate their peers with a crude use of bibliometric indicators rather than by analyzing the actual content of their work. The following paragraphs are meant to help them in this endeavour. It should finally be noted that journals producing images are systematically biased positively, demonstrating that the role of structured language is much less important in the way science is produced at the moment than the ever-growing power of images.

A fair use of bibliometric indicators (A further analysis)

As remarked by the late Maurice Hofnung (1942-2001), many factors affect bibliometric indicators:

1- The number of citations depends dramatically on the research domain, on the number of scientists publishing in the domain, and on the number of publications in the domain. Medical sciences, for example, have an impact factor (see below for a definition) which is often several times higher than that of biochemistry, just because of the sheer number of publications and scientists in the domain (many publications come from hospitals all over the world, and a large number are simple case studies). This is however compounded by the fact that a large domain also has a large number of journals. In addition, medical journals contain many articles that are not peer-reviewed, so that up to 40% of the IF is due to references to non peer-reviewed articles in these journals! It is therefore to be expected that medical publications, or publications dealing with medical subjects, are found among the top IF publications, even when they would be quite average in other domains. Inside medical sciences, it is better to be an immunologist than a clinician, for example. In contrast, zoology or molecular microbiology would fare low. If one absolutely wishes to use IFs, a correct way to appreciate a domain is then to calculate a "relative impact factor", which standardizes the IF by dividing it by the IF of the highest impact journal in the domain. For example the reference journal in cell biology (cytology) is the journal Cell, while its counterpart in microbiology is Environmental Microbiology: comparing scientists in both domains would benefit from comparing their citation record in the perspective of the relative IFs of these journals (a ratio of 5 to 6 in the significant number of citations). Authors who publish successful methods can have a huge impact (see protein dosage, plasmid preparation, software for protein model construction and the like) and this contributes to the impact of a journal. In the same way, publishing big data generates a huge number of citations.
In contrast, authors who take some time to popularize science in popular magazines are discriminated against as soon as the corresponding journals are included at the ISI. Indeed, articles in these journals are not meant to be cited, but read by a general public, so that this will immediately impact on the average yearly citations of the authors who think that it is important to promote interest for Science in the general public.
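The "relative impact factor" described in point 1 can be sketched in a few lines of Python. This is only a minimal illustration, assuming the IFs of the journals of a domain are already known; the journal names and IF values below are hypothetical placeholders, not real figures, and the function name is an assumption of this sketch.

```python
def relative_impact_factors(domain_ifs):
    """Divide each journal's IF by the highest IF found in the same domain."""
    top = max(domain_ifs.values())
    return {journal: value / top for journal, value in domain_ifs.items()}

# Hypothetical IF values, for illustration only (not real data)
cell_biology = {"Journal A": 30.0, "Journal B": 6.0}
microbiology = {"Journal C": 5.0, "Journal D": 2.5}

rel_cell = relative_impact_factors(cell_biology)
rel_micro = relative_impact_factors(microbiology)

# Journal B has the higher raw IF, but Journal D stands higher within its own domain
print(rel_cell["Journal B"], rel_micro["Journal D"])
```

With such a normalization, a journal is compared with the leading journal of its own domain rather than with journals of a larger or more fashionable field.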

2- The bibliometric profile depends on the history of the domain. Generalist magazines such as Nature or Science have a high impact factor because of their format (weekly magazines) and of their status as established publications. Furthermore they do not resort to standard peer review, but make a preselection of what could be published as a function of fashion. They are also journals with a high advertising impact, which requires them to have regular contacts with the popular daily mass media. This has nothing to do with the quality of science (and, as a matter of fact, many fakes are published there and many great discoveries have been refused publication there). Thus, at the beginning of molecular biology of pathogenic bacteria, it was extremely difficult to publish in high impact generalist journals such as The EMBO Journal. This is much less so today, and many new journals have appeared in the domain as the size of the community increased (with a concomitant increase in impact factor). In contrast, publications on model organisms, which formerly reached high impact journals, are now confined to much lower impact journals (as the size of the community is shrinking). Publications of genome sequences, which contributed considerably to the IF of popular magazines, are now considered standard work and are published in the specialized journals of the corresponding disciplines. They are now replaced by metagenomic studies, which automatically generate a large IF because they are big data, whatever their scientific content. Bibliometrics using IFs measures the impact of a domain and not only that of the work under analysis. How can we compare, using IFs, disciplines as different today as mycology, entomology or development? Since it is difficult to define a domain, one may compare recognized scientists known to belong to the same domain. Examples of domains are: HIV, Yersinia, protein structure, vaccinology, cellular microbiology, etc.
This may allow one to situate the scientist in his/her domain. One may try to normalize for each domain by dividing by the total number of publications in the domain during the same period of time. Comparing scientists in different domains is extremely difficult: a way might be to compare the level they have in their specialized domain. Multivariate analyses may be important methods to perform the task, but they need to be used by people competent in statistics.

3- Bibliometrics measures an ensemble of factors describing the ability of a scientist to make discoveries and/or inventions and to make them known. Some scientists have the knack of making discoveries, while others help other scientists to make them, others make their own work known, and others make the work of others known! Putting too much emphasis on a narrow use of bibliometrics has the unfortunate consequence of making the "make-known" more important than the "make-discover" or the "make-invent". It is also an incentive for unverified or even fake experiments. In its narrow use, bibliometrics does not take into account patents (and even less the fact that a patent has been granted a licence!) or databases, and it forgets conferences, teaching, the organisation of meetings, creation of laboratories, etc.

4- The bibliometric profile depends on the moment of the career of a scientist and it is rare that his/her production is constant in quality or in quantity. This should be taken into account. A new subject or a new laboratory setting will inevitably introduce a gap in scientific production, and bibliometrics should not prevent this type of innovative approach to Science! Indeed, the best reviewing committees measure the production of scientists placing it in proper context, and they are careful not to simply evaluate the quantity of output. As a rule of thumb it should not be accepted that a scientist publishes more than one article every two weeks (and usually much less), as a too large output is the sign of sloppiness, unethical behaviour and lack of proper consideration of the importance of Science. For journals it is good practice to black-list scientists who are familiar with such practices and never to use them as peer-reviewers. A sudden explosive increase in the output of a scientist should be carefully monitored, as it is often the sign that something unethical is happening.

5- Some heads of laboratories sign only the articles where they have had a significant scientific contribution. Others have the tendency to sign everything, even without reading what they sign! Some journals now demand that each individual author is identified by his/her explicit contribution. This practice should be generalized. The normal ethical behaviour is that the first author of an article is the person who performed most of the work. While this practice is still not general it is good policy, to judge a leader, to count not only his/her production, but all that coming from his/her laboratory. In any event, scientists who publish far too much (some sign 50 articles per year or even more!) should not be considered as belonging to the category of ethical scientists and should be black-listed.

6- It is now recognized that the use of bibliometric criteria modifies the policy of the signature of articles. There already exist scientists (especially in countries familiar with lobbying practices) who deliberately omit to cite their competitors in order to lower their impact. This attitude goes against fairness in choosing citations and jeopardizes the objective use of bibliometrics. This is already reflected in the average reference lists: references of articles from the USA contain more citations of English-speaking authors than their real world-wide contribution to the domain would warrant. This is easily measured by comparing with the citations given by authors of other nations in the same domain. This bibliometric practice should be known when scientists from diverse countries compete for a given position.

7- A study has shown that the names of authors which are difficult to write, or unfamiliar in English-speaking countries, are often inaccurately spelled, and therefore not quoted properly nor counted in the citation half-life, for example. It is indeed important that the names of authors are reported without spelling errors. Because English is the standard publication language, spelling errors in English names are less frequent than in other names (e.g. Polish names, with their many consonants, often suffer spelling mistakes). Also, it is not infrequent, when a new word is created for a new concept, that it is misspelled (because it is absent from dictionaries), and this results in under-reporting of citations. This is another bias (fortunately not acting against Chinese authors, whose surnames have very simple spellings) which again favours the extreme domination of English-speaking countries, already favoured by the use of English as the basic language of communication. Of course this has nothing to do with the quality of the corresponding science. As a consequence, bibliometrics should be used with appropriate caveats, especially in non-English speaking countries.

Draft of a possible scheme for a more objective bibliometric evaluation of a scientist

A. Number of years since the first publication
B. Number of peer-reviewed publications
C. Number of ill-spelled citations (when identified)
D. Total number of citations
E. Average number of citations per year (before the five preceding years)
F. Publications in the five preceding years; give a negative value when the number of articles is higher than 30 per year
G. Number of papers cited less than 5 times (before the five preceding years)
H. Number of papers cited more than 10 times
I. Number of citations for the five most cited papers
J. Number of papers cited after 10 years
K. Average rank in publications (first = last = 1; second = penultimate = 2; other place = 3 etc)

• high influence: near 1

• highly collaborative: near 3
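Item K of the scheme above can be illustrated with a short Python sketch. It assumes that every position other than first, last, second or penultimate scores 3 (the "etc" in the scheme is not specified further), and it uses a hypothetical list of (author position, number of authors) pairs; the function names are assumptions of this sketch.

```python
def author_rank(position, n_authors):
    """Rank of an author position: first or last = 1, second or penultimate = 2, else 3."""
    if position in (1, n_authors):
        return 1
    if position in (2, n_authors - 1):
        return 2
    return 3  # assumption: all other places score 3

def average_rank(papers):
    """Average rank over a list of (position, number of authors) pairs."""
    ranks = [author_rank(position, n_authors) for position, n_authors in papers]
    return sum(ranks) / len(ranks)

# Hypothetical publication record: (position of the scientist, number of authors)
papers = [(1, 4), (6, 6), (2, 5), (3, 7)]
k_value = average_rank(papers)
print(k_value)  # 1.75
```

A value near 1 would point to a scientist who mostly signs as first or last author, while a value near 3 would point to a mostly collaborative record.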

Of course, this is only one indicator, and, for comparative purposes, it is important to evaluate the impact of the specific domain, novelty, publication of patents, databases, etc! It is also important to check articles that were subsequently "commented" upon by other authors: the comments often underline plagiarism, sloppy experiments or even fakes...

An evident bias in favour of native English speakers has been found, and there seems also to exist a gender bias:

Gender bias in the refereeing process? Tom Tregenza
Trends in Ecology & Evolution, (June 06, 2002), 10.1016/S0169-5347(02)02545-4

Abstract

Scientists are measured by their publications. Yet anonymous peer review is far from transparent. Does bias lurk within the refereeing process? Investigating the outcomes of manuscript submissions suggests that the overall process is not sexist, but differences in acceptance rates across journals according to gender of the first author give grounds for caution. Manuscripts with more authors and by native English speakers are more successful; whether this is due to bias remains to be seen.

Note also that scientific authors can also be cited in the Literature and Arts domain, as well as in the domain of Social Sciences, Anthropology and Philosophy...

Finally, unfortunately, the peer review system as it is working now is heavily flawed.

Open access

For several years a bitter fight has been developing between the proponents of private publishing and those favoring open access to scientific research. The role of the Impact Factor of journals is important in this fight, as already established commercial publications owe a large proportion of their success to this bibliometric measure of their influence. Government agencies, such as the National Institutes of Health in the USA, consider that because the research they support is funded by taxes, it should be public and open access. In a similar move, the Wellcome Trust, the most influential charity in the UK, has required, from October 2005, that the research it supports be published in open access journals. Some commercial journals, such as Nucleic Acids Research, have already decided to become open access. Open access journals make the content of original publications free and public, leaving the copyright to the authors, provided they refer exactly to the place where the work has been published.

A study published on February 19th, 2009 shows that free online availability of scientific articles increases the prospect for authors to get cited. The tendency is particularly visible in developing countries, where funding for research is limited. It is now common practice for an author to look for a reference in a field directly related to his or her work, and to shift to a related paper if the article initially chosen is not readily available.

The idea of Open access was welcome. Unfortunately, some unethical persons discovered that this was a way to make a huge amount of money, simply by creating journals that ask the authors to pay a fee per article. Thousands of predatory journals, without any real credential have thus been created, generating sloppy or purely invented pseudo-science...

Computing the Impact Factor (see a thorough analysis in Biomedical Digital Libraries)

A fashionable way to evaluate Science is to use bibliometric studies. One often considers the "Impact Factor" associated with the publications of a scientist, assuming that this is a way to evaluate the quality of his/her production. In fact an "Impact Factor" (invented by Eugene Garfield from the profit-making Institute of Scientific Information) is but one among several bibliometric markers; it is a measure of the number of times a journal is quoted in references, for a limited period of time. It rested initially on the sole responsibility of a private company, the one which maintains the Institute of Scientific Information (ISI). Several other structures now compute a similar index, which can now also be computed using Google Scholar.

The Impact Factor represents, for a given year, the ratio of the number of citations received to the number of articles published by a journal during a two-year reference period. It measures the average frequency with which the articles of that journal are cited during a defined period of time. It is a retrospective index of the short-term impact of a journal.

For example, the impact factor of Science in 1995 (21.911) was computed as follows:
- citations in 1995 of articles published in 1993: 24,979; in 1994: 20,684; total: 45,663
- number of articles published in 1993: 1,030; in 1994: 1,054; total: 2,084
- IF = number of citations / number of articles = 45,663 / 2,084 = 21.911

This means that the papers published in Science in 1993 and 1994 have been cited slightly less than 22 times in 1995 on average.
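The arithmetic of this example can be reproduced with a minimal Python sketch. The figures are those quoted above for Science; the function and variable names are illustrative choices, not part of any official tool.

```python
def impact_factor(citations_by_year, articles_by_year):
    """Citations received in the census year divided by articles published in the two reference years."""
    return sum(citations_by_year.values()) / sum(articles_by_year.values())

# Citations received in 1995 by articles published in 1993 and 1994
citations_1995 = {1993: 24_979, 1994: 20_684}
# Number of articles published in 1993 and 1994
articles = {1993: 1_030, 1994: 1_054}

science_if_1995 = impact_factor(citations_1995, articles)
print(f"{science_if_1995:.3f}")  # 21.911
```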

Because this is a ratio, the impact factor depends heavily on the definition of an article, and the same definition is not used for the numerator and the denominator. Only scientific articles are usually counted in the denominator, while the total number of citations in the numerator covers all types of articles published in the journal. As a matter of fact, the definition of the IF comprises all types of articles, including Reviews, Comments, Editorials etc. Therefore a journal publishing reviews always has a higher IF than those which do not publish reviews (hence the vogue for mini-reviews or even review sections in most major journals now). For example the largest IF in 1999 was that of Annual Review of Biochemistry (37.111). Many review articles are not peer-reviewed in the same way as standard scientific articles, but commissioned, thus creating a huge bias in the choice of authors. Lobbying in this domain is common practice. Moreover, comments and editorials are often political in nature, and therefore frequently quoted: the IF of a journal such as The Lancet owes much to its controversial editorials, not to the scientific content of its articles. This is even more so for Nature or Science, and this explains the introduction of special sections such as "Insight" in Nature, since this will automatically boost the impact factor of the journal.

As the magazine Nature discovered a few years ago, the Impact Factor is flawed. Nature reiterated its words of caution on June 23rd, 2005:

"The net result of all these variables is a conclusion that impact factors don't tell us as much as some people may think about the respective quality of the science that journals are publishing. Neither do most scientists judge journals using such statistics; they rely instead on their own assessment of what they actually read. None of this would really matter very much, were it not for the unhealthy reliance on impact factors by administrators and investigators' employers worldwide to assess the scientific quality of nations and institutions, and often even to judge individuals. There is no doubt that impact factors are here to stay. But these figures illustrate why they should be handled with caution."

An analysis of quotations of the Human Genome articles shows important errors in citation statistics (Nature (2002) 415: 101). As stated in the editorial of this famous magazine: "This adds to worries about relying heavily on these figures when rating scientific performance." Furthermore, this figure is only computed for journals selected by the ISI, excluding some very important journals (in particular some published on the World Wide Web and fundamental to genomics). It is now important to complement the data provided by the ISI with other sources of citation records, such as those provided by Google Scholar. Caveat: journals are considered as providing references only from the date they are incorporated in the survey: this means that older papers from those journals are not taken into account. There is also a very strong bias in favour of native English speakers and English speaking countries (Tom Tregenza, T.Tregenza@leeds.ac.uk, Trends in Ecology and Evolution 2002, 17:349-350).

The Impact Factor does not measure the quality of the production of a scientist. An old study by Maunoury already showed that "9% of the articles in Cell, 16% in PNAS, 43% in Experimental Physiology and 52% in the European Journal of Pharmacology, published between 1989 and 1993, have never been quoted". The citation of one single article may significantly affect the IF of a journal. For example, the article on Blast2 is quoted so often that this single paper increased the IF of Nucleic Acids Research by one unit! In the same way, genome papers were often "hot papers" (see below) and they considerably affected the IF of journals (this is the case of both Nature and Science). In total, just a handful of articles may double the IF of a journal. This is why the production of big data, such as metagenomic studies, is so fashionable. By far the most important factor to evaluate the production of a scientist is the number of times his or her work has been cited by others. This however depends heavily on the journals considered in the databases. In the absence of an independent, non-commercial study, the figures we possess may be flawed. Note that some articles act as "attractors" and get most citations of a given subject. Quite unfortunately (and unethically) lobbies are trained to cite only papers of friends... Furthermore immoral scientists (in particular in the highly hierarchized medical domain) sign more papers than they can really contribute to, artificially increasing their citation record (in particular through self-citation). This unfortunate, unethical and sloppy practice often involves plagiarism (including self-plagiarism), using a general canvas for articles where the names of the cases, organisms etc. may easily be replaced by a variety of alternatives.
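The weight that a single highly cited article can have on a journal's IF can be illustrated with a small Python sketch. All numbers are hypothetical and chosen only to make the effect visible: a journal with 1,000 articles cited 5 times each, to which one "hot paper" cited 1,000 times is added.

```python
from statistics import median

def mean_citations(citation_counts):
    """Average citations per article, the quantity an IF-like ratio reports."""
    return sum(citation_counts) / len(citation_counts)

# Hypothetical journal: 1,000 articles, each cited 5 times in the census year
typical = [5] * 1000
# Same journal with one additional article cited 1,000 times
with_hot_paper = typical + [1000]

if_before = mean_citations(typical)
if_after = mean_citations(with_hot_paper)
print(if_before, round(if_after, 2))  # 5.0 5.99
print(median(typical), median(with_hot_paper))  # 5.0 5.0
```

Under these assumed figures one article raises the average by about one unit, while the median citation count of the journal's articles does not move, which is why an average of this kind says little about any individual paper.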

Other approaches are much better, e.g. the number of times a paper (or a scientist) is still cited after 5 years, 10 years or 20 years. The Impact Factor measures the ability of a journal to make itself known by advertisement (often paid advertisement), or even scandal. The publication of fakes or uncontrolled results increases the impact factor of a journal (see the arsenic nightmare)! For a public view of the situation, the "Research's Scarlet List", as named by Alison McCook, provides a very conservative identification of misconduct; see also, for example (and this happens quite often):

Retraction: Metal-insulator transition in chains with correlated disorder
PEDRO CARPENA, PEDRO BERNAOLA-GALVAN, PLAMEN CH. IVANOV & H. EUGENE STANLEY
Retraction: A cytosolic catalase is needed to extend adult lifespan in C. elegans daf-C and clk-1 mutants
J. TAUB, J. F. LAU, C. MA, J. H. HAHN, R. HOQUE, J. ROTHBLATT & M. CHALFIE

Another example of retraction, which has important consequences for the understanding of oxidative damage in animal cells, a fundamental topic in cancer studies, is the following:

A highly cited 1997 paper on transcription-coupled repair was retracted by Science in June 2005, after coauthor Steven Leadon, formerly of the University of North Carolina, was found guilty by a university committee of fabricating and falsifying data. An analysis by Graciela Flores shows that, in spite of this retraction, the matter is not being dealt with as ethics would dictate. As is often the case (we may remember the famous fakes of Mark Spector, which turned the whole community of cancer scientists away from investigating metabolic features of the disease for more than two decades), this is not the first time that a paper by Leadon has been retracted. Scientific misconduct is a widespread plague, unfortunately. Aside from publishing many more papers than intellectually possible (a behaviour akin to corruption, and very widespread in places where corruption is omnipresent), one of the most pervasive forms of misconduct is the lack of proper citation of related work, which, of course, contributes heavily to the impact of scientific work. In fact, as stated in an editorial of Nature about the terrible case of misconduct of a Korean scientist working in the domain of embryonic stem cells: "In view of the pattern of behaviour that led up to Hwang's disgrace, however, no one should argue ever again that despotism, abuse of junior colleagues, promiscuous authorship on scientific papers or undisclosed payment of research subjects can be tolerated on the grounds of eccentricity or genius. Research ethics matter immensely to the health of the scientific enterprise. Anyone who thinks differently should seek employment in another sphere."

There is a strong correlation between the Impact Factor of a journal and the number of retractions of articles published in the journal (see the retraction index).

Is, then, the impact factor a good way to evaluate excellent science?

The H-index

After the "Impact Factor", another index, the "h-index", has been proposed by J. E. Hirsch to evaluate the production of mainstream scientists: it is the highest number "h" of papers cited at least "h" times (the same "h") for a given author. h=35 will mean that an author has published at least 35 papers cited 35 times or more. This index is interesting in that it dampens the bias in favour of authors who have a few highly cited papers but nothing else. It also takes into account the past contribution of a scientist if his or her work continues to be cited over the years, provided that it comprises a significant number of papers. It seems likely that this measure will play a role at least as important as that of the popular Impact Factor. A strong caveat must be borne in mind when considering this index: because it refers to citation in a given domain, it is highly sensitive to the number of articles published in that particular domain. In fact purists would contend that, as the process of citation is the result of many multiplicative causes, it is more likely that the distribution of citations is log-normal rather than normal. Furthermore, a large field produces many citations, but more articles will share those citations, so more citations are needed to keep up the average. If this is the case, then the multiplicative factor should perhaps be the logarithm of the ratio of the overall number of citations. If this were the case, an h-index of 100 in cell biology would correspond to 75 in immunology or studies of the CNS, and 40-50 in microbiology. Naturally, combining two fields, such as in cellular microbiology, would increase the h-index even further. It is therefore extremely important to remember that the size of a community matters when comparing the performance of various authors: an h-index of 30 in Cell Biology is much less significant than the same value in Microbiology.
Strictly speaking, comparing investigators on the basis of h-indexes should be only possible if they belong to the same field.
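The definition of the h-index can be made concrete with a short sketch. It simply applies the definition (the largest h such that h papers are cited at least h times each) to a list of citation counts; the function name and the citation lists are hypothetical illustrations, not data about any real author.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have been
    cited at least h times each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the first `rank` papers all have >= rank citations
        else:
            break         # later papers are cited even less
    return h


# Hypothetical author with one very highly cited paper and little else.
few_hits = [1000, 2, 1]
# Hypothetical author with 35 papers cited 35 times each.
steady = [35] * 35

print(h_index(few_hits), h_index(steady))
```

The first author, despite 1000 citations to a single paper, obtains a much lower h-index than the second, which is the dampening effect described above. Note that nothing in the computation accounts for the size of the field, so values remain comparable only within one domain.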

A major issue with this index is, of course, the way citations are recorded. It is easy to see how authors publishing in journals that are not counted in the citations used by the ISI, for example, will be overlooked even if their research has a large impact: while the Japanese journal DNA Research is in the ISI list, for some reason its early papers are not among the papers cited by the same ISI, and there are many such examples, in particular with the journals published by BioMedCentral... For this reason it is important to prefer Google Scholar to the ISI and compute the h-index using this source of information. Even so, it is prudent to take the index with a grain of salt, as scientists are gregarious and tend to cluster in heavily trodden avenues: it is not unusual to see that fairly boring topics are very popular (hence, highly cited). Indeed, analysis of the publication record of scientists who were awarded a Nobel Prize suggests an important caveat: while most have a fairly high "h-index" (but usually not among the highest ones), some do not. This can easily be understood, as a senior scientist signing a very large number of papers will have a high h-index despite, often, a lack of originality (interestingly, the h-index may help to identify misconduct: no author should be able to be a co-author of one article per week, for example, except in a very corrupt system). Also, many important discoveries are made at the interface between domains, or are found out of the mainstream, and this does not always contribute to a high "h-index". It is sometimes also important to monitor, together with the h-index, the slope of the decrease in the number of citations. Naturally, it is essential to have an idea of the size of the underlying community: working with infectious diseases, for example, will lead to work widely cited in hospitals, which are much more numerous than microbiology departments at universities!
Original work takes time to be recognized, and recent domains have much smaller communities than older ones. This is particularly true of the past ten years or so, when many new domains were created, precluding high citation records.

The place where one has access to citation numbers is also very important. "Google Scholar" is becoming an important resource in that domain and it sometimes provides better insight, and probably fewer biases, for new or innovative articles than the expensive ISI resource. The phenomenon is quite visible in Bioinformatics, where the number of articles referenced at the ISI is significantly lower, and sometimes much lower, than at Google Scholar (this is highly time-dependent, however, as this database is recent and does not easily identify citations more than 10 years old). By contrast, because it started recently, papers published before 1985 are considerably less cited in "Google Scholar" than at the ISI. Google Scholar often fails to associate many articles with a given author, in particular when the list of authors is long. It is also heavily influenced by the behaviour of people with respect to Internet connection.

Other indices

A new factor, the "y factor" is being created to monitor the Internet-related prestige of journals: we are not very far, now, from Science as a full-blown show-business! Hence, as always with bibliometrics, one needs to take all indexes with a grain of salt, especially in the most innovative domains. Anyhow, it is of course essential to consider first the content of research, rather than the place where it is published or its visibility.

The immediacy index of a journal is calculated by dividing the citations a journal receives in the current year by the number of articles it publishes in that year, i.e., the 2000 immediacy index is the average number of citations in 2000 to articles published in 2000. The number measures how quickly items in that journal get cited upon publication. Hot papers are another way to measure immediacy: they are the few papers which are most cited immediately after their publication, and during the following two years.
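As a worked illustration of the immediacy index defined above, the following sketch performs the division for one year. The function name and the figures for the year 2000 are hypothetical; only the formula comes from the text.

```python
def immediacy_index(citations_same_year, articles_published):
    """Citations received during a year by the articles a journal
    published in that same year, divided by the number of those articles."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return citations_same_year / articles_published


# Hypothetical journal: 600 articles published in 2000,
# cited 150 times during 2000.
index_2000 = immediacy_index(citations_same_year=150, articles_published=600)
print(index_2000)
```

Because the numerator only counts citations made within the year of publication, a journal publishing early in the year, or in a fast-moving field, will tend to obtain a higher value.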

The cited half-life is a measure of the rate of decline of the citation curve of an article. It is the number of years that the number of current citations takes to decline to 50% of its initial value. It is a measure of how long articles in a journal continue to be cited after publication. Review articles usually have longer cited half-lives.
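The half-life as described above (time for yearly citations to fall to 50% of their initial value) can be sketched as follows. This is only one possible reading of that description, not necessarily the computation used by the ISI; the yearly series is assumed to start at the reference year, linear interpolation between years is an added assumption, and the citation counts are invented.

```python
def cited_half_life(yearly_citations):
    """Number of years needed for the yearly citation count to fall to
    half of its initial value; None if it never falls that low.

    yearly_citations[0] is the count in the reference year.
    """
    if not yearly_citations or yearly_citations[0] <= 0:
        raise ValueError("the initial citation count must be positive")
    target = yearly_citations[0] / 2
    for year in range(1, len(yearly_citations)):
        previous = yearly_citations[year - 1]
        current = yearly_citations[year]
        if current <= target:
            # Linear interpolation between the two surrounding years.
            return (year - 1) + (previous - target) / (previous - current)
    return None


# Hypothetical article whose citations decline steadily.
research_article = [100, 80, 60, 40, 20]
# Hypothetical review still gaining citations after three years.
review_article = [100, 120, 110]

print(cited_half_life(research_article), cited_half_life(review_article))
```

In this toy example the research article reaches half of its initial citation rate after two and a half years, while the review has not declined at all within the period observed, in line with the remark on review articles.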

The lucrative business of publishing scientific articles cannot accept that a fair assessment of research could affect its financial benefits. There is therefore considerable activity in the creation of indexes meant to rank papers (and hence publishers). Elsevier, via its database Scopus, has created a new index, SCImago, meant to rank journals and countries. SCImago's ranks differ somewhat from those obtained at the ISI, but review articles and journals producing images obtain the top scores, as expected.

The reviewing process

Publishing a scientific article assumes some sort of selection process. The most widely accepted view for this process is peer-review. Like democracy, this is not a perfect system, but it is probably the best possible one. The idea behind it is that an author cannot be entirely safe with the research (s)he is producing, not only because (s)he is judge and jury, but also because knowledge is so vast that it is always possible to overlook important research pertaining to the subject of an investigation, or simply to be unable to point out a defect in reasoning. As a principle a referee should therefore help to improve the content of an article, and only reject it when it is logically flawed, plagiarizes previous work, or simply does not acknowledge the proper scientific background of the work submitted for publication. This is no longer the case with many journals, which proceed with an editorial pre-selection before they send the article for review. This is an explicit way to state that the corresponding journals do not have knowledge, or science, as their main goal.

Many drawbacks are associated with the peer-review procedure, in particular conflict of interest (when the reviewer is doing work in the exact same domain as the author of the work under review) or lack of competence or insight. The balance between these two features is difficult to strike, as competence requires that the reviewer be knowledgeable in the domain of the reviewed work. Peer review is unpaid work, in order to lower the impact of another type of conflict of interest, that of money. Yet, the very fact that many journals aim at getting a high impact factor means that reviewers are asked to reject much more work than they normally would. And, in the most fashionable journals, which make a huge amount of money out of the fame attached to a high impact factor, reviewing is strongly distorting science towards fashion. However, there is also a huge number of studies which are simply sound, but dull, and do not help much in the progress of knowledge. The most difficult question associated with the peer-review process is the way it tackles innovation and discoveries. Fashion and pressure for impact favour fakes (and retractions of articles in otherwise reputed magazines are extremely frequent).

On the other hand, using reviewers who are excellent professionals but are not open to innovation is the most frequent drawback in the process. It is often excellent to use scientists who have long been recognized, and do not need further glory, to support highly innovative ideas. A 2009 study of the peer-reviewing process in the Proceedings of the National Academy of Sciences of the USA demonstrates this in a remarkable way. Rand and Pfeiffer have investigated systematic differences in impact across publication tracks at this journal. PNAS has three tracks for article submission. Papers can be “Communicated” for others by NAS members (Track I), submitted directly via the standard peer review process (Track II), or “Contributed” by NAS members (Track III). In the latter case the NAS members choose the reviewers according to their preferences, so that, in general, if the NAS member has agreed to transmit the paper, the reviewers will be kind enough to have it accepted. For this reason it was feared that, for Track III papers, the process would end up producing papers of below-average quality. This work shows that this is indeed the case for a significant number of papers but, in contrast, that the most interesting and innovative papers belong systematically to this category. The standard peer-review papers are of excellent professional quality, with few papers of low citation rank, but they are not the papers which would leave important traces in science...

Retractions

Retractions are the plague of scientific communication. The fact that self-advertisement became more and more important in recent years pushed investigators to become very sloppy in the way they use statistics, or even to fake important results. This is particularly true in the medical domain, as shown in a study published by Ioannidis in PLoS Medicine: Why most published research findings are false. The more fashionable a journal, the less confident we should be in the published results. This may have considerable consequences in terms of medicine. For example, back in 1998 Andrew Wakefield and his colleagues claimed in the famous medical journal The Lancet to have found a link between the triple vaccine against measles, mumps, and rubella and autism. This claim was based on wrong statistics and very poor experiments. Unfortunately, because it was published in a fashionable journal, it triggered a strong anti-vaccination reaction, which is still ongoing (rumors are difficult to stop). The Lancet has finally published a complete retraction of the paper, telling readers that the flawed study should never have been made public. But this work had enormous consequences not only in the UK, with a considerable increase in morbidity and mortality from these diseases, but also in the developing world, where people coming from the UK infected children.

In 2005, among many other cases, a prominent Japanese scientist who published two major papers in Nature (H. Kawasaki and K. Taira Nature 423, 838−842; 2003 and Nature 431, 211−217; 2004) could not produce the corresponding experimental data. The first paper had already been retracted (Nature 426, 100; 2003) and the second corrected (Nature 431, 878; 2004). An investigation panel asked Taira to submit samples and notebooks relating to the experiments, but the researcher in his lab who ran the experiments did not have them. It is obvious that references to those papers will appear, if only to state that they are likely to be faked, but this will obviously increase the Impact Factor of the journal...

In the same way we can see in October 2005: Retraction: RNA-interference-directed chromatin modification coupled to RNA polymerase II transcription
Vera Schramke, Daniel M. Sheedy, Ahmet M. Denli, Carolina Bonila, Karl Ekwall, Gregory J. Hannon and Robin C. Allshire
Nature 435, 1275−1279 (2005)

In spring 2006, an investigation of the RNA work in Taira's group was under way (at least 12 papers published in quite "visible" journals are probably fakes).

In the same way, on December 16th, 2005, the magazine Science reported that Woo-suk Hwang, the Korean cloning researcher, had requested the retraction of a paper on patient-specific human stem cells that had made the headlines of dailies world-wide. The consequences of this work were considerable, including ethical considerations about human cloning...

Retraction Watch follows developments in this domain, which is literally exploding. One of the most interesting cases, still ongoing in 2016, is that of the work of Olivier Voinnet and his colleagues, who apparently faked a remarkable number of figures in their articles over the years. One of the most remarkable features of this situation is the sheer number of authors involved, which casts doubt on the way "scientific" articles are written.

The Pubpeer site often finds multiple fakes from the same authors or laboratories. Four papers from common authors are suspicious, among which another prominent paper has recently been retracted by the journal Cell: A self-produced trigger for biofilm disassembly that targets exopolysaccharide.
Kolodkin-Gal I, Cao S, Chai L, Böttcher T, Kolter R, Clardy J, Losick R.
Cell. 2012 Apr 27;149(3):684-92. doi: 10.1016/j.cell.2012.02.055.

This paper is interesting because it was enough to access the public annotation of the genome of the bacterium involved in the study, Bacillus subtilis, to be aware that it was certainly not producing the "trigger" molecule described in the article. This implies that neither the authors nor the reviewers used a scientific approach to evaluate the content of this (and other) papers. It will be interesting to see whether there is a follow-up at Pubpeer, as in the case of the Voinnet papers.


SCIENTIFIC INDICATORS AND GAME THEORY