Genome-Wide Studies Still Showing the Power of Association

April 2013

Glance at the titles of the Top Ten biology papers and you can just about make out a pattern. Scattered here and there are words like large-scale, identifies, loci and susceptibility. Variants, such as analysis and analyses. Sometimes risk is present, sometimes not. Association is always there. Given time and computational tools, you could work out what the data are trying to tell you: that there is a very close link between a high-citation count and genome-wide association studies (GWAS). It isn’t certain, of course, but the chances of a paper being highly cited seem to increase if it reports on a GWAS.

ScienceWatch first began to notice the relationship in early 2011. Scrutinizing the new data confirms that impression. At #5, a GWAS of body mass index. Just off the Top Ten table, at #12, ditto schizophrenia. Next up, at #14, coronary artery disease. And at #17, ulcerative colitis.

What’s Hot in Biology

Rank | Paper | Citations This Period (Sept-Oct 12) | Rank Last Period (Jul-Aug 12)
1 | The 1000 Genomes Project Consortium (D.L. Altshuler, et al.), “A map of human genome variation from population-scale sequencing,” Nature, 467(7319): 1061-73, 28 October 2010. [78 institutions worldwide] | 121 | 1
2 | S. Anders, W. Huber, “Differential expression analysis for sequence count data,” Genome Biology, 11(10): No. R106, 2010. [European Mol. Biol. Lab., Heidelberg, Germany] | 47 | +
3 | B. Schwanhausser, et al., “Global quantification of mammalian gene expression control,” Nature, 473(7347): 337-42, 19 May 2011. [Max Delbruck Ctr. Mol. Med., Berlin, Germany; MicroDiscovery, Berlin] | 44 | 7
4 | M.A. DePristo, et al., “A framework for variation discovery and genotyping using next-generation DNA sequencing data,” Nature Genetics, 43(5): 491, May 2011. [Broad Inst., Cambridge, MA; Brigham & Women’s Hosp., Boston, MA; Harvard U. Sch. Med., Boston; Massachusetts Gen. Hosp., Boston] | 39 | 9
5 | E.K. Speliotes, et al., “Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index,” Nature Genetics, 42(11): 937-48, November 2010. [249 institutions worldwide] | 37 | 3
6 | J. Ernst, et al., “Mapping and analysis of chromatin state dynamics in nine human cell types,” Nature, 473(7345): 43, 5 May 2011. [Broad Inst., Cambridge, MA; MIT, Cambridge; Harvard U. Sch. Med., Boston, MA; Massachusetts Gen. Hosp., Boston] | 37 | +
7 | J. Kim, et al., “AMPK and MTOR regulate autophagy through direct phosphorylation of Ulk1,” Nature Cell Biology, 12(2): February 2011. [U. Calif., San Diego; St. Jude Children’s Res. Hosp., Memphis, TN; U. Paris 05, France] | 28 | +
8 | S.G.F. Rasmussen, et al., “Crystal structure of the beta(2) adrenergic receptor-Gs protein complex,” Nature, 477(7366): 549, 29 September 2011. [10 institutions worldwide] | 28 | 6
9 | B.L. Wu, et al., “Structures of the CXCR4 chemokine GPCR with small-molecule and cyclic peptide antagonists,” Science, 330(6007): 1066-71, 19 November 2010. [Scripps Res. Inst., La Jolla, CA; U. Calif., San Diego; Pfizer, San Diego] | 27 | +
10 | D. De Stefani, et al., “A forty-kilodalton protein of the inner membrane is the mitochondrial calcium uniporter,” Nature, 476(7360): 336, 18 August 2011. [U. Padua, Italy] | 27 | +

SOURCE: Thomson Reuters Web of Science
NB. Only papers indexed by Thomson Reuters since November 2010 are tracked. The “+” sign indicates that the paper was not ranked in the Top Ten during the last period. In the event that two or more papers collected the same number of citations in the most recent bimonthly period, total citations to date determine the rankings.


The basic GWAS methodology is common to all: Take two matched groups of people, one that has the condition of interest (the cases), one that does not (the controls). Large groups, ideally tens or even hundreds of thousands, offer the power to detect small effects. The DNA of each individual is genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). SNP variants that are significantly more common in the cases than in the controls are then examined in fresh groups of cases and controls from different populations. Finally, look at the genes around each associated SNP to see whether they might plausibly be part of the causal chain that results in the disease.
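For readers who like to see the arithmetic, the discovery step can be sketched in a few lines of Python. This is only an illustration, not the pipeline any of these papers used: it assumes a plain chi-square test on allele counts and a Bonferroni correction for the number of SNPs tested, and the SNP names and counts are invented. Real studies also adjust for ancestry and other confounders, and then replicate their hits in fresh groups.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def significant_snps(counts_by_snp, alpha=0.05):
    """Return SNPs whose p-value beats a Bonferroni-corrected threshold."""
    threshold = alpha / len(counts_by_snp)
    hits = []
    for snp, counts in counts_by_snp.items():
        _, p_value = allelic_chi_square(*counts)
        if p_value < threshold:
            hits.append(snp)
    return hits

# Hypothetical counts: (case_alt, case_ref, ctrl_alt, ctrl_ref) per SNP.
example = {
    "snp_a": (600, 1400, 400, 1600),  # variant clearly more common in cases
    "snp_b": (500, 1500, 490, 1510),  # almost no difference
}
print(significant_snps(example))  # → ['snp_a']
```

With hundreds of thousands of SNPs the corrected threshold becomes very small, which is one way to see why such large groups are needed to detect small effects.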

Quite apart from sharing a basic methodology, what these papers also have in common are truly massive research teams, often with a consortium as the foundation of the efforts. Cynically, that suggests one reason why they are highly cited; if each member of a 300-strong team cites their work in another paper, that’s an awful lot of citations. But there’s more to it than that.

The BMI paper at #5 really is a landmark for several reasons. The obesity epidemic currently burdening health systems is clearly driven by lifestyle changes, but there is also a hefty genetic component. However, as the authors note, “in most instances, the loci identified by the present study harbour few, if any, annotated genes with clear connections to the biology of weight regulation.” One clear exception may be a locus near GIPR, the gene that codes for the receptor for GIP, an incretin hormone secreted by the intestinal wall that is a key element in glucose regulation. That and other loci are bound to send researchers down fresh trails in search of the underlying biological mechanisms.

One promise of GWAS is that they may lead to better treatment and, if the genetic factors are well understood, prediction and perhaps prevention. Erik Ingelsson of Uppsala University, Sweden, like the other leaders of the BMI team, is under no illusions about what is involved. “We need to understand which are the causal genes and variants, how they affect the risk of obesity, find suitable drug targets, and then move into drug development,” he told ScienceWatch. “It will take at least 10 years before we understand the full value of GWAS being done right now.”


For coronary artery disease, impact may come much sooner. (Paper #14: H. Schunkert, et al., “Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease,” Nature Genetics, 43[4]: 333, 2011; 23 citations this period, 128 overall.) The paper confirmed 10 of 12 previously known loci, and added another 13. Crucially, only 3 of those 13 are associated with factors traditionally linked to coronary artery disease, such as cholesterol levels and blood pressure.

Five of the new loci are pleiotropically associated with other traits, such as celiac disease. While these too raise opportunities for further research into causal mechanisms, the risk loci already offer the possibility of enhanced predictive ability. A weighted score based on the presence of the loci reveals a three-fold difference in risk of coronary artery disease between the top and bottom deciles. This “is at least comparable to that of several other traditional risk factors … including hypertension, diabetes and smoking.”

The authors of the study suggest that their approach may prove cost-effective in improving the performance of current risk-profiling methods, which in turn could help people avoid the worst consequences of having been dealt a bad genetic hand.
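The weighted score mentioned above can be sketched as follows. The article does not say how the authors weighted each locus, so this example assumes one common convention: count the risk alleles a person carries at each locus (0, 1 or 2) and weight each count by the natural log of that locus's odds ratio. The locus names and odds ratios are hypothetical.

```python
import math

def weighted_risk_score(allele_counts, odds_ratios):
    """Sum of risk-allele counts, each weighted by log(odds ratio).

    A locus missing from allele_counts is treated as 0 risk alleles.
    """
    return sum(
        allele_counts.get(locus, 0) * math.log(odds_ratio)
        for locus, odds_ratio in odds_ratios.items()
    )

def decile_groups(scores):
    """Return the people in the bottom and top tenth of the score ranking."""
    ordered = sorted(scores, key=scores.get)
    size = max(1, len(ordered) // 10)
    return ordered[:size], ordered[-size:]

# Hypothetical per-allele odds ratios for two loci.
odds_ratios = {"locus_1": 1.2, "locus_2": 1.1}

# Hypothetical genotypes: number of risk alleles carried at each locus.
people = {
    "person_a": {"locus_1": 2, "locus_2": 1},
    "person_b": {"locus_1": 0, "locus_2": 0},
}
scores = {
    name: weighted_risk_score(counts, odds_ratios)
    for name, counts in people.items()
}
bottom, top = decile_groups(scores)
print(scores, bottom, top)
```

Comparing how often disease occurs in the top and bottom groups is what yields a figure such as the three-fold difference the paper reports.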

When the authors of the BMI study point out the “limited understanding of the biology of BMI” granted by their research, compared to other GWAS, they might well have had the ulcerative colitis study in mind. (Paper #17: C.A. Anderson, et al., “Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47,” Nature Genetics, 43[3]: 246, 2011; 22 citations this period, 131 overall.) This really does offer insights into the mechanisms behind a complex disease, with more loci identified than any other.

One important finding is that ulcerative colitis shares at least 28 of the risk loci with Crohn’s disease. Researchers had long thought that there would be some overlap, but not to this extent. In fact, John David Rioux, of the Université de Montréal, one of the lead researchers on the team, told ScienceWatch that although physicians see the two as distinct clinical entities, from a genetic perspective they probably represent a biological continuum in which non-genetic elements likely have an important impact on clinical presentation.

Could the same be true for schizophrenia and bipolar disorder? (Paper #12: S. Ripke, et al., “Genome-wide association study identifies five new schizophrenia loci,” Nature Genetics, 43[10]: 969, 2011; 25 citations this period, 69 overall.) Clinicians recognize certain similarities, which the GWAS confirmed. Of five new loci associated with schizophrenia, three are also linked to bipolar disorder.

Tantalizingly, some of the loci are close to the gene MIR137, a microRNA that is closely involved in neuronal development. Of course there are many, many steps between disordered neuronal maturation and manifest schizophrenia (or bipolar disorder), but, as the authors say, these observations “suggest an intriguing new insight into the pathogenesis of schizophrenia.”

For now, it seems as if the massive effort required to do a GWAS that uses large-scale analysis to identify risk loci is indeed enough to garner lots of citations, especially if it is bigger and more comprehensive than its predecessors. It won’t be long, though, before the more detailed understanding that the GWAS call for supersedes the GWAS themselves in the high-citation stakes.

Dr. Jeremy Cherfas is Senior Science Writer at Bioversity International, Rome, Italy.

The data and citation records included in this report are from Thomson Reuters Web of Science™. Web of Science™ is a registered trademark of Thomson Reuters. All rights reserved.