Thomson Reuters Web of Knowledge Ideation Challenge Winners Q&A
In January 2013, Thomson Reuters launched the Ideation Challenge, a contest for the scientific and scholarly research community designed to expand the discovery experience offered by Thomson Reuters Web of KnowledgeSM, an intelligent research platform that provides access to the world’s leading citation databases and has over 20 million users, including students, information professionals, researchers and teaching faculty. The Ideation Challenge received 177 submissions from 35 different countries.
The Ideation Challenge is the first in a series of innovation challenges to be hosted by the Scientific & Scholarly Research group of Thomson Reuters. It is designed to raise awareness of relevant news and commentary, recognize excellence in research and encourage professional development. This challenge awarded a total of $10,000 to the selected individuals who developed innovative concept papers describing new ways users can interact with the content and tools in Web of Knowledge.
First Place ($6,000 USD): Giving "Web of Knowledge" a whole new meaning: Capitalizing on citation networks for mining scientific information.
Winners Reason for selection: The submission is well thought out and well written. It addresses our core competency of citation analysis and the citation universe and would enhance the presentation of citation relationships through visual clustering of related articles.
Second Place ($3,000 USD): Personalized Scholarly Portals
Winners Reason for selection: The submission is very thorough and clearly articulates a market need for the development of a personalized discovery experience for scholarly information “power users”. The solver describes a solution that uses machine learning and recommendation techniques alongside traditional scholarly content.
Third Place ($1,000 USD): Visual timeline display of papers
Winners Reason for selection: This submission is well defined and is a creative approach to scholarly literature review. By use of a graphical presentation, the solution provides alternative ways to filter and sort search results to determine the current state of research, as well as trace historical significance of research findings.
1st Place Winner: Has Chosen to Remain Anonymous, United Kingdom
The idea that I submitted was one that I actually came up with a while ago and had been wondering what would be the best way for it to see light. It was quite clear that the key to its successful implementation is a broad citation base, which is something Thomson Reuters has and I do not.
My idea was to use scientific papers' citation records, not just as a way to gauge their popularity, but as a powerful tool to mine scientific information—and perhaps one can even say, as a source of knowledge in itself.
This can be made possible by building graphs (or "networks") of papers based on their citation records and then using network clustering and community detection algorithms to identify "nodes" and "crossroads" in them. Similar methodologies have been used successfully for mining information in social networks with impressive results (e.g. building psychological portraits of their users or learning about global economic trends). So I thought, ‘why not use the same for scientific papers, some of the earliest examples of hyperlinked text?’
I was learning about network analysis for a different problem, naturally from scientific papers, and this idea just popped up in my head.
I described this in detail in my solution, but briefly, it would work as follows:
When searching MEDLINE records, in addition to standard search options, it will be possible to generate a citation network from papers matching keywords in a user query. These networks will likely be too big to visualize, but so-called network community detection algorithms will be used to detect "hubs" that attract large numbers of references. These will be plotted and labeled with the keywords that best describe papers that map to these hubs. This way it will be possible to see emerging trends and concepts in science, quite possibly even before scientists themselves have become aware of them. This is what I mean by generating new knowledge!
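As a rough illustration of the hub idea, here is a minimal Python sketch. It is not the winner's method: it swaps in a much simpler technique (ranking papers by incoming citations within the matched set) in place of a real community detection algorithm, and the records in `PAPERS`, the function name `find_hubs` and the `top_n` parameter are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: paper id -> (keywords, ids of papers it cites).
PAPERS = {
    "p1": ({"autism", "genetics"}, []),
    "p2": ({"autism", "genetics", "gwas"}, ["p1"]),
    "p3": ({"autism", "genetics", "synapse"}, ["p1"]),
    "p4": ({"autism", "imaging"}, []),
    "p5": ({"autism", "imaging", "mri"}, ["p4"]),
    "p6": ({"autism", "imaging", "genetics"}, ["p4", "p1"]),
    "p7": ({"diabetes"}, ["p1"]),
}


def find_hubs(papers, query, top_n=2):
    """Return [(hub_id, in_degree, label_keywords)] for papers matching the query.

    Assumption: a "hub" is simply a highly cited paper inside the matched set.
    """
    matched = {pid for pid, (kws, _) in papers.items() if query in kws}
    citing = defaultdict(set)  # cited paper -> matched papers that cite it
    for pid in matched:
        for ref in papers[pid][1]:
            if ref in matched:
                citing[ref].add(pid)
    # Rank by number of incoming citations; ties broken by id for determinism.
    ranked = sorted(citing, key=lambda p: (-len(citing[p]), p))[:top_n]
    hubs = []
    for hub in ranked:
        members = citing[hub] | {hub}
        counts = Counter(kw for m in members for kw in papers[m][0] if kw != query)
        # Label the hub with its two most frequent keywords besides the query.
        best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:2]
        hubs.append((hub, len(citing[hub]), [kw for kw, _ in best]))
    return hubs


hubs = find_hubs(PAPERS, "autism")
```

With the toy data, the query "autism" yields two hubs, one labeled with genetics-related keywords and one with imaging-related keywords. A production system would presumably use a proper clustering algorithm over a far larger citation graph.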
Additionally, a "Find Connections" tool could be implemented to find papers that link citation "hubs" matching two sets of keywords. This is useful for people trying to find "bridges" between two fields, such as population geneticists who have identified a statistical association between a gene and a disease and are mining the literature to see if any known mechanisms can explain it.
It will help users mine the literature faster and motivate them to think about scientific knowledge in a different, more global and interconnected way, in terms of "trends" and "communities," not just isolated "stories".
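One simple way to picture a "Find Connections" query is sketched below in Python. It assumes, purely for illustration, that a "bridge" is any paper whose reference list reaches into both keyword groups; the toy records and the name `find_connections` are hypothetical, and a real tool might instead search for paths between hubs.

```python
# Hypothetical toy records: paper id -> (keywords, ids of papers it cites).
PAPERS = {
    "g1": ({"gene-x", "association"}, []),
    "g2": ({"gene-x", "population"}, ["g1"]),
    "d1": ({"disease-y", "mechanism"}, []),
    "d2": ({"disease-y", "pathway"}, ["d1"]),
    "b1": ({"signalling"}, ["g1", "d2"]),
    "b2": ({"signalling"}, ["g2"]),
}


def find_connections(papers, keywords_a, keywords_b):
    """Return sorted ids of papers that cite into both keyword groups."""
    group_a = {pid for pid, (kws, _) in papers.items() if kws & keywords_a}
    group_b = {pid for pid, (kws, _) in papers.items() if kws & keywords_b}
    bridges = []
    for pid, (_, refs) in papers.items():
        cited = set(refs)
        # A bridge cites at least one paper from each group.
        if cited & group_a and cited & group_b:
            bridges.append(pid)
    return sorted(bridges)


bridges = find_connections(PAPERS, {"gene-x"}, {"disease-y"})
```

In the toy data only `b1` cites papers from both the gene group and the disease group, so it is the only bridge returned.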
2nd Place Winner: Mark N. Ziats, MD-PhD student, Baylor College of Medicine and University of Cambridge, United States
The challenge was closely related to an idea I had already been thinking a lot about. I had gone so far as to do some market research and start to sketch out a business plan, and even talk to potential co-founders and investors about it. For a variety of reasons, the idea was put on the back burner, but when I saw the Ideation Challenge, I figured I would submit something since I had already spent a lot of time thinking and planning it out.
The idea is for a ‘Pinterest’ type website where the content is scientific publications, patents, and other related news. The premise is simply that the overwhelming success of Pinterest, Facebook, Pandora, etc. suggests there is something inherent to their platforms that people want: content that is personally relevant and automatically delivered, without having to search for it. In other words, search is a fading technology, in my opinion. However, nothing like this really exists in the world of scientific publications. Instead, researchers just rely on keyword search (e.g. Google Scholar, Pubmed), whereas Facebook and similar platforms use machine learning approaches (via their ‘Like’ button) to learn what you want to see and deliver that content to you automatically.
I think the success of these other platforms demonstrates people in general want their content delivered this way, and I see no reason why the specific niche of scientific publications content would be any different. So in essence, the idea is for a website containing a daily ‘newsfeed’ of scientific publications that are specifically tailored to an individual’s interest based on machine learning.
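To make the ‘likes’-driven newsfeed concrete, here is a minimal Python sketch. It is an assumption-laden stand-in rather than the proposed system: it replaces a real machine learning model with simple keyword counting, and the function names `build_profile` and `rank_feed` and the sample papers are hypothetical.

```python
from collections import Counter


def build_profile(liked_papers):
    """Count how often each keyword appears among the papers a user liked."""
    profile = Counter()
    for keywords in liked_papers:
        profile.update(keywords)
    return profile


def rank_feed(profile, candidates):
    """Order candidate papers by the summed profile weight of their keywords."""
    scored = []
    for title, keywords in candidates:
        # Counter returns 0 for keywords the user has never liked.
        scored.append((sum(profile[kw] for kw in keywords), title))
    # Highest score first; ties broken alphabetically; drop zero-score papers.
    ordered = sorted(scored, key=lambda st: (-st[0], st[1]))
    return [title for score, title in ordered if score > 0]


liked = [{"autism", "gene-expression"}, {"autism", "network-analysis"}]
candidates = [
    ("Paper A", {"autism", "network-analysis"}),
    ("Paper B", {"diabetes", "insulin"}),
    ("Paper C", {"gene-expression", "cancer"}),
]
feed = rank_feed(build_profile(liked), candidates)
```

Here the feed puts "Paper A" ahead of "Paper C" and omits "Paper B", which shares no keywords with anything the user liked.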
I am currently a PhD student. I start every day by reading the literature of my field. The best way I have found to do this is to have a Pubmed or Google Scholar ‘keyword’ search delivered to my email every morning. It is not a bad approach, but my field is really broad (autism), so I scan through hundreds of papers to find the few that are relevant to my work specifically. This is a bit annoying, frankly, and I always try to take note of things that annoy me, because I feel like those are good instances for innovation. So that insight, coupled with the fact I read a lot about machine learning techniques both for my research and because I find it fascinating, sparked the idea.
Well, the difficult part is having a large database of scientific publications to draw from, and that already exists in Web of Knowledge. The next step would be developing a user interface page that pulls content from the already existing Web of Knowledge database. I envision researchers logging in to a separate Web of Knowledge portal and immediately being presented with a ‘feed’ of articles relevant to them.
The scientific literature is overwhelmingly voluminous, and the recent explosion of open access and other non-traditional journals has only exacerbated this. Moreover, that is just peer-reviewed research papers—many people need to keep up with patents, funding trends, acquisitions, and other scientific publications in addition to papers. This portal would be a one stop shop for all that information, and would be specifically tailored to what a user is most interested in (by learning from their ‘likes’). So this would save users time and the hassle of tracking down the information they want, and it would provide it to them in one place with a nice, interactive design.
I envision all researchers coming into the lab every morning, grabbing a coffee, and going to this website and just browsing for 30 minutes. Basically like reading the newspaper, but a really specific newspaper tailored to an individual’s research interests. It has the potential to reduce the time spent on assessing the literature, because 100% of that time can be spent reading instead of spent finding relevant content.
In a different manner, I think this has the potential to reverse a trend that I think is a negative one for biomedical research overall. Search has completely transformed how broadly researchers know their field and fields outside of their own. Prior to Pubmed and Google Scholar, people had to browse journals every week to find articles relevant to them. Obviously, this was inefficient and why Pubmed/Google are great tools. But there is a good component to browsing that is lost by search, which is that you can’t search for things that you do not know to search for.
By relying only on search, researchers are never exposed to papers that are actually relevant to them but that they were not searching for. I think this hampers scientific innovation. The history of great scientific innovation is full of examples where researchers from one field adopted a method/theory/technique from another field and in doing so were able to make a great breakthrough (e.g. Watson and Crick). If researchers skip directly to the literature that they search for, they may be missing relevant and related publications. Machine learning approaches are good at identifying and presenting this kind of information. So in a way, I think this has the potential to both streamline content delivery and allow what is delivered to be broad enough to foster the kinds of insight that are necessary to make truly innovative leaps in thought.
3rd Place Winner: Anna Furches, Graduate Student, University of Tennessee (United States)
As a grad student, I often found myself frustratingly short of both knowledge and time. In order to establish the background knowledge required to make inferences regarding my own research, it was necessary to expend large amounts of time performing literature searches and reading. Many of the topics I needed to research had a relatively narrow audience, so there weren’t any textbooks or review papers that would allow me to quickly catch up on decades of research. I often wished that search results were presented in a format that would help me quickly assess the background of a particular field and determine which publications represented the most important advances. I know many professionals and students have shared my frustrations. This challenge presented the opportunity to make one part of the research process easier and more efficient for myself and others.
My idea was to present literature search results in a graphic format that uses two different means of providing information to searchers. The first aspect involves representing each search result, or publication, as an icon. These icons (publications) would be organized upon a timeline according to publication date. In addition, icons would be sized differently depending upon the number of times each publication has been cited. So for example, publications that have not yet been cited would be the smallest, publications with several hundred citations would be the largest, and publications with intermediate numbers of citations would be intermediately sized. This format would allow searchers to instantaneously assess the density of research in a field over time, as well as which publications represent the most important advances in any field.
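The sizing rule described above could be sketched as follows in Python. The logarithmic scale, the pixel range and the 500-citation cap are illustrative assumptions, not part of the proposal, and the names `icon_radius` and `timeline` are hypothetical.

```python
import math
from datetime import date

MIN_RADIUS, MAX_RADIUS = 4.0, 24.0  # assumed icon sizes in pixels
CAP = 500  # assumed citation count at which icons stop growing


def icon_radius(citations):
    """Map a citation count to a radius on a log scale, clamped at CAP."""
    fraction = math.log1p(min(citations, CAP)) / math.log1p(CAP)
    return MIN_RADIUS + fraction * (MAX_RADIUS - MIN_RADIUS)


def timeline(results):
    """Return (publication date, title, radius) tuples in chronological order."""
    ordered = sorted(results, key=lambda r: r[1])
    return [(pub, title, round(icon_radius(cites), 1)) for title, pub, cites in ordered]


# Hypothetical search results: (title, publication date, times cited).
results = [
    ("Recent preprint", date(2012, 11, 5), 0),
    ("Landmark study", date(1998, 3, 14), 640),
    ("Follow-up work", date(2005, 7, 1), 45),
]
rows = timeline(results)
```

Uncited papers get the smallest icon and anything at or above the cap gets the largest. A log scale is used here so that papers with intermediate citation counts remain visibly different from uncited ones.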
The second aspect provides fast, convenient access to abstracts directly from the timeline. By “rolling over” the icon that represents a publication, searchers would be able to access a pop-up box that contains the abstract. In addition, this pop-up box would provide the date of publication, source, author information, number of citations and a link to the full length publication.
Seeing the ad for the Web of Knowledge Ideation Challenge triggered the idea. It just sort of popped into my head, details and all. I am a visual thinker, so I found writing the proposal (and thus having to explain the idea using words) a much bigger challenge!
I hope users will have the option to view all literature search results using this format. In addition, I hope this format will be available across all platforms; the ability to view and interact with search results in this way will make it much easier to perform literature searches from mobile devices.
Presenting search results in the format I proposed would fill the gap when there aren’t textbooks, review papers or knowledgeable colleagues to help a researcher learn about the history of work on a particular topic. This gap will be encountered more and more often because science, and most fields in general, are becoming increasingly specialized. This solution would help researchers build a more complete foundation of knowledge in any field while requiring less energy and time to do so.
This solution will increase the ease and efficiency of performing literature reviews. Searchers will be able to instantaneously assess the density of research in a field over time, as well as which publications represent the most important advances in any field.
The data and citation records included in this report are from Thomson Reuters Web of KnowledgeSM. Web of KnowledgeSM is a registered trademark of Thomson Reuters. All rights reserved.