Evidence Based Medicine Information Retrieval and Scientometrics Lab (EIRSL)
The sheer volume of scientific information in the biomedical literature and on the Internet makes manual inspection of literature search results prohibitive, even for narrow clinical and research questions. This abundance of information slows the rate of research discoveries (especially translational ones, which are slower by virtue of having to cross disciplinary boundaries) and propagates low-quality information to patients via the Internet. Recent EIRSL work has developed pattern recognition-based filtering methods that can automatically identify the content and quality of both web pages and scientific articles. These models hold the promise of expediting literature search and synthesis by focusing on content-specific articles of the highest methodological quality.
Other EIRSL work has shown that it is possible to augment traditional bibliometric quality measures, such as citation count and impact factor, using machine learning approaches. In particular, machine learning methods can accurately predict citation counts over a long horizon (10 years after publication) using only data available at publication time. Predicted citation counts could be a powerful filter for focusing attention on recent publications that are more likely to influence new scientific and clinical developments.
Machine learning methods can also accurately characterize whether a citation is essential to the citing paper. Citation counts can then be adjusted by discarding citations that are not important to the citing papers, since many highly cited papers gather numerous citations due to factors other than quality (e.g., they may be cited by rebuttal and refutation papers). EIRSL will advance the state of the art in these areas by extending the number and scope of these models and delivering them to both researchers and patients through collaborations with the Ehrman Medical Library, in order to evaluate the effectiveness of these methods and guide their future improvement.
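As a rough illustration of the citation-count adjustment described above, the following Python sketch compares raw and adjusted counts. It is a minimal example under stated assumptions, not EIRSL's implementation: the Citation record, the essential flag (standing in for the output of a citation classifier), and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    citing_paper: str
    cited_paper: str
    # Hypothetical label: True if the citation is essential to the citing
    # paper. In practice this would come from a trained classifier.
    essential: bool


def raw_citation_counts(citations):
    """Count every citation each paper receives."""
    counts = {}
    for c in citations:
        counts[c.cited_paper] = counts.get(c.cited_paper, 0) + 1
    return counts


def adjusted_citation_counts(citations):
    """Count only the citations labeled as essential."""
    counts = {}
    for c in citations:
        # Keep every cited paper in the result, even if its count ends at 0.
        counts.setdefault(c.cited_paper, 0)
        if c.essential:
            counts[c.cited_paper] += 1
    return counts


# Made-up example: paper A is cited three times, but two of those
# citations (e.g., from rebuttal papers) are labeled non-essential.
citations = [
    Citation("P1", "A", True),
    Citation("P2", "A", False),
    Citation("P3", "A", False),
    Citation("P1", "B", True),
    Citation("P4", "B", True),
]

raw = raw_citation_counts(citations)
adjusted = adjusted_citation_counts(citations)
print(raw)
print(adjusted)
```

In this made-up data, paper A leads on raw count but falls behind paper B once non-essential citations are discarded, which is the kind of reordering the adjustment is meant to surface.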
EIRSL will also develop new technology using pattern-recognition approaches to accurately identify the content and quality of MEDLINE and web documents, as well as to enhance traditional bibliometric criteria such as the impact factor and citation counts. The motivation for developing these techniques is to identify higher-impact, higher-quality research findings more quickly, accelerating their use in both research and clinical care.
Summary slides describing EIRSL’s methods:
Pattern recognition models to predict citation count: Main Idea
Pattern recognition models to identify high and low quality articles and web pages: Main Idea
Members:
- Co-Leader: Lawrence Fu Ph.D.
- Co-Leader: Yindalon Aphinyanaphongs M.D., Ph.D.
- Senior advisor: Constantin Aliferis M.D., Ph.D.
- Collaborators:
- Karen Brewer Ph.D.
- Alexander Statnikov Ph.D.
- Brian Haynes M.D., Ph.D.
References
- “Text Categorization Models for Retrieval of High Quality Articles in Internal Medicine”. Y. Aphinyanaphongs, C.F. Aliferis. In Proceedings of the 2003 American Medical Informatics Association (AMIA) Annual Symposium, Washington, DC, USA; pages 31-35, 2003.
- “Learning Boolean Queries for Article Quality Filtering”. Y. Aphinyanaphongs, C.F. Aliferis. In Proceedings of the 11th World Congress on Medical Informatics (MEDINFO), San Francisco, California, USA; September 7-11, 2004.
- “Text Categorization Models for Retrieval of High Quality Articles in Internal Medicine”. Y. Aphinyanaphongs, I. Tsamardinos, A. Statnikov, D. Hardin, C.F. Aliferis. J Am Med Inform Assoc., Mar-Apr;12(2):207-16, 2005.
- “Extracting Drug-Drug Interaction Articles from MEDLINE to Improve the Content of Drug Databases”. S. Duda, C.F. Aliferis, R.A. Miller, A. Statnikov, K.B. Johnson. Proc AMIA Symposium, 2005.
- “Using citation data to improve retrieval from MEDLINE”. E.V. Bernstam, J.R. Herskovic, Y. Aphinyanaphongs, C.F. Aliferis, M.G. Sriram, W.R. Hersh. J Am Med Inform Assoc., Jan-Feb;13(1):96-105, 2006.
- “Prospective validation of text categorization models for identifying high-quality content-specific articles in PubMed”. Y. Aphinyanaphongs, C.F. Aliferis. Proc Annual Fall Conf AMIA, 2006.
- “A Comparison of Citation Metrics to Machine Learning Filters for the Identification of High Quality MEDLINE Documents”. Y. Aphinyanaphongs, A. Statnikov, C.F. Aliferis. J Am Med Inform Assoc., Jul-Aug;13(4):446-55, 2006.
- “Text Categorization Models for Identifying Unproven Cancer Treatments on the Web”. Y. Aphinyanaphongs, C.F. Aliferis. In International Medical Informatics Congress, MEDINFO, 2007.
- “A comparison of Impact Factor, Clinical Query Filters, and Pattern Recognition Query Filters in Terms of Sensitivity to Topic”. L. Fu, L. Wang, Y. Aphinyanaphongs, C.F. Aliferis. In International Medical Informatics Congress, MEDINFO, 2007.
- “Models for Predicting and Explaining Citation Count of Biomedical Articles”. L.D. Fu, C.F. Aliferis. AMIA Fall Symposium, 2008.
- “Using Content-based and Bibliometric Features for Machine Learning Models to Predict Citation Counts in the Biomedical Literature”. L.D. Fu, C.F. Aliferis. Scientometrics, published online Feb 3, 2010.
- “Machine Learning Models for Automatic Classification of Instrumental Citations”. L.D. Fu, C.F. Aliferis. (In preparation).
