Molecular Signatures Lab
Members:
- PI: Constantin Aliferis M.D., Ph.D.
- Faculty: Alexander Alekseyenko Ph.D.
- Scientific programmer: Jizhou Ai, M.S.
- Graduate students:
Collaborators:
- Alexander Statnikov, Ph.D.
- Douglas Hardin, Ph.D. (Department of Mathematics, Vanderbilt University)
- Frank Harrell, Ph.D. (Department of Biostatistics, Vanderbilt University)
- Jonathan Schildcrout (Department of Biostatistics, Vanderbilt University)
- Pierre Massion, M.D. (Department of Medicine, Vanderbilt University)
- Isabelle Guyon, Ph.D. (ClopiNet, Berkeley, CA)
- Ioannis Tsamardinos, Ph.D. (Department of Computer Science, University of Crete, Greece)
- Subramani Mani, Ph.D. (Department of Biomedical Informatics, Vanderbilt University)
Mission
One of the most significant translational developments of recent years is the high-dimensional molecular profile (also known as a “molecular classifier”, “molecular signature”, or “In Vitro Diagnostic Multivariate Index Assay”). These computational models take as input patient-specific high-throughput assay measurements from a variety of biological samples (e.g., microarray gene expression values, mass spectrometry protein abundances, or SNP array genotypes, obtained from biopsies, blood samples, sera, etc.) and output a probability or similar score that the patient has a particular disease, that the disease is localized or of remote origin (e.g., metastatic), that the patient is likely to respond to a specific treatment, that an administered treatment will induce adverse events in that patient, and so on. More than 10,000 peer-reviewed articles published since 1998 have established the potential of molecular signatures to support diagnosis and personalized medicine.
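To make the input/output description above concrete, the sketch below shows one common functional form a signature can take: a weighted sum of assay measurements passed through a logistic function to produce a probability. The marker names, weights, and intercept are hypothetical placeholders chosen for illustration; real signatures may use other model families (e.g., SVMs) and are fitted and validated on patient data.

```python
import math

# Hypothetical signature parameters, for illustration only (not a real assay).
WEIGHTS = {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.4}
INTERCEPT = -0.5

def signature_probability(measurements, weights=WEIGHTS, intercept=INTERCEPT):
    """Return a probability-like score from patient-specific assay measurements.

    `measurements` maps each marker name to its (already normalized) value;
    markers that the signature does not use are simply ignored.
    """
    linear = intercept + sum(w * measurements[g] for g, w in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))  # logistic link maps the score into (0, 1)

# GENE_D is measured but not part of this (hypothetical) signature.
patient = {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": 0.0, "GENE_D": 3.3}
print(round(signature_probability(patient), 3))  # prints 0.574
```

Only the markers listed in the signature influence the score, which is one reason compact (parsimonious) signatures are cheaper to deploy as clinical assays.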
Molecular signatures are moving quickly into clinical care: dozens of signatures have become commercially available in the last three years, and many dozens more are currently in development. This development is critically enabled by appropriate computational methods for building and validating signatures and deploying them at the bedside. The image above shows some of the molecular profiles recently introduced for clinical use.
The Molecular Signatures Lab (MSL) at CHIBI researches data-analytic and bioinformatics theory, algorithms, data analysis protocols, and software systems for developing optimally predictive, safe, parsimonious, and cost-effective molecular signatures.
Research carried out by Dr. Aliferis and his colleagues includes:
- proving theorems that connect predictivity with mechanistic (causal) interpretation when selecting genes and proteins to build signatures;
- creating new sample-efficient and computationally efficient algorithms for discovering biomarkers and developing maximally compact and predictive signatures;
- identifying sources of data analysis errors in major studies and ways to correct them;
- developing algorithms to deal with the multiplicity of molecular signatures and biomarkers;
- developing algorithms for discovering biologically interpretable biomarkers;
- building intelligent analysis systems for the automatic development of reproducible signatures (GEMS and FAST-AIMS);
- conducting extensive benchmarking studies of classifiers and gene/protein selection algorithms;
- causally characterizing various data analysis algorithms used in molecular signature research;
- designing information-retrieval models for storing and retrieving information about molecular signatures and other high-throughput molecular medicine modalities.
In addition to these areas, the MSL develops and studies guidelines for reliable analysis of omics data. The MSL collaborates closely with the Computational Causal Discovery Lab (led by Dr. Statnikov), the Departments of Pathology and Medicine, the Genome Center, and Genetics, both to develop novel signatures and, eventually, to deliver molecular signatures at the bedside. MSL research findings inform the recommendations of the BPIC, and the BPIC deploys MSL software in research projects on campus when appropriate. Software and methods from the MSL are made broadly available to the scientific community outside NYU.
Main principles underlying MSL work
1. MSL combines causal graph induction with powerful pattern-recognition methods such as support vector machines (SVMs) to create parsimonious and highly predictive signatures. The resulting biomarkers are interpretable in terms of local pathways and are highly reproducible.
2. Causal graph methods also allow efficient extraction of all equivalent signatures.
(Slide courtesy of Dr. Alexander Statnikov, using the TIE* algorithm)
3. Extensive benchmarking and protocol validation enable the development of powerful automated software systems, guide best practices, and help avoid analytic errors.
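As one concrete illustration of the kind of analytic error that protocol validation is meant to catch (the example is chosen here for illustration and is not taken from the lab's papers), the sketch below contrasts two protocols on pure-noise data whose labels carry no information: selecting features once on the full dataset and then cross-validating, versus repeating the selection inside every cross-validation fold. The mean-difference filter, nearest-centroid classifier, leave-one-out scheme, and all sizes are simplifying assumptions.

```python
import random

def score_features(X, y, rows):
    """Absolute difference of class means per feature, computed on `rows` only."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    return [
        abs(sum(X[i][j] for i in pos) / len(pos) - sum(X[i][j] for i in neg) / len(neg))
        for j in range(len(X[0]))
    ]

def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

def nearest_centroid(X, y, train, features, x_new):
    """Predict the class whose training centroid (on `features`) is closest to x_new."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        dist = 0.0
        for j in features:
            centroid_j = sum(X[i][j] for i in members) / len(members)
            dist += (x_new[j] - centroid_j) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy of a top-k filter followed by a nearest-centroid classifier."""
    rows = list(range(len(y)))
    # Flawed protocol: features chosen once using every sample, including each held-out one.
    leaked = None if select_inside_fold else top_k(score_features(X, y, rows), k)
    correct = 0
    for held_out in rows:
        train = [i for i in rows if i != held_out]
        features = top_k(score_features(X, y, train), k) if select_inside_fold else leaked
        correct += nearest_centroid(X, y, train, features, X[held_out]) == y[held_out]
    return correct / len(rows)

rng = random.Random(7)
n_samples, n_features = 40, 500
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]  # labels are unrelated to X by construction

biased_acc = loo_accuracy(X, y, k=10, select_inside_fold=False)
proper_acc = loo_accuracy(X, y, k=10, select_inside_fold=True)
print(f"selection outside CV: {biased_acc:.2f}, selection inside CV: {proper_acc:.2f}")
```

Because the labels are random, any accuracy well above chance from the first protocol is an artifact of information leaking from the held-out samples into feature selection; the second protocol should hover around chance.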
Selected Annotated Related References (for the full list of publications, please use the CHIBI publications search system)
1. “Towards Principled Feature Selection: Relevance, Filters, and Wrappers”. I. Tsamardinos and C.F. Aliferis. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, Florida, USA, January 3-6, 2003. Where we outline the foundational theoretical principles of both predictive and causal feature (and hence biomarker) selection methods.
2. “Causal Explorer: A Probabilistic Network Learning Toolkit for Biomedical Discovery”. C.F. Aliferis, I. Tsamardinos, A. Statnikov, L.E. Brown. In Proceedings of the 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS), Las Vegas, Nevada, USA; CSREA Press, June 23-26, 2003. Where we provide an industrial-strength software tool for biomarker and pathway discovery. Available from: http://discover.mc.vanderbilt.edu/discover/public/causal_explorer/index.html
3. “HITON: A Novel Markov Blanket Algorithm for Optimal Variable Selection”. C.F. Aliferis, I. Tsamardinos, A. Statnikov. In Proceedings of the 2003 American Medical Informatics Association (AMIA) Annual Symposium, pages 21-25, 2003. Where we introduced the first correct local causal pathway and Markov blanket algorithms that can be used with genomic-scale data and small samples, simultaneously solving the local pathway discovery problem and the most-parsimonious/most-predictive biomarker selection problem. Note: this preliminary report was subsequently enhanced radically with the introduction of the GLL class of algorithms.
4. "A Theoretical Characterization of Linear SVM-Based Feature Selection". D. Hardin, I. Tsamardinos, C.F. Aliferis. In Twenty-First International Conference on Machine Learning (ICML), 2004. Where we show that SVMs are a best-of-class classifier in terms of predictivity (but not gene selection) for construction of molecular signatures.
5. "A Comprehensive Evaluation of Multicategory Classification Methods for Microarray Gene Expression Cancer Diagnosis". A. Statnikov, C.F. Aliferis, I. Tsamardinos, D. Hardin, S. Levy. Bioinformatics, Mar 1;21(5):631-43, 2005. Where we show that SVMs are a best-of-class classifier for construction of molecular signatures.
6. "Gene Expression Model Selector (GEMS): a system for decision support and discovery from array gene expression data". A. Statnikov, I. Tsamardinos, Y. Dosbayev, C.F. Aliferis. Int J Med Inform., Aug;74(7-8):491-503, 2005. Where we introduced the fully automated system GEMS for molecular signature and biomarker discovery. Available from: http://www.gems-system.org/
7. “The Max-Min Hill Climbing Bayesian Network Structure Learning Algorithm”. I. Tsamardinos, L.E. Brown, C.F. Aliferis. Machine Learning, 65:31-78, 2006. Where we introduce the MMHC algorithm, which uses local search and global orientation to provide correct, genome-scale pathway reconstruction.
8. “Using SVM Weight-Based Methods to Identify Causally Relevant and Non-Causally Relevant Variables”. A. Statnikov, D. Hardin, C.F. Aliferis. Workshop on Feature Selection and Causality, NIPS, 2006. Where we show that biomarkers selected by SVM weights should NOT be interpreted causally/mechanistically.
9. “Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective”. C.F. Aliferis, A. Statnikov, I. Tsamardinos. Cancer Informatics, 2:133-162, 2006. Where we provide some high-level guidance for the analysis of omics data.
10. “A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification”. A. Statnikov, L. Wang, C.F. Aliferis. BMC Bioinformatics, 9:319, 2008. Where we show that SVMs are more powerful than random forests for developing predictive molecular profiles from microarray data.
11. “Effects of environment, genetics and data analysis pitfalls in an esophagus cancer genome-wide association study”. A. Statnikov, C. Li, C.F. Aliferis. PLoS ONE 2(9): e958 doi:10.1371/journal.pone.0000958.
“A Statistical Reappraisal of the Findings of an Esophageal Cancer Genome-Wide Association Study”. A. Statnikov, C. Li, C.F. Aliferis. Cancer Research 68, 3074-3075, April 15, 2008. doi: 10.1158/0008-5472.CAN-07-2999. Where we show that major errors can occur even in a “straightforward” analysis of GWAS data for molecular profile construction and biomarker discovery, and we provide corrected analyses.
12. “Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data”. C.F. Aliferis, A. Statnikov, I. Tsamardinos, J. Schildcrout, B. Shepherd, F. Harrell Jr. PLoS ONE, 2009; 4(3): e4922. Where we show that major and subtle errors in published papers can hide the predictive signal in microarray molecular profile construction and biomarker discovery, and we provide correct and more powerful analysis protocols. Note that the paper containing these errors (Michiels et al., 2005) continues to be cited as correct in hundreds of subsequent papers!
13. “The FAST-AIMS Clinical Mass Spectrometry Analysis System”. N. Fananapazir, A. Statnikov, C.F. Aliferis. Advances in Bioinformatics, vol. 2009, Article ID 598241, 2009. Where we discuss FAST-AIMS, the first system that automatically analyzes mass spectrometry data. Available from: http://www.dsl-lab.org/FAST-AIMS/
14. “Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part I: Algorithms and Empirical Evaluation”. C.F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, X.D. Koutsoukos. To appear in Journal of Machine Learning Research, 2009.
“Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part II: Analysis and Extensions”. C.F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, X.D. Koutsoukos. To appear in Journal of Machine Learning Research, 2009.
Where we present a powerful class of biomarker selection algorithms and compare causal and non-causal biomarker discovery methods. We find that causal methods confer many important advantages.
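The Markov blanket idea behind HITON and the GLL algorithms can be illustrated with a much simpler relative, IAMB (Incremental Association Markov Blanket): grow a candidate set by repeatedly adding the variable most strongly associated with the target given the current set, then prune any variable that is conditionally independent of the target given the rest. This sketch is not HITON, GLL, or the lab's software. It assumes linear-Gaussian data so that conditional independence can be tested with partial correlations and Fisher's z-test, and the small simulated network is hypothetical.

```python
import math
import random
from functools import lru_cache

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation is zero, given k conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)  # keep log() finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def iamb(columns, target, alpha=0.01):
    """IAMB-style Markov blanket estimate for column `target` (linear-Gaussian assumption)."""
    n, p = len(columns[0]), len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(p)] for i in range(p)]

    @lru_cache(maxsize=None)
    def pcorr(x, y, z):
        """Partial correlation of x and y given frozenset z, via the recursive formula."""
        if not z:
            return corr[x][y]
        w = min(z)
        rest = z - {w}
        rxy, rxw, ryw = pcorr(x, y, rest), pcorr(x, w, rest), pcorr(y, w, rest)
        return (rxy - rxw * ryw) / math.sqrt(max((1 - rxw ** 2) * (1 - ryw ** 2), 1e-12))

    mb = set()
    # Forward (growing) phase: add the most associated variable while it tests as dependent.
    while True:
        cond = frozenset(mb)
        candidates = [v for v in range(p) if v != target and v not in mb]
        if not candidates:
            break
        best = max(candidates, key=lambda v: abs(pcorr(v, target, cond)))
        if fisher_z_pvalue(pcorr(best, target, cond), n, len(cond)) < alpha:
            mb.add(best)
        else:
            break
    # Backward (shrinking) phase: drop variables independent of the target given the others.
    for v in sorted(mb):
        cond = frozenset(mb - {v})
        if fisher_z_pvalue(pcorr(v, target, cond), n, len(cond)) >= alpha:
            mb.discard(v)
    return sorted(mb)

# Hypothetical network: x1, x2 -> t -> x3; x4 is noise; x5 depends only on x1.
rng = random.Random(0)
n = 1000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
t = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
x3 = [v + rng.gauss(0, 1) for v in t]
x4 = [rng.gauss(0, 1) for _ in range(n)]
x5 = [a + rng.gauss(0, 1) for a in x1]
columns = [x1, x2, x3, x4, x5, t]

selected = iamb(columns, target=5)
print(selected)
```

In this simulated network, x1 and x2 are parents of the target and x3 is its child, so the true blanket is columns 0, 1, and 2; x5 is related to the target only through x1 and should be pruned once x1 is in the set. With finite samples, each test at level `alpha` can still occasionally admit a false positive.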
Selected Related Grants
- NIH/NLM, 7 R56 LM007948-04A1, (Aliferis) “Causal Discovery Algorithms for Translational Research with High-Throughput Data”, 10/15/09 – 10/14/10, Total Award: $344,512.
- NIH/NCRR 1 U54 RR024386-01A1 (Cronstein), “NYU-HHC Clinical and Translational Science Award”, 07/14/09 – 03/31/14, Total Award: $32,411,416.
- NIH/NCCAM 1 R01 AT004662-01A1 (Kokkotou), “Omics and Variable Responses to Placebo and Acupuncture in Irritable Bowel Syndrome”, 05/01/09 – 04/30/14, Total Subcontract Award: $347,688.
- NIH 3RO1 AR056667201S1 (Cronstein), “The Pharmacology of Dermal Fibrosis”, 01/15/09 – 12/31/13, Total Award: $798,312.
- Department of Defense PC093319P1 (Ostrer & Aliferis, dual-PI mechanism), “Enhanced Prediction of Prostate Cancer Risk and Progression and Causative Gene Identification”, Award Mechanism: Synergistic Idea Development Award, 07/01/10 – 06/30/13, Total Award: $633,750.
- NIH, NCI 1 U24 CA126479-01 (Liebler), “Clinical Proteomic Technology Assessment for Cancer”, 09/28/2006 – 08/31/2011, Total Award: $7,388,990.
- NSF 0725746 (Guyon & Aliferis), “Causal Discovery Workbench and Challenge Program”, 08/15/2007 – 07/31/2009, Total Award: $107,721.
- NIH/NLM, 1 R01 LM007948-01 (Aliferis), “Principled methods for very-large-scale causal discovery”, 07/01/2003 – 06/30/2006, Total Award: $631,180.
- NIH/NLM BISTI Planning Grant, 1 P20 LM007613-01 (Aliferis, Pilot PI) “Pilot Project Computational Models of Lung Cancer: Connecting Classification, Gene Selection, and Molecular Sub-typing”, 09/01/2002 – 08/30/2004, Total Award: $226,500.
