Sequencing Informatics Group
Next-generation DNA sequencing is a rapidly evolving technology in current biomedical research. Large quantities of high-quality data are being produced at ever lower costs. The Sequencing Informatics Group provides basic and clinical scientists with the bioinformatics expertise to manage and analyze the data produced by these new DNA sequencing machines in support of a wide variety of research objectives. We work closely with the NYULMC Genome Technology Center to ensure that the latest bioinformatics methods are incorporated into experimental design, data processing, and downstream analysis for every sample that is sequenced. We develop best-practice data analysis pipelines for a variety of experimental designs. These pipelines integrate proprietary software from the vendors of DNA sequencing machines (Illumina GAII/HiSeq and Roche 454), the best Open Source tools, and software developed within our group. We also develop custom data analysis solutions for the unique research needs of each investigator.
The NYULMC Sequencing Informatics Group provides expert bioinformatics consulting to investigators affiliated with any institution, and provides analysis solutions for DNA sequence data generated at any academic or private sequencing facility.
- ChIP-seq identifies transcription factor binding sites and epigenetic modifications
- RNA-seq measures gene expression, alternative splicing, and finds coding mutations
- Genome resequencing identifies mutations
- De novo genome sequencing and assembly
- Metagenomic sequencing of 16S amplicons and whole-metagenome shotgun libraries
- Group Leader: Stuart M. Brown, Ph.D.
- Faculty: Jinhua Wang, Ph.D.
- Faculty: Alexander Alekseyenko, Ph.D.
- Faculty: Phillip Ross Smith, Ph.D., M.D.
- Faculty: Alexander Statnikov, Ph.D.
- Faculty: Jiri Zavadil, Ph.D.
- Faculty: Efstratios Efstathiadis, Ph.D.
- Faculty: Constantin Aliferis, M.D., Ph.D.
- Faculty: Steven Shen, M.D., Ph.D.
- Frank Hsu, Ph.D. (Department of Computer Science, Fordham University)
- Ivan W. Selesnick, Ph.D. (Department of Electrical and Computer Engineering, NYU Polytechnic University)
The ChIP-seq pipeline combines software from Illumina, Open Source tools, and methods developed at CHIBI. We use Illumina RTA for basecalling and ELAND/CASAVA for genome alignment of sequence reads. A custom script then removes duplicate sequence reads and reads mapped to more than one genomic location, producing a data set of 'unique single' tags for each sample and input control lane. Data quality for each sample is measured by coverage and clustering methods. Peak detection is performed with the CHIBI TRLocator method and validated with the Open Source MACS software package. Analytics are available for quantitative comparisons of different biological treatments.
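The 'unique single' filtering step can be illustrated with a minimal Python sketch. This is not the group's custom script: the `Alignment` record and the `unique_single_tags` function are hypothetical names, and the sketch assumes each alignment has already been parsed into a read ID, chromosome, position, and strand.

```python
from collections import Counter
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical parsed alignment record (one row per reported hit)."""
    read_id: str
    chrom: str
    pos: int
    strand: str


def unique_single_tags(alignments):
    """Keep one tag per genomic position, using only uniquely mapped reads."""
    # A read reported at more than one location is treated as multi-mapped.
    hits_per_read = Counter(a.read_id for a in alignments)
    seen_positions = set()
    tags = []
    for a in alignments:
        if hits_per_read[a.read_id] > 1:
            continue  # drop reads mapped to more than one genomic location
        key = (a.chrom, a.pos, a.strand)
        if key in seen_positions:
            continue  # drop duplicate reads at the same position and strand
        seen_positions.add(key)
        tags.append(a)
    return tags


example = [
    Alignment("r1", "chr1", 100, "+"),
    Alignment("r2", "chr1", 100, "+"),  # duplicate of r1
    Alignment("r3", "chr1", 200, "-"),  # r3 maps to two locations
    Alignment("r3", "chr2", 50, "-"),
    Alignment("r4", "chr1", 100, "-"),  # same position, opposite strand
]
kept = unique_single_tags(example)
```

In this example only `r1` and `r4` survive. Treating strand as part of the duplicate key is a design assumption of the sketch; a real pipeline may define duplicates differently.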
We currently operate two pipelines for the analysis of RNA-seq data. For experiments focused on gene expression, the Illumina software package provides a stringent alignment to the reference genome using the ELAND aligner, then counts the sequence reads associated with each known RefSeq gene, including individual counts for each exon and each exon junction boundary. For experimental designs that are also concerned with alternative splicing, or with expression from unannotated regions of the genome, the Open Source Bowtie/TopHat/Cufflinks package provides greater flexibility to discover novel modes of gene expression.
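The idea of counting reads per exon and per gene can be sketched in a few lines of Python. This is an illustration only and does not reproduce the Illumina software: the function names, the tuple layouts, and the 1-based inclusive coordinates are all assumptions, and exon junction counts are left out.

```python
def count_reads_per_exon(exons, reads):
    """Count reads overlapping each exon.

    exons: iterable of (gene, exon_id, chrom, start, end), inclusive coordinates
    reads: iterable of (chrom, start, end), inclusive coordinates
    """
    exons = list(exons)
    reads = list(reads)
    # Start every exon at zero so unexpressed exons still appear in the output.
    counts = {(gene, exon_id): 0 for gene, exon_id, _, _, _ in exons}
    for gene, exon_id, chrom, start, end in exons:
        for r_chrom, r_start, r_end in reads:
            # Two inclusive intervals overlap when each starts before the other ends.
            if r_chrom == chrom and r_start <= end and r_end >= start:
                counts[(gene, exon_id)] += 1
    return counts


def count_reads_per_gene(exon_counts):
    """Sum exon counts per gene (a read spanning two exons is counted twice)."""
    totals = {}
    for (gene, _exon_id), n in exon_counts.items():
        totals[gene] = totals.get(gene, 0) + n
    return totals


exons = [
    ("GENE_A", "e1", "chr1", 100, 200),
    ("GENE_A", "e2", "chr1", 300, 400),
    ("GENE_B", "e1", "chr2", 50, 150),
]
reads = [
    ("chr1", 150, 185),
    ("chr1", 190, 225),
    ("chr1", 390, 425),
    ("chr2", 10, 45),
    ("chr3", 100, 135),
]
exon_counts = count_reads_per_exon(exons, reads)
gene_counts = count_reads_per_gene(exon_counts)
```

The nested loop is quadratic and is only suitable for toy inputs; production tools use sorted or indexed intervals.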
The sequence variant pipeline is designed to detect mutations in the genome of an individual patient as compared with the standard reference human genome (GenBank hg18 or hg19). Raw basecalls (typically from the Illumina GA or HiSeq machines) are aligned to the reference genome with high sensitivity using bwa. SNPs and small indels are then filtered using a set of parameters implemented with SAMtools, and custom annotation scripts identify coding SNPs, known variants in dbSNP, and other experiment-specific factors. We have implemented variants of this pipeline to screen for relapse-specific mutations in diagnosis-relapse tumor pairs, and also for data generated from RNA-seq rather than genomic DNA libraries. We have developed another pipeline that uses paired-end sequencing data for the detection of translocations.
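The filtering, annotation, and tumor-pair comparison steps can be sketched in plain Python. This is a stand-in for illustration, not the SAMtools-based filter or the group's annotation scripts: the dictionary fields, the function names, and the default thresholds (`min_qual=20.0`, `min_depth=8`) are hypothetical.

```python
def variant_key(call):
    """Identify a variant by chromosome, position, and alleles."""
    return (call["chrom"], call["pos"], call["ref"], call["alt"])


def filter_and_annotate(calls, known_variants, min_qual=20.0, min_depth=8):
    """Drop low-confidence calls and flag those found in a known-variant set.

    The thresholds are illustrative defaults, not recommended values.
    """
    kept = []
    for call in calls:
        if call["qual"] < min_qual or call["depth"] < min_depth:
            continue
        # 'known' marks variants present in the supplied set (e.g. from dbSNP).
        kept.append({**call, "known": variant_key(call) in known_variants})
    return kept


def relapse_specific(diagnosis_calls, relapse_calls):
    """Return relapse calls whose variant is absent from the diagnosis sample."""
    at_diagnosis = {variant_key(c) for c in diagnosis_calls}
    return [c for c in relapse_calls if variant_key(c) not in at_diagnosis]


calls = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G", "qual": 50.0, "depth": 30},
    {"chrom": "chr1", "pos": 2000, "ref": "C", "alt": "T", "qual": 10.0, "depth": 40},
    {"chrom": "chr2", "pos": 500, "ref": "G", "alt": "A", "qual": 60.0, "depth": 3},
    {"chrom": "chr3", "pos": 700, "ref": "T", "alt": "C", "qual": 45.0, "depth": 12},
]
known = {("chr1", 1000, "A", "G")}
passed = filter_and_annotate(calls, known)
new_at_relapse = relapse_specific(diagnosis_calls=passed[:1], relapse_calls=passed)
```

Here two calls pass the filters, one of them flagged as known, and only the `chr3` variant is reported as relapse-specific.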
The de novo assembly of a genome is best approached as a scientific investigation with extensive collaboration between bioinformaticians and investigators, rather than as a simple software pipeline. As a starting point, we offer Newbler assembly of 454 reads and Velvet assembly of Illumina reads.