High Performance Computing Facility (HPCF)
1. Phase I: Expansion and re-purposing of the Research Computing Resource
In conjunction with MCIT, CHIBI has designed and implemented a high performance Sun Blade Cluster to support both sequencing informatics and other bioinformatics projects.
• A suite of server-based bioinformatics programs has been installed on a 64-core Sun Blade Cluster (with 8 GB RAM per node and 10 TB of SAN data storage capacity).
• Each blade has independent read/write access to shared storage volumes and networking infrastructure, with transfer rates of at least one gigabit per second.
• The cluster is optimized for multiprocessor-based parallel software applications, based on the Linux HPC implementation and the SGE queue controller.
• Software includes the Illumina genome sequencing pipeline, Roche/454 sequence analysis tools, EMBOSS, MATLAB, and a variety of modeling and simulation tools.
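As an illustration of how a parallel job might be described for the SGE queue controller mentioned above, here is a minimal Python sketch that generates a job script. The job name, the command, the slot count, and the parallel environment name `smp` are hypothetical; parallel environment names in particular are defined per site and are not given in this document.

```python
def build_sge_script(job_name, command, slots=4, pe_name="smp"):
    """Return the text of an SGE job script that requests `slots` cores.

    `pe_name` is a hypothetical parallel environment name; the real name
    is site-specific and must be taken from the cluster configuration.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",          # job name shown in the queue listing
        "#$ -cwd",                    # run from the submission directory
        "#$ -j y",                    # merge standard error into standard output
        f"#$ -pe {pe_name} {slots}",  # parallel environment and slot count
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example job; $NSLOTS is set by SGE when the job runs.
example_script = build_sge_script("align_demo", "echo using $NSLOTS slots")
print(example_script)
```

The generated text would typically be saved to a file and submitted with `qsub`; the resource requests that are actually appropriate depend on the local queue configuration.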
Other bioinformatics services include:
- FPGA hardware-accelerated BLAST, Smith-Waterman, and pattern-matching algorithms on a Decypher CodeQuest machine.
- A dedicated 8-processor cluster for the Roche 454 sequencer.
- A dedicated 2-processor Red Hat Linux server to support bioinformatics training and student projects.
- A 2-processor Windows server for general-purpose research databases.
- An open computer room (120 sq. ft.) with 4 workstations available to investigators.
2. Phase II: High Performance Computing
Introduction, Scope, and Importance
Today the NYULMC research community faces greatly expanded and far more complicated computing needs. Investigators now depend on significant technologies invented in the last five years, chief among them next-generation genome sequencing. Instruments such as the Roche 454, Illumina, and ABI SOLiD permit extremely rapid, accurate, and deep sequencing, provided that very significant computing power is available. Applications include transcriptional regulation studies (via ChIP-seq, epigenetic, and non-protein coding regulatory mRNA studies), de novo sequencing of microbes and parasites, microbiomic studies, re-sequencing of known species, SNP/copy number variation studies, splice variation studies, epigenetic studies, digital gene expression, and discovery of novel genes. Investigators also need methods for obtaining molecular classifiers (“signatures”) and associated biomarkers that can provide vastly improved diagnostic accuracy, early disease detection, and personalization of treatment selection and dosing, and that can correspondingly enhance drug development across the disease spectrum. In addition, molecular assays combined with even more compute-intensive methods, such as de novo reverse engineering of molecular pathways, shed light on how disease develops and on the impact drugs have on disease processes.
Assessment of high-performance needs at NYULMC
At our institution, a large number of investigators who are leaders in their fields work in a broad range of cutting-edge areas that are in great need of high-performance computing: pediatric neuroscience and psychiatry, medical parasitology, gastroenterology, renal disease, radiation therapy in prostate and breast cancer, drug discovery, new sequencing methodologies, metadata analyses, molecular signature development, biomarker discovery, biomedical information retrieval, and novel informatics methods development. All of these areas rely on sophisticated computer analyses, and their breadth indicates the power of high-performance computing resources to make profound contributions to biomedical research. Very few, if any, shared instruments can deliver benefits across this range and at this scale.
Over a two-month period in spring 2009, we conducted a comprehensive survey of high-end computing needs and identified current, pending, and planned grants and projects that require computing power exceeding the capabilities of local resources and the RCR.
To meet these needs at the NYULMC level, CHIBI has developed plans and secured development funds to purchase and deploy a high-end cluster.
Summary Technical Specifications of NYUMC HPC expansion
The current design for the NYUMC HPC expansion consists of 67 nodes: two large-memory nodes (DL785G5), one cluster head node (DL380G5), and 64 “worker” nodes (DL160seG6), 32 of which have Tesla S1070 accelerators. The configuration will have 48TB of attached storage. This design and vendor (HP) were selected for their flexibility across the range of computational tasks to be solved, from very-large-memory applications, where manipulations requiring the entire genome to be held in memory are performed, to tasks amenable to a very high degree of parallelization, such as large-scale alignment searches. The significant boost in compute power from the Tesla accelerators will extend the number of tasks that it is reasonable to program for this equipment. The power of this equipment exceeds the initial needs of the currently active projects severalfold in order to allow for (a) significant usage expansion over the entirety of its anticipated 5-year working life; (b) downtime and training time; (c) benchmarking and code optimization studies useful to all users; and (d) sufficient leftover cycles that can be allocated to investigators who are building preliminary results toward future NIH funding.
HPC hardware specifications
- 2 DL785G5 Opteron Large Memory Nodes
- 1 DL380G5 Head Node with 48TB Storage
- Modular Smart Array (MSA2000sa) Serial Attached SCSI (SAS) 48TB
- 32 DL160seG6 Compute Nodes [DL160G6 placeholder for DL160seG6]
- 32 DL160seG6 Accelerator Nodes with Tesla S1070 [DL160G6 placeholder for DL160seG6]
- 16 Tesla S1070 units, each containing 4 GPUs and 2 PCIe interfaces
- 3 HP ProCurve 2610-48 10/100/1000 console network switch [Server management port]
- 3 HP ProCurve 2848 1000T administration network switch [Server Nic 1 port]
- 4 Voltaire 4036 IB QDR 36P Switches configured for DDR 1:1 Bandwidth
- 3 HP 42U Rack 10642 G2 Racks, joined
- 1 TFT7600 RKM, which combines a full 17 inch WXGA+ monitor and keyboard with touch pad in a 1U format
- 1 KVM for Head Storage Node and large memory nodes, connection to all nodes via admin and console network
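The counts in this list can be cross-checked against the summary figure of 67 nodes with a short tally. The Python sketch below uses only numbers stated in the list. The GPUs-per-node figure it derives rests on an assumption the list does not state: that each of the two PCIe connections on an S1070 attaches to a different accelerator node.

```python
# Counts copied from the hardware list above.
LARGE_MEMORY_NODES = 2
HEAD_NODES = 1
COMPUTE_NODES = 32
ACCELERATOR_NODES = 32
S1070_UNITS = 16
GPUS_PER_S1070 = 4
PCIE_LINKS_PER_S1070 = 2


def summarize_hardware():
    """Tally nodes, GPUs, and PCIe host links from the listed counts."""
    total_nodes = (LARGE_MEMORY_NODES + HEAD_NODES
                   + COMPUTE_NODES + ACCELERATOR_NODES)
    total_gpus = S1070_UNITS * GPUS_PER_S1070
    host_links = S1070_UNITS * PCIE_LINKS_PER_S1070
    # Assumption (not stated in the list): one PCIe link per accelerator node.
    # Only under that assumption is a per-node GPU count meaningful.
    if host_links == ACCELERATOR_NODES:
        gpus_per_accelerator_node = total_gpus // ACCELERATOR_NODES
    else:
        gpus_per_accelerator_node = None
    return {
        "total_nodes": total_nodes,
        "total_gpus": total_gpus,
        "host_links": host_links,
        "gpus_per_accelerator_node": gpus_per_accelerator_node,
    }


print(summarize_hardware())
```

Under that assumption the 16 S1070 units supply 32 host links, one for each of the 32 accelerator nodes, which is consistent with the node counts given in the summary.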
Networking
The NYULMC currently operates a fiber-optic-based, switched Ethernet network core at a speed of 10G. Networking to the desktop is typically 100baseTX, but “power users” can request 1G connections to the core for their desktops if they require it. Firewalls provide appropriate network security by managing all traffic into and out of the site. The link to the University main site and to the Internet is 1G over redundant single-mode fiber paths and plans are being developed to increase this to 10G. Within the “C20” space the plan for our new equipment is to provide a new InfiniBand interconnection fabric for machine-to-machine interconnections and machine to local data storage, plus 10G connections from each machine to the core.
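To give a sense of what the difference between these link speeds means for moving sequencing data, the following Python sketch estimates an idealized transfer time. The 1 TB dataset size is a hypothetical example, and the calculation ignores protocol overhead, storage throughput, and contention, so real transfers will be slower.

```python
def transfer_seconds(size_bytes, link_gbps, efficiency=1.0):
    """Idealized time to move `size_bytes` over a link of `link_gbps` Gbit/s.

    `efficiency` (0 < efficiency <= 1) is a hypothetical knob for the
    fraction of line rate actually achieved; 1.0 means full line rate.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


# Hypothetical 1 TB dataset (10**12 bytes) at the link speeds named above.
dataset_bytes = 10**12
for label, gbps in [("100 Mbit/s desktop", 0.1), ("1G", 1), ("10G", 10)]:
    hours = transfer_seconds(dataset_bytes, gbps) / 3600
    print(f"{label}: {hours:.2f} hours")
```

At full line rate the hypothetical dataset takes 8000 seconds over a 1G link and 800 seconds over a 10G link, which illustrates why 10G connections from each machine to the core matter for data-heavy workloads.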
Personnel:
We have opened two dedicated positions for the HPCF:
Technical Director High Performance Computing Facility (1002271)
The Technical Director will be responsible for implementing a modern high-performance computing environment; implementing and following best practices specific to the field of high-performance scientific biomedical computation, including physical space access and specifications, local network and Internet connectivity, data storage and backups, job scheduling, and programmatic interfaces; working with and reporting to the Scientific Director to ensure that the HPCF meets the needs of NYUMC research projects, to build benchmarks for periodic evaluation, and to maintain and regularly renew hardware and software; working with and reporting to the HPCF Operations Director on day-to-day operational activities such as training, maintenance, project execution, cost recovery, and facilities management for the HPCF; working with the HPCF Applications Training Lead to facilitate development and delivery of training programs for scientists and students on how to use the facility for their research projects; supervising a Systems Manager and Programmer; plus all other related functions.
Systems Manager and Programmer High Performance Computing Facility (1002270)
The Systems Manager and Programmer (SMP) will be responsible for following best practices specific to the field of high-performance scientific computation in terms of physical space access and specifications, local network and Internet connectivity, data storage and backups, job scheduling, programmatic interfaces, etc.; playing a major role in new software development for the HPCF; working with and reporting to the HPCF Technical Director on the execution of HPCF users' projects and benchmarks and on the maintenance and renewal of hardware and software; facilitating development and delivery of training programs for scientists and students on how to use the facility for their research projects; developing and delivering training programs for external programmers; plus all other related functions.
In addition to these two new full-time positions the following individuals will devote part-time or full-time efforts to the HPC:
Constantin Aliferis functions as Scientific Director of the HPC. His role is to ensure the HPC meets the needs of the Biomedical Informatics Core of the CTSI, and that the needs of NYUMC research projects requiring high-end computing are met. He will work with users and CHIBI faculty to build benchmarks for periodic evaluation of services provided and planning of renewal of hardware.
Dr. Philip Ross Smith is Operations Director bearing day-to-day operational responsibility and oversight of training, maintenance, project execution, cost recovery, and facilities for the HPC.
Dr. Stuart Brown is assigned the role of Applications Training Lead, responsible for training scientists and students on how best to use (as opposed to program) the facility for their research projects.
The NYULMC IT department, under the direction of the Chief Technology Officer Nader Mherabi, provides central management of the “systems” aspects of all the computing resources needed for informatics. These tasks include system design, managing equipment purchases, systems management, management of large-scale disk storage, backup, security, and centralized housing. The IT Lead for the HPC is Aleksandar Kacanski, Director of System Engineering.
Advisory Committees & Financials:
Executive Advisory Committee (HPC-EAC): The role of the EAC is to ensure that the HPC continues to fulfill the goals outlined in the grants that contribute to its funding, to ensure that the HPC facility operates in a manner consistent with the NYULMC’s goals for its shared resources, and to ensure that the HPC is a dynamic entity growing to meet the emerging needs of the research community.
HPC Users Group (HPC-UG): The HPC-UG will have two roles. First, it will provide a sounding board for the EAC to understand the evolving needs of the HPC user community. Second, there will be brief presentations of the scientific highlights of the work being done on the equipment and an opportunity to present new developments in HPC.
Managing access: Specific policies for HPC usage will be determined in consultation with the HPC-EAC, HPC-UG, and CHIBI. These policies will give priority to NIH-funded projects but will also allow for pre-funding access, benchmarking, downtime, and maintenance.
User charges: The NYULMC is in the process of establishing a standardized financial environment for all its shared resources, of which the HPC would be one. The goal of these emerging financial standards is to ensure that every project that utilizes a shared resource has a budget that accounts for all the costs associated with that use. Appropriate chargeback will be initiated for users of the HPC, but at present the actual charge rate and methodology remain to be determined. Recovered costs will be used only for expansion and long-term sustainability of the shared resource and the new staff positions associated with it.
