Job Summary
The Scientific Computing and Data group at the Icahn School of Medicine at Mount Sinai partners with scientists to accelerate scientific discovery. To achieve this aim, we support a cutting-edge high-performance computing ("HPC") and data ecosystem along with MD/PhD-level support for researchers. The group is composed of a high-performance computing team, a research clinical data warehouse team and a research data services team.
The HPC Administrator, as a member of the Scientific Computing and Data group, is responsible for a computational and data science ecosystem for researchers at Mount Sinai. Duties include the design, deployment, management and maintenance of high-performance computing (HPC) clusters, an archival storage system, scientific workflows and associated data resources. The HPC Administrator also participates in the integration of HPC resources with genomic systems and in the resolution of all technical issues. This position reports to the Director for Computational & Data Ecosystem in Scientific Computing. Specific responsibilities are listed below.
Responsibilities
- Designs, deploys and maintains production HPC clusters with over 30,000 cores, InfiniBand interconnect, 40+ petabytes of data storage and databases.
- Designs and develops scripts for system administration, monitoring and usage reporting.
- Designs and develops scripts for system regression and performance testing (file systems, scheduler, interconnect, high availability, etc.).
- Troubleshoots, isolates and resolves application, system and other technical problems (hardware, software, and network).
- Maintains, tunes and manages computational, data and workflow systems for MSSM researchers and scientists. Helps define and deploy a comprehensive computational and data vision.
- Researches, deploys and optimizes resource management and scheduling software and policies. Develops and implements backup policies.
- Understands research use cases; researches and deploys new technologies, defining cost, performance and other trade-offs.
- Answers and resolves user tickets.
- Participates in the integration of HPC resources with laboratory equipment such as sequencers.
- Designs, tunes and manages parallel file systems, storage and data-oriented resources.
- Manages and maintains tools for configuration management, resource management, scheduling and all necessary aspects of HPC in accordance with best practices.
- Researches, deploys and manages networking and security infrastructure, including development of policies and procedures.
- Partners with peers regionally, nationally and internationally to discover, propose and deploy a world-class research infrastructure for Mount Sinai.
- Assists in developing and writing proposals and publications.
- Creates and provides clear documentation.
- Works as a strong team player within the group, within Mount Sinai, and externally. Provides after-hours support in case of a critical system issue.
- Performs other related duties as assigned or requested.