Senior Linux System Engineer
Insight Global
Job Description
As an HPC Systems Engineer, you will help ensure today is safe and tomorrow is smarter. Our work depends on an HPC Systems Engineer joining our team to bridge the gap between our researchers and the high-performance computing resources. You will be one of the faces of our High Performance Compute (HPC) clusters to the client's research community, who will rely on you to help them get their important research work done. You will focus on supporting HPC hardware, installing scientific applications, optimizing submission scripts and running jobs, and monitoring the health of our client's HPC clusters: a 4,000-core HPC cluster that is GPU-focused and a 1,500-core HPC cluster.
How an HPC Systems Engineer Will Make an Impact:
· Work with a 4,000-core HPC cluster that is GPU-focused and a 1,500-core HPC cluster, supporting the hardware and operating system environments
· Support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI/ML
· Monitor the portfolio of software applications and be proactive in planning upgrades and license renewals
· Monitor and report on cluster performance and generate data to show usage and trends
· Triage support requests from the research community and work with others in the Scientific Infrastructure team to resolve issues and complete service requests
· Collaborate with researchers to guide them in effective use of the HPC resources, such as job scheduler submission, data formats, and building data workflows
· Engage with researchers to understand their HPC needs, including data life cycle management, integration of scientific instruments with HPC, and storage capacity and compute requirements
· Provide input to the Scientific Infrastructure team leader for setting priorities for cluster operations, scheduling policies, resources needed, etc.
· Attend and actively participate in daily standup meetings to provide updates on progress, discuss obstacles, and coordinate tasks with other team members
· Work collaboratively in a team environment to achieve project goals
· Engage in open communication, share knowledge, and support fellow teammates
· Provide feedback and contribute to the continuous improvement of team processes
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com .
To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/ .
Skills and Requirements
· BS/BA (or equivalent)
· Five years of related experience
· Minimum of five years of experience with servers, datacenters, networking, and related technologies
· Minimum of five years of experience managing Linux systems
· Experience with the Spack package manager, including making packages from PyPI, R, and GitHub
· Experience installing and packaging GPU applications and optimizing job submission scripts that are used for ML model training, data mining operations, or high-res graphics rendering
· Experience with Python scripting
· Experience using Git distributed workflows
· Experience with Ansible to manage system configuration
· Experience with Terraform for provisioning systems
· Must be able to obtain an NIH Public Trust
· Ability to translate technical concepts in HPC and research computing to scientists and other non-technical personnel
· Ability to determine meaningful metrics and usage data for leadership
· HPC scheduler experience (esp. SLURM)