Infrastructure Engineer

TEKsystems

This is a 6-month contract position. It may be extended, although this is not guaranteed. No C2C or subcontractors.

Top Skills Details

Bare Metal GPU Provisioning

OS Installation

Infrastructure Automation

Scripting

IPMI

MUST BE IN PACIFIC OR MOUNTAIN TIME ZONE OR WILLING TO WORK THESE HOURS

Description:

The objective for each contractor is to bring up and maintain our client's DGX Cloud infrastructure services running on bare metal.

We are seeking a contractor to deploy our client's infrastructure services on bare metal.

In this organization, you will deploy our infrastructure services on top of our accelerated-computing hardware and ensure that they run as reliably as needed.

What you’ll do:

● Deploy and run the cloud infrastructure services in scope to meet our business goals, performing migrations and decommissions as necessary.

● Eliminate toil or automate it where the ROI of building and maintaining automation is worth it.

● Practice sustainable, blameless incident prevention and incident response as a member of an on-call rotation.

Prior experience on a team of any particular name, or on an ML/AI-focused team, is not required but is nice to have.

Skills:

Bare Metal, IPMI, BMC, GPU, Ansible, Python, NCCL, Slurm, Docker, Kubernetes, Go, Perl, Ruby, Bash, SRE, DevOps, CRE

Top Skills Details:

Bare Metal, IPMI, BMC, GPU

Additional Skills & Qualifications:

Ways to stand out from the crowd:

● Experience working with GPUs and ancillary services and hardware on bare metal.

● Experience working with or developing bare metal as a service (BMaaS) associated systems.

● Experience working with or developing multi-cloud infrastructure services.

● Experience teaching reliability (e.g., SRE) or more general good practices for cloud systems to peers or to other companies (e.g., CRE).

● Experience in running private or public cloud systems based on one or more of Kubernetes, OpenStack, Docker or Slurm.

● Experience with our client's Collective Communication Library (NCCL).

Experience Level:

Intermediate Level

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

§ Medical, dental & vision

§ Critical Illness, Accident, and Hospital

§ 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available

§ Life Insurance (Voluntary Life & AD&D for the employee and dependents)

§ Short and long-term disability

§ Health Spending Account (HSA)

§ Transportation benefits

§ Employee Assistance Program

§ Time Off/Leave (PTO, Vacation or Sick Leave)

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.
