GPU Cluster Sysadmin

Stanford Research Computing · Research Computing

unknown

Salary Range (USD)

Negotiable

Location

Stanford, USA

Visa Support

Not mentioned

Funding Stage

Unknown

Job Responsibilities

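• Working with the latest AI/ML/Deep Learning/LLM software & frameworks and getting them to run in an HPC environment
• Keeping the environment up-to-date and working with NVIDIA/DDN when there's trouble
• Interacting frequently with users & PIs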

Required Skills

AI/ML/Deep Learning software & frameworks

Engineering Culture & Tech Stack

NVIDIA DGX H100 · DDN Intelliflash · DDN NFS

Raw Post


Stanford Research Computing | Stanford, CA (next to Palo Alto) | Full-time | Three positions | HYBRID

Stanford Research Computing (https://srcc.stanford.edu) is a collaboration between University IT and the Vice Provost and Dean of Research. We operate HPC environments for researchers, we do one-time consultations on projects (from software and pipelines, to data management, to physical building design and fit-out), and we provide contract support for individual Labs, Departments, and Schools.

We have three open positions:

• Principal Storage Architect & Team Lead: Our current storage team lead is moving on to Industry, so we're splitting his work into two separate roles. This is the Technical Manager position: You'll be leading the storage team and setting the direction for our large storage environments: Oak (file storage, used by multiple clusters), Fir (fast scratch for Sherlock), and Elm (object storage on top of tape). Knowledge of Lustre, Infiniband, and PB-scale storage is important.
• Storage Architect or Storage Sysadmin: This is the second role I referenced above. You'll be maintaining & expanding Oak, our 20+ Pebibyte Lustre storage environment used by our largest HPC clusters. Depending on your experience level, you might also have some responsibility for Elm, which provides object storage on top of tape. Knowledge of Lustre, Infiniband, and PB-scale storage is important here, too.
• GPU Cluster Sysadmin: With Marlowe (our 1SU NVIDIA DGX H100 SuperPOD with DDN Intelliflash and DDN NFS storage) launched, we have decided to hire an additional sysadmin! You'll be working with the latest AI/ML/Deep Learning/LLM software & frameworks, getting them to work in an HPC environment. You'll be keeping the environment up-to-date, and working with NVIDIA/DDN when there's trouble. You should also expect to interact with users & PIs a lot.

No "more info" links right now, unfortunately: Our hiring system is down this week, as part of an HR & Payroll systems upgrade. For now, keep a watch on https://srcc.stanford.edu/work-research-computing; updated links will be posted there next week!

If you don't already live in the Bay Area, we provide a relocation incentive. Depending on where you live, we provide free transit passes. Unfortunately, if you drive, you will have to pay for parking for the days you're on-site. There is some on-call around the holidays. We get a 403(b) match, good healthcare, and 30+ days off per year (holidays + vacation). All benefits are publicly documented at https://cardinalatwork.stanford.edu/benefits-rewards.

If you have questions, feel free to reply here or email me (the info is in my profile)!

AI Risk Insights

No major risk signals detected.

Recent News

No recent updates

Data Source

Content parsed by LLM from Hacker News raw data. Confidence: HIGH
