Senior Site Reliability Engineer - AI/ML-optimized GPU clusters
We usually respond within three days
The organization
Our client operates one of the largest GPU infrastructures in the world: 90,000+ GPUs and 10 InfiniBand fabrics across five global data centers. Their infrastructure doubles in size every year. We’re looking for engineers who love getting deep into Linux systems, pushing hardware and software to their limits, and making the world’s fastest AI and HPC workloads run even faster. They develop their own proprietary stack and cloud environment, fully optimized for AI/ML applications.
The role
Your responsibilities will include:
Ensure fault-tolerance, scale, and uninterrupted operations for the service.
Use cutting-edge cloud technology to solve a variety of infrastructure problems.
Implement and improve CI/CD processes.
Your profile
Solid experience with programming languages (like Go, Python, or C++), beyond scripting;
Experience in environments with many GPUs distributed across multiple nodes;
Good understanding of classic algorithms and data structures;
Commercial experience with, and deep understanding of, Unix/Linux systems and network technology;
Solid experience with CI/CD and IaC;
Experience with containerization and configuration management (Ansible, Salt, Terraform, Docker, Kubernetes, Helm).
It will be an added bonus if you have:
A desire to be involved in backend development;
Experience designing, developing, and running high-load distributed systems;
Experience with a variety of cloud platforms.
Coding interviews are part of the process.
What's offered
Competitive salary and comprehensive benefits package.
Opportunities for professional growth and taking ownership in a massively scaling environment.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.
On-site in Amsterdam or full-remote (across Europe).
- Business unit: The Next Chapter W&S
- Locations: Europe, Amsterdam
- Remote status: Hybrid
- Is work permit / visa sponsorship offered? Yes, but only for candidates already based in Europe.
- Is remote possible? This role is open both on-site in The Netherlands and full-remote.
- Is freelance possible? No, this is a permanent job with a regular contract of employment.
- Which language skills are required (professional level)? English
- Employment type: Full-time, Regular - indefinite, Regular - temporary
About The Next Chapter W&S
We focus on job opportunities in The Netherlands for IT and engineering professionals. We share relevant tips and tricks with jobseekers, and we can support employers with relocation, work permit rules, the 30% ruling, and similar matters. We value transparency, honesty, and a no-nonsense approach based on our extensive technical and international recruitment expertise.