- Meta (Menlo Park, CA)
- …libraries and scheduling infrastructure. **Required Skills:** Sr. Technical Lead Manager - AI / HPC Systems Performance Responsibilities: 1. Support ... requirements of large-scale training and inference workloads. To improve performance of these systems we constantly look...on monitoring, benchmarking and looking for opportunities to improve performance of AI Training and Inference. 2.…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look…
- Meta (Austin, TX)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look…
- Meta (Austin, TX)
- … testing with focus on automation. 22. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity with ... and/or similar languages. **Preferred Qualifications:** 16. Proficiency in High-Performance Computing (HPC) or AI system…
- Meta (Menlo Park, CA)
- …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
- Deloitte (Cincinnati, OH)
- …optimizing high-performance computing (HPC) and artificial intelligence (AI) infrastructure, ensuring the systems meet the requirements for scalability, ... Architecture and Design: Develop and refine the architecture for HPC and AI systems. This...system performance, ensuring the efficient execution of AI models and HPC applications. Implement techniques…
- Deloitte (Dayton, OH)
- …day-to-day operations of the High-Performance Computing (HPC) and AI infrastructure, ensuring all systems meet or exceed requirements for scalability, ... Responsibilities: + System support and management of infrastructure for HPC and AI systems, this...system performance, ensuring the efficient execution of AI models and HPC applications. Implement techniques…
- NVIDIA (Santa Clara, CA)
- …to work effectively with diverse teams and individuals. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Passion for ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning…
- NVIDIA (Santa Clara, CA)
- …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an…
- NVIDIA (Santa Clara, CA)
- …looking for an experienced HPC Engineer to join the E2E software verification HPC / AI Infrastructure team. We are building supercomputers and HPC clusters ... develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialist...be doing: + Designing, implementing and maintaining large scale HPC / AI clusters with monitoring, logging and alerting…