1. HW Optimization
ML Job Workflow

When a researcher submits a training job:
- The scheduler reserves nodes.
- The OS provides the GPU devices and memory allocations via the NVIDIA driver.
- The container provides the correct software environment, including the optimized, hardware-aware CUDA libraries.
- User code (e.g. PyTorch, TensorFlow, JAX) calls these CUDA libraries, which ultimately communicate with the driver and the hardware (see the short sketch at the end of this section).

Quick refresher: From Superchip to SuperPOD

GB200 & GB200 NVL72 Specs ...
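To make the last step of the job workflow concrete, here is a minimal sketch of the user-code layer, assuming a node with PyTorch installed and at least one CUDA-capable GPU (the script itself is hypothetical, not part of the stack described above). It only shows user code reaching the GPU through the CUDA libraries and the NVIDIA driver provided by the container and OS.

```python
import torch


def main() -> None:
    # User code only sees the PyTorch/CUDA API; the CUDA libraries inside the
    # container talk to the NVIDIA driver, which talks to the hardware.
    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA device visible; check driver/container setup")

    device = torch.device("cuda:0")
    print(f"Running on: {torch.cuda.get_device_name(device)}")

    # A tiny matmul, dispatched through the CUDA libraries (cuBLAS) and the
    # driver down to the GPU.
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)
    c = a @ b
    torch.cuda.synchronize()  # wait for the kernel to finish on the GPU
    print("Result checksum:", c.sum().item())


if __name__ == "__main__":
    main()
```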