News Overview
- Architectural Developments: NVIDIA introduces Vera (CPU) and Rubin (GPU) architectures, with Rubin Ultra aiming to pack 576 GPU dies into a single rack consuming 600 kW of power.
- Performance Targets: The Vera Rubin NVL144 system is projected to deliver up to 3.6 exaFLOPS of dense FP4 compute for inference and 1.2 exaFLOPS of FP8 for training.
- Release Timeline: Vera CPU cores and Rubin GPUs are expected to be available starting late 2026, with Rubin Ultra systems anticipated in late 2027.
Original article: Nvidia’s Vera Rubin CPU, GPU roadmap charts course for hot-hot-hot 600 kW racks
In-Depth Analysis
Vera CPU Architecture
- Core Design: Features 88 custom-designed Arm cores with simultaneous multithreading (SMT), allowing up to 176 threads per socket (sanity-checked in the sketch after this list).
- Integration: Includes NVLink-C2C (chip-to-chip) connectivity for tight coupling with Rubin GPUs.
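The thread count follows directly from the core count and the SMT width. The minimal sketch below assumes two-way SMT, which is inferred from the quoted 88-core / 176-thread pairing rather than stated explicitly by NVIDIA.

```python
# Back-of-envelope check: Vera CPU threads per socket.
# Assumption: two hardware threads per core, inferred from the 88-core /
# 176-thread figures quoted above.
CORES_PER_SOCKET = 88
SMT_WIDTH = 2

threads_per_socket = CORES_PER_SOCKET * SMT_WIDTH
print(f"Vera threads per socket: {threads_per_socket}")  # -> 176
```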
Rubin GPU Architecture
- Design Basis: Builds upon the Blackwell architecture, utilizing two reticle-limited dies per GPU package.
- Memory and Bandwidth: Equipped with 288 GB of HBM4 memory, offering a bandwidth of 13 TB/s.
- Performance Metrics: Rated for up to 50 petaFLOPS of dense FP4 compute per package; the sketch after this list puts that figure alongside the memory bandwidth.
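To put the per-package numbers in perspective, a rough roofline-style ratio of compute to memory bandwidth can be derived from the figures above. This is a back-of-envelope sketch using the quoted 50 petaFLOPS dense FP4 and 13 TB/s HBM4 bandwidth; the resulting operations-per-byte value is our arithmetic, not an NVIDIA-published specification.

```python
# Rough compute-to-bandwidth ratio for a Rubin GPU package, using the
# figures quoted above (50 PFLOPS dense FP4, 13 TB/s HBM4).
FP4_PFLOPS = 50          # dense FP4 throughput per package, in petaFLOPS
HBM4_BANDWIDTH_TBS = 13  # HBM4 bandwidth per package, in TB/s

flops = FP4_PFLOPS * 1e15                 # FLOP/s
bytes_per_s = HBM4_BANDWIDTH_TBS * 1e12   # bytes/s

ops_per_byte = flops / bytes_per_s
print(f"FP4 operations per byte of HBM4 bandwidth: {ops_per_byte:,.0f}")
# -> roughly 3,846 FP4 ops per byte, i.e. workloads must be very
#    compute-dense relative to memory traffic to stay off the bandwidth wall.
```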
Rubin Ultra Systems
- Density and Power: Designed to house 576 GPU dies within a single rack, consuming 600 kW of power.
- Performance Enhancements: The first-generation Vera Rubin NVL144 rack is projected to deliver 3.3 times the floating-point performance of the previous-generation GB300 NVL72 (Blackwell Ultra), reaching up to 3.6 exaFLOPS of FP4 for inference and 1.2 exaFLOPS of FP8 for training; Rubin Ultra then scales the same building blocks up to the 576-die rack.
- Networking: Incorporates NVIDIA's 6th-generation NVLink switch fabric, providing an aggregate interconnect bandwidth of 260 TB/s. A back-of-envelope reconciliation of these rack-level figures follows this list.
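The rack-level figures can be cross-checked against the per-package numbers. The sketch below is a rough reconciliation under two assumptions not stated in the article: that the "NVL144" name counts GPU dies (two per package), so the 3.6 exaFLOPS figure corresponds to 72 Rubin packages at 50 petaFLOPS dense FP4 each, and that the 600 kW Rubin Ultra budget is spread evenly across its 576 dies.

```python
# Back-of-envelope reconciliation of the rack-level figures quoted above.
# Assumptions (not stated in the article): the NVL144 name counts GPU dies
# (2 dies per package), and rack power is split evenly across dies.

FP4_PFLOPS_PER_PACKAGE = 50   # dense FP4 per Rubin package
NVL144_FP4_EXAFLOPS = 3.6     # quoted NVL144 dense FP4 inference figure
DIES_PER_PACKAGE = 2

# How many packages does the 3.6 EF figure imply?
packages = NVL144_FP4_EXAFLOPS * 1e18 / (FP4_PFLOPS_PER_PACKAGE * 1e15)
print(f"Implied Rubin packages per NVL144 rack: {packages:.0f}")               # -> 72
print(f"Implied GPU dies per NVL144 rack: {packages * DIES_PER_PACKAGE:.0f}")  # -> 144

# Rubin Ultra: rough power per GPU die if 600 kW is split evenly.
RUBIN_ULTRA_DIES = 576
RACK_POWER_KW = 600
watts_per_die = RACK_POWER_KW / RUBIN_ULTRA_DIES * 1000
print(f"Rubin Ultra power per die (even split): {watts_per_die:.0f} W")        # -> ~1042 W
```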
Commentary
NVIDIA’s roadmap marks a decisive push toward ultra-high-density, high-performance computing for next-generation data centers. The power targets, particularly 600 kW per rack, underline the escalating energy demands of advanced AI workloads and the correspondingly robust power-delivery and cooling infrastructure such dense configurations will require. As these architectures move from design to deployment, their impact on computational capability and data center operations will be watched closely by industry stakeholders.