The rebirth of artificial intelligence over the past five years has led to rapid progress in challenging areas such as computer vision and speech recognition. As computers begin to learn about the world around them, this is in turn opening up new possibilities in fields such as healthcare, transportation and robotics.
“Machine learning is one of the most important computer revolutions ever,” Nvidia CEO Jensen Huang said at last week’s annual GPU Technology Conference. “Computers are learning by themselves.”
The emergence of deep learning is the result of three factors: smarter algorithms, lots of data, and the use of GPUs to speed up training. The world’s largest cloud companies are increasingly relying on GPUs to develop their own models and making this infrastructure available to their customers. It’s little surprise that Nvidia’s sales to datacenters for GPU computing nearly tripled year-over-year in its most recent quarter.
But the company’s ambitions go beyond selling GPUs. Nvidia’s strategy, as Huang put it, is to provide the most productive hardware and software platform for deep learning. And at the GPU Technology Conference, Nvidia made a compelling case.
The most significant announcement was the first GPU based on Nvidia’s new Volta architecture. Like the current Pascal GP100, the GV100 is designed for both high-performance computing and deep learning workloads. But the similarities end there.
The GV100 is manufactured on a more advanced process, foundry TSMC’s 12nm recipe, and is a much larger chip with an astounding 21.1 billion transistors on a die measuring 815 square millimeters. By comparison, the 16nm GP100 has 15.3 billion transistors on a die that measures 610 square millimeters. The Tesla V100 is so big and complex that Huang said it is perhaps the most expensive chip ever built. “If anyone would like to buy this, it’s approximately $3 billion,” he joked while holding up what he said was the first one back from the foundry.
It also has a new architecture that consists of up to 84 Streaming Multiprocessors (SMs)–each with 64 single-precision floating-point units, 64 32-bit integer units, and 32 double-precision floating-point units–for a total of 5,376 FP32 CUDA cores and 2,688 FP64 CUDA cores. The GV100 also includes a new type of core, called a Tensor Core, that is designed specifically to speed up the kind of matrix math used heavily in deep learning. Each SM has eight of these, for a total of 672 Tensor Cores.
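The totals for the full die follow from multiplying each per-SM unit count by the number of SMs. A minimal Python sketch of that arithmetic is below; the per-SM figures are the ones quoted above, while the names `UNITS_PER_SM` and `total_units` are made up for illustration and are not part of any Nvidia tool or API.

```python
# Per-SM unit counts for the GV100 as quoted in the article.
# The dictionary keys are illustrative labels, not official names.
UNITS_PER_SM = {"fp32": 64, "int32": 64, "fp64": 32, "tensor": 8}


def total_units(active_sms: int) -> dict:
    """Multiply each per-SM unit count by the number of active SMs."""
    return {name: count * active_sms for name, count in UNITS_PER_SM.items()}


# Full GV100 die: 84 SMs.
full_die = total_units(84)
print(full_die)
# {'fp32': 5376, 'int32': 5376, 'fp64': 2688, 'tensor': 672}
```

The same function can be called with a smaller SM count to get the totals for a product that ships with some SMs disabled.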
The first product based on this GPU is the Tesla V100, which has 80 active SMs, for a total of 5,120 FP32 CUDA cores and 640 Tensor Cores. (It’s common for large chips such as GPUs to use most–but not all–of the resources on the die to improve yields.) It also has enhanced shared memory, 16GB of Samsung’s HBM2 stacked memory with 900GBps of bandwidth (50 percent more throughput than the P100 when combined with other efficiencies), and an updated version of the…