
Jensen Huang Walked Out With a Chip Doing 50 Petaflops. The AI Industry Held Its Breath.

LDS Team · Let's Data Science
At GTC 2026, NVIDIA officially launched the Vera Rubin platform with 5x the inference performance of Blackwell, unveiled the 600kW Rubin Ultra rack coming in 2027, gave the world its first look at the Feynman architecture on TSMC's 1.6nm A16 process, and announced NemoClaw — an open-source enterprise AI agent platform. The message was simple: the AI buildout is not slowing down.

The SAP Center in San Jose had 30,000 people packed into it on a Monday morning, which is a strange sentence to write about a chip conference. Jensen Huang walked out at 11 a.m. PT in his leather jacket, paused at the edge of the stage, and said something to the effect of: we have a lot to show you today. That turned out to be an understatement.

GTC 2026 — NVIDIA's annual developer conference, running March 16-19 — was always going to be a statement event. NVIDIA's stock had pulled back roughly 14% after a record $68.1 billion revenue quarter, with investors asking uncomfortable questions about whether the AI infrastructure build is sustainable. Huang's answer was a two-hour barrage of silicon, software, and roadmap.

The keynote ran two hours and covered three major platforms.

January 2026 — CES Las Vegas
Vera Rubin NVL72 Formally Announced
Huang announces the Vera Rubin platform at CES, confirms it is already in full production, and promises 5x inference performance and 10x lower cost per token over Blackwell. Ships to hyperscalers in second half of 2026.
March 10, 2026 — Before GTC
NemoClaw Leaks via Wired
Wired reports NVIDIA has been briefing Salesforce, Cisco, Google, Adobe, and CrowdStrike on NemoClaw, an open-source enterprise AI agent platform. The company does not confirm or deny.
March 13, 2026 — Days Before GTC
TrendForce: Feynman Preview Expected at GTC
TrendForce reports NVIDIA may offer a first public look at the Feynman architecture, a 2028 chip built on TSMC's A16 1.6nm process — the most advanced semiconductor node in mass production history.
March 16, 2026 — GTC Keynote
Jensen Huang Takes the Stage at SAP Center
30,000 attendees from over 190 countries. A two-hour keynote covering Vera Rubin, Rubin Ultra, Feynman, NemoClaw, physical AI, and robotics. One of the most consequential presentations in the company's history.

The Chip That Changes the Math on Inference

The Vera Rubin NVL72 is the centerpiece of GTC 2026. Huang walked through the numbers with the deliberate pacing of someone who knows the audience has calculators.

Each Rubin GPU delivers 50 PFLOPS of NVFP4 inference performance — 5x over Blackwell. At the rack level, the NVL72 configuration delivers 3.6 EFLOPS of inference compute. The GPU itself is built from two reticle-sized dies, packs 336 billion transistors, and uses HBM4 memory with up to 288GB per chip and 22 TB/s of memory bandwidth.
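The rack-level figure follows directly from the per-GPU numbers. A quick sanity check (the 72-GPU count comes from the NVL72 name; NVIDIA did not quote an aggregate rack HBM total, so that line is just the naive multiplication):

```python
# Deriving the rack-level figures from the per-GPU specs quoted in the keynote.
GPUS_PER_RACK = 72           # from the "NVL72" configuration name
PFLOPS_PER_GPU = 50          # NVFP4 inference, per Rubin GPU
HBM_GB_PER_GPU = 288         # HBM4 capacity per GPU

rack_eflops = GPUS_PER_RACK * PFLOPS_PER_GPU / 1000   # PFLOPS -> EFLOPS
rack_hbm_gb = GPUS_PER_RACK * HBM_GB_PER_GPU          # naive aggregate, not an NVIDIA figure

print(f"Rack inference compute: {rack_eflops:.1f} EFLOPS")  # 3.6 EFLOPS, matching the claim
print(f"Naive rack HBM total: {rack_hbm_gb} GB")
```

The 3.6 EFLOPS rack claim is, in other words, exactly 72 GPUs at their peak per-GPU number, with no discount for interconnect or scheduling overhead.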

The number that drew the loudest reaction in the room was the NVLink 6 interconnect figure: 260 TB/s of aggregate scale-up bandwidth in a single NVL72 rack. NVIDIA's claim is that this exceeds the total bandwidth of the entire internet. Whether that comparison holds up to scrutiny or not, it lands.

The broader cost story is what matters for enterprise buyers. NVIDIA says the NVL72 delivers 10x lower cost per token compared to Blackwell for large MoE model inference, and that training a given MoE model requires only one-quarter the number of GPUs that Blackwell needed. Those numbers — if they hold in production — would reshape the economics of every AI company running at scale.

The rack ships to cloud partners including AWS, Google Cloud, Microsoft Azure, and Oracle Cloud in the second half of 2026. Rubin is already in full production as of Q1 2026.

Rubin Ultra: A 600-Kilowatt Rack Coming in 2027

Huang did not stop at Vera Rubin. He unveiled the Rubin Ultra NVL576 — a configuration arriving in the second half of 2027 that packs 576 GPU chiplets (144 quad-chiplet Rubin Ultra GPUs), 365TB of HBM4e memory, and 15 EFLOPS of FP4 inference compute.

The power envelope is staggering: 600 kilowatts per rack, housed in what NVIDIA is calling the "Kyber" rack infrastructure. That is not a typo. A single rack consuming 600kW requires entirely new data center design, liquid cooling architectures, and power distribution infrastructure.
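To put that density in perspective, here is some rough arithmetic. Only the 600 kW figure is from the keynote; the "current high-density AI rack" baseline and the facility power budget are illustrative assumptions, not NVIDIA numbers:

```python
# Rough power-density arithmetic for the 600 kW Rubin Ultra rack.
RUBIN_ULTRA_RACK_KW = 600
ASSUMED_TODAY_RACK_KW = 120      # assumption: a dense liquid-cooled AI rack today
ASSUMED_FACILITY_MW = 50         # assumption: IT power budget of one facility

density_jump = RUBIN_ULTRA_RACK_KW / ASSUMED_TODAY_RACK_KW
racks_per_facility = (ASSUMED_FACILITY_MW * 1000) // RUBIN_ULTRA_RACK_KW

print(f"Density jump vs. assumed current rack: {density_jump:.0f}x")   # 5x
print(f"600 kW racks in a {ASSUMED_FACILITY_MW} MW facility: {racks_per_facility}")  # 83
```

Under those assumptions, a single mid-size facility tops out at a few dozen Rubin Ultra racks before power, not floor space, becomes the binding constraint.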

At more than four times the throughput of the Vera Rubin NVL72, the Rubin Ultra NVL576 represents a generational leap that would have been considered science fiction just three years ago.

The Feynman Architecture Gets Its First Public Look

The moment many analysts had been waiting for: Huang showed the first formal preview of Feynman, NVIDIA's 2028 GPU architecture.

Feynman is built on TSMC's A16 process — a 1.6nm-class node that is the most advanced semiconductor manufacturing technology ever brought to mass production. The A16 node introduces Super Power Rail (SPR), a backside power delivery network that routes power beneath the silicon rather than competing for routing space on the front side. This improves both power efficiency and thermal headroom.

The architecture is designed as an "inference-first" chip, built for the long-context, multi-step reasoning requirements of AI agents rather than for maximizing training throughput. It also introduces something architecturally new: silicon photonics, replacing traditional electrical interconnects with optical signals for inter-chip communication. This is a foundational shift in how GPU clusters communicate at scale.

Feynman is not expected to ship until 2028. But the fact that NVIDIA showed it at all — while Rubin is barely out the door — signals how far ahead the company is building its roadmap.

Vera Rubin vs. Blackwell: The Numbers Side by Side

| Specification | Blackwell B200 | Vera Rubin (per GPU) | Change |
| --- | --- | --- | --- |
| Inference performance (NVFP4) | ~10 PFLOPS | 50 PFLOPS | 5x |
| Training performance (NVFP4) | ~10 PFLOPS | 35 PFLOPS | 3.5x |
| HBM memory per GPU | 192GB HBM3e | 288GB HBM4 | 1.5x |
| HBM bandwidth per GPU | ~8 TB/s | 22 TB/s | ~2.75x |
| GPU transistor count | 208B | 336B | 1.6x |
| Rack config (max scale-up) | NVL72 (Blackwell) | NVL72 (Rubin) | same |
| Rack NVLink bandwidth | ~130 TB/s | 260 TB/s | 2x |
| Rack inference compute | ~0.72 EFLOPS | 3.6 EFLOPS | 5x |
| Cost per token vs. prior gen | Baseline | 10x lower (MoE inference) | 10x |
| MoE training GPU count needed | Baseline | 0.25x (one-quarter) | 4x fewer |
| Memory process | HBM3e | HBM4 | +1 gen |
| Manufacturing node | TSMC N4P | TSMC N3P | +1 gen |
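The "Change" column can be cross-checked against the raw Blackwell and Rubin figures it is derived from. A short script confirms every ratio in the table is internally consistent with the underlying numbers:

```python
# Cross-checking the Change column of the comparison table.
# (Blackwell value, Rubin value, claimed ratio) for each numeric row.
specs = {
    "inference PFLOPS":   (10,   50,  5.0),
    "training PFLOPS":    (10,   35,  3.5),
    "HBM per GPU (GB)":   (192,  288, 1.5),
    "HBM bandwidth TB/s": (8,    22,  2.75),
    "transistors (B)":    (208,  336, 1.6),
    "NVLink TB/s":        (130,  260, 2.0),
    "rack EFLOPS":        (0.72, 3.6, 5.0),
}

for name, (old, new, claimed) in specs.items():
    ratio = new / old
    ok = abs(ratio - claimed) < 0.05 * claimed   # within ~5% of the stated multiple
    print(f"{name}: {ratio:.2f}x (claimed {claimed}x) {'OK' if ok else 'CHECK'}")
```

Note that several Blackwell baselines carry a "~", so the ratios are only as precise as those approximations.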

NemoClaw: NVIDIA Enters the Enterprise AI Agent Market

The software announcement of the keynote was NemoClaw, NVIDIA's open-source platform for deploying AI agents inside enterprises.

The framing matters here. OpenClaw — which Huang also mentioned — is built for individual users. NemoClaw is built for companies. It integrates three existing NVIDIA components: the NeMo framework for model training and agent reasoning pipelines, the Nemotron model family (released December 2025), and NIM inference microservices for deployment.

Two design choices set NemoClaw apart from most enterprise AI agent offerings. First, it is hardware-agnostic — companies can run it regardless of whether they use NVIDIA chips, a notable departure from NVIDIA's historical CUDA lock-in strategy. Second, it includes built-in security and privacy controls designed for enterprise compliance requirements, addressing the concern that has slowed agent adoption in regulated industries.

NVIDIA briefed Salesforce, Cisco, Google, Adobe, and CrowdStrike ahead of GTC. No formal partnership deals were confirmed as of the keynote.

The practical ambition here is significant. NemoClaw is not a chatbot wrapper or an API. It is infrastructure for deploying AI agents that can execute multi-step tasks autonomously across business workflows — the kind of thing that would actually replace or augment meaningful knowledge work at scale.

Physical AI: Robots, Digital Twins, and the Newton Engine

Huang spent a substantial portion of the keynote on physical AI — NVIDIA's term for the convergence of AI with robotics and the physical world.

Huang used the physical AI segment to show how far NVIDIA's robotics platform has come since GTC 2025. That event introduced Isaac GR00T N1, the humanoid robot foundation model, alongside the Blue robot built with Disney Research and Google DeepMind and the Newton physics engine for training robot movements at scale. The GTC 2026 segment demonstrated new deployments and updates to the platform, with robots executing increasingly complex manipulation tasks in real industrial environments.

The vision Huang articulated is that every industrial facility, warehouse, and logistics operation will eventually run AI agents and physical robots in concert. The data center builds NVIDIA has been selling for the last three years are, in this framing, not the end product — they are the training ground for the physical AI systems that come next.

What the Skeptics Are Saying

Not everyone in the room — or watching the livestream — was convinced.

The timing of GTC is pointed. NVIDIA posted $68.1 billion in quarterly revenue last month and watched its stock fall more than 14% in the aftermath. The market had already priced in strong growth; what investors are trying to figure out now is whether AI infrastructure spending continues to compound or begins to moderate.

Several analysts raised specific concerns ahead of the keynote. The "paper spec" problem is the first: NVIDIA has a history of announcing extraordinary benchmark numbers at conferences that later prove difficult to replicate in real-world production workloads. The 50 PFLOPS figure, for instance, applies to NVFP4 precision — a data format that not every model or inference pipeline can use. Real-world throughput depends heavily on model architecture, batch size, memory pressure, and the specific mix of operations being run.

The 10x cost-per-token claim carries similar caveats. That comparison is specific to MoE model inference — an architecture that benefits disproportionately from Rubin's interconnect bandwidth and memory capacity. Dense model inference will likely see a smaller improvement.

The 600kW Rubin Ultra rack is a genuine constraint. Data centers designed for today's AI workloads were not built with 600kW per rack in mind. Retrofitting existing facilities — or building new ones to accommodate that power density — adds capital costs and timelines that could slow adoption curves significantly.

Bank of America maintained a $300 price target heading into GTC. The market consensus sat around $273. But the persistent question — whether AI infrastructure spending can sustain NVIDIA's growth trajectory into 2027 and beyond — was not answered definitively today. Huang made the case that it can. He usually does.

The Strategic Picture

Vera Rubin, NemoClaw, and Feynman aren't three separate product launches.

NVIDIA is building a platform company, not a chip company. The Vera Rubin NVL72 handles today's hyperscale training and inference demand. NemoClaw captures the enterprise software layer as AI agents move from experiments to production systems. Feynman — with its inference-first design and optical interconnects — is being built for a world where agentic AI systems are running at sustained scale, not just in training runs.

The silicon photonics move is worth watching closely. Electrical interconnects at the scale NVIDIA is operating are hitting physical limits. Moving to optical signals for inter-chip communication within racks and across clusters is not incremental — it is a wholesale change to the plumbing of AI infrastructure.

For a longer look at the Vera Rubin architecture's technical foundations, see our detailed breakdown of the Vera Rubin GPU architecture published before GTC.

The Bottom Line

NVIDIA walked into its biggest conference in years carrying investor skepticism, a 14% stock pullback, and questions about whether the AI bubble was quietly deflating. Jensen Huang responded by showing a chip already in production that does 50 petaflops per GPU, a 600-kilowatt rack coming in 2027, a 2028 architecture built on the world's most advanced semiconductor process, and an enterprise software platform designed to capture the next layer of AI spending.

Whether the numbers hold up in production — and whether enterprise customers actually adopt NemoClaw at the scale NVIDIA is projecting — will determine whether today's announcements are a turning point or a very expensive set of promises.

But there is no question about the ambition. Huang looked at a room full of 30,000 people who flew in from 190 countries and told them the AI buildout is not even close to done.

"We are at the beginning of the industrial revolution of AI," he said. "Every company in every industry will need a data center. Every data center will need to be upgraded."

That is either the most important thing anyone said in technology today, or the most expensive sales pitch in history. Probably both.


Explore all career paths