NVIDIA Just Shipped the Most Powerful AI Chip Ever Made


The company reported a record $68.1 billion quarter, shipped its first Vera Rubin samples to customers, and gave guidance suggesting the AI infrastructure boom is still accelerating.

By LDS Team

February 25, 2026

On February 25, 2026, NVIDIA did two things that would have been unthinkable five years ago. First, it reported $68.1 billion in revenue for a single quarter -- the largest in the history of the semiconductor industry. Then, almost as a footnote on the earnings call, CFO Colette Kress mentioned something that mattered even more: "We shipped our first Vera Rubin samples to customers earlier this week."

Vera Rubin is NVIDIA's next-generation AI chip platform. It is the successor to Blackwell, which has been the engine behind the current AI infrastructure boom. If Blackwell was the chip that proved AI could be a hundred-billion-dollar business, Vera Rubin is the one NVIDIA is betting will make it a trillion-dollar one.

The timing is deliberate. NVIDIA's GTC 2026 conference kicks off March 16 in San Jose. CEO Jensen Huang has promised to unveil something that will "surprise the world." The samples shipping this week are a preview -- a signal to the market, to competitors, and to every company building AI infrastructure: the next generation is here.

What Vera Rubin Actually Is

Vera Rubin is not just a GPU. It is a six-chip platform -- the most complex NVIDIA has ever built.

The platform is named after Vera Florence Cooper Rubin (1928-2016), the American astronomer who provided the first strong observational evidence for dark matter by studying galaxy rotation curves in the 1970s. NVIDIA has a tradition of naming GPU architectures after scientists -- from Tesla and Fermi to Ada Lovelace and Grace Hopper. Rubin continues the recent trend of honoring women who transformed their fields.

The six components:

  • Rubin GPU -- The compute engine. Built on TSMC's 3nm process, with 336 billion transistors packed across two reticle-sized compute chiplets and two I/O dies.
  • Vera CPU -- An 88-core custom Arm processor designed specifically to pair with the Rubin GPU. It is the first CPU to natively support FP8 precision.
  • NVLink 6 -- The interconnect fabric, delivering 3.6 TB/s per GPU -- enough bandwidth to make dozens of GPUs behave like a single massive processor.
  • ConnectX-9 SuperNIC, BlueField-4 DPU, NVLink 6 Switch, and Spectrum-6 Ethernet Switch -- The networking components that tie the system together at rack scale.

When assembled into NVIDIA's NVL72 configuration -- 72 Rubin GPUs and 36 Vera CPUs in a single rack -- the system delivers 3.6 exaflops of FP4 compute and 260 TB/s of internal bandwidth. NVIDIA says that bandwidth figure exceeds the entire internet's current capacity.

CNBC, which received an exclusive first look at the physical hardware, reported that each Vera Rubin system contains 1.3 million components from more than 80 suppliers across 20 countries.

Worth noting: "Vera Rubin" refers to the combined CPU-GPU superchip. "Vera" is the CPU. "Rubin" is the GPU. They connect via NVLink-C2C at 1.8 TB/s -- double the bandwidth of the previous Grace Blackwell pairing.

The Specs

Here is what NVIDIA has revealed, compared to the current-generation Blackwell B200:

| Spec | Vera Rubin | Blackwell (B200) | Improvement |
|---|---|---|---|
| Inference (FP4) | 50 PFLOPS | ~10 PFLOPS | 5x |
| Training (FP4) | 35 PFLOPS | ~10 PFLOPS | 3.5x |
| Memory | 288 GB HBM4 | 192 GB HBM3e | 1.5x |
| Memory Bandwidth | 22 TB/s | 8 TB/s | 2.8x |
| Transistors | 336 billion | 208 billion | 1.6x |
| NVLink Bandwidth | 3.6 TB/s per GPU | 1.8 TB/s per GPU | 2x |
| Process Node | TSMC 3nm (N3P) | TSMC 4nm | -- |
| TDP (reported) | ~2,300W | 1,200W | -- |

The Vera CPU brings 88 custom "Olympus" Arm cores with 176 threads via Spatial Multithreading, up to 1.5 TB of LPDDR5X memory, and 1.2 TB/s of memory bandwidth. Its performance is roughly double the Grace CPU it replaces.

In the full NVL72 rack -- 72 GPUs, 36 CPUs, one enclosure -- the system delivers 3.6 exaflops of FP4 compute, 20.7 TB of total HBM4 memory, and 260 TB/s of scale-up bandwidth. NVIDIA claims up to 10x lower cost per token and 4x fewer GPUs needed for mixture-of-experts training compared to Blackwell.
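The rack-level figures follow directly from the per-GPU specs. A quick sanity check (per-GPU numbers taken from the spec table above, treated as exact):

```python
# Sanity-check the NVL72 rack figures against the per-GPU Vera Rubin specs.
GPUS_PER_RACK = 72

inference_pflops_per_gpu = 50   # FP4 inference, per GPU
hbm4_gb_per_gpu = 288           # HBM4 capacity, per GPU
nvlink_tbps_per_gpu = 3.6       # NVLink 6 bandwidth, per GPU

rack_exaflops = GPUS_PER_RACK * inference_pflops_per_gpu / 1000
rack_hbm_tb = GPUS_PER_RACK * hbm4_gb_per_gpu / 1000
rack_bandwidth_tbps = GPUS_PER_RACK * nvlink_tbps_per_gpu

print(f"FP4 compute:   {rack_exaflops:.1f} exaflops")    # 3.6
print(f"Total HBM4:    {rack_hbm_tb:.1f} TB")            # 20.7
print(f"Scale-up b/w:  {rack_bandwidth_tbps:.0f} TB/s")  # ~259
```

The compute and memory totals match NVIDIA's stated figures exactly; the bandwidth product comes to 259.2 TB/s, which NVIDIA rounds to 260 TB/s.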

Worth noting: The reported ~2,300-watt TDP per GPU (per analyst estimates; NVIDIA has not officially confirmed this figure) is nearly double Blackwell's. Data centers will need significant infrastructure upgrades to run Vera Rubin at scale. NVIDIA claims the system-level efficiency improvements offset the raw power increase, but the absolute power draw is a real constraint for deployment.

The Biggest Quarter in Semiconductor History

The Vera Rubin sample shipment came on the same day NVIDIA reported financial results that broke its own records.

| Metric | Q4 FY2026 | Year-Over-Year |
|---|---|---|
| Revenue | $68.1 billion | +73% |
| Data Center Revenue | $62.3 billion | +75% |
| Net Income | $43.0 billion | ~+94% |
| EPS (adjusted) | $1.62 | +82% |
| Q1 FY2027 Guidance | $78.0 billion | Beat estimates by $5.4B |

For the full fiscal year 2026 (ended January 2026), NVIDIA reported $215.9 billion in total revenue, up 65% year-over-year. Data center alone accounted for $193.7 billion -- roughly 90% of the total. Hyperscalers represent just over half of data center revenue.

The quarterly acceleration is striking. Q1: $44.1 billion. Q2: $46.7 billion. Q3: $57.0 billion. Q4: $68.1 billion. Each quarter larger than the last, with no sign of deceleration. The Q1 FY2027 guidance of $78 billion -- beating analyst estimates by $5.4 billion -- suggests the trend is continuing.
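The sequential growth behind those quarterly figures is easy to verify (revenue in billions, from the paragraph above, including the guided Q1 FY2027 number):

```python
# Quarter-over-quarter revenue growth across NVIDIA's FY2026,
# plus the guided first quarter of FY2027.
quarters = {
    "Q2 FY26": (44.1, 46.7),
    "Q3 FY26": (46.7, 57.0),
    "Q4 FY26": (57.0, 68.1),
    "Q1 FY27 (guided)": (68.1, 78.0),
}

for label, (prev, curr) in quarters.items():
    growth = (curr / prev - 1) * 100
    print(f"{label}: ${curr:.1f}B ({growth:+.1f}% QoQ)")
```

Every quarter shows double-digit sequential growth from Q3 onward, which is the pattern the article describes; the absolute dollar increase each quarter is also holding near $10 billion.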

NVIDIA's market capitalization sits at approximately $4.7 trillion, making it the most valuable company in the world. Its order backlog exceeds $500 billion and continues to grow as customers place full-year orders for Vera Rubin.

Huang framed the economics bluntly on the earnings call: "Compute is revenues. Without compute, there is no way to generate tokens. Without tokens, there's no way to grow revenues."

Kress added: "We expect every cloud model builder to deploy Vera Rubin."

The Road to Vera Rubin

March 18, 2025
GTC 2025: The Reveal
Jensen Huang shows the physical Vera Rubin Superchip for the first time. He announces the full roadmap -- Rubin in H2 2026, Rubin Ultra in H2 2027, Feynman in 2028 -- and says reasoning and agentic AI have created "easily 100 times more" compute demand than expected a year prior.
Late 2025
Tape-Out and Fabrication
Both the Rubin GPU and Vera CPU complete tape-out and enter TSMC's 3nm fabrication line. SK Hynix begins ramping HBM4 memory production, though NVIDIA's decision to raise per-pin speed requirements above 11 Gbps pushes the HBM4 capacity ramp from Q2 to Q3 2026.
January 5, 2026
CES 2026: "In Full Production"
Jensen Huang announces that Vera Rubin is "in full production" at TSMC. The NVL72 rack system is officially launched. NVIDIA also reveals Rubin Ultra details: NVL576 racks with 576 GPUs delivering 15 exaflops of FP4 compute, due H2 2027.
February 25, 2026
First Samples Ship
CFO Colette Kress confirms on the Q4 earnings call that NVIDIA shipped its first Vera Rubin samples to customers earlier that week. The company reports record quarterly revenue of $68.1 billion. Production shipments remain on track for H2 2026.
March 16, 2026
GTC 2026: "Surprise the World"
NVIDIA's annual GPU Technology Conference opens in San Jose. Jensen Huang delivers the keynote. He has promised to unveil a chip that will "surprise the world." Over 700 sessions are planned across four days.

Everyone Wants One

The list of confirmed Vera Rubin deployment partners reads like a directory of the world's most valuable technology companies.

Cloud providers (first wave, H2 2026): AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure, CoreWeave, Lambda, Nebius, and Nscale.

AI labs: Meta has committed to deploying "millions of Blackwell and Rubin GPUs." Anthropic will train and run inference on Vera Rubin systems. OpenAI, xAI, Mistral AI, Cohere, and Perplexity are all expected adopters.

Infrastructure partners: Dell, HPE, Lenovo, Cisco, and Supermicro will build server systems around the platform.

Microsoft is planning deployments of "hundreds of thousands of Vera Rubin Superchips" across its Fairwater AI superfactory sites.

The spending commitments behind these deployments are staggering. The four largest hyperscalers have collectively guided for $635-665 billion in capital expenditure for 2026:

| Company | 2026 CapEx (Guided) |
|---|---|
| Amazon | ~$200 billion |
| Alphabet/Google | $175-185 billion |
| Microsoft | ~$145 billion (analyst estimate) |
| Meta | $115-135 billion |

That is a 67-74% increase over 2025 levels. Roughly three-quarters of it -- around $450 billion -- is directly tied to AI infrastructure: servers, GPUs, and data centers.
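Those aggregates can be reproduced from the per-company guidance in the table, taking midpoints where a range was given. The ~$450 billion AI-tied figure then works out to just under 70% of the midpoint total, consistent with the article's "roughly three-quarters":

```python
# Reconstruct the hyperscaler capex totals from per-company 2026 guidance.
# Midpoints are used for guided ranges; figures in billions of USD.
capex_2026 = {
    "Amazon": 200.0,
    "Alphabet/Google": (175 + 185) / 2,  # midpoint of $175-185B
    "Microsoft": 145.0,                  # analyst estimate
    "Meta": (115 + 135) / 2,             # midpoint of $115-135B
}

total = sum(capex_2026.values())
implied_ai_share = 450 / total  # the ~$450B AI-tied estimate cited above

print(f"Midpoint total 2026 capex: ~${total:.0f}B")
print(f"Implied AI-tied share:     ~{implied_ai_share:.0%}")
```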

Worth noting: Amazon's AI infrastructure spending is so aggressive that analysts project the company will run negative free cash flow of $17-28 billion in 2026. The hyperscalers are increasingly turning to debt markets to fund AI capex, transforming what were historically cash-rich businesses into leveraged ones. The question of whether this spending generates proportional returns is one nobody can answer yet.

The Competition Is Real

NVIDIA holds an estimated 86-95% of the AI training chip market depending on the measure. But for the first time, credible alternatives are emerging on multiple fronts.

AMD is closest. At CES 2026, AMD unveiled Helios -- its direct rack-scale competitor to the NVL72. The MI400 series chips inside it feature 432 GB of HBM4 memory (50% more than Rubin's 288 GB), 19.6 TB/s bandwidth, and 40 PFLOPS of FP4 compute. AMD is targeting Helios shipments for H2 2026 -- potentially before Vera Rubin reaches volume production. Oracle has committed to 50,000 MI450 series chips, and OpenAI has partnered with AMD on a 6-gigawatt data center and computing infrastructure deal valued at more than $90 billion.

Custom silicon is the bigger threat. Every major hyperscaler is now building its own AI chips:

| Company | Custom Chip | Status |
|---|---|---|
| Google | TPU Trillium (v6) | Generally available. Anthropic signed the largest TPU deal in Google's history. |
| Amazon | Trainium2 / Trainium3 | Trainium2 deployed (~500K chips). Trainium3 ramping early 2026. |
| Microsoft | Maia 200 | Announced January 2026 on TSMC 3nm. Claims 3x Trainium3 inference performance. |
| Meta | MTIA v2 / v3 | v3 due H2 2026. Targets 35%+ of Meta's inference fleet on custom silicon by year-end. |

Custom ASIC shipments are projected to grow 44.6% in 2026, versus 16.1% growth for GPUs. Analysts project custom AI server ASICs could surpass GPU shipments by 2028.
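The 2028 crossover projection follows from compounding those growth rates. A toy model (the starting ratios of ASIC-to-GPU unit volume are illustrative assumptions, and holding the 44.6% / 16.1% rates constant is a simplification; real growth will vary year to year):

```python
import math

# Projected 2026 shipment growth rates, held constant for illustration.
asic_growth, gpu_growth = 0.446, 0.161

# ASIC shipments gain on GPUs by this factor every year.
relative_growth = (1 + asic_growth) / (1 + gpu_growth)  # ~1.245x per year

# If ASICs start at some fraction of GPU unit volume, years until parity:
for start_ratio in (0.3, 0.4, 0.5, 0.6):
    years = math.log(1 / start_ratio) / math.log(relative_growth)
    print(f"ASICs at {start_ratio:.0%} of GPU volume -> parity in ~{years:.1f} years")
```

Under these assumptions, a crossover by 2028 (about three years out) would require custom ASICs to already ship at roughly half of GPU unit volumes, which is the scale the analyst projections imply.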

Intel has effectively exited. The company cancelled its Falcon Shores data center GPU in January 2025 after failing to gain meaningful traction with Gaudi 3. Its replacement, Jaguar Shores, is not expected until late 2026 at the earliest. Intel is not a factor in the AI accelerator race.

But one number explains why NVIDIA is not panicking: CUDA has over 4 million developers and thousands of optimized applications. The switching cost is enormous. And in February 2026, Meta -- despite years of investment in its own MTIA chips -- signed a deal to buy millions more GPUs from NVIDIA anyway.

Worth noting: The custom silicon trend cuts both ways. Google's TPUs power Gemini. Amazon's Trainium runs Anthropic's Claude. But both Google and Amazon remain massive NVIDIA customers. Custom chips are supplementing NVIDIA, not replacing it -- at least not yet. The real question is whether that changes when custom ASICs reach performance parity, which some analysts project could happen by 2028.

The Bigger Picture

The AI infrastructure buildout is now the largest technology investment in history.

The combined capital expenditure of the four largest hyperscalers in 2026 -- $635-665 billion -- exceeds the GDP of most countries. Jensen Huang has framed it as the beginning, not the peak. At CES, he described AI infrastructure as an $85 trillion opportunity over the next 15 years and denied that the current spending represents a bubble.

There is evidence on both sides.

The demand for AI compute is genuinely explosive. NVIDIA has a $500 billion-plus backlog that keeps growing. Inference costs are falling fast enough to unlock entirely new applications. Agentic AI -- systems that take autonomous actions, not just generate text -- is creating what Huang calls "easily 100 times more" compute demand than the industry expected a year ago. At the GTC 2025 keynote, he argued that the shift from single-shot answers to multi-step reasoning has fundamentally changed the math on how much compute the world needs.

But hyperscalers are taking on unprecedented debt to fund infrastructure that may not generate proportional revenue for years. Custom silicon threatens NVIDIA's pricing power. And the history of technology is littered with infrastructure booms that ended in correction -- from fiber optics in 2000 to crypto mining rigs in 2018.

NVIDIA's answer to this is speed. Its one-year cadence -- Blackwell (2024), Vera Rubin (2026), Rubin Ultra (2027), Feynman (2028) -- is designed to make the competition irrelevant before it arrives. Rubin Ultra will scale to NVL576 racks with 576 GPUs, delivering 15 exaflops of FP4 compute with up to 1 TB of HBM4e memory per GPU. By the time competitors match Vera Rubin, NVIDIA plans to be two generations ahead.

As one industry analyst put it: "If NVIDIA maintains this cadence, it will be even more difficult for competitors to catch up."

The Bottom Line

NVIDIA just reported the largest quarter in semiconductor history and shipped the first samples of the most powerful AI chip ever made. Its market cap stands at $4.7 trillion. Its backlog exceeds half a trillion dollars. And its guidance says growth is accelerating, not slowing.

Vera Rubin is a genuine generational leap: 5x the inference performance of Blackwell, 2.8x the memory bandwidth, the first GPU to use HBM4, and an entirely new 88-core Arm CPU designed from scratch to pair with it. Every major cloud provider, every major AI lab, and every major infrastructure partner has signed up to deploy it. The question is not whether Vera Rubin will sell. It is whether NVIDIA can make enough of them.

The competition is more credible than it has ever been. AMD's Helios could ship before Vera Rubin reaches volume. Custom ASICs from Google, Amazon, Microsoft, and Meta are growing nearly three times faster than GPUs. And the $635 billion in hyperscaler capex for 2026 suggests the market may be big enough for multiple winners.

But NVIDIA's annual release cadence, the CUDA ecosystem's 4 million developers, and a $500 billion backlog create a moat that nobody has come close to breaching. GTC 2026 is three weeks away. Jensen Huang has promised to "surprise the world."

Given what he just shipped, that is a remarkable thing to say.
