Improved agentic capabilities, stronger coding, reduced hallucinations, and a quiet Vertex AI launch that caught everyone off guard
By LDS Team
February 19, 2026
No blog post. No press event. No countdown timer. On February 19, 2026, developers noticed a new model ID appearing on Google's Vertex AI API: gemini-3.1-pro. Within hours, screenshots spread across Reddit, demos flooded social media, and one user generated an entire Windows 11-style web operating system from a single prompt.
Google's Gemini 3.1 Pro had arrived, and it did so the way Google increasingly prefers to ship: quietly, through the API, and letting the results speak for themselves.
This is the latest chapter in a Gemini story that has gone from "perpetually catching up" to "genuinely competitive." Here is everything we know so far.
Key Changes from Gemini 3 Pro to 3.1 Pro
Gemini 3 Pro arrived in late 2025 as Google's most capable model. It scored 76.2% on SWE-bench Verified, offered a 1M token context window, and introduced dynamic thinking with thought signatures. It was genuinely impressive on paper.
But it had problems. Independent community benchmarks flagged Gemini 3 Pro as having one of the highest hallucination rates among frontier models. Users reported inconsistent output quality. And by early 2026, newer competitors like Claude Opus 4.6 and GPT-5.3 Codex had surpassed it on real-world agentic tasks.
Gemini 3.1 Pro appears to be Google's direct response.
Here is what the community has identified so far based on early testing:
| Area | Gemini 3 Pro | Gemini 3.1 Pro |
|---|---|---|
| Agentic tool use | Functional but inconsistent | Noticeably more reliable multi-step execution |
| Code generation | Strong on benchmarks, mixed in practice | Users report stronger single-prompt apps |
| Reasoning coherence | Occasional logical gaps | Tighter logical flow on complex tasks |
| Instruction following | Sometimes drifted from prompts | More precise adherence to detailed instructions |
| Hallucination rate | Flagged as high by community evals | Early reports suggest improvement (pending formal benchmarks) |
| Availability | Preview on AI Studio and Vertex AI | Spotted on Vertex AI, rolling out to API |
Official benchmark numbers from Google are still pending. But the early hands-on testing tells a consistent story: 3.1 Pro feels like the model 3 Pro should have been at launch.
The Demos That Went Viral
Within hours of Gemini 3.1 Pro appearing on Vertex AI, the developer community began stress-testing it with increasingly ambitious prompts. The results spread rapidly across Reddit and X.
The Windows 11 Web OS -- One developer prompted Gemini 3.1 Pro to build a fully functional Windows 11-style web operating system in a single HTML file. The result included a working text editor, terminal with Python support, code editor, file manager, paint application, and a playable game. All generated in one prompt.
The Space Exploration Game -- Another user built a No Man's Sky-style space exploration game over roughly 20 prompts. The iterative process included fixing bugs, changing the spaceship model, improving controls, and adding shooting mechanics with asteroids. The result was a playable 3D space game running entirely in a browser.
The Seed Animation -- A particularly striking demo showed a photorealistic interactive animation of a seed growing into a full tree, complete with roots forming, stems emerging, and leaves appearing with natural timing. Community members called it the "best leaves ever seen" from an AI model.
These demos are not just impressive in isolation. They mark a meaningful step forward from Gemini 3 Pro, which struggled with exactly this kind of complex, multi-component single-prompt generation.
The Gemini Evolution: Two Years of Relentless Iteration
To appreciate where Gemini 3.1 Pro sits, it helps to see the full journey: eight major releases in just over two years. What started as annual launches in 2023 became quarterly updates by mid-2025 and near-continuous iteration by 2026.
How Gemini 3.1 Pro Compares to the Competition
The frontier AI landscape in February 2026 is a three-way race. Here is where Gemini 3.1 Pro sits based on available benchmarks and reported capabilities:
| Category | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.3 Codex |
|---|---|---|---|
| Coding (SWE-bench Verified) | 76.2%+ (Gemini 3 Pro figure) | Not directly comparable (different variant) | Not directly comparable (uses SWE-bench Pro) |
| Agentic coding (Terminal-Bench 2.0) | Not yet reported | 65.4% | 77.3% |
| Context window | 1M tokens | 200K (1M in beta) | Not disclosed |
| Output limit | 64K tokens | 128K tokens | Not disclosed |
| Multimodal input | Text, image, video, audio, PDF | Text, image, PDF | Text, image |
| Dynamic thinking | Yes (3 levels) | Yes (4 effort levels) | Yes |
| Input pricing | 2 USD per 1M tokens | 5 USD per 1M tokens | Not disclosed |
| Output pricing | 12 USD per 1M tokens | 25 USD per 1M tokens | Not disclosed |
| Agent frameworks | ADK, Agno, Browser Use, Letta | Claude Code, Agent SDK | Codex CLI, Frontier |
Three things stand out.
Price. Gemini 3.1 Pro costs 60% less than Claude Opus 4.6 on input and 52% less on output. For high-volume production workloads, that gap adds up fast.
Multimodal breadth. Gemini 3.1 Pro natively accepts video and audio inputs alongside text, images, and PDFs. Neither Claude nor GPT currently matches that range.
Where it trails. Opus 4.6 offers double the output limit at 128K tokens. GPT-5.3 Codex leads on Terminal-Bench 2.0, the benchmark that measures autonomous coding tasks, scoring 77.3% versus Opus 4.6's 65.4%. And until Google publishes official 3.1 Pro benchmarks, we are comparing against Gemini 3 Pro's numbers, which may understate the new model's actual performance.
Pro Tip: Terminal-Bench 2.0 scores can vary depending on which agent harness runs the model. The numbers above reflect each company's self-reported results.
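To make the pricing gap concrete, here is a quick back-of-the-envelope calculation using the standard-tier prices quoted above. The token volumes are illustrative, not from the article:

```python
# Standard-tier prices from the comparison table, USD per 1M tokens.
PRICES = {
    "gemini-3.1-pro": {"input": 2.0, "output": 12.0},
    "claude-opus-4.6": {"input": 5.0, "output": 25.0},
}

def monthly_cost(model, input_tokens_m, output_tokens_m):
    """Cost in USD for a workload measured in millions of tokens."""
    p = PRICES[model]
    return input_tokens_m * p["input"] + output_tokens_m * p["output"]

# Hypothetical workload: 500M input tokens, 100M output tokens per month.
gemini = monthly_cost("gemini-3.1-pro", 500, 100)   # 500*2 + 100*12 = 2200
claude = monthly_cost("claude-opus-4.6", 500, 100)  # 500*5 + 100*25 = 5000
print(f"Gemini: ${gemini:,.0f}  Claude: ${claude:,.0f}")
```

At this volume the same traffic costs less than half as much on Gemini, which is where the "adds up fast" claim comes from.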
Features Carried Forward from the Gemini 3 Architecture
Gemini 3.1 Pro builds on the same core architecture as Gemini 3 Pro. For developers working with the model, these features carry forward:
Dynamic Thinking
Gemini 3.1 Pro reasons by default. Developers control reasoning depth with the thinking_level parameter:
| Thinking Level | Behavior | Best For |
|---|---|---|
| Low | Minimizes reasoning depth | Simple queries, chat, high-throughput apps |
| Medium | Balanced reasoning | Moderate analysis, general-purpose tasks |
| High (default) | Maximizes reasoning depth | Complex analysis, coding, multi-step tasks |
Pro Tip: Google strongly recommends keeping temperature at the default 1.0 for all Gemini 3 series models. Setting it lower can cause unexpected behavior like looping, especially on math and reasoning tasks. This is the opposite of what most developers are used to.
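As a rough illustration, here is how a generateContent request body might set the thinking level alongside the recommended default temperature. The field names follow the pattern in Google's Gemini 3 docs (thinkingConfig / thinkingLevel), but this is a sketch; verify the exact shape against the current API reference:

```python
# Hypothetical Gemini API generateContent request body.
# Field names are assumptions modeled on Google's documented schema.
def build_request(prompt, thinking_level="high"):
    """Build a request dict with the given thinking level."""
    assert thinking_level in ("low", "medium", "high")
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # Keep temperature at the default 1.0, per Google's guidance
            # for the Gemini 3 series.
            "temperature": 1.0,
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

# Low thinking for a simple, high-throughput query.
request = build_request("Summarize this paragraph.", thinking_level="low")
```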
Thought Signatures
Gemini 3.1 Pro uses thought signatures, encrypted representations of the model's internal reasoning that must be passed back across multi-turn conversations. Strip them out and the model loses its reasoning context. For agentic workflows spanning multiple API calls, this is critical.
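A minimal sketch of what preserving signatures looks like in a hand-rolled conversation loop. The thoughtSignature field name follows Google's docs and the signature value here is a placeholder; consult the thought-signatures documentation for the exact response shape:

```python
# Sketch: keep thought signatures intact when replaying history.
history = []

def record_turn(user_text, model_response_parts):
    """Append both sides of a turn to the conversation history."""
    history.append({"role": "user", "parts": [{"text": user_text}]})
    # Do NOT filter out parts carrying a thoughtSignature: the model
    # needs them back verbatim to retain its reasoning context on
    # the next call.
    history.append({"role": "model", "parts": model_response_parts})

record_turn(
    "Plan the refactor.",
    # Placeholder signature; the real value is an opaque encrypted blob.
    [{"text": "Step 1: ...", "thoughtSignature": "opaque-encrypted-blob"}],
)
```

The failure mode to avoid is a "sanitizing" step that strips unknown fields from model responses before resending them, which silently drops the signatures.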
Granular Multimodal Control
The media_resolution parameter gives developers fine-grained control over how the model processes visual input:
| Media Type | Recommended Setting | Max Tokens | Notes |
|---|---|---|---|
| Images | High | 1,120 | Best for most image analysis |
| PDFs | Medium | 560 | Quality saturates at medium for documents |
| Video (general) | Low | 70 per frame | Sufficient for action recognition |
| Video (text-heavy) | High | 280 per frame | Required for reading text in video |
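The table's recommendations can be encoded as a small lookup helper. The enum-style string values below are assumptions modeled on Google's documented naming; check the media-resolution docs before relying on them:

```python
# Recommended media_resolution settings from the table above.
# Value strings are illustrative, not confirmed API constants.
RECOMMENDED = {
    "image": "media_resolution_high",             # up to 1,120 tokens
    "pdf": "media_resolution_medium",             # quality saturates at 560
    "video": "media_resolution_low",              # 70 tokens per frame
    "video_text_heavy": "media_resolution_high",  # 280 tokens per frame
}

def resolution_for(media_type):
    """Return the recommended setting, defaulting to medium."""
    return RECOMMENDED.get(media_type, "media_resolution_medium")
```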
The Hallucination Question
This is the biggest open question around Gemini 3.1 Pro.
Gemini 3 Pro's hallucination rate was its most criticized weakness. Community-run evaluations consistently flagged it as producing factual errors more frequently than competing frontier models from Anthropic and OpenAI.
Early testing of Gemini 3.1 Pro suggests this is one of the areas Google targeted. Users report fewer obvious factual errors and more instances of the model expressing uncertainty rather than generating confident but wrong answers. But without formal benchmarks, this remains anecdotal.
If Google has meaningfully reduced hallucinations in 3.1 Pro, it changes the calculus for enterprise adoption. A model that is cheaper than its competitors and factually reliable is a compelling combination. If the hallucination rate remains high, no amount of pricing advantage will overcome the trust gap for production use cases in healthcare, legal, and financial services.
Google's formal benchmark publication will be the moment of truth.
Pricing and Availability
Availability and expected pricing, based on the Gemini 3 series (which 3.1 Pro is expected to match):
| Detail | Specification |
|---|---|
| Vertex AI | Available now |
| Google AI Studio | Not yet available |
| Gemini API | Rolling out, spotted by developers |
| Input pricing (standard) | 2 USD per 1M tokens |
| Input pricing (long context) | 4 USD per 1M tokens (over 200K) |
| Output pricing (standard) | 12 USD per 1M tokens |
| Output pricing (long context) | 18 USD per 1M tokens (over 200K) |
| Context window | 1M tokens |
| Output limit | 64K tokens |
| Knowledge cutoff | January 2025 |
For developers already using the Gemini API, switching to Gemini 3.1 Pro should be as simple as updating the model ID once it becomes generally available.
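As an illustration, the switch could be as small as changing one constant. The model ID strings below are the community-spotted ones, not confirmed GA identifiers:

```python
# Hypothetical migration: only the model ID changes.
OLD_MODEL = "gemini-3-pro-preview"
NEW_MODEL = "gemini-3.1-pro"  # ID spotted on Vertex AI; confirm at GA

def make_request(model_id, prompt):
    """Build a minimal generateContent-style request body."""
    return {
        "model": model_id,
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
    }

request = make_request(NEW_MODEL, "Hello")
```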
The Bigger Picture: Google's AI Strategy in 2026
Gemini 3.1 Pro does not exist in isolation. It is one piece of a much larger Google AI offensive that has accelerated in early 2026:
- Lyria RealTime -- Experimental AI music generation model available via the Gemini API
- Veo 3.1 -- State-of-the-art video generation with native audio
- Gemini CLI -- Google's answer to Claude Code, now supporting Gemini 3 series models
- Agent Development Kit (ADK) -- Open-source framework for building Gemini-powered agents, with integrations across Agno, Browser Use, Eigent, Letta, and mem0
- India AI Summit 2026 -- Google is making major investments, competing directly with OpenAI's Reliance partnership
Google's approach is different from its competitors. Where Anthropic leads with safety research and OpenAI leads with consumer products, Google is leveraging its infrastructure advantage: the models are cheaper, the multimodal capabilities are broader, and the integration with Google Cloud gives enterprise customers a seamless deployment path.
The Bottom Line
Gemini 3.1 Pro is not a revolution. It is a targeted fix for the things that held Gemini 3 Pro back from being a genuine top-tier choice.
The Gemini 3 Pro foundation was already strong: competitive benchmarks, 1M token context, native multimodal understanding across five input types, and pricing that undercuts the competition by more than half. What it lacked was consistency, reliability in agentic tasks, and factual accuracy.
If Gemini 3.1 Pro delivers on the early reports -- better tool use, stronger code generation, fewer hallucinations -- then Google has quietly closed the gap with Claude Opus 4.6 and GPT-5.3 Codex in the areas that matter most for production use.
For now, Gemini 3.1 Pro is live on Vertex AI for anyone who wants to try it. No waitlist. No announcement. No fanfare. Just a new model ID and a community of developers already pushing it to its limits.
Sources
- Google AI for Developers: Gemini 3 Developer Guide
- Google AI for Developers: Gemini 3 Pro Preview Model Page
- Google AI for Developers: All Models
- Google AI for Developers: Thought Signatures
- Google AI for Developers: Media Resolution
- Google DeepMind: Gemini 3 Pro Benchmarks
- Google DeepMind: Gemini 3 Flash Benchmarks
- Google Developers Blog: Gemini 3 Flash is now available in Gemini CLI
- Google Developers Blog: Real-World Agent Examples with Gemini 3
- Google Developers Blog: 5 Things to Try with Gemini 3 Pro in Gemini CLI
- Terminal-Bench 2.0 Leaderboard
- Anthropic: Claude Models Documentation
- Wikipedia: Gemini (language model)
- Reddit r/singularity: Gemini 3.1 Pro is now live on Vertex AI (Feb 19, 2026)
- Reddit r/Bard: Vertex AI - gemini 3.1 pro is out (Feb 19, 2026)
- Reddit r/singularity: Gemini 3.1 Pro one-shots a Windows 11-style web OS (Feb 19, 2026)
- Reddit r/singularity: Gemini 3.1 Pro makes a NMS style space exploration game (Feb 19, 2026)