Google Just Dropped Gemini 3.1 Pro and the AI Race Just Got a Lot More Interesting


Improved agentic capabilities, stronger coding, reduced hallucinations, and a quiet Vertex AI launch that caught everyone off guard

By LDS Team

February 19, 2026

No blog post. No press event. No countdown timer. On February 19, 2026, developers noticed a new model ID appearing on Google's Vertex AI API: gemini-3.1-pro. Within hours, screenshots spread across Reddit, demos started flooding social media, and one user managed to one-shot an entire Windows 11-style web operating system in a single prompt.

Google's Gemini 3.1 Pro had arrived, and it did so the way Google increasingly prefers to ship: quietly, through the API, and letting the results speak for themselves.

This is the latest chapter in a Gemini story that has gone from "perpetually catching up" to "genuinely competitive." Here is everything we know so far.

Key Changes from Gemini 3 Pro to 3.1 Pro

Gemini 3 Pro arrived in late 2025 as Google's most capable model. It scored 76.2% on SWE-bench Verified, offered a 1M token context window, and introduced dynamic thinking with thought signatures. It was genuinely impressive on paper.

But it had problems. Independent community benchmarks flagged Gemini 3 Pro as having one of the highest hallucination rates among frontier models. Users reported inconsistent output quality. And by early 2026, newer competitors like Claude Opus 4.6 and GPT-5.3 Codex had surpassed it on real-world agentic tasks.

Gemini 3.1 Pro appears to be Google's direct response.

Here is what the community has identified so far based on early testing:

| Area | Gemini 3 Pro | Gemini 3.1 Pro |
| --- | --- | --- |
| Agentic tool use | Functional but inconsistent | Noticeably more reliable multi-step execution |
| Code generation | Strong on benchmarks, mixed in practice | Users report stronger single-prompt apps |
| Reasoning coherence | Occasional logical gaps | Tighter logical flow on complex tasks |
| Instruction following | Sometimes drifted from prompts | More precise adherence to detailed instructions |
| Hallucination rate | Flagged as high by community evals | Early reports suggest improvement (pending formal benchmarks) |
| Availability | Preview on AI Studio and Vertex AI | Spotted on Vertex AI, rolling out to API |

Official benchmark numbers from Google are still pending. But the early hands-on testing tells a consistent story: 3.1 Pro feels like the model 3 Pro should have been at launch.

The Demos That Went Viral

Within hours of Gemini 3.1 Pro appearing on Vertex AI, the developer community began stress-testing it with increasingly ambitious prompts. The results spread rapidly across Reddit and X.

The Windows 11 Web OS -- One developer prompted Gemini 3.1 Pro to build a fully functional Windows 11-style web operating system in a single HTML file. The result included a working text editor, terminal with Python support, code editor, file manager, paint application, and a playable game. All generated in one prompt.

The Space Exploration Game -- Another user built a No Man's Sky-style space exploration game over roughly 20 prompts. The iterative process included fixing bugs, changing the spaceship model, improving controls, and adding shooting mechanics with asteroids. The result was a playable 3D space game running entirely in a browser.

The Seed Animation -- A particularly striking demo showed a photorealistic interactive animation of a seed growing into a full tree, complete with roots forming, stems emerging, and leaves appearing with natural timing. Community members called it the "best leaves ever seen" from an AI model.

These demos are not just impressive on their own. They show a meaningful step forward from Gemini 3 Pro, which struggled with the kind of complex, multi-component single-prompt generation that 3.1 Pro handles with apparent ease.

The Gemini Evolution: Two Years of Relentless Iteration

To appreciate where Gemini 3.1 Pro sits, it helps to see the full journey. Eight major releases in just over two years:

  • Dec 2023 -- Gemini 1.0: Google's first unified multimodal model. The starting gun in the frontier AI race.
  • Feb 2024 -- Gemini 1.5 Pro: Breakthrough 1M token context window. The largest of any frontier model at the time.
  • Jan 2025 -- Gemini 2.0 Flash: Speed-optimized model with native tool use. Signaled Google's shift toward agentic AI.
  • Mar 2025 -- Gemini 2.5 Pro: Major reasoning upgrade. First Gemini model with built-in thinking capabilities.
  • Apr 2025 -- Gemini 2.5 Flash: Production-grade Flash with Pro-level quality. Made advanced reasoning affordable at scale.
  • Nov 2025 -- Gemini 3 Pro: State-of-the-art reasoning, 76.2% SWE-bench Verified. But hallucination issues held it back.
  • Dec 2025 -- Gemini 3 Flash: Surpassed Pro on coding at 78.0% SWE-bench, at 75% lower cost.
  • Feb 2026 -- Gemini 3.1 Pro: Enhanced agentic capabilities and improved reliability. You are here.

What started as annual launches in 2023 became quarterly updates by mid-2025 and near-continuous iteration by 2026.

How Gemini 3.1 Pro Compares to the Competition

The frontier AI landscape in February 2026 is a three-way race. Here is where Gemini 3.1 Pro sits based on available benchmarks and reported capabilities:

| Category | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.3 Codex |
| --- | --- | --- | --- |
| Coding (SWE-bench) | 76.2%+ (Verified) | Uses different variant | Uses SWE-bench Pro |
| Agentic coding (Terminal-Bench 2.0) | Not yet reported | 65.4% | 77.3% |
| Context window | 1M tokens | 200K (1M in beta) | Not disclosed |
| Output limit | 64K tokens | 128K tokens | Not disclosed |
| Multimodal input | Text, image, video, audio, PDF | Text, image, PDF | Text, image |
| Dynamic thinking | Yes (3 levels) | Yes (4 effort levels) | Yes |
| Input pricing | 2 USD per 1M tokens | 5 USD per 1M tokens | Not disclosed |
| Output pricing | 12 USD per 1M tokens | 25 USD per 1M tokens | Not disclosed |
| Agent frameworks | ADK, Agno, Browser Use, Letta | Claude Code, Agent SDK | Codex CLI, Frontier |

Three things stand out.

Price. Gemini 3.1 Pro costs 60% less than Claude Opus 4.6 on input and 52% less on output. For high-volume production workloads, that gap adds up fast.

Multimodal breadth. Gemini 3.1 Pro natively accepts video and audio inputs alongside text, images, and PDFs. Neither Claude nor GPT currently matches that range.

Where it trails. Opus 4.6 offers double the output limit at 128K tokens. GPT-5.3 Codex leads on Terminal-Bench 2.0, the benchmark that measures autonomous coding tasks, scoring 77.3% versus Opus 4.6's 65.4%. And until Google publishes official 3.1 Pro benchmarks, we are comparing against Gemini 3 Pro's numbers, which may understate the new model's actual performance.

Pro Tip: Terminal-Bench 2.0 scores can vary depending on which agent harness runs the model. The numbers above reflect each company's self-reported results.

Features Carried Forward from the Gemini 3 Architecture

Gemini 3.1 Pro builds on the same core architecture as Gemini 3 Pro. For developers working with the model, these features carry forward:

Dynamic Thinking

Gemini 3.1 Pro reasons by default. Developers control reasoning depth with the thinking_level parameter:

| Thinking Level | Behavior | Best For |
| --- | --- | --- |
| Low | Minimizes reasoning depth | Simple queries, chat, high-throughput apps |
| Medium | Balanced reasoning | Moderate analysis, general-purpose tasks |
| High (default) | Maximizes reasoning depth | Complex analysis, coding, multi-step tasks |

Pro Tip: Google strongly recommends keeping temperature at the default 1.0 for all Gemini 3 series models. Setting it lower can cause unexpected behavior like looping, especially on math and reasoning tasks. This is the opposite of what most developers are used to.
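Taken together, the thinking levels and the temperature guidance can be sketched as a small request-config helper. This is an illustrative sketch only: the field names (`thinkingLevel`, `thinkingConfig`) follow this article's description of the parameter, not a verified SDK surface, so check the official Gemini API reference before using them.

```python
def build_config(thinking_level: str = "high", temperature: float = 1.0) -> dict:
    """Build a request config for Gemini 3.1 Pro (field names assumed)."""
    levels = {"low", "medium", "high"}
    if thinking_level not in levels:
        raise ValueError(f"thinking_level must be one of {sorted(levels)}")
    # Google recommends leaving temperature at the default 1.0 for the
    # Gemini 3 series; lowering it can trigger looping on reasoning tasks.
    return {
        "model": "gemini-3.1-pro",
        "generationConfig": {
            "temperature": temperature,
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }
```

For high-throughput chat endpoints, the same helper with `thinking_level="low"` trades reasoning depth for latency.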

Thought Signatures

Gemini 3.1 Pro uses thought signatures, encrypted representations of the model's internal reasoning that must be passed back across multi-turn conversations. Strip them out and the model loses its reasoning context. For agentic workflows spanning multiple API calls, this is critical.
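In practice, "passing them back" means storing each model turn, signature included, and replaying it verbatim in the next request. A minimal sketch, assuming a `thoughtSignature` field on the model's content parts (the exact field name is an assumption drawn from this article, not a verified schema):

```python
def append_model_turn(history: list, text: str, thought_signature: str) -> list:
    """Record a model turn so its thought signature survives the next call."""
    # The signature is an opaque, encrypted blob: store it verbatim and send
    # it back unchanged, or the model loses its reasoning context mid-task.
    history.append({
        "role": "model",
        "parts": [{"text": text, "thoughtSignature": thought_signature}],
    })
    return history
```

An agent loop would call this after every model response before appending the next user or tool message.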

Granular Multimodal Control

The media_resolution parameter gives developers fine-grained control over how the model processes visual input:

| Media Type | Recommended Setting | Max Tokens | Notes |
| --- | --- | --- | --- |
| Images | High | 1,120 | Best for most image analysis |
| PDFs | Medium | 560 | Quality saturates at medium for documents |
| Video (general) | Low | 70 per frame | Sufficient for action recognition |
| Video (text-heavy) | High | 280 per frame | Required for reading text in video |
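The recommendations above can be encoded as a simple lookup so a pipeline picks the right setting per media type. The values just mirror the table, and the `media_resolution` parameter name is taken from this article, so treat the mapping as a sketch to verify against the API docs:

```python
# Recommended media_resolution per media type, per the table above.
RECOMMENDED_RESOLUTION = {
    "image": "high",             # up to 1,120 tokens; best for image analysis
    "pdf": "medium",             # quality saturates at medium (~560 tokens)
    "video": "low",              # ~70 tokens/frame; enough for action recognition
    "video_text_heavy": "high",  # ~280 tokens/frame; needed to read on-screen text
}

def media_resolution_for(media_type: str) -> str:
    """Return the recommended media_resolution setting for a media type."""
    try:
        return RECOMMENDED_RESOLUTION[media_type]
    except KeyError:
        raise ValueError(f"unknown media type: {media_type!r}")
```

The asymmetry is worth noting: bumping video to high quadruples per-frame token cost, so it pays off only when the model must read text inside the frames.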

The Hallucination Question

This is the biggest open question around Gemini 3.1 Pro.

Gemini 3 Pro's hallucination rate was its most criticized weakness. Community-run evaluations consistently flagged it as producing factual errors more frequently than competing frontier models from Anthropic and OpenAI.

Early testing of Gemini 3.1 Pro suggests this is one of the areas Google targeted. Users report fewer obvious factual errors and more instances of the model expressing uncertainty rather than generating confident but wrong answers. But without formal benchmarks, this remains anecdotal.

If Google has meaningfully reduced hallucinations in 3.1 Pro, it changes the calculus for enterprise adoption. A model that is cheaper than its competitors and factually reliable is a compelling combination. If the hallucination rate remains high, no amount of pricing advantage will overcome the trust gap for production use cases in healthcare, legal, and financial services.

Google's formal benchmark publication will be the moment of truth.

Pricing and Availability

Based on the Gemini 3 series pricing (which 3.1 Pro is expected to match):

| Detail | Specification |
| --- | --- |
| Vertex AI | Available now |
| Google AI Studio | Not yet available |
| Gemini API | Rolling out, spotted by developers |
| Input pricing (standard) | 2 USD per 1M tokens |
| Input pricing (long context) | 4 USD per 1M tokens (over 200K) |
| Output pricing (standard) | 12 USD per 1M tokens |
| Output pricing (long context) | 18 USD per 1M tokens (over 200K) |
| Context window | 1M tokens |
| Output limit | 64K tokens |
| Knowledge cutoff | January 2025 |

For developers already using the Gemini API, switching to Gemini 3.1 Pro should be as simple as updating the model ID once it becomes generally available.

The Bigger Picture: Google's AI Strategy in 2026

Gemini 3.1 Pro does not exist in isolation. It is one piece of a much larger Google AI offensive that has accelerated in early 2026:

  • Lyria RealTime -- Experimental AI music generation model available via the Gemini API
  • Veo 3.1 -- State-of-the-art video generation with native audio
  • Gemini CLI -- Google's answer to Claude Code, now supporting Gemini 3 series models
  • Agent Development Kit (ADK) -- Open-source framework for building Gemini-powered agents, with integrations across Agno, Browser Use, Eigent, Letta, and mem0
  • India AI Summit 2026 -- Google is making major investments, competing directly with OpenAI's Reliance partnership

Google's approach is different from its competitors. Where Anthropic leads with safety research and OpenAI leads with consumer products, Google is leveraging its infrastructure advantage: the models are cheaper, the multimodal capabilities are broader, and the integration with Google Cloud gives enterprise customers a seamless deployment path.

The Bottom Line

Gemini 3.1 Pro is not a revolution. It is a targeted fix for the things that held Gemini 3 Pro back from being a genuine top-tier choice.

The Gemini 3 Pro foundation was already strong: competitive benchmarks, 1M token context, native multimodal understanding across five input types, and pricing that undercuts the competition by more than half. What it lacked was consistency, reliability in agentic tasks, and factual accuracy.

If Gemini 3.1 Pro delivers on the early reports -- better tool use, stronger code generation, fewer hallucinations -- then Google has quietly closed the gap with Claude Opus 4.6 and GPT-5.3 Codex in the areas that matter most for production use.

For now, Gemini 3.1 Pro is live on Vertex AI for anyone who wants to try it. No waitlist. No announcement. No fanfare. Just a new model ID and a community of developers already pushing it to its limits.
