Improved agentic capabilities, stronger coding, reduced hallucinations, and a quiet Vertex AI launch that caught everyone off guard
By LDS Team
February 19, 2026
No blog post. No press event. No countdown timer. On February 19, 2026, developers noticed a new model ID appearing on Google's Vertex AI API: gemini-3.1-pro. Within hours, screenshots spread across Reddit, demos flooded social media, and one user generated an entire Windows 11-style web operating system from a single prompt.
Google's Gemini 3.1 Pro had arrived, and it did so the way Google increasingly prefers to ship: quietly, through the API, and letting the results speak for themselves.
This is the latest chapter in a Gemini story that has gone from "perpetually catching up" to "genuinely competitive." Here is everything we know so far.
Key Changes from Gemini 3 Pro to 3.1 Pro
Gemini 3 Pro arrived in late 2025 as Google's most capable model. It scored 76.2% on SWE-bench Verified, offered a 1M token context window, and introduced dynamic thinking with thought signatures. It was genuinely impressive on paper.
But it had problems. Independent community benchmarks flagged Gemini 3 Pro as having one of the highest hallucination rates among frontier models. Users reported inconsistent output quality. And by early 2026, newer competitors like Claude Opus 4.6 and GPT-5.3 Codex had surpassed it on real-world agentic tasks.
Gemini 3.1 Pro appears to be Google's direct response.
Here is what the community has identified so far based on early testing:
| Area | Gemini 3 Pro | Gemini 3.1 Pro |
|---|---|---|
| Agentic tool use | Functional but inconsistent | Noticeably more reliable multi-step execution |
| Code generation | Strong on benchmarks, mixed in practice | Users report stronger single-prompt apps |
| Reasoning coherence | Occasional logical gaps | Tighter logical flow on complex tasks |
| Instruction following | Sometimes drifted from prompts | More precise adherence to detailed instructions |
| Hallucination rate | Flagged as high by community evals | Early reports suggest improvement (pending formal benchmarks) |
| Availability | Preview on AI Studio and Vertex AI | Spotted on Vertex AI, rolling out to API |
Official benchmark numbers from Google are still pending. But the early hands-on testing tells a consistent story: 3.1 Pro feels like the model 3 Pro should have been at launch.
The Demos That Went Viral
Within hours of Gemini 3.1 Pro appearing on Vertex AI, the developer community began stress-testing it with increasingly ambitious prompts. The results spread rapidly across Reddit and X.
The Windows 11 Web OS -- One developer prompted Gemini 3.1 Pro to build a fully functional Windows 11-style web operating system in a single HTML file. The result included a working text editor, terminal with Python support, code editor, file manager, paint application, and a playable game. All generated in one prompt.
The Space Exploration Game -- Another user built a No Man's Sky-style space exploration game over roughly 20 prompts. The iterative process included fixing bugs, changing the spaceship model, improving controls, and adding shooting mechanics with asteroids. The result was a playable 3D space game running entirely in a browser.
The Seed Animation -- A particularly striking demo showed a photorealistic interactive animation of a seed growing into a full tree, complete with roots forming, stems emerging, and leaves appearing with natural timing. Community members called it the "best leaves ever seen" from an AI model.
These demos are not just impressive in isolation. They mark a meaningful step forward from Gemini 3 Pro, which struggled with exactly this kind of complex, multi-component single-prompt generation.
The Gemini Evolution: Two Years of Relentless Iteration
To appreciate where Gemini 3.1 Pro sits, it helps to see the full journey: eight major releases in just over two years. What started as annual launches in 2023 became quarterly updates by mid-2025 and near-continuous iteration by 2026.
How Gemini 3.1 Pro Compares to the Competition
The frontier AI landscape in February 2026 is a three-way race. Here is where Gemini 3.1 Pro sits based on available benchmarks and reported capabilities:
| Category | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.3 Codex |
|---|---|---|---|
| Coding (SWE-bench Verified) | 76.2%+ (Gemini 3 Pro figure) | Not directly comparable (different variant) | Not directly comparable (uses SWE-bench Pro) |
| Agentic coding (Terminal-Bench 2.0) | Not yet reported | 65.4% | 77.3% |
| Context window | 1M tokens | 200K (1M in beta) | Not disclosed |
| Output limit | 64K tokens | 128K tokens | Not disclosed |
| Multimodal input | Text, image, video, audio, PDF | Text, image, PDF | Text, image |
| Dynamic thinking | Yes (3 levels) | Yes (4 effort levels) | Yes |
| Input pricing | 2 USD per 1M tokens | 5 USD per 1M tokens | Not disclosed |
| Output pricing | 12 USD per 1M tokens | 25 USD per 1M tokens | Not disclosed |
| Agent frameworks | ADK, Agno, Browser Use, Letta | Claude Code, Agent SDK | Codex CLI, Frontier |
Three things stand out.
Price. Gemini 3.1 Pro costs 60% less than Claude Opus 4.6 on input and 52% less on output. For high-volume production workloads, that gap adds up fast.
Multimodal breadth. Gemini 3.1 Pro natively accepts video and audio inputs alongside text, images, and PDFs. Neither Claude nor GPT currently matches that range.
Where it trails. Opus 4.6 offers double the output limit at 128K tokens. GPT-5.3 Codex leads on Terminal-Bench 2.0, the benchmark that measures autonomous coding tasks, scoring 77.3% versus Opus 4.6's 65.4%. And until Google publishes official 3.1 Pro benchmarks, we are comparing against Gemini 3 Pro's numbers, which may understate the new model's actual performance.
Pro Tip: Terminal-Bench 2.0 scores can vary depending on which agent harness runs the model. The numbers above reflect each company's self-reported results.
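To make the pricing gap concrete, here is a quick back-of-the-envelope calculation using the standard-tier prices quoted above. The token volumes are illustrative, not from the article:

```python
# Standard-tier prices from the comparison table, USD per 1M tokens.
PRICES = {
    "gemini-3.1-pro": {"input": 2.0, "output": 12.0},
    "claude-opus-4.6": {"input": 5.0, "output": 25.0},
}

def monthly_cost(model, input_tokens_m, output_tokens_m):
    """Cost in USD for a workload measured in millions of tokens."""
    p = PRICES[model]
    return input_tokens_m * p["input"] + output_tokens_m * p["output"]

# Hypothetical workload: 500M input tokens, 100M output tokens per month.
gemini = monthly_cost("gemini-3.1-pro", 500, 100)   # 500*2 + 100*12 = 2200
claude = monthly_cost("claude-opus-4.6", 500, 100)  # 500*5 + 100*25 = 5000
print(f"Gemini: ${gemini:,.0f}  Claude: ${claude:,.0f}")
```

At this volume the same traffic costs less than half as much on Gemini, which is where the "adds up fast" claim comes from.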
Features Carried Forward from the Gemini 3 Architecture
Gemini 3.1 Pro builds on the same core architecture as Gemini 3 Pro. For developers working with the model, these features carry forward:
Dynamic Thinking
Gemini 3.1 Pro reasons by default. Developers control reasoning depth with the thinking_level parameter:
| Thinking Level | Behavior | Best For |
|---|---|---|
| Low | Minimizes reasoning depth | Simple queries, chat, high-throughput apps |
| Medium | Balanced reasoning | Moderate analysis, general-purpose tasks |
| High (default) | Maximizes reasoning depth | Complex analysis, coding, multi-step tasks |
Pro Tip: Google strongly recommends keeping temperature at the default 1.0 for all Gemini 3 series models. Setting it lower can cause unexpected behavior like looping, especially on math and reasoning tasks. This is the opposite of what most developers are used to.
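As a rough illustration, here is how a generateContent request body might set the thinking level alongside the recommended default temperature. The field names follow the pattern in Google's Gemini 3 docs (thinkingConfig / thinkingLevel), but this is a sketch; verify the exact shape against the current API reference:

```python
# Hypothetical Gemini API generateContent request body.
# Field names are assumptions modeled on Google's documented schema.
def build_request(prompt, thinking_level="high"):
    """Build a request dict with the given thinking level."""
    assert thinking_level in ("low", "medium", "high")
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # Keep temperature at the default 1.0, per Google's guidance
            # for the Gemini 3 series.
            "temperature": 1.0,
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

# Low thinking for a simple, high-throughput query.
request = build_request("Summarize this paragraph.", thinking_level="low")
```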
Thought Signatures
Gemini 3.1 Pro uses thought signatures, encrypted representations of the model's internal reasoning that must be passed back across multi-turn conversations. Strip them out and the model loses its reasoning context. For agentic workflows spanning multiple API calls, this is critical.
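A minimal sketch of what preserving signatures looks like in a hand-rolled conversation loop. The thoughtSignature field name follows Google's docs and the signature value here is a placeholder; consult the thought-signatures documentation for the exact response shape:

```python
# Sketch: keep thought signatures intact when replaying history.
history = []

def record_turn(user_text, model_response_parts):
    """Append both sides of a turn to the conversation history."""
    history.append({"role": "user", "parts": [{"text": user_text}]})
    # Do NOT filter out parts carrying a thoughtSignature: the model
    # needs them back verbatim to retain its reasoning context on
    # the next call.
    history.append({"role": "model", "parts": model_response_parts})

record_turn(
    "Plan the refactor.",
    # Placeholder signature; the real value is an opaque encrypted blob.
    [{"text": "Step 1: ...", "thoughtSignature": "opaque-encrypted-blob"}],
)
```

The failure mode to avoid is a "sanitizing" step that strips unknown fields from model responses before resending them, which silently drops the signatures.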
Granular Multimodal Control
The media_resolution parameter gives developers fine-grained control over how the model processes visual input:
| Media Type | Recommended Setting | Max Tokens | Notes |
|---|---|---|---|
| Images | High | 1,120 | Best for most image analysis |
| PDFs | Medium | 560 | Quality saturates at medium for documents |
| Video (general) | Low | 70 per frame | Sufficient for action recognition |
| Video (text-heavy) | High | 280 per frame | Required for reading text in video |
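The table's recommendations can be encoded as a small lookup helper. The enum-style string values below are assumptions modeled on Google's documented naming; check the media-resolution docs before relying on them:

```python
# Recommended media_resolution settings from the table above.
# Value strings are illustrative, not confirmed API constants.
RECOMMENDED = {
    "image": "media_resolution_high",             # up to 1,120 tokens
    "pdf": "media_resolution_medium",             # quality saturates at 560
    "video": "media_resolution_low",              # 70 tokens per frame
    "video_text_heavy": "media_resolution_high",  # 280 tokens per frame
}

def resolution_for(media_type):
    """Return the recommended setting, defaulting to medium."""
    return RECOMMENDED.get(media_type, "media_resolution_medium")
```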
The Hallucination Question
This is the biggest open question around Gemini 3.1 Pro.
Gemini 3 Pro's hallucination rate was its most criticized weakness. Community-run evaluations consistently flagged it as producing factual errors more frequently than competing frontier models from Anthropic and OpenAI.
Early testing of Gemini 3.1 Pro suggests this is one of the areas Google targeted. Users report fewer obvious factual errors and more instances of the model expressing uncertainty rather than generating confident but wrong answers. But without formal benchmarks, this remains anecdotal.
If Google has meaningfully reduced hallucinations in 3.1 Pro, it changes the calculus for enterprise adoption. A model that is cheaper than its competitors and factually reliable is a compelling combination. If the hallucination rate remains high, no amount of pricing advantage will overcome the trust gap for production use cases in healthcare, legal, and financial services.
Google's formal benchmark publication will be the moment of truth.
Pricing and Availability
Availability and expected pricing, based on the Gemini 3 series (which 3.1 Pro is expected to match):
| Detail | Specification |
|---|---|
| Vertex AI | Available now |
| Google AI Studio | Not yet available |
| Gemini API | Rolling out, spotted by developers |
| Input pricing (standard) | 2 USD per 1M tokens |
| Input pricing (long context) | 4 USD per 1M tokens (over 200K) |
| Output pricing (standard) | 12 USD per 1M tokens |
| Output pricing (long context) | 18 USD per 1M tokens (over 200K) |
| Context window | 1M tokens |
| Output limit | 64K tokens |
| Knowledge cutoff | January 2025 |
For developers already using the Gemini API, switching to Gemini 3.1 Pro should be as simple as updating the model ID once it becomes generally available.
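As an illustration, the switch could be as small as changing one constant. The model ID strings below are the community-spotted ones, not confirmed GA identifiers:

```python
# Hypothetical migration: only the model ID changes.
OLD_MODEL = "gemini-3-pro-preview"
NEW_MODEL = "gemini-3.1-pro"  # ID spotted on Vertex AI; confirm at GA

def make_request(model_id, prompt):
    """Build a minimal generateContent-style request body."""
    return {
        "model": model_id,
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
    }

request = make_request(NEW_MODEL, "Hello")
```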
The Bigger Picture: Google's AI Strategy in 2026
Gemini 3.1 Pro does not exist in isolation. It is one piece of a much larger Google AI offensive that has accelerated in early 2026:
- Lyria RealTime -- Experimental AI music generation model available via the Gemini API
- Veo 3.1 -- State-of-the-art video generation with native audio
- Gemini CLI -- Google's answer to Claude Code, now supporting Gemini 3 series models
- Agent Development Kit (ADK) -- Open-source framework for building Gemini-powered agents, with integrations across Agno, Browser Use, Eigent, Letta, and mem0
- India AI Summit 2026 -- Google is making major investments, competing directly with OpenAI's Reliance partnership
Google's approach is different from its competitors. Where Anthropic leads with safety research and OpenAI leads with consumer products, Google is leveraging its infrastructure advantage: the models are cheaper, the multimodal capabilities are broader, and the integration with Google Cloud gives enterprise customers a seamless deployment path.
The Bottom Line
Gemini 3.1 Pro is not a revolution. It is a targeted fix for the things that held Gemini 3 Pro back from being a genuine top-tier choice.
The Gemini 3 Pro foundation was already strong: competitive benchmarks, 1M token context, native multimodal understanding across five input types, and pricing that undercuts the competition by more than half. What it lacked was consistency, reliability in agentic tasks, and factual accuracy.
If Gemini 3.1 Pro delivers on the early reports -- better tool use, stronger code generation, fewer hallucinations -- then Google has quietly closed the gap with Claude Opus 4.6 and GPT-5.3 Codex in the areas that matter most for production use.
For now, Gemini 3.1 Pro is live on Vertex AI for anyone who wants to try it. No waitlist. No announcement. No fanfare. Just a new model ID and a community of developers already pushing it to its limits.
Sources
- Google AI for Developers: Gemini 3 Developer Guide
- Google AI for Developers: Gemini 3 Pro Preview Model Page
- Google AI for Developers: All Models
- Google AI for Developers: Thought Signatures
- Google AI for Developers: Media Resolution
- Google DeepMind: Gemini 3 Pro Benchmarks
- Google DeepMind: Gemini 3 Flash Benchmarks
- Google Developers Blog: Gemini 3 Flash is now available in Gemini CLI
- Google Developers Blog: Real-World Agent Examples with Gemini 3
- Google Developers Blog: 5 Things to Try with Gemini 3 Pro in Gemini CLI
- Terminal-Bench 2.0 Leaderboard
- Anthropic: Claude Models Documentation
- Wikipedia: Gemini (language model)
- Reddit r/singularity: Gemini 3.1 Pro is now live on Vertex AI (Feb 19, 2026)
- Reddit r/Bard: Vertex AI - gemini 3.1 pro is out (Feb 19, 2026)
- Reddit r/singularity: Gemini 3.1 Pro one-shots a Windows 11-style web OS (Feb 19, 2026)
- Reddit r/singularity: Gemini 3.1 Pro makes a NMS style space exploration game (Feb 19, 2026)