Zhipu AI released a 744-billion-parameter model trained on 100,000 Huawei Ascend chips with zero NVIDIA dependency, then open-sourced it under an MIT license. It is the first frontier AI model to demonstrate that China can build competitive AI without American hardware.
By LDS Team
February 26, 2026
On February 11, 2026, a Beijing-based AI company most Americans have never heard of did something that the entire U.S. semiconductor sanctions regime was designed to prevent. Zhipu AI -- now rebranded as Z.ai -- released GLM-5, a 744-billion-parameter language model that performs within a few percentage points of GPT-5.2 and Claude Opus 4.5 on major benchmarks -- and it was trained entirely on Huawei's Ascend 910B processors. Not a single NVIDIA chip was involved.
Then they open-sourced it under the MIT license. Anyone on Earth can download it, modify it, and deploy it commercially with zero restrictions.
The release landed like a shockwave across the AI industry. Within 24 hours, Zhipu AI's stock on the Hong Kong Stock Exchange surged 28.7%. Within days, the model's weights were being downloaded across HuggingFace and ModelScope by developers worldwide. And in Washington, the announcement reignited a debate that has been simmering since DeepSeek's R1 rattled markets in January 2025: are U.S. chip export controls actually working?
The answer, as of February 2026, is more complicated than anyone in the Commerce Department would like to admit.
What GLM-5 actually is
GLM-5 is a Mixture-of-Experts (MoE) model. That means it contains 744 billion total parameters, but only activates 40 billion of them for any given input -- routing each token through 8 of its 256 specialized expert sub-networks. This architecture, pioneered at scale by models like Mixtral and DeepSeek-V3, allows GLM-5 to deliver frontier-class performance at a fraction of the computational cost of a dense model of equivalent capability.
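To make the routing concrete, here is a minimal sketch of top-k expert gating in the general style MoE models use -- illustrative only, with toy logits, and not a reconstruction of Zhipu's actual router:

```python
import math
import random

def route_token(router_logits, k=8):
    """Pick the top-k experts for one token and softmax their logits
    into mixing weights (illustrative: 256 experts, 8 active)."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over only the selected experts' logits.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy example: one token's router scores over 256 experts.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(256)]
selected = route_token(logits, k=8)
print(len(selected))                          # 8 experts active
print(round(sum(w for _, w in selected), 6))  # mixing weights sum to 1.0
```

Only the feed-forward blocks of the 8 selected experts execute for that token, which is why 744 billion stored parameters cost only 40 billion parameters' worth of compute per token.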
| Specification | GLM-5 |
|---|---|
| Total parameters | 744 billion |
| Active parameters | 40 billion (per token) |
| Architecture | Mixture-of-Experts (256 experts, 8 active) |
| Layers | 80 |
| Context window | 200,000 tokens |
| Max output length | 131,072 tokens |
| Training data | 28.5 trillion tokens |
| Training hardware | 100,000 Huawei Ascend 910B |
| Training framework | MindSpore (Huawei) |
| License | MIT (fully permissive) |
| API pricing (input) | $1.00 / 1M tokens |
| API pricing (output) | $3.20 / 1M tokens |
| Release date | February 11-12, 2026 |
The pricing alone is significant. At $1.00 per million input tokens and $3.20 per million output tokens, GLM-5 is significantly cheaper than GPT-5.2 or Claude Opus 4.5 for API access. For enterprises processing millions of documents daily, that is not a marginal saving -- it is a category shift.
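The arithmetic is simple enough to sketch. Using the list prices from the table above (the workload volumes below are made up for illustration):

```python
def api_cost(input_tokens_m, output_tokens_m,
             in_price=1.00, out_price=3.20):
    """Dollar cost of a workload; token volumes given in millions,
    prices per million tokens (GLM-5 list prices from the spec table)."""
    return input_tokens_m * in_price + output_tokens_m * out_price

# A hypothetical enterprise pushing 500M input / 100M output tokens a day:
daily = api_cost(500, 100)      # 500 * $1.00 + 100 * $3.20
print(round(daily, 2))          # 820.0 dollars per day
```

At those volumes, even a 2-3x price difference against a proprietary API compounds into six figures a year.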
The model supports text, image, video, and audio inputs. It can generate text and images as outputs. It handles documents up to 200,000 tokens -- roughly 500 pages -- in a single context window.
The architecture beneath the benchmarks
GLM-5's technical innovations go beyond scale. It adopts Multi-head Latent Attention (MLA) from DeepSeek-V2, which compresses key-value pairs into a shared latent space to slash memory use during inference. DeepSeek Sparse Attention (DSA) dynamically selects which tokens to attend to, enabling the 200K context window without prohibitive compute costs. And Multi-token Prediction (MTP), using three additional prediction layers, achieves an average acceptance length of 2.76 tokens per step -- nearly tripling the number of tokens produced per decoding step. Together, these techniques partially compensate for the raw performance gap between Huawei's Ascend chips and NVIDIA's latest hardware.
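The memory argument for MLA can be made with back-of-envelope numbers. The sketch below uses the 80-layer figure from the spec table, but the head count, head dimension, and latent dimension are assumed values for illustration, not GLM-5's published configuration:

```python
def kv_cache_gib(seq_len, n_layers, floats_per_token, bytes_per=2):
    """KV-cache size in GiB for one sequence at FP16 (2 bytes/float)."""
    return seq_len * n_layers * floats_per_token * bytes_per / 2**30

# 80 layers is from the spec table; the rest are HYPOTHETICAL dims.
LAYERS, HEADS, HEAD_DIM, LATENT = 80, 128, 128, 512

std = kv_cache_gib(200_000, LAYERS, 2 * HEADS * HEAD_DIM)  # full K and V
mla = kv_cache_gib(200_000, LAYERS, LATENT)                # one shared latent
print(round(std, 1), round(mla, 1), round(std / mla, 1))
# -> 976.6 GiB uncompressed vs 15.3 GiB latent: a 64x reduction
```

Under these assumed dimensions, a full 200K-token KV cache would not fit on any single accelerator, while the latent-compressed cache fits comfortably -- which is the point of MLA.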
How it performs
GLM-5's benchmark results tell a nuanced story. It is genuinely competitive at the frontier -- but it is not uniformly dominant.
| Benchmark | GLM-5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 77.8% | 80.0% | 80.9% | 76.2% |
| AIME 2026 I | 92.7% | -- | 93.3% | 90.6% |
| AIME 2025 I | 88.7% | 100% | -- | -- |
| GPQA-Diamond | 86.0% | 92.4% | 87.0% | 91.9% |
| HLE (with tools) | 50.4% | -- | -- | -- |
| AA Intelligence Index | 50+ | -- | -- | -- |
The standout result is SWE-bench Verified, the industry's benchmark for real-world software engineering. GLM-5's 77.8% makes it the highest-scoring open-source model on this benchmark -- trailing GPT-5.2 (80.0%) and Claude Opus 4.5 (80.9%) by only 2-3 percentage points. For an open-weight model trained on domestic Chinese hardware, that gap is remarkably narrow.
On GPQA-Diamond, a graduate-level science reasoning benchmark, GLM-5's 86.0% is strong but trails GPT-5.2's 92.4% and Gemini 3 Pro's 91.9%. On mathematical reasoning, the picture is mixed. GLM-5 scores 92.7% on AIME 2026 I, a competitive result. But on AIME 2025 I, it scores 88.7% compared to GPT-5.2's perfect 100%. On Humanity's Last Exam (HLE), a benchmark designed to resist saturation by AI systems, GLM-5 with tools scored 50.4% -- a result that, if independently verified, would represent a significant advance.
GLM-5 is also the first open-source model to score above 50 on the Artificial Analysis Intelligence Index v4.0, a composite benchmark that aggregates performance across multiple evaluation suites. That milestone matters because it places an openly available model in territory previously reserved for proprietary systems behind API paywalls.
Worth noting: Independent benchmark verification remains an ongoing concern in the AI industry. Chinese labs have faced scrutiny over benchmark practices, and some researchers have noted that self-reported scores should be treated with caution until reproduced by third parties. As of late February 2026, several independent evaluations are underway but not yet published.
Trained on 100,000 Huawei chips
This is the detail that transforms GLM-5 from an impressive model release into a geopolitical event.
GLM-5 was trained on a cluster of 100,000 Huawei Ascend 910B processors -- chips designed by Huawei's HiSilicon subsidiary and manufactured by Semiconductor Manufacturing International Corporation (SMIC), China's largest chipmaker, using a 7-nanometer process. No NVIDIA GPUs were used at any stage of training. No AMD chips. No Intel accelerators. The entire training stack -- hardware, framework, and infrastructure -- is Chinese.
The Ascend 910B delivers approximately 320 TFLOPS of FP16 performance. For reference, that places it between NVIDIA's A100 (312 TFLOPS) and the H100 (989 TFLOPS FP16 dense). It is a capable chip, but it is not competitive with NVIDIA's current generation on raw throughput.
| Chip | Process | FP16 Performance | Memory | Status |
|---|---|---|---|---|
| Huawei Ascend 910B | SMIC 7nm (DUV) | ~320 TFLOPS | 64GB HBM2e | Used for GLM-5 training |
| NVIDIA A100 | TSMC 7nm | 312 TFLOPS | 80GB HBM2e | Export-banned to China (Oct 2022) |
| NVIDIA H100 | TSMC 4nm | 989 TFLOPS (FP16 dense) | 80GB HBM3 | Export-banned to China (Oct 2023) |
| NVIDIA H200 | TSMC 4nm | 989 TFLOPS (FP16 dense) | 141GB HBM3e | Allowed with conditions (Jan 2026) |
| Huawei Ascend 910C | SMIC 7nm (DUV) | ~800 TFLOPS | 128GB HBM3 | In production, ~30% yield |
What Zhipu AI accomplished was not just the engineering challenge of training a large model. It was the systems-integration challenge of making 100,000 chips -- each individually less powerful than its NVIDIA counterpart -- work together reliably enough to complete a 28.5-trillion-token training run without the kind of failures that would force a restart.
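A rough compute budget shows the cluster was adequate on paper. The sketch below uses the standard ~6ND FLOPs rule of thumb for transformer training (with the 40B active parameters, since MoE compute scales with active parameters); the utilization figure is an assumed value, and the result is a compute-only lower bound that ignores restarts, communication overhead, and the extra MTP layers:

```python
# Back-of-envelope: can 100,000 Ascend 910Bs cover a 28.5T-token run?
ACTIVE_PARAMS = 40e9     # active parameters per token (MoE)
TOKENS        = 28.5e12  # training tokens
CHIP_FLOPS    = 320e12   # Ascend 910B peak FP16, from the chip table
N_CHIPS       = 100_000
MFU           = 0.30     # ASSUMED model-FLOPs utilization

total_flops = 6 * ACTIVE_PARAMS * TOKENS      # ~6.84e24 FLOPs
sustained   = N_CHIPS * CHIP_FLOPS * MFU      # cluster FLOP/s
days        = total_flops / sustained / 86_400
print(round(total_flops / 1e24, 2), round(days, 1))  # ~6.84e24, ~8 days
```

The hard part, in other words, was never raw FLOPs -- it was keeping 100,000 devices synchronized and fault-free long enough to spend them.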
This is where the SMIC manufacturing constraints become relevant. SMIC produces the Ascend 910B using DUV (deep ultraviolet) lithography, not the EUV (extreme ultraviolet) lithography that TSMC and Samsung use for their most advanced nodes. DUV at 7nm requires multi-patterning -- exposing each layer of the chip multiple times -- which reduces yield rates. Industry estimates put SMIC's 7nm yield at 30-50%, compared to TSMC's 90%+ at the same node. Every chip in Zhipu AI's 100,000-unit cluster was produced under these constraints.
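The yield arithmetic is worth spelling out, because it drives the cost gap:

```python
import math

def dies_needed(good_chips, yield_rate):
    """Gross dies to fabricate in order to net a target count of good chips."""
    return math.ceil(good_chips / yield_rate)

# SMIC 7nm DUV yield estimates from industry reporting: 30-50%.
print(dies_needed(100_000, 0.50))  # 200000 dies at the optimistic end
print(dies_needed(100_000, 0.30))  # 333334 dies at the pessimistic end
```

At TSMC-class yields above 90%, the same 100,000 working chips would cost barely 110,000 dies -- roughly a third of the silicon SMIC has to burn.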
The training framework was Huawei's MindSpore, an open-source deep learning platform that serves as China's answer to PyTorch. MindSpore was purpose-built for the Ascend hardware ecosystem, providing the compiler optimizations and distributed training capabilities needed to coordinate 100,000 chips efficiently.
The reinforcement learning pipeline
GLM-5's post-training is built around "Slime," an asynchronous reinforcement learning framework named after the foraging behavior of slime molds. It runs over 1,000 concurrent rollouts in parallel, decoupling generation from training to increase RL throughput by an order of magnitude. The model then passes through three RL stages -- reasoning (math, coding), agentic (tool use, web browsing), and general alignment -- with each stage feeding into the next through on-policy cross-stage distillation.
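The decoupling idea can be illustrated with a toy producer-consumer loop: actors keep generating rollouts into a buffer while the learner trains on whatever has already arrived, so neither side waits for the other. This is a schematic sketch of asynchronous RL in general, not Zhipu's Slime code, and it uses 8 toy actors rather than 1,000 concurrent rollouts:

```python
import queue
import threading

rollouts = queue.Queue(maxsize=64)  # buffer between generation and training
trained = []

def actor(actor_id, n_rollouts):
    """Generation side: in a real system, a full model rollout + reward."""
    for step in range(n_rollouts):
        rollouts.put((actor_id, step, f"trajectory-{actor_id}-{step}"))

def learner(expected):
    """Training side: consumes rollouts as they arrive."""
    while len(trained) < expected:
        traj = rollouts.get()   # blocks until a rollout is ready
        trained.append(traj)    # stand-in for a gradient update

N_ACTORS, PER_ACTOR = 8, 16
threads = [threading.Thread(target=actor, args=(i, PER_ACTOR))
           for i in range(N_ACTORS)]
threads.append(threading.Thread(target=learner,
                                args=(N_ACTORS * PER_ACTOR,)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(trained))  # 128 trajectories consumed
```

The throughput win comes from the buffer: slow rollouts no longer stall gradient updates, which is how an asynchronous design can claim an order-of-magnitude gain over lockstep RL.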
The results show in hallucination reduction. GLM-5 scores -1 on the AA-Omniscience Index, where 0 represents perfect calibration between confidence and accuracy. The previous generation model GLM-4.7 scored -36. That cross-generation leap -- from significantly overconfident to nearly perfectly calibrated -- is one of the largest documented gains in hallucination reduction by any lab.
Where DeepSeek failed, Zhipu succeeded
The significance of GLM-5's Huawei-only training becomes clearer in context. DeepSeek, the lab behind R1 that rattled global markets in January 2025, reportedly attempted to train its successor R2 on Huawei Ascend hardware. The effort failed. DeepSeek encountered stability issues that made large-scale Ascend training runs unreliable, and ultimately reverted to NVIDIA GPUs for R2.
If China's most technically accomplished AI lab could not make Huawei hardware work for training at scale, the Ascend ecosystem appeared unready for frontier development. GLM-5's successful training on 100,000 Ascend chips directly contradicts that conclusion.
The contrast extends to geopolitics. The U.S. government has alleged that DeepSeek trained its models on NVIDIA chips obtained in violation of export controls -- specifically, Blackwell GPUs that should never have reached China. DeepSeek denies this. Zhipu AI faces no such questions. Their entire training stack is domestically sourced, and the MIT license means anyone can verify it.
From Tsinghua lab to $40 billion company
Zhipu AI was founded in 2019 as a spinout from Tsinghua University's Knowledge Engineering Group (KEG), one of China's most prestigious AI research labs, by professors Tang Jie and Li Juanzi. CEO Zhang Peng leads commercial operations. The deep Tsinghua ties gave the company access to talent, government relationships, and early funding that pure startups lacked -- the company raised approximately $1.5 billion pre-IPO from Alibaba, Tencent, Meituan, Xiaomi, and Saudi Aramco's venture arm Prosperity7.
On January 8, 2026, Zhipu AI went public on the Hong Kong Stock Exchange (2513.HK), raising $558 million at a $6.5 billion valuation. The offering was oversubscribed 1,159 times. By mid-February, the stock had surged over 500%, pushing market capitalization past $40 billion -- more than six times the IPO valuation. On the day GLM-5 was announced, shares surged 28.7%.
The sanctions question
GLM-5 arrives at a moment when U.S. chip export controls are under more scrutiny than at any point since they were enacted in October 2022. Through successive rounds of restrictions -- banning A100/H100 exports, closing the H800 loophole in October 2023, targeting HBM in December 2024 -- the U.S. sought to keep China a generation behind. Then, on January 14, 2026, the Trump administration partially reversed course, allowing NVIDIA H200 sales to China under strict conditions: case-by-case BIS review, volume caps, mandatory third-party testing, and a 25% surcharge.
GLM-5's release three weeks later provided ammunition for both sides. Hawks argue that sanctions forced China to build domestic alternatives -- and that GLM-5 proves those alternatives now work, accelerating the very outcome sanctions were meant to prevent. Doves argue that if China can train frontier models on domestic hardware regardless, the primary effect has been to shrink NVIDIA's China revenue -- which fell to roughly $17 billion in FY2025 -- while providing minimal strategic benefit.
The data tells its own story. Huawei captured 35-40% of China's AI chip market by late 2025, up from near zero in 2022. China mandated 50% domestic chips in public-sector data centers. Big Fund III is pouring $47.5 billion into the domestic supply chain. Alibaba's Qwen model family overtook Meta's Llama as the most-downloaded on HuggingFace, per Stanford's HAI AI Index. And the next-generation Ascend 910C -- delivering roughly 800 TFLOPS, about 80% of the H100 -- is already in production. China's domestic AI chip ecosystem has crossed a threshold of self-sufficiency it is unlikely to retreat from.
The weaknesses nobody is ignoring
GLM-5 is not an unqualified success. Its inference throughput -- a median of roughly 61 tokens per second, per Artificial Analysis -- is competitive but falls behind the fastest proprietary deployments, and the gap widens on Ascend hardware specifically. On mathematics, it scores 92.7% on AIME 2026 I but only 88.7% on AIME 2025 I, against GPT-5.2's perfect 100%. On Terminal-Bench, a benchmark for autonomous command-line task completion, GLM-5 reportedly underperforms Claude and GPT-5.2 -- a meaningful gap in an era where AI agents are the primary commercial focus.
Some of GLM-5's self-reported scores, particularly on HLE with tools, have not yet been independently verified. And even if Ascend chips can train frontier models, SMIC's 30-50% yield rate at 7nm DUV means Zhipu likely needed 200,000-330,000 total dies for 100,000 working chips -- an inefficiency that translates into higher costs and slower scaling.
What this means for the AI industry
Open-source approaches parity. GLM-5 under MIT license delivers performance within striking distance of GPT-5.2 and Claude Opus 4.5 at significantly lower API pricing. Combined with DeepSeek R2, Llama 4, and Qwen, the open-weight ecosystem is collectively approaching the frontier -- putting real pressure on proprietary providers to justify their premium.
NVIDIA is no longer the only path. GLM-5 proves that frontier models can be trained without American hardware. Any country or organization seeking AI capabilities now has a proof-of-concept using Huawei's Ascend ecosystem. The chips are imperfect, but they work for the most demanding workloads in AI.
China is playing the open-source long game. By releasing GLM-5 under MIT -- even more permissive than Meta's Llama license, which restricts companies above 700 million MAU -- Zhipu AI builds global developer dependence on Chinese-originated technology. The strategic logic is deliberate, and for Western enterprises navigating compliance around Huawei Entity List restrictions and Chinese data governance, it creates an uncomfortable calculation between capability and geopolitics.
The Bottom Line
Three years after the United States imposed semiconductor export controls designed to keep China at least a generation behind in AI capability, a Beijing company trained a frontier model on 100,000 domestically manufactured chips and gave it away for free.
GLM-5 is not the best model in the world on every benchmark. It is slower than its competitors, its math performance trails GPT-5.2, and some of its claimed scores await independent verification. But these caveats miss the point.
The point is that it exists at all.
A model that competes with GPT-5.2 on software engineering benchmarks. Trained entirely on Huawei chips manufactured by SMIC at a 7-nanometer process node that was supposed to be beyond China's reach. Running on a domestic software stack with zero dependency on any American technology. Released under the most permissive open-source license available, for anyone to use.
DeepSeek showed that Chinese labs could build frontier models efficiently. GLM-5 shows they can do it without NVIDIA. That is a fundamentally different statement about the state of global AI competition -- and about the limits of technology denial as a geopolitical strategy.
Whether that matters more to the engineers downloading the model weights tonight or to the policymakers who will have to reconsider their assumptions about semiconductor leverage, the answer is the same: the world changed a little on February 11, 2026, and it is not changing back.
Sources
- Zhipu AI Official Website (Company information, model details)
- GLM-5 Model Documentation -- Zhipu AI (Model specifications, benchmarks, architecture)
- Zhipu AI IPO Prospectus -- HKEX (2513.HK) (IPO details, financials, investor information)
- CNBC: Chinese AI startup Zhipu surges on Hong Kong debut (Jan 8, 2026)
- Bloomberg: US Warns That Using Huawei AI Chip Anywhere Breaks Its Rules (May 2025)
- SMIC Investor Relations (Manufacturing capabilities, financial data)
- Stanford HAI AI Index 2025 (Qwen/Llama download data, China AI ecosystem analysis)
- NVIDIA FY2025 10-K Filing (China revenue impact, export control disclosures)
- U.S. Bureau of Industry and Security -- Semiconductor Export Controls (Sanctions timeline, policy details)
- DeepSeek R1 Technical Report (DeepSeek architecture reference, comparison data)
- Artificial Analysis LLM Leaderboard (Composite benchmark rankings, inference speed data)
- GLM-5 on HuggingFace (zai-org) (Model card, official benchmark table)
- Digitimes: Huawei matches NVIDIA in China AI chip market (Jan 2026)