Researcher Runs Qwen 397B Locally by Streaming Experts from Flash Storage
On March 18, 2026, Dan Woods demonstrated running a custom Qwen3.5-397B-A17B MoE model at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max, using techniques from Apple's 2023 "LLM in a flash" paper. The 209GB model (120GB after quantization) streams 2-bit quantized experts from SSD on demand while keeping 5.5GB of non-expert state resident in RAM; details on output quality remain thin.
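The core idea can be sketched in a few lines. This is an illustrative toy, not the researcher's code: all names, sizes, and the top-2 routing choice are assumptions, and `uint8` stands in for real 2-bit packed weights. The point is the mechanism itself: expert weights live in a memory-mapped file on disk, and only the experts the router selects for a given token are faulted into RAM.

```python
# Illustrative sketch (NOT the demonstrated implementation): stream MoE
# expert weights from disk on demand, keeping only shared state in RAM.
import os
import tempfile
import numpy as np

NUM_EXPERTS = 8    # toy scale; the real model has far more experts
EXPERT_DIM = 1024  # flattened parameter count per toy expert
DTYPE = np.uint8   # stand-in for 2-bit packed quantized weights

# 1) Write all expert weights to one file, as an SSD would hold them.
path = os.path.join(tempfile.mkdtemp(), "experts.bin")
rng = np.random.default_rng(0)
all_experts = rng.integers(0, 256, size=(NUM_EXPERTS, EXPERT_DIM), dtype=DTYPE)
all_experts.tofile(path)

# 2) Memory-map the file: no expert is resident until its pages are touched.
experts = np.memmap(path, dtype=DTYPE, mode="r",
                    shape=(NUM_EXPERTS, EXPERT_DIM))

def run_token(router_scores):
    """Fault in only the top-2 experts the router selects for this token."""
    top2 = np.argsort(router_scores)[-2:]
    # Copying these rows pulls just those experts' pages from disk.
    active = np.stack([np.array(experts[i], dtype=np.float32) for i in top2])
    return top2, active  # 'active' stands in for the real expert MLP weights

selected, weights = run_token(rng.random(NUM_EXPERTS))
print(sorted(selected.tolist()))
```

In a real system the page cache keeps hot experts resident between tokens, so repeated activations of the same expert avoid SSD reads; that caching behavior is what makes the streaming approach fast enough to be practical.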
Scoring Rationale
Practical, high-impact demonstration with reusable code, limited by single-source reporting and thin, unverified quality evaluations.
Sources
- Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally (simonwillison.net)


