Researcher Runs Qwen 397B Locally by Streaming Experts from Flash Storage
On March 18, 2026, Dan Woods demonstrated running a custom Qwen3.5-397B-A17B MoE model at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max, using techniques from Apple's 2023 "LLM in a flash" paper. The 209GB model (120GB after quantization) streams 2-bit quantized experts from SSD on demand while keeping 5.5GB of non-expert state resident in RAM; details on output quality remain thin.
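The core idea can be sketched in a few lines. This is an illustrative toy, not the researcher's code: all names, sizes, and the top-2 routing choice are assumptions, and `uint8` stands in for real 2-bit packed weights. The point is the mechanism itself: expert weights live in a memory-mapped file on disk, and only the experts the router selects for a given token are faulted into RAM.

```python
# Illustrative sketch (NOT the demonstrated implementation): stream MoE
# expert weights from disk on demand, keeping only shared state in RAM.
import os
import tempfile
import numpy as np

NUM_EXPERTS = 8    # toy scale; the real model has far more experts
EXPERT_DIM = 1024  # flattened parameter count per toy expert
DTYPE = np.uint8   # stand-in for 2-bit packed quantized weights

# 1) Write all expert weights to one file, as an SSD would hold them.
path = os.path.join(tempfile.mkdtemp(), "experts.bin")
rng = np.random.default_rng(0)
all_experts = rng.integers(0, 256, size=(NUM_EXPERTS, EXPERT_DIM), dtype=DTYPE)
all_experts.tofile(path)

# 2) Memory-map the file: no expert is resident until its pages are touched.
experts = np.memmap(path, dtype=DTYPE, mode="r",
                    shape=(NUM_EXPERTS, EXPERT_DIM))

def run_token(router_scores):
    """Fault in only the top-2 experts the router selects for this token."""
    top2 = np.argsort(router_scores)[-2:]
    # Copying these rows pulls just those experts' pages from disk.
    active = np.stack([np.array(experts[i], dtype=np.float32) for i in top2])
    return top2, active  # 'active' stands in for the real expert MLP weights

selected, weights = run_token(rng.random(NUM_EXPERTS))
print(sorted(selected.tolist()))
```

In a real system the page cache keeps hot experts resident between tokens, so repeated activations of the same expert avoid SSD reads; that caching behavior is what makes the streaming approach fast enough to be practical.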
Scoring Rationale
Practical, high-impact demonstration with reusable code, limited by single-source reporting and thin, unverified quality evaluations.
Sources
- Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally (simonwillison.net)


