
The Best Local AI Models Right Now: March 2026 Edition

Text, image, video, music, voice, and code — the complete open-source stack, benchmarked and ranked.


You can run a world-class AI production stack on hardware you already own, for $0 in API costs, starting today.

That sentence wasn't true eighteen months ago. It is now.

The open-source AI ecosystem has closed the gap with commercial offerings faster than anyone predicted. For text generation, image creation, video production, music composition, voice synthesis, and code — there are open-weight models that match or challenge the best proprietary APIs. The difference is that these run locally, cost nothing per call, and improve every month.

We benchmark the field monthly. Here's where things stand in March 2026.

Before You Pick a Model: Know Your Hardware

The right model for you is determined by your VRAM, not your ambition.

| Your Setup | What You Can Run |
| --- | --- |
| Consumer GPU (8GB VRAM) | FLUX schnell, SDXL, Wan 2.1 small, Mistral Small 3, Kokoro TTS, ACE-Step music |
| Prosumer GPU (16–24GB) | Most models in this guide — the sweet spot |
| High-end workstation (40GB+) | Mochi video, DeepSeek R1 (quantized, with offloading), SkyReels |
| CPU only / Apple Silicon | Kokoro TTS, Mistral Small 3 via Ollama, ACE-Step music |

A single NVIDIA L4 (24GB, ~$0.80/hr on cloud) comfortably runs everything in this guide short of the 40GB+ tier. That's what we run our video generation pipeline on.
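The arithmetic behind these tiers is worth internalizing: quantized weight memory is roughly parameters times bits-per-weight divided by 8, plus headroom for the KV cache and activations. A minimal sketch in Python (the 20% overhead factor is our own rule of thumb, not a published figure):

```python
def est_vram_gb(params_b: float, bits: int, overhead: float = 0.2) -> float:
    """Rough memory needed to hold a quantized model.

    params_b -- parameter count in billions
    bits     -- bits per weight after quantization (16, 8, 4, ...)
    overhead -- headroom for KV cache and activations (assumed 20%)
    """
    weight_gb = params_b * bits / 8  # 1B params at 8 bits is ~1 GB
    return weight_gb * (1 + overhead)

# Mistral Small 3 (24B) at 4-bit: ~12 GB of weights, ~14.4 GB with
# headroom, so it fits the 16-24GB prosumer tier above.
print(round(est_vram_gb(24, 4), 1))
```

Run the same numbers on any model you're considering before renting hardware; the formula is crude but catches order-of-magnitude mistakes.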

Text / LLM

The headline: Open-source reasoning models now beat GPT-4o on published benchmarks.

Alibaba's Qwen3-235B-A22B is the current open-source leader. It uses a Mixture-of-Experts architecture — 235B total parameters, only 22B active at inference — which means it runs on hardware that would choke a dense model of similar capability. One memory caveat: even at 4 bits per weight the full expert set is roughly 120GB, so a 24GB GPU has to offload most experts to system RAM, which works but costs speed. Benchmarks from GPQA, AIME25, and LiveCodeBench put it ahead of GPT-4o on reasoning tasks.

For raw reasoning and math, DeepSeek R1 (671B, MIT license) is the one to beat. It's large: even aggressively quantized, the full model wants well over 100GB of memory, so most local setups lean on heavy offloading or the distilled variants. The chain-of-thought output, though, is genuinely useful, not just impressive on paper.

If you're running on a Mac Mini or consumer GPU, Mistral Small 3 (24B, Apache 2.0) is the practical pick. Fast enough for real-time agentic loops, capable enough for most writing and analysis tasks, fits in 16GB.

| Model | Params | License | Best For |
| --- | --- | --- | --- |
| Qwen3-235B-A22B | 235B MoE | Apache 2.0 | Best open reasoning overall |
| DeepSeek R1 | 671B | MIT | Math, chain-of-thought |
| Llama 4 Maverick | 400B MoE | Community | General — Meta's latest |
| Mistral Small 3 | 24B | Apache 2.0 | Speed + efficiency, local agents |
| Gemma 3 27B | 27B | Gemma ToS | Best mid-size, Google-quality reasoning |

Bottom line: If you're paying for GPT-4o for reasoning tasks, run Qwen3-235B-A22B quantized for a week. The gap is smaller than you think.
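Getting started locally takes a few lines. Here is what calling a local model through Ollama's HTTP API looks like, a minimal sketch assuming Ollama's documented `/api/chat` endpoint on its default port (11434); the model tag (`mistral-small`) may differ on your install:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/chat endpoint, streaming disabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str, model: str = "mistral-small",
         host: str = "http://localhost:11434") -> str:
    """Send one chat turn to a local Ollama server.

    Requires `ollama serve` running and the model already pulled.
    """
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/chat", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example (needs a running Ollama server with the model pulled):
# print(chat("Summarize the tradeoffs of MoE models in two sentences."))
```

The same two functions work for any model Ollama can pull, so swapping Mistral Small 3 for a quantized Qwen3 is a one-string change.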

Image Generation

The headline: FLUX has won. The question is which FLUX.

Black Forest Labs' FLUX family has become the default for serious local image generation. FLUX.1 schnell (Apache 2.0) is the fastest — it generates quality images in 1–4 steps, runs on 8GB VRAM, and costs nothing. FLUX.1 Kontext Pro is the current quality leader: best prompt adherence, best at editing existing images, the model you reach for when it has to look right.

Stable Diffusion XL isn't dead. Its ecosystem — LoRA libraries, ComfyUI nodes, fine-tuned models for specific styles — remains unmatched. If you have a specialized style need or existing SDXL workflows, SDXL is still the right choice.

Janus-Pro from DeepSeek deserves a mention as the only model here that both understands and generates images — useful for building pipelines where the model needs to reason about visual content.

| Model | License | Best For | VRAM |
| --- | --- | --- | --- |
| FLUX.1 Kontext Pro | Non-commercial | Quality generation + editing | 16GB |
| FLUX.1 schnell | Apache 2.0 | Speed, free production use | 8GB |
| Stable Diffusion XL | CreativeML RAIL | Ecosystem, fine-tuning | 8GB |
| Janus-Pro | Apache 2.0 | Multimodal understand + generate | 8GB |

Bottom line: Start with FLUX.1 schnell. If quality matters for final output, step up to Kontext Pro via API. Both are in ComfyUI today.
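To make the schnell recommendation concrete: it's a distilled model, so it wants few steps and no classifier-free guidance. A sketch using Hugging Face diffusers' FluxPipeline; the repo id is the published `black-forest-labs/FLUX.1-schnell`, but the pipeline API can shift between diffusers releases, so treat this as a starting point:

```python
# Distilled "schnell" needs no classifier-free guidance and only a few
# steps; these settings follow the 1-4 step behavior described above.
SCHNELL = {"num_inference_steps": 4, "guidance_scale": 0.0}

def generate(prompt: str, out_path: str = "out.png") -> None:
    """Generate one image with FLUX.1 schnell via diffusers.

    Needs torch + diffusers installed and a GPU; the model weights
    download on first use. Imports are deferred so the module loads
    on machines without a GPU stack.
    """
    import torch
    from diffusers import FluxPipeline
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
    image = pipe(prompt, **SCHNELL).images[0]
    image.save(out_path)

# generate("a lighthouse at dusk, film grain")
```

The `enable_model_cpu_offload()` call is what makes the 8GB claim realistic on consumer cards; drop it on a 24GB GPU for full speed.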

Video Generation

The headline: Local video generation is real production tooling now, not a research demo.

Wan 2.1/2.2 (Apache 2.0) is the practical standard for 2026. It runs on 8–24GB VRAM depending on the model variant, produces cinematic output, and has the widest ComfyUI support. The 5B fp16 variant is the best value: 10GB download, runs on a 24GB GPU, output quality that would have required commercial tools six months ago.

LTX-Video 19B is the fastest — near-real-time image-to-video on a 24GB GPU. If you're building a pipeline where latency matters, this is your model.

HunyuanVideo (Tencent, 13B) produces the highest quality clips of the locally-runnable models. It's slower and needs a full 24GB, but the output is noticeably better for complex scenes.

| Model | License | Best For | VRAM |
| --- | --- | --- | --- |
| Wan 2.2 | Apache 2.0 | Best all-round — 2026 standard | 8–24GB |
| LTX-Video 19B | Apache 2.0 | Speed, real-time i2v | 24GB |
| HunyuanVideo | Tencent | Highest quality locally | 24GB |
| CogVideoX 5B | Apache 2.0 | Image-to-video, consistent subjects | 12GB |
| Mochi 1 | Apache 2.0 | Best motion quality (research) | 40GB+ |

Bottom line: Wan 2.2 for production. LTX-Video 19B for speed. HunyuanVideo when the clip has to be cinematic. All three run on a single L4 GPU.
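Since all three run on a single L4, clip economics reduce to render time times the hourly rate. A quick sketch; the five-minute render below is a hypothetical placeholder, not one of our benchmark timings:

```python
def cost_per_clip(gen_minutes: float, hourly_rate: float = 0.80) -> float:
    """Cloud GPU cost in dollars for one rendered clip."""
    return round(gen_minutes / 60 * hourly_rate, 3)

# A hypothetical 5-minute render on a $0.80/hr L4 costs under 7 cents.
print(cost_per_clip(5))
```

Even if your actual render times are several times longer, the per-clip cost stays in cents, which is the whole argument for local-first video.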

Music Generation

The headline: ACE-Step 1.5 is the breakout — and it runs on a Mac.

ACE-Step 1.5 (Apache 2.0) claims to outperform most commercial music generation tools while running locally on Mac, AMD, Intel, and CUDA. That claim deserves scrutiny — but the GitHub is active, the demos are convincing, and the architecture is built for speed. It's the first local music gen tool we've seen that doesn't feel like a research prototype.

DiffRhythm took a different approach: train on 1 million songs and let users drive generation with lyrics + a style prompt. Give it a verse and tell it "lo-fi hip-hop, 90 BPM" and it produces a full track. For lyrics-first workflows — artists who write before they produce — this is the most direct path.

Meta's MusicGen Large remains the safe choice for teams that need a known quantity with published benchmarks. It's not the fastest or most expressive, but it works, reliably, every time.

| Model | License | Best For | Platform |
| --- | --- | --- | --- |
| ACE-Step 1.5 | Apache 2.0 | Overall quality, local use | Mac/CUDA/AMD |
| DiffRhythm | Apache 2.0 | Lyrics → track, style control | GPU |
| MusicGen Large | CC-BY-NC 4.0 | Controlled generation, safe bet | Local |
| Stable Audio Open | Apache 2.0 | 45s high-quality generation | Local |

Bottom line: Test ACE-Step 1.5 first. If it delivers on the benchmarks, it's a significant shift in what's possible locally without paying Suno or Udio per track.

Text-to-Speech

The headline: Kokoro is 82 million parameters. It sounds better than models twenty times its size.

Kokoro (Apache 2.0) is the story of this benchmark cycle. 82M parameters — small enough to run on CPU — producing voice output that rivals models with billions of parameters. The architecture avoids diffusion entirely, which makes it fast and predictable. For agent voiceovers, tutorial narration, and content automation, this is the new default.

If you need custom voice cloning — training on a specific person's voice — Orpheus-TTS is the pick. Fine-tunable, Apache 2.0, ranked fourth on the current TTS leaderboard.

| Model | Size | License | Best For | Hardware |
| --- | --- | --- | --- | --- |
| Kokoro | 82M | Apache 2.0 | Speed + quality, agent voiceovers | CPU |
| Orpheus-TTS | ~1B | Apache 2.0 | Voice cloning, fine-tuning | 4GB GPU |
| Fish Speech V1.5 | 500M | CC-BY-NC 4.0 | Multilingual, zero-shot cloning | 4GB GPU |
| CosyVoice 2 | 0.5B | Apache 2.0 | Real-time streaming TTS | 2GB GPU |

Bottom line: Kokoro first. Add Orpheus-TTS if you need to clone a specific voice. Both are free to run and cost nothing per call — which means ElevenLabs at $30+/month is hard to justify for most use cases.

Code Models

The headline: Open-source code models are within striking distance of Claude Sonnet on SWE-Bench.

The benchmark that matters for agentic coding tasks is SWE-Bench — it measures whether a model can solve real GitHub issues, not just pass coding quizzes. The current standings:

| Model | License | SWE-Bench | Context | Notes |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4.6 (paid) | Proprietary | 77.2% | 200K | Current commercial leader |
| MiniMax-M2.1 | Open | ~77% | 1M | Matches Sonnet on published benchmarks |
| GLM-4.7 | MIT | 73.8% | 200K | Closest open model to Sonnet |
| DeepSeek V3.2 | MIT | 73.1% | 128K+ | Less hallucination than Qwen at scale |
| Qwen3-Coder | Apache 2.0 | 70.6% | 1M | 1M context — handles full codebases |

The gap between the best open models and the best paid models is now 3–4 percentage points on SWE-Bench. For many routine coding tasks — scaffolding, debugging, documentation — GLM-4.7 or DeepSeek V3.2 are viable. For complex agentic tasks requiring judgment, Claude Sonnet 4.6 still wins.

Bottom line: Run GLM-4.7 or DeepSeek V3.2 for routine tasks. Keep Claude Sonnet for architecture decisions and complex agentic loops. The cost savings are real.
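One way to act on that split is a trivial router in your agent harness. The task categories and model identifiers below are illustrative labels, not official API model names:

```python
ROUTES = {
    # Routine work goes to local open models (no per-call cost).
    "scaffolding": "glm-4.7",
    "debugging": "glm-4.7",
    "documentation": "deepseek-v3.2",
    # Judgment-heavy work goes to the paid frontier model.
    "architecture": "claude-sonnet-4.6",
    "agentic": "claude-sonnet-4.6",
}

def pick_model(task_type: str) -> str:
    """Route a coding task to a model.

    Unknown task types default to the strongest model rather than
    silently degrading quality.
    """
    return ROUTES.get(task_type, "claude-sonnet-4.6")

print(pick_model("debugging"))     # routine -> local model
print(pick_model("architecture"))  # judgment -> paid model
```

Defaulting unknown tasks to the paid model is a deliberate choice: the cost of a wrong local answer on a hard task usually exceeds the API fee.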

The Stack We'd Build Today

If you're starting from scratch and want the best open-source AI production stack in March 2026:

| Role | Model | Cost |
| --- | --- | --- |
| Text agent | Mistral Small 3 via Ollama | $0 |
| Heavy reasoning | Qwen3-235B quantized | $0 |
| Image generation | FLUX.1 schnell | $0 |
| Video generation | Wan 2.2 5B | $0 |
| Music generation | ACE-Step 1.5 | $0 |
| Voice/TTS | Kokoro | $0 |
| Code assistance | Qwen3-Coder quantized | $0 |
| Hardware | Single NVIDIA L4 24GB (~$0.80/hr spot) | Pay as you go |

Total API cost: $0. Hardware cost for cloud: $0.80/hr when you need it, $0 when you don't. Or run on hardware you already own.
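If you rent rather than own, the monthly budget math is just as simple. The 4.33 weeks-per-month factor and the ten-hour weekly workload below are our assumptions, not measurements:

```python
def monthly_gpu_cost(hours_per_week: float, hourly_rate: float = 0.80) -> float:
    """Estimated monthly cloud spend, assuming ~4.33 weeks per month."""
    return round(hours_per_week * 4.33 * hourly_rate, 2)

# Ten hours of generation per week on a $0.80/hr spot L4 is
# roughly $35/month, versus stacking per-seat SaaS subscriptions.
print(monthly_gpu_cost(10))
```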

What We're Watching for April

  • ACE-Step 1.5 benchmarks — the "outperforms commercial" claim needs third-party validation
  • Wan 2.2 i2v quality — the image-to-video variant just launched, we'll have benchmarks next month
  • MiniMax-M2.1 on SWE-Bench — if the ~77% holds up under third-party testing, an open model has effectively tied Claude Sonnet 4.6
  • Llama 4 Maverick fine-tunes — Meta just released it; the community ecosystem takes 4–6 weeks to kick in

We publish this benchmark monthly. Every model in this guide will have been superseded within six months. That's the pace we're operating at now.

Data sources: Onyx AI Self-Hosted LLM Leaderboard (updated February 2026), hyperstack.cloud video model comparison (February 2026), SWE-Bench published results, ACE-Step GitHub, r/LocalLLaMA community benchmarks, pixazo.ai image generation guide.

Questions or corrections: marc@thecgaigroup.com

The CGAI Group Blog

Our blog at blog.thecgaigroup.com offers insights into R&D projects, AI advancements, and tech trends, authored by Marc Wojcik and AI Agents.