North Mini Code
Cohere's open-source 30B/3B MoE coding model with 256K context, interleaved thinking, and strong SWE-Bench scores — all under Apache 2.0.
Overview
North Mini Code 1.0 is Cohere's first open-weights model built specifically for coding, released on June 9, 2026. It's a sparse Mixture-of-Experts (MoE) architecture with 30 billion total parameters but only 3 billion active per token — 128 experts with 8 activated at inference time. That MoE design is the core trick: you get performance that punches well above the 3B active-parameter weight class while keeping inference costs and hardware requirements dramatically lower than a dense 30B model.
The model is purpose-built for agentic software engineering workflows. It supports interleaved thinking — reasoning steps interspersed with tool calls — so it can plan a multi-file code change, execute terminal commands, inspect results, and iterate. On SWE-Bench Verified it scores 67.6% (resolved) and on SWE-Bench Pro it hits 40.2%, putting it in competitive range with models several times its active parameter count. The 256K token context window is large enough to hold substantial codebases in a single pass, and it can generate up to 64K tokens of output.
The Apache 2.0 license makes this genuinely interesting for teams that want to self-host a capable coding agent. You can run it via vLLM, SGLang, Ollama, or Docker, and 26 quantized variants are available for running on consumer hardware. The tradeoff is clear: this is a code-specialist model, not a general-purpose assistant. It won't write your marketing copy or summarize your meeting notes — it's laser-focused on code generation, terminal tasks, and agentic SWE workflows. For that specific use case, it's one of the strongest open-source options available.
Key features
Agentic Coding
Designed for autonomous software engineering workflows. Handles multi-step coding tasks — editing files, running commands, inspecting results, and iterating — using a SWE-Agent-style harness with tool-calling capabilities.
256K Context Window
Supports up to 256K tokens of context with 64K max output, allowing ingestion of large codebases, long file chains, and extended multi-turn agent sessions without truncation.
Interleaved Thinking
Generates explicit reasoning content alongside tool calls. The model thinks through its approach step-by-step before and between actions, improving reliability on complex multi-step tasks.
Tool Use
Native support for function calling via JSON schema. Works with bash, file editing, and custom tools — designed to operate within agentic frameworks like SWE-Agent and ReAct harnesses.
Pricing
Free tier: Completely free and open-source under Apache 2.0. You pay only for your own compute or LLM API hosting costs.
| Plan | Price | What's included |
|---|---|---|
| Open Source | Free | Full model weights under Apache 2.0 — self-host via vLLM, SGLang, Ollama, or Docker. 26 quantized variants available. |
Full model weights under Apache 2.0 — self-host via vLLM, SGLang, Ollama, or Docker. 26 quantized variants available.
Pros & cons
Pros
- ✓Strong SWE-Bench scores (67.6% Verified, 40.2% Pro) from only 3B active parameters — exceptional efficiency
- ✓Apache 2.0 license with 26 quantized variants makes self-hosting on consumer hardware viable
- ✓256K context window with 64K output handles large codebases in a single pass
- ✓Interleaved thinking with native tool use is purpose-built for agentic coding workflows
Cons
- ×Code-specialist only — not a general-purpose model, so don't expect strong performance on non-coding tasks
- ×Requires your own GPU infrastructure or API hosting — no managed cloud endpoint from Cohere yet
- ×Relatively new (June 2026) with limited third-party evaluations beyond the team's own benchmarks
- ×MoE architecture needs compatible serving infrastructure (vLLM, SGLang) — not a simple drop-in for all frameworks


