Cohere North Mini Code: Open Agentic Coding Model
Cohere's North Mini Code is a 30B-parameter, 128-expert MoE model with 3B parameters active per token, targeting agentic coding with Apache 2.0 weights.
30 Billion Parameters, 3 Billion Active
Cohere released North Mini Code on June 9, 2026, and it represents a very specific bet: that agentic coding models do not need to be massive to be effective. The headline architecture is a 30-billion-parameter Mixture-of-Experts model with only 3 billion parameters active per token. That 10:1 ratio is achieved through 128 expert FFN blocks (using SwiGLU activation), with 8 experts activated per token via a sigmoid-based top-k router (per Cohere's technical blog on Hugging Face).
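To make the routing step concrete, here is a minimal sketch of sigmoid-based top-k expert selection using the 128-expert, 8-active figures above. It is an illustration under assumptions, not Cohere's implementation: the function names are hypothetical, and renormalizing the selected scores so they sum to 1 is an assumption of this sketch that the source does not confirm.

```python
import math
import random

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # from the article


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def route(logits: list[float], k: int = TOP_K) -> dict[int, float]:
    """Pick the k highest-scoring experts and return {expert_index: weight}.

    Each expert is scored independently with a sigmoid rather than a softmax
    across experts. Renormalizing the selected scores is an assumption made
    for this sketch.
    """
    scores = [sigmoid(z) for z in logits]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return {i: scores[i] / total for i in top}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical router logits for a single token.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    weights = route(logits)
    print(len(weights), "of", NUM_EXPERTS, "experts active for this token")
```

Only the selected experts' FFN blocks run for a given token, which is why per-token compute tracks the active parameter count rather than the total.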
For context, that active parameter count puts North Mini Code in the same inference-cost bracket as models like Gemma 4 (26B) or Devstral Small 2 (24B) — but the total parameter pool gives it knowledge capacity far beyond what a 3B dense model could store. The entire model runs on a single GPU for many configurations, which matters enormously for the deployment scenarios Cohere is targeting.
The model ships under an Apache 2.0 license with weights on Hugging Face in both bf16 and fp8 formats. No usage restrictions, no registration wall, no commercial-use carveouts. This is genuinely open in the way that matters for enterprise adoption.
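As a rough check on the single-GPU claim, the weight footprint of the two published formats can be estimated from the parameter count alone. The sketch below assumes 2 bytes per parameter for bf16 and 1 byte for fp8, and it counts weights only: KV cache, activations, and runtime overhead come on top. Note that all experts generally need to be resident in memory even though only 3B parameters are active per token.

```python
TOTAL_PARAMS = 30e9  # from the article

# Storage cost per parameter for each published format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gb(num_params: float, fmt: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: ~{weight_gb(TOTAL_PARAMS, fmt):.0f} GB of weights")
# bf16: ~60 GB of weights
# fp8: ~30 GB of weights
```

By this estimate the fp8 checkpoint leaves far more headroom for long-context KV cache on a single accelerator than the bf16 one does.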
Architecture: Attention Worth Understanding
Beyond the MoE design, North Mini Code uses an interleaved attention pattern. It alternates between sliding-window self-attention (with RoPE positional embeddings) and global attention (without positional embeddings) at a 3:1 ratio, meaning three sliding-window layers for each global layer. The sliding-window layers handle local context efficiently, while the global layers provide full-sequence reasoning without the cost of attending across the entire 128K context window at every layer.
Supervised fine-tuning ran in two stages: first at 64K context, then extended to 128K. The second stage used 4.5 billion tokens of high-quality agentic and reasoning samples, with all code verified as executable or correct (per the Hugging Face technical post). SFT was followed by reinforcement learning with verifiable rewards (RLVR) using a custom algorithm called CISPO, trained across containerized environments spanning roughly 5,000 unique repositories and 70,000+ verifiable tasks.
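The interleaving can be pictured as a per-layer schedule. The sketch below is illustrative only: the layer count is hypothetical, and placing the global layer last in each group of four is an assumption, since the source gives the 3:1 ratio but not the exact ordering.

```python
from typing import NamedTuple


class AttentionLayer(NamedTuple):
    kind: str        # "sliding_window" or "global"
    uses_rope: bool  # per the article: RoPE on sliding-window layers only


def layer_schedule(num_layers: int, local_per_global: int = 3) -> list[AttentionLayer]:
    """Build a schedule with local_per_global sliding-window layers per global layer.

    The global layer closes each group; that ordering is an assumption.
    """
    period = local_per_global + 1
    schedule = []
    for i in range(num_layers):
        is_global = (i + 1) % period == 0
        kind = "global" if is_global else "sliding_window"
        schedule.append(AttentionLayer(kind, uses_rope=not is_global))
    return schedule


# Hypothetical 8-layer stack, just to show the pattern.
for index, layer in enumerate(layer_schedule(8)):
    print(index, layer.kind, "RoPE" if layer.uses_rope else "no positional embedding")
```

With this layout only one layer in four pays the full-sequence attention cost, while the other three attend within a fixed window.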
My read: The two-stage SFT plus RLVR pipeline is becoming the standard playbook for coding models in 2026. What distinguishes North Mini Code is the scale of the RL environment — 5K repos and 70K tasks is a serious verification infrastructure, not a toy setup.
Benchmark Numbers: What the Scores Actually Say
Cohere published detailed benchmark results across several agentic coding evaluations (all numbers from the Hugging Face technical blog, evaluated at temperature=1.0, top_p=0.95 across 3 seeds):
| Benchmark | North Mini Code Score | Notes |
|---|---|---|
| SWE-Bench Verified | ~83% pass@10 | 80.2% from SFT + 3.0% from RLVR |
| Terminal-Bench v2 | ~63% pass@1 | 55.1% from SFT + 7.9% from RLVR |
| Terminal-Bench (mini-SWE-Agent) | 61.0% pass@1 | Cross-harness evaluation |
| Artificial Analysis Coding Index | 33.4 | Outperforms models up to 4x its total size |
The Artificial Analysis Coding Index score of 33.4 is the number that tells the competitive story most clearly. Per Cohere's published comparisons, North Mini Code outperforms Qwen3.5 (35B), Gemma 4 (26B), Devstral Small 2 (24B), Nemotron 3 Super (120B), Mistral Small 4 (119B), and Devstral 2 (123B) on that index. A 3B-active model outperforming 120B-class models on a coding benchmark is a strong efficiency claim.
The SWE-Bench Verified score deserves a caveat: 83% pass@10 means at least one of 10 attempts produces a working solution, not that the first attempt does. Pass@1 numbers would be lower and more representative of real-world single-shot usage. Cohere hasn't prominently featured pass@1 on SWE-Bench Verified in its materials.
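The gap between pass@1 and pass@10 is easy to see with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples per task and c the number that pass. Whether Cohere computed its figures this way is not stated in the source, and the sample counts below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 3 of 10 sampled attempts pass the tests.
print(f"pass@1  = {pass_at_k(10, 3, 1):.2f}")
print(f"pass@10 = {pass_at_k(10, 3, 10):.2f}")
```

A task that is solved in only 3 of 10 attempts counts as fully solved under pass@10 but contributes roughly 0.3 to pass@1, which is why the two metrics can diverge sharply on the same model.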
Built for Multi-Harness Agentic Work
The most technically interesting aspect of North Mini Code isn't any single benchmark score — it's the cross-harness robustness. Cohere trained the model across multiple agentic scaffolds simultaneously:
- SWE-Agent: Rich CLI with bash, str_replace_editor, and submit tools — the standard SWE-Bench harness.
- mini-SWE-Agent: Stripped down to a single bash tool. Forces the model to work with minimal tooling.
- OpenCode: Fine-grained typed tools including edit, grep, todowrite, and task management. Adding just 6% cross-harness data from OpenCode improved performance by 10%.
- Terminus 2: Plain-text chat format for Terminal-Bench. The model generalized to this format with less than 20% plain-text training data.
This matters because real-world agentic coding setups vary wildly. One team's agent framework looks nothing like another's. A model that only performs well within the exact harness it was trained on is fragile in production. Cohere's approach of training across multiple scaffolds and then demonstrating cross-harness transfer is a meaningful engineering contribution.
The RLVR stage also produced measurable behavioral improvements beyond raw scores: shorter trajectories (fewer wasted steps), fewer invalid tool calls, reduced repetitive looping, and more reliable solution submission. These are the practical quality-of-life improvements that determine whether a coding agent is actually usable versus just benchmark-impressive.
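One way to see why harness differences matter is that the same tool call can be valid in one scaffold and invalid in another. The sketch below uses only the tool names listed above; the dictionary keys and function names are hypothetical, OpenCode exposes more tools than the three named here, and Terminus 2 is left out because it uses a plain-text format rather than typed tools.

```python
# Tool names as listed in the article; the harness keys are hypothetical identifiers.
HARNESS_TOOLS: dict[str, frozenset[str]] = {
    "swe-agent": frozenset({"bash", "str_replace_editor", "submit"}),
    "mini-swe-agent": frozenset({"bash"}),
    "opencode": frozenset({"edit", "grep", "todowrite"}),  # partial list
}


def is_valid_tool_call(harness: str, tool: str) -> bool:
    """Return True if the harness exposes a tool with this name."""
    return tool in HARNESS_TOOLS.get(harness, frozenset())


def invalid_call_rate(harness: str, calls: list[str]) -> float:
    """Fraction of calls in a trajectory that name a tool the harness lacks."""
    if not calls:
        return 0.0
    invalid = sum(1 for tool in calls if not is_valid_tool_call(harness, tool))
    return invalid / len(calls)


# Hypothetical trajectory: tool habits learned in one harness leak into another.
trajectory = ["bash", "grep", "bash", "submit"]
print(invalid_call_rate("mini-swe-agent", trajectory))  # 0.5
print(invalid_call_rate("swe-agent", trajectory))       # 0.25
```

A metric like this is one plausible way to track the "fewer invalid tool calls" improvement, though the source does not say how Cohere measured it.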
The Sovereign AI Angle
North Mini Code launched on June 9. Three days later, on June 12, US export controls forced Anthropic to disable Fable 5 and Mythos 5 for users in restricted jurisdictions. The timing wasn't planned, but it created a window that Cohere is clearly aware of.
Cohere has built its enterprise business around sovereign deployment — running AI models on-premises or within national cloud boundaries, outside the reach of US export controls. An Apache 2.0 model with open weights that can run on a single GPU is exactly the kind of offering that becomes more attractive when frontier closed models suddenly become unavailable in certain markets.
Per reporting from StartupFortune, Cohere has been experiencing accelerated enterprise inbound interest in the wake of the Anthropic export restrictions, with the company's leadership publicly noting the uptick. Whether this translates to sustained customer acquisition or a temporary spike depends on how long the export restrictions persist and whether Cohere can deliver enterprise-grade support at the scale these customers require.
The honest take: Export controls are creating a structural advantage for open-weight model providers. Every time a frontier closed model gets pulled from a market, the case for self-hosted open models gets stronger. Cohere's positioning here isn't opportunistic — it's the thesis they've been building toward for years. North Mini Code is just the latest proof point.
How It Compares to the Open Coding Model Field
North Mini Code enters a crowded space. Here's where it sits relative to other open-weight models targeting code:
| Model | Total Params | Active Params | Architecture | License |
|---|---|---|---|---|
| North Mini Code | 30B | 3B | 128-expert MoE | Apache 2.0 |
| Devstral Small 2 | 24B | 24B (dense) | Transformer | Mistral Research |
| Devstral 2 | 123B | ~22B | MoE | Mistral Research |
| Nemotron 3 Super | 120B | ~17B | MoE | NVIDIA Open |
| Qwen3.5 Coder | 35B | 35B (dense) | Transformer | Apache 2.0 |
North Mini Code's unique position is the combination of the smallest active parameter count with competitive benchmark performance. If Cohere's Artificial Analysis numbers hold up under independent evaluation, it's the most efficient open coding model available right now in terms of performance-per-active-parameter.
The Apache 2.0 license is also a genuine differentiator against Mistral's research licenses, which carry commercial-use restrictions. For enterprises building proprietary coding agents, license terms matter as much as benchmark scores.
What's Missing and What to Watch
A few open questions that Cohere hasn't fully addressed:
- Pass@1 on SWE-Bench Verified: The headline 83% is pass@10. Real-world agents typically get one or two shots, not ten. Independent pass@1 evaluations will tell a more honest story.
- Long-context coding performance: The model supports 128K context, but Cohere's published benchmarks focus on standard-length tasks. How it handles large codebases or long agent trajectories at the edges of that context window is unclear.
- Cohere API pricing: The model is available through Cohere's API, but specific per-token pricing for North Mini Code hasn't been prominently published yet. For teams evaluating self-hosted vs. API access, this is a key variable.
- Community adoption: Open weights are necessary but not sufficient. Whether the model gets integrated into popular agent frameworks (LangChain, CrewAI, AutoGen) and IDE tools will determine its practical reach beyond Cohere's own ecosystem.
The Bottom Line
North Mini Code is Cohere's first model built specifically for developers, and they chose an interesting entry point: not the biggest model, not the highest benchmark score, but the most deployable agentic coding model with genuinely open licensing. The 30B/3B MoE architecture means single-GPU deployment is realistic. The Apache 2.0 license means no legal friction for commercial use. The cross-harness training means it should generalize across different agent frameworks rather than only working in the exact setup it was trained on.
Whether North Mini Code becomes the default open coding model depends on independent benchmarks and community adoption over the coming weeks. But the timing, landing right as export controls remind everyone why open weights matter, gives it a tailwind that technical merit alone wouldn't provide. For teams that need a self-hosted, commercially usable, agentic coding model they can actually run without a GPU cluster, this is the most compelling option available today.