North Mini Code preview
North Mini Code logo
Coding Free (open-source, Apache 2.0)

North Mini Code

Cohere's open-source 30B/3B MoE coding model with 256K context, interleaved thinking, and strong SWE-Bench scores — all under Apache 2.0.

8.2
AI Score / 10
Visit North Mini Code

Overview

North Mini Code 1.0 is Cohere's first open-weights model built specifically for coding, released on June 9, 2026. It's a sparse Mixture-of-Experts (MoE) architecture with 30 billion total parameters but only 3 billion active per token — 128 experts with 8 activated at inference time. That MoE design is the core trick: you get performance that punches well above the 3B active-parameter weight class while keeping inference costs and hardware requirements dramatically lower than a dense 30B model.

The model is purpose-built for agentic software engineering workflows. It supports interleaved thinking — reasoning steps interspersed with tool calls — so it can plan a multi-file code change, execute terminal commands, inspect results, and iterate. On SWE-Bench Verified it scores 67.6% (resolved) and on SWE-Bench Pro it hits 40.2%, putting it in competitive range with models several times its active parameter count. The 256K token context window is large enough to hold substantial codebases in a single pass, and it can generate up to 64K tokens of output.

The Apache 2.0 license makes this genuinely interesting for teams that want to self-host a capable coding agent. You can run it via vLLM, SGLang, Ollama, or Docker, and 26 quantized variants are available for running on consumer hardware. The tradeoff is clear: this is a code-specialist model, not a general-purpose assistant. It won't write your marketing copy or summarize your meeting notes — it's laser-focused on code generation, terminal tasks, and agentic SWE workflows. For that specific use case, it's one of the strongest open-source options available.

Key features

Agentic Coding

Designed for autonomous software engineering workflows. Handles multi-step coding tasks — editing files, running commands, inspecting results, and iterating — using a SWE-Agent-style harness with tool-calling capabilities.

256K Context Window

Supports up to 256K tokens of context with 64K max output, allowing ingestion of large codebases, long file chains, and extended multi-turn agent sessions without truncation.

Interleaved Thinking

Generates explicit reasoning content alongside tool calls. The model thinks through its approach step-by-step before and between actions, improving reliability on complex multi-step tasks.

Tool Use

Native support for function calling via JSON schema. Works with bash, file editing, and custom tools — designed to operate within agentic frameworks like SWE-Agent and ReAct harnesses.

Pricing

Free tier: Completely free and open-source under Apache 2.0. You pay only for your own compute or LLM API hosting costs.

Open Source Free

Full model weights under Apache 2.0 — self-host via vLLM, SGLang, Ollama, or Docker. 26 quantized variants available.

Pros & cons

Pros

  • Strong SWE-Bench scores (67.6% Verified, 40.2% Pro) from only 3B active parameters — exceptional efficiency
  • Apache 2.0 license with 26 quantized variants makes self-hosting on consumer hardware viable
  • 256K context window with 64K output handles large codebases in a single pass
  • Interleaved thinking with native tool use is purpose-built for agentic coding workflows

Cons

  • ×Code-specialist only — not a general-purpose model, so don't expect strong performance on non-coding tasks
  • ×Requires your own GPU infrastructure or API hosting — no managed cloud endpoint from Cohere yet
  • ×Relatively new (June 2026) with limited third-party evaluations beyond the team's own benchmarks
  • ×MoE architecture needs compatible serving infrastructure (vLLM, SGLang) — not a simple drop-in for all frameworks

How it compares

← More Coding tools