Command A+
Cohere's open-source 218B Mixture-of-Experts LLM built for agentic coding workflows, multilingual tasks, and document processing — runs on as few as 2 H100s.
Overview
Command A+ is Cohere's flagship open-weight model, a 218-billion parameter Mixture-of-Experts architecture released under Apache 2.0 in May 2026. The MoE design activates only a subset of parameters per inference pass, which is how Cohere gets a model this large running on just two H100 GPUs — a significant hardware efficiency advantage over similarly-sized dense models. It's built from the ground up for agentic workflows: multi-step tool use, code generation, retrieval-augmented generation, and structured document processing.
What sets Command A+ apart from generic LLMs in the coding category is Cohere's focus on enterprise agentic use cases. The model is designed to chain tool calls, parse complex documents with mixed modalities (text, tables, charts), and operate across 23+ languages. For teams building AI agents that need to reason over codebases, process documentation, or orchestrate multi-step workflows, this is purpose-built rather than adapted after the fact.
The open-weight angle is the real differentiator. While GPT-4 and Claude are API-only, Command A+ can be self-hosted — critical for regulated industries or sovereign AI deployments. Quantized versions on Hugging Face make it accessible on smaller GPU setups. The trade-off: Cohere's ecosystem is thinner than OpenAI's or Anthropic's, and the model's raw conversational ability doesn't match the polish of ChatGPT or Claude for general chat use.
Key features
Agentic Coding
Purpose-built for multi-step agentic workflows: tool chaining, code generation, debugging, and structured output. Benchmarks show strong performance on agentic coding tasks compared to models in its class.
MoE Architecture
218B total parameters using Mixture-of-Experts, activating only a fraction per forward pass. This enables frontier-class capability while running on as few as 2 H100 GPUs — dramatically lower hardware requirements than dense models of similar quality.
Multilingual
Supports 23+ languages with strong performance across them, making it suitable for global enterprise deployments and sovereign AI initiatives where local language support is non-negotiable.
Document Processing
Multimodal document understanding handles mixed content including text, tables, charts, and structured data. Designed for RAG pipelines and enterprise knowledge extraction workflows.
Pricing
Free tier: Free API tier with rate limits for development; open weights downloadable from Hugging Face under Apache 2.0
| Plan | Price | What's included |
|---|---|---|
| Cohere API — Free | Free | Rate-limited access for prototyping and evaluation |
| Cohere API — Production | Check website for current pricing | Higher rate limits, production SLAs, enterprise support available |
| Self-hosted | Free (Apache 2.0) | Open weights on Hugging Face. Quantized versions available. Minimum 2x H100 GPUs for full model |
Rate-limited access for prototyping and evaluation
Higher rate limits, production SLAs, enterprise support available
Open weights on Hugging Face. Quantized versions available. Minimum 2x H100 GPUs for full model
Pros & cons
Pros
- ✓Open-source Apache 2.0 license allows self-hosting, fine-tuning, and full data sovereignty
- ✓218B MoE runs on just 2 H100s — exceptional hardware efficiency for a model of this capability
- ✓Strong agentic and tool-use benchmarks make it a serious option for AI agent builders
- ✓23+ language support positions it well for global and sovereign AI deployments
Cons
- ×Cohere's developer ecosystem is much smaller than OpenAI's or Anthropic's — fewer integrations and community resources
- ×General chat and creative writing quality trails behind ChatGPT and Claude
- ×Self-hosting still requires H100-class hardware — not accessible to hobbyists or small teams without cloud GPU budgets
- ×Production API pricing is not clearly published — requires contacting sales or checking the console


