
Meituan LongCat-2.0: 1.6T Open Model Tops Coding

Meituan open-sourced LongCat-2.0, a 1.6-trillion-parameter MoE coding model trained on Chinese chips. What the specs and benchmarks actually show.

The AI Dude · July 5, 2026 · 7 min read

The interesting part of the LongCat-2.0 launch isn't the parameter count, though 1.6 trillion is a big number. It's the sentence buried in the announcement: trained entirely on Chinese chips. Meituan — yes, the food-delivery giant — open-sourced a near-frontier agentic coding model in early July 2026, put it under a permissive license, and, per VentureBeat's reporting, watched it climb to the top of OpenRouter's usage charts within days. If that holds up, it's the clearest signal yet that the US export-control moat around frontier training is narrower than it looked six months ago.

Let me separate what's confirmed from what's hype, because a launch like this generates a lot of both.

What LongCat-2.0 actually is

According to VentureBeat's coverage and the official LongCat release page, LongCat-2.0 is a Mixture-of-Experts (MoE) model with roughly 1.6 trillion total parameters. Like every large MoE, only a fraction of those parameters activate on any given token. That's the whole point of the architecture, and it's why a 1.6T model can serve inference at a per-token compute cost closer to that of a much smaller dense model. The headline number is the capacity ceiling, not the compute-per-token bill.
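To make the total-versus-active distinction concrete, here is a minimal sketch of the arithmetic. Meituan's active-parameter count and expert layout aren't given in the coverage cited here, so every number below (shared parameters, expert size, expert count, top-k) is a hypothetical value picked only so the total lands near 1.6T. The `MoEConfig` class is illustrative too, not anything from the release.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Toy description of a Mixture-of-Experts model (all values hypothetical)."""
    shared_params: float      # attention, embeddings, etc.: used on every token
    params_per_expert: float  # parameters in one expert, summed across layers
    num_experts: int          # experts the router can choose from
    top_k: int                # experts actually selected per token

    def total_params(self) -> float:
        # Capacity ceiling: every expert counts, whether or not it fires.
        return self.shared_params + self.params_per_expert * self.num_experts

    def active_params(self) -> float:
        # Per-token compute: shared layers plus only the selected experts.
        return self.shared_params + self.params_per_expert * self.top_k


# Illustrative numbers only, NOT LongCat-2.0's real configuration.
cfg = MoEConfig(shared_params=40e9, params_per_expert=6.1e9, num_experts=256, top_k=8)
total = cfg.total_params()
active = cfg.active_params()
print(f"total:  {total / 1e12:.2f}T parameters")
print(f"active: {active / 1e9:.1f}B parameters ({active / total:.1%} of total)")
```

With these made-up inputs, roughly 89B of 1.6T parameters do work on each token, about 5.5% of the total. The real ratio depends on numbers Meituan would have to confirm.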

The pitch is squarely aimed at agentic coding: multi-step tasks where the model reads a repo, plans, edits files, runs tools, and iterates — not single-shot autocomplete. Meituan positions it against the models that own that workflow today, and the two specs it leads with are the ones that matter most for agents:

~1 million token context window — enough to hold a large codebase, dependency tree, and a long tool-call history in a single session without aggressive retrieval juggling.
Strong SWE-bench performance — the benchmark that tries to measure whether a model can resolve real GitHub issues end to end, which is the closest public proxy we have for "can this thing actually do agentic software work."

My read: the context window is the underrated spec here. A million tokens changes how you architect an agent — you can lean less on brittle retrieval and let the model keep more state in-window. That's a usability win that doesn't show up on a leaderboard.
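If you want a quick sanity check on whether your own codebase would fit in-window, a rough sketch looks like this. The 4-characters-per-token ratio is a common rule of thumb, not LongCat-2.0's actual tokenizer, and the 200,000-token reserve for tool-call history is an assumption of mine. Both function names are hypothetical.

```python
import os

CHARS_PER_TOKEN = 4          # rough heuristic; real tokenizers vary by language and content
CONTEXT_WINDOW = 1_000_000   # the ~1M-token figure reported for LongCat-2.0


def estimate_repo_tokens(root: str, extensions: tuple[str, ...] = (".py", ".md")) -> int:
    """Approximate the token count of matching files under root from their character count."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes so one odd file doesn't abort the scan.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total_chars += len(fh.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_window(repo_tokens: int, reserved_for_history: int = 200_000) -> bool:
    """True if the repo fits alongside a budget reserved for tool calls and output."""
    return repo_tokens + reserved_for_history <= CONTEXT_WINDOW


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"~{tokens:,} tokens; fits in window: {fits_in_window(tokens)}")
```

Treat the result as an order-of-magnitude estimate. If it comes back anywhere near the limit, count with the model's real tokenizer before you drop retrieval entirely.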

The benchmark claim — and the honest caveat

Meituan's framing is that LongCat-2.0 is "near-frontier" on coding. The evidence being circulated is its SWE-bench results plus its OpenRouter ranking, where usage is a real-world signal — developers routing production traffic to a model tells you more than a marketing chart does.

But here's the caveat I'd hold onto: self-reported benchmark numbers on launch day are a starting point, not a verdict. We've watched a string of open-weight releases this year (GLM-5.2 from Z.ai and DeepSeek's V4 line in China, Mistral's open mid-size models on the Western side) all claim leadership on some coding metric at launch. Some hold up under independent testing; some quietly slide once third parties run the harness themselves. Until Artificial Analysis, the SWE-bench maintainers, or a credible independent reviewer publishes numbers, treat "tops coding benchmarks" as Meituan's claim, not established fact.

The OpenRouter ranking is the number I'd watch. Benchmarks can be gamed or cherry-picked; sustained paid-inference volume is much harder to fake.

Trained on Chinese chips — why that's the real headline

The specs are impressive. The training story is strategic. Meituan says LongCat-2.0 was trained end to end on domestic Chinese accelerators — no NVIDIA H100s or GB200s in the loop. If accurate, that matters far beyond one model.

Since late 2022, the US export-control regime has been built on a simple thesis: restrict access to the highest-end NVIDIA silicon and you slow China's frontier training by years. We've covered how those controls work in The Fable 5 Export Ban. A 1.6T model trained without that silicon is a direct, public counterexample to the thesis. It doesn't prove the gap has closed — training a big MoE is not the same as matching frontier performance at frontier efficiency — but it moves the argument from "can they even do it" to "how close can they get, and at what cost."

The honest take: the meaningful metric isn't whether a Chinese-chip model can exist. It's the compute efficiency — how many chip-hours and how much power it took relative to a comparable NVIDIA run. Meituan hasn't published that, and until someone does, the "we don't need NVIDIA" narrative is running ahead of the disclosed data. That's the gap to watch.

The license: this is the part enterprises will actually care about

LongCat-2.0 ships as an open-weight release under a permissive license (reported as MIT-style in early coverage). If those terms hold, the practical consequences are large:

Self-hosting with no per-token vendor bill. You can run it on your own infra, which is exactly what regulated industries and cost-sensitive teams want.
No data leaving your perimeter. For agentic coding on proprietary repos, this is the whole ballgame — many enterprises won't send source to a third-party API at all.
Fork-and-fine-tune freedom. A permissive license means you can adapt weights to your stack without legal friction.

Contrast that with the closed frontier coding models. The most capable agentic coders — Claude's Sonnet line, OpenAI's GPT-5.x, Gemini 3.5 — are API-only. You rent them, you don't own them, and your code round-trips through someone else's servers. LongCat-2.0's bet is that "good enough, open, and self-hostable" beats "best, closed, and metered" for a large slice of the market. For a lot of enterprise buyers, that bet is correct.

How it stacks up on paper

Dimension	LongCat-2.0	Closed frontier coders (Claude / GPT-5.x / Gemini)
Weights	Open, permissive license	Closed, API-only
Context	~1M tokens (reported)	Up to ~1M (varies by model)
Deployment	Self-host or hosted	Vendor API only
Training hardware	Chinese domestic chips (per Meituan)	US-designed accelerators (NVIDIA GPUs; Google TPUs for Gemini)
Cost model	Your infra / low hosted rates	Per-token, premium
Benchmark status	Strong self-reported; awaiting independent verification	Independently benchmarked over time

This table is a snapshot from public announcements, not a head-to-head test. The row that decides everything is the last one, and its LongCat-2.0 cell is the one we can't yet fill in with confidence.

Why a delivery company built a frontier model

It reads as strange only if you think of Meituan as "the app that brings me lunch." Meituan runs one of the largest real-time logistics operations on earth — routing, demand forecasting, dispatch across hundreds of millions of orders. That's an enormous applied-AI problem, and companies at that scale have deep ML benches and serious compute already provisioned. Building a large model is a smaller leap from there than it looks.

The strategic logic tracks with what we've seen across Chinese tech: Alibaba (Qwen), ByteDance, DeepSeek, and Moonshot have all pushed capable open-weight models. Open-sourcing is a distribution and mindshare play. You may not monetize the weights directly, but you seed an ecosystem, attract talent, and plant your model in the workflows of developers worldwide — including many who'll never touch a Chinese-hosted API but will happily run open weights on their own hardware.

What I'd verify before betting on it

If you're evaluating LongCat-2.0 for real work, here's the checklist I'd run rather than trust the launch post:

Independent SWE-bench numbers. Wait for Artificial Analysis or a maintainer-run harness, not the self-report.
Real serving cost. A 1.6T MoE needs substantial VRAM even at low active-parameter counts. Self-hosting isn't free just because the license is — price the GPUs.
License fine print. Confirm the exact terms and any acceptable-use or field-of-use restrictions before you build on it commercially.
Tooling and ecosystem fit. Does it plug into your agent framework — Cursor, Cline, Aider, your own harness — cleanly, or is integration a research project?
Latency under agentic load. Multi-step agents make dozens of calls per task; per-call latency compounds fast.
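Two items on that checklist, serving cost and latency, reduce to back-of-envelope arithmetic you can run before renting a single GPU. The sketch below assumes weights dominate memory (it ignores KV cache and activations, which add more on top) and that an agent's model calls run one after another. The 40-call task and the per-call latencies are hypothetical figures, not measurements of LongCat-2.0.

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def task_latency_s(calls_per_task: int, seconds_per_call: float) -> float:
    """Wall-clock time for one agent task when its model calls run sequentially."""
    return calls_per_task * seconds_per_call


TOTAL_PARAMS = 1.6e12  # reported total parameter count

# Every expert must be resident in memory even though few are active per token.
for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, nbytes):,.0f} GB of weights")

# Hypothetical agent run: 40 sequential calls at 3 s versus 8 s per call.
for per_call in (3.0, 8.0):
    minutes = task_latency_s(40, per_call) / 60
    print(f"{per_call:.0f} s/call -> {minutes:.1f} min per task")
```

At 16-bit precision the weights alone come to about 3,200 GB, and even 4-bit quantization leaves about 800 GB, so this is multi-node territory either way. On the latency side, going from 3 to 8 seconds per call turns a 2-minute task into one that takes over 5 minutes.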

The bottom line

LongCat-2.0 is a genuinely notable release, and the reason is geopolitical as much as technical. A 1.6T open-weight agentic coder, permissively licensed, reportedly trained without NVIDIA hardware, and gaining real OpenRouter traffic: that's a lot of significant claims stacked into one launch. If the independent benchmarks confirm the coding claims, this becomes one of the most consequential open-model drops of 2026 and a live data point in the export-control debate.

What we don't yet have: verified third-party benchmarks, disclosed training efficiency, and a track record under production load. Those are exactly the things that separate a viral launch from a durable tool. My honest bet is that LongCat-2.0 is real and good, lands a notch below the closed frontier on the hardest agentic tasks, and wins anyway wherever "open and self-hostable" outranks "absolute best." For a large and growing part of the market, that's most of the time. I'll update this if the independent numbers say otherwise.
