AI coding agents for C++

Nine agents evaluated for C++ work. Premium open-source first. With an AGENTS.md template, MCP servers worth wiring, and a decision tree.

The AI-coding-agent space saw a Cambrian explosion in 2024-2025. By May 2026 the field had consolidated to roughly nine tools that own the daily-developer conversation. The May 2026 wave (Grok Build, Codex CLI v0.133, Cursor 2.5, the surface-agnostic Claude Code release) pushed multi-agent and local-first patterns to the foreground. C++ adds three constraints that shift the rankings versus the generic “best agent” lists:

  1. Templates and metaprogramming need long context. A reflection-heavy refactor easily spans 5-10 headers; agents with smaller windows or aggressive context truncation produce subtly wrong specialisations.
  2. ABI / undefined behaviour means supervised only. Async-agent flows (“submit and review the PR”) are a poor fit. Default to pair-programming or supervised agent mode.
  3. Compliance. Defence, embedded, finance, and proprietary engines often forbid cloud egress. Local-only stacks (Continue.dev or Aider + a local model) are not a fallback — they are the requirement.
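
Constraint 2 is easier to weigh with a concrete layout change in hand. The sketch below uses two hypothetical versions of one struct (`v1::Sample` and `v2::Sample`, names invented for illustration) to show how an innocuous-looking field insertion moves an existing member and changes the object size. The diff compiles cleanly, yet any caller built against the old header would read the wrong bytes, which is why such edits need a supervised review.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "before" layout shipped in a public header.
namespace v1 {
struct Sample {
    std::uint32_t id;
    double value;
};
}  // namespace v1

// Hypothetical "after" layout: an agent inserted a field in the middle.
namespace v2 {
struct Sample {
    std::uint32_t id;
    std::uint64_t flags;  // new member placed before `value`
    double value;
};
}  // namespace v2

// True when `value` no longer lives at the offset old binaries expect.
constexpr bool value_offset_moved() {
    return offsetof(v1::Sample, value) != offsetof(v2::Sample, value);
}

// True when arrays and allocations of Sample change stride.
constexpr bool size_changed() {
    return sizeof(v1::Sample) != sizeof(v2::Sample);
}

// A guard like this in the header turns a silent ABI break into a build error
// until a reviewer updates it deliberately.
static_assert(value_offset_moved(), "inserting a field before `value` moves it");
```

Pinning `sizeof` and `offsetof` with `static_assert` in the real header is one cheap way to force the explicit sign-off an ABI-affecting change deserves.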

Each agent below shows licence, cost, MCP support, and our verdict for C++ work specifically. Premium open-source picks are flagged in their own section after the agent list.

The agents

Grok Build

Licence: proprietary (xAI)
Cost (self): n/a
Cost (cloud): $99/mo intro, $300/mo regular (SuperGrok Heavy)
MCP: yes (reuses Claude Code MCP + AGENTS.md conventions)
Open weights: no (grok-code-fast-1, 256K context)

Launched May 25, 2026. 8 parallel sub-agents, local-first (no source upload to servers). 70.8% SWE-Bench. Deliberately compatible with Claude Code conventions (AGENTS.md, MCP, hooks). Too new to evaluate on hard C++ template work.

  • Best when: you want multi-agent parallelism with local-first guarantees and already use Claude Code conventions.
  • Skip if: you need proven C++ template / reflection reliability (too new, limited track record).

Claude Code

Licence: proprietary (Anthropic)
Cost (self): free CLI; usage billed via Claude.ai or API
Cost (cloud): $20-200/mo Claude.ai or pay-per-token API
MCP: yes
Open weights: no

Best for hard multi-file C++ refactors, template debugging, ABI work, and undefined-behaviour hunts. Now surface-agnostic (CLI, VS Code, JetBrains, web, mobile). Fast mode on Opus 4.7 by default. Token-efficient (uses 5.5x fewer tokens than Cursor on identical tasks).

  • Best when: team works in the terminal or VS Code/JetBrains, MCP servers wired, supervised review is the norm.
  • Skip if: you need a fully-local stack.

Cursor

Licence: proprietary
Cost (self): n/a
Cost (cloud): $20 Pro / $60 Pro+ / $200 Ultra (heavy agent users typically need Pro+)
MCP: yes
Open weights: no

Best for inline pair-programming on routine C++ edits. v2.5 (May 2026): 79.8% SWE-Bench Multilingual, cloud-agent multi-task mode. Multi-provider routing (Claude / GPT / Gemini). Watch the credit pool on agent mode.

  • Best when: IDE-first workflow, solo or small team.
  • Skip if: compliance forbids cloud egress or you need terminal automation.

Aider

Licence: Apache-2.0
Cost (self): free + token cost (BYOK or local)
MCP: no
Open weights: works with any (incl. local Ollama / vLLM)

Premium OSS pick. Git-native (auto-commits per change), terminal-only, fully transparent. Pair with local Qwen3-Coder-Next on Ollama for a fully-private C++ workflow. No MCP yet -- bring your own scripts.

  • Best when: you want every edit auto-committed, transparent context, and a local-only loop.
  • Skip if: you need MCP-driven tool use or autonomous task execution.

Continue.dev

Licence: Apache-2.0
Cost (self): free
Cost (cloud): free (or BYO model API)
MCP: yes
Open weights: works with any (Ollama / llama.cpp / vLLM endpoints)

Premium OSS pick. The compliance-grade local stack: VS Code / JetBrains plugin pointed at a local model. Continue + Qwen3-Coder-Next on a 24 GB GPU runs FIM autocomplete with zero data egress.

  • Best when: compliance-locked C++ shop, defence / embedded / proprietary engine work.
  • Skip if: you have no local hardware and prefer hosted Tab completion.

Cline (formerly Claude Dev)

Licence: Apache-2.0
Cost (self): free + token cost
Cost (cloud): n/a (BYOK)
MCP: yes
Open weights: works with any (model-agnostic)

VS Code-native autonomous agent. Lightweight alternative to Cursor's agent mode, with full transparency. C++ users like it for header / source dual-edit flows.

  • Best when: VS Code first, BYOK, want to see every diff before approving.
  • Skip if: you prefer a terminal flow.

OpenAI Codex CLI

Licence: Apache-2.0
Cost (self): free CLI; OpenAI API costs
MCP: yes
Open weights: no (OpenAI models only)

Open-source CLI from OpenAI. v0.133 (May 2026): 22+ releases in 2 weeks, persisted /goal workflows, plugin marketplace, remote daemon, codex doctor diagnostics. Strong on routine refactors; weaker than Claude Code on long-context template metaprogramming.

  • Best when: you already pay for OpenAI API and want a free terminal harness with rapid iteration.
  • Skip if: you need top-tier reasoning for hard C++ template / lifetime issues.

Windsurf

Licence: proprietary
Cost (self): n/a
Cost (cloud): $15/mo Pro and up
MCP: yes
Open weights: no

Multi-agent IDE (5 parallel agents on different parts of the codebase). Useful for monorepo work where modules can be edited in parallel; less interesting for single-package libraries.

  • Best when: large monorepo, parallel agents per module.
  • Skip if: small library, single TU edits.

Devin

Licence: proprietary
Cost (self): n/a
Cost (cloud): $500/mo team and up
MCP: yes
Open weights: no

Async task delegation: you submit, it runs end-to-end, you review the PR. Reliability on C++ varies by task; do NOT use unsupervised on ABI-sensitive code.

  • Best when: well-scoped tasks with strong eval and rollback discipline.
  • Skip if: any safety-critical or undefined-behaviour-adjacent work.

Premium-OSS picks

Two tools earn the Premium-OSS flag for C++ work: Aider and Continue.dev. Both are Apache-2.0 licensed, both work with any model (including a fully-local Ollama / vLLM endpoint), and both put you in full control of your data. Pair either with Qwen3-Coder-Next on a 24+ GB GPU (or a Jetson AGX Thor, our reference hardware) and you have a complete loop with zero data egress. See the integration recipe at Local LLM for C++.

If your shop allows cloud agents but you still want OSS for the harness: Cline works BYOK against any model API, and OpenAI Codex CLI is an open-source harness tied to OpenAI models.

AGENTS.md template for C++ projects

The AGENTS.md convention (a sibling of CLAUDE.md / .cursorrules) tells coding agents how to operate inside your repository: build commands, conventions, things to avoid. Drop this at your repo root and tweak.

# AGENTS.md

## Build
- Generate: `cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug -GNinja`
- Build:    `cmake --build build -j`
- Test:     `ctest --test-dir build --output-on-failure`
- Sanitize: rebuild with `-DCMAKE_CXX_FLAGS="-fsanitize=address,undefined"`.

## Standard / features
- C++26 (`-std=c++26`).
- Reflection enabled: clang-p2996 uses `-freflection-latest -stdlib=libc++`;
  GCC 16.1 uses `-freflection`.
- Modules NOT enabled. Headers + sources only.

## Conventions
- RAII for every owned resource; smart pointers preferred over raw owning.
- Headers in `include/<project>/`; sources in `src/`. One TU per public header.
- No exceptions in hot paths (use `std::expected`).
- All public APIs documented with Doxygen-style `///`.

## MCP servers (if your agent supports MCP)
- clangd: semantic queries, find-references, refactor.
- cmake-tools: build / configure / target-graph.
- compile-commands-reader: per-TU flags.

## Forbidden
- Do not modify `third_party/` (vendored).
- Do not edit `compile_commands.json` directly; regenerate via cmake.
- Do not commit `build/`, `*.o`, `*.so`, `compile_commands.json`.

## Review
- All PRs require sanitizer-clean CI before merge.
- ABI-affecting changes require explicit reviewer sign-off (header layout,
  virtual table layout, exported symbol set).
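
The Conventions entries in the template above (RAII for every owned resource, no exceptions in hot paths) can be illustrated with a small sketch for agents and reviewers to compare against. Everything here is hypothetical: `FileCloser`, `FilePtr`, `open_file`, and `count_bytes` are invented names, and `std::optional` stands in for `std::expected` only so the example also builds on pre-C++23 toolchains.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <optional>
#include <string>

// Deleter so std::unique_ptr closes the FILE* on every exit path.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept {
        if (f != nullptr) std::fclose(f);
    }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Returns an owning handle; a null handle means the open failed.
FilePtr open_file(const std::string& path, const char* mode) {
    assert(mode != nullptr);
    return FilePtr(std::fopen(path.c_str(), mode));
}

// Error reported through the return type, not an exception.
// With C++23 available this would return std::expected<long, Error> instead.
std::optional<long> count_bytes(const std::string& path) {
    FilePtr file = open_file(path, "rb");
    if (!file) return std::nullopt;
    long count = 0;
    while (std::fgetc(file.get()) != EOF) ++count;
    return count;  // file is closed here by FileCloser
}
```

A caller checks `has_value()` before using the count; no `fclose` call appears at any call site, which is the property the RAII rule is meant to guarantee.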

MCP servers worth wiring

The Model Context Protocol is now ubiquitous (over 10,000 public MCP servers as of Q1 2026). For C++ work the high-leverage ones are:

  • clangd (semantic engine) — exposes find-references, go-to-definition, rename, semantic-token queries. Wire your agent to clangd and it stops guessing about template instantiations.
  • cmake-tools-mcp — agents can ask “what are the compile flags for src/foo.cpp under target bar?” and get the right answer instead of inventing flags.
  • compile_commands.json reader — the cheapest possible MCP. Tiny, but turns “I think you should add -fno-rtti” into “this TU is built with -fno-rtti -fno-exceptions, so RTTI-using code will not link”.
  • sanitizer-output parser — structured access to ASan / UBSan / TSan reports so the agent can diff a known-good run against a regression.
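
To make the compile_commands.json point concrete, here is a minimal sketch of the check such a reader enables once it has the `command` string for a TU. It is not any particular MCP server's implementation: `split_command`, `has_flag`, and `rtti_available` are hypothetical helpers, the JSON parsing step is omitted, and the whitespace split ignores shell quoting and later overriding flags such as `-frtti`.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Naive tokenizer: splits a compile command on whitespace.
// Assumption: no quoted arguments containing spaces.
std::vector<std::string> split_command(const std::string& command) {
    std::istringstream in(command);
    std::vector<std::string> tokens;
    for (std::string token; in >> token;) tokens.push_back(token);
    return tokens;
}

// Exact-match lookup, so "-fno" does not match "-fno-rtti".
bool has_flag(const std::string& command, const std::string& flag) {
    assert(!flag.empty());
    const std::vector<std::string> tokens = split_command(command);
    return std::find(tokens.begin(), tokens.end(), flag) != tokens.end();
}

// dynamic_cast / typeid are only worth suggesting when RTTI is not disabled.
bool rtti_available(const std::string& command) {
    return !has_flag(command, "-fno-rtti");
}
```

An agent that can call something like `rtti_available` before proposing a `dynamic_cast` is reasoning from the TU's actual flags. A production reader would prefer the `arguments` array form of the compilation database when it is present, since that avoids the quoting problem entirely.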

Multi-agent patterns for C++ monorepos

Every major tool had shipped multi-agent by the May 2026 wave (Grok Build, Windsurf, Claude Code Agent Teams, Codex CLI, Devin). For a C++ monorepo, the pattern that actually works is agent-per-target: each agent owns one CMake target and never edits headers exposed by another target without a checkpoint. Headers in include/ shared between targets stay single-writer (one agent at a time, gated by a small lock file or a code-owner rule).

If you do not have explicit target ownership, multi-agent runs into write-write conflicts on shared headers within the first hour. Either invest in the ownership doc or stick to single-agent flows for that repo.
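
The ownership rule can be reduced to one predicate that a pre-edit hook or orchestrator might evaluate. The sketch below is an assumption-laden illustration, not a feature of any listed tool: `Ownership`, `has_prefix`, and `may_edit` are hypothetical names, targets are approximated by directory prefixes, and the lock is modelled as an in-memory field instead of a real lock file.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical ownership table for an agent-per-target setup.
struct Ownership {
    std::map<std::string, std::string> dir_owner;   // source dir prefix -> agent
    std::string shared_prefix = "include/";         // headers shared across targets
    std::optional<std::string> shared_lock_holder;  // at most one writer at a time
};

bool has_prefix(const std::string& path, const std::string& prefix) {
    return path.compare(0, prefix.size(), prefix) == 0;
}

// Returns true only when `agent` is allowed to write `path` right now.
bool may_edit(const Ownership& o, const std::string& agent, const std::string& path) {
    assert(!agent.empty());
    // Shared headers: single-writer, whoever currently holds the lock.
    if (has_prefix(path, o.shared_prefix)) {
        return o.shared_lock_holder.has_value() && *o.shared_lock_holder == agent;
    }
    // Target-owned sources: only the owning agent may write.
    for (const auto& [prefix, owner] : o.dir_owner) {
        if (has_prefix(path, prefix)) return owner == agent;
    }
    // Unowned paths are denied so they fall back to a human checkpoint.
    return false;
}
```

Denying by default is the deliberate design choice here: a path missing from the table signals a gap in the ownership doc, and a gap is better surfaced as a blocked edit than as a write-write conflict.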

Decision tree

If you only read one section of this page, read this:

  1. Compliance forbids cloud egress? → Continue.dev or Aider + local model (see Local LLM for C++).
  2. Daily inline editing in an IDE? → Cursor (premium) or Continue.dev (premium-OSS).
  3. Hard multi-file refactor, template debugging, ABI work? → Claude Code (terminal, deep reasoning).
  4. Want every change auto-committed and transparent? → Aider (premium-OSS).
  5. VS Code agent, BYOK, watch every diff? → Cline (OSS, Apache-2.0).
  6. Large monorepo, parallel modules? → Windsurf or Grok Build (multi-agent).
  7. Multi-agent with local-first guarantee (no source upload)? → Grok Build (new May 2026; too early for hard C++ verdict).
  8. Async task delegation with strong eval? → Devin — and only with explicit rollback discipline.
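
For teams that want the tree in an onboarding script, one possible encoding is a first-match function that asks the questions in the listed order. This is only a sketch: the `Needs` struct and `recommend` function are hypothetical, and treating the list as strictly ordered is an assumption, since several questions can be true at once.

```cpp
#include <string>

// One flag per question in the decision tree, in the same order.
struct Needs {
    bool no_cloud_egress = false;
    bool ide_inline_editing = false;
    bool hard_refactor = false;
    bool auto_commit = false;
    bool vscode_byok_agent = false;
    bool large_monorepo = false;
    bool local_first_multi_agent = false;
    bool async_delegation = false;
};

// Returns the recommendation for the first question answered "yes".
std::string recommend(const Needs& n) {
    if (n.no_cloud_egress) return "Continue.dev or Aider + local model";
    if (n.ide_inline_editing) return "Cursor or Continue.dev";
    if (n.hard_refactor) return "Claude Code";
    if (n.auto_commit) return "Aider";
    if (n.vscode_byok_agent) return "Cline";
    if (n.large_monorepo) return "Windsurf or Grok Build";
    if (n.local_first_multi_agent) return "Grok Build";
    if (n.async_delegation) return "Devin (with explicit rollback discipline)";
    return "no match";
}
```

Because compliance is checked first, a shop that forbids cloud egress gets the local stack even when it also wants IDE editing or multi-agent runs, which matches the priority the list gives that constraint.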