Cursor Composer 2.5 vs Anthropic & OpenAI
Performance benchmarks and cost comparison — May 2026
Bottom line: Composer 2.5 comes within about 1 point of Claude Opus 4.7 on SWE-Bench Multilingual and Terminal-Bench 2.0 at roughly 1/10th the per-token cost (Standard tier) or roughly 1/10th the per-task cost (Fast tier vs max-effort frontier configurations). Opus 4.7 and GPT-5.5 still lead the Artificial Analysis Coding Agent Index at max reasoning (66 and 65 vs 62), and GPT-5.5 leads Terminal-Bench 2.0 by about 13 points. Composer 2.5 is available only inside Cursor (IDE and CLI); it has no public API.
Performance Benchmarks
| Benchmark | Composer 2.5 | Claude Opus 4.7 | GPT-5.5 | Notes |
|---|---|---|---|---|
| SWE-Bench Multilingual | 79.8% | 80.5% | 77.8% | Real GitHub issues; Opus leads by ~0.7 pts |
| Terminal-Bench 2.0 | 69.3% | 69.4% | 82.7% | Shell/terminal tasks; GPT-5.5 leads by ~13 pts |
| CursorBench v3.1 | 63.2% | 64.8% max / 61.6% default | 64.3% xhigh / 59.2% default | Cursor-internal; not independently reproducible |
| AA Coding Agent Index | 62 | 66 | 65 | Composite index (Artificial Analysis) |
| SWE-Bench-Pro-Hard-AA | 47% | ~comparable (max) | — | Up from 12% on Composer 2 (+35 pts) |
| Terminal-Bench v2 (AA) | 66% | — | — | Up from 64% on Composer 2 |
| SWE-Atlas-QnA (AA) | 72% | — | — | Up from 69% on Composer 2 |
| Mean time per task (AA) | 6.7 min Fast / 9.3 min Std | ~17.7 min (max) | — | Composer is faster on agent tasks |
| Availability | Cursor IDE/CLI only | API, Claude Code, Cursor | API, Codex, Cursor | Composer has no public API |
Sources: Cursor launch benchmarks (May 2026), Artificial Analysis. Cursor has cautioned that standard SWE-Bench scores can overstate ability when models retrieve known fixes from repo history.
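The point gaps quoted in the Notes column can be rechecked with a few lines of arithmetic. The sketch below copies two rows from the table above; the dictionary layout and the `gap_to_leader` helper are illustrative names, not part of any benchmark tooling.

```python
# Scores (percent) copied from the benchmark table above.
scores = {
    "SWE-Bench Multilingual": {
        "Composer 2.5": 79.8, "Claude Opus 4.7": 80.5, "GPT-5.5": 77.8,
    },
    "Terminal-Bench 2.0": {
        "Composer 2.5": 69.3, "Claude Opus 4.7": 69.4, "GPT-5.5": 82.7,
    },
}


def gap_to_leader(benchmark):
    """Return (leader, points by which the leader beats Composer 2.5)."""
    row = scores[benchmark]
    leader = max(row, key=row.get)
    return leader, round(row[leader] - row["Composer 2.5"], 1)


for name in scores:
    leader, gap = gap_to_leader(name)
    print(f"{name}: {leader} leads Composer 2.5 by {gap} pts")
```

These are single reported scores with no error bars, so sub-point gaps such as the one on SWE-Bench Multilingual are best read as a rough tie.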
Cost: Per-Token API Pricing (USD per 1M tokens)
| Model / Tier | Input | Output | vs Composer 2.5 Standard | vs Composer 2.5 Fast |
|---|---|---|---|---|
| Composer 2.5 Standard | $0.50 | $2.50 | — | 6× cheaper |
| Composer 2.5 Fast default | $3.00 | $15.00 | 6× more expensive | — |
| Claude Opus 4.7 | $5.00 | $25.00 | 10× input, 10× output | ~1.7× input/output |
| GPT-5.5 | $5.00 | $30.00 | 10× input, 12× output | ~1.7× input, 2× output |
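The multiples in the last two columns follow directly from the list prices. As a minimal sketch, the snippet below recomputes them and prices one workload at each tier. The workload size (2M input tokens, 500k output tokens) is a hypothetical chosen for illustration, and the calculation assumes plain list prices with no caching or volume discounts.

```python
# USD per 1M tokens (input, output), copied from the pricing table above.
PRICES = {
    "Composer 2.5 Standard": (0.50, 2.50),
    "Composer 2.5 Fast": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def multiples(model, baseline):
    """Return (input, output) price of `model` as a multiple of `baseline`."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return round(m_in / b_in, 2), round(m_out / b_out, 2)


def workload_cost(model, input_tokens, output_tokens):
    """USD cost at list prices; ignores caching and discounts (assumption)."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES:
    ratio_in, ratio_out = multiples(model, "Composer 2.5 Standard")
    cost = workload_cost(model, 2_000_000, 500_000)
    print(f"{model}: {ratio_in}x input, {ratio_out}x output, ${cost:.2f}")
```

Because output tokens cost 5× to 6× more than input tokens at every tier, the effective multiple for a real workload depends on its input/output mix.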
Cost: Estimated Per Agent Task (Artificial Analysis)
| Model / Configuration | Cost per Task | Index Score | Cost Efficiency |
|---|---|---|---|
| Composer 2.5 Standard | $0.07 | 62 | Best cost/score ratio |
| Composer 2.5 Fast | $0.44 | 62 | ~6× the cost of Standard; ~28% less time per task (6.7 vs 9.3 min) |
| Claude Opus 4.7 max (Claude Code) | $4.10 | 66 | ~10× (vs Fast) to ~60× (vs Std) more expensive |
| GPT-5.5 xhigh (Codex) | $4.82 | 65 | ~11× (vs Fast) to ~69× (vs Std) more expensive |
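The per-task multiples can be reproduced from the two numeric columns. The sketch below also derives a cost per index point as one way to read the "cost/score ratio". That metric is an illustrative assumption here, not a figure published by Artificial Analysis, and the helper names are hypothetical.

```python
# (cost per task in USD, AA Coding Agent Index score), from the table above.
TASKS = {
    "Composer 2.5 Standard": (0.07, 62),
    "Composer 2.5 Fast": (0.44, 62),
    "Claude Opus 4.7 max": (4.10, 66),
    "GPT-5.5 xhigh": (4.82, 65),
}


def cost_ratio(model, baseline):
    """Per-task cost of `model` as a multiple of `baseline`."""
    return round(TASKS[model][0] / TASKS[baseline][0], 1)


def cents_per_index_point(model):
    """Illustrative efficiency metric: US cents spent per index point."""
    cost, score = TASKS[model]
    return round(100 * cost / score, 2)


for model in ("Claude Opus 4.7 max", "GPT-5.5 xhigh"):
    print(
        f"{model}: {cost_ratio(model, 'Composer 2.5 Fast')}x Fast, "
        f"{cost_ratio(model, 'Composer 2.5 Standard')}x Standard, "
        f"{cents_per_index_point(model)} cents per index point"
    )
```

The unrounded ratios for Opus 4.7 are 9.3× (vs Fast) and 58.6× (vs Standard), which the table rounds to ~10× and ~60×. The GPT-5.5 figures of ~11× and ~69× round more closely.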
Cost Multiples — Quick Reference
| Comparison | Input Tokens | Output Tokens | Per-Task (AA est.) |
|---|---|---|---|
| Composer 2.5 Standard vs Opus 4.7 | 10× cheaper | 10× cheaper | ~60× cheaper |
| Composer 2.5 Standard vs GPT-5.5 | 10× cheaper | 12× cheaper | ~69× cheaper |
| Composer 2.5 Fast vs Opus 4.7 | ~1.7× cheaper | ~1.7× cheaper | ~10× cheaper |
| Composer 2.5 Fast vs GPT-5.5 | ~1.7× cheaper | 2× cheaper | ~11× cheaper |
Leadership Presentation
Evolving Field Engineering as Cursor goes upmarket









