Cursor Composer 2.5 vs Anthropic & OpenAI

Performance benchmarks and cost comparison — May 2026

Bottom line: Composer 2.5 scores within about 2 points of Claude Opus 4.7 on SWE-Bench Multilingual, Terminal-Bench 2.0, and CursorBench v3.1, at roughly 1/10th the per-token cost (Standard tier) or roughly 1/10th the per-task cost (Fast tier vs max-effort frontier configurations). Opus 4.7 and GPT-5.5 still lead the Artificial Analysis Coding Agent Index at maximum reasoning effort (66 and 65 vs 62), and GPT-5.5 leads Terminal-Bench 2.0 by about 13 points. Composer 2.5 is available only inside Cursor.

Performance Benchmarks

| Benchmark | Composer 2.5 | Claude Opus 4.7 | GPT-5.5 | Notes |
|---|---|---|---|---|
| SWE-Bench Multilingual | 79.8% | 80.5% | 77.8% | Real GitHub issues; Opus leads by ~0.7 pts |
| Terminal-Bench 2.0 | 69.3% | 69.4% | 82.7% | Shell/terminal tasks; GPT-5.5 leads by ~13 pts |
| CursorBench v3.1 | 63.2% | 64.8% max / 61.6% default | 64.3% xhigh / 59.2% default | Cursor-internal; not independently reproducible |
| AA Coding Agent Index | 62 | 66 | 65 | Composite index (Artificial Analysis) |
| SWE-Bench-Pro-Hard-AA | 47% | | | Frontier ~comparable (max); up from 12% on Composer 2 (+35 pts) |
| Terminal-Bench v2 (AA) | 66% | | | Up from 64% on Composer 2 |
| SWE-Atlas-QnA (AA) | 72% | | | Up from 69% on Composer 2 |
| Mean time per task (AA) | 6.7 min Fast / 9.3 min Std | | | Frontier ~17.7 min (max); Composer is faster on agent tasks |
| Availability | Cursor IDE/CLI only | API, Claude Code, Cursor | API, Codex, Cursor | Composer has no public API |

Sources: Cursor launch benchmarks (May 2026), Artificial Analysis. Cursor has cautioned that standard SWE-Bench scores can overstate ability when models retrieve known fixes from repo history.
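
The lead margins quoted in the Notes column are plain differences between the two highest scores on a benchmark. A minimal Python sketch of that arithmetic, using only the two rows where all three models report a single score; the dictionary layout and the function name `leader_and_margin` are illustrative and not taken from any benchmark tooling:

```python
# Scores in percent, copied from the benchmark table above.
SCORES = {
    "SWE-Bench Multilingual": {
        "Composer 2.5": 79.8, "Claude Opus 4.7": 80.5, "GPT-5.5": 77.8,
    },
    "Terminal-Bench 2.0": {
        "Composer 2.5": 69.3, "Claude Opus 4.7": 69.4, "GPT-5.5": 82.7,
    },
}


def leader_and_margin(benchmark: str) -> tuple[str, float]:
    """Return the top-scoring model and its lead over the runner-up, in points."""
    ranked = sorted(SCORES[benchmark].items(), key=lambda kv: kv[1], reverse=True)
    (leader, top), (_, second) = ranked[0], ranked[1]
    return leader, round(top - second, 1)


if __name__ == "__main__":
    for name in SCORES:
        leader, margin = leader_and_margin(name)
        print(f"{name}: {leader} leads by {margin} pts")
```

On these inputs the margins come out to 0.7 points for Opus 4.7 and 13.3 points for GPT-5.5, matching the rounded figures in the table.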

Cost: Per-Token API Pricing (per 1M tokens)

| Model / Tier | Input | Output | vs Composer 2.5 Standard | vs Composer 2.5 Fast |
|---|---|---|---|---|
| Composer 2.5 Standard | $0.50 | $2.50 | baseline | 6× cheaper |
| Composer 2.5 Fast (default) | $3.00 | $15.00 | 6× more expensive | baseline |
| Claude Opus 4.7 | $5.00 | $25.00 | 10× input, 10× output | ~1.7× input/output |
| GPT-5.5 | $5.00 | $30.00 | 10× input, 12× output | ~1.7× input, 2× output |
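
The multiples in the last two columns follow directly from dividing list prices. A short Python sketch of that calculation, assuming the table's prices are the only inputs; the `PRICES` dictionary and the helper `price_multiple` are names introduced here for illustration:

```python
# Listed prices in $ per 1M tokens, copied from the pricing table above.
PRICES = {
    "Composer 2.5 Standard": {"input": 0.50, "output": 2.50},
    "Composer 2.5 Fast": {"input": 3.00, "output": 15.00},
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}


def price_multiple(model: str, baseline: str) -> dict[str, float]:
    """Return how many times `model` costs relative to `baseline`, per token kind."""
    return {
        kind: PRICES[model][kind] / PRICES[baseline][kind]
        for kind in ("input", "output")
    }


if __name__ == "__main__":
    for baseline in ("Composer 2.5 Standard", "Composer 2.5 Fast"):
        for model in ("Claude Opus 4.7", "GPT-5.5"):
            m = price_multiple(model, baseline)
            print(f"{model} vs {baseline}: "
                  f"{m['input']:.1f}x input, {m['output']:.1f}x output")
```

The "~1.7×" entries are 5/3 and 25/15 rounded to one decimal place.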

Cost: Estimated Per Agent Task (Artificial Analysis)

| Model / Configuration | Cost per Task | Index Score | Cost Efficiency |
|---|---|---|---|
| Composer 2.5 Standard | $0.07 | 62 | Best cost/score ratio |
| Composer 2.5 Fast | $0.44 | 62 | ~30% faster than Standard |
| Claude Opus 4.7 max (Claude Code) | $4.10 | 66 | ~10× (Fast) to ~60× (Std) more |
| GPT-5.5 xhigh (Codex) | $4.82 | 65 | ~11× (Fast) to ~69× (Std) more |
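
The per-task multiples and the "best cost/score ratio" claim can be checked from the two numeric columns. The sketch below is one way to do that in Python; `cost_per_index_point` is a simple ratio chosen here for illustration and is not a metric published by Artificial Analysis:

```python
# (cost per task in $, AA Coding Agent Index score), copied from the table above.
TASK_COST = {
    "Composer 2.5 Standard": (0.07, 62),
    "Composer 2.5 Fast": (0.44, 62),
    "Claude Opus 4.7 max (Claude Code)": (4.10, 66),
    "GPT-5.5 xhigh (Codex)": (4.82, 65),
}


def cost_multiple(config: str, baseline: str) -> float:
    """How many times more one task costs on `config` than on `baseline`."""
    return TASK_COST[config][0] / TASK_COST[baseline][0]


def cost_per_index_point(config: str) -> float:
    """Illustrative ratio: dollars per task divided by index score."""
    cost, score = TASK_COST[config]
    return cost / score


if __name__ == "__main__":
    for config in ("Claude Opus 4.7 max (Claude Code)", "GPT-5.5 xhigh (Codex)"):
        vs_fast = cost_multiple(config, "Composer 2.5 Fast")
        vs_std = cost_multiple(config, "Composer 2.5 Standard")
        print(f"{config}: {vs_fast:.1f}x Fast, {vs_std:.1f}x Standard")
    best = min(TASK_COST, key=cost_per_index_point)
    print(f"Lowest cost per index point: {best}")
```

The unrounded multiples are slightly below the table's rounded figures for Opus 4.7 (about 9.3× Fast and 58.6× Standard), which is why the table marks them as approximate.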

Cost Multiples: Quick Reference

| Comparison | Input Tokens | Output Tokens | Per-Task (AA est.) |
|---|---|---|---|
| Composer 2.5 Standard vs Opus 4.7 | 10× cheaper | 10× cheaper | ~60× cheaper |
| Composer 2.5 Standard vs GPT-5.5 | 10× cheaper | 12× cheaper | ~69× cheaper |
| Composer 2.5 Fast vs Opus 4.7 | ~1.7× cheaper | ~1.7× cheaper | ~10× cheaper |
| Composer 2.5 Fast vs GPT-5.5 | ~1.7× cheaper | 2× cheaper | ~11× cheaper |
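
When input and output multiples differ, as they do for GPT-5.5, the effective multiple for a real workload depends on its input/output mix. The Python sketch below prices a hypothetical workload at list rates; the token counts are invented for illustration, and the calculation ignores caching or any discounts, which the tables above do not cover:

```python
# List prices in $ per 1M tokens as (input, output), from the pricing table.
PRICES = {
    "Composer 2.5 Standard": (0.50, 2.50),
    "Composer 2.5 Fast": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in $ of a workload at list prices, with no caching or discounts."""
    price_in, price_out = PRICES[model]
    return (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 200k output tokens.
    in_tok, out_tok = 2_000_000, 200_000
    base = workload_cost("Composer 2.5 Standard", in_tok, out_tok)
    for model in PRICES:
        cost = workload_cost(model, in_tok, out_tok)
        print(f"{model}: ${cost:.2f} ({cost / base:.1f}x Standard)")
```

For this input-heavy mix the GPT-5.5 multiple lands between the 10× input and 12× output figures, at roughly 10.7×; a more output-heavy workload would push it toward 12×.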