
Overall Winner
Sora-2-Pro leads Kling-v2-5-Turbo on Total Score: 74.63% vs 73.20% (+1.43pp).
Best for Control, Creativity & Multi-View
Kling-v2-5-Turbo performs better for controllability and creative direction, and is much stronger on multi-view stability: Controllability +7.30pp, Creativity +6.66pp, Multi-View Consistency +36.43pp.
Best for Human Realism & Identity Consistency
Sora-2-Pro performs better for realistic humans and keeping the same character consistent across shots: Human Fidelity +16.30pp, Human Identity +33.91pp, Human Anatomy +14.97pp.
Based on the latest VBench-IBench results, summarized by overall score and core dimensions.
| Metric | Sora-2-Pro | Kling-v2-5-Turbo | Winner |
|---|---|---|---|
| Total Score | 74.63% | 73.20% | Sora (+1.43pp) |
| Creativity | 77.41% | 84.07% | Kling (+6.66pp) |
| Commonsense | 88.89% | 83.33% | Sora (+5.56pp) |
| Controllability | 58.41% | 65.71% | Kling (+7.30pp) |
| Human Fidelity | 87.87% | 71.57% | Sora (+16.30pp) |
| Physics | 60.56% | 61.33% | Kling (+0.77pp) |
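The page does not say how Total Score is derived. As a rough check, the sketch below shows that an unweighted mean of the five dimension scores in the table reproduces both totals. Treat this as an observation from the published numbers, not a documented formula; the names `SCORES` and `mean_score` are illustrative only.

```python
# Dimension scores copied from the table above (percent).
SCORES = {
    "Sora-2-Pro": {
        "Creativity": 77.41,
        "Commonsense": 88.89,
        "Controllability": 58.41,
        "Human Fidelity": 87.87,
        "Physics": 60.56,
    },
    "Kling-v2-5-Turbo": {
        "Creativity": 84.07,
        "Commonsense": 83.33,
        "Controllability": 65.71,
        "Human Fidelity": 71.57,
        "Physics": 61.33,
    },
}


def mean_score(dims):
    """Unweighted mean of the dimension scores, rounded to two decimals.

    Assumption: equal weighting. The benchmark may define the total differently.
    """
    return round(sum(dims.values()) / len(dims), 2)


totals = {model: mean_score(dims) for model, dims in SCORES.items()}

for model, total in totals.items():
    print(f"{model}: {total:.2f}%")
```

If the benchmark later changes its weighting, only `mean_score` would need to be adjusted.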
The biggest score gaps, broken down by fine-grained metrics — so you can see where the difference comes from.
| Fine-grained Metric | Sora-2-Pro | Kling-v2-5-Turbo | Δ (pp) | What it means |
|---|---|---|---|---|
| Multi-View Consistency | 20.00% | 56.43% | +36.43 (Kling) | Consistency across multiple angles / camera views |
| Human Identity | 74.51% | 40.60% | +33.91 (Sora) | Whether the same person looks consistent |
| Material | 77.78% | 44.44% | +33.34 (Sora) | Realism of materials (fabric / metal / glass) |
| Dynamic Attribute | 55.56% | 88.89% | +33.33 (Kling) | Changes in motion attributes (pose / expression) |
| Complex Plot | 68.89% | 37.78% | +31.11 (Sora) | Narrative coherence in complex scenes |
| Motion Order Understanding | 77.78% | 100.00% | +22.22 (Kling) | Following step-by-step motion order |
* Δ(pp) is the percentage-point difference. The label (Sora/Kling) indicates the leading model.
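To make the Δ (pp) column concrete, here is a minimal sketch of how a percentage-point gap and its leader label can be computed from two scores. The function name `pp_delta` and the short labels are illustrative assumptions, not part of the benchmark tooling.

```python
def pp_delta(sora, kling):
    """Return (gap in percentage points, leading model) for two percent scores.

    A percentage-point gap is the plain difference between two percentages,
    not a relative change.
    """
    gap = round(abs(sora - kling), 2)
    if sora == kling:
        return gap, "Tie"
    return gap, "Sora" if sora > kling else "Kling"


# Rows taken from the fine-grained table above.
rows = {
    "Multi-View Consistency": (20.00, 56.43),
    "Human Identity": (74.51, 40.60),
    "Motion Order Understanding": (77.78, 100.00),
}

for metric, (sora, kling) in rows.items():
    gap, leader = pp_delta(sora, kling)
    print(f"{metric}: +{gap:.2f}pp ({leader})")
```

Note that a percentage-point gap can look small next to the relative change: going from 20.00% to 56.43% is a 36.43pp gap, while the higher score is nearly three times the lower one.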
A quick interpretation of the benchmark — what each model is better suited for in real projects.
Pick your goal — we recommend the best model based on the benchmark strengths.
Sora-2-Pro: Stronger results for human realism, identity consistency, and anatomy stability; ideal for close-ups and recurring characters.
Kling-v2-5-Turbo: Higher creativity score, better suited for bold art direction and stylized, attention-grabbing shots.
Kling-v2-5-Turbo: Better controllability and stronger motion-order understanding; a good fit when you need precise instruction-following.
Sora-2-Pro: Stronger commonsense and complex plot handling; better for coherent storytelling across multiple shots.
Kling-v2-5-Turbo: Large lead in multi-view consistency (56.43% vs 20.00%); best for switching camera angles while keeping the subject consistent.
Higher motion rationality — more stable physical behavior with fewer unnatural artifacts.
Quick answers to the most common questions about this comparison and the benchmark setup.
This page summarizes model strengths using the VBench benchmark framework and its public leaderboard, with additional context from the Ima Studio Arena review page.
You can review the official framework and the public leaderboard here: