[Banner image: abstract background with large layered text reading "Kimi K2.5"]

Kimi K2.5 Model Benchmarks and Info

BestCodes
Tags: benchmarks, ai, ollama, open source

This article is a TL;DR of the Kimi K2.5 model benchmarks. For a more in-depth analysis, check out Kimi's official blog post or the model card on HuggingFace.

Model Architecture & Specs

Kimi K2.5 is a native multimodal agentic model built on a Mixture-of-Experts (MoE) architecture. It features 1T total parameters with 32B activated per token, a design aimed at combining high inference efficiency with reasoning depth.

| Feature | Specification |
| --- | --- |
| Architecture | Mixture-of-Experts (MoE) |
| Parameters | 1T total / 32B activated |
| Context Window | 256K tokens |
| Vision Encoder | MoonViT (400M params) |
| Layers | 61 (1 dense) |
| Experts | 384 total (8 selected per token) |
| Attention | MLA, 7168 hidden dimension |
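The expert numbers in the table can be pictured as a tiny top-k router: per token, a gating network scores all 384 experts and only the top 8 run, which is why just 32B of the 1T parameters are active at a time. The sketch below is illustrative only; the function names and gating scheme are assumptions, not Kimi's actual implementation.

```python
# Minimal sketch of top-k MoE routing (illustrative, not Kimi's code):
# score all experts, keep the top k, renormalize their gate weights.
import math
import random

NUM_EXPERTS = 384  # total routed experts (per the spec table)
TOP_K = 8          # experts activated per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits):
    """Return {expert_index: gate_weight} for the top-k experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:TOP_K]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

random.seed(0)
token_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = route(token_logits)  # 8 chosen experts, gate weights summing to 1
```

The token's final output would then be the gate-weighted sum of the chosen experts' outputs; the other 376 experts never run for that token.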

Benchmark Highlights

Kimi K2.5 (Thinking Mode) was evaluated against top-tier models like GPT-5.2, Claude 4.5 Opus, and Gemini 3 Pro.

[Chart: bar charts comparing Kimi K2.5, GPT, Claude, and Gemini across benchmarks]

Reasoning & Knowledge

| Benchmark | Kimi K2.5 | GPT-5.2 (xhigh) | Claude 4.5 Opus | Gemini 3 Pro |
| --- | --- | --- | --- | --- |
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| HMMT 2025 (Feb) | 95.4 | 99.4 | 92.9 | 97.3 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |

Coding & Agents

| Benchmark | Kimi K2.5 | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
| --- | --- | --- | --- | --- |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| LiveCodeBench (v6) | 85.0 | - | 82.2 | 87.4 |
| BrowseComp (Swarm) | 78.4 | - | - | - |
| DeepSearchQA | 77.1 | 71.3 | 76.1 | 63.2 |

Vision & Video

| Benchmark | Kimi K2.5 | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
| --- | --- | --- | --- | --- |
| MathVista (mini) | 90.1 | 82.8 | 80.2 | 89.8 |
| VideoMME | 87.4 | 86.0 | - | 88.4 |
| OCRBench | 92.3 | 80.7 | 86.5 | 90.3 |

Self-hosting

For self-hosting, Kimi K2.5 utilizes native INT4 quantization. It is compatible with the following inference engines (requires transformers >= 4.57.1):

  • vLLM
  • SGLang
  • KTransformers
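As a rough illustration of what "native INT4 quantization" means: each group of weights is stored as signed 4-bit integers plus a floating-point scale, cutting memory roughly 4x versus FP16. The snippet below is a generic symmetric-quantization sketch, not the model's actual quantization kernels (the real ones used by vLLM/SGLang are more sophisticated).

```python
# Generic symmetric INT4 quantization sketch (not Kimi's actual kernels):
# map each weight group to signed 4-bit integers in [-8, 7] plus one scale.
def quantize_int4(weights):
    scale = max(abs(w) for w in weights) / 7.0
    if scale == 0.0:
        scale = 1.0  # all-zero group: any nonzero scale works
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    return [v * scale for v in q]

group = [0.12, -0.70, 0.33, 0.05]
q, scale = quantize_int4(group)
restored = dequantize_int4(q, scale)
# per-weight round-trip error is bounded by about scale / 2
```

The trade-off is precision for memory: the dequantized weights differ from the originals by at most about half the scale, which in practice is small enough that benchmark scores barely move while the 1T-parameter checkpoint becomes far cheaper to host.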

For full deployment guides and the "Kimi Vendor Verifier," refer to the official repository.
