LLM Performance Leaderboard
Interactive comparison of large language models across multiple benchmarks
Rank | Model Name | Organization | License | Arena Elo Score | Votes |
---|---|---|---|---|---|
1 | Grok-3-Preview-02-24 | xAI | Proprietary | 1412 | 3,364 |
2 | GPT-4.5-Preview | OpenAI | Proprietary | 1411 | 3,242 |
3 | Gemini-2.0-Flash-Thinking-Exp-01-21 | Google | Proprietary | 1384 | 17,487 |
4 | Gemini-2.0-Pro-Exp-02-05 | Google | Proprietary | 1380 | 15,466 |
5 | ChatGPT-4o-latest (2025-01-29) | OpenAI | Proprietary | 1377 | 17,221 |
6 | DeepSeek-R1 | DeepSeek | MIT | 1363 | 8,580 |
7 | Gemini-2.0-Flash-001 | Google | Proprietary | 1357 | 13,257 |
8 | o1-2024-12-17 | OpenAI | Proprietary | 1352 | 19,785 |
9 | Qwen2.5-Max | Alibaba | Proprietary | 1336 | 11,930 |
10 | o3-mini-high | OpenAI | Proprietary | 1329 | 9,102 |
11 | o3-mini-high | OpenAI | | 1325 | 17,988 |
12 | DeepSeek-V3 | DeepSeek | DeepSeek | 1318 | 22,007 |
13 | DeepSeek-V3 | DeepSeek | | 1318 | 22,846 |
14 | QwQ-32B | Alibaba | | 1316 | 7,735 |
15 | GLM-4-Plus-0111 | Zhipu | Proprietary | 1311 | 6,035 |
16 | Gemini-2.0-Flash-Lite | Google | | 1311 | 2,212 |
17 | Qwen-Plus-0125 | Alibaba | Proprietary | 1310 | 6,054 |
18 | GLM-4-Plus-0111 | Zhipu | | 1310 | 6,026 |
19 | Claude 3.7 Sonnet | Anthropic | Proprietary | 1309 | 4,254 |
20 | Gemini-2.0-Flash-Lite-Preview-02-05 | Google | Proprietary | 1308 | 12,774 |
21 | Step-2-16K-Exp | StepFun | Proprietary | 1305 | 5,132 |
22 | Qwen-Plus-0125 | Alibaba | | 1305 | 6,055 |
23 | o3-mini | OpenAI | | 1305 | 24,877 |
24 | o1-mini | OpenAI | Proprietary | 1304 | 54,923 |
25 | o3-mini | OpenAI | Proprietary | 1304 | 15,463 |
26 | Step-2-16K-Exp | StepFun | | 1304 | 5,128 |
27 | o1-mini | OpenAI | | 1304 | 54,960 |
28 | Command A. | Cohere | | 1303 | 7,547 |
29 | Gemini-1.5-Pro-002 | Google | Proprietary | 1302 | 57,551 |
30 | Hunyuan-TurboS | Tencent | | 1302 | 2,452 |
31 | Gemini-1.5-Pro-0902 | Google | | 1302 | 58,660 |
32 | Hunyuan-Turbo-0110 | Tencent | | 1302 | 2,511 |
33 | llama-3.3-Nemotron-Super-49B-v1 | Nvidia | | 1296 | 1,236 |
34 | Grok-2-08-13 | xAI | Proprietary | 1288 | 67,038 |
35 | Grok-2-08-13 | xAI | | 1288 | 67,102 |
36 | Yi-Lightning | 01.AI | Proprietary | 1287 | 28,946 |
37 | Yi-Lightning | 01 AI | | 1287 | 28,972 |
38 | GPT-4o (2024-05-13) | OpenAI | | 1285 | 117,769 |
39 | Claude 3.5 Sonnet (20241022) | Anthropic | Proprietary | 1284 | 59,139 |
40 | Claude 3.5 Sonnet | Anthropic | | 1283 | 64,670 |
41 | Qwen2.5-plus-1127 | Alibaba | | 1282 | 10,723 |
42 | Deepseek-v2.5-1210 | DeepSeek | DeepSeek | 1279 | 7,247 |
43 | Deepseek-v2.5-1210 | DeepSeek | | 1279 | 7,246 |
44 | Athene-v2-Chat-72B | Nexusflow | Athene V2 | 1275 | 26,092 |
45 | Athene-v2-Chat-72B | NexusFlow | | 1275 | 26,093 |
46 | GPT-4o-mini-2024-07-18 | OpenAI | Proprietary | 1272 | 66,710 |
47 | Hunyuan-Large-2025-02-10 | Tencent | | 1272 | 3,859 |
48 | GPT-4o-mini-2024-07-18 | OpenAI | | 1272 | 71,388 |
49 | Hunyuan-Large-2025-02-10 | Tencent | Proprietary | 1271 | 3,860 |
50 | Gemini-1.5-Flash-002 | Google | Proprietary | 1271 | 36,979 |
51 | Gemini-1.5-Flash-0902 | Google | | 1271 | 37,025 |
52 | llama-4-Maverick-17B-128E-Instruct | Meta | | 1271 | 4,917 |
53 | Llama-3.1-405B-Instruct-bf16 | Meta | Llama 3.1 | 1269 | 34,228 |
54 | Llama-3.1-Nemotron-70B-Instruct | Nvidia | | 1269 | 7,580 |
55 | Meta-llama-3.1-405B-Instruct-bf16 | Meta | | 1269 | 43,795 |
56 | Claude 3.5 Sonnet (20240620) | Anthropic | | 1268 | 86,162 |
57 | Meta-llama-3.1-405B-Instruct-fp8 | Meta | | 1267 | 63,055 |
58 | Gemini Advanced App. (2024-05-14) | Google | | 1267 | 52,143 |
59 | Grok-2-Mini-08-13 | xAI | | 1266 | 55,452 |
60 | GPT-4o-2024-08-06 | OpenAI | | 1265 | 47,982 |
61 | Qwen-Max-0919 | Alibaba | | 1263 | 17,440 |
62 | Gemini-1.5-Pro-0901 | Google | | 1260 | 82,436 |
63 | Hunyuan-Standard-2025-02-10 | Tencent | | 1260 | 4,017 |
64 | Deepseek-v2.5 | DeepSeek | | 1258 | 26,344 |
65 | Qwen2.5-72B-Instruct | Alibaba | | 1257 | 41,532 |
66 | Llama-3.3-70B-Instruct | Meta | | 1257 | 38,101 |
67 | GPT-4-Turbo-2024-04-09 | OpenAI | | 1256 | 102,158 |
68 | Mistral-Large-2407 | Mistral | | 1251 | 48,212 |
69 | Athene-70B | NexusFlow | | 1250 | 20,584 |
70 | GPT-4-1106-preview | OpenAI | | 1250 | 103,746 |
71 | Mistral-Large-2411 | Mistral | | 1249 | 29,895 |
72 | Meta-llama-3.1-70B-Instruct | Meta | | 1248 | 58,654 |
73 | Claude 3 Opus | Anthropic | | 1247 | 202,697 |
74 | GLM-4-Plus | Zhipu AI | | 1246 | 27,784 |
75 | Amazon-Nova-Pro-1.0 | Amazon | | 1245 | 24,285 |
76 | GPT-4-0125-preview | OpenAI | | 1245 | 97,076 |
77 | Llama-3.1-Tulu-3-70B | Ai2 | | 1244 | 3,014 |
78 | Claude 3.5 Haiku (20241022) | Anthropic | | 1237 | 33,322 |
79 | Reka-Core-20240904 | Reka AI | | 1235 | 7,938 |
80 | Gemini-1.5-Flash-0901 | Google | | 1227 | 65,565 |
81 | Jamba-1.5-Large | AI21 Labs | | 1222 | 65,665 |
82 | Gemma-2-27B-it | Google | | 1220 | 79,536 |
83 | Mistral-Small-24B-Instruct-2501 | Mistral | | 1217 | 14,573 |
84 | Amazon-Nova-Lite-1.0 | Amazon | | 1217 | 20,648 |
85 | Qwen2.5-Coder-32B-Instruct | Alibaba | | 1217 | 5,729 |
86 | Command R+ (08-2024) | Cohere | | 1215 | 20,612 |
87 | Gemini-1.5-Flash-8B-001 | Google | | 1212 | 37,686 |
88 | Llama-3.1-Nemotron-51B-Instruct | Nvidia | | 1211 | 3,887 |
89 | Aya-Expanse-32B | Cohere | | 1209 | 28,749 |
90 | Gemma-2-9B-it-SimPO | Princeton | | 1209 | 10,548 |
91 | GLM-4-0529 | Zhipu AI | | 1207 | 10,220 |
92 | Llama-3-70B-Instruct | Meta | | 1206 | 163,632 |
93 | Reka-Flash-20240904 | Reka AI | | 1205 | 8,138 |
94 | Phi-4 | Microsoft | | 1205 | 25,224 |
95 | Claude 3 Sonnet | Anthropic | | 1201 | 113,061 |
96 | Nemotron-4-340B-Instruct | Nvidia | | 1199 | 10,540 |
97 | Amazon-Nova-Micro-1.0 | Amazon | | 1198 | 20,663 |
98 | Zephyr-ORPO-141b-A35b-v0.1 | HuggingFace | | 1197 | 4,857 |
99 | Gemma-2-9B-it | Google | | 1192 | 57,211 |
100 | Command R+ (04-2024) | Cohere | | 1190 | 80,859 |
101 | Hunyuan-Standard-256K | Tencent | | 1189 | 2,901 |
102 | Qwen2-72B-Instruct | Alibaba | | 1187 | 38,877 |
103 | GPT-4-0314 | OpenAI | | 1186 | 55,978 |
104 | Llama-3.1-Tulu-3-8B | Ai2 | | 1185 | 3,076 |
105 | Ministral-8B-2410 | Mistral | | 1182 | 5,113 |
106 | Aya-Expanse-8B | Cohere | | 1180 | 10,391 |
107 | Command R (08-2024) | Cohere | | 1180 | 10,848 |
108 | Claude 3 Haiku | Anthropic | | 1179 | 122,304 |
109 | DeepSeek-Coder-V2-Instruct | DeepSeek AI | | 1178 | 15,754 |
110 | Meta-llama-3.1-8B-Instruct | Meta | | 1176 | 52,597 |
111 | Jamba-1.5-Mini | AI21 Labs | | 1176 | 9,272 |
112 | GPT-4-0613 | OpenAI | | 1163 | 91,640 |
113 | Qwen1.5-110B-Chat | Alibaba | | 1161 | 27,431 |
114 | Yi-1.5-34B-Chat | 01 AI | | 1157 | 25,137 |
115 | Mistral-Large-2402 | Mistral | | 1157 | 64,928 |
116 | Reka-Flash-21B-online | Reka AI | | 1156 | 16,038 |
117 | QwQ-32B-Preview | Alibaba | | 1153 | 3,411 |
118 | Llama-3-8B-Instruct | Meta | | 1152 | 109,106 |
119 | Command R (04-2024) | Cohere | | 1149 | 56,391 |
120 | InternLM2.5-20B-chat | InternLM | | 1149 | 10,597 |
121 | Mixtral-8x22b-Instruct-v0.1 | Mistral | | 1148 | 53,756 |
122 | Mistral Medium | Mistral | | 1148 | 35,561 |
123 | Qwen1.5-72B-Chat | Alibaba | | 1147 | 40,677 |
124 | Reka-Flash-21B | Reka AI | | 1147 | 25,809 |
125 | Gemma-2-2b-it | Google | | 1144 | 48,905 |
126 | Granite-3.1-8B-Instruct | IBM | | 1143 | 3,294 |
127 | Gemini-1.0-Pro-0901 | Google | | 1131 | 18,803 |
128 | Qwen1.5-32B-Chat | Alibaba | | 1125 | 22,759 |
129 | Phi-3-Medium-4k-Instruct | Microsoft | | 1123 | 26,109 |
130 | Granite-3.1-2B-Instruct | IBM | | 1120 | 3,383 |
131 | Starling-LM-7B-beta | Nexusflow | | 1119 | 16,672 |
132 | Mixtral-8x7B-Instruct-v0.1 | Mistral | | 1114 | 76,135 |
133 | Yi-34B-Chat | 01 AI | | 1111 | 15,914 |
134 | Gemini Pro | Google | | 1111 | 6,556 |
135 | Qwen1.5-14B-Chat | Alibaba | | 1109 | 18,680 |
136 | WizardLM-70B-v1.0 | Microsoft | | 1106 | 8,384 |
137 | GPT-3.5-Turbo-0125 | OpenAI | | 1106 | 68,878 |
138 | DBRX-Instruct-Preview | Databricks | | 1103 | 33,734 |
139 | Meta-llama-3.2-3B-Instruct | Meta | | 1103 | 8,399 |
140 | Phi-3-Small-8k-Instruct | Microsoft | | 1102 | 18,473 |
141 | Tulu-2-DPO-70B | AllenAI/UW | | 1098 | 6,662 |
142 | Llama-2-70B-chat | Meta | | 1093 | 39,599 |
143 | Granite-3.0-8B-Instruct | IBM | | 1093 | 7,002 |
144 | OpenChat-3.5-0106 | OpenChat | | 1091 | 12,987 |
145 | Vicuna-33B | LMSYS | | 1091 | 22,948 |
146 | Snowflake Arctic Instruct | Snowflake | | 1090 | 34,177 |
147 | Starling-LM-7B-alpha | UC Berkeley | | 1088 | 10,417 |
148 | Granite-3.0-2B-Instruct | IBM | | 1084 | 7,184 |
149 | Gemma-1.1-7B-it | Google | | 1084 | 25,066 |
150 | NV-Llama2-70B-SteerLM-Chat | Nvidia | | 1081 | 3,636 |
151 | DeepSeek-LLM-67B-Chat | DeepSeek AI | | 1077 | 4,988 |
152 | OpenChat-3.5 | OpenChat | | 1076 | 8,106 |
153 | OpenHermes-2.5-Mistral-7B | NousResearch | | 1074 | 5,089 |
154 | Mistral-7B-Instruct-v0.2 | Mistral | | 1072 | 20,065 |
155 | Qwen1.5-7B-Chat | Alibaba | | 1070 | 4,871 |
156 | Phi-3-Mini-4K-Instruct-June-24 | Microsoft | | 1070 | 12,803 |
157 | GPT-3.5-Turbo-1106 | OpenAI | | 1067 | 17,035 |
158 | Phi-3-Mini-4K-Instruct | Microsoft | | 1066 | 21,098 |
159 | Dolphin-2.2.1-Mistral-7B | Cognitive Computations | | 1063 | 1,713 |
160 | Llama-2-13b-chat | Meta | | 1063 | 19,721 |
161 | SOLAR-10.7B-Instruct-v1.0 | Upstage AI | | 1062 | 4,288 |
162 | Nous-Hermes-2-Mixtral-8x7B-DPO | NousResearch | | 1061 | 3,839 |
163 | WizardLM-13b-v1.2 | Microsoft | | 1059 | 7,175 |
164 | Meta-llama-3.2-1B-Instruct | Meta | | 1054 | 8,518 |
165 | Vicuna-13B | LMSYS | | 1054 | 19,776 |
166 | Zephyr-7B-beta | HuggingFace | | 1053 | 11,318 |
167 | SmolLM2-1.7B-Instruct | HuggingFace | | 1046 | 2,376 |
168 | MPT-30B-chat | MosaicML | | 1045 | 2,645 |
169 | falcon-180b-chat | TII | | 1044 | 1,328 |
170 | CodeLlama-34B-instruct | Meta | | 1043 | 7,512 |
171 | CodeLlama-70B-instruct | Meta | | 1042 | 1,190 |
172 | Zephyr-7B-alpha | HuggingFace | | 1040 | 1,811 |
173 | Gemma-7B-it | Google | | 1037 | 9,176 |
174 | Phi-3-Mini-128k-Instruct | Microsoft | | 1037 | 21,628 |
175 | Llama-2-7B-chat | Meta | | 1037 | 14,535 |
176 | Qwen-14B-Chat | Alibaba | | 1035 | 5,068 |
177 | Guanaco-33B | UW | | 1033 | 2,998 |
178 | Gemma-1.1-2b-it | Google | | 1021 | 11,354 |
179 | StripedHyena-Nous-7B | Together AI | | 1017 | 5,278 |
180 | OLMo-7B-instruct | Allen AI | | 1015 | 6,499 |
181 | Mistral-7B-Instruct-v0.1 | Mistral | | 1008 | 9,145 |
182 | Vicuna-7B | LMSYS | | 1005 | 7,014 |
183 | PaLM-Chat-Bison-001 | Google | | 1003 | 8,712 |
184 | Gemma-2B-it | Google | | 989 | 4,922 |
185 | Qwen1.5-4B-Chat | Alibaba | | 988 | 7,819 |
186 | Koala-13B | UC Berkeley | | 964 | 7,021 |
187 | ChatGLM3-6B | Tsinghua | | 955 | 4,763 |
188 | GPT4All-13B-Snoozy | Nomic AI | | 932 | 1,788 |
189 | MPT-7B-Chat | MosaicML | | 928 | 3,997 |
190 | ChatGLM2-6B | Tsinghua | | 924 | 2,711 |
191 | RWKV-4-Raven-14B | RWKV | | 922 | 4,920 |
192 | Alpaca-13B | Stanford | | 901 | 5,865 |
193 | OpenAssistant-Pythia-12B | OpenAssistant | | 893 | 6,368 |
194 | ChatGLM-6B | Tsinghua | | 879 | 4,987 |
195 | FastChat-T5-3B | LMSYS | | 868 | 4,290 |
196 | StableLM-Tuned-Alpha-7B | Stability AI | | 840 | 3,339 |
197 | Dolly-V2-12B | Databricks | | 822 | 3,481 |
198 | LLaMA-13B | Meta | | 799 | 2,445 |
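
One way to read the Elo column is as a pairwise win-rate estimate. The sketch below assumes the scores follow the conventional Elo scale (a base-10 logistic curve with a 400-point divisor), which this page does not state. The helper name `expected_win_probability` is hypothetical, and the ratings are copied from the table.

```python
def expected_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of model A against model B under the standard Elo formula.

    Assumption: ratings use the conventional Elo scale (base 10, divisor 400).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the leaderboard rows (ranks 1, 2, and 198).
grok_3_preview = 1412
gpt_4_5_preview = 1411
llama_13b = 799

# A 1-point gap is close to a coin flip; a 613-point gap is heavily one-sided.
print(f"Rank 1 vs rank 2:   {expected_win_probability(grok_3_preview, gpt_4_5_preview):.3f}")
print(f"Rank 1 vs rank 198: {expected_win_probability(grok_3_preview, llama_13b):.3f}")
```

Under that assumption, neighbouring ranks separated by a point or two are effectively tied, so small differences in rank near the top of the table should not be over-interpreted, especially for models with few votes.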
Data aggregated from multiple benchmark sources • Last updated: March 2025