UIGEN Benchmark - Latest Results
Date Tested: 05/17/2025 | Prompt Set: fullpage_challenge
Leaderboard: 05/17/2025
Rank | Model Name | Overall (%) | Avg. Prompt (%) | TQ (Earned/Max) | ADH (Earned/Max) |
---|---|---|---|---|---|
1 | Gemini-2.5-Flash | 79.57 | 78.93 | 1448.2 / 1900.0 | 197.0 / 245.0 |
2 | Groq-DeepSeek-70b | 76.67 | 75.73 | 1513.7 / 1900.0 | 186.0 / 245.0 |
3 | Groq-Llama3.3-70b | 76.30 | 75.47 | 1416.5 / 1900.0 | 188.0 / 245.0 |
4 | Groq-Llama4-Scout | 75.92 | 74.98 | 1349.1 / 1900.0 | 189.0 / 245.0 |
5 | Groq-Gemma2-9b | 67.32 | 66.45 | 1308.5 / 1900.0 | 164.0 / 245.0 |
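
The page does not say how Overall (%) is derived from the TQ and ADH columns. A quick check suggests the listed values are consistent with a weighted blend of the two earned/max ratios, with 20% weight on TQ and 80% on ADH. These weights are an assumption inferred from the five rows above, not a documented formula, and the names `weighted_overall`, `TQ_WEIGHT`, and `ADH_WEIGHT` are hypothetical. A minimal sketch of that check:

```python
# Rows copied from the leaderboard: (model, listed overall %, TQ earned, ADH earned).
ROWS = [
    ("Gemini-2.5-Flash", 79.57, 1448.2, 197.0),
    ("Groq-DeepSeek-70b", 76.67, 1513.7, 186.0),
    ("Groq-Llama3.3-70b", 76.30, 1416.5, 188.0),
    ("Groq-Llama4-Scout", 75.92, 1349.1, 189.0),
    ("Groq-Gemma2-9b", 67.32, 1308.5, 164.0),
]

# Maximum scores as shown in the table.
TQ_MAX = 1900.0
ADH_MAX = 245.0

# ASSUMPTION: weights inferred from the table, not stated by the benchmark.
TQ_WEIGHT = 0.2
ADH_WEIGHT = 0.8


def weighted_overall(tq_earned: float, adh_earned: float) -> float:
    """Blend the TQ and ADH ratios into a percentage using the assumed weights."""
    tq_ratio = tq_earned / TQ_MAX
    adh_ratio = adh_earned / ADH_MAX
    return 100.0 * (TQ_WEIGHT * tq_ratio + ADH_WEIGHT * adh_ratio)


if __name__ == "__main__":
    for model, listed, tq, adh in ROWS:
        recomputed = weighted_overall(tq, adh)
        print(f"{model}: listed {listed:.2f}, recomputed {recomputed:.2f}")
```

Under this assumption, ADH dominates the ranking, which would explain why Gemini-2.5-Flash ranks first despite Groq-DeepSeek-70b earning more TQ points. The Avg. Prompt (%) column cannot be checked the same way, since per-prompt scores are not shown here.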