UIGEN Benchmark - Latest Results
Date Tested: 07/16/2025 by Joseph Ma | Prompt Set: fullpage_challenge
Leaderboard: 07/16/2025
Rank | Model Name | Overall (%) | Avg. Prompt (%) | TQ (Earned/Max) | ADH (Earned/Max)
---|---|---|---|---|---
1 | Gemini-2.5-Flash | 78.57 | 78.09 | 697.0 / 990.0 | 183.0 / 223.0
2 | Groq-Llama3.3-70b | 72.10 | 71.37 | 674.0 / 1100.0 | 188.0 / 245.0
3 | Groq-DeepSeek-70b | 71.11 | 70.27 | 711.0 / 1100.0 | 181.0 / 245.0
4 | Groq-Llama4-Scout | 69.71 | 68.89 | 576.0 / 1100.0 | 189.0 / 245.0
5 | Groq-Gemma2-9b | 62.40 | 61.63 | 570.0 / 1100.0 | 164.0 / 245.0
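
The page does not say how Overall (%) is derived from the TQ and ADH totals. As a sanity check only, the sketch below tests one candidate: a weighted sum of the two earned/max ratios with weights 0.3 (TQ) and 0.7 (ADH). These weights are an assumption inferred by fitting the five rows above, not a documented scoring rule, and the names `ROWS` and `weighted_overall` are hypothetical. Under that assumption the computed values round to the published Overall figures for all five models.

```python
# Rows copied from the leaderboard above:
# (model, published overall %, TQ earned, TQ max, ADH earned, ADH max)
ROWS = [
    ("Gemini-2.5-Flash", 78.57, 697.0, 990.0, 183.0, 223.0),
    ("Groq-Llama3.3-70b", 72.10, 674.0, 1100.0, 188.0, 245.0),
    ("Groq-DeepSeek-70b", 71.11, 711.0, 1100.0, 181.0, 245.0),
    ("Groq-Llama4-Scout", 69.71, 576.0, 1100.0, 189.0, 245.0),
    ("Groq-Gemma2-9b", 62.40, 570.0, 1100.0, 164.0, 245.0),
]

# ASSUMPTION: these weights are inferred from the table, not stated by the benchmark.
TQ_WEIGHT = 0.3
ADH_WEIGHT = 0.7


def weighted_overall(tq_earned, tq_max, adh_earned, adh_max):
    """Return a weighted percentage built from the TQ and ADH earned/max ratios."""
    tq_ratio = tq_earned / tq_max
    adh_ratio = adh_earned / adh_max
    return 100.0 * (TQ_WEIGHT * tq_ratio + ADH_WEIGHT * adh_ratio)


def check_rows(rows):
    """Return (model, published, recomputed) tuples, recomputed rounded to 2 places."""
    results = []
    for model, published, tq_e, tq_m, adh_e, adh_m in rows:
        recomputed = round(weighted_overall(tq_e, tq_m, adh_e, adh_m), 2)
        results.append((model, published, recomputed))
    return results


if __name__ == "__main__":
    for model, published, recomputed in check_rows(ROWS):
        print(f"{model}: published={published:.2f} recomputed={recomputed:.2f}")
```

Note that Gemini-2.5-Flash has smaller maxima (990.0 and 223.0) than the other models (1100.0 and 245.0); the page does not explain the difference. Because the sketch works on ratios, it is unaffected by the differing maxima, but the comparison across models may still not be like for like.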