UIGEN Benchmark - Latest Results

Date Tested: 07/16/2025 by Joseph Ma | Prompt Set: fullpage_challenge

Leaderboard: 07/16/2025

| Rank | Model Name | Overall (%) | Avg. Prompt (%) | TQ (Earned / Max) | ADH (Earned / Max) |
|------|------------|-------------|-----------------|-------------------|--------------------|
| 1 | Gemini-2.5-Flash | 78.57 | 78.09 | 697.0 / 990.0 | 183.0 / 223.0 |
| 2 | Groq-Llama3.3-70b | 72.10 | 71.37 | 674.0 / 1100.0 | 188.0 / 245.0 |
| 3 | Groq-DeepSeek-70b | 71.11 | 70.27 | 711.0 / 1100.0 | 181.0 / 245.0 |
| 4 | Groq-Llama4-Scout | 69.71 | 68.89 | 576.0 / 1100.0 | 189.0 / 245.0 |
| 5 | Groq-Gemma2-9b | 62.40 | 61.63 | 570.0 / 1100.0 | 164.0 / 245.0 |
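
The page does not say how Overall (%) is derived from the TQ and ADH points. As a rough sketch, the listed values can be reproduced by weighting the TQ fraction at 30% and the ADH fraction at 70%. These weights are an assumption inferred from the five rows above, not a documented formula, and the names `TQ_WEIGHT`, `ADH_WEIGHT`, and `weighted_overall` are hypothetical. Note that Gemini-2.5-Flash is scored against lower maxima (990 and 223) than the other models (1100 and 245), so the comparison has to use fractions rather than raw points.

```python
# Assumed weights: inferred from the published rows, NOT stated on the page.
TQ_WEIGHT = 0.3
ADH_WEIGHT = 0.7

# (model, listed Overall %, TQ earned, TQ max, ADH earned, ADH max),
# copied from the leaderboard dated 07/16/2025.
ROWS = [
    ("Gemini-2.5-Flash", 78.57, 697.0, 990.0, 183.0, 223.0),
    ("Groq-Llama3.3-70b", 72.10, 674.0, 1100.0, 188.0, 245.0),
    ("Groq-DeepSeek-70b", 71.11, 711.0, 1100.0, 181.0, 245.0),
    ("Groq-Llama4-Scout", 69.71, 576.0, 1100.0, 189.0, 245.0),
    ("Groq-Gemma2-9b", 62.40, 570.0, 1100.0, 164.0, 245.0),
]


def weighted_overall(tq_earned, tq_max, adh_earned, adh_max):
    """Return the weighted score in percent under the assumed 30/70 split."""
    tq_fraction = tq_earned / tq_max
    adh_fraction = adh_earned / adh_max
    return 100.0 * (TQ_WEIGHT * tq_fraction + ADH_WEIGHT * adh_fraction)


if __name__ == "__main__":
    # Print the listed and recomputed values side by side for comparison.
    for model, listed, *points in ROWS:
        recomputed = weighted_overall(*points)
        print(f"{model:<20} listed={listed:6.2f}  recomputed={recomputed:6.2f}")
```

If the recomputed values diverge from the listed ones on a future leaderboard, the assumed weights should be treated as wrong rather than the published scores.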