Tesslate AI

UIGEN Benchmark - Latest Results

Date Tested: 05/17/2025 | Prompt Set: fullpage_challenge

Leaderboard: 05/17/2025

Rank | Model Name        | Overall (%) | Avg. Prompt (%) | TQ (Earned/Max) | ADH (Earned/Max)
1    | Gemini-2.5-Flash  | 79.57       | 78.93           | 1448.2 / 1900.0 | 197.0 / 245.0
2    | Groq-DeepSeek-70b | 76.67       | 75.73           | 1513.7 / 1900.0 | 186.0 / 245.0
3    | Groq-Llama3.3-70b | 76.30       | 75.47           | 1416.5 / 1900.0 | 188.0 / 245.0
4    | Groq-Llama4-Scout | 75.92       | 74.98           | 1349.1 / 1900.0 | 189.0 / 245.0
5    | Groq-Gemma2-9b    | 67.32       | 66.45           | 1308.5 / 1900.0 | 164.0 / 245.0
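
The ranking does not follow the raw TQ totals: Groq-DeepSeek-70b earns the most TQ points yet places second. The page does not state how Overall (%) is derived. As a rough sketch, the published figures appear consistent with weighting the TQ percentage at 0.2 and the ADH percentage at 0.8. Those weights are an assumption inferred from the five rows above, not something the benchmark documents, and the helper names below are hypothetical.

```python
# Rows copied from the leaderboard: (model, published overall %, TQ earned, ADH earned).
ROWS = [
    ("Gemini-2.5-Flash", 79.57, 1448.2, 197.0),
    ("Groq-DeepSeek-70b", 76.67, 1513.7, 186.0),
    ("Groq-Llama3.3-70b", 76.30, 1416.5, 188.0),
    ("Groq-Llama4-Scout", 75.92, 1349.1, 189.0),
    ("Groq-Gemma2-9b", 67.32, 1308.5, 164.0),
]

# Maximum points per category, as shown in the table.
TQ_MAX, ADH_MAX = 1900.0, 245.0

# ASSUMPTION: weights inferred from the published numbers, not documented by the page.
TQ_WEIGHT, ADH_WEIGHT = 0.2, 0.8


def pct(earned: float, maximum: float) -> float:
    """Convert an earned/max pair to a percentage."""
    return 100.0 * earned / maximum


def weighted_overall(tq_earned: float, adh_earned: float) -> float:
    """Combine the two category percentages using the assumed weights."""
    return TQ_WEIGHT * pct(tq_earned, TQ_MAX) + ADH_WEIGHT * pct(adh_earned, ADH_MAX)


def report() -> list[tuple[str, float, float]]:
    """Return (model, recomputed overall, published overall) for each row."""
    return [(m, weighted_overall(tq, adh), overall) for m, overall, tq, adh in ROWS]


if __name__ == "__main__":
    for model, recomputed, published in report():
        print(f"{model:20s} recomputed {recomputed:6.2f}  published {published:6.2f}")
```

Under this assumed weighting, ADH dominates the overall score, which would explain why Gemini-2.5-Flash, with the highest ADH total, ranks first despite a lower TQ total than Groq-DeepSeek-70b.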
